There we go, carry on. So we've got our second data frame there, made by just chucking out some of the values from our previous data frame. Now let's add a new column to our original data frame. Remember, this is how you do it: you type data_2, then square brackets with the name of the new column in quotation marks, and assign it a list. But look at this list: it's a list of text strings, A, B, A, B, A, B. Let's just execute that and see what it looks like. Now let's imagine something. Let's imagine the first column holds the admission haemoglobin values of a set of patients, and the second holds each individual patient's white cell count. And let's imagine the patients belong to two groups: each patient was either in group A or in group B. That's a very normal kind of analysis we'd be interested in. And now the beauty of pandas: you can split your data frames up. I'm going to create a new object called data_4. I take data_2 and use the groupby method on the groups column, so it's data_2, a dot, groupby, and the column name inside round brackets. You'll have to learn when to use square brackets, when to use round brackets and when to use a dot; you'll get used to it. So I want to group by this column. Pandas looks down the column and finds the A's and the B's. If there were C's, it would find the C's. If there were lots of other words, it would just group all the identical ones together. Strictly speaking, groupby gives you a GroupBy object rather than a new data frame, but look what happens when we display its summary statistics. This is really magic. We still have our two value columns, but now pandas has made two individual groups, taking only the values that fell into group A and only those that fell into group B. So if you had two groups and wanted to compare them, for example with a statistical analysis comparing the two groups, this is exactly what you need.
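Here is a minimal sketch of those steps. The lecture's own data_2 isn't reproduced here, so the column names (Hb, WCC, Groups) and all the values are hypothetical stand-ins, and it assumes the per-group summary shown on screen comes from calling describe() on the grouped object.

```python
import pandas as pd

# Hypothetical stand-in for data_2: two value columns for 13 patients.
# Column names and values are invented for illustration only.
data_2 = pd.DataFrame({
    "Hb": [12.1, 13.4, 11.8, 14.0, 12.7, 13.1, 10.9, 12.5, 13.8, 11.2, 12.9, 13.6, 12.0],
    "WCC": [8.2, 11.5, 9.1, 14.3, 7.6, 10.8, 12.2, 9.9, 8.7, 13.1, 10.2, 9.4, 11.0],
})

# Add a new column by assigning a list with one entry per row:
# alternating A, B, ..., A gives 7 A's and 6 B's.
data_2["Groups"] = ["A", "B"] * 6 + ["A"]

# groupby returns a GroupBy object, not a data frame.
data_4 = data_2.groupby("Groups")

# describe() gives count, mean, std, min, quartiles and max for each group.
print(data_4.describe())
```

The grouping column itself is used as the row labels of the summary, so each value column gets its own block of statistics for group A and group B.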
Grouping actually splits your data frame like that. Now we can see there were seven patients in group A and six patients in group B, and for the group A patients, that was their mean, that was their standard deviation, and so on. And that really is the power of it. So that's a short introduction to pandas.

Let me show you how we're going to use pandas most of the time: we're going to import a spreadsheet. Microsoft Excel usually saves spreadsheets as .xls or .xlsx files. In LibreOffice or OpenOffice you can also save them in the OpenDocument Spreadsheet (.ods) format. But you can also ask those programs to export a CSV file, which stands for comma-separated values, and that's the one I'm going to use here. Let me show you: I'm in a healthcare research lectures folder on my desktop. That is where this notebook lives, and I've put this CSV file in exactly the same folder. That means I don't have to type in the whole path to wherever the file is, which looks different on a Mac and on Windows. I can just put the name of the file. So if you want to keep it simple like this, put your spreadsheet file in the same folder as your notebook and pandas will find it easily. Now, pd is pandas, as we know, then a dot, and this time it's not Series and not DataFrame: I'm going to say read_csv. If you've saved your spreadsheet in a different format, there are other readers too; for example, read_excel handles both .xls and .xlsx files. So it's going to take this whole spreadsheet and put it inside this computer bucket called data_5. Let's do that. When you import a spreadsheet file, I also want to show you how to check what data type you've got. Yes, indeed, it is a data frame, so it's going to look exactly like this.
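A minimal sketch of the import-and-check step. So that it runs anywhere, it reads a tiny CSV held in memory through io.StringIO; the three columns and rows are invented, and the file name in the comment is hypothetical. With a real file sitting next to your notebook you would pass its name to read_csv instead.

```python
import io

import pandas as pd

# Stand-in for a CSV file in the same folder as the notebook.
# In practice: data_5 = pd.read_csv("my_data.csv")  (hypothetical file name)
csv_text = io.StringIO(
    "File,Age,Gender\n"
    "1,34,Female\n"
    "2,27,Male\n"
    "3,45,Female\n"
)

data_5 = pd.read_csv(csv_text)

# Check what kind of object the import produced.
print(type(data_5))  # <class 'pandas.core.frame.DataFrame'>
```

For an Excel workbook the equivalent call is pd.read_excel, which needs an engine such as openpyxl installed to read .xlsx files.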
So if you import a whole spreadsheet, it's going to be a data frame, because a data frame allows for lots of different columns. Now, there's a quick way to check whether things imported properly, and that's the head method. Let me show you: if I type data_5, a dot, then h and hit Tab, head is the one. Double-click on that and open and close the round brackets. The default is five; see that that was shown in blue? You don't have to put it in: if you leave the brackets empty, it uses the default of five and just shows me the first five rows. If you have a spreadsheet with thousands and thousands of rows, you don't want to print them all out to the screen; you just want to make sure things imported properly. So just that. And there's our beautiful little spreadsheet, showing only the first five rows, and you can see the index always starts counting at zero. Let me run through the spreadsheet quickly, because this is the one we're going to use most of the time. It's mock data, MOOC mock as you can see there, so none of it comes from any real patient. We have a File column with the patient's file number, then their age, their gender, the delay from the time they had symptoms until they came to hospital, how long they stayed in hospital, whether they went to ICU, their retroviral disease status, and their CD4 count where one was done. NaN is a term used quite a lot in computing: if pandas finds an empty cell, or something that isn't a value, in a column of values, it puts NaN, which stands for not a number. It's very nice to have that done, because these values will be excluded when you do statistical analysis on the values in this column.
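A small sketch of head and of how NaN behaves in a calculation. The Age and CD4 columns and their values are hypothetical; np.nan is written in directly here to mimic the empty cell that read_csv would turn into NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with 8 rows and one missing CD4 value.
data_5 = pd.DataFrame({
    "Age": [34, 27, 45, 52, 19, 61, 38, 29],
    "CD4": [450.0, np.nan, 320.0, 510.0, 610.0, 280.0, 390.0, 470.0],
})

print(data_5.head())   # default n=5: the first five rows, index 0 to 4
print(data_5.head(3))  # pass a number to show a different count

# mean() skips NaN by default (skipna=True), so this averages
# only the 7 recorded CD4 values.
print(data_5["CD4"].mean())
print(data_5["CD4"].count())  # counts non-missing values only
```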
Then admission heart rate, admission temperature, admission C-reactive protein, admission white cell count, admission Hb, whether there was surgery, whether a ruptured appendix was found (this is appendix data), whether the appendix was sent away for histology and whether there was inflammation, yes or no, whether the patient developed any complications while in hospital, and their modified Alvarado score. You don't have to worry about the clinical setting at all, and you don't have to know anything about appendicitis; we won't go into the clinical situation much. I just want to show you how to do statistical analysis on this data set.

Now, it might very well be that you don't like this index. You can make any of these columns the index, but be sensible about it: the File column contains unique numbers, so it's a good candidate. If you're ever interested in doing that, you type data_5, which is the data frame's name, then .set_index, open and close the round brackets, and put the column name in quotation marks. You have to type it exactly; here the column was called File. Then I'm going to print the first three rows to the screen after making my change. Let's do that: now File becomes the index. Bear in mind that set_index returns a new data frame, so data_5 itself only changes if you assign the result back to it. Sometimes, though, you only want to display the last few lines, which is a very quick way to see how many of your rows actually imported. So instead of head there's also tail. Let's just do the last three, and you can see they go up to file number 150, so there are 150 patients in our data set. So that's a quick introduction to pandas. Look through it again, familiarize yourself with it and play with it. It might look slightly daunting, especially this first bit. We are not going to construct things in this way; I just wanted to introduce you to pandas slowly. This is the way we're going to work: by importing spreadsheet files and starting to play with them. Excellent, see you in the next lecture.
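For reference, a minimal sketch of the set_index and tail steps from this lecture. It uses a hypothetical 150-row stand-in for the mock data set; only the File column name follows the lecture, and the Age values are invented.

```python
import pandas as pd

# Hypothetical stand-in: 150 patients with file numbers 1 to 150.
data_5 = pd.DataFrame({
    "File": range(1, 151),
    "Age": [20 + (i % 50) for i in range(150)],
})

# set_index returns a NEW data frame; data_5 is left unchanged
# unless you assign the result back (or pass inplace=True).
by_file = data_5.set_index("File")
print(by_file.head(3))

# tail shows the last rows: a quick check of how many rows came in.
print(by_file.tail(3))
print(len(by_file))  # 150
```

Writing data_5 = data_5.set_index("File") would make the change stick on the original name.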