Here we have a spreadsheet program. I'm using LibreOffice version 4.3 here, which you can download from the internet. Let's have a look at how it's written: LibreOffice. If you search for that, you will find it is available for download free of charge and has most of the functionality that you would find in large commercial products such as Microsoft Office. Now, you can use whatever spreadsheet you like. Many of these spreadsheet packages actually have beautiful data analysis tools inside of them, though they are limited. I'd like to support open source, free software, so really give LibreOffice a try.

So this is the spreadsheet software. Now, data collection as far as healthcare science is concerned: many statisticians would frown upon the use of spreadsheet software such as this to capture your data. They would prefer a proper database program that you query with SQL, which stands for Structured Query Language. We know Microsoft SQL Server, we know MySQL, which is free of charge and open source, and there are many others. But the learning curve there is enormous. To develop your own database for data capture is, for most of us, just too difficult and takes too much time, and a spreadsheet serves our purpose, especially if you save this file and import it into Python, more specifically using pandas. Then you can manipulate this data to your heart's content. So what I'm going to suggest for this course is just basic spreadsheet software such as this.

You'll see that there are columns. There's column A; if I click on the top there, that selects the whole column. Then column B, column C, column D. And then there are rows: row 1, row 2. That makes it very easy: that will be cell B2, because it's in column B, row 2. It's normal for us to collect data where we put the variable names at the top. So in this first one I can say file, for instance, I can give that a name. The second one I might call gender, the third one age, and the fourth one, say, white
cell count. So I can very quickly say this will be my patient number one, my patient number two, my patient number three, four, five, six, seven, eight, nine. It need not be patients; it can be any kind of subject that you are analysing. You can also just drag down over them, holding down your mouse button, so that they're all selected (not the File header, not the first row, which contains the variable names). There's a little arrow that appears there, and in most software you can just start dragging down and you'll see it is going to autocomplete. Say I want 30 values: it did that all for me, 10, 11, 12, 13, 14, and so on.

Now you might then say: first patient comes in, he's male, the age is 33 and the white cell count was 12.1. Next patient comes in, also male, age 45, white cell count 7.0. And so you can keep on filling these in, and you have a beautiful data collection tool here. What is important is that you've got to decide beforehand what the variables are that you want to capture. These are the variables: gender, age, white cell count, Hb, whatever type of research you're doing. I can even have one for, say, group, so a patient might belong to group one, this one might belong to group two, this one might belong to group one, and so on.
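
As a rough sketch of what a sheet like this looks like once it is saved and read into pandas: the two rows below use the values spoken in the lecture, while the column names (`File`, `Gender`, `Age`, `WCC`, `Group`), the group labels assigned to each patient, and the inline CSV text standing in for a saved file are assumptions made for illustration.

```python
import io

import pandas as pd

# Stand-in for a spreadsheet saved as CSV (assumption: in practice you would
# pass the path of your own exported file to pd.read_csv instead).
csv_text = """File,Gender,Age,WCC,Group
1,Male,33,12.1,1
2,Male,45,7.0,2
"""

# The first row becomes the column (variable) names; each later row is one subject.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # shows the type pandas inferred for each column
```

Each column keeps a single type, which is one reason to decide on the variables before you start capturing data.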
We can then manipulate this in pandas to extract the different groups, so that I can compare the mean white cell count of the group one patients with that of the group two patients, for instance. A spreadsheet is a very easy to understand, two-dimensional representation of your data. Structure it this way: put your column headers up there, each being the name of a variable that you're going to collect.

Each of these variables is of a different type. Male and female are of a nominal data type; it's categorical and nominal. In other words, you can't order male and female. Male is not one and female two, or female one and male two; you can't order them in any way. You can, however, use code words. You might be concerned about protecting the identity of your patients, so you can say, for instance, I'm going to code all my males as eight and all my females as four. That's the coding that you have running behind the scenes. To protect your patients' identity you keep that information separate, nowhere near your data collection tool, which in this case is your spreadsheet, and only you and your investigators know which is which. It's a little secret key that you keep: eight is male and four is female. That's one form of protecting the identity of your patients during data collection.

Age, on the other hand, is a numerical data type. There is an absolute zero, actually: you are born on a certain day. So this really does make it ratio-type numerical data. White cell count is also ratio-type numerical data; there is an absolute zero, which is zero. Furthermore, if we look at continuous and discrete: gender is going to be discrete, there's only male and female; age is really a continuous data type; white cell count is obviously a continuous data type. The groups, again, are categorical data, and nominal at that, not ordinal. I can't really
say that one comes before the other, because I could have called that group green and this one group yellow. So we can't really order them, even though we can use labels like one and two there.

So this lecture is just about introducing you to spreadsheet software. This is the spreadsheet software that we're going to use for you to collect your data. Structure it with rows, where each row is one subject's data entry, whether that be some test that you're running in the lab or a patient encounter. That's the row. And in the columns you put the different variables.
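
To make the row-per-subject, column-per-variable layout concrete, here is a minimal pandas sketch of the group comparison and the gender coding described in this lecture. It is an illustration under assumptions, not a prescribed workflow: the column names are hypothetical, only the first two rows use values mentioned in the lecture, and the other two rows are made up so that each group has more than one patient.

```python
import pandas as pd

# Hypothetical example data: only the first two rows use values from the
# lecture; the remaining rows are made up purely for illustration.
df = pd.DataFrame(
    {
        "File": [1, 2, 3, 4],
        "Gender": ["Male", "Male", "Female", "Female"],
        "Age": [33, 45, 51, 28],
        "WCC": [12.1, 7.0, 9.4, 6.5],
        "Group": [1, 2, 1, 2],
    }
)

# Coding key from the lecture (8 = male, 4 = female). In practice the key
# would be stored separately from the data collection sheet.
gender_key = {"Male": 8, "Female": 4}
df["GenderCode"] = df["Gender"].map(gender_key)

# Gender and Group are nominal: mark them as unordered categories so the
# labels 1 and 2 are treated as names, not as quantities.
df["Gender"] = df["Gender"].astype("category")
df["Group"] = df["Group"].astype("category")

# Mean white cell count for each group.
group_means = df.groupby("Group", observed=True)["WCC"].mean()
print(group_means)

# The coded view of the sheet, without the plain-text gender column.
print(df[["File", "GenderCode", "Age", "WCC", "Group"]])
```

Storing `Group` as a category rather than an integer reflects the point made above: the labels one and two could just as well have been green and yellow.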