Welcome to this talk on research in private practice. We are going to cover the importance of medical research, the role of private practice in research, the different types of studies, a quick look at data types, and then a short introduction to statistics. Usually a talk like this would cover the median, the mode, the mean and the standard deviation, but I want to do something different: we are going to begin with the p-value. So why give a talk like this at all? Why medical research? The point is that there is more to medicine than mere service delivery. Seeing patients on a daily basis, diagnosing them and managing them is an important part of what we do, but one must do more than service delivery: we must audit the outcomes of our own practice. It is very important for every doctor to know the outcomes of the management protocols they follow for the various diseases. It is only when you sit down and review your files, which is itself a form of research, that you learn where your successes lie and where you can improve. Lastly, there is the pure knowledge of statistics. Even if you do not embark on any kind of research, it's important to critically read the literature. Knowing statistics empowers you when you read a journal: you can critically evaluate the findings of any research. It is much more important to do this than to merely read the conclusion and just look for statistically significant p-values, to be able to evaluate the research that was done so that you can decide whether you want to implement its findings in your practice. So what is the role of private practice in medical research? Well, if your practice has been running for a while, you sit on an enormous amount of data. Locked in the files in your cabinet is the answer to many questions that are out there. Now think of the first world setting in which most of you practice.
You have access to the best of medical care, the best investigations, the best treatments. It is not always so in academia, specifically in more third-world-oriented countries. Where most of the research is done, there is no access to the variety of medications, treatments and investigations that there is in private practice. Yet that is where most of the research comes from: it comes from academia. It is important that we consider doing research from a first world perspective. In the world where our patients are treated, there are a lot of questions that can be answered. And that brings me to the last point here: you know the questions. When you have a quiet Saturday afternoon, there's a thought that creeps into your mind, something that bothers you. Whilst you're seeing patients, you might have a practice that sees a lot of a certain type of patient, and there are always lingering questions. You wish you could get the answer, and it might not always be in the literature that's already out there. Well, you might be able to provide the world of medicine with that answer. Again, that data might be locked up in the files in your cabinet. So let's look at the different types of studies. We're going to talk about observational studies and trials, and then just mention two others. So, the observational studies. Now I want to warn you: the classification system used here is not absolutely demarcated. Aspects of one study type can creep into the design of another. But usually we'll group them as such: case series, case control studies, cross sectional studies and cohort studies. So what is a case series? Well, that is a simple descriptive account of a characteristic. Some people also call this a clinical audit. You'll choose a defined period of time, maybe during an epidemic, and you'll just describe what the patients had: what investigations you did, what the outcome was, what the results of the various investigations were, etc.
Now, a quick word about retrospective versus prospective. Don't get that mixed in with the classification system used here. Retrospective simply means that you are going to use data that was not recorded specifically for this trial. You made clinical notes on your patients; when you design your study now, there are certain data points that you decide on, but they are not necessarily noted in the files. You simply have to go look through the files for those data points. The problem with the retrospective audit is that the data was never captured with a study in mind. A prospective study is where you decide: from today, I want to gather these data points. You then gather them. They might not yet be part of a study, but when you later do a case series looking back, it is on prospectively collected data. You decided that for every patient these certain data points would be noted in the file, so when you now do a case series it is easy to go and find that data. You don't have to go through a file and search in the hope that the data might have been captured. So how does a case series help us? Well, let's say for instance you just wanted to document all the cases of H1N1 that you saw in the last couple of years, or perhaps during the time of the epidemic. You can identify all those patients and simply tell the world how they presented, what their parameters were, and what the various results of the investigations were. What does it help us with? It helps us with future planning. Should something similar occur again, you have this data available to help you plan the next treatments. The second observational type is the case control study. Here we usually start with the presence of an outcome. It might also be the absence, but usually the presence, of an outcome, and we are going to look backwards. Say for instance you group all the patients to whom you gave the same antibiotic for the same disease, and you get responders and non-responders.
Now we're going to compare the subjects with and without that outcome, so the responders and the non-responders: responding was your outcome. We look for risk factors and characteristics that might differ between the groups. That can help you in the future: you know now that if a patient has that characteristic, they are probably going to be a responder, or probably going to be a non-responder, as an example. Next up is the cross-sectional study. In a cross-sectional study we have a particular observation at a particular point in time. A good example of this is a survey. You might design a survey that you hand out to your patients anonymously. You can look at their knowledge of a certain disease, or just their opinion about something. That is an excellent example of a cross-sectional study, and it can also form part of other study types. Lastly, I just want to mention the word cohort, or cohort study. What is a cohort? A cohort is a group of individuals with a common trait who remain part of that group over an extended period. A good example would be the Framingham studies, where that population was followed up over a long period of time, and during that time certain traits and certain diseases developed, and we can look at the prevalence of those. That group of patients forms a cohort. Next up is the clinical trial. In an observational study we didn't really intervene at all; we merely observed, noted and documented. In a clinical trial we have active intervention: we are going to perform a certain procedure or give a certain drug. To do this we need controls, so we need patients who receive the intervention and those that don't. Now there are different types. The patient can be his own control: in other words, we will observe certain data points, we will do the intervention, and then we will collect those same data points again.
You can have external controls: everyone involved in your study gets the intervention, and you are going to compare them to another set of individuals who were not part of the study but were in some way comparable to your patient set. Then you get the gold standard, the randomized controlled trial, where a group of similar patients will be entered and randomly assigned to receive the intervention or not, and this can be blinded in various ways. Lastly, there are two other types that I want to mention quickly. The meta-analysis: that is where a single trial perhaps doesn't give us enough data, but if we accumulate a lot of trials done on the same subject, we can improve the power of the statistical analysis and get a better answer. And reviews, where you critically look at everything that has been published on a certain topic and bring it all together in a nice essay. Now, introductory statistics: the p-value. I quickly want to mention software, then we're going to talk about the roll of the dice, about data types, and about the central limit theorem. First of all, the software. Most of you will have a computer at home or at work with some form of Microsoft Office loaded onto the system. Most of you will be familiar with Microsoft Word, but there is another piece of software there that most people never use; it's called Microsoft Excel, and it is a spreadsheet program. At home or in your office is a computer, right now, which you can use to do powerful statistical analysis simply and easily. There are various other spreadsheet software packages out there, but I'm going to concentrate on Microsoft Excel. Now, it comes in two versions: one for the Windows operating system and one for the Apple Macintosh operating system. I just want to warn you, there's a slight difference between the two.
The statistical analysis package is already built into the Windows version, but it is excluded from the Apple version. In the Apple version you have to download a free piece of software from Statsoft, and that will give you the same statistical analysis power as the Windows version. I'm going to show you a short video now on how to take two simple groups, type in the data, press a few buttons, and get a p-value. Here we are. Now we're going to talk about the roll of the dice. What it is all about is probability: the p in p-value stands for probability. Here we have die number one and die number two. It might be a bit difficult to see, but these are all the different combinations that you can get if you roll them. They are mutually exclusive and collectively exhaustive; in other words, all possibilities are listed here. I can throw a one and a one, a one and a two, a one and a three, and so on, right to the end, a six and a six. There is no other possibility: the dice cannot land on a seven, and a die cannot show a zero. Let's total the number that is face up on these two dice. I can throw a one and a one, and that will equal two. I can throw a one and a two, and that will equal three. You'll see the totals listed there in the column under 'tot'. Now let's look at the frequency distribution. How many ways are there to throw a total of two? There is just one way to do that: a one and a one. How many ways are there to get a three? There are two ways: I can throw a one and a two, or a two and a one. Then look at the number seven. There are six possible ways of totalling seven. Now, if I collect all of them together as a hundred percent (in other words, there are no other possibilities), I can list on the right hand side what the probability of each total is. If I throw two dice, I have a 16.67% chance of the total being a seven, purely because there are more combinations making up seven than any other total.
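The dice table described above can be reproduced in a few lines. This is a minimal sketch in Python (an assumed substitute for the spreadsheet used in the talk): it enumerates all 36 outcomes, tallies the totals, and confirms that the probabilities are mutually exclusive and collectively exhaustive, i.e. sum to one.

```python
from collections import Counter
from fractions import Fraction

# Enumerate every (die1, die2) outcome: mutually exclusive and
# collectively exhaustive, 36 combinations in total.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
totals = Counter(d1 + d2 for d1, d2 in outcomes)

# Probability of each total = number of ways of making it / 36.
probs = {t: Fraction(n, 36) for t, n in totals.items()}

print(totals[7])          # 6 ways to roll a total of 7
print(float(probs[7]))    # 6/36, about 0.1667, i.e. 16.67%
print(float(probs[2]))    # 1/36, about 0.0278, i.e. 2.78%
print(sum(probs.values()))  # all probabilities together equal 1
```

The `Fraction` type keeps the probabilities exact, so the final sum comes out as exactly 1 rather than a floating-point approximation.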
It is very unlikely to hit a two or a twelve; the p-value for those is very low, and we see the 2.78%. I am using percentages here, but in statistics we don't use percentages; we use a value between zero and one. Zero means there is no probability and one means there is a 100% probability. 50% would be 0.5, and the p-value that we usually choose for statistical significance, 0.05, equates to a 5% chance. So a p-value of less than 0.05 says there was less than a 5% chance of having found the result in my study, and because the probability was so low, I call it statistically significant. Now look at this. I have represented the probabilities of the various totals I can hit with my two dice as a little column chart. First of all, note the gaps between the columns. That is because this is called discrete data. I cannot have a 7.5 or a 7.484529; the values come in discrete little packages: a 7, a 6, a 9. And therefore we leave these little gaps. If you quickly look at an article and you look at the column charts and figures, if someone designed them properly you should be able to tell just by looking that this was discrete data: the columns should not touch. That would be the proper way of designing this little chart. Because it is discrete, because every value is contained within itself, we give the width of each of these columns a value of 1. Now dice are easy: it's 4, 5, 6, 7, with a difference of 1 between each of these. But it might also be something completely different, where we have a few decimal points but the data points themselves are still discrete. We are still going to have the gaps between them, and we still see the width as 1. Now look at the height of those columns. The middle one, at 7, is 16.67. So if I were just to take the surface area, remember how to calculate the surface area of a rectangle: it's width times height.
So if I take the width of 1 and the height of 16.67 and multiply the one by the other, I see that the surface area of that little bar equals the probability. That might sound a bit silly now, but there is a very deep importance behind all of this: I can equate surface area to probability. Now remember, if I total all of these up, they equate to 1; it includes all possibilities. There cannot be a 1, there cannot be a 13; it's impossible, everything is included here. So if I add up the surface areas of all these little bars, they equal 1, or 100%. They are all in there. Important to note again: the surface area of these little bars gives me the probability, or the p-value. I can now say: if I chose 5%, or 0.05, as statistically significant and I hit a 12, my finding of throwing double 6s is statistically significant. Now let's quickly mention data types. We have to bring them in somewhere. There are two ways to classify data types. Right at the bottom you'll see discrete and continuous. We've spoken about discrete data: you don't get a value of 7.5. But then you get continuous data types. Let's say, for instance, white cell count. Now, you don't get a fraction of a white cell, but because we're really talking times 10 to the power 9 cells per litre, the sheer numbers are so large that we treat this as continuous data. The other way to look at it is categorical versus numerical. Quickly, categorical data: you get nominal and ordinal types. Nominal covers things like appendicitis and UTI. There's no way to arrange these in some form of order; they're just different types of diseases. Or, if you think about responders versus non-responders, that's a nominal categorical type. Ordinal is a bit different: I can put these in order. In a survey, I can ask patients how happy they are with a certain medication. Perhaps not the best of examples, but they can then rate it: 1 star, 2 stars, 3 stars, 4 stars, 5 stars.
So there is some order to their choice. But if someone chooses 4 stars and someone else chooses 2 stars, you cannot say that the person who chose 4 stars is twice as happy as the one who chose 2 stars. There is order to it, but there is no arithmetic to it. For that, we use numerical data types. Again, two different subtypes: interval and ratio. A ratio type is simply something with an absolute zero; white cell count and Hb have an absolute zero. Degrees Celsius would be an interval type: there is no absolute zero, or true zero I should say. So if it is 10 degrees Celsius outside versus 20 degrees Celsius, I cannot say it is twice as warm. For that you'd have to use the Kelvin scale; the Kelvin scale has an absolute zero, at minus 273 degrees Celsius. So for ratio types, you must have a true zero. So here we have continuous data. Now contrast this; cast your mind back to the little bars we had. We said that they had a width of 1 and a certain height, and if I multiplied those I got the area of my little rectangle, and that gave me the probability. But continuous data do not have those little gaps in between, so how would I now estimate the area under this curve? Well, integral calculus comes to our rescue. Now, we're not going to do integral calculus; the software package does that for you. What I want to tie up in your mind is that we are still going to use area to calculate probability, because the sum total of all the area under the graph has to equal 1 if we include all possibilities. Good. Lastly, let's get to the central limit theorem. In order to do that, we have to discuss combinations. It's very easy to do in a software package. You'll see there it says =COMBIN and a little bracket; it's just a function in spreadsheet software. So I can type =COMBIN, and if we look at the first one there, 10 patients in groups of two, I could type =COMBIN(10,2). If I hit enter, I'll get the answer there of 45. What does that mean?
I took ten individuals and made little groups of two. How many distinct combinations can I make? Well, 45. Now, if I choose Sam and Ingrid, or Ingrid and Sam, that is the exact same combination. If I viewed those as two different ones, they would be called permutations; we're talking about combinations here. What is important is to look at how rapidly things expand here. If I had ten patients and had to choose four at random, how many of those random selections can I make such that they're completely different from each other? I can make 210 possible combinations. Now just think about this. Well, before you do, look: if I had 100 patients and randomly chose ten, look at all the possible combinations I could have made. Here comes the very important part. Think about it. Imagine there were only 100 patients on the whole earth with a very rare disease. I want to include some of them in a study, and I only have enough time, resources and money to investigate ten of them. Out of those 100, I have to draw ten of them out of a hat at complete random. And remember, that's exactly what you do, even if you were studying UTIs. There are many millions of people on earth with a UTI, but the ones that came to your practice in the specific time period that you chose are actually just a random selection. If someone else did it, or if you chose a different time period, you would have chosen a distinctly different set of ten patients. Even if they differed by just one patient, it would be a different patient set that is now involved in your study. Now back to this 100-choose-10. We have this rare disease, there are only 100 patients, and you chose ten at random. Look at all the different possible combinations that you could have chosen. And if you did research on those ten patients, say for instance we're just comparing two sets with each other: five in the one group, five in the other group.
You would get a certain difference between your two groups. But you might just as well have had another of those 1.7 times 10 to the power 13 ten-patient selections in your study, and you would have gotten another difference between your two groups. And with yet another of those selections, yet another difference. So you see, your study that included your ten patients was just one of many, many, many possibilities. And this is what we get when we come to this well-shaped bell curve. What this says is: if you could take all those possible combinations, 1.7 times 10 to the power 13 groups of ten, split each into a group of five and a group of five, compare them, get the difference in a certain data point, and plot that difference every time, this is exactly what you are going to end up with. This is exactly the same as the roll of the dice: the totals between two and twelve included all possibilities, nothing was excluded, and they all fell under the umbrella of 100%, or one. The exact same thing happens here. Your study falls somewhere on that curve. At the top, where it is most likely, the vast majority of those 1.7 times 10 to the power 13 possible studies that could have been done would have a difference between the two patient groups right there in the middle. And that is where the p-value comes from. If your difference was one of the rare differences, you would fall at one of the two edges. What we are simply going to do is draw a line there (that is what the software does) and then, towards the sides, the red here, it is going to calculate, using integral calculus, the area under the curve. Because the total area under the curve is 1.0, if your area is found to be less than 5%, or 0.05, we would say your finding was statistically significant.
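The counting argument above can be sketched in a few lines of Python (an assumed substitute for the spreadsheet's =COMBIN). `math.comb` reproduces the combination counts, and a small simulation with made-up white cell counts illustrates the bell curve idea: the spread of mean differences across many random 5-versus-5 splits. The data values and the 20,000-split sample are illustrative assumptions, not figures from the talk.

```python
import math
import random
from statistics import mean

random.seed(1)

# comb() counts the distinct combinations mentioned above.
print(math.comb(10, 2))    # 45
print(math.comb(10, 4))    # 210
print(math.comb(100, 10))  # 17310309456440, about 1.7 x 10**13

# Hypothetical white cell counts for 10 patients, split 5 vs 5.
values = [6.1, 7.4, 8.0, 5.9, 9.2, 7.7, 6.8, 8.5, 7.1, 6.4]
observed = abs(mean(values[:5]) - mean(values[5:]))

# Take many random 5-vs-5 splits and record the mean difference.
# Plotted, these differences form the bell curve described above,
# peaking near zero.
diffs = []
for _ in range(20000):
    shuffled = random.sample(values, len(values))
    diffs.append(mean(shuffled[:5]) - mean(shuffled[5:]))

# Empirical p-value: the share of random splits with a difference
# at least as extreme as the one observed in "our" study.
p = sum(abs(d) >= observed for d in diffs) / len(diffs)
print(p)
```

Sampling splits instead of enumerating all of them is what makes this feasible: enumerating the full 1.7 times 10 to the power 13 selections is exactly what the equations behind the curve spare us from.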
Now, we are just choosing 0.05; you might as well have chosen 0.01, and many studies do choose 0.01. But again, it just means this: if all possible studies could have been done, and so far we are only talking about 100-choose-10, can you imagine, with 7 billion people on the face of the earth, how many possible random selections you could have made for the patient set you are investigating? Your study is one tiny, tiny little speck. Now, there are equations behind this graph. Depending on how many patients you include, various factors go into the equation (I won't show it to you) that estimates how this curve should look, and in that way the software can place your marker, calculate the area under the curve, and give you the p-value. How fantastic is that? And this is called the central limit theorem: if you include all possibilities and plot the mean difference that each possible study would have found between its two groups of five patients, say the mean difference in white cell count, then all those means are plotted there. Remember, there were many ways to throw the dice to total 7; likewise, there will be many studies that have a certain difference between the two groups. That is what is listed there: the percentage of cases that would have had that difference. So if the one you found is right there by the middle hump, then obviously the area under the curve to either side of it is going to be much more than 0.05, so your finding was not statistically significant. Now, I showed you how this worked in the software. One thing: look at the output the software gives you. It gives you two different kinds of p-values: a two-tail distribution and a one-tail distribution, and the one is 0.06 and the other is 0.03. Obviously the second one is statistically significant if you chose 0.05 for your statistical significance, but the other one is 0.06. Now, which one do you choose?
Well, you can't decide that at this stage; it is something that you have to decide beforehand, otherwise you are an unscrupulous researcher. So it is about hypotheses. Before you start your test, you have to ask yourself what the null hypothesis is and what the test hypothesis is. The null hypothesis is easy: it says there is no difference between the two groups. We are going to have two groups, I am going to do the white cell count, and the means of the two groups are going to be exactly the same. The test hypothesis is important: you are going to postulate one of three things. Most commonly you will say there is a difference between the two groups; or you might say group two is going to have a value less than group one; or that group two has a value more than group one. And that is very important. If you look at the next slide, there is a one-tailed test. In this instance you can see we are looking at the right hand side of the bell curve, so we are saying that our test group, group two, is more than group one; we are postulating "more than" as our test hypothesis. What the software then does is use the one-tailed test; in other words, it groups all of the five percent on the one side, and you can clearly see the red there representing five percent. (This is not drawn quite correctly; if you look at it, I think it is a bit more than five percent, but for purposes of illustration the red area there represents five percent of the area under the curve.) That green line is where the software works out that your mean falls. If we then calculate the surface area of the green region to the right of it, and your mean falls beyond the critical point where the red area starts, your area under the curve is less than five percent, or 0.05, so your finding is statistically significant. On the right hand side there, you see that the green area is actually larger than the red area, so your finding would be termed statistically insignificant. Good. So you have to say that beforehand. This is one-tailed, so the five percent red area falls on one side. If instead your test hypothesis was simply that there is a difference (not saying one is more than or less than the other, just that there is a difference), that five percent would be spread over both sides: two and a half percent on the one side, two and a half percent on the other. And the software uses a different equation to work out your little green area. So, final thoughts. We all have questions in our daily practices, and we have a lot of data if our practices have been running for a while. Why not decide to use that data and answer some of the questions that you have? You can send me a message; there is my email address if you have any questions, if you want to embark on this, or if you want to learn more. Finally, my YouTube channel: there on youtube.com you will find some more video lectures on various subjects, but also on this important subject of clinical research.
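As a footnote to the one-tailed versus two-tailed discussion above: for a symmetric bell curve, the one-tailed p-value is half the two-tailed one, which is why the software can report a pair like 0.03 and 0.06. A minimal Python sketch of that relationship, using the standard normal curve and a made-up test statistic chosen so the numbers echo that example:

```python
from statistics import NormalDist

# Illustrative test statistic (e.g. from comparing two group
# means); the exact value here is made up for the example.
z = 1.88

std_normal = NormalDist()           # mean 0, standard deviation 1
one_tailed = 1 - std_normal.cdf(z)  # area in the single right-hand tail
two_tailed = 2 * one_tailed         # the same area split over both tails

print(round(one_tailed, 2))  # 0.03: significant at the 0.05 level
print(round(two_tailed, 2))  # 0.06: not significant at the 0.05 level
```

The halving only holds for symmetric distributions such as the normal and t curves; the point stands that the choice of tail must be fixed by the test hypothesis before the analysis, not picked afterwards from whichever value is smaller.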