Hello everybody, in this video we are going to see how to choose the right statistical test. So here are the five steps in choosing the right statistical test. Number one, what is your hypothesis? Number two, are the samples independent of each other? Number three, what are the types of variables? Number four, are they normally distributed? Number five, apply the right statistical test and find the p-value. So before this fifth step of the presentation, we have to understand the first four basics in order to apply the test. The first step is that we should understand our hypothesis, that is, what is the null hypothesis and what is the alternate hypothesis. Suppose your research objective is estimating a prevalence, that is, estimating a mean or a proportion; then you need not, rather cannot, calculate a p-value. If you are studying an association, that is, you want to study an association between two variables, then you need these hypotheses. The null hypothesis says that there is no difference between those variables. The alternate, or research, hypothesis says that there is a difference between the variables. What is the p-value here? The p-value is the probability of obtaining a result at least as extreme as the one observed when the null hypothesis is actually true. It is compared against an arbitrary cut-off of 0.05; falling below that is called statistical significance. But this is not everything in research; there is also what is called clinical significance, that is, the difference should be assessed clinically, how important it is and how useful it is in our setting. That matters just as much as the p-value and statistical significance. So that is about knowing our hypothesis.
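As a rough sketch of this distinction in Python with scipy (not from the talk; all numbers are invented for illustration): estimating a prevalence calls for an estimate with a confidence interval, while testing an association produces a p-value to compare against 0.05.

```python
from scipy import stats

# Objective 1 -- estimating a prevalence (a proportion): report an
# estimate with a confidence interval; a p-value has no role here.
# Hypothetical survey: 120 of 400 people have hypertension.
result = stats.binomtest(120, 400)
ci = result.proportion_ci(confidence_level=0.95)
prevalence = 120 / 400
print(f"prevalence = {prevalence:.2f}, 95% CI = ({ci.low:.3f}, {ci.high:.3f})")

# Objective 2 -- testing an association: a null and alternate hypothesis
# apply, and the test yields a p-value. Hypothetical systolic BP values:
drug = [118, 122, 120, 115, 119, 121, 117, 116]
placebo = [128, 131, 125, 129, 127, 133, 126, 130]
t_stat, p_value = stats.ttest_ind(drug, placebo)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # compare against 0.05
```

Statistical significance at 0.05 is only the first question; as the talk notes, whether a difference of this size matters clinically has to be judged separately.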
The second step is to know whether the two samples we are comparing are independent of each other or paired. When the measurements are repeated on the same individual, they are not independent; they are called paired samples, or paired observations, like pre- and post-test measurements before and after an intervention such as a drug. Distinguishing paired from independent observations is important when applying a statistical test because each follows a different set of tests. The next step is to identify what type of variable we are dealing with. There are four types: nominal, ordinal, interval and ratio. Nominal is just a name; ordinal has an order, for example mild, moderate, severe. Interval does not have an absolute zero, but ratio does. For statistical analysis purposes, we can club nominal and ordinal together as categorical variables, and interval and ratio together as continuous variables. So the third important step is to look at each variable and identify whether it is categorical or continuous. Then we need to identify which variable is the dependent variable and which is the independent variable, or in other words, which is the predictor variable and which is the outcome variable. Suppose you are studying antenatal care and low birth weight: low birth weight cannot be a predictor of antenatal care; it should be the other way round, that is, antenatal care should be the predictor variable and low birth weight the outcome variable. The fourth important step is whether the variables are normally distributed. This is important because when the variables are normally distributed we apply parametric tests, and when they are not normally distributed we apply non-parametric tests. So why do we need to check for normal distribution?
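The paired-versus-independent distinction can be sketched in Python with scipy (invented pre/post readings, purely illustrative); the point is that the paired test, which uses within-patient differences, can give a very different answer from wrongly treating the two columns as independent groups.

```python
from scipy import stats

# Hypothetical blood pressure in the SAME 6 patients before and
# after a drug -- these are paired observations.
before = [150, 142, 138, 155, 147, 160]
after = [144, 139, 135, 147, 143, 152]

# Paired t-test: works on the within-patient differences.
paired = stats.ttest_rel(before, after)

# Wrongly treating the columns as two independent groups
# throws away the pairing and dilutes the signal.
independent = stats.ttest_ind(before, after)

print(f"paired t-test:      p = {paired.pvalue:.4f}")
print(f"independent t-test: p = {independent.pvalue:.4f}")
```

Here the consistent within-patient drop is significant with the paired test, while the independent-samples test on the same numbers is not, because the between-patient variation swamps the effect.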
Parametric tests assume a normal distribution, so only with normally distributed data can we apply them. If the variable is normally distributed in the sample, then the statistical conclusions we draw about the whole population will be more valid and accurate. That is why we want the data to be normally distributed, in order to generalize our study results; parametric tests also yield stronger results. The reasons we may not get normally distributed data are the presence of outliers, a low sample size, and the nature of the data itself; for example, income will usually not be normally distributed. How do we check for normality? We use two statistical tests: Shapiro-Wilk when the sample size is less than 50, and Kolmogorov-Smirnov when the sample size is greater than 50. But for smaller samples, with a sample size less than 20, these tests are unlikely to detect non-normality, and for larger samples, greater than 50, they can be too sensitive; they are also sensitive to outliers. So also use histograms or Q-Q plots: the easiest way to check for normality is visual inspection of the histogram. If there is a bell-shaped curve, the data is normally distributed, and the mean, median and mode should coincide. A rough rule of thumb, for variables that cannot be negative, is to assume approximate normality when the mean is greater than twice the standard deviation. Now the fifth and last step is the choice of statistical test. This is the most important slide of this presentation: after understanding the basis of selecting the statistical test, we are now going to select the statistical test for our analysis. To understand this flowchart, we start from here. Suppose your outcome variable is categorical, that is, cured or not cured.
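These normality checks can be sketched in Python with scipy (simulated data, purely illustrative). One caveat in the sketch: standardizing the data by its own mean and SD before the Kolmogorov-Smirnov test makes the test approximate (the Lilliefors correction addresses this), but it shows the idea.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical samples: ages (roughly normal, n < 50) and
# incomes (right-skewed by construction, n > 50).
ages = rng.normal(loc=40, scale=8, size=40)
incomes = rng.lognormal(mean=10, sigma=1, size=200)

# Shapiro-Wilk for the smaller sample (n < 50).
w, p_ages = stats.shapiro(ages)

# Kolmogorov-Smirnov for the larger sample (n > 50), against a
# standard normal after z-scoring (approximate, as noted above).
z = (incomes - incomes.mean()) / incomes.std(ddof=1)
d, p_income = stats.kstest(z, "norm")

# Small p-value -> evidence against normality -> non-parametric tests.
print(f"ages:    Shapiro-Wilk p = {p_ages:.3f}")
print(f"incomes: Kolmogorov-Smirnov p = {p_income:.3g}")
```

For the skewed incomes the test rejects normality decisively, which matches the talk's point that income-like data usually calls for non-parametric tests; visual inspection of a histogram or Q-Q plot remains the most robust first check.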
If your exposure variable has two groups, that is, drug received versus drug not received, then we use the chi-square test, Fisher's exact test or logistic regression. The chi-square test is the simple test here, but it cannot be used when the expected cell count is less than 5 in more than 20% of the cells; then we need Fisher's exact test. Logistic regression is used when we want to predict one variable from the other. In the same way, if there are more than two groups, the tests remain the same: chi-square test, Fisher's exact test and logistic regression. If the categories are paired, that is, the observations are on the same individual, then we need McNemar's test; the kappa statistic is for agreement. When the outcome variable is continuous and the data is normally distributed, we use the two-sample t-test; when the data is not normally distributed, we use the Mann-Whitney U test. When the data is paired, that is, before and after an intervention, we use the paired t-test, and the corresponding test for non-normal distributions is the Wilcoxon signed-rank test. For more than two groups, say you want to find the difference between the effects of drugs A, B and C on fever, you use the one-way ANOVA test if the data is normally distributed and the Kruskal-Wallis test if it is skewed. For a continuous exposure, that is, when you are looking at a continuous variable like age versus temperature, we use Pearson's correlation in case of normal distribution and Spearman's correlation in case of skewed data; if you want to predict one variable from the other, you can use linear regression. This slide shows the choice of statistical tests for paired or matched observations. When we have two categories, that is, pass and fail before and after, we use McNemar's test.
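The independent-samples branches of this flowchart can be sketched in Python with scipy (all data hypothetical, chosen only to illustrate each call):

```python
from scipy import stats

# --- Categorical outcome, two independent groups ---
# Hypothetical 2x2 table: rows = drug / no drug, cols = cured / not cured.
table = [[30, 10],
         [18, 22]]
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
odds, p_fisher = stats.fisher_exact(table)  # use when expected counts are small

# --- Continuous outcome, two independent groups ---
a = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3]
b = [5.9, 6.1, 5.7, 6.3, 6.0, 5.8]
t, p_t = stats.ttest_ind(a, b)        # normally distributed data
u, p_u = stats.mannwhitneyu(a, b)     # skewed data

# --- Continuous outcome, more than two groups (drugs A, B, C) ---
c = [4.2, 4.5, 4.1, 4.4, 4.3, 4.6]
f, p_anova = stats.f_oneway(a, b, c)  # one-way ANOVA (normal data)
h, p_kw = stats.kruskal(a, b, c)      # Kruskal-Wallis (skewed data)

# --- Two continuous variables (e.g. age vs temperature) ---
age = [25, 32, 41, 50, 58, 63]
temp = [36.8, 36.9, 37.1, 37.0, 37.3, 37.4]
r, p_pearson = stats.pearsonr(age, temp)      # normal data
rho, p_spearman = stats.spearmanr(age, temp)  # skewed data
```

Each pair of calls mirrors one branch of the flowchart: the parametric test when normality holds, and its non-parametric counterpart when it does not.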
When we have more than two categories, say pass, fail or withheld, we need Cochran's Q test. When the data is an ordinal variable, mild, moderate, severe, before and after an intervention, we need the Wilcoxon signed-rank test; for quantitative data that is discrete or not normally distributed, we also use the Wilcoxon signed-rank test. For quantitative, normally distributed data, we use the paired t-test. For more than two measurements on the same subjects with normally distributed data, we use repeated measures ANOVA; for more than two measurements on the same subject with non-normally distributed data, we use Friedman's test. Suppose you want to find out how much one variable, or one factor, predicts another; then we use regression. Comparison of multiple independent factors with one factor can also be done. When we compare one to one, it is called simple regression, for example maternal weight affecting birth weight. Many to one is called multiple regression, for example maternal anthropometry factors affecting birth weight. When the outcome is continuous, it is called linear regression; when the outcome is categorical, it is called logistic regression. For agreement, the commonly used statistic is the kappa statistic. In healthcare settings, we use the kappa statistic to compare the findings of two procedures: the closer it is to one, the more complete the agreement; the closer it is to zero, the less the agreement. Other agreement statistics are the intra-class correlation coefficient and Cronbach's alpha. You need not remember all these statistical tests. If we type a search term like "statistical test for continuous independent variable and categorical dependent variable", the answer will appear. So you need not memorize the choice of statistical test.
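A few of these paired and agreement tests can be sketched in Python (invented numbers throughout). Since plain scipy has no McNemar or kappa function, the sketch computes McNemar's test exactly from the discordant pairs via a binomial test, and Cohen's kappa by hand from a 2x2 agreement table; Friedman's test is built in.

```python
import numpy as np
from scipy import stats

# --- McNemar's test (paired categorical, manual sketch) ---
# Hypothetical pass/fail results before and after training.
# Only the discordant pairs matter: b = fail->pass, c = pass->fail.
b, c = 15, 4
# Under the null, the discordant pairs split 50:50, so the exact
# McNemar p-value is a two-sided binomial test on b out of b + c.
p_mcnemar = stats.binomtest(b, b + c, 0.5).pvalue

# --- Friedman's test (>2 measurements on the same subjects, non-normal) ---
m1 = [7.0, 9.9, 8.5, 5.1, 3.2, 6.1]   # measurement 1 on 6 subjects
m2 = [5.3, 5.7, 4.7, 3.5, 2.8, 4.2]   # measurement 2, same subjects
m3 = [4.9, 7.2, 3.9, 4.1, 2.6, 3.8]   # measurement 3, same subjects
stat_f, p_friedman = stats.friedmanchisquare(m1, m2, m3)

# --- Cohen's kappa for agreement between two raters (manual sketch) ---
confusion = np.array([[20, 5],
                      [3, 22]])        # rater A rows, rater B columns
n = confusion.sum()
p_observed = np.trace(confusion) / n                     # observed agreement
p_expected = (confusion.sum(0) @ confusion.sum(1)) / n**2  # chance agreement
kappa = (p_observed - p_expected) / (1 - p_expected)

print(f"McNemar p = {p_mcnemar:.4f}, Friedman p = {p_friedman:.4f}, "
      f"kappa = {kappa:.2f}")
```

As the talk says, kappa near one means near-complete agreement and near zero means agreement no better than chance; here the two hypothetical raters land at a kappa of 0.68.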
Once you start applying these statistical tests, they will be remembered automatically. So what next? You can do manual calculation of all these tests; for the chi-square test and the t-test it is going to be very easy, so you can try calculating the test statistic and p-value manually. You can also use online calculators, which are available for the t-test, the chi-square test and Fisher's exact test. But the easiest way to apply all the statistical tests is using software. When you are doing a manual calculation, you get a test statistic which you need to match against a probability table to find the p-value, but software will calculate the exact p-value. Thank you for watching this video.