So finally we get to do some inferential statistics. We're going to compare groups to each other. Now, there are various ways to go about sorting out your data frame. I'm going to use the simplest way here: I'm going to create smaller data frames, a minor data frame, a major data frame, a female and a male. This is the syntax; this is how you go about it. So I'm going to create these four computer variables. I can call them whatever I want; I've used these descriptive terms. I refer to the data frame and then use square brackets. The square brackets take two arguments, separated by a comma, with a colon at the end. In the first bit we have the data frame's infection column and then dot equals equals. It's a Boolean question, and the dot means it is asked of each individual cell. It's going to go down the infection column and ask: is this entry equal to minor infection? If that returns true, it's going to pop that row into the new minor data frame. Same with major: down the infection column, if it finds major, it's going to put that row in the major data frame. Again, look at the syntax. It's the data frame, open and close square brackets, then inside them the data frame and its column, and then dot equals equals. So I'm asking a question, cell by cell: is it equal to major infection? If it is, the row goes in. Then the comma and the colon, and I'm using a semicolon right at the end because I don't want this to be printed out to the screen. So I've created four smaller data frames, and they are very specific as far as their infection entries and their gender entries are concerned.

Now let's do some proportional analysis first.
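As a rough sketch of the subsetting step described above: the toy data frame, its column names (`Infection`, `Gender`, `Age`) and its labels are assumptions made for illustration, and `df.Infection` is the column syntax of current DataFrames.jl, which may differ from what is typed on screen.

```julia
using DataFrames

# Hypothetical stand-in for the tutorial's data set (made-up rows and column names).
df = DataFrame(
    Infection = ["Minor", "Major", "Minor", "Major", "Minor"],
    Gender    = ["Female", "Male", "Male", "Female", "Female"],
    Age       = [54, 61, 47, 70, 58],
)

# `.==` asks the question element by element and returns a Boolean mask.
# The mask selects the rows; the trailing `:` keeps every column.
# The final `;` suppresses printing in a notebook or the REPL.
minor  = df[df.Infection .== "Minor", :];
major  = df[df.Infection .== "Major", :];
female = df[df.Gender .== "Female", :];
male   = df[df.Gender .== "Male", :];
```

Each of the four results is an independent copy of the matching rows, so later changes to one of them do not touch `df`.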
We're going to count the level of amputations by gender. We're going to take the minor and the major data frames, because we want to construct a two-by-two contingency table here so that we can do some categorical data analysis for proportions. So again I'm going to use my by function. It takes the following arguments: the data frame, the column, and then I'm constructing this new variable d such that I create a data frame with a new column called N, which is the size, that is, how many of each different thing it finds. So for gender, it's going to find male and female, count how many of each it finds, and pop that into this new column called N. There we go: in the minor data frame I find female and male entries, and I've got this new column header, the new variable N, which is a count of those. I'm going to do the same for the major data frame.

Now, as far as the HypothesisTests package is concerned, it does not have a chi-square test, but it does have Fisher's exact test. So I can construct this two-by-two contingency table; I'm just showing you how it's done. I have female and male as my two categories, the columns, and minor infection and major infection as my rows. Remember, for the minor infection we found 29 females and 31 males, so fill in the 29 and the 31 there. If I do the construction this way, the 29 under female is a, the 31 under male is b, the 31 under female is c, and the 29 under male is d.
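The counting step and the two-by-two table above can be sketched roughly as follows. This assumes a current DataFrames.jl, where `groupby` plus `combine` replaces the older `by` function, and it rebuilds a hypothetical minor-infection data frame from the quoted counts rather than using the real data.

```julia
using DataFrames, HypothesisTests

# Hypothetical minor-infection subset with the counts quoted above (29 female, 31 male).
minor = DataFrame(Gender = vcat(fill("Female", 29), fill("Male", 31)))

# Count the rows per gender and store the count in a new column N.
minor_counts = combine(groupby(minor, :Gender), nrow => :N)

# Two-by-two contingency table:
#                     Female   Male
#   Minor infection   a = 29   b = 31
#   Major infection   c = 31   d = 29
fisher = FisherExactTest(29, 31, 31, 29)

# Two-sided p-value for the null hypothesis that the odds ratio is 1.
p_fisher = pvalue(fisher)
```

Printing `fisher` at the REPL shows the full report: the odds ratio estimate, its confidence interval and the test decision.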
So a, b, c, d goes along like that, and those are the four arguments for Fisher's exact test: 29, 31, 31, 29. Let's run that. The HypothesisTests package is very nicely constructed, because look at the beauty of all of this output. Fisher's exact test, population details: the parameter of interest is an odds ratio. The value under the null hypothesis is that the odds are one to one, that there's no difference. The point estimate that we get is 0.87, or 0.88. I get my 95% confidence interval with its lower and upper limits, and under a 5% alpha value I fail to reject the null hypothesis, because that is what the two-sided p-value gives me. It also gives me a little summary of the contingency table. Here I've just put it in so that we can see what it looks like. All you have to do is put in Fisher's exact test with its four arguments, and it shows you there that indeed you did it right: a, b, c, d. And we see that the proportional analysis here shows no statistically significant difference in the proportions of males and females between the minor and major infection groups.

Now we're going to deal not with proportions but with continuous, ratio-type numerical data. We're going to compare, for the minor and major groups, the age, the HbA1c and the CRP. To choose a test we first need to know whether the values come from a normal distribution. You can check that by plotting the quantiles and seeing if they form a straight line, but there's also a statistical test that you can do that is in the HypothesisTests package, and that's the Kolmogorov-Smirnov test. What does it do?
It takes the actual values of your sample and compares them to an ideal distribution based on a mean and a standard deviation. Now remember the terms, which are often used improperly. If I have a point estimate or measure of central tendency of my sample, such as a mean or median, that is a statistic. If I have the mean or median of a whole population, that would be a parameter. So what are we doing when we decide between a parametric and a non-parametric test? We want to know whether the sample values we have were taken from a population in which that parameter was normally distributed. If they were, I can use a parametric test. If not, my p-values will be inaccurate, my analysis is wrong, and I have to use a non-parametric test.

So let's run this. This is how it's run: ExactOneSampleKSTest, and it takes these two arguments. The first is the data frame column with all the values, so all the age values, and it's going to compare that to a normal distribution. This Normal comes from the Distributions package, which we imported, and it says: construct for me a normal distribution around this mean and this standard deviation. All I have is my sample; I don't have values for the whole population. So I'm going to enter the mean and the standard deviation of the ages of the patients that I do have.

Now I'm cheating a bit here, because you've got to do it separately for the minor and the major groups. Do this KS test for both, and if either one or both of them are not from a normal distribution, you cannot use a parametric test. So this is only for illustrative purposes; I'm only going to run it once, just to show you what we have here. As far as the ages of all the patients in my data set are concerned, we see a very significant two-sided p-value. I then reject my null hypothesis and say that the ages here were not taken from a population in which that parameter, the age parameter, was normally distributed.
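A minimal sketch of the normality check described above, assuming simulated ages in place of the real `Age` column. The variable names and the simulated values are made up for illustration.

```julia
using Distributions, HypothesisTests, Random, Statistics

# Hypothetical ages standing in for one group's Age column (simulated, not the real data).
Random.seed!(1)
ages = rand(Normal(60, 10), 60)

# Reference distribution built from the sample's own mean and standard deviation,
# because the population values are unknown.
reference = Normal(mean(ages), std(ages))

# One-sample Kolmogorov-Smirnov test of the sample against the reference distribution.
ks = ExactOneSampleKSTest(ages, reference)

# A small p-value means rejecting the null hypothesis that the sample follows `reference`.
p_ks = pvalue(ks)
```

To follow the advice in the transcript, the same call would be repeated on the minor and the major subsets separately rather than on the pooled column.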
So I cannot use a parametric t-test like a Student's t-test; I have to use a non-parametric test. But again, do it for both minor and major, not for the whole group as I've done here. We've actually got to do it for all of them, I think, but for argument's sake I'm just doing it on all the patients. So let's compare HbA1c against a normal distribution with this mean and this standard deviation. We see that's not significant. In other words, we fail to reject the null hypothesis, and in this instance we can use a parametric t-test. Let's do it for CRP, and there we find a statistically significant difference from the normal distribution.

So let's compare the patients in the two groups. Let's compare their ages. We saw up here that for age we should use a non-parametric test, and that's exactly what we're going to do. This is numerical data, a ratio-type continuous variable, so we're going to use the non-parametric Mann-Whitney U test. MannWhitneyUTest just takes the two arguments, and we see here that the null hypothesis states there's no difference between the groups. We find a statistic here for the Mann-Whitney U test of negative 2.4, which converts to a two-sided p-value of 0.28. It's not significant. We fail to reject the null hypothesis, and we can write that in our report.

Now let's go over to the HbA1c. Remember, as far as the HbA1c was concerned, we could use a t-test, if I remember correctly, but let's use the non-parametric test first. Let's run that, and we see we find an extremely significant p-value. We won't write that exact value in a report; we'll just say it's less than 0.01, for instance, or less than 0.05. But let's do something.
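The two-group comparison with the Mann-Whitney U test can be sketched as below. The two age vectors are made-up numbers standing in for the `Age` columns of the minor and major data frames, so the result will not match the values quoted above.

```julia
using HypothesisTests

# Hypothetical ages for the two groups (made-up numbers, not the tutorial's data).
minor_age = [52.0, 47.5, 61.2, 58.3, 49.9, 66.1, 55.4, 60.8]
major_age = [63.7, 59.6, 70.2, 57.1, 68.4, 62.3, 71.5, 65.0]

# Non-parametric comparison of two independent samples.
# The null hypothesis is that there is no difference between the groups.
mw = MannWhitneyUTest(minor_age, major_age)

# Two-sided p-value.
p_mw = pvalue(mw)
```

With the real data the two arguments would be the columns themselves, for example the age column of each subset data frame.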
Let's check the variance of the HbA1c in the minor and in the major group there. We see that, well, the variances are nearly equal, so we can probably use an equal-variance t-test, a proper Student's t-test. This is how it looks: EqualVarianceTTest, and it takes the two arguments, the HbA1c values in the minor data frame and then in the major data frame. I'm just going to compare those two sets of values to each other, and lo and behold, you're still going to find a significant p-value there. Nothing is going to change. It gives you the number of observations, it gives you your t-statistic, and it says there were 118 degrees of freedom. Remember, 120 patients minus two groups: that's your degrees of freedom. And it gives you your empirical standard error there. So, beautiful: a non-parametric and a parametric test for our data.

Now I can quickly describe the HbA1c for minor and major. If I had used the non-parametric test, I could describe the medians for both and say that we found a significant difference. If I use the t-test, I can describe the two means of the groups, 7.1 and 4.8, and say I found a significant p-value of less than 0.01. Beautiful. Everything I want is right there; I even have my 95% confidence interval around the difference in means.

Just for argument's sake, let's run the Mann-Whitney U test on the CRP. MannWhitneyUTest takes two arguments, the values in these arrays here, minor CRP and major CRP. Remember, those are just two sets of values that I'm comparing to each other, and again we see we find a significant p-value.

In this little file I've also done comparisons between female and male. We can quickly run through them using a non-parametric test: the HbA1c, and then lastly just the CRP, and you can see the values that were there. Look at these beautiful results. We can really use Julia to do beautiful statistical analysis. Everything here, from the plots in Gadfly to the results here, is ready to be used in writing a paper for submission to a journal. Lovely stuff. I hope you've enjoyed this project. We can do some more.
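The variance check and equal-variance t-test on HbA1c described above can be sketched like this. The values are made-up numbers chosen only to resemble two groups with similar spread, and the variable names are assumptions.

```julia
using HypothesisTests, Statistics

# Hypothetical HbA1c values (made-up numbers, not the tutorial's data).
minor_hba1c = [6.8, 7.4, 7.0, 7.3, 6.9, 7.2]
major_hba1c = [4.6, 5.0, 4.9, 4.7, 4.8, 5.1]

# Compare the sample variances before assuming they are equal.
v_minor = var(minor_hba1c)
v_major = var(major_hba1c)

# Student's t-test with pooled (equal) variance.
ttest = EqualVarianceTTest(minor_hba1c, major_hba1c)

p_t      = pvalue(ttest)    # two-sided p-value
ci_t     = confint(ttest)   # 95% confidence interval for the difference in means
deg_free = ttest.df         # n_x + n_y - 2, here 6 + 6 - 2
```

With 120 patients split across the two groups, the same field would report the 118 degrees of freedom mentioned in the transcript.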