So here we are in the second lecture. In the first one I was really just interested to see whether I could do some basic medical statistics using Julia, and more specifically using some of the packages with which you can extend the base install of Julia. I promised we'd look at one or two more of these packages, and in this lecture I just want to look at the Distributions package. Remember, in Julia we use the keyword using to import a package, so I'm going to import Distributions. Once again we are in the Juno development environment instead of my usual Jupyter notebooks. In Juno we just hover over the end of a line of code; on macOS, which I'm using here, I hold down Command and hit Return. On Windows and Linux you use the Control key and hit Enter. I do that, and you can see the little animation at the bottom: it is busy importing the Distributions package. Once it stops spinning you'll get a little check mark there, which means it has been loaded. Now I'm also going to import the Gadfly plotting package. If you've just started Julia up, once again I warn you it's going to take quite some time. If you've already imported it during the current session it'll obviously be a lot quicker, but I've just started up this computer, so this first import of Gadfly is going to take a while. You can see the spinning cubes at the bottom indicating that it is busy importing that package, so I'll pause the video and come back when it's done; otherwise it really is just a waste of time. Okay, we're back, and Gadfly has indeed been imported. The last thing I'm going to import is the Markdown package. That's a small package that should load fairly quickly.
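For readers following along in a current Julia (1.x) session, here is a minimal sketch of the loading step. It assumes Distributions has already been installed with the package manager (the one-time `Pkg.add` call is shown commented out because it needs network access), and it leaves Gadfly commented out since, as noted above, it is slow to load the first time. The `Markdown.parse` call is the same "print something nice" trick used in the lecture.

```julia
# One-time installation (needs network access), shown commented out:
# using Pkg
# Pkg.add(["Distributions", "Gadfly"])

using Distributions   # probability distributions such as Normal and Binomial
using Markdown        # standard library, ships with Julia
# using Gadfly        # plotting; slow to load the first time in a session

# Markdown.parse turns a string into a Markdown object, which Juno
# or a notebook displays as formatted text.
title = Markdown.parse("# Using Distributions in Julia")
display(title)

# Quick check that Distributions is available:
d = Normal(0, 1)
println(d)
```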
Now I'm going to use Markdown.parse again, just to print something nice to the screen: "Using Distributions in Julia". There we go. The first thing we're going to do is set a random seed value, so srand, and I'm just going to use 123. That means that every time I open this file and run the code, I'm going to get the same random results. Before we use the package, I just want to show you that Julia has some random number commands of its own. The first one we are going to use is rand, and I'm going to use it in a special way: I'm going to ask it to choose a random value from a set of values which I give it. So it's rand, open and close parentheses, and I want to go from 1 to 10 inclusive in steps of a half. I'm telling Julia to take the values 1, 1.5, 2, 2.5, 3, 3.5, up to 10, and give me back one of them at random. It gives me back 2.0; the dot there indicates that this is a floating-point value. I can do more than that: I can ask for more than one value back and assign them to an array. So there's my computer variable, which I'm calling x_rand, using underscores in my variable names, which is probably a bit old-fashioned anyway. Then rand, open and close parentheses, and inside of that there are two arguments. One is the range that I want to use, from 1 to 100 in steps of a half. And instead of just giving me back one value, I now want 10 values, so I introduce a comma and then 10 inside of the rand command. If I run that, I get this array back, which is called a vector here because it is a column of values. I can click on it and it'll fold open for me, and there are the numbers 17.5, 2, 20, 83, 21.5, all 10 of them chosen from the range 1 to 100 inclusive in steps of a half. Good.
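The seeding and sampling steps above can be sketched as follows. One assumption to flag: the lecture uses `srand`, which was removed in Julia 1.0; in current Julia the equivalent is `Random.seed!` from the `Random` standard library. Julia's default random number generator has also changed since the video was recorded, so the seeded values will not match the 2.0 or 17.5, 2, 20, ... shown on screen, although they will be reproducible from run to run.

```julia
using Random

Random.seed!(123)              # Julia >= 1.0 replacement for the lecture's srand(123)

one_value = rand(1:0.5:10)     # one value picked from 1.0, 1.5, 2.0, ..., 10.0
x_rand = rand(1:0.5:100, 10)   # ten values (with replacement) from 1.0, 1.5, ..., 100.0

println(one_value)             # a Float64, because the range steps by 0.5
println(x_rand)                # a 10-element Vector{Float64}
```

Re-running the script, or calling `Random.seed!(123)` again before the same `rand` calls, reproduces exactly the same values.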
Now let's use randn. That's going to select a random number, not from a set that I give it, but from the standard normal distribution. Remember, the standard normal distribution has a mean of zero and a standard deviation of one. If I just run the code, it gives me back 1.53. Once again, I can say: don't only give me back one of them, give me back a hundred of them, and I assign that to a computer variable. This time I've called it x_randn, but you can call it whatever you want. And there's my vector, now containing a hundred values taken from a standard normal distribution. Just to show you that it is a standard normal distribution, I'm going to plot the kernel density estimate of these hundred values. For that I use the plot function, which comes straight from Gadfly: x equals my random numbers, I use the density statistic, the geometry is a line, and I give my plot a title, "100 values from randn". Once again, I'll have to pause the video after running this code, because the first time you plot something it takes time. And there we go. We can clearly see that this looks like a standard normal distribution centered at zero: most of our values are around there, trailing off to the sides. I asked for a hundred random values, and Gadfly used those values to calculate the density estimate and draw that graph for me. Now, instead of randn, let's use the normal distribution itself, which is part of the Distributions package. I'm still going to use rand; remember, that gives me back a random value, and before I gave it a range from which to take it. But now, inside the parentheses, I can put something else: Normal. And that takes two arguments, the mean and the standard deviation.
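Here is a small sketch of the randn step, with a numeric check standing in for the plot: the sample mean should sit near 0 and the sample standard deviation near 1. The Gadfly call is left as a comment and is only a rough reconstruction of what the lecture describes (density statistic, line geometry, title); treat its exact form as an assumption rather than a verified snippet.

```julia
using Random, Statistics

Random.seed!(123)                # reproducible draws

z = randn()                      # a single draw from N(0, 1)
x_randn = randn(100)             # 100 draws from the standard normal distribution

println("one draw:    ", round(z, digits = 3))
println("sample mean: ", round(mean(x_randn), digits = 3))   # expected to be near 0
println("sample std:  ", round(std(x_randn), digits = 3))    # expected to be near 1

# With Gadfly loaded, the lecture's density plot would look roughly like this
# (assumes Gadfly's Stat.density, Geom.line and Guide.title elements):
# using Gadfly
# plot(x = x_randn, Stat.density, Geom.line, Guide.title("100 values from randn"))
```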
I've assigned the result to a computer variable; this time I've called it x_norm, and I want a thousand values. So rand still takes two arguments. Where the first argument used to be a range, say from 1 to 100 in steps of a half, I'm now instructing Julia: construct the normal distribution with a mean of 100 and a standard deviation of 20, and from that distribution give me a thousand values. Instantly done. There are my 1000 values drawn from a normal distribution with a mean of 100 and a standard deviation of 20. Now look at this, Juno does this very nicely: I can hover over that 20, click on it and drag it left and right to try different values, and it'll update the vector. Let's plot that; it should go a bit faster now. And there we go, that was almost instantaneous. Now you can see my density estimate is centered around a mean of 100, trailing off on both sides with a standard deviation of 20. I can perhaps illustrate this a bit better with a histogram, and I will make a video just on Gadfly as well. As you can see in this histogram, most of the values were around 100. Good. Let's do some descriptive statistics on these values, just to make sure that we are indeed dealing with a normal distribution, as if those plots weren't proof enough. We can ask for the mean of the thousand values I got this time around. Remember, I used the rand method: give me a thousand random values back, but take them not from a range that I give you, but from the normal distribution with those parameters. And indeed, the mean is very close to 100, and the median is close enough as well. The maximum value I got was 172 and the minimum was 37.69. The standard deviation is very close to the 20 we asked for, and we can also get its square, which is the variance.
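The sampling and descriptive-statistics steps above can be sketched like this in current Julia. It assumes Distributions is installed; `mean`, `median`, `std` and `var` come from the `Statistics` standard library. Because of the seed and RNG differences noted earlier, the numbers will be close to, but not identical with, the 172 and 37.69 quoted in the lecture.

```julia
using Distributions, Random, Statistics

Random.seed!(123)

d = Normal(100, 20)              # mean 100, standard deviation 20
x_norm = rand(d, 1000)           # 1000 draws from that distribution

println("mean     = ", round(mean(x_norm), digits = 2))
println("median   = ", round(median(x_norm), digits = 2))
println("maximum  = ", round(maximum(x_norm), digits = 2))
println("minimum  = ", round(minimum(x_norm), digits = 2))
println("std      = ", round(std(x_norm), digits = 2))
println("variance = ", round(var(x_norm), digits = 2))   # the square of the std
```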
I can also get certain quantiles, say the 50th, 80th, 95th and 99th percentiles. We ask for those and get the values back as a vector. The 50th percentile is the median anyway, and indeed there's the median up there, 99.552, the same 99.552 as before. The length of the vector tells us how many values it contains, and indeed there were a thousand. We can ask for the skewness and the kurtosis of these thousand values, and Julia works those out for us very easily. We can also fit an array of values to a distribution, and here we use the fit command. We say we want a normal distribution and give it our thousand values, and it tells us that the thousand values we have fit a normal distribution with a mean of 99.547, which we already knew, and a standard deviation of 20.7, essentially the standard deviation we calculated earlier. So fit is going to do that for you. Let's look at another distribution, the binomial distribution. Remember there are a couple of requirements for the binomial distribution. Bi means two, so there can only be two outcomes; the flip of a coin, for instance, is only heads or tails. Also, every time you run the experiment, the previous outcome should not influence the current outcome. In other words, if I just flipped a coin and flip it again, the second flip is not determined by the outcome of the first flip. So I've got a little vignette, a little story for you here. Let's imagine that we do 30 cases of a certain procedure, and a certain complication occurs in 2% of cases, so a probability of 0.02. The patient either has the complication or they doesn't, so it is binomial, and we know from historical data that this complication occurs in 2% of these operative cases. The occurrence of that complication would be our success.
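The quantile, shape and fitting steps can be sketched as follows, again on a fresh seeded sample rather than the lecture's exact values. Two details worth knowing: the `kurtosis` applied to a vector here reports excess kurtosis, so a normal sample should give a value near 0 rather than 3; and `fit(Normal, x)` is a maximum-likelihood fit, so its σ divides by n rather than n − 1 and comes out a hair smaller than `std(x)`.

```julia
using Distributions, Random, Statistics

Random.seed!(123)
x_norm = rand(Normal(100, 20), 1000)

q = quantile(x_norm, [0.5, 0.8, 0.95, 0.99])   # 50th, 80th, 95th, 99th percentiles
println("percentiles: ", round.(q, digits = 2))
println("median:      ", round(median(x_norm), digits = 2))   # same as q[1]
println("n:           ", length(x_norm))

s = skewness(x_norm)   # near 0 for a symmetric distribution
k = kurtosis(x_norm)   # excess kurtosis, near 0 for normally distributed data
println("skewness: ", round(s, digits = 3), "  kurtosis: ", round(k, digits = 3))

d_hat = fit(Normal, x_norm)   # maximum-likelihood estimates of mean and σ
println(d_hat)
```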
So don't take the term success literally, because a complication is never a success; it is simply the outcome we're looking for, and it happens in 2% of cases. That 2% is our success rate. We can now ask the following. I've used a computer variable named x_binom, and I ask rand to select 500 values for me from a binomial distribution. This binomial distribution says: I do something 30 times, so the next 30 cases, and that's my n value, and the probability of the outcome occurring is only 2%. If I were to do these 30 cases with a success probability of only 2%, and repeat that 500 times, tell me what the outcome is each time. And there you see it. The first time I did the 30 cases in a row, I got zero complications, zero successes. The second time I did the 30 cases in a row, I got two complications, then zero, zero, and so on. Let's draw a histogram of that. And we see that the probability of having zero complications was pretty high if I only did 30 cases and my complication rate was only 2%. We can see that even better with a density estimate. The most likely outcome in a run of only 30 cases is zero complications. So if I were to do a trial that included only 30 cases, and something occurs in only 2% of cases anyway, the highest probability was that a run of 30 would have no complications whatsoever. There was a smaller likelihood of having one, and then two and three, and it gets less and less as we go.
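A minimal sketch of the simulation described above: 500 repetitions of a run of 30 cases, each with a 2% complication probability, tallied into counts per number of complications in place of the histogram. The tally uses plain `count` so that no extra packages are assumed beyond Distributions.

```julia
using Distributions, Random

Random.seed!(123)

n, p = 30, 0.02                        # 30 cases per run, 2% complication ("success") probability
x_binom = rand(Binomial(n, p), 500)    # number of complications in each of 500 runs

# Tally how many runs ended with 0, 1, 2, ... complications (a text histogram).
ks = 0:maximum(x_binom)
counts = [count(==(k), x_binom) for k in ks]
for (k, c) in zip(ks, counts)
    println(k, " complications: ", c, " runs")
end
```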
So that gives us an indication: if I'm going to do a trial and only include 30 patients, and something occurs in such a small percentage of them, the most likely result in that run of 30 was that I would see nothing whatsoever. So those are two of the distributions, and there are many, many more. You see the website here; you can go on there, and there's a multitude of different distributions that you can use. I've shown you one example of a continuous random variable and one of a discrete random variable, but the Distributions package is much more than this. It can also give you univariate random variables, multivariate random variables and even matrix-variate random variables. So it's a very powerful package indeed, and as you can see, it's very easy to use inside of Julia to ask for random values from a certain distribution.
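The simulation only approximates the probabilities; the exact binomial probabilities confirm the conclusion. By hand, P(0) = 0.98^30 ≈ 0.545 and P(1) = 30 × 0.02 × 0.98^29 ≈ 0.334, so zero complications is the single most likely outcome in a run of 30, yet roughly 45% of runs will still see at least one. A short sketch using `pdf` and `mode` from Distributions:

```julia
using Distributions

d = Binomial(30, 0.02)          # 30 cases, 2% complication probability per case

probs = pdf.(d, 0:5)            # exact P(X = k) for k = 0, 1, ..., 5
for (k, pk) in zip(0:5, probs)
    println("P(X = ", k, ") = ", round(pk, digits = 4))
end

println("most likely number of complications: ", mode(d))
println("P(at least one complication) = ", round(1 - pdf(d, 0), digits = 4))
```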