Okay, so this is quite an interesting video, and I think it's very topical at the moment. We get all these results from vaccine trials, and when you read them you'll see that a lot of them express the efficacy of the vaccine. To determine efficacy, we look at the relative risk. In this video I'm going to show you a notebook, two notebooks in actual fact. I'm going to use R, the language for statistical computing, to explain just what relative risk is and how we then use it to calculate efficacy.

We're also going to look at a specific trial in South Africa. There was a small trial of the AstraZeneca vaccine, and it showed very poor efficacy. Because of that, the vaccine was withdrawn from South Africa, which of course is a very big decision. It was the first vaccine, it was supposed to go to healthcare workers such as myself, and it was withdrawn because of the poor showing. I'm going to show you how they got to this very bad efficacy level of only 21 percent. And it's not only about that number. It's about the confidence interval around that efficacy, and that's what I want to explain.

Of course, that was a very difficult decision, and it wasn't based only on the clinical data. There was other data as well, on the neutralizing antibodies and so on, and I'll talk a little bit about it. A lot more went into the decision not to use the AstraZeneca vaccine in South Africa. But it's all about relative risk, and in this video I'll show you how relative risk works, how you can calculate it, and how we get the efficacy from it.

So here we are on my RPubs website. You can view all the documents that I upload here, and the original files, the Rmd files, are available on GitHub. The links to all of these things will be in the description down below.
So we're going to look at uncertainty in relative risk. First, what is risk, what is relative risk, and how do we express uncertainty in that? It's very topical with the vaccines in the pandemic these days, where we express the efficacy of these vaccines, and I want to show you just how to do that. In this first notebook we're going to talk about what risk is, what relative risk is, what efficacy is, and then how we can express uncertainty that is due just to randomness. We do one trial, but if we could, we would do it over and over and over again. The one trial that we have is just one of the possible outcomes. We need to look at where the actual efficacy lies, and it will be in some range that we can simulate around the one study that we've done.

So let's first have a look at what risk is. You see it there in equation one: it's just the number of positive outcomes divided by the total in the group. Now we've got to talk about what a positive outcome is. We say positive outcome, but it doesn't have to be positive in the sentimental sense. The outcome can be quite negative. In this instance it might well be developing the disease, and that's certainly not positive. We just use the term: the positive outcome is the outcome that we are interested in investigating. So if we have a bunch of participants in a study, a certain number of them will develop this positive outcome, which, as I said, might be quite a negative thing. If we take that fraction, the number of positive outcomes divided by the total number in the group, that is the risk.

Now if we've got two groups, for instance one group that received a placebo intervention and one group that received a treatment intervention, we can look at the difference in risk between the two. We calculate the risk as per equation one for both of them, and we just take the difference.
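Written out, the two definitions so far look roughly like this. The symbols are an assumption (a for the number of positive outcomes, n for the group size), since the notebook's own equations are not reproduced in this transcript.

```latex
\text{risk} = \frac{a}{n}
\qquad\qquad
\text{risk difference (ARR)} = \text{risk}_{\text{control}} - \text{risk}_{\text{treatment}}
```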
So that's the risk in the control minus the risk in the treatment. What we're assuming here is that the risk in the treatment group will be lower, because the other name for risk difference is absolute risk reduction, ARR. It's just the difference in the risk, and it's a very good way to express the difference between a control group receiving the placebo and a treatment group getting the active intervention. You've just got to watch out if the risk in the treatment group ends up being higher than the risk in the control group. Then we cannot talk about an absolute risk reduction from the treatment. Or rather, we still can, but we've got to say that being in the placebo group gives us the reduction in risk for this positive outcome that we're talking about. You'll also see the number needed to treat there in equation three. We're not going to discuss it. I've just put it in for you.

What we really want to do is talk about relative risk. You'll see RR, risk ratio or relative risk. That is the risk in the intervention group divided by the risk in the control group. We've worked out the risk in both groups with very simple calculations, the positives divided by the total, and for these two we just take the ratio. What we're hoping for, of course, is that the fraction of positive outcomes will be lower in the intervention or treatment group than in the control group. If that is so, this ratio will be less than one, and if it's less than one it means the risk is lowered by being in the intervention or treatment group.

That's when we express this idea of efficacy. Look at the piecewise-defined function for efficacy. The top row is for when the risk in the treatment group is lower than the risk in the control group, so the treatment was worth developing, because a smaller fraction of people got the positive outcome in the treatment group than in the control group. That means the relative risk will be less than one, a fraction of one. What we do is subtract that relative risk from one, so one minus the relative risk, and that gives us the efficacy.

I've put the second part in here, but we really don't use it. It's just there to stimulate your mind a little bit. Imagine that the risk in the treatment group was higher. Then we can't really talk about efficacy anymore, inasmuch as the positive outcome is a negative thing and we want the treatment to lower that risk, but now it's going to increase it. So there we just say relative risk minus one. But I warn you, we can't really talk about efficacy then. Efficacy is really just that first line, one minus the relative risk.

So let's look at some coding examples. We're going to simulate a study with 41,100 people in it. When these vaccine trials are designed, the question is: what is the power? We need to see how many people have to be in the trial so that we have sufficient power to find a significant difference between the two groups. So we say here that there were 20,500 people in the control group and 20,600 people in the treatment group. And we're saying that in this trial 350 people in the control group developed the positive outcome. These are just thumb-suck numbers, not from a specific trial. And 115 in the treatment group developed the positive outcome. So in the case of a vaccine, 350 people in the control group, who just received a placebo, developed the disease, and only 115 in the treatment group, who got the active vaccine, developed the positive outcome, which is developing the disease. We're saving those as computer variables as well.
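For reference, the relative risk and the piecewise efficacy function described above can be written out as follows. The notation is an assumption, since the notebook's equations are not reproduced in this transcript, and the number needed to treat is given by its standard definition as the reciprocal of the absolute risk reduction.

```latex
\text{NNT} = \frac{1}{\text{ARR}}
\qquad
\text{RR} = \frac{\text{risk}_{\text{treatment}}}{\text{risk}_{\text{control}}}
\qquad
\text{efficacy} =
\begin{cases}
1 - \text{RR}, & \text{risk}_{\text{treatment}} < \text{risk}_{\text{control}} \\
\text{RR} - 1, & \text{risk}_{\text{treatment}} > \text{risk}_{\text{control}}
\end{cases}
```

Only the first row is used in what follows.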
So we've got a_control and a_treatment. If we just divide those, 350 divided by 115, that's not something we would report, but just for interest's sake, there are three times more positive cases in the control group than in the treatment group. Now, there is no denominator in there. If I had lots more people in the placebo group than in the treatment group, that value would change. So it's not really a valid expression of what is happening here. It just says that, with the nearly equal groups that we had, there were three times more positive cases in the control group.

What we're interested in, though, is the risk. So let's see what the risk in the control group is, which I'm going to call risk_control here. That's a_control, how many got the disease, divided by how many were in the control group. So that's 350 divided by 20,500, and we see a value of 0.017. What you would commonly see is this multiplied by a thousand. If we multiply by a thousand, we move the decimal point one, two, three places, so the risk of developing the disease is 17 people per thousand people. You'll always see it per thousand, per ten thousand, or per hundred thousand, depending on what makes sense.

Now let's do the same for the treatment group. That's a_treatment, how many got the disease in the treatment group, divided by how many there were. That's just the risk, and we see a much lower risk here of about 0.0056, so about five and a half people per thousand develop the disease.

That's the risk of developing the disease, and now we're going to do the relative risk. It's the risk in the treatment group over the risk in the control group, and we see it's 0.327. Multiplied by a hundred, that's 32.7. We want to express efficacy, though, so we're going to subtract it from one.
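The arithmetic in this part of the walkthrough can be sketched in a few lines. The notebook itself is written in R. This is a Python analogue, and the names n_control and n_treatment for the group sizes are assumptions (the transcript only names a_control, a_treatment and risk_control).

```python
# Made-up trial counts from the walkthrough (not from a real trial).
n_control, n_treatment = 20_500, 20_600   # group sizes (names assumed)
a_control, a_treatment = 350, 115         # positive outcomes in each group

# Risk: positive outcomes divided by the total in the group.
risk_control = a_control / n_control
risk_treatment = a_treatment / n_treatment

# Risk difference (absolute risk reduction), relative risk and efficacy.
risk_difference = risk_control - risk_treatment
relative_risk = risk_treatment / risk_control
efficacy = 1 - relative_risk

print(f"risk, control:   {risk_control:.4f} ({1000 * risk_control:.1f} per 1000)")
print(f"risk, treatment: {risk_treatment:.4f} ({1000 * risk_treatment:.1f} per 1000)")
print(f"risk difference: {risk_difference:.4f}")
print(f"relative risk:   {relative_risk:.3f}")
print(f"efficacy:        {efficacy:.1%}")
```

With these counts the relative risk comes out at about 0.327 and the efficacy at about 67 percent, matching the values read out in the video.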
So one minus the relative risk, and we get the efficacy: 67 percent. We can say the efficacy of this vaccine was 67 percent. That is how much it lowered the risk. That's really what we're expressing.

But as I said before, that is just one study we did. If we had started a week later, we would have had other people in our study. If we could do this over and over again, we would get different results every time. So this one efficacy that we found is one of many possible ones. And what we are interested in is not only the 41,100 people in our study. Because it's just a sample, we want to infer from these results to the larger population. What is the true efficacy in the population as a whole? Eventually we want to inoculate everyone. So we need to express a confidence interval around this efficacy.

How do we go about that? In different papers you'll see different methods. There are different equations for doing this, Clopper-Pearson for one, and there are some other ways as well. What we're going to do here is something that I like to do, and that is resampling from the results that we found. In that way we're going to simulate a bunch of studies. We're going to do a thousand, as if we were doing our study a thousand times over. That gives us a distribution of possible efficacies, and we can then calculate the 95 percent confidence interval. Now remember what a confidence interval means. Maybe I should keep quiet for now. When we've calculated the confidence intervals, I'll remind you exactly what that means.

So here in R we've created a little function. I'm going to call it simulate_group. It takes two parameters, n and p. What we're going to do is create a local variable inside the function. It doesn't exist in the global space, so it's just internal to our function.
I'm using the runif function, random uniform. We're going to draw a random number from a uniform distribution. Remember, a uniform distribution means that every element in the sample space of the distribution has an equal likelihood of being chosen every time we run the experiment. So runif is going to draw a random value on the interval from zero to one. It then throws that value back in the bin, so the next time around it is possible to draw that number again. Every value from zero to one has an equal likelihood of being chosen. It's not bell-shaped, where certain values are more likely to be chosen at random. It's uniform.

What we're going to do is look at the numbers that are drawn and count them up if the value is less than p, the probability. That p is actually the risk. Remember, a certain number of people in the sample got the disease. If we do that division, the number who were positive divided by how many there were, that is in essence a probability, and it's the p that we're talking about here, the parameter of our function. So we're going to simulate a random person.
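A minimal Python sketch of the function being described. The notebook's version is in R and uses runif. Here random.random from the standard library plays that role, and the body is a reconstruction from the spoken description, not the notebook's code.

```python
import random

def simulate_group(n, p):
    """Simulate one group of n people who each have risk p.

    Every person gets a uniform random number on [0, 1) and counts as a
    positive outcome when that number is below p. Returns the simulated
    risk k / n.
    """
    k = sum(random.random() < p for _ in range(n))  # local count of positives
    return k / n

n_treatment = 20_600          # treatment group size (name assumed)
p = 115 / n_treatment         # observed risk in the treatment group

single_run = simulate_group(n_treatment, p)
print(single_run)             # no seed is set, so this varies from run to run
```

Because each person is an independent draw with the same probability p, the count k is a binomial draw, which is why the simulated risk scatters around the observed risk.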
They get this random value, and we check whether that value is less than this p. Now, I'm not talking about a statistical p-value, not the famous p-value. I'm just talking about this risk. If the random value is smaller than the risk, we count it. That's the simulation: anyone can get any value from zero to one, and we only count the ones that come out less than p, the risk in actual fact. What we return is k divided by n, the n that we supply to the function, the sample size.

So let's look at that. I'm storing p, and in this instance we're going to take the risk in the treatment group, so the number who were positive divided by the total number there were. That's our p. Now we're going to run a single simulation, so we just say n_treatment and p, and we run that. We have not seeded the random number generator here, so every time you run this code you're going to get a different value. In this instance I got 0.0055.

Now we're going to use list comprehension. You'll see that early on in this notebook I loaded the comprehenr package, because it contains the function to_vec, which allows us to do list comprehension, or in this instance vector comprehension. I'm going to simulate a thousand values all in one go, so that I have a vector of a thousand possible instances, and I'm going to store that as t. So we're simulating this risk over and over and over again. Sometimes the random value is more than p and it isn't counted, and sometimes it's less than p and it is counted, as in what we did up above.

If we do that, we get these thousand values and we can draw a kernel density estimate plot. It shows us the distribution of simulated risks for these thousand times, as if we could do the study a thousand times over. There you see the distribution of risk, with a lot of them around 0.005, just over that, 0.0053 or 0.0054 thereabouts. That was just simulating it over and over again.

One way that we can express this range of values, the measure of dispersion, is the standard error. What we've done here is to take the standard deviation of t, and we can see it there. And because we've got this distribution, we can calculate cut-off values from all these t values. For that I'm going to use the quantile function. I'm passing the 1000 values to it and asking it to calculate the cut-offs at 5 percent at the bottom and 95 percent at the top.
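The resampling step can be sketched as follows, again as a Python analogue of the R notebook. A list comprehension stands in for to_vec, and statistics.quantiles stands in for R's quantile function. The variable names other than t and p are assumptions.

```python
import random
import statistics

def simulate_group(n, p):
    """Fraction of n uniform draws on [0, 1) that fall below the risk p."""
    return sum(random.random() < p for _ in range(n)) / n

n_treatment = 20_600
p = 115 / n_treatment

# 1000 simulated risks, as if the study could be run 1000 times over.
t = [simulate_group(n_treatment, p) for _ in range(1000)]

# Spread of the simulated risks (the standard error of the risk).
spread = statistics.stdev(t)

# n=20 gives 19 cut points at 5%, 10%, ..., 95%.
# The first and last of them bound the middle 90% of the simulated values.
cuts = statistics.quantiles(t, n=20, method="inclusive")
lower, upper = cuts[0], cuts[-1]

print(f"standard deviation: {spread:.5f}")
print(f"90% interval:       {lower:.5f} to {upper:.5f}")
```

The exact numbers change on every run because no seed is set, but the interval should land close to the 0.00475 to 0.0064 quoted in the video.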
So if you look at this, it basically cuts off the bottom five percent and the top five percent, and that gives us a 90 percent confidence interval. The values within our 1000 values that gave us that were 0.00475 to 0.0064. So we're saying that a 90 percent confidence interval around this risk of ours, around the mean risk, runs between those values.

But we're not really interested in this single risk. We want the relative risk. So we're going to create another little function, and it's going to do what we've done before, but twice, for the control group and for the treatment group, so that we can get the efficacy as one minus the relative risk: risk two, the treatment, divided by risk one, the control. We've got to pass four parameters there. The p values that we ask it to count below are p1, the control arm's risk, and p2, the treatment arm's risk. Then we simulate that once, and it gives us back one possible efficacy: 65.9 percent.

Now, again using the to_vec function, I'm going to create a vector of a thousand efficacies all in one go by calling this function of ours, simulate_trial. And simulate_trial, remember, calls the simulate_group function twice, once for each of our two groups. If we plot the kernel density estimate, we see this range of possible values that we just simulated. So we resampled based on the finding that we had for this specific study. And we see that the mean efficacy of the 1000 simulated trials was 0.67, and our actual one was also 0.67. That's how it's going to work out.
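A sketch of the second function, under stated assumptions: the transcript gives the names simulate_trial, p1, p2 and t2, but the parameter order (n1, p1, n2, p2) and the body are reconstructed here in Python rather than taken from the R notebook. The sketch uses 500 repetitions instead of the notebook's 1000 purely to keep the pure-Python run time short.

```python
import random
import statistics

def simulate_group(n, p):
    """Fraction of n uniform draws on [0, 1) that fall below the risk p."""
    return sum(random.random() < p for _ in range(n)) / n

def simulate_trial(n1, p1, n2, p2):
    """One simulated efficacy; group 1 is control, group 2 is treatment."""
    risk1 = simulate_group(n1, p1)
    risk2 = simulate_group(n2, p2)
    # risk1 could in principle be 0 for a tiny group; not a concern here.
    return 1 - risk2 / risk1

n_control, n_treatment = 20_500, 20_600
p_control, p_treatment = 350 / n_control, 115 / n_treatment

N_SIMS = 500  # the notebook uses 1000
t2 = [simulate_trial(n_control, p_control, n_treatment, p_treatment)
      for _ in range(N_SIMS)]

mean_efficacy = statistics.mean(t2)
cuts = statistics.quantiles(t2, n=20, method="inclusive")  # 5%, ..., 95%
lower, upper = cuts[0], cuts[-1]

print(f"mean simulated efficacy: {mean_efficacy:.3f}")
print(f"90% interval:            {lower:.3f} to {upper:.3f}")
```

The first and last cut points give the 90 percent interval that the next part of the video reads off with the quantile function.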
So if we want a 90 percent confidence interval for that, we just pass those thousand values to the quantile function and set the probabilities to cut off the bottom 5 percent and the top 5 percent. The values within t2, which is where we saved the thousand values, were 60.8 to 72.7. So we can say that the efficacy of this drug or this vaccine in this study was 67 percent, with a 90 percent confidence interval of 60.8 to 72.7, or about 61 percent to about 73 percent.

Now let me remind you what a confidence interval is. We're not saying that we are 90 percent sure that, if we gave the vaccine to everyone, the true efficacy in the population as a whole will be in this range. No, that's not what it means. It means that if we ran the study 100 times over (just imagine what that would cost and how long it would take), every time we would get slightly different confidence intervals, and in 90 percent of those the true population efficacy would be within the interval. So we're not saying it's true for this specific one. If we could do it 100 times over, 90 of those intervals would capture the true population efficacy within their bounds.

Now that we've done that, let's go and look at the trial of the AstraZeneca vaccine that was done in South Africa. I'm going to warn you about this trial: at the time of this recording it is only available as a preprint. It has not been accepted by a journal yet and has not been peer reviewed yet. And on the evidence of this very small study, the vaccine was withdrawn from South Africa. There were about a million doses available.
These were the first doses of any vaccine to land in South Africa. They were ready to be distributed amongst healthcare workers, and they got pulled on the strength of this study. Now, the study has more than one component. What we're going to look at is just the clinical side of it, the relative risk and the efficacy. But it was about more than that. Part of the analysis was on the neutralizing antibodies and whether they were effective against the variant that's prevalent in South Africa. So the decision is not based on this alone. But I want to show you what relative risk and efficacy look like, or at least what the uncertainty in the value looks like, when we're dealing with a small trial, and how difficult it becomes in a small trial where the risk is relatively low. So let's look at that.

You can click on the link there to find the preprint of this article. There were some pertinent points, and I've listed them here. The primary endpoint was efficacy against confirmed symptomatic COVID-19 more than 14 days after the second dose. So they waited 14 days after the second dose and then started looking at whether people developed confirmed symptomatic disease. For the primary efficacy analysis, only per-protocol seronegative participants were included. The vaccine efficacy, VE, was calculated as one minus the relative risk, and 95 percent confidence intervals were calculated using the Clopper-Pearson exact method, as reported. So that's not what we're doing.
We are using resampling, which is what I like to do. There's a bunch of these methods, and they have their strengths and weaknesses, and there's debate about whether they work and when they should and shouldn't be used. My feeling on the matter: use resampling.

So 1,010 participants received vaccine and 1,011 received placebo, and in the end there were a lot of exclusions. You can read the paper as to why. Remember, our simulated study before had over 40,000 people in it, more than 20,000 in each arm. Here, only 750 were analyzed in the vaccine group and 717 in the placebo group. So it's very small.

There are other problems with the trial, and I don't mean problems in how it was run. The trial was correctly done. It's that we want to infer from it to a larger population, the population out there, and the participants in the study were not representative of the population as a whole. They were only representative of certain people in the population. The median age was very young, only 31 years, and 56.5 percent were male. And then there was a racial distribution as well.

All 42 endpoint cases were graded as either mild or moderate, so there were no severe cases in this trial. Even in the placebo group there were no severe cases. We can see 15 mild and four moderate in the vaccine group, and 17 mild and six moderate in the placebo group. So there were no cases of severe disease or hospitalization in either arm. That makes it very difficult, because half of these people received placebo, to infer the results to the larger population.

So let's just see what happens if we do the relative risk and look specifically at the uncertainty around the relative risk.
That's what this is all about. There were 717 in the control group and 750 in the treatment group. 17 plus six, the mild plus moderate, got the disease in the control group, and 15 plus four in the treatment group. So we're going to work out the risk in the control group, which is just that very simple ratio, and the risk in the treatment group, so that we can express the relative risk. And the relative risk, as you can see there, was lower than one. So we subtract it from one and we get the efficacy, and that was the headline: it was only 21 percent effective. Mostly you'll see that we're looking for at least 50 percent efficacy in these vaccines. But here comes the problem: we need to express our uncertainty in this 21 percent efficacy.

So again we're going to create our two functions. The first one simulates from a random uniform distribution, just as we did before, counting only those values that are lower than the risk. The second one simulates the two risks and expresses the efficacy, again as one minus risk two over risk one, so the risk in the treatment group divided by the risk in the control group. The n and p values here, remember, are just taken from the outcome of this actual study. I've run the simulation just once, and we get a negative value. Now this becomes very difficult to interpret, and let me show you why.

We're going to simulate it a thousand times using the to_vec function, so we can do list or vector comprehension and get our thousand values in one go. Here I've used plotly and not the built-in R plotting. We can see the distribution around the efficacy, and you can see how it tails off below zero. Now how do we interpret that?
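The same steps applied to the South African trial counts quoted above, sketched in Python rather than the notebook's R. The function bodies and the parameter order are reconstructions, as before, and the share of negative efficacies is an extra illustrative quantity that is not computed in the video.

```python
import random
import statistics

def simulate_group(n, p):
    """Fraction of n uniform draws on [0, 1) that fall below the risk p."""
    return sum(random.random() < p for _ in range(n)) / n

def simulate_trial(n1, p1, n2, p2):
    """One simulated efficacy; group 1 is control, group 2 is treatment."""
    # A simulated control risk of exactly 0 is vanishingly unlikely here.
    return 1 - simulate_group(n2, p2) / simulate_group(n1, p1)

n_control, n_treatment = 717, 750
a_control, a_treatment = 17 + 6, 15 + 4    # mild + moderate cases

risk_control = a_control / n_control
risk_treatment = a_treatment / n_treatment
efficacy = 1 - risk_treatment / risk_control

t = [simulate_trial(n_control, risk_control, n_treatment, risk_treatment)
     for _ in range(1000)]

# n=40 gives cut points at 2.5%, 5%, ..., 97.5%.
# The first and last of them bound the middle 95% of the simulated values.
cuts = statistics.quantiles(t, n=40, method="inclusive")
lower, upper = cuts[0], cuts[-1]

# Share of simulated trials in which the treatment arm did worse.
share_negative = sum(e < 0 for e in t) / len(t)

print(f"observed efficacy: {efficacy:.1%}")
print(f"95% interval:      {lower:.1%} to {upper:.1%}")
print(f"share below zero:  {share_negative:.1%}")
```

The observed efficacy comes out at about 21 percent, and the simulated interval should stretch from well below zero to somewhere near 60 percent, which is the wide range discussed next.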
I'll leave that for you to interpret in the comments, because if you think about it, we're actually back in the realm of the vaccine increasing the risk. That's the way the values are going to come out. You can see the bump here right at the top. It is at about 21, 22, 23 percent efficacy.

What we're interested in now is the 95 percent confidence interval. We said the efficacy of the AstraZeneca vaccine in this study was only 21 percent, and now you see the 95 percent confidence interval around that: from negative 50 percent to about 59 percent. So, just based on this study, it's quite possible that the efficacy of the AstraZeneca vaccine out in the population is somewhere in between. The negative side is bizarre, of course, and as I say, leave your thoughts in the comments on what these negative values would really mean. But almost 60 percent efficacy for the AstraZeneca vaccine is still within the 95 percent confidence interval.

So it's very difficult to make decisions based on a small study like this, where the risk is very low. The risk of developing the disease is very low both in the placebo arm and in the treatment arm, and the two risks were very close to each other, meaning the efficacy was only 21 percent. But because these values are so small, when we simulate a thousand of these and express our uncertainty in this 21 percent, you see that the uncertainty is quite big. And as I say, the decision to withdraw the AstraZeneca vaccine in South Africa was not based only on this, because part of the study was also about neutralizing antibodies and so on. There's a whole other part to that.

But see how difficult it can be. If you've got relatively small sample sizes and the risk is also small, you are going to struggle to make a decision, because the uncertainty in your efficacy is not going to be very good. There's going to be a wide confidence interval around the efficacy.

So I hope this was interesting for you. Leave some comments down below, and remember to like the video. You can read these notebooks here on RPubs, or follow the link down below to download the Rmd files, the R Markdown files. If you've got RStudio, you can just import them and play around with these values, maybe simulate some more, draw some pretty graphs, et cetera. I hope this really helped you to understand relative risk and efficacy through a very relevant example.