And then we'll move on to David Prieto, who is going to help us compete with the presentations next door, because it's, like, heavy methodology. David is a statistician who is based here, as well as at UCL, as well as in Spain. And he's recently taken on this topic and this domain after working a lot on big data and clinical trials and pharmacovigilance.
Okay, thanks a lot. So, well, I have to apologize first, because I don't know much about non-communicable diseases, or I don't know anything about non-communicable diseases, probably. And I don't know much about challenging settings. I work in a very comfortable setting. Most of my job is up the road at UCL. At UCL, I work at the Farr Institute, with electronic health records. So, we analyze loads of data. I kind of swim in a wealth of data. So, I don't have all these problems that I'm hearing about here, lack of data and bad data, things like that. My data is pretty good. So, I don't know what I'm doing here. I should be next door, with those guys there. But the thing is, I came across Bayard and Pablo and Kieran, and we worked together on some things, and they asked me to do a couple of things. Oh, thanks, sorry. So, I think this is actually a more challenging and more interesting setting than next door. Don't tell them. So, when I was preparing this, because I'm not an expert on these two things, NCDs and challenging settings, I thought I would talk about myself and tell you my story. My story with these particular problems. The other thing is that, from what we've seen here, it transpires that there is a lack of evidence in what we know about NCDs in humanitarian settings. Now, to produce evidence, you need two things. You need data, and analysis of that data. Now, I'm in the analysis part. Now, statisticians, when they do the analysis of the data, assume the data has certain properties, that you've collected the data with a certain quality.
And they normally blame you if it doesn't. And if you don't get what you want, they blame you even more. So, the blame is always on the data. So, we tend to do that. So, the thing is that most of our statistical methods assume some qualities in that data, some good qualities of the data that you've collected. And I realized very soon, very early, that in these settings, you cannot rely on those assumptions. So, you have to improvise. You cannot rely on the standard methods that you take off the shelf and read in the books. So, I might disappoint you. I'm not going to tell you about super innovative statistical methods. I'm just going to tell you about the old methods and how we try to adapt the old methods to data that is not excellent, not very good data. So, I will talk about two examples that I have been involved in. One is the estimation of the prevalence of a disease, diabetes, in a population where we don't have access to the population. And the other one is trying to generate some evidence, trying to measure the effectiveness of an intervention in a refugee camp where you cannot do a randomized clinical trial. So, if you want to generate evidence on the efficacy or the effectiveness, I never remember the word, of an intervention, you should do a randomized clinical trial. That's what it says in all the books on epidemiology and statistics. But that's not always possible. So, what do you do instead of that? Let's go to the first one, the prevalence one. So, Pablo and Kieran came and said: DRC project, we would like to estimate the prevalence of diabetes in the population. Can you calculate the sample size? So, I quickly calculated the sample size, that was very easy. And then they said, oh, but we cannot just reach individual people. We have to do a cluster survey. We have to go to a village and then get some people in the village and then move to another village. So, you recalculate the sample size with the cluster design.
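As a rough illustration of that recalculation, the sketch below uses the textbook sample size formula for a proportion and inflates it by a design effect for cluster sampling. The talk does not say which formulas or numbers were actually used, so the function names, the expected prevalence, the margin of error, the cluster size and the intra-cluster correlation (ICC) are all assumptions chosen for illustration.

```python
import math


def simple_random_sample_size(p, margin, z=1.96):
    """Sample size to estimate a proportion p within +/- margin
    at roughly 95% confidence (z = 1.96), simple random sampling."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def design_effect(cluster_size, icc):
    """Standard design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def cluster_sample_size(p, margin, cluster_size, icc, z=1.96):
    """Inflate the simple random sample size by the design effect and
    round up to a whole number of clusters.
    Returns (total individuals, number of clusters)."""
    n_srs = z ** 2 * p * (1 - p) / margin ** 2
    n_needed = n_srs * design_effect(cluster_size, icc)
    n_clusters = math.ceil(n_needed / cluster_size)
    return n_clusters * cluster_size, n_clusters


# Hypothetical inputs, not taken from the talk:
# expected prevalence 5%, margin +/- 2 points, 20 people per village, ICC 0.05.
print(simple_random_sample_size(0.05, 0.02))
print(cluster_sample_size(0.05, 0.02, cluster_size=20, icc=0.05))
```

The point of the comparison is only that clustering can roughly double the required sample even for a modest ICC, which is why each change in the field constraints forced a new calculation.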
And they said, well, the security situation has deteriorated. So, we cannot just get a random sample of villages. We can only go to a few big towns and then get some people very quickly, because then we have to move out before the bad guys come in. So, we had to kind of recalculate again and take into account the biases. And then they came and said, well, we actually cannot leave the hospital. We have to stay in. So, in my statistics books, there is no chapter where it says how to estimate the prevalence of a disease in a population where you cannot actually go and survey the population. So, we have to kind of improvise things. So, this is it schematically. I've deleted most of the formulas, by the way. What we want to do: we have a population here. Some people have diabetes. This is a health facility. We would like to go there and sample people randomly in the population and then just get the proportion of them that have diabetes and all of that. This is how you normally do it. But you cannot do that. You have to stay in your health facility and let people come in. And you can actually only see the people that you have inside. So, how do you calculate the proportion of these guys in the population from this number here? Now, this is where it gets interesting, right? I'll give you an example of what we thought about. We thought that, okay, those people are not all going to come into the health facility. So, within a period, a window of time, only a proportion of them, let's call it K1, 10% of them, whatever it is, come into your health facility. Now, that means that this number here is the total number of individuals with diabetes times that proportion. You multiply by the proportion coming into the hospital. So, if we knew K1, then we could calculate back the number of people with diabetes in the population. That was very clever, but we don't know K1. We don't know what proportion of people end up there. So, we thought about something else.
We might have other diseases in that population that we might know better than diabetes, maybe malaria or something else. A proportion of those people will also come. So, we have this H1 that we know, people with diabetes in the hospital, and people with this other disease in the hospital, and they both follow these formulas, right? So, what if we divide this one by that one? I started thinking about it because that's what mathematicians do when we don't know what to do. So, I've deleted the formulas and put everything in graphs, so I think people can follow. So, this is the number of people with diabetes, multiplied by the probability that they will come to the hospital, which is the number of people with diabetes in the hospital, and this is the same with the other disease. So, if I now do a very complicated move and put this thing on the other side, right, that goes here, that goes there, just simple algebra, and this goes there, you get that the number of people with diabetes will be equal to that, multiplied by that, multiplied by that, right? So, the number of people with diabetes is equal to the ratio of diabetic patients over the other patients in the hospital, multiplied by the ratio of the probabilities of coming and seeking help in the hospital for these two kinds of patients, multiplied by the number of people with the other disease in the population. Now, if you divide these by the population, this turns out to be the prevalence of diabetes in the population, and that turns out to be the prevalence of the other disease in the population. Now, I still need to find this, and this is data that I can get from the hospital. This I don't know, and that I don't know. But, if I think about it carefully, the first bit here is the ratio of the probabilities of seeking help for patients with these two different diseases, right? So, I have to figure out this ratio. I could guess numbers for that ratio and do different estimations.
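In symbols, the argument up to this point can be written as follows. The notation is an assumption, since the slide formulas are not captured in the transcript: $N_D$ and $N_O$ are the numbers of people in the population with diabetes and with the other disease, $K_D$ and $K_O$ are the proportions of each group that come to the facility within the time window ($K_D$ is the $K_1$ mentioned earlier), $H_D$ and $H_O$ are the numbers seen in the facility, and $N$ is the total population.

```latex
\begin{align*}
H_D &= K_D\, N_D, \qquad H_O = K_O\, N_O \\[4pt]
\frac{H_D}{H_O} &= \frac{K_D}{K_O}\cdot\frac{N_D}{N_O} \\[4pt]
N_D &= \frac{H_D}{H_O}\cdot\frac{K_O}{K_D}\cdot N_O \\[4pt]
\underbrace{\frac{N_D}{N}}_{\text{prevalence of diabetes}}
  &= \frac{H_D}{H_O}\cdot\frac{K_O}{K_D}\cdot
     \underbrace{\frac{N_O}{N}}_{\text{prevalence of other disease}}
\end{align*}
```

The ratio $H_D/H_O$ is observed in the facility, while $K_O/K_D$ and the prevalence of the other disease are the two unknowns that have to be supplied from outside.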
Or maybe I can think, well, maybe they are the same. Maybe patients with diabetes and with this other disease will have the same probability of ending up in the hospital. So, this is just one. And the other bit of unknown information is the prevalence of disease two in the population. Now, the good thing here, the trick here, is to choose a disease for which you actually know what the prevalence is in this population from previous studies. So, let's say malaria, which is a very well-studied disease, or HIV, for instance. So, by putting something here and putting something in there from previous studies, you go back to the formula, you have that one, that one, and that one, and you have an estimate of the prevalence of diabetes in the population without leaving the hospital. Yeah? That's the kind of thing. We haven't tested this, by the way. But this is the kind of thing that you will have to get your statistician to do if you hire one of them to help you in these very complicated settings. Right? The good thing about that, and then I got excited about this, and said, oh, what if you know a particular disease? You can turn the question around. If you know the prevalence of a disease, and maybe you have two different towns and the prevalence of the disease is similar in both towns, but you're not sure about how people seek health services in both towns, how often they come to the hospital from both towns. It could be a very interesting question to see if there is a difference between different human groups in seeking your health services, right? So if you actually know the prevalence of the disease in these groups, then your unknown could be this thing here, the ratio of the probability of going to the hospital between the two groups.
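The ratio estimate described above, and the reverse question about two towns, can be sketched in a few lines. This is a minimal illustration and not the project's actual code (the speaker notes the approach had not been tested): the function names, the counts and the prevalences are made up, and the two-town version additionally assumes that the population size of each town is known, which the talk does not state.

```python
def estimate_prevalence(h_target, h_reference, prev_reference, seek_ratio=1.0):
    """Prevalence of the target disease (e.g. diabetes) in the population.

    h_target, h_reference: patients seen in the facility with each disease.
    prev_reference: known population prevalence of the reference disease.
    seek_ratio: K_reference / K_target, the ratio of the probabilities of
        coming to the facility. 1.0 means both groups attend equally.
    """
    if h_reference <= 0:
        raise ValueError("need at least one reference-disease patient")
    return (h_target / h_reference) * seek_ratio * prev_reference


def sensitivity(h_target, h_reference, prev_reference, seek_ratios):
    """Repeat the estimate over several guessed values of the seek ratio."""
    return {r: estimate_prevalence(h_target, h_reference, prev_reference, r)
            for r in seek_ratios}


def relative_care_seeking(h_a, h_b, pop_a, pop_b, prev_a, prev_b):
    """Turn the question around: K_a / K_b for one disease in two towns,
    using H = K * prevalence * population for each town."""
    return (h_a / h_b) * (prev_b * pop_b) / (prev_a * pop_a)


# Hypothetical numbers: 30 diabetes and 120 reference-disease patients seen,
# reference prevalence 20% from an earlier survey.
print(estimate_prevalence(30, 120, 0.20))
print(sensitivity(30, 120, 0.20, seek_ratios=[0.5, 1.0, 2.0]))
# Two towns of equal size and equal prevalence, 40 versus 20 patients seen.
print(relative_care_seeking(40, 20, 10_000, 10_000, 0.10, 0.10))
```

Running the estimate over a range of seek ratios makes the dependence on that unverifiable assumption explicit, instead of hiding it inside a single number.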
And this would be the known thing, you put it on the other side of the equation, and this is observed, that's known, and then you can find out whether in one location people are more able to go to your hospital than in another location. Three minutes, oh my God, okay. All right, so I might not get to the end of the presentation. So the advantage of this is obviously that you don't have to leave the hospital, you don't have to go out there. You can do stratification of your estimates by patient characteristics like age, sex, residence, a lot of things. You can use previous information from nearby facilities, nearby populations, historical information, to inform the parameters. You can do continuous monitoring of prevalence over time without having to repeat very expensive surveys out there in the population. And if you have a reasonable knowledge of the prevalence, then you can turn the question around and try to estimate how often people go and seek help for their problems, yeah. The other example I want to give is how you estimate the efficacy of an intervention when you cannot do a randomized clinical trial. And this is an example of a project in a refugee camp in Lebanon. And the intervention is, can I say this? Yeah, okay. It's not a state secret. It's using the polypill, for instance, for CVD patients, right? But you cannot randomize individual patients. You have to give it to the whole area, to the whole camp, or not. What we were thinking of doing is an interrupted time series analysis. So this is an outcome. Let's say this is blood pressure or whatever it is. And you divide your period of observation into two periods. The control period, before you put in the intervention, that's the intervention, and after you do the intervention, the intervention period. Basically, you compare the time series of something that you want to monitor. In this case, it would be something like blood pressure, in response to the drug. You compare before the intervention with after, and you check how it changes.
It might change with a change in the slope, or with a sudden change in the level, right? And this is a method called interrupted time series analysis. But you want to have a control group, because there could be circumstances that make people change over time. Maybe when you did your intervention, something else happened, some political situation or war situation happened. So you ideally want to have another site where you're doing the same thing without the intervention. And then you compare the change in this site with the change in that site. So these are basically two ways where you try to... This has several advantages, because you use the same group before and after as its own control. That takes away a lot of confounding. And the main confounders could be things that change over time. But to control for that, you use another group. And you don't have to randomize the people or the communities. Okay? And I think that's it. Any questions?
Thank you, David. I'm sorry that we had to keep it short, but I think there's a lot to say there and there are some good insights. Any clarification questions before I hand over to David for the main discussion? No.
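The controlled interrupted time series design described in the talk is commonly fitted as a segmented regression. The sketch below is one way to set that up with ordinary least squares and is not the analysis used in the Lebanon project. The data are synthetic and noiseless, invented only to show which coefficient carries which effect, and the column names are hypothetical. A real analysis would also need to handle autocorrelation and seasonality, which this sketch ignores.

```python
import numpy as np

NAMES = ["intercept", "trend", "level_change", "slope_change",
         "group_diff", "trend_diff", "level_change_diff", "slope_change_diff"]


def design_matrix(time, start, treated):
    """Segmented-regression design with a control group.

    time: observation times; start: time the intervention begins;
    treated: 1.0 for the intervention site, 0.0 for the control site.
    """
    post = (time >= start).astype(float)      # 1 after the intervention
    since = np.maximum(time - start, 0.0)     # time elapsed since it began
    base = np.column_stack([np.ones_like(time), time, post, since])
    # The same four terms again, switched on only for the intervention site.
    return np.hstack([base, base * treated[:, None]])


def fit_controlled_its(y, time, start, treated):
    """Ordinary least squares fit; returns a dict of named coefficients."""
    X = design_matrix(time, start, treated)
    coefs = np.linalg.lstsq(X, y, rcond=None)[0]
    return dict(zip(NAMES, coefs))


# Synthetic example: 24 months at each of two sites, intervention at month 12.
months = np.arange(24, dtype=float)
time = np.tile(months, 2)
treated = np.repeat([0.0, 1.0], 24)
post = (time >= 12).astype(float)
since = np.maximum(time - 12, 0.0)
# Both sites share a baseline trend and a -1 shock at month 12 (an external
# event); only the intervention site gets an extra -5 level and -0.3 slope.
y = 150 + 0.1 * time - 1.0 * post + treated * (-5.0 * post - 0.3 * since)

effects = fit_controlled_its(y, time, 12.0, treated)
for name in NAMES:
    print(f"{name:>18}: {effects[name]: .3f}")
```

In this setup the shared shock shows up in `level_change`, while `level_change_diff` and `slope_change_diff` isolate what changed at the intervention site over and above the control site, which is the comparison between the two sites that the speaker describes.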