Well, thank you, Keith, for that introduction, thank you for inviting me, and thanks also to all the participants for giving up their Christmas preparation time to be here. As the title suggests, in this talk I'm going to explain the fundamental limitations of what we can learn from data alone, even using the fanciest machine learning techniques. And when I say that causal reasoning and knowledge come to the rescue, what I mean ultimately is that Bayesian reasoning and causal Bayesian networks come to the rescue, so I'm going to explain what these are. I'm going to use them to resolve two statistical paradoxes, Simpson's paradox and Berkson's paradox, which inevitably compromise what you can learn from data alone, and I'll present some real-world examples and their Bayesian network solutions.

Now, in case there's a suspicion that we could overcome the kinds of problems I'm going to talk about if only we had really big data, I'm going to end with a surprisingly simple example which slams the door on that idea. I'm not saying that machine learning algorithms are a waste of time. There are many important applications, like certain types of classification and image recognition, where it's possible with sufficient data to achieve human-level accuracy. But it's not AI, and those techniques don't help for most types of critical decision-making where data are limited. I'm talking about problems like assessing and avoiding cyber-security risk, determining the optimal medical treatment for specific conditions, determining whether a new type of aircraft will meet aviation safety standards, or predicting the operational risk exposure of a bank. These are some of the many areas where these Bayesian network methods, which enable us to fuse limited data with causal knowledge, have been used successfully. And, of special interest presumably to many of you, there's a recent special journal issue which focuses entirely on the use of Bayesian networks to improve environmental risk assessment, which is also something we've worked on, including some work with Keith himself and his group.

So, here is a typical example of a big-data-driven problem where machine learning algorithms are used. Banks gather comprehensive data on customers to whom they give loans, and they use this information to help them assess the risk that future customers will default on a loan. Here I've highlighted the customers who defaulted. What the statistical machine learning algorithms do is learn the distinguishing features of those customers who default and how they differ from those who don't. The first problem with this type of data is that, however big it might be, it's completely missing data for an entire class of customers we also need to learn about: it's restricted to customers who were given loans, so there's no data at all for those customers who weren't given loans.

We say that such data are censored and therefore biased: we can't learn anything about customers who were refused loans. And this creates other problems, leading to fundamentally flawed algorithms. Take a look at the ones I'm now highlighting. This is a handful of customers who happen to be unemployed teenagers. They were given very large loans and they didn't default; you can see there's no default here. Now, we know that most unemployed teenagers asking for big loans are refused, because they'd almost certainly default. But those teenagers never made it into the data, so we don't learn such obvious characteristics. Instead, what we have here is a small number who didn't default because they were special-case exceptions: it turns out they were the children of wealthy customers known to the bank. So instead of learning that unemployed teenagers are high risk, a purely data-driven algorithm learns the exact opposite: unemployed teenagers apparently don't default.
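To make that selection-bias mechanism concrete, here is a minimal Python sketch with entirely made-up numbers (the population size, default rates and approval rule are all hypothetical): in the full population, unemployed teenagers are genuinely high risk, but only the rare special-case exceptions get approved, so the approved-only data says the opposite.

```python
import random

random.seed(0)

# Synthetic population (illustrative numbers, not real bank data):
# unemployed teenagers almost always default, but the bank only approves
# the rare ones with wealthy family connections, who almost never default.
population = []
for _ in range(100_000):
    teen = random.random() < 0.10            # unemployed teenager?
    wealthy_family = random.random() < 0.02  # special-case exception?
    if teen:
        defaulted = random.random() < (0.05 if wealthy_family else 0.90)
        approved = wealthy_family            # only the exceptions get loans
    else:
        defaulted = random.random() < 0.10
        approved = True
    population.append((teen, approved, defaulted))

def default_rate(rows):
    rows = list(rows)
    return sum(d for _, _, d in rows) / len(rows)

teens_all = [r for r in population if r[0]]
teens_approved = [r for r in teens_all if r[1]]

print(f"True default rate of unemployed teenagers: {default_rate(teens_all):.0%}")
print(f"Default rate in the censored (approved-only) data: {default_rate(teens_approved):.0%}")
# The censored data suggests teenagers are LOW risk -- the opposite of the truth.
```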
Now, here's a model which represents essentially what all machine learning and statistical algorithms are doing: they learn the target outcome as a function of the inputs for which they have data. But the model can't learn the impact of, and relationships involving, unobserved variables like the ones I've highlighted here. The variable 'loan given' was censored, because we only had data for those who were given loans, and we're also missing the special client information and the latent variable 'loan suitability'. Expert domain knowledge and a common-sense understanding of causality can provide the graphical causal model before we even consider the data, and that would also include relationships such as salary being dependent on age and employment.

There are real benefits in simply drawing graphical causal models like this before doing any data collection or analytics: they guide the collection of the right data and help avoid most of the errors people make when analysing and learning from data. But to use such models for prediction, inference and decision-making, we need a computational framework to support them, and Bayes' theorem provides exactly that.

To keep things topical, I'll introduce Bayes' theorem by way of a COVID-19 testing example. Let's suppose that the false positive rate for a particular test is 2%. That means that for every 100 people without COVID, two will wrongly test positive and 98 will correctly test negative. Suppose there's a 20% false negative rate. That means that for every 100 people with COVID, 20 will wrongly test negative and 80 will correctly test positive. Now let's suppose that the current population infection rate is one in 200, and that in random testing Sarah tests positive. What's the probability that she has the virus?

Well, let's imagine a random sample of 1,000 people. Because there's a one-in-200 infection rate, about five of these will have the virus, which means the other 995 don't. Because there's an 80% true positive rate for the test, about four of the five who have the virus will test positive. But because of the 2% false positive rate, about 20 of the other 995 will also test positive. So if we strip away all those who don't test positive, we're left with about 24 people testing positive, of whom the red ones, about four, actually have the virus.
So that's about one in six: fewer than 17% of those who test positive actually have the virus. So when Sarah tests positive, all we can actually conclude is that there's about a one-in-six probability that she has the virus.

What you've just seen is a visual explanation of Bayes' theorem. More formally, we can think of Bayes' theorem as providing the underlying computational framework for performing the inference we need in causal models. With Bayes we start with an unknown hypothesis H, such as, in this case, 'the person has COVID'. We have a prior belief in H, which we express as a probability, in this case one in 200, or 0.005, and we have a corresponding probability table for it. Then we get some evidence about H, in this case a test result, and note that the direction here is the causal direction: the test result is caused by the virus status, not the other way round. We know the probability of the evidence given whether H is true or false, which we write as P(E | H) and P(E | not H), and those probabilities are expressed in the probability table associated with this node. But what we really want to know is the revised probability of H given the evidence, i.e. P(H | E). This is called the posterior belief in H: we want to know how to update our belief in an unknown hypothesis when we observe evidence about it, i.e. to infer back from the evidence to the unknown hypothesis. Bayes' theorem provides the answer with the formula P(H | E) = P(E | H) x P(H) / P(E), which expands to P(H | E) = P(E | H) x P(H) / (P(E | H) x P(H) + P(E | not H) x P(not H)). We know the values of all of these terms, and when we plug them in we get the result that was approximated by the visual representation: just under 17%.

Now, the good news is that you never have to worry about doing any of these calculations manually, because what we have here is a very simple example of a Bayesian network, which is just a directed graph like this with an associated probability table alongside each node. In these models we can do prediction forwards, from cause to effect, and inference backwards. Here's that model actually running in one of these tools. First of all, you can see I can select a node and look at its probability table. Now I can enter an observation, like 'positive' on the test result here, and you can see the probability updates automatically when I run the model; and if I enter 'negative' and run the model again, you can see it updates to a very low probability, 0.1%, that the person has COVID.

Of course, that was a very simple model. Most problems like this, when you dig deeper, are more complex and involve multiple unknown hypotheses and evidence with complex interdependencies. We might, for example, run a second test. Whether a person has COVID depends on whether they've been in contact with an infected person, and of course the probability that they have symptoms increases if they have COVID. Symptoms also depend on things like age and a multitude of other factors.
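Before we look at that bigger model in the tool, here is a quick cross-check of the simple single-test posterior calculation, a minimal sketch in Python using only the numbers quoted above:

```python
# Bayes' theorem for the single-test COVID example.
p_covid = 1 / 200         # prior: population infection rate
p_pos_given_covid = 0.80  # true positive rate (20% false negatives)
p_pos_given_not = 0.02    # false positive rate

# P(E) = P(E|H)P(H) + P(E|not H)P(not H)
p_pos = p_pos_given_covid * p_covid + p_pos_given_not * (1 - p_covid)

# P(H|E) = P(E|H)P(H) / P(E)
p_covid_given_pos = p_pos_given_covid * p_covid / p_pos
print(f"P(virus | positive test) = {p_covid_given_pos:.3f}")  # ~0.167, about 1 in 6
```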
Let's run that extended model in the tool and see what happens. We start by entering a positive test observation, which, when we run the model, updates the probability of COVID as well as the infection rate. If we enter some information, say no symptoms and over 60, and run the model, the probability decreases; and if we enter a second test result which is negative, it goes down even further. Had that second test been positive instead, things would change completely. So that gives you an example of a model running.

And here's a real model which our research group developed from a combination of data and knowledge, working with clinicians. Models like this can be deployed and run on the web or on a mobile phone with a simple questionnaire interface that shields users from the underlying complexity, and you can actually go to this website and run the model, which I'll quickly demonstrate. All you need to do is answer the relevant questions and click the calculate button. Here we might enter the result of a test, if you've had one (you don't need to have had a test), run the calculation, and see the result against the baseline; you can provide information about underlying medical conditions, enter any other symptoms, and so on.

So now we've got a rigorous and practical framework for causal models that enables us to combine data and knowledge, and we're going to return to the problem of why this framework is needed: the fundamental limitations of what we can learn from data alone. The starkest and best-known examples of the danger of reading too much into what can be learned from data are spurious correlations like these. For example, there's a very close correlation between the number of people who drowned by falling into a pool and the number of films Nicolas Cage has appeared in, and Japanese passenger cars sold in the US correlate very closely with suicides by crashing of motor vehicles. These are genuinely spurious correlations without any causation. Now, this one looks like the same kind of weirdness: per-capita consumption of mozzarella cheese and civil engineering doctorates awarded. But it's actually more problematic, because it isn't spurious. The association here is actually predictable, because there's an underlying hidden common-cause explanation, namely wealth: as the wealth of a country increases over the years, so does its consumption of many things, whether that's mozzarella cheese or civil engineering doctorates. Once you've got this causal model you can see the explanation for the association between cheese-eating and engineering doctorates, and we can drop the direct link, because what remains is just an association induced by that hidden common cause.
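A small simulation makes the hidden-common-cause point concrete. This is a purely hypothetical sketch (the variables and noise levels are invented): two quantities that each track "wealth" look strongly correlated, but the association disappears once wealth is held fixed.

```python
import random

random.seed(1)

# Hypothetical illustration: two quantities that both grow with a country's
# wealth look correlated, but the association vanishes once wealth is fixed.
n = 100_000
wealth = [random.gauss(0, 1) for _ in range(n)]
cheese = [w + random.gauss(0, 0.5) for w in wealth]  # "mozzarella consumption"
phds = [w + random.gauss(0, 0.5) for w in wealth]    # "engineering doctorates"

def corr(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

print(f"Overall correlation: {corr(cheese, phds):.2f}")  # strongly positive (~0.8)

# Condition on the common cause: look only at a narrow wealth band.
band = [(c, p) for w, c, p in zip(wealth, cheese, phds) if abs(w) < 0.1]
c_band, p_band = zip(*band)
print(f"Correlation with wealth held fixed: {corr(c_band, p_band):.2f}")  # near zero
```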
But the problem is that, in general, there may well be a direct causal relationship between variables which also have a common cause. For example, suppose we're interested in the effect of a specific treatment on the outcome for a person with a particular medical condition. We know there's a direct link, but we don't know how strong it is, and the fundamental problem with trying to learn such a relationship from data is that a hidden common cause often means we learn the wrong thing. This is where I focus on the two paradoxes which are at the heart of all such problems.

The existence of a hidden confounder can completely invert the correct relationship that we might learn between treatment and outcome. That's the basis of Simpson's paradox. But there may also be a hidden collider, and this too can completely invert the conclusions you come to based on the data alone. Typically the collider will be a constraint on the data sample, which may have in-built bias. That's exactly what we saw in the bank example, where the collider was the fact that the data were constrained to the sample of people who had previously been given bank loans. This is called Berkson's paradox, or the collider paradox.

Here is a real example of Simpson's paradox: a large observational study of patients treated for kidney stones. The study was based on an equal number of patients given each of two treatments, A and B. There was a successful outcome for 78% of the patients treated with treatment A, and a successful outcome for 83% of the patients treated with treatment B. So clearly treatment B was more effective? Well, no: it turns out that treatment A was more successful for each different type of kidney stone that was treated. How is this possible? Again, I'm going to run the model to show you what's going on. First of all, we confirm the results I just showed you: the success rate for A is 78% and the success rate for B is 83%. But now I'm going to reveal the hidden confounding variable, and there you see it: stone size, small or large. For treatment B it was much more likely that the stone was small, while large stones were much more likely to be treated with treatment A. Now we run the model but fix the size of the stone. Fixing it to a small stone and running it for A, the result is a 93% success rate; for B with a small stone, the success rate is 87%. Now fixing it to a large stone, treatment B has a 69% success rate while treatment A has a 73% success rate.

So what's actually happened here is that for both large and small stones, A was more effective, but the overall results are confounded because treatment A was much more likely to be given to patients with large stones, which are more challenging to treat successfully. Although the study had an equal number of patients treated with A and B, there wasn't an equal number of patients with large and small stones treated by each treatment. In other words, the study didn't control for the confounding variable, stone size.

The critical benefit of a causal Bayesian network model is that we can use it to simulate a proper randomised controlled trial just from the observational data, i.e. we can simulate control of the confounding variable. All we have to do is cut the link from the confounding variable, stone size, into the treatment, the thing we're going to intervene on, and this enables us to calculate the overall unbiased result. So let's do exactly that in the tool. All I'm going to do is make a copy of the model: I take the model, copy and paste it, so it's exactly the same model retaining all the information, but now with that link broken. What I get when I run the model is the overall unbiased result: the unbiased overall success rate for treatment A is 83%, for treatment B it's 78%, and the overall treatment success rate is 81%. And there you see the reversal.
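The whole calculation can be reproduced in a few lines. The sketch below uses the patient counts usually quoted for this kidney-stone study (an assumption, since the talk only quotes percentages); they reproduce the percentages above, and the final loop performs the adjustment that corresponds to cutting the link from stone size into treatment, averaging the stratum-specific rates over the same stone-size distribution for both treatments.

```python
# Success counts for the kidney-stone study as usually quoted:
# (treatment, stone size) -> (successes, patients)
data = {
    ("A", "small"): (81, 87), ("A", "large"): (192, 263),
    ("B", "small"): (234, 270), ("B", "large"): (55, 80),
}

def rate(s, n):
    return s / n

for t in "AB":
    s = sum(data[(t, z)][0] for z in ("small", "large"))
    n = sum(data[(t, z)][1] for z in ("small", "large"))
    print(f"Overall success, treatment {t}: {s / n:.0%}")  # A 78%, B 83%

for z in ("small", "large"):
    for t in "AB":
        s, n = data[(t, z)]
        print(f"{z} stones, treatment {t}: {rate(s, n):.0%}")  # A wins in BOTH strata

# Simulating the intervention (cutting the link from stone size to treatment)
# is the back-door adjustment: weight the stratum rates by the population
# distribution of stone size, identically for both treatments.
total = sum(n for _, n in data.values())
p_size = {z: sum(data[(t, z)][1] for t in "AB") / total for z in ("small", "large")}
for t in "AB":
    adj = sum(p_size[z] * rate(*data[(t, z)]) for z in ("small", "large"))
    print(f"do(treatment = {t}): {adj:.0%}")  # A 83%, B 78% -- the reversal undone
```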
Now here's another interesting example with real data: Cambridge University admissions, another nice example of Simpson's paradox. It was argued that the admissions data proved bias against female applicants, because the overall acceptance rate for women was lower than that for men, 23% against 24%. But in every single subject, the acceptance rate for women was actually higher, or at least equal. What's happening here is that some subjects, like engineering, have a much higher acceptance rate than others, like veterinary medicine, and proportionally more women apply to the subjects with the lower acceptance rates. That's what explains Simpson's paradox here: in this case, the subject applied to is the confounding variable.

Let's move on to Berkson's paradox, using exactly the same kidney stone example as before. Again a wrong conclusion arises, but this time for a different reason: the collider bias of Berkson's paradox. Again we've got an equal number of patients treated with A and B. For this data, the overall success rate for treatment A has gone up to 89%, and for B the overall success rate has gone up to 95%. Now, we already know that because those rates have gone up there's probably some kind of sample bias here. We also already know that treatment A was more successful when you narrowed it down to the different types of stone, and that when we controlled for stone size, A was more successful overall. So we know these results must be wrong; remember, the true unbiased results should have been 83% overall for A and 78% for B. But the explanation is different in this case.

Again, I'll enter the observations to confirm the figures: 89% for A and 95% for B. So what is the explanation? I'm going to reveal the hidden node here, which is the collider, and you can see that it's set to true: we're restricted to what was in the sample, and you can see that more of the A patients than the B patients made it into the sample. If I remove the restriction to the sample, we get the unbiased results: running it for A gives 83%, which we know is the correct result from the previous study, and for B the overall unbiased result is 78%. So as soon as we remove the constraint on the sample, we get the correct results.
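Here is a hypothetical simulation of that collider effect. The true success rates are the unbiased ones from before; the probabilities of ending up in the sample are invented, chosen only so that the biased figures come out close to the 89% and 95% in the example.

```python
import random

random.seed(2)

# Sketch of Berkson's paradox: whether a patient appears in the sample
# depends on BOTH treatment and outcome (the hidden collider), so
# conditioning on "in the sample" distorts the comparison.
# Selection probabilities below are made up to mimic the talk's numbers.
P_IN_SAMPLE = {("A", True): 0.90, ("A", False): 0.55,
               ("B", True): 0.90, ("B", False): 0.17}

def patient():
    t = random.choice("AB")
    success = random.random() < (0.83 if t == "A" else 0.78)  # true rates
    in_sample = random.random() < P_IN_SAMPLE[(t, success)]
    return t, success, in_sample

rows = [patient() for _ in range(500_000)]
for t in "AB":
    full = [s for tt, s, _ in rows if tt == t]
    samp = [s for tt, s, i in rows if tt == t and i]
    print(f"Treatment {t}: unbiased {sum(full) / len(full):.0%}, "
          f"within sample {sum(samp) / len(samp):.0%}")
# Within the biased sample B looks better (~95% vs ~89%); drop the
# selection constraint and the true ordering (A 83%, B 78%) reappears.
```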
Interestingly, both of those paradoxes are known to have been the cause of incorrect conclusions about risk factors for COVID, based on very large observational studies. For example, different studies concluded, strangely, that smoking reduces COVID risk, i.e. that there was a causal link, but that may well have been a flawed conclusion. Crucially, these studies were biased because they were restricted to people who had been formally tested, and at the time the data were collected, almost the only people being formally tested were those who either had severe COVID symptoms or were healthcare workers. Because healthcare workers are less likely to be smokers, the sample was biased: smokers were under-represented. And healthcare workers were also more likely to get COVID, which means we also had 'healthcare worker' as a confounding variable. Once you recognise these colliders and confounders, the direct causal relationship from smoking to COVID vanishes. Moreover, using this model, I predicted that you would get a similar incorrect conclusion if you looked at a factor like stress rather than smoking: such a study would also conclude that stress reduces the risk of severe COVID symptoms. I wrote this up in a paper, and lo and behold, while I was writing it, studies came out with exactly that result, claiming that hypertension and stress reduced the risk of suffering severe COVID. That was all predictable, and it's completely wrong.

Now, the fundamental limitations of purely big-data-driven approaches have been brought to the fore recently by the world-leading computer scientist and AI researcher Judea Pearl. He has argued convincingly for the need to incorporate causal knowledge in order to improve the state of the art in AI and machine learning, and he characterised this by his so-called three-rung ladder of causation, where the rungs are seeing, doing and imagining. At level one of this ladder we learn by seeing, but this way we only learn associations, such as, from this data, whether this particular kidney stone treatment is effective, or, in our bank loan example, whether unemployed people given loans are more likely to default. At level two we learn by doing, i.e. by interventions: we learn about things such as, if I have this particular kidney stone treatment, will it be effective for me, or, in the bank example, if I'm unemployed, will I default on a loan? At level three we learn by imagining: we learn about counterfactuals, such as, if I hadn't had this particular kidney stone treatment, would I still have been OK, or, in the bank example, if I had been employed, would I still have defaulted on my loan?

Pearl argues that if we rely on statistical machine learning from data only, then we can only ever get to level one on this ladder. Getting to levels two and three requires causal knowledge of the relationships, not just between factors in the observable data sets, but also including unobservable ones. I've already shown you how the Bayesian network approach enables us to get to rung two of the ladder: we accurately simulate the effect of an intervention by breaking any links that go into the variable we want to intervene on; in our example, that was the treatment. But Bayesian networks also enable us to get to level three, by creating a twin-network version in which we model both the real world and the counterfactual world. Again we drop the links into the variable we want to intervene on, the treatment, but only in the counterfactual world. So suppose we learn that, in the real world, I had treatment B and it was successful. What that observation does is update my belief in the unknown stone size: it's now much more likely that the stone was small. And then, in the counterfactual world, I can simulate the intervention: what would have happened if I'd had treatment A? It turns out it's very highly likely that it would have been a success. So causal models really do come to the rescue where machine learning from data alone fails.
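Here is a minimal sketch of that twin-network calculation (abduction, then action, then prediction), reusing the kidney-stone numbers quoted earlier; the stone-size split among treatment-B patients is taken from the same assumed study counts as before.

```python
# Twin-network counterfactual for the kidney-stone example.
p_success = {("A", "small"): 0.93, ("A", "large"): 0.73,
             ("B", "small"): 0.87, ("B", "large"): 0.69}
# Among patients actually given B, stone sizes were 270 small vs 80 large:
p_size_given_B = {"small": 270 / 350, "large": 80 / 350}

# Abduction: observe (treatment = B, outcome = success) in the real world
# and update the belief about the shared latent variable, stone size.
joint = {z: p_size_given_B[z] * p_success[("B", z)] for z in ("small", "large")}
total = sum(joint.values())
posterior = {z: joint[z] / total for z in joint}
print(f"P(small stone | B, success) = {posterior['small']:.2f}")  # ~0.81

# Action + prediction: in the counterfactual twin, sever the link into
# treatment and set it to A, keeping the updated stone-size belief.
p_cf = sum(posterior[z] * p_success[("A", z)] for z in ("small", "large"))
print(f"P(success had I taken A instead) = {p_cf:.2f}")  # ~0.89, highly likely
```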
Finally, as promised, I'm going to show you why even the biggest possible data might not be enough to learn the things you want to learn. Imagine that we've got trillions of data points on the relationship between just two binary variables: an independent variable X, and a variable Y which we believe depends on X. With trillions of these pairs of numbers showing the relationship between X and Y, surely we must be able to learn that relationship? You'd think so. The problem is that there's zero correlation between X and Y: when X is zero, Y is equally likely to be zero or one, and likewise when X is one. So the best machine learning algorithms must conclude that there's no relationship.

But now I'm going to tell you that X and Y simply represent two light bulbs, where zero is off and one is on. There's the data for light bulb X and there's the data for light bulb Y: '1 1' corresponds to both light bulbs being on, '0 0' to both being off, '1 0' to X on and Y off, and so on. And I can now reveal that there's actually a hidden switch Z, which also has an on and an off position. When it's off, light bulb Y will be in exactly the same state as light bulb X: if X is on, Y will be on, and if X is off, Y will be off. When the switch is on, it's the opposite: when X is on, Y will be off, and when X is off, Y will be on. So the hidden switch provides a completely deterministic causal relationship between X and Y: once we know the switch value, on or off, the value of Y is completely determined by the value of X. Again we've got a simple causal model, and in this case it's completely deterministic: if Z is 0 then Y equals X, else Y equals not-X. And that's it. Machine learning from data alone can never learn that very simple causal relationship, but knowledge combined with data can. Everybody knows that you can have correlation without causation, but many assume you can't have causation without correlation. This example proves that you certainly can.
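And here is the light-bulb example itself as a simulation: any amount of (X, Y) data shows zero association, even though Y is a deterministic function of X and the hidden switch Z.

```python
import random

random.seed(3)

# Two light bulbs: Y is a deterministic function of X and a hidden switch Z
# (Y = X when Z is off, Y = not X when Z is on), yet X and Y are completely
# uncorrelated in any amount of data.
n = 1_000_000
data = []
for _ in range(n):
    x = random.randint(0, 1)
    z = random.randint(0, 1)    # hidden from the learner
    y = x if z == 0 else 1 - x  # fully deterministic causal law
    data.append((x, y))

# P(Y=1 | X=0) == P(Y=1 | X=1) == 0.5, so correlation is ~0.
for x_val in (0, 1):
    ys = [y for x, y in data if x == x_val]
    print(f"P(Y=1 | X={x_val}) = {sum(ys) / len(ys):.3f}")
# Causation without correlation: no learner given only (X, Y) pairs can
# recover the rule, but knowing about Z makes it trivial.
```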
So what we argue for, then, is a so-called smart data approach rather than a big data approach. If you focus on big data and the machine learning algorithms alone, you might end up with garbage, whereas with a bit of causal knowledge and a smart data approach you might end up with something useful. So thank you very much. I'm happy to take questions.

Good, thank you very much. I think there are really important lessons there for anyone dealing with environmental data. I have a question here for you from Chris Stanker: could you recommend where subject matter experts who are not statisticians or data scientists can learn how to sketch out their knowledge in a graph, to summarise subject matter expert knowledge for the benefit of data-driven colleagues who are not knowledgeable on Bayesian methods, and to detect whether biases might be creeping in?

Yes. In fact, if I can, I might just go back to the screenshot. The information I provided there actually covers this very well, and I assume people will have access to the materials, so they'll see these links. Is that right, Keith? Yes. So the point is that there are links there, and in the book we describe an idioms-based approach: a small set of very simple idioms which help you frame the basics of a causal model. For example, you work out what the underlying hypotheses are, which variables are unknown and which are evidence, and you have the measurement idiom, capturing the inaccuracy of observed data, which is a crucial idiom that actually drives most of these causal models. You'd be surprised how easy it is to draw these causal models once you understand the relationship between an unknown hypothesis, some data that you might observe, and the causes of inaccuracy in that data. And then, depending on the particular context, in legal reasoning as opposed to other areas like medical reasoning, we have specific idioms related to those subject areas. So those are our guidelines for doing this, and it does help people. We've published papers in the area, not, I have to say, in environmental risk, but in medical risk assessment and legal risk assessment, specifically showing how people can use the idioms-based approach to build the causal models before doing any of the Bayesian analysis.

Good, thank you. I have a question from Alina. Suppose one has a causal model but isn't sure about the existence of a particular link, and wants to test it with some data. Can you do that while staying consistent with causal rules?

You can. Remember that in the models I've shown, in a Bayesian network, just because there isn't a direct link between two nodes doesn't mean there's no dependence between them. Think back to the example of the mozzarella cheese and the engineering doctorates: the correct causal model doesn't have a direct link between the cheese and the doctorates, but there is still a connection via the common cause. You'd see that by running the model, and you'd see it in the data: the model in that case would show a dependence even though there's no direct causal link in the model. That's the answer to part of the question. In the case where there really isn't any causal relationship but the data shows one, then the model is wrong. And that's part of the validation. Nobody is suggesting there's a magical way that, when we sit down with experts, with our knowledge of building these models and their expert domain knowledge, we're going to come up with the right nodes and the right direct causal relationships every time. We don't.
We often find not only that we've missed out key variables, but also that we've missed direct causal relationships which we thought didn't exist. And you discover that once you've got some data: you can use the data to validate the model, because where you've got data for outcomes you can see whether the model is predicting them. That's part of the validation process.

I have a question here from Neha Joshi, who wants to use machine learning to estimate crop yield, and then wants to detect a causal relationship with stresses like water, pests and diseases. I think it's a good question: does Berkson's paradox apply to this?

Look, it applies everywhere. Before undertaking any study like that, I recommend people read Pearl's book, which says much of what I've talked about here; but what I think Pearl's book doesn't say succinctly is what I've tried to do in this talk with those two paradoxes. Look for the confounders and look for evidence of collider bias in the data that you've got, because those will screw up any causal relationship that you learn purely from the machine learning, and you've got to identify them before you collect a lot of data and before you run any machine learning algorithms. It's fine to run them: in our Bayesian network tool we do machine learning too; once you've got the causal model, and that's the crucial thing, you can throw the spreadsheet of data at it and it will learn the parameters of the probability tables. But then it's learning the nature of the causal relationship, not whether or not there is a causal relationship. The danger with simply running a machine learning algorithm on the data you happen to have is that it will learn a causal relationship which is just wrong. As we've seen, you'll end up learning that treatment B was more effective than treatment A when in reality it was exactly the other way round. This is inevitable unless you've controlled for every confounder, and unless you're absolutely convinced there's no possible bias in the way you've sampled the data.

I'll throw one in here. A very common situation we find in our field, in agriculture for example, is that there are variables which people simply have no data on, economic variables for instance, and therefore people simply leave them out of their model, which we know is a kind of big sin. Your advice would presumably be to incorporate those variables in the model if you're aware from expert knowledge that there's likely to be a causal association. But then how do you represent that when you've got no data? Do you put in some proxy, something like a prior, and then try to go out and get some more data on it?

You go and get it.
I mean, in the case of those very simple examples of Simpson's paradox, the original observational study data didn't have those obvious confounding variables. So you could say: OK, once I realise there's a confounding variable there and I haven't got the data, I'll go out and collect it. But of course that's not simple: if you've already done the study, it's very difficult to do retrospectively. What you actually find with those sorts of confounding variables, though, is that you can get their effect either from expert judgment or from a very much smaller sample. You don't have to repeat the whole experiment: from much smaller samples you can get quite convincing evidence about what the effect of the confounder is. It's much easier to get that effect from experts and from small data sets than it is to get the information you needed about the outcomes from those trials. So it's often the case that the things you're missing are actually the things people have more real knowledge about, and they're able to give you subjective judgments which are accurate and which can in any case be obtained from relatively small studies. So what we've done in some cases is not just rely on subjective judgments, but carry out some small surveys or questionnaires to get that missing data.

A lot of the way you've phrased your problems is in terms of risk assessment, but many of the problems we face are a question of evaluating whether one intervention is better than another. We don't really think of that in terms of risk assessment; we really just want to predict which intervention is likely to be better than another. Presumably everything you've said is just as applicable to that?

That's level two on the ladder, and that's exactly what we can do: we can simulate the interventions. In fact, an extension of Bayesian networks which is also in these tools is influence diagrams, which have decision nodes that enable you to see which are the optimal decisions when you run the model. There are underlying algorithms in there which, once you've selected suitable outcome utility nodes, will automatically identify the optimal decision. Now, these models can get very complex, which is why it's not necessarily easy to do, but Bayesian networks certainly give you the framework to do it, by being able to model interventions.

Yeah. One thing, a trap that we perhaps frequently fall into, is making models very complicated, so that they get quite unmanageable, strictly, as Bayesian networks. In cases like that, is your advice to try to simplify the model down to a more basic level and build back up from there? Or to start with what Sam Savage calls the 'paper airplane' model: a very simple model that shows the approach works before you start to add more complications?

Yes, I'm a big supporter of starting small.
I mean, the real COVID model that we showed, which incidentally didn't show all of its nodes, was not a massively complex model, but it's certainly not a simple one, and we wouldn't have got to that model had we not started with the much simpler types of models which I showed earlier. So you start with a five- or six-node model which captures the key features, the experts see how that works, and then they say, ah, but you know you really need to add this; and that's how it develops.

Good. I'm going to go back to a question from Kaiser Ali, which is: could you please tell us more about cases where big data would not necessarily be needed, and where the data doesn't provide enough information about the underlying confounding variables? What would be the possible sources of smart data?

There are people who claim to be able to genuinely learn missing data, and when I say missing data, this is where there's a problem: they'll say, ah, but we can learn these missing confounders. Well, actually, you can't learn them if you're not at least aware of what the substance of those confounders is. There are lots of fancy machine learning techniques, and some of my very close colleagues do this and can do some quite interesting and powerful things with missing data. But unfortunately, you cannot learn what you really don't know. And that's the reason I gave that example at the end, which some people have said was a little bit facetious, but it isn't: if you genuinely didn't know that there was a missing switch causing that relationship, then nothing you throw at that data, absolutely nothing, is going to learn it. You have to have that knowledge and insight, which simply doesn't come from the data. Now, OK, people will go to extremes and say, well, at a meta level, we've seen examples like this before, where you've got masses of data and zero correlation, so let's try out something like a missing switch and see if that works. Fine, then you can try it; there might be several types of switch which could create that relationship. But the idea that there was a possible missing switch is not something you learned from that data. You can argue that you've seen lots of examples like this before, but I think it's stretching it to say you can learn that from the data.

Good. I've got a question, which will probably be the last one, from Wei Ran: could you please explain in more detail how we determine the structure of the causal graph based only on expert knowledge and common sense? Do we usually end up with a very sparse graph with a bunch of missing causal links, especially when there are very many variables, and how can we solve this problem?

OK. One of the mistakes that people make is that they'll have in mind the set of factors that are relevant to their problem and the set of outcomes that they want to predict.
And of course, as we know, the causal relationships in these models can be very complex. As we've seen in the examples, you don't have to have a direct link between two variables to still be able to make causal inferences between them, so the subtlety is this: how do you minimise the number of links while still retaining the causal dependencies? That is the challenge. There are some ideas in the book for doing that, and the idea of using the idioms-based approach really does help you here. Another idea, when you end up with very many factors, is to try to group them into different concepts; you may find that you can simplify the structure by doing that. So there are no simple answers, but there are heuristics and idiom-based approaches which help.

Very good. So I think, for me, a really important lesson for environmental scientists coming out of this is that if you think a variable is important and you don't have data on it, then go out and get some estimates from experts, or do a small survey, and include it in your model; otherwise you're likely to be in a very rocky place. So with that, I would like to thank you, on behalf of myself and the Natural Environment Research Council, for spending the time to put together this talk, which has been highly informative. It'll be a great resource to have up on the website for others to look at, and I think there are some really essential lessons in it for anyone in this field. So thank you very much again, Norman, for putting together a very clear and informative presentation, and wishing you all the best for the coming holidays.

Thank you very much.

Thank you very much, Norman, and thanks everyone for joining us. Thanks, Charlotte, for all your help as usual.

No worries, and Merry Christmas everyone.