So our next talk goes in a little bit of a different direction: it's about causal inference. Hi, how are you? Hi, everyone. Where are you streaming from? From London. London, okay, great. And you're talking about causal inference now. Yep. Okay, great, thanks a lot. So, hi everyone. As I said, my name is Eyal, and I'm speaking from London. I'm a senior data scientist at a health tech company named Babylon, and I'm really excited to talk about causal inference; for most of you this will be an introduction, and I'm very glad to give it. Just before I start, I want to emphasize that these are my opinions, not those of my employer. And Martin, can I just ask how many people are attending the talk? Do you happen to know? Unfortunately I can't see the number at the moment, but I can tell you after the talk. Okay, great. In any case, I recommend taking notes if you like doing that, but I've also put all the slides up on Google Slides, so feel free to use those. If there happen to be more than 100 people on the talk, it would be good courtesy to make a copy of your own, because Google Slides does have a viewer limit. Okay, great, so I'm ready to start. We put a lot of trust in data. So much so that we value the opinion of experts, for example researchers, to tell us what is healthy and not healthy to put in our bodies. We have so much trust in data that we're willing to provide it to non-sentient beings, machines, so they can analyze it, learn from it, and make suggestions for what actions to take, a procedure commonly known as machine learning. We're all beneficiaries of this: for example, we live healthier lives than previous generations, and we walk around with smartphones running really powerful apps that make great suggestions.
My personal favorite is Google Maps: I love the fact that I have access to so much data that I can better navigate the city. So one way to put it is that we have a lot of confidence in data and in the decisions we can make based on it. But we all know that sometimes researchers misinterpret data, and a big part of this talk is about how data might be misinterpreted and how we overcome that. We also know that the algorithms machine learning uses sometimes reach ridiculous conclusions. For example, this image recognition system totally failed to recognize a stop sign: it had high confidence in calling it a speed limit sign. And so this raises the question: how much trust can we put in the data? The reason being that the researcher or the machine learning developer is sometimes apologetic, saying, well, we don't truly understand the causation, because correlation does not imply causation. And so we really have to ask: how much trust are we going to put in data on its own?
As a researcher, my experience tells me that data is a great first step, but what we really want is an understanding of the data in order to trust it. What I've been following recently is Judea Pearl, who's considered the godfather of causal inference. He and his fellow researchers have, for the past 30 years, been propagating what they call the causal revolution, so that we can truly get beyond correlations and make decisions based on causality, meaning understanding the impact of one parameter on another. One way of paraphrasing a lot of his research in a single sentence, and this is the main takeaway from this talk, is that the story behind the data is as important as the data itself. Throughout this talk I'll emphasize this point time and time again. We'll start with misinterpretation of data, and how, by understanding the story behind the data, we can avoid misinterpretations. I'll also introduce an awesome tool called graphical models, which helps us visualize the story behind the data and is a very useful addition to any analyst's toolbox. Those are the two main points I have in mind for most of the audience. Another takeaway, for those who are really keen and want to take things further, is that I'll suggest first steps for learning the topic that I found useful myself. There's no expected background in terms of statistics; it's very minimal. The target audience I have in mind is anybody who makes data-driven decisions. Analysts, of course, will benefit, but it's not limited to them: developers and managers also make data-driven decisions. What you'll get out of this talk is going beyond the common usage of correlations, and a better understanding of what is required to make better decisions using causality. I have over 15 years of experience in data analysis. I started my career as an academic.
Throughout my career I've used many different data sets. For example, I started off analyzing astronomical data to learn about things like the expansion rate of the universe. Then in 2014 I transitioned to the private sector and worked in consulting for a while, where I worked, for example, with restaurant chain and bus operator data. Then I went back into the sciences, working in a biotech lab with protein engineers, so DNA data, and now I work with health data. For me, causal inference has been a real emotional roller coaster, as I describe over here. If I'm successful today, you will learn a few things. If I'm really successful, you'll have a better appreciation for us as Homo sapiens, and for how we've managed to think critically and quantify the impacts of interventions. But I do have to give you a heads-up that there will be stages of complete confusion today, and if I'm good at what I'm doing, hopefully you'll be able to overcome that. To overcome confusion, here's what I find useful: normally I prototype within Jupyter notebooks, but to really understand statistics I like to visualize and play around with toggles, and for that I'm very appreciative of the people who work on Streamlit, which enables me to go from a prototype to a nice web app UI that I can host locally, and that they can host for me. I'll give you access to the apps that helped me understand this topic of causal inference better. So here's what we're going to do: you can imagine we have this ladder of causation, and what I'll describe is climbing its rungs.
Right now, most of us are right over here, on the first rung. When we're dealing with statistics and machine learning, they give us really amazing results, but at the end of the day they're dealing with correlations; robots, and perhaps more primitive sentient beings, are there as well. Today I'll bring you towards the second rung, called "doing", where we're able to do causal inference for populations. Then, for those who are keen to learn more, I'll mention how to get to the third rung, where you can do causal inference not for a group at large but for each individual, which is a really powerful thing; that will be in the summary. So this will be our trajectory for today. I'll start by talking about the limitations of correlations, with a lot of examples, and then about Simpson's paradox, a situation in which data might be misinterpreted. I'll describe what it is, and you'll learn how to identify it and how to resolve it, and this is where the Streamlit app will come into play: you'll be able to play around with it yourself. Then I'll introduce something that's missing from the common vocabulary of mainstream statistics, which is graphical models. They enable us to visualize the story behind the data, so we can have true trust in what we're doing with it, and we'll also use them for a very important step in causality, which is controlling for what are called confounders. It's okay if you don't know what that is; I'll describe it later.
I love the way that Randall Munroe of xkcd describes various things in math and computer science. For example, here he very nicely articulates correlation versus causation: at the end of this conversation she basically asks, "was this intervention successful?", and he's inconclusive, and that's what happens a lot of the time with correlation alone. At the end of the day, no matter how much effort we put in, we just don't know; we're inconclusive about the result. The reason for that can be seen in this example over here: we might have two data sets that appear totally unrelated, yet they happen to correlate. If you just do linear regression, or hand them to a machine learning model, it's just two columns of data, and it might find something that looks interesting. But you really need somebody who actually understands what the data means in order to see whether there's anything we can actually act on. Is the marriage rate in Kentucky in any way related to the number of people who drowned after falling out of a fishing boat? So, what's the story behind the data? We'll continue with the theme of drowning, but from a different aspect. Imagine, for example, that you work in a city council that has a beach, and you want to figure out when to schedule lifeguards in order to reduce the number of drownings in the city. What time of year, or what time of day, should you post them? Well, what sort of data do you need to answer that challenging question?
Well, if you take this sort of data, compare it to a lot of other data sets, and just look for correlations, your linear regressor might say: hey, compare this to ice cream sales. Obviously, if a company that produces ice cream suddenly sells more because of a promotion, that's not going to cause more drownings. Most of you, in the back of your mind, already know the solution to this, and I'll get to it in the next slide, but first let's look at two more useful examples. The first has to do with fire alarms. Let's say you're working in a company that engineers home safety devices and you want to improve the fire alarm. What sort of data would you collect in order to test whether you're really making an improvement? Should you test with fire data, for example changing the distance to the fire or its heat, or is there some other data that might be more useful for learning about this device you're developing? Nowadays we know that smoking causes lung cancer, but back in the 1960s that was still controversial. We had data on smoking, we had data on lung cancer, and there was a correlation, but was there causation? The tobacco industry, for example, made the argument that maybe there's a common component, our genes, our DNA, that might be impacting both, which was not present in the data and is very hard to analyze. So what do we see here? We have a lot of problems we want to solve, and in each case we have one data set, like the drownings. What other data would be useful in order to answer the question? To resolve this, we need to know the story behind the data: how were the drownings generated? Where did the ice cream sales come from?
Most of you probably guessed that it has to do with weather: the hotter it is, the more drownings there are, and the more ice cream you're going to sell. So this is a common factor between them, a common cause. When I said "fire alarm", what I was actually referring to is more properly called a smoke detector, because what it detects is not the fire; it detects the fire's mediator, which is smoke. You can imagine, for example, a room full of fire but with a huge breeze towards the window, so no smoke reaches the alarm, and the alarm won't detect the fire. So this mediator, smoke, is what you really want to test in order to figure out the quality of your smoke detector. And the question of smoking causing lung cancer was resolved when researchers realized that the cigarette itself is not what causes poor health; it's actually the tar being deposited in the lungs, and that does not depend on DNA. So this is another mediator, and it resolved the question of causality. It means they understood the story behind the data and realized: oh, we should be collecting data on the tar in order to prove causality. That summarizes the first step, in which we now appreciate the limitations of mere correlation and why we need the story behind the data. Next we'll talk about an interesting artifact called Simpson's paradox, which is a situation in which data might be misinterpreted. Just before I start, I want to emphasize that all the data here is made up, but it is plausible.
You might have situations like this in datasets that you're currently analyzing or making decisions on. For simplicity you'll see small numbers, but we're not concerned with statistical significance here; if the small numbers bother you, just multiply them by a thousand or a million and you get the same conclusions. So we're going to go through this imaginary study. Imagine that you're in a board meeting at a pharmaceutical company. You've been waiting months for the results of a study, and you have to make a very expensive decision: continue developing a drug for market, or stop it and go with another drug. The study contains 2,000 patients, equally distributed between a control group, which gets a placebo, and a treatment group, which actually gets the drug. You're listening to the analyst, and they say: the result from the treatment group is that 58% recovered, and from the control group, 72% recovered. The immediate conclusion is that 58 is smaller than 72, by 14 percentage points, and so the drug is deleterious: it has a harmful effect, which suggests we should stop this very expensive project. But of course you don't want to make such a decision lightly; you want to really understand the data. So the main stakeholder asks the analyst: let's look a little deeper; maybe it's beneficial for males and not for females. Just beforehand: it's very useful to look at visuals of distributions. Before splitting into males and females, you can see the difference between control and treatment is 14 percentage points. Now the analyst looks and says: well, I actually see two interesting things. First of all, the females recover much more frequently than the males.
It's 90% versus 50% in treatment, and 80% versus 40% in control. But also, when we compare treatment versus control, we see that for both males and females the treatment group recovers 10 percentage points more than the control. That suggests an alternative conclusion: the drug is actually beneficial, by an absolute 10 percentage points. So it's the same data, and we have two contradictory conclusions. This might appear to be some sort of analytical illusion; it actually has a name: Simpson's paradox. As we'll see over the next few slides, at the end of the day it's a misinterpretation of the data. It's named after Edward Simpson, who described it in the 1950s, though it was actually mentioned in the early 20th century by Yule. One interesting aspect of Simpson is that before that, during World War II, he worked about a one-hour train ride from where I'm sitting, at what's called Bletchley Park, where he, the father of computer science Alan Turing, and other statisticians managed to break the German Enigma encoder to help the Allies win the war. So if you're looking for something off the beaten track here in London, I highly recommend visiting that museum. All right, back to our case study. Everything above is the same as before: we have the two separate conclusions. What I'm adding here is the full data of the cohorts. Like I said, you have these slides, so you can sit on the numbers later if you can't see them right now, but basically they all add up. The end result, the treatment-minus-control calculation, is minus 14 percentage points for the total population; for the males it's a positive 10 percentage points, and for the females it's also a positive 10 percentage points. So this is the confusion that arises; this is Simpson's paradox.

Here I'm just visualizing it: we're computing treatment minus control, the population gives minus 14, and each cohort gives plus 10. So how do we resolve this? As you can see by now, visuals are key to understanding these problems and solving them. What I'm highlighting here in red are the actual population results: here you have control and here you have treatment, and you can see the difference I mentioned, minus 14 percentage points, on the horizontal axis. I should also point out the vertical axis: it's the number of participants, so you see 1,000 participants in each group. These red lines are actually weighted averages of the gray ones: here we have a group with a recovery rate of 80% and 800 participants, and this one at 40% with 200 participants, and so this red line is their weighted average, and the same thing, with the weights reversed, over here. So how do we resolve this? Well, we can look at the cohorts themselves. Here you can see the females, and here the males, and the 10 percentage points is what I'm displaying over here: both males and females show a beneficial effect. By the way, I want to give credit to Mark, who came up with this presentation design; in the comments I have a link to his blog post, where he first showed it. So what was the source of the misinterpretation?
Well, probably a lot of you have figured it out by now: we have an uneven distribution of the cohorts. Even though we have a thousand males and a thousand females overall, the males are only 200 of the control group and 800 of the treatment group, and vice versa for the females. What that means is that both the recovery variable and the group variable depend on gender, and that's what we call a confounder. The reason we have minus 14 here and plus 10 here and here is that the metric I've been calling treatment minus control does not account for this confounder, so it gives this weird, biased result. What should we trust? Well, we know that if something is beneficial both for males and for females, it's going to be beneficial for the population; that's just common sense. So how can we create a metric that's actually meaningful for the population? Before we get to that, let's define things formally. Instead of just calling it treatment minus control, this metric is called the risk difference, and here I'll formalize it a bit. The risk difference is the difference in the recovery rate between the treatment group and the control group. If you're wondering what this pipe means, it's "conditioned on": here we're focusing just on the treatment group, here just on the control group, and we're subtracting. So it's basically this 58% minus 72%, or 50 minus 40; that's all we're doing, and here I'm just quoting the results.

So this is a good step, and it works for the males and the females, but not for the population. For the population, what we want to do is take the same risk difference, but weight it by the population itself, and that's a way for us to control for that confounder: a weight adjustment. Here I'm just rolling it out: the risk difference of the males, 10%, times the males' 50% share of the population at large, that's the weight, plus the same for the females, and that gives me the average causal effect, or ACE for short, of 10%. So that's the solution, and here's a visual: this is where I showed the problem before, and here's the solution; we see it at 10%. Now, I'll be the first to admit that this is still confusing. It took me a few good weekends of sitting down, visualizing, and trying to make sense of it, and what I found very useful was to create an app, which is what I'll show you here. You actually have access to it, so you can go online and play around with it; a lot of thanks to Streamlit again, for their API and for the hosting. Here I'll just show you a quick run of the demo, which is interactive. Here I'm showing the data frame of this made-up data, here's the problem with Mark's visual showing the same minus 14%, and here I'm showing the solution: instead of the risk difference of the population we use the ACE, and in this visual you can see that it agrees with both the males and the females; the treatment is better than the control. But let's take a closer look.
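The arithmetic above can be sketched in a few lines of Python. The cohort counts below are an assumption reconstructed from the percentages quoted in the talk (800 treated males at 50% recovery, 200 treated females at 90%, and the mirror image in control); they are made-up numbers, just like the study itself.

```python
# Made-up cohort counts reconstructed from the talk's percentages.
cohorts = {
    # (group, gender): (patients, recovered)
    ("treatment", "male"):   (800, 400),   # 50% recover
    ("treatment", "female"): (200, 180),   # 90% recover
    ("control",   "male"):   (200,  80),   # 40% recover
    ("control",   "female"): (800, 640),   # 80% recover
}

def rate(group, gender=None):
    """Recovery rate conditioned on group (and optionally gender)."""
    rows = [v for (g, s), v in cohorts.items()
            if g == group and (gender is None or s == gender)]
    n = sum(count for count, _ in rows)
    return sum(rec for _, rec in rows) / n

# Naive population-level risk difference: misleading under confounding.
rd_population = rate("treatment") - rate("control")

# Per-gender risk differences, weighted by each gender's share of the
# whole population (1,000 of 2,000 each): the ACE.
ace = sum(
    (rate("treatment", s) - rate("control", s)) * 1000 / 2000
    for s in ("male", "female")
)

print(round(rd_population, 2))  # -0.14: the paradoxical "harmful" result
print(round(ace, 2))            # 0.1: the drug helps by 10 points
```

The same data produces both numbers; only the weighting differs, which is the whole point of the adjustment.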
Let's look at the interactive side of it. There are a few parameters that go into this, and the one of interest, the one that causes Simpson's paradox, is the fact that males and females didn't have the same distribution within the treatment group. Here it was at 20%; if I bring it higher, to say 35%, you can see that these cohort results are not impacted, but this one, the population risk difference, which doesn't take the confounding factor into account, gives us yet another meaningless result. And here you can see the heights are nearly equal. To really bring it home, we'll set this to 50%, which means an even split of males and females, and what that shows is that the risk difference of the population is exactly what you'd expect; this is the case of what we'll get to later as a randomized controlled trial. I think there are quite a few people on the app, so I did prepare another copy that I'll host locally, because I want to show one more thing, but feel free to play around with the public one. So here's another example I want to run. Notice, in solution mode, how the males are at 80% here and 70% here; this is another parameter I can play around with. So here's the 80% and the 70%; now let's make these even and see what happens. I bring this down to 70%, and you can see now this and this are even. And what do we see? For the males there's no difference, that's the fact that these bars are aligned, and for the females it's the same 10 percentage point difference as before. Here the population risk difference shows us something nonsensical, because it doesn't take the confounding factor into account. What does the ACE do?
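The slider in the demo can be sketched as a quick numerical sweep. The per-cohort recovery rates (males 50%/40%, females 90%/80% for treatment/control) are the talk's made-up study; `p` here is my shorthand for the fraction of the treatment arm that is male, with the control arm mirrored, which is an assumption about how the demo's parameter works.

```python
def population_rd(p):
    """Naive risk difference when a fraction p of the treatment arm is male
    (and, symmetrically, a fraction 1 - p of the control arm is male)."""
    rate_treatment = 0.5 * p + 0.9 * (1 - p)   # pooled treatment arm
    rate_control = 0.4 * (1 - p) + 0.8 * p     # pooled control arm
    return rate_treatment - rate_control

# The ACE never moves: it is the gender-weighted +10 points.
ace = 0.5 * (0.5 - 0.4) + 0.5 * (0.9 - 0.8)

for p in (0.8, 0.65, 0.5):
    print(p, round(population_rd(p), 2), round(ace, 2))
# At p = 0.8 the naive risk difference is -0.14 (the paradox);
# at p = 0.5, an even RCT-like split, it coincides with the ACE, +0.10.
```

The sweep makes the slide's point concrete: the naive metric swings with the assignment imbalance, while the adjusted one stays put.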
Well, the ACE does exactly what we'd expect: the average of zero and 10 percent, which gives us 5 percent. There are a lot of other examples that I found useful to play around with in this app, so feel free to use it to understand this complicated topic of Simpson's paradox. Again, it's really not intuitive; take it from Homer. What I suggest is: take your time, sit on the numbers, play around with the app; at the end of the day you'll figure it out. Visuals are very useful, and I love this one, which I pulled from Info We Trust; it introduces the continuous version of Simpson's paradox. Simpson's paradox as I described it involves discrete data: categorical recovery versus no recovery, genders, treatment versus control. Lord's paradox deals with the situation in which you have continuous data, for example medicine in milligrams, or a health score. What you see beautifully illustrated here is Simpson's, or Lord's, paradox: imagine that you didn't know the cohort of each point; each Simpsons character here is a cohort. If these were all gray dots, what you'd have is an apparent positive correlation, but once you get the information on the cohorts, boys, girls, men, and women, you actually see that there's a deleterious effect: it's going in the opposite direction. So this is a very useful visual for understanding Simpson's and Lord's paradoxes. That concludes the first part of this talk. We talked about the limitations of correlations, and then introduced Simpson's paradox, a situation in which the result for a population might be in total conflict with that of its cohorts. That's a concern, because you might misinterpret your data, reach wrong conclusions, and make wrong decisions. How do we resolve this?
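The continuous picture just described can be reproduced with a small synthetic simulation; every number below is invented purely to recreate the shape. Two cohorts each trend upward (slope about +0.5 within a group), yet the pooled slope across both is negative. Either way, the fix is the same as before: identify the cohort variable that is confounding the trend.

```python
import random

random.seed(0)

def simulate(n_per_group=200):
    """Two cohorts: group 1 is shifted right and down relative to group 0."""
    xs, ys, groups = [], [], []
    for g, (x0, y0) in enumerate([(0.0, 2.0), (3.0, 0.0)]):
        for _ in range(n_per_group):
            x = x0 + random.random()
            # Positive slope (+0.5) *within* each cohort, plus small noise.
            y = y0 + 0.5 * (x - x0) + random.gauss(0, 0.05)
            xs.append(x); ys.append(y); groups.append(g)
    return xs, ys, groups

def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

xs, ys, groups = simulate()
print(slope(xs, ys))  # negative: the pooled trend points "down"
g0x = [x for x, g in zip(xs, groups) if g == 0]
g0y = [y for y, g in zip(ys, groups) if g == 0]
print(slope(g0x, g0y))  # positive, near +0.5: the within-cohort trend
```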
Well, we appreciate the fact that we have a confounding factor that we have to control for. For those who are really interested in understanding causal inference, I suggest mastering this topic of Simpson's paradox; then you can get into deeper levels of understanding. For most of the next part of this talk I'll focus on this issue of confounding factors: how we control for them, when we should, when we shouldn't, and what tools we have to do so. But beforehand I have to address randomized controlled trials, because that is what most people understand to be the gold standard of causal inference: we run scientific trials in which we try our best to control for as many parameters as possible, apart from the treatment and the outcome parameters; we hold everything fixed, hope for the best, and then we can say, ah, we have causality. But there are a lot of limitations to this. First of all, later on I'll talk about the fact that we shouldn't control for everything; each control should actually be justified beforehand. But even regardless of that, there are many practical issues. For example, randomized controlled trials can be very expensive, and the logistics behind them, equalizing between genders, age demographics, and so on, can be a nightmare. Second, things might not be physically possible or ethical: if you want to test for smoking habits, you can't force people to smoke for 10 years. And a lot of the time you want to investigate populations in their natural habitat, not in something artificial. All of these are arguments for what can come in place of randomized controlled trials, and which is much more accessible: what's called observational data.
Basically, that's the data we already have in our own work environments, or just free online, that we can use. This causal revolution has been building tools for the past 30 years to do causality on this much more accessible data, where the main idea is understanding the story behind the data, because that's as important as the data itself. So, on the topic of controlling for parameters: I love listening to podcasts like Freakonomics, where you have experts and researchers talking about their topics of research, and a lot of the time you'll hear them talk about controlling for many factors, like age, gender, income, social status, and so on, just like a shopping list of parameters. The question is: is that right? Should we control for everything? The answer is that we should not do this blindly; rather, every time we control for a factor, it should be justified. How do we justify it? We have to use some sort of common sense, and this is where graphical models come in, entering our vocabulary and our analysis toolcase. They're a really powerful way for us to tell the story behind the data, so we can make better-educated judgments about what we should control for and what we shouldn't, and that will be the focus of the last part of the talk, in which I'll introduce graphical models, show how they help us visualize the story behind the data, and show how we use them to justify controlling for confounders. So first, let's start with the definition of a graphical model. Imagine that you have variables, and you graphically put them into nodes, or vertices; that's what these are. If I want to relate two parameters, I just draw an edge between them.
So, for example, C and A have some sort of correlation between them, whereas C and B are independent of each other: changing one does not impact the other. That's a graphical model. Then we have something called a directed graph, which takes the next step: we want to go beyond correlations to causation. So instead of a plain edge we add an arrow to it, which says which parameter is "listening" to which. What I mean by that, for example, is that changing A will cause some sort of change in D, but not vice versa. Here we happen to have a cyclical relationship between D and C: changing D will change C, and changing C will change D. In causal inference, most of the time what we deal with is actually something even simpler: a directed acyclic graph, or DAG, in which we make sure we don't have that sort of feedback. What that means in practice is that starting from any parameter, for example A, and following the arrows, you'll never get back to A; that's essentially what a DAG is. So why are graphical models useful? Well, they tell the story behind the data. For example, they let us describe how the data was collected: we can put that information in the graph, and, like I said before, we can say which parameter is listening to which. What can we actually do with that sort of information? First, we can design better experiments and be more cost-efficient; it's beyond the topic of this talk, but for example, you can figure out which parameters are worth collecting data on, and which ones you actually don't need in order to tell the story. And of course, the most important thing: with these graphs we can draw causal conclusions, not stay stuck with just correlations. So let's look at a graph model that will simplify our understanding of Simpson's paradox. I know it was a burden trying to understand it, and it's still confusing, but that's where graphs are great.
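To make the "no feedback" idea concrete, here's a minimal sketch in plain Python. The node names follow the slide's A, D, C example; the adjacency-dict representation and the depth-first cycle check are my own illustration, not anything from the talk.

```python
# "Who listens to whom": each node maps to the nodes its arrows point at.
edges = {
    "A": ["D"],   # changing A causes a change in D
    "D": ["C"],
    "C": [],      # C has no outgoing arrows
}

def is_dag(edges):
    """Depth-first search; an arrow back to a node on the current path = cycle."""
    WHITE, GREY, BLACK = 0, 1, 2       # unvisited / on current path / done
    color = {v: WHITE for v in edges}
    def visit(v):
        color[v] = GREY
        for w in edges.get(v, []):
            if color[w] == GREY:       # found a feedback loop
                return False
            if color[w] == WHITE and not visit(w):
                return False
        color[v] = BLACK
        return True
    return all(visit(v) for v in edges if color[v] == WHITE)

print(is_dag(edges))   # True: following arrows from A never returns to A
edges["C"] = ["D"]     # add the D <-> C feedback from the slide
print(is_dag(edges))   # False: D -> C -> D is a cycle, so no longer a DAG
```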
They help us understand what's actually going on and here's the story that Here's exactly the story that we were telling before if you remember. So gender is independent. Nothing's impacting it But the group Was a function. It wasn't independent. It depended on the gender, right? We had that uneven split And that's what we're learning from this arrow and recovery rates depended on both So this is presenting the simpson's paradox problem in order to solve for it. What do we do? Well, we controlled for The confounding factor the gender and so effectively what we did is we did what's called graphical surgery and we took Off this arrow. So now both group and gender are independent Where recovery rates still depends on both. So what does this graph represent? Well, it either represents what we're actually doing in a random control trial where we Control for the gender distribution, but even with observational data, right? Without without controlling beforehand, we can still do the waiting adjustment I mentioned before what we call the ace metric. So it represents both basically So I have to address the topic of subjectivity Because probably in the mind of many people listening the fact that I'm applying my opinion on the relationships Between the parameters a lot a lot of people think that researchers statisticians And anything else with science Are are really objective and because that's what we see read textbooks But in everyday research what we really what we really know is that there have to be subjective decisions made along the way And the challenge is really to make sound judgments that reflect reality I'll just give you a few examples Any developer knows that if you have this distribution And you want to quantify that somehow. Well, if it's an if it's a nice belt Curve shape, then it's pretty simple. You have the mean and you have the variance, but if it's kind of skewed Well, what's more meaningful? Is it maybe the median or if there's like a few modes to it? 
Do I choose the highest mode? That's a decision. Another decision we often make is the binning of histograms, or how we present numbers: it can be on a linear scale, or if the data spans a few orders of magnitude, we might use a log scale. All of these are subjective decisions that we make in our daily lives as analysts and decision makers, and the challenge, again, is to make sound decisions. So yes, graphs are based on subjectivity. But, beyond the scope of this talk, there are methods by which we can use the data to challenge our opinion about different regions within a graph. I'm just showing three nodes over here, but imagine a graph with many nodes; we can actually challenge our notions with the data. Again, that's a subject for whoever is really interested in taking their studies further. I should also mention that causality isn't always possible. There's a list of assumptions that are required to hold; for completeness I'm showing them here, and each of these deserves a talk of its own. The purpose of this section, though, is talking about confounders. This is what's called ignorability: the fact that all confounders have been accounted for. And so that's this last part over here.
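The weighting adjustment mentioned a moment ago for Simpson's paradox can be sketched in a few lines. This is my own minimal illustration: the recovery rates and patient counts below are invented numbers, not the ones from the talk, and the formula is the standard stratified (back-door) adjustment, weighting each gender stratum's effect by the overall gender distribution:

```python
# Hypothetical stratified results (made-up numbers for illustration):
# observed recovery rate per (group, gender) stratum.
recovery = {
    ("treatment", "F"): 0.93, ("control", "F"): 0.87,
    ("treatment", "M"): 0.73, ("control", "M"): 0.69,
}
n_gender = {"F": 400, "M": 600}  # overall gender distribution in the study
p_gender = {g: n / sum(n_gender.values()) for g, n in n_gender.items()}

# Adjustment: ATE = sum over genders z of
#   P(z) * [P(recovery | treatment, z) - P(recovery | control, z)]
ate = sum(
    p_gender[g] * (recovery[("treatment", g)] - recovery[("control", g)])
    for g in p_gender
)
print(f"Adjusted treatment effect: {ate:+.3f}")  # +0.048 with these numbers
```

Because each stratum is compared within itself before averaging, the uneven gender split between the groups can no longer flip the conclusion.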
Let's look at good and bad practices of controlling for parameters, and how we use graphs in order to justify them. This slide is the same as before, but just to emphasize that graph models are actually our vocabulary for indicating which parameters we want to control for and which we shouldn't. Now I'll talk about this in practice, when we should and when we shouldn't, based on these types of graph flows over here. Here we have a relationship between x and y in which there is a fork: they have a common-cause variable called z. As opposed to that, in this inverted fork there's a collider called c. You can see the relationships are totally different, and what I'll show you in the next few slides is that in the case of forks we actually want to control for the common factor in order to resolve spurious correlations, but in inverted forks we do not want to control for the collider, because doing so will generate spurious correlations. All right, so let's go back to this example of drownings and ice cream. We know there's a common cause, the weather: whether the sun is out or not. So what happens if we don't control for the common cause? We get something like this: a spurious correlation. Say we know nothing about the weather in these data points; our eye, and our regression algorithms, will point out: hey, here is a correlation. And of course it is spurious. Why? Because this common cause is a confounding factor, and we should control for it. Just before we control for it, to formalize this a bit: we say that x and y are dependent when we do not condition on z. So even though these don't impact each other, the fact that we did not control for the weather means that, mathematically, they are dependent. So how do we resolve this?
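A minimal simulation of this fork (my own sketch; the numbers for "sales" and "drownings" are invented) shows the spurious marginal correlation, and previews how conditioning on the weather makes it vanish:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Fork: weather z is a common cause of ice-cream sales x and drownings y.
z = rng.integers(0, 2, size=n)               # 1 = sunny, 0 = chilly
x = 50 + 30 * z + rng.normal(0, 5, size=n)   # ice-cream sales, driven by weather
y = 2 + 4 * z + rng.normal(0, 1, size=n)     # drownings, also driven by weather

corr_all = np.corrcoef(x, y)[0, 1]
corr_sunny = np.corrcoef(x[z == 1], y[z == 1])[0, 1]
corr_chilly = np.corrcoef(x[z == 0], y[z == 0])[0, 1]

print(f"marginal corr(x, y)     = {corr_all:.2f}")    # strongly positive (spurious)
print(f"corr within sunny days  = {corr_sunny:.2f}")  # near zero
print(f"corr within chilly days = {corr_chilly:.2f}") # near zero
```

Nothing in the simulation lets x influence y or vice versa; the strong marginal correlation is entirely induced by the shared cause z.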
Well, here this shaded color means that we control for it, and we can clearly see that if we analyze, for example, the sunny days, then there's no real correlation between the two, and the same for the chilly days. So by controlling for the common cause, we resolve the spurious correlation. That's an argument, a justification, for when we want to do that. And just to formalize this again: we say that x is independent from y when we condition on the common cause z. That's quite intuitive; the graph makes a lot of sense.

Now I'll show you something that's a bit more challenging. This is an inverted fork, and I'm going to use switches on a wall and a light bulb as an example. Here you can see that the bulb on the wall, b, depends on both switch one and switch two, and it's called a collider. The switches themselves are independent: whether s1 is up or down, you don't know what's happening with s2. I haven't said what the impact on b is, and it could be anything. It could be an OR relationship, for example, where the bulb turns on as long as one of the switches is down; or an AND relationship, where it takes both of them to be down for it to go on. I'll actually talk about another logic, called XOR, in which the bulb is on if the switches are in opposite positions. The conclusions are relevant for any of these resulting logics, but XOR is useful to demonstrate with. When I'm not controlling for the bulb itself, the switches are independent, as we started off. But the interesting thing is that if we control, for example, for when the bulb is off, you can see that by knowing s1,
you know what s2 is: they are correlated, both either up or down. And if we control for the bulb being on, they're actually anti-correlated: one's up, the other's down. So you see, by controlling for this collider, we're actually introducing spurious correlations that weren't there before. There's a brief summary here, if you want to go over the slides later, in which I present all the cases, but I think this example is enough to understand the situation. So again, why am I talking about this? Because of the importance of spurious correlations.

(Okay, thanks for the time mention, Martin, I see it. So I have until ten minutes past two, correct.)

So here I'm just summarizing the fact that we wanted to justify when we can control for parameters and when we can't. You can see, for example, that in forks we justified it: we want to resolve these spurious correlations. But in inverted forks, we do not want to control for these colliders most of the time, because we'll just be introducing spurious correlations. That's the main takeaway for most of the audience. For those who are really interested in causal inference and taking it to the next step: a big part of causal inference is identifying these colliders, these common causes, or, if you remember, the mediators I mentioned earlier, and deciding when we should control for them and when we shouldn't. Just for completeness, I mentioned two types of flows within DAGs, forks and inverted forks, and there's another one, which is just called chains. Here I'm providing a cheat sheet in which you can learn about the dependencies when you are controlling for parameter a in each case. We're nearly at the end.
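The XOR bulb example above can be simulated directly (my own sketch, not code from the talk): the switches are independent until we condition on the collider, at which point knowing one switch determines the other:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

# Inverted fork (collider): two independent switches and an XOR bulb.
s1 = rng.integers(0, 2, size=n)
s2 = rng.integers(0, 2, size=n)
bulb = s1 ^ s2  # on exactly when the switches are in opposite positions

# Unconditionally, the switches carry no information about each other.
c_all = np.corrcoef(s1, s2)[0, 1]

# Conditioning on the collider manufactures (anti-)correlation.
c_off = np.corrcoef(s1[bulb == 0], s2[bulb == 0])[0, 1]
c_on = np.corrcoef(s1[bulb == 1], s2[bulb == 1])[0, 1]

print(f"no conditioning:  corr = {c_all:+.2f}")  # ~ 0
print(f"bulb off:         corr = {c_off:+.2f}")  # +1: switches must match
print(f"bulb on:          corr = {c_on:+.2f}")   # -1: switches must differ
```

This is the spurious dependence described in the talk: nothing causal links the switches, yet selecting on the bulb's state makes them perfectly (anti-)correlated.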
So here I'd just like to summarize what we've learned so far and, for those interested, how you can move forward in learning the topic. I made the argument at the beginning that we start at this bottom rung, seeing, which is correlations; but seeing may be deceiving, and just by using correlations we might misinterpret results. I demonstrated that with Simpson's paradox, and I also talked about spurious correlations. With observational data, we can actually resolve this through the action called doing. As humans, we learn by doing things, ever since we're babies, and our mathematical equivalent of doing is controlling for the confounders, right? We can do randomized controlled trials, for example, or we have this ATE metric. And with that, we can learn causality on populations; we learned, for example, about drug tests across genders. For those interested, there's a whole topic called do-calculus, which gets into the fine details of this. Very much beyond this talk is the next rung here, which is imagining. What imagining means is that we can do what's called hypothetical, what-if conditioning, and the beautiful thing about this is that we'll be able to understand the impact of interventions not on a group, but on an individual. For example, we'll be able to answer questions like: how much would one's salary increase with another two years of college? Or: will taking this medicine cure my headache, given past results? So you can really do it at an individual level, as opposed to large swaths of populations. This topic is called counterfactuals. Here I'll be a bit cheeky and take William Edwards Deming.
He's a statistician, and I'm just going to paraphrase his quote: it's not enough to bring data; you have to actually bring the story behind it. As a data scientist, I have many examples where I got data I didn't really trust until I understood the story behind how it was collected, and then things actually made sense to me. The mantra of this talk was that the story behind the data is as important as the data itself. The reason is that it's crucial to understand how the data was collected and to understand the causal relationships between the parameters; only that way can we actually do causal inference. The way I demonstrated the importance of the story behind the data is that we talked about how data might be misinterpreted, whether through Simpson's paradox or through spurious correlations. I introduced an awesome tool to add to your toolbox, called graph models, with which you can visualize the story behind the data; that enables us to see the causal relationships between the parameters, and we can justify actions like controlling for variables. For those interested in taking this further, I suggest mastering Simpson's paradox; again, I recommend using this calculator, and any feedback you might have on it would be great. Also, there are the topics of do-calculus and counterfactuals, with which you can really assess causal impact on cohorts or individuals. For those interested, I refer you to Alon Nir's talk; he's an excellent speaker, and tomorrow he'll give you a few different angles on causal inference. I'll end here with my resource pages, things that I found useful on my learning curve, in which I highlight two books by Judea Pearl. One is a popular science book, The Book of Why, which really gives you good intuition for the challenges and how to solve them. And for those more mathematically minded,
I do suggest this textbook, which has only about four chapters in it, and you can really learn a lot from it. So with that, I'm glad to take questions at this stage.

Thank you very much, wonderful talk. You've got some comments like "this is a brilliant talk, loving it, by the way", so that's always great. There are some questions, and we do have a little bit of time, five minutes maximum. We are already into the coffee break, but I think that's fine. So: would you recommend repositories or catalogs of curated graph models?

Repositories, catalogs of curated graph models. I'm not sure exactly what that means, but I can give my opinion about repositories. Here I suggested two repositories. I'm not that familiar with them, and I suggest anybody who uses them tries to understand what happens behind them. I know the temptation of wanting to use a package as a black box; a lot of us kind of do it sometimes, with scikit-learn and things like that. But you truly want to understand, for example if you use a random forest, the intuition of what the hyperparameters actually mean. In order to build graphs, I highly suggest this, and excuse me for the blasphemy: there's no solution for this in Python yet, so this is in R, in which you can actually create your own graphs. You can change parameters, and once you learn causal inference you'll understand that all of this is what I was talking about, like independence between a and b, or a and d being independent conditioned on controlling for e. That's what I was talking about. They have a whole lot of examples; for example, this is actual research, where someone is interested in the relationship between tooth loss and mortality. There are a lot of amazing things online. So I have a resource page here
that you'll have access to, and I will also add a repository in the future: Alon Nir and I are building a repository in order to provide the sorts of tools and places where you can further learn the topic.

Okay, next question: besides medicine, can you cite examples where causal inference has allowed us to establish causal relationships or interpret data correctly? The question continues: are there such examples in economics, for example proving that raising a certain tax has a certain effect on unemployment?

Sure, that's an excellent question. I can tell you from my personal experience that causal inference helped me structure my everyday work. My current hobby, when learning a system, is quickly creating one of these graph models, for example in the DAGitty app, and understanding the relationships between the parameters; then I can convince myself that what I'm seeing is actual causality, or vice versa. At work we have conversations about how to avoid Simpson's paradox in certain decisions that our stakeholders might be making. So from my own experience, I can definitely say that. When it comes to impact on the general public, I'm not the best reference, but there are actually a lot of examples within The Book of Why, so there you can really learn about them. He has a whole chapter dedicated to the resolution of the issue of the impact of smoking on lung cancer. So that is a huge triumph.
Oh, you said besides health and economics. I don't remember off the top of my head, but I would refer you to The Book of Why; that will give you examples. And sorry about the dog. (No, no worries.) I also want to add that a lot of the time causality is actually hard: you sometimes have limitations on parameters that you feel you need but that are just very hard to obtain, and of course there's always the topic of subjectivity. So put all that together; hopefully within your work you'll be able to implement causal inference for the immediate things you try to account for.

I think we have time for one last question. It's about common sense: why can't ice cream sales cause drownings, and sunshine? Because we use some kind of common sense to exclude that. And, not a question: in real problems, different people each have different common sense.

Sure, but remember, that's what steers us, right? We make decisions based on our interpretation of the world; that's me getting a bit philosophical. I can address the ice cream sales example: well, just do an experiment. Let's ask the question: how much money would you be willing to put in? Say you're running an ice cream company and you want to really test whether, if you sell more ice cream, you will see more people drowning at the beach. I don't think any right-minded person would do that experiment, and I'm pretty sure that if they did, they'd fail. Why? Because you do want to consult with domain experts, and we are all domain experts on weather, on the impact of weather. We know that more people go to the beach when it's sunny outside, right? We see that; we go to the beach.
We see it ourselves, or we read it in the newspapers. So that's why, at the end of the day, any developer knows that you have to put your common intuition into things. We can't think that everything has to be uber-objective; no, we do have to apply domain knowledge to our mechanisms. Those familiar with Bayesian versus frequentist inference know that the Bayesian approach tends to give more reasonable results when dealing with smaller sample sizes, because you can introduce your beliefs, where of course the disclaimer is that you still have to be cautious about that. So, yeah, I think I kind of digressed here.

Okay, thank you very much again, wonderful talk, great Q&A session. So we have five minutes' time for a coffee break, and then the next talk will be the keynote.