Okay, shall we start? Great. I'm extremely honored to be invited to talk at this meeting. Looking at the program, I see that many of you are well ahead of me, so I'll try to give a philosophical perspective on where data science is going and where causal data science fits in. That will be the focus of my talk, so bear with me through the philosophy; I'll get to the nitty-gritty algorithms later, but I'll start with the question of what data science is all about. Next slide, please. My announced topic is the story of two revolutions. One is the big data revolution we all know from the media, with successes such as self-driving cars and natural language processing. It has limitations that we have seen in the literature, and people who are heavily involved in that revolution are often not aware of them. The other is the causal revolution: the transition from data fitting to data understanding, which I will call the scientific method, and I'll give you an overview of what I mean by that. Then I'll go to the ladder of causation and the two fundamental laws of causal reasoning, which many of you are familiar with, and see how they fit into the seven tools of causal thinking that allow us to do marvelous things we couldn't do before. Then, if I have time, and I hope I will, I'll cover some future directions; these may not be your future directions, but they are things I find very exciting from my viewpoint. Next slide, please, bullet by bullet. The data-centric paradigm, the one that governs today's big data industry, has the following credo: all wisdom comes from the data.
Therefore, our job is to fit the data as best we can, maximizing some measure of goodness of fit over that same data, and that is what constitutes success: the answer we really want is implied by the function that fits the data. The next paradigm, which I call the scientific paradigm, is totally different. The wisdom does not come from the data; the wisdom comes from a model of the world. The scientific paradigm asks: what should the world be like before I can answer a given research question? Not a question about the data, but about the world. And we are done when our model of the world satisfies those conditions, because we asked the question about the world in the first place. Now let me move from this philosophical discussion of paradigms to something we are more familiar with. Next slide: the inference engine. Here the scientific paradigm is embedded in an inference engine. We have a query about the world, the thing I want to estimate. I don't have the world itself; I have a model of the world, a facet of reality. In fact, I don't even have a full model; I have a partial model, often in the form of a graph, which is my perception of reality. These two are inputs to my inference engine: the query, the thing I want to estimate, and my partial model of the world. There is a third input: data, qualified by how the data was collected. It could have been collected experimentally or observationally, so the data comes with the identity of its source and its quality. So you have three inputs, one, two, and three, into that engine, and the engine is supposed to give you a solution.
The solution is supposed to be an estimand, something you can estimate from samples taken from the data. So this will be the focus of my first discussion: what is that calculus, and how do we know the calculus gives us what we really want? We are dealing with inputs that are qualitatively different from each other: data, a query phrased in some language, and a model of reality. These are different mathematical objects, and we have to marry them all and come out with an estimate of the query. We'll say the problem is solvable if my model permits the calculus to express the query in terms of the available data. We have many things undefined here, and I'm going to define them one by one. First of all, what is the world? What is a model of reality? What kinds of queries and what kinds of data are we talking about? And specifically, what qualifies the calculus to give me a valid result? Let me go back to something we are all familiar with: the origins of logic. Next slide, please. The origin of logic is Athens. This is the Agora as it stands today, and it was the cradle of two great developments, democracy and logic. It is not a coincidence that the two evolved together at roughly the same period, because democracy led to something that was very disturbing to Aristotle and his generation, even before him. Next slide, please. It was constant, useless argument, day and night, about who was best qualified to rule the city. People argued and argued indefinitely. At that very time, people sought something objective that would not be subject to argumentation, so that they could find comfort in the method or in the conclusion. And at roughly the same time, Pythagoras developed his axiomatic version of geometry. So geometry developed, on one side, to give a sense of objectivity to the arguments that made people tired and feel useless.
So Aristotle, in the next slide, asked some questions. What distinguishes philosophy, which literally means the love of wisdom, from demagoguery, from merely leading people? Obviously, people loved the former and hated the latter, and logic should make the distinction. What distinguishes truth from rhetoric? And here he discovered something amazing: the truth, in many cases, comes out of form and not out of content. For instance, take the statement: all Greeks are human, all humans are mortal, therefore all Greeks are mortal. It is true not only for Greeks, humans, and mortality. In the next slide, you can replace Greek and human by any propositions, any qualities A, B, and C, and the sentence still holds true, which means the truth of the sentence comes from its form, from the meaning of "all" and "are," and not from the meaning of the propositions. Then he asked: what is it about the form that gives us this sense of validity? Next slide, please. It took years to arrive at this condition: the inference from premise one and premise two to a conclusion is valid whenever the conclusion holds in all worlds in which the premises do. We still have to discuss what worlds are, but the premises are clear: they are the truth values of the propositions that appear in them. Now, in the next slide, a world is a truth-value assignment to all propositions of interest. For instance, in propositional logic you have a truth table: every combination of values for the propositions is assigned a one or a zero. If you are dealing with probability theory, it is a probability table over every combination of values.
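The validity-by-form idea above can be checked mechanically: enumerate every truth-value assignment (every "world") and confirm the conclusion holds in each world where both premises hold. A minimal sketch for the Greeks syllogism, treating "all P are Q" as material implication over a single world:

```python
from itertools import product

def implies(p, q):
    # Material implication: "if p then q" in one world.
    return (not p) or q

# Enumerate all 2^3 truth-value assignments ("worlds") to A, B, C and
# check that the conclusion (A -> C) holds in every world in which
# both premises (A -> B and B -> C) hold.
valid = all(
    implies(a, c)
    for a, b, c in product([False, True], repeat=3)
    if implies(a, b) and implies(b, c)
)
print(valid)  # True: the form is valid regardless of content
```

The check never mentions Greeks, humans, or mortality, which is exactly the point: validity comes from the form alone.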
Each combination gets a number between zero and one, and the numbers sum to one, but together they give you a truth-value assignment for all propositions of interest. So that is the meaning of a world. Now we have to find what a world means in the context of causal reasoning. There, worlds are no longer propositions, no longer probabilities, but something different. Next slide. Here we have a diagram that simply displays the previous slide nicely. Here is the meaning of: from premise one and premise two, I can go to the conclusion if the conclusion holds in all worlds in which the premises do. You have this Venn diagram, and the dots are worlds. They are the oracles that assign a truth value to every query. You can see that in this case the conclusion indeed holds: it holds in the intersection of the two premises. Now we have to ask how this translates into a definition of causal logic. What is causal inference? It is the logic and the tools for answering causal questions. We still don't know what a causal question is, right? So let's go to the next bullets. What are the causal questions? What is a valid inference in that logic? What is a world? And the answer: a world is an oracle that assigns a truth value to all causal questions of interest, not to all propositions. It is a mathematical object capable of answering all causal questions of interest. Next slide. Here are typical causal questions. How effective is a given treatment in preventing a disease? Was it the new tax law that caused our sales to go up, or was it our marketing campaign? Very important if you are a sales manager. What are the annual health-care costs attributable to obesity? These are examples of typical causal questions that have been in the news at one time or another, and there is no disagreement that they constitute causal questions.
And I'm not talking about how to solve or answer them; I just want to investigate their anatomy. Can hiring records prove an employer guilty of sex discrimination? Extremely important in the age of algorithmic fairness. And the fifth one is a personal decision: I am about to quit my job; will I regret it? What is special about the questions highlighted in yellow? They are inarticulable in the standard grammar of science. I call them causal, and the one thing unique about them is that you cannot express any of these questions in the standard grammar of science. Why? Next bullet. Here is why: because the grammar of science is wedded to the equality sign, and the equality sign is symmetric. If y = ax, then x = y/a. If Newton's law says f = ma, then m = f/a. However, all these questions are asymmetric. For instance, if y stands for the barometer reading and x stands for the atmospheric pressure, then you and I agree that x causes y and not that y causes x. Even if I go and intervene on the barometer and move the needle left or right, I will not change tomorrow's weather. So we have a conflict between our conception of causality as an asymmetric relationship and the grammar of science, which has not been generous to us: it has given us a grammar tied to symmetry. How can we get out of this constraint? In computer science we have the assignment operator, which is different from equality. Assignment means that mother nature looks at one variable and accordingly assigns a value to another variable. In computer science, you look at the content of one register and assign it to another register; this is sometimes designated by an arrow, sometimes by a colon followed by an equality sign, the assignment operator. So we are going to interpret science, or causal thinking, in terms of an assignment operator.
Nature looks at the atmospheric pressure, does some computation, and accordingly assigns a value to the deflection of the barometer needle, not the other way around. Everything I am going to say has to do with one thing: replacing the equality sign of science with an assignment operator that governs our causal thinking. And humans are causal-thinking animals. I can give you many examples showing that humans are not too good at ordinary arithmetic, or at ordinary correlations, or at ordinary probability theory, but they are good at processing causal relationships. We are restless until we get causal explanations for things. That is because we are causal animals: we have this assignment operator hardwired in our brains, not the equality sign. Much of the effort of students in school goes into translating that arrow into an equality, not the other way around. Today, as scientists, we have to undergo a greater revolution to reverse the process. We start by learning arithmetic in school, which is ruled by equality; we go to college, become scientists, and remain bound to the equality sign; and then it takes a revolution to get us back to the way we actually think, which is the assignment operator. Okay, enough philosophizing; let's go back to the question of what kinds of questions the oracle must answer. Here I am essentially repeating the types of questions I showed before, about discrimination and so on, and looking at their types. Some are observational: what if I see? Others: what if I take an action, what if I do? A third kind are counterfactual questions: what if we had done things differently than we actually did? That is a kind of action, too. You can attach probabilities to these questions, but that is optional. And they have a syntactic signature.
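The asymmetry of the assignment operator can be seen in a few lines of code. This is a sketch only; the linear mechanism below is a made-up illustration, not a real barometer model:

```python
# The equation p = 10*b (equality) is symmetric, but nature's mechanism
# b := f(p) (assignment) is not: pressure sets the barometer reading,
# never the reverse.

def barometer_reading(pressure):
    # Hypothetical mechanism: nature assigns the reading from pressure.
    return 0.1 * pressure

pressure = 1013.0
reading = barometer_reading(pressure)   # b := f(p)

# "Intervening" on the barometer overwrites the reading...
reading = 50.0
# ...but the cause is untouched: moving the needle changes no weather.
print(pressure)  # 1013.0
```

Solving the equality for either variable is always legal algebra; overwriting the left-hand side of the assignment never propagates back to its inputs, which is exactly the asymmetry the talk is after.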
Next: "what is," "what if," "why," and they form a causal hierarchy, which I'm going to show on the next slide in the form of a ladder. They also have a syntactic signature. Here is ordinary probability, the probability of Y given A, which means: what if I see A. The second one we designate as the probability of Y given that I do A. And the third, the counterfactual probability of Y being what it would have been had A been A', given that in reality I saw that A equals A. That is why we call it counterfactual: there is a conflict, a logical contradiction, between the subscript and the event I condition on. So we have a syntactic distinction and we have a causal hierarchy. And what we were able to prove in the past two decades is that this is a mathematical hierarchy: you cannot answer a question at level i unless you have information at level i or higher. Next slide. Here is the hierarchy, which I call the ladder of causation in The Book of Why. I won't repeat the examples: observing, "what if I see"; intervening, "if I take an aspirin, will my headache be cured"; and counterfactual, "what if I had done things differently?" Was it the aspirin that stopped my headache? What does that mean? It means: I did take the aspirin and my headache is gone; would the headache still be gone had I not taken the aspirin? Am I the type of person on whom the aspirin worked? Here comes the conflict, and this is the level I call counterfactual. We can prove today that you cannot answer a question at level i unless you have information at level i or higher. This is not a conjecture; it is a mathematical theorem.
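The three syntactic signatures just described, one per rung of the ladder, can be written side by side:

```latex
% Ladder of causation: one expression per rung
\begin{align*}
  \text{Association (seeing):}\quad      & P(y \mid a) \\
  \text{Intervention (doing):}\quad      & P(y \mid \mathrm{do}(a)) \\
  \text{Counterfactual (imagining):}\quad & P(y_{a'} \mid a)
\end{align*}
```

The subscript in the third expression is what makes it counterfactual: it refers to the world in which $A = a'$, while the conditioning side records that in the actual world $A = a$.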
So when people come to me from machine learning and deep learning and say "we can do it, we can do it," I do not know how to answer them except to say that it is a mathematical impossibility to get an answer at any level unless you have information at that level or higher. Information can come either in the form of assumptions or in the form of experiments. If we can do experiments, we can answer interventional questions. But no experiment gives you counterfactuals, so at that level we must rely on counterfactual-type assumptions, and we'll see what form they take, what permits me to assign a truth value to a counterfactual assertion. Next slide, please. Now I'm going to talk about the oracle, because I promised you an oracle that assigns a truth value to every causal query, including counterfactuals. Here is the oracle: a collection of functions. Every variable in the world is connected by some function to other variables, and there is no constraint on those functions. This is our picture of the world. We are never going to have those functions, but it is our conception that the world is being pulled by strings, by springs of this nature. The mere assumption that the world is ruled by a collection of such mechanisms gives us the ability to answer questions even when I don't have the full world, when I have only partial knowledge of it. But we have to start with that hypothesis. Okay, here is an example, and just to be clear, this does not cover only toy examples with sprinklers and rain; it covers important economic questions such as supply and demand. It is one of the pages in my book, and it has a cycle, so there really is no restriction on those functions. Next slide, please. And once you have those functions, you can assign a truth value to a query about an action.
For instance, would the pavement be wet if we turned the sprinkler on? The way we answer is simply a cutting operation. Next slide. We simulate the action; there is nothing magic about the knife. Previously my sprinkler was connected to an automatic controller sensing the climate. Now I remove that connection: I emancipate the sprinkler from its controller, which was tied to the climate, and subject it to a new master, my muscles. I let my muscles control the sprinkler and assign it the value one. It is a simple simulation of what we mean by action. Now I have an answer to my question: I check whether W equals one in the mutilated model, in which the arrows from the parents of the manipulated variable are cut off and the variable is replaced by a constant, S = 1. Next. We can also answer counterfactuals from the model. For instance, would the pavement be wet had it rained? Now I cannot talk about my muscles, because I do not have control over the rain, but I can imagine, and that is the beauty of the theory, that I control the rain in the same fashion. You hypothesize that rain is no longer connected to the climate but is assigned the value one by some magic. In the same way we can answer the counterfactual: what if the rain were on? What we are doing is asking for the value of the function determining W once we substitute the value of the rain to be one. And every counterfactual, even the craziest, wildest counterfactual question you can dream of, is assigned a value in the model M. Next slide. This is what I call the law of counterfactuals, the first law of causal inference. I have an oracle.
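The cutting operation just described can be sketched in a few lines. The Boolean mechanisms and the exogenous variables below are illustrative assumptions, not taken from the slide:

```python
# Minimal structural causal model for the sprinkler story. Each
# endogenous variable is an assignment from its parents; do() replaces
# a variable's mechanism with a constant -- the "cutting" operation.

def solve(u_climate, u_ctrl, do=None):
    do = do or {}
    v = {}
    v["C"] = do.get("C", u_climate)               # climate: is it rainy?
    v["S"] = do.get("S", u_ctrl and not v["C"])   # controller waters only when dry
    v["R"] = do.get("R", v["C"])                  # rain follows climate
    v["W"] = do.get("W", v["S"] or v["R"])        # wet if sprinkler or rain
    return v

u = dict(u_climate=False, u_ctrl=False)  # a dry day, controller off

print(solve(**u)["W"])                   # False: nothing wets the pavement
print(solve(**u, do={"S": True})["W"])   # True: the action do(S = 1)
print(solve(**u, do={"R": True})["W"])   # True: imagined rain, same background u
```

Note that the same background `u` is held fixed across all three queries; only the mechanism of the manipulated variable changes, which is exactly the mutilation the talk describes.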
I call it an SCM, a Structural Causal Model. It is an oracle for counterfactuals. Here is an example: a collection of functions. I ask what the value of W would be, W sub x, had X been x. I simply disconnect X from its parents: the function that previously connected X to its parents is replaced by the constant x. The solution of the resulting set of equations gives me the value of the counterfactual, the subscripted entity. Good. Next slide. It is embarrassingly simple. The sentence "Y would be y in situation u had X been x," denoted that way, means that the solution for Y in the mutilated model M sub x with input u equals y. And continue, please. I call it the fundamental equation of counterfactuals. Here it is in a frame. Why a frame? Because it assumes the status of a law. Next slide, please. And now we can see how the oracle comes in. We have an oracle; we know what M is, and we know what a partial model is: simply an aspect of the model. And we can continue and ask about the nature of the calculus. A valid inference in that calculus is defined as before, but instead of premises we now have two inputs, the assumptions and the data. The query will be valid, will be true, if every oracle that satisfies both the assumptions and the data also satisfies the query. So now we have answered the question of what replaces the idea of a world in ordinary logic when we come to causal logic: the query must hold in all oracles in which the assumptions and the data do. That gives us an answer to what constitutes a valid inference. Next. Now I'm going to summarize this as two laws of causal inference. The first is the one we just discussed, the law of counterfactuals.
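The framed equation just described, in the notation used above, is:

```latex
% First law of causal inference (the fundamental equation of
% counterfactuals): the counterfactual Y_x(u) equals the solution
% for Y in the mutilated submodel M_x, in which the equation for X
% is replaced by the constant assignment X = x.
\[
  Y_x(u) \;=\; Y_{M_x}(u)
\]
```

Everything else in the lecture, actions, counterfactuals, attribution, follows from this one definition applied to the model M.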
The second really derives from the first, but it is good to treat it as a law in its own right: the law of conditional independence. I think most of you have learned graphical models; this is really nothing but d-separation, but it is worth treating as another law side by side with the first. It is derivable from the first one, but for working purposes it is good to regard it as independent. Good. d-separation tells you that whenever you have separation in the model, it implies independence in the distribution that governs the data. Continue. Let me demonstrate d-separation here, back in our sprinkler example. If the U's, the exogenous variables, are independent, then the distribution of the observed variables must satisfy certain constraints, and these constraints are independent of the functions and independent of the probabilities of the U's. As long as the U's are independent, you can have zillions of micro-processes connecting, for instance, the sprinkler to the wetness or the climate to the rain; it doesn't matter. You still get these constraints on the data. Can I have the bottom of the slide? Okay, go ahead. Every missing arrow in the graph advertises a conditional independence, conditional on a separating set. For instance, take this missing arrow here between climate and wetness; the separating set is S and R. Without even checking the data, I know to expect a conditional independence in the data: C will be independent of W given the separating set S and R. Similarly, next slide, if I have a missing arrow between S and R, I can be sure that if the model is correct, then S and R will be independent given C, and given C only, not W as well, because W is a collider, which I'm not going to discuss here. It can be studied in three minutes, but the applications are enormous.
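The "missing arrow advertises an independence" reading can be checked purely graphically. Below is a compact sketch of a d-separation test using the standard moralized-ancestral-graph method (keep only ancestors of the variables involved, marry co-parents, drop arrow directions, delete the conditioning set, then test connectivity), applied to the sprinkler graph:

```python
# d-separation via the moralized ancestral graph.

def d_separated(edges, x, y, z):
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
    # Ancestral closure of {x, y} union z.
    anc = {x, y} | set(z)
    changed = True
    while changed:
        changed = False
        for n in list(anc):
            new = parents.get(n, set()) - anc
            if new:
                anc |= new
                changed = True
    # Undirected, moralized graph on the ancestors, with z removed.
    adj = {n: set() for n in anc if n not in z}
    for p, c in edges:
        if p in adj and c in adj:
            adj[p].add(c); adj[c].add(p)
    for c in anc:                       # marry co-parents ("moralize")
        ps = [p for p in parents.get(c, ()) if p in adj]
        for i, p1 in enumerate(ps):
            for p2 in ps[i + 1:]:
                adj[p1].add(p2); adj[p2].add(p1)
    # x and y are d-separated iff no undirected path remains.
    seen, stack = {x}, [x]
    while stack:
        for m in adj[stack.pop()]:
            if m not in seen:
                seen.add(m); stack.append(m)
    return y not in seen

# Sprinkler graph: C -> S, C -> R, S -> W, R -> W
g = [("C", "S"), ("C", "R"), ("S", "W"), ("R", "W")]
print(d_separated(g, "C", "W", {"S", "R"}))  # True: C indep W given {S, R}
print(d_separated(g, "S", "R", {"C"}))       # True: S indep R given C
print(d_separated(g, "S", "R", {"C", "W"}))  # False: conditioning on collider W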
What does this buy us? First, it allows me to do model testing. Second, it allows me to do structure learning: I can prune away all the models that are incompatible with the data and be left with the set of compatible models. And, as I'm going to show you, I can even reduce the question of "what if I do" to symbolic calculus, and this is the do-calculus. The next item we cannot see because it resides below the screen; anyhow, let's go to the next slide. Now I'm going to show the ramifications of this hypothetical logic: seven wisdoms, or as I sometimes call them, tools of causal inference. Tool number one is the ability to encode causal information in a transparent and testable way. This, I believe, is the core and the power of causal data science: your ability to encode what you know and bring it to bear together with the data. Let's not underestimate this ability, because the idea of transparency is extremely important. Testability is also important, but that is for the statisticians; transparency is the key. Why? Because your conclusions depend on the quality and credibility of your assumptions, and if you have no way to express your assumptions as a transparent mathematical object, then you forget your assumptions, you don't know how to put them together, and you don't know how to explain your conclusions. I will only touch on transparency here, and maybe a couple of slides later, but it is continually underestimated in our discussions with other fields, for instance with the economists who subscribe to the potential-outcome school. They don't have transparency; therefore the credibility of the assumptions in every exercise from A to B is vulnerable to misrepresentation and forgetfulness, to a lack of credibility. And we cannot prove it to them, because they don't have a ground truth and they don't have toy problems on which to test the ability of their methods.
Okay, tool number two: predicting the effects of actions and policies. We talked about it; we showed theoretically how it can be done by cutting off arrows, and later I'll demonstrate it through a simple example from sports medicine. Continue. Tool number three: computing counterfactuals and finding causes of effects, not only effects of causes. That leads to being able to attribute credit and blame for various events or actions, to construct explanations, to talk about the susceptibility of people to a certain treatment or disease, and so on. I'll demonstrate each of these separately on a later slide. Continue. Tool number four: computing direct and indirect effects, which we call mediation, extremely important for discrimination cases, inequality, and fairness. Next tool, tool number five: integrating data from diverse sources. We call it fusion, or transportability; machine-learning people call it transfer learning. But when machine-learning people talk about transfer learning, they talk about desiderata, something they would like to accomplish. It fails, because we can prove that you cannot transfer results unless you have a causal model. The idea of transferring from one environment to another, from one population to another, relies on causal information; it is not in the data. Tool number six: recovering from missing data. Here is a slogan that many statisticians like to adopt: causal inference is nothing but a missing-data problem, because you don't know the counterfactuals. Had you known the counterfactuals, it would be an ordinary missing-data problem, and you would be able to answer all the questions; so, they say, causal inference is a missing-data problem, and all you have to do is fill in the data in the table and you are done. Wrong. It is the other way around: missing data is a causal problem. You cannot fill in the data that is missing.
You cannot answer questions about data in which some values of some variables are missing unless you know the reason for the missingness, and the reason for missingness is expressible in causal terms; it must be expressed that way. Tool number seven: causal discovery. Here we are talking about the ramifications of d-separation and other asymmetric properties of the SCM, the structural causal model. It allows you to find when a model is incompatible with the data, prune away those that are incompatible, find a parsimonious representation of the set of structures that remain compatible, and answer queries about that set of compatible structures. Next. Now I'm going to go through each tool separately and discuss with you some of the possibilities that are opening up with these tools. Okay, let's go to the next one and talk about predicting the effects of actions, demonstrated in a simple example from sports medicine put forward by Shrier and Platt in 2008. You want to know the effect of warm-up on injury. Evidently, in sports medicine it is still unclear whether warm-up prevents injury or encourages injury; it depends on many factors. So here we are: we understand there is some connection between the two. Some physicians claim you should warm up to minimize injury, and some claim you shouldn't do too much because you will cause injury. So we know there is some process leading from warm-up to injury, and this is the state of mind of a scientist faced with this problem. In that process, next slide, there are a thousand and one different factors that affect the relationship: factors describing the game conditions, the coaching conditions, and so on. Let's try to get them one at a time. What factors come to mind? Next slide. Neuromuscular fatigue is one that comes to mind. Previous injury of the player, right?
The coach's experience, the fitness level of the players: these are factors that come to mind. Now we try to structure them. Does one of them cause another? Does one of them cause injury directly, or cause any of the intermediate events? We try to structure them, and when we do, we come up with something like this. Next slide. (I'm sorry, I cannot read the bottom one; let's skip it.) So we come up with a graph of this nature. We describe each factor and its effect on the other factors, as well as on the input and the output. And now we have to ask which of them should be measured, because each measurement is costly. For instance, finding out the experience of the coach, or whether a player has neuromuscular fatigue, might take some study; it's not easy. So perhaps we can get the answer to the question without measuring all the factors, just a small subset of them. Which subset will lead us to the correct relationship between warm-up and injury? Next slide. It says: to adjust or not to adjust. And even after you decide what you want to measure, you have to decide what to do analytically to find the answer from the structure: which variables should you adjust for? Continue. Good controls and bad controls. I use these terms because this is what economists call them. Some of the factors are bad controls: if you adjust for them, they give you the wrong result. For instance, previous injury is a collider. It sounds strange, but if you adjust for previous injury, you open a path through the collider that will give you a wrong, biased result. So some are good controls and some are bad controls, and even among the good controls, the set can be chosen so as to minimize cost, minimize error, or maximize the power of the estimate.
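Once a sufficient ("good control") set Z is chosen, the adjustment itself is one line of arithmetic: average the conditional outcome rate over the marginal of Z, not over P(Z given X). A sketch with made-up numbers, using a single hypothetical binary control (say, fitness level):

```python
# Adjustment formula for a sufficient set Z (hypothetical numbers):
#   P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) * P(z)
# X = warm-up, Y = injury, Z = fitness level.

p_z = {0: 0.6, 1: 0.4}                     # P(fitness)
p_y_given_xz = {                           # P(injury=1 | warmup, fitness)
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.20, (1, 1): 0.05,
}

def p_y_do(x):
    return sum(p_y_given_xz[(x, z)] * p_z[z] for z in p_z)

effect = p_y_do(1) - p_y_do(0)
print(round(p_y_do(0), 3), round(p_y_do(1), 3), round(effect, 3))
# 0.22 0.14 -0.08
```

The same numbers weighted by P(z given x) instead of P(z) would give the ordinary conditional risk, which is exactly the quantity a bad choice of controls contaminates.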
The dimensionality of these measurements also differs from one variable to another; some are continuous, some are binary. Go on. If you do the analysis, you find: give me only the parents of the output and I'm done; give me only the parents of the input and I'm done, because they obey what we know as the backdoor criterion. Continue. Here is another sufficient set: if you are able to measure, and adjust for, coach and fitness level, you are done. Next. (No, not that one; it contains a collider. Continue.) Here is another one, a strange kind of adjustment that depends on the front-door criterion; it takes us beyond adjustment, into the do-calculus. But it so happens that if you can measure these two variables, you are done and you get an unbiased estimate. Continue, please. (I cannot read it, I'm sorry.) Completeness, testability, effect identification: all these are features you get from the analysis. Now I'm going beyond adjustment, just to give you two slides on the do-calculus, its philosophy, and how it takes us from adjustment into more complicated kinds of identification. Next slide, please. Here is the mechanics of the do-calculus. We have a query: I want to find the effect of smoking on cancer. You see the probability of cancer given that I do smoke, and smoking can take a zero-one value. This is my query about the world, not about the data, and I want to be able to answer it. Answering means expressing the query in terms of the data available to you. What data is available? We cannot measure the genotype, but perhaps we can measure smoking, tar accumulated in the lungs, and cancer, that is, the conditional probabilities over these visible variables.
If we can measure those, and if we are able to express the query in terms of the distribution that governs the visible variables, the probabilities of S, T, and C, then we are done. And here we can see that we can do it: I can transform the query into an expression which involves no do-operator, so the do-operator no longer disturbs us. And we do it by a set of rules. The rules look at conditions on the graph: if the graph satisfies a certain condition, then we are licensed to apply the rule, and then we can manipulate the expression. Some of the rules remove a do-operator; some of them replace the do-operator with a conditioning, namely observational, operator. Here you can see what's going on. And we are successful in this case, I'll show you how in a second, in transforming the query into an expression that involves only properties of the available data. That is the rule of the game. Next slide. Just to say that we have about 20 minutes left. Beautiful. You know what you can accomplish in 20 minutes. Terrific. Yeah. So here is the do-calculus. It is three rules. One of them allows you to ignore an observation: if a certain condition holds in the graph, the green light, you have the license to ignore a certain observation. If another condition holds in the graph, then you are allowed to exchange an action for an observation. Under yet another condition, you may ignore an action altogether. And this is the name of the game. Now you have a calculus: if you can show that, using these rules, you can transform the query expression into another expression that is expressible in terms of the available data, you are done. Not only are you done, you get an answer which tells you how to combine the data: what portion, what aspect of the data you should concentrate on, what you should bring together, and how you should combine it to get an answer to your query. Continue, please. I'm going to the next one.
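The smoking-tar-cancer derivation can be checked on a toy model. Everything below is a hypothetical numerical sketch: we build the full model including the hidden genotype U, compute the interventional distribution directly, and confirm that the front-door formula, which uses only the visible variables S, T, C, recovers it exactly.

```python
from itertools import product

# Front-door toy model (all numbers hypothetical):
# hidden genotype U confounds Smoking S and Cancer C; S -> Tar T -> C.
p_u = {0: 0.7, 1: 0.3}
p_s_given_u = {0: 0.3, 1: 0.9}             # P(S=1 | U=u)
p_t_given_s = {0: 0.1, 1: 0.8}             # P(T=1 | S=s)
p_c_given_tu = {(0, 0): 0.1, (0, 1): 0.3,  # P(C=1 | T=t, U=u)
                (1, 0): 0.5, (1, 1): 0.7}

def p_c1_do_s_truth(s):
    """Ground truth P(C=1 | do(S=s)): cut the U -> S arrow, keep U's prior."""
    total = 0.0
    for u, t in product((0, 1), (0, 1)):
        pt = p_t_given_s[s] if t else 1 - p_t_given_s[s]
        total += p_u[u] * pt * p_c_given_tu[(t, u)]
    return total

def p_joint(s, t, c):
    """Observational joint P(s, t, c) over the *visible* variables only."""
    total = 0.0
    for u in (0, 1):
        ps = p_s_given_u[u] if s else 1 - p_s_given_u[u]
        pt = p_t_given_s[s] if t else 1 - p_t_given_s[s]
        pc = p_c_given_tu[(t, u)] if c else 1 - p_c_given_tu[(t, u)]
        total += p_u[u] * ps * pt * pc
    return total

def p_s_marginal(s):
    return sum(p_joint(s, t, c) for t, c in product((0, 1), (0, 1)))

def p_c1_do_s_frontdoor(s):
    """Front-door formula: P(c|do(s)) = sum_t P(t|s) sum_s' P(c|t,s') P(s')."""
    total = 0.0
    for t in (0, 1):
        pt = p_t_given_s[s] if t else 1 - p_t_given_s[s]
        inner = 0.0
        for s2 in (0, 1):
            p_c1_ts2 = p_joint(s2, t, 1) / (p_joint(s2, t, 0) + p_joint(s2, t, 1))
            inner += p_c1_ts2 * p_s_marginal(s2)
        total += pt * inner
    return total

print(p_c1_do_s_truth(1), p_c1_do_s_frontdoor(1))  # the two agree
```

The two computations match even though the genotype U never appears in the second one; that is what "expressing the query in terms of the available data" means here.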
Computing counterfactuals, especially finding causes of effects. And this brings us to the question of attribution: assigning credit and blame. Next slide, please. I look at a case where we are dealing with a lawsuit. "Your Honor, my client, Mr. A, died because he used the drug made by this manufacturer. Therefore, the manufacturer is responsible and ought to pay damages to the family of the deceased." Now the court must decide whether indeed the manufacturer is responsible for the damage. The court of law has a certain rule called "but for." The court must decide whether it is more probable than not that Mr. A would be alive but for the drug. Here is the famous but-for criterion, which lawyers use qualitatively, not formally, in the court of law. But the lawyers were the first artificial intelligence researchers. They had to articulate common sense into rules. And look, they chose the language of counterfactuals to express what they believed would be most meaningful to other lawyers. What language have they chosen? The language of counterfactuals. Look at the two components here. The first part of the sentence is probabilistic: something must be greater than 50 percent. But what is that something? It's a counterfactual term: "but for the drug." So now we have to deal with probabilities of counterfactuals. How can we do that? Next slide, please. Bullet by bullet; I can still see it. Okay. I call it the probability of necessity: the probability that Mr. A would be alive had he not taken the drug, given that in reality he did take the drug and he is dead. That should be greater than 50 percent. Are we going to give up? No, because we know how to assign a truth value to every counterfactual if we have a model. So here we have a new kind of probability. I call it PN, a probability of a counterfactual. And can we compute it? Unfortunately, we cannot compute it, but we can bound it.
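The bounds on PN being referred to can be written down explicitly; in the form given by Tian and Pearl (2000) they combine observational data P(x, y) with experimental data P(y | do(x')). The numbers below are hypothetical, chosen only to illustrate the lawsuit logic:

```python
# Bounds on the probability of necessity PN (after Tian & Pearl, 2000),
# combining an observational joint with an experimental result.
# All numbers are hypothetical illustration, not real drug data.

# Observational joint over drug exposure (x) and death (y):
p_xy   = 0.30   # P(took drug, died)
p_xy_  = 0.10   # P(took drug, survived)
p_x_y  = 0.05   # P(no drug, died)
p_x_y_ = 0.55   # P(no drug, survived)
p_y = p_xy + p_x_y          # overall death rate P(y) = 0.35

# Experimental result: death rate when the drug is withheld.
p_y_do_x_ = 0.10            # P(died | do(no drug))

# Tian-Pearl bounds on PN = P(alive had he not taken the drug | took it, died):
pn_lower = max(0.0, (p_y - p_y_do_x_) / p_xy)
pn_upper = min(1.0, ((1 - p_y_do_x_) - p_x_y_) / p_xy)
print(pn_lower, pn_upper)   # roughly [0.83, 1.00]
```

With these numbers the lower bound already exceeds 0.5, so the but-for criterion is met: the court could find the manufacturer liable even though PN itself is not point-identified.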
And sometimes the bounds are meaningful and give you a nice interval, enough to declare the manufacturer either innocent or guilty. Next slide, please. If we know the functions behind the model, then of course we can compute the counterfactuals and the probabilities of counterfactuals. But we don't know the functions. When we go to the next bullet: if we don't, we can bound them using the logic of counterfactuals. Continue. And the bounds improve whenever we can combine data from various sources. And that allows us to go down from the population to individual responsibility. Next bullet, please. And sometimes we are lucky, and we find that this probability of necessity is bounded between one and one. Namely, the guy is definitely guilty. That example is in the book Causality. Sometimes the combined data reveal individual behavior precisely, so we can proclaim the defendant guilty, which is really strange. How can we find somebody guilty on the basis of population data? That's the beauty of counterfactual logic. Look, lawyers are doing it day by day. They are doing it mentally, intuitively, and quite successfully, because the court has the full respect of the public. So if they do it intuitively, we can do it much better formally. That is where counterfactual logic comes in. Continue. I come now to something which I call personalized medicine, based on the same principle. It is based on the fact that counterfactual analysis permits us to take population data and estimate the probability that a given individual, you, would benefit or be harmed by a given treatment, as opposed to the average recovery rate in the subpopulation resembling the individual. Normally what we have is a quantity called the average treatment effect, ATE: the expected value of the difference between treatment and no treatment. Yes. For the population whose characteristics, C, resemble those of the individual in question.
U is the individual in question. So this is ATE. But if we are interested in the benefit for this individual, not the population, we have to deal with a different kind of quantity: the probability that the individual would recover if given the treatment and would not recover if denied the treatment, given the characteristics of the individual. And that is PNS, the probability of necessity and sufficiency. And it cannot be obtained from experimental studies; randomized controlled experiments cannot give you that quantity. How do we know that? We simply have a counter-example. Suppose you run a randomized controlled experiment and you find no effect: the treatment gives you the same average recovery as no treatment, as the control. What can you say about it? You still don't know whether the treatment has no effect on any individual, or whether it cures some and kills others. Those two would cancel out and give you the same average treatment effect. And that is already a proof that you cannot get the answer from a randomized clinical trial, which we know ahead of time. How do we know it ahead of time? From the ladder of causation. This is a rung-three quantity, and a randomized controlled experiment is rung two. We cannot get an answer to a question at rung three unless we have information from rung three or higher, and our available data comes from rung two, namely a randomized controlled experiment. Impossible. So we know it ahead of time, and here I give you a proof, a counter-example. But we can still bound it. Continue. How can we bound it? The bound improves if we have a combination of data from two different sources. For instance, next bullet.
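The counter-example can be made concrete. This is a sketch with made-up potential-outcome pairs: two populations with identical (zero) average treatment effect but completely different individual-level behavior, which is exactly what an RCT cannot distinguish.

```python
# Counter-example sketch: each person is a pair of potential outcomes
# (y_if_treated, y_if_untreated).  Numbers are hypothetical.

# Population A: the treatment does nothing to anyone.
pop_a = [(1, 1)] * 50 + [(0, 0)] * 50

# Population B: the treatment cures 30 people and kills 30 others.
pop_b = ([(1, 0)] * 30 +              # cured by treatment
         [(0, 1)] * 30 +              # killed by treatment
         [(1, 1)] * 20 + [(0, 0)] * 20)

def ate(pop):
    """Average treatment effect: mean(Y_treated) - mean(Y_untreated)."""
    return (sum(t for t, u in pop) - sum(u for t, u in pop)) / len(pop)

def pns(pop):
    """Fraction who benefit: recover if and only if treated."""
    return sum(1 for t, u in pop if t == 1 and u == 0) / len(pop)

print(ate(pop_a), ate(pop_b))   # both 0.0: an RCT cannot tell them apart
print(pns(pop_a), pns(pop_b))   # 0.0 vs 0.3: very different individuals
```

Any experiment, however large, sees only the marginals of the two columns, so it returns the same answer for both populations; PNS lives at rung three.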
The reason why a combination of experimental and observational studies gives you more information about the individual has to do with the bias in observational data, the bias that normally disturbs us and causes us to discredit observational studies. Bias is a proxy for individual whimsical decisions and individual idiosyncratic qualities; that's what makes one individual different from another. And that is sometimes revealed in the observational data, when a person is given free choice to take a drug or not to take it. So what we call bias, the whims of decision, the confounding which disturbs us in observational data, now becomes a blessing, because it reveals the characteristics of the individual, which we would like to capture, and which the randomized controlled trial suppresses. That's why the combination gives you more information about the individual. Ramifications: next bullet. "But for" is only one example of this. Next bullet. Recent developments. Continue. We can answer questions, for instance, about which patients are susceptible to a certain treatment. Continue. That's a repetition of the same idea, the virtues of PNS. Let's continue, please. Yes. Continue. In general, going from group data to individual behavior requires counterfactual logic, and that leads to personalized medicine. Continue. Next slide. Identify, for instance, in business (previous slide, please) voters swayable by a given slogan. So it's good for political science too. Continue. It's in recent papers by Li and Pearl. Let me continue now. Five minutes? Yeah. Five minutes? Yeah, I can still do it. Okay, I have to go quickly. Computing direct and indirect effects: it's all in the literature on mediation, so I don't have to dwell on it. I've got to go quickly. Go ahead. Continue. Yeah. This is the famous fallacy of mediation. Never mind. That's important. Yeah. Integrating data from diverse sources, sometimes called fusion, sometimes transportability and transfer learning.
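The tightening that fusion buys can be shown with the Tian-Pearl bounds on PNS: experimental data alone gives one interval, and adding the observational joint, the "bias" the talk calls a blessing, narrows it. All numbers below are hypothetical, and the bound expressions follow my reading of Tian and Pearl (2000).

```python
# PNS bounds: how adding observational data to experimental data
# tightens the bounds on the probability of benefit.  Numbers hypothetical.

# Experimental data:
p_y_do_x  = 0.60   # P(recovery | do(treatment))
p_y_do_x_ = 0.40   # P(recovery | do(no treatment))

# Observational joint over free choice x and recovery y:
p_x_y   = 0.30     # P(chose treatment, recovered)
p_x_y_  = 0.05     # P(chose treatment, not recovered)
p_x__y  = 0.05     # P(no treatment, recovered)
p_x__y_ = 0.60     # P(no treatment, not recovered)
p_y = p_x_y + p_x__y

# Bounds from experimental data alone:
lo_exp = max(0.0, p_y_do_x - p_y_do_x_)
hi_exp = min(p_y_do_x, 1 - p_y_do_x_)

# Bounds after fusing in the observational data:
lo_both = max(lo_exp, p_y - p_y_do_x_, p_y_do_x - p_y)
hi_both = min(hi_exp,
              p_x_y + p_x__y_,
              p_y_do_x - p_y_do_x_ + p_x_y_ + p_x__y)

print((lo_exp, hi_exp))    # experiment only: roughly [0.20, 0.60]
print((lo_both, hi_both))  # fused:           roughly [0.25, 0.30]
```

The interval shrinks from width 0.40 to width 0.05: the self-selection pattern in the observational data carries individual-level information that randomization deliberately destroys.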
I'll give you an example. Continue. The general problem is how to combine results from several experimental and observational studies, each conducted on a different population and under a different set of conditions. Continue to the next bullet. So as to construct a valid estimate of an effect size in yet a new population, unmatched by any of those studies. This is the general problem. It looks very ambitious, and it is ambitious, but we now have beautiful results, primarily due to the work of Elias Bareinboim, that answer many of the challenges here. Continue. A sub-problem, anyhow. Good. Here, continue. I'll give you an example. Go to the previous one, please. Yeah. We have data from several hospitals about the effect of a treatment. Some of it comes from experimental studies, some of it from survey data. And we need to combine them to come up with the treatment effect in Arkansas, which is the target population. When presented this way, it's too ambitious; we cannot do it. That is what lawyers have to work with: qualitative data and intuition. But once we formalize it in the form of a graph, go ahead, it becomes manageable. We can look at what makes one environment different from another. If we can identify, or quantify, the reasons for the differences between the various sources of data, then we can start managing it. So here you have a difference represented in the form of a variable called S, a selection node marking the suspected source of the differences. Once you tell me where the S-nodes go, then we can quantify it, formalize it, and come up with an answer. Next slide. And we can combine them and find, yeah, it can be fused. Next bullet. Whenever we can cast the problem in selection diagrams, which I showed you earlier, the diagrams with the selection nodes, and the estimate is feasible, a fusion formula can be derived in polynomial time.
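The re-weighting that a selection diagram licenses can be sketched in the simplest case, where the populations differ only in a variable Z pointed to by the S-node; the transport formula is then P*(y | do(x)) = Σ_z P(y | do(x), z) P*(z). Names and numbers below are hypothetical.

```python
# Transport formula sketch (after Bareinboim & Pearl): an effect measured
# experimentally in a source population is re-weighted by the target
# population's distribution of the difference-producing variable Z
# (the variable pointed to by the selection node S).  Numbers hypothetical.

# Source-population experiment, stratified by Z (say, age group):
p_y_do_x_given_z = {"young": 0.70, "old": 0.40}  # P(Y=1 | do(X=1), Z=z)

# Z-distributions differ between source and target (say, Arkansas):
p_z_source = {"young": 0.80, "old": 0.20}
p_z_target = {"young": 0.30, "old": 0.70}

def transported_effect(p_z):
    """P(Y=1 | do(X=1)) under the given Z-distribution."""
    return sum(p_y_do_x_given_z[z] * p_z[z] for z in p_z)

print(transported_effect(p_z_source))  # effect in the study population
print(transported_effect(p_z_target))  # fused estimate for the target
```

The stratum-specific effects travel; only the weights change. In harder cases the fusion formula derived by the algorithm is more elaborate than this single re-weighting, but the principle is the same.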
What does the fusion formula tell you? Next bullet. It tells you, not yet, no, not so far. Here. The fusion formula tells you what information you should take from hospital A, from hospital B, from hospital C, and how to combine them so as to answer your query about hospital D, not yet seen. So it tells you how to combine features of the various data sources so as to construct an answer to your query. And the algorithm is complete. That is also important, and it is often underestimated. Completeness means that no one can do better: no algorithm and no smart economist can do better on the basis of the same assumptions. If you strengthen the assumptions, you can do better; but on the same assumptions, it gives you the ultimate answer. Continue, next slide, please. A special case of data fusion is selection bias: a study in which people are selected into the study by certain features. Perhaps they were homeless, or perhaps they were terminally ill and came to the study expecting some magic cure. So they are not representative of the target population. This is a basic problem of randomized controlled experiments, selection bias. The people participating in a study have to be incentivized, so the only people in the study are those who need your incentive. They are not representative; they are hungry students, or homeless. So they are not representative of the population at large, and people essentially do not know how to overcome this. It's a problem for all trialists. There is a solution here: if certain conditions hold, we can recreate what the population would look like had people been selected at random. Here's an example. Next slide, please. Here's an example. You have a model. People are selected by virtue of the severity of their complications, so they go to the hospital, and only people in the hospital can be seen.
You have to transform your query into an expression that carries the S in it, meaning an expression estimable from the biased data you have. And the rules of the do-calculus allow you to elevate yourself from the biased population to the target population. Good. I'm glad I came that far. Let's continue. We're already over time. Over time? Okay. So I'll go to the summary, just two or three slides. I'm skipping the missing-data part. I just go to the lessons. Okay. One: every causal inference task must rely on judgmental, extra-data assumptions. That's something that deep learning people do not want to accept. Two: we have ways of encoding those assumptions. Three: we have mathematical machinery to take those assumptions, combine them with data, and derive answers to questions of interest. Four: we have a way of doing two and three in a language that permits us to judge the scientific plausibility of our assumptions and to derive their ramifications swiftly and transparently. I cannot over-emphasize this. And it's easy and fun; if we had more time, we could enjoy some of the fun that emanates from this game. Next key insight: data science is a two-body problem. It is connecting data and reality, which includes the forces behind the data, not only the data; those are the springs, the ropes, that pull the two together. Data science is the art of interpreting reality in the light of data, not a mirror in which data sees itself from different angles, in different representations. Next. So now comes the million-dollar question that every student, and people in every discipline, ask: what if I don't have a model? And I have four answers to this. Number one: go ahead and study SEM, because COVID-19 can't wait. You don't have a complete model, but you know something about the epidemiology of COVID-19. Bring to bear whatever you know, because it's an urgent case. What you don't know, put on the graph as "don't know."
It's easy to put down what you don't know. Continue. Number two: study SEM to help find a model. You can do causal discovery; you can do experiments; you can decide what experiment needs to be done. Number three: study SEM to help you use the model once you find it. If by some miracle you find a model, or you find consensus about a model, you have to be able to use it. And number four: study SEM to help explain your findings. Eventually you have to explain to a policymaker why your assumptions are reasonable and plausible, and why your recommendations should be adopted. And it so happens that most policymakers are human, and humans look for explanations in terms of cause-effect relationships, not in terms of "this is how I was programmed." I'm contrasting this with the deep-learning kind of explanation. Continue, please. Today, only one of every 1,000 deep learning students studies the science of cause and effect. So they have at least four reasons to study it. And I'm sure that in this conference, and I'm speaking to the choir, you have gone deep into SEM and are enjoying the fruits of that exercise. May I continue now? Well, I won't have time to talk about my pet projects, but I'll just list them. Automated scientists, which has to do with curiosity, automating curiosity. And social intelligence: going from deep learning to understanding models of other agents, beginning with social understanding and natural communication among robots and agents, to build trust, to talk about desires, responsibility, awareness, intention, and motivation, so that different programs, different automated robots or AI agents, could communicate with each other and learn from each other's experience in a natural language. And the natural language here is the language of emotions. Let's continue. Just to show you how difficult it is to formalize even the notion of responsibility.
We looked at the encyclopedia entry for what responsibility means, and it is full of counterfactual terms, full of distinctions between explicit and implicit knowledge. All this must be formalized one day, and it will be. Thank you. Next one. The conclusion: there has been a causal revolution, as expressed by Gary King, a political scientist at Harvard: more has been learned about causal inference in the last few decades than the sum total of everything that had been learned about it in all prior recorded history. This is a very sweeping statement. You can only afford to make it if you are at Harvard. I happen to be at UCLA, so I'll just back it up with a slide of my own. Go ahead. The next revolution will be even more impactful. It relies on the understanding that data science is the science of interpreting reality, not of summarizing data. And I want to attribute this revolution to a transformation of language. It did not come about earlier because we had no notation. The notation was devised first, pioneeringly, by Sewall Wright in the 1920s, who had the courage to put an arrow in place of an equality sign. We owe it all to Sewall Wright, the geneticist. That gave us the boost that we see today, carried later by other contributors: some philosophers, a contribution by Neyman, and by Rubin. Going away from the language of probability into a language that complies with the asymmetry of assignment, the asymmetry of causation. And that's why my concluding slide will be the next one. I'm repeating myself, so let's go to the next one. Yeah. And that's why I end with a quote from Augustus De Morgan, a contemporary of George Boole, who said that every science that has thriven has thriven upon its own symbols, not the symbols of another science. The same applies to the science of causation. We owe everything to the symbols; without the ability to articulate the new, we had to recognize that the old symbols were insufficient.
And we needed a new symbol. That, I think, has been the great impetus that brought us to where we are today. And where are we today? We have conferences and workshops like yours, with dozens and dozens of new results coming up, and industry is interested; the pharmaceutical industry is interested. We are going to revolutionize the science of medicine; I have no question about it. So thank you. I owe a lot to my students and my co-authors. I'm not sure that I owe much to my reviewers, but I'll give them credit too. Thank you. Thank you.