The motivation for this research project was to find exoplanets. I was working together with astronomers; I spent some time at New York University, hosted by an astronomy department. I've always been interested in astronomy, even though my field is machine learning, and one of the most fascinating problems in astronomy is finding exoplanets: planets that orbit other stars out there in space, typically in our Milky Way.

How do we do this? The most prominent method for finding exoplanets is based on light curves, that is, on monitoring the brightness of a star as accurately as possible. To do this as accurately as possible, astronomers have resorted to space telescopes. The telescope that has led to the largest number of discoveries is the Kepler telescope, launched by NASA some years ago and named after the German astronomer Kepler. The Kepler telescope stared at one patch of sky and recorded very accurate light curves, unaffected by the atmosphere because it's out in space. This data is shared among all astronomers, and we can try to analyze it and find exoplanets.

To be better than other people at doing this, we have to be better at removing what are called confounders. Confounders are processes or effects that distort the signal we're interested in. We really want the signal from out in space, but we measure it through a telescope that might have small pointing errors, whose sensor might introduce noise, and so on. So we're interested in a signal that's confounded by these effects, sometimes also called systematic errors, and we want to remove them. That's an example of a larger class of problems that is relevant to astronomy but also of independent methodological interest for us as machine learners and causal modelers.

Our approach is to think hard about what kind of information is available in the data. We're interested in one star. This one star is confounded by noise arising in the instrument. We have no direct access to that noise, and therefore no direct access to the physical signal out in space that we want to reconstruct. However, in addition to the one star we're interested in, at any given point in time we have 150,000 additional stars. If these additional stars are affected by the same noise sources, then in principle it should be possible to use that information from the other stars to correct the light curve of the star we're interested in. Whatever the stars share is likely due to the confounding effects of the instrument; whatever they don't share is the true physical signal out in the world, because in space these stars are separated by light years and don't directly interact.

From a practical point of view, if you work in machine learning or pattern recognition and you try to tackle such a problem, you get together with domain experts at the blackboard and brainstorm about what kind of information is there and how it could be used. In my case the whole thing started with a short sabbatical that I spent at New York University in 2013. That time was so productive that not long after the visit we got a return visit from our colleagues in New York, and we sat together in Tübingen in late summer 2013.
Talking about the problem again, since we hadn't solved it yet, we sat at the blackboard thinking about what kind of information is there, drawing diagrams of what affects what, and starting to think about how we could phrase this as a machine learning problem: a problem where we have data, inputs and outputs, observations on which we can train a system to predict outputs from inputs, and where we can use such a system to correct the light curves.

The result of our research is a new method to detect the presence of a confounder and correct its effect, that is, to correct the effect of something caused by the measurement process that isn't what we're actually interested in. We're interested in the true astrophysical signal out in space, not in the noise the instrument adds when measuring it. We use the other stars to estimate the effect of the confounder, because all the stars are affected by it, while at the same time the stars out in space are independent of each other. So we have a set of independent objects, light years apart, that don't interact directly, and we measure something, the light curves, that is dependent because the measurements share the same confounder. We can then use the other light curves to correct a light curve of interest, and we do this by regression modeling: we predict the star of interest from the other stars, and we subtract that prediction from the star of interest. It turns out that subtraction is exactly the right thing to do if a certain additivity assumption about how the confounder acts holds true. We can actually prove theorems about this: if additivity holds, and if the other stars carry sufficient information about the effect of the noise, then subtracting the regression corrects the signal and reconstructs the true signal up to an offset.

There is a nice analogy that is maybe easier to understand, and it works quite in parallel. Suppose you have a set of children who all share the same mother, but one of them looks very different. In England they call this the milkman's child; in America, I think, it's the mailman's child, and we all know what this refers to. Now the problem is the following. You have such a set of children, and one of them looks a little different. What you can try to do is explain how that one child looks in terms of how the others look. They have certain similarities, and these similarities are caused by the fact that they share a mother. In our astronomical application, the mother is the instrument that records the signals: the instrument makes all the measured star signals share some information, and we want to remove that effect and reconstruct the star out in space. In the milkman case we would look at the child that looks different and try to explain its appearance in terms of the siblings. If we then explain away what we can account for in terms of the siblings by subtracting it (it would be more complicated in this case, because the effect is not additive), then, if our mathematical assumptions held true, which they don't in this case, we would recover how the milkman looks.
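To make the regression idea concrete, here is a minimal sketch in Python of "predict the star of interest from the other stars and subtract the prediction". The function name, the array layout, and the choice of ridge regression are illustrative assumptions, not the actual pipeline from the paper:

```python
import numpy as np
from sklearn.linear_model import Ridge

def correct_light_curve(light_curves, target):
    """Remove shared systematics from one star by regressing on the others.

    light_curves: array of shape (n_stars, n_times); `target` indexes the
    star of interest. Both names are assumptions for this sketch.
    """
    y = light_curves[target]                       # observed, confounded signal
    X = np.delete(light_curves, target, axis=0).T  # other stars, one feature per star

    # The stars are causally independent, so whatever is predictable from the
    # others is attributed to the shared, instrument-induced systematics.
    model = Ridge(alpha=1.0).fit(X, y)
    systematics = model.predict(X)

    # If the confounder acts additively (observed = true signal + f(noise)) and
    # the other stars carry enough information about the noise, subtracting the
    # regression recovers the true signal up to a constant offset.
    return y - systematics
```

The sketch only shows why subtraction is the right operation under the additivity assumption; the actual work involves more care in how the regression is fit.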
Now, in practice things are a little more difficult. We don't only use the other stars to predict the star of interest in order to correct its light curve; we also take into account what we're interested in in that light curve. What we're interested in are transit events. A transit event means the geometrical alignment of star and planet is such that, from our point of view, the planet passes in front of the star and occludes part of its surface, which leads to a small dip in the light curve, a small decrease in brightness. That's what we're looking for, and it's a transient event: it takes just a few hours. For instance, if you were to look at Earth and Sun from space and were lucky enough to pick up a transit, it would take about half a day. So we're looking for signals in the light curves that take a few hours up to a day, and we want to retain this kind of information in our light curves while removing everything else.

It turns out that if we use not just the present values of the other stars but also the past and future values of the star we're actually analyzing, we can do an even better job. We have to make sure that this past and future is sufficiently separated from our point of interest, so that we don't explain away the transit itself. But if it is sufficiently separated, we can do a better job not only at removing the confounding effect of the instrument but also at removing the variability that's intrinsic to the star itself, because it turns out stars are not as constant as we used to think. Almost every star shows some brightness fluctuations, and when we look for exoplanets we're not interested in the intrinsic brightness fluctuations of the star, but in the fluctuations we get when an exoplanet occludes part of it.

Our results are relevant foremost in terms of the methods they propose. We developed a new method, along with performance guarantees stating under which conditions it works, and this method is applicable in various domains, in bioinformatics and medicine, but of course we developed it for astronomy, and we have applied it there in another paper. That paper looks specifically at new data from the so-called K2 mission. K2 came about because at some point the Kepler satellite partly broke. The satellite had four so-called reaction wheels, used to stabilize its orientation in space so that it keeps looking at exactly the same stars all the time. Only two reaction wheels were left, which means the satellite could no longer stabilize itself, but people had the idea of using the remaining fuel and the thrusters, like the fuel-driven thrusters you know from rockets, to stabilize the satellite as well as possible. That's much worse than before, but in a sense it's good for us, because we deal with exactly this problem of how to remove these kinds of errors. So in the K2 mission the satellite started looking at other fields that hadn't been observed before, and we immediately started analyzing this data, essentially using our method combined with a clever method for searching through light curves, because our method only corrects light curves.
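To make the earlier point about past and future values concrete, here is a hypothetical feature construction: for each time point we combine the other stars' current values with the target star's own values outside an exclusion window around that time point. All names and window sizes here are assumptions for illustration:

```python
import numpy as np

def build_features(light_curves, target, lag=20, gap=30):
    """For each time t, stack the other stars' values at t with `lag` of the
    target star's own past and future samples, all at least gap+1 steps away
    from t. The gap must exceed a transit duration, so that the dip we are
    looking for can never appear among its own predictors."""
    y = light_curves[target]
    others = np.delete(light_curves, target, axis=0)
    n = len(y)
    X, t_index = [], []
    for t in range(lag + gap, n - lag - gap):
        own_past = y[t - gap - lag : t - gap]            # nearest sample gap+1 steps before t
        own_future = y[t + gap + 1 : t + gap + 1 + lag]  # nearest sample gap+1 steps after t
        X.append(np.concatenate([others[:, t], own_past, own_future]))
        t_index.append(t)
    t_index = np.asarray(t_index)
    return np.asarray(X), y[t_index], t_index
```

The regression and subtraction then proceed as before on these extended features, and the corrected light curves can be searched for transits.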
On top of the corrected light curves, we then have to search for the dips that come from a transit. Combining our method with a method for searching the light curves, we came up with a list of around 30 exoplanet candidates and published it, and about half of them were confirmed relatively soon afterwards using other means. So they are considered true exoplanets by the astronomers; of course nobody has visited them, but that's the same for all exoplanets we know. That's an outcome we've been quite happy with.

Where do we go from here? Of course there are many other applications for these specific methods, some in astronomy, some in other fields, but really we are methods developers, and we're interested in the broader picture of the relationship between statistical observations and causal structure. It turns out that statistical observations are just a surface phenomenon; underlying them there always has to be a causal structure that brings about these statistical dependencies in the first place. This was first understood by the physicist and philosopher Hans Reichenbach, who postulated what's called the common cause principle.

We have reason to believe that if we understand this underlying structure, we can build machine learning systems that generalize better from one task to the next. Currently, in artificial intelligence, we have amazing successes with machine learning systems that are very good at solving one task. We can train object recognition systems on labeled data: if we have millions of images of animals, each labeled as cat, dog, and so on, we can train a classifier that recognizes cats and dogs extremely well, maybe more accurately than a human by now. But we're not very good at transferring knowledge. If someone gives us a new type of animal to recognize, we can only do that well if this person also gives us millions of training examples again. We're not good at transferring the variability we've learned from the animals we've seen to a new class of animals. We think the reason for this is that current systems focus only on statistical information and don't try to understand the causal structure underlying the observations. We don't want to sit down and write down differential equations for everything; machine learning is a data-driven approach to handling complexity. But if we can retain this attractive data-driven property while going one level deeper, trying to automatically learn the causal structure that gives rise to the statistical structure, we have reason to believe that we will generalize better to new settings. This is the big challenge for us at the interface of causal modeling and machine learning: how do we automatically learn causal models from data, and how do we exploit such models to generalize across different tasks, which is something that humans and animals are good at and that artificial intelligence systems so far don't know how to do.