that our last keynote speaker had fallen sick. So we were left without the final keynote speaker, but luckily Munich is full of great researchers, and luckily we know a few. And we could convince Bastian Rieck from the Helmholtz Center in Munich to come and deliver the final keynote of our symposium as a replacement. Bastian's background is in mathematics. He is one of the experts in topological data analysis in machine learning. He did his PhD in Heidelberg, then postdoctoral time in Kaiserslautern and later at ETH Zurich — in my lab, in fact. He then became a PI at the Helmholtz Center here in Munich, and he is a rising star at this intersection of topological data analysis and machine learning. He's also the program chair of the new Learning on Graphs conference, LoG, that's happening for the first time this year. So we now move from the more medical perspective of precision medicine to the more mathematical perspective on what you can do in the life sciences, and in data science in the life sciences. Thank you very much for coming on short notice, Bastian; we are really happy to have you here, and we are looking forward to your talk.

Thank you very much for having me. Is the microphone already live? I have to think it's not. Maybe I need to do something. Or is it just live? Is it already live? Ah, yes, now this is so much better. Thank you very much for your really kind words. It's also a pleasure to be here on short notice. I cobbled this together for a general audience, as I was told, so I tried to take everyone with me, and I hope that we can have a nice Q&A session as well. The title of this is more like a framework, I would say; it's called "A Good Scale Is Hard to Find: Shape Analysis Using Topology". First, let me maybe give you a small introduction to what I like to do. This is one of my favorite pictures. It's not drawn by myself, it's drawn by an AI. It's supposed to represent this idea of what you can do with topology and how shapes are kind of melting. And maybe it has a certain surreal characteristic, which is what I'm going for, because in practice we almost never deal with the right shape; we deal with deformed, perturbed variants of such a shape. Now, let me talk a little bit about algebraic topology. That is my special subfield — back in the days when the dinosaurs roamed the earth, it is what I studied in mathematics. It's a field that is supposedly about counting and calculating stuff. When you look up a definition of algebraic topology, you will find a bunch of them, of course. A very prosaic one is: we want to develop invariants that classify topological spaces up to homeomorphism. There's a bunch of weird words already in there; we don't have to use them. Another one is: we can use tools from algebra to study topological spaces, whatever these topological spaces are. But what I like to think about in terms of topology is: we want to understand shapes through calculations. And I'm stressing the calculations part, because we humans have a very good visual system — probably most of you know this better than I do; I mean, mine, as you can see, is also not working properly most of the time. We're really, really good at recognizing shapes, really, really good at recognizing each other. And the question is now: how can we bring this into the computer world? How can we somehow leverage things that our cortex can do on its own?
Now, a first taste — and this is probably something that you will encounter lots of times when you look at my stuff, or at things that deal with topology in general — that's the Seven Bridges of Königsberg problem. The first, I would say, historical occurrence of topological data analysis, if you want to call it that. It dates back to Euler, because of course it does. Everything dates back to Euler if you look long enough in mathematics — either that or Gauss. I think I said this in another talk before, but if you ever do a pub quiz and you're asked about mathematics, then Euler and Gauss are your sure guesses, I would say. Now, in the Seven Bridges of Königsberg, the question is: can you do a walk through the city that crosses every bridge of Königsberg exactly once? And if you look at the city, there's a map, so there's some geometry there, and it's kind of hard to do that. You could probably walk through Königsberg all day long — it's a nice city, I've heard, so you could do that. Or you could abstract this, and build a graph that has the land masses as its nodes and the bridges as its edges, capturing the connectivity from the map. And when you do this, you will find — and this is actually one of the first theorems that you often learn in undergraduate graph theory — that no such walk can exist, because there are more than two vertices with odd degree (there's a small code sketch after this passage that checks this). And this is so fascinating to me, because this is a geometric problem — or, well, let's maybe not stretch "real-world problem" too much here, but for me, this is a real-world problem — and you have abstracted it just by virtue of putting it into a graph and then ascertaining some of the properties of that graph. That has almost a magical character to me. And this is one of the aspects of topology and topological data analysis. That's, of course, not the only thing we can do. As I said before, we'll stay with Euler for a while, because he's the man. There are also other invariants that we can calculate for spaces. An invariant is something that remains fixed while we transform the space, while we subject it to certain transformations. The transformations can be homeomorphisms — stretching something or bending it without tearing it — or they can be something different. If you are dealing with graph machine learning, for instance, you have probably encountered the terms permutation invariance or permutation equivariance at some point. And this is exactly one of those invariants that people are now looking for, that they're interested in. Because if I give a graph to a machine learning algorithm, then the output should probably not change if you just change the ordering of the vertices. It should change, however, if you start rewiring the graph. And coming back to the Euler characteristic: the Euler characteristic is a very simple invariant of a polyhedron, so of a shape that we can nicely draw. It's defined as the number of vertices V, minus the number of edges E, plus the number of faces F. And you can calculate this, and there is a nice theorem, of course, attached to this, because this is how math works: you can show that the Euler characteristic of every Platonic solid is exactly 2. Interestingly, this theorem also characterizes Platonic solids.
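The Königsberg degree argument mentioned above is small enough to check in a few lines of plain Python. This is a minimal sketch; the land-mass labels A–D and the bridge list are my reconstruction of the classical configuration, not from the slides:

```python
# Königsberg as a multigraph: four land masses A, B, C, D and the
# seven bridges between them (the classical configuration).
bridges = [
    ("A", "B"), ("A", "B"),  # two bridges between A and B
    ("A", "C"), ("A", "C"),  # two bridges between A and C
    ("A", "D"), ("B", "D"), ("C", "D"),
]

# Count the degree (number of bridge endpoints) of every land mass.
degree = {}
for u, v in bridges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

odd = [node for node, d in degree.items() if d % 2 == 1]
print(degree)  # {'A': 5, 'B': 3, 'C': 3, 'D': 3}

# Euler's criterion: a walk crossing every edge exactly once exists
# iff the graph is connected and has at most two odd-degree vertices.
print("Walk exists:", len(odd) in (0, 2))  # False -- four odd vertices
```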
So if you have this theorem, you can characterize the solids — or you can do it the other way around, because you can show that any one of those shapes must, by necessity, be a Platonic solid if it has this Euler characteristic and satisfies certain other properties. Now, some of you might recognize those. I took them, I think, from a very nice book by Kepler. The symbols there are not meant to represent anything here; I think this was some alchemical part, which we're not doing here, because that's, of course, not science. But I think it's nicely drawn. So we can check this, of course. Let's briefly walk through this, just so you can see that I'm not lying to you — or at least I'm not lying on purpose. I might be lying because I'm just plain wrong in some of the aspects here, because that's also something that we have in science, but it's not misinformation on purpose. So for the tetrahedron, we can do 4 minus 6 plus 4; that's 2. We can do the same for the hexahedron, for the octahedron, for the dodecahedron, and for the icosahedron, and there we have it. By the way, I think I switched two of these in their places here — that was just for someone to check. But yeah, I guess I could have done it on purpose as well. Now, let's walk through some more invariants here and see what we can do in higher dimensions. Because, well, Platonic solids are all well and good, but — I don't know about you — the last time I encountered a Platonic solid was when I was playing Dungeons and Dragons about ten years ago. So this is not really what we're dealing with in science anymore. But we can go higher, of course. There's another nice invariant called the Betti number, named after Enrico Betti. The d-th Betti number counts the number of d-dimensional holes in a space — whatever that might be; we'll see some examples of this later on. Primarily, it can be used to distinguish between spaces, because it is what we call a homeomorphism invariant. So if you bend your space or stretch it, without tearing it, it will not change. In that sense, it is a characteristic property of a space. The Betti numbers have very nice properties in that they represent, I would say, intuitively graspable characteristics. For dimension 0, these are the connected components. For dimension 1, these are the cycles in a data set or in a graph. And for dimension 2, these are the voids. So, for instance, when you think about proteins or molecules or something like this, and you thicken them a little bit, then in d equals 2 you will find the pockets that they enclose. I've been told that this is something that people are interested in; I myself have not been working with such data, so this is just an educated guess on my part. Let's look at some examples. The ordinary point — so something without any extent — has Betti numbers of 1, 0, and 0, respectively. Just one connected component, nothing else going on. We can do a little bit more with a cube: a cube, which I assume to be hollow, so that it encloses some space, has a Betti number of 1 in dimension 2, because it encloses one void. And we can do the same for the sphere and for the torus as well. I'm not going to go into the details of why the torus has exactly two cycles here; that's actually a very interesting and deep theorem in topology as well.
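The Platonic-solid arithmetic from the start of this passage is easy to verify programmatically; a quick sketch using the standard vertex/edge/face counts:

```python
# Vertex/edge/face counts of the five Platonic solids.
solids = {
    "tetrahedron":  (4, 6, 4),
    "hexahedron":   (8, 12, 6),   # the cube
    "octahedron":   (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron":  (12, 30, 20),
}

for name, (v, e, f) in solids.items():
    chi = v - e + f  # Euler characteristic: V - E + F
    print(f"{name:>12}: {v} - {e} + {f} = {chi}")
# Every line prints 2, as the theorem promises.
```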
But if you want an intuitive explanation, I would say there's one cycle that you can immediately see, because it's the one that you can put your finger through — so if you eat a donut like this, you can eat it around your finger. And the other cycle you can find by thinking of hanging it up on a Christmas tree as an ornament; that's the second cycle. Now, at this point, since we're walking towards computational topology, let me say a few words about why these invariants are useful in general — not the Betti number specifically, but why it's really useful to have invariants. When you define an invariant, when you look for characteristic properties, you're always in a struggle between two goals. On the one hand, you want something that is extremely precise, something that tells your data sets apart, that has very high expressive power. For instance, for those of you that are into the graph machine learning domain a little bit, you might have heard about the Weisfeiler–Lehman hierarchy, or the Weisfeiler–Lehman graph kernel, or the Weisfeiler–Lehman test for graph isomorphism. And these are exactly the kinds of things that you're looking for: something that is very expressive and that tells your data sets apart. On the other hand — and that's the other side of this coin — you also want something that you can compute effectively and efficiently. It's not useful to have an invariant that is NP-hard to compute, where you don't have an efficient algorithm, because then it doesn't scale and you can't use it in practice. And at the risk of being proven wrong by YouTube comments or comments from the audience here, I think that computational topology tries to navigate this path a little bit. We try to develop invariants that are sort of expressive while still being sort of computable. Of course, that doesn't work in all dimensions and for all kinds of data sets, but for lower-dimensional data sets and lower-dimensional topological features, we are doing quite well. And we'll now be looking into what this means in practice, moving to computational topology. That's a newer subfield that has been rising for about a decade, give or take. In computational topology, the idea is that you take the methods from algebraic topology and make them actually computable, actually implementable on a computer — and potentially also in your neural network. We'll see a bunch of examples of this as well. Now, reality, of course, is often messy. Maybe "often" is not even appropriate — maybe it's always messy. What we typically see is a point cloud like this, something that looks a little bit like a torus. What we perceive as humans — or what we try to link it back to, if you're a Platonist, and this is the thing that you're looking for on the right-hand side — is an actual, idealized torus. And computational topology helps us bridge that gap, going from the left-hand side to the right-hand side, from the unstructured, discrete point cloud to the nice shape on the right. Now, how does it do that? I, of course, can't give a very nice introduction into the intricate details of the algorithms here, so I'll just opt for a very intuitive and hopefully visually appealing way of doing things. What we're essentially doing with these topological methods — and persistent homology is one specific one here — is the following.
We approximate a point cloud at different scales in the data — we look at it from nearby and from far away — and we observe how topological features appear and disappear as the scale changes. For those of you in the know, or for those of you who want to know more, rest assured there'll be some references later on: this is known as a Vietoris–Rips complex construction. And interestingly enough — I have to mention this historical tidbit because it's fascinating to me — this was actually developed at the beginning of the 20th century, so not the 21st, mind you, but the 20th. I think Leopold Vietoris wrote his seminal paper on calculating these types of complexes from point clouds in 1928. Of course, his terminology was a little bit off — he wouldn't say "point cloud" or "computer" or whatever — but the principle is the same. He was already thinking back then about how to leverage the power of algebraic topology, which was a very, very young field at the time, in order to describe discrete data sets, because he had the hunch that this might be something very relevant and very interesting. I think there's also a very nice connection between statistics and data analysis showing here, because I think his main paper was motivated by tabulating certain statistical calculations and statistical results from a census or something like this. Anyway, this Vietoris–Rips complex is super easy to calculate. We pick a distance, and we pick a threshold epsilon, and then we just start connecting subsets if the pairwise distance of their points falls below that threshold. And of course, we do this only for pairs that are non-identical. Now, as we grow this threshold — here's a nice animation I've prepared — you can see that more and more things start to be connected. And now suppose that we're looking for cycles in this data set. Then at some point, going from this scale to this scale, we have finally identified a cycle. This cycle then persists for another scale. This is also where the name persistent homology comes from, because we are tracking how long features survive in this process. And after a certain threshold, it gets closed again; it gets swallowed, because we have reached the maximum of our zoom level, so to say. Now, that's the intuition behind it. You can actually use that for all kinds of point clouds — and it's not even restricted to point clouds; we'll see some examples of this later in the talk. But to illustrate this principle once more, and what to actually do with these topological features, let me show this to you with a projection of a 2D point cloud. Again, we can start growing our Euclidean spheres around the individual points, and we can track topological features. And in the end, what we do get out of this — and this is, I think, the key takeaway that I want you to have; if you forget everything else from this keynote, please take away two things, namely that Euler did a lot of stuff in topology, and that this descriptor on the right-hand side is called a persistence diagram. The persistence diagram, this little diagram on the right-hand side, captures topological features, their creation and destruction, across different scales. So basically, the more activity you have in there, the more topological complexity there is in your object.
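As a concrete sketch of this construction, here is how one might compute a Vietoris–Rips persistence diagram for a noisy circle using the GUDHI library — one implementation among several; the sampling parameters and thresholds are my own choices:

```python
import numpy as np
import gudhi  # pip install gudhi

# Sample a noisy circle; its first Betti number is 1, so we expect one
# prominent 1-dimensional feature in the persistence diagram.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 100)
points = np.c_[np.cos(angles), np.sin(angles)] + rng.normal(0, 0.05, (100, 2))

# Grow the threshold epsilon and track features (Vietoris-Rips filtration).
rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
simplex_tree = rips.create_simplex_tree(max_dimension=2)
diagram = simplex_tree.persistence()  # list of (dimension, (birth, death))

# The long-lived 1-dimensional point corresponds to the circle's cycle.
loops = [(b, d) for dim, (b, d) in diagram if dim == 1]
print(sorted(loops, key=lambda bd: bd[1] - bd[0], reverse=True)[:3])
```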
And you can do all kinds of interesting things with this, of course. I promised that I'd go light on the formulas here, but I just want to show this to you at least once, so you can see why people are doing this — and we'll see more motivations for why this is interesting in machine learning as well. The first thing that you can do with these persistence diagrams, with these descriptors, is calculate a distance between them. In fact, maybe you've heard about the concept of optimal transport; this is one of the nicest applications of optimal transport that I'm aware of. You can take two persistence diagrams, D and D′, and calculate what is known as the bottleneck distance, or also some kind of Wasserstein distance, between them, which essentially means solving an optimal matching problem. You try to match all the points in one diagram to the other diagram, you have an associated cost for doing so, and then you solve for the infimum over the supremum of this cost function. This is also where the idea of the bottleneck comes from: you're searching for the best matching that you can find, and then you take the largest distance, the largest cost that you have to incur to make the matching stick. That's known as the bottleneck distance. There's a relaxed variant of this distance around as well, which is known as the Wasserstein distance. Technically, I should say the Wasserstein distance between persistence diagrams, because you can, of course, also calculate the Wasserstein distance between probability distributions. These are actually the same — it's the same distance, in some sense, just calculated over different spaces. Now, why would you want to do this? Well, one nice thing is that persistence diagrams are actually stable under certain transformations of the data. In particular, one stability result that is really great for us when we do machine learning later on is that they are stable under certain sampling conditions. You have probably heard the terms mini-batch or batch in machine learning. When you take subsamples of your data and work with them, then you are virtually always guaranteed — and we actually have a theorem that quantifies this a little bit — to know how the persistent homology, the topological features of your data set, behaves under this subsampling. Here, I'll just give you an intuitive view. You can see three point clouds, and you can see that the respective persistence diagrams — in, I think, dimension one — are more or less of the same shape. I'm saying more or less because you can see that the blue point cloud here — oh no, I'm not going to try this because it's an extended screen; oh, but it works — the blue point cloud has a slightly different sampling here. I changed the sampling conditions on purpose, but you can see that the persistence diagram is still kind of similar. Now, let's make this more precise. Since we already have a distance definition, we can also define this more formally.
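Before that, a quick code-level aside: GUDHI also exposes the bottleneck distance between two diagrams directly. The toy diagrams here are made up for illustration:

```python
import gudhi

# Two toy persistence diagrams, given as (birth, death) pairs.
diagram_1 = [(0.0, 1.0), (0.2, 0.9), (0.5, 0.6)]
diagram_2 = [(0.0, 1.1), (0.25, 0.8)]

# Bottleneck distance: the cost of the best matching between the two
# diagrams, where unmatched points may be matched to the diagonal.
print(gudhi.bottleneck_distance(diagram_1, diagram_2))
```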
So if you have a triangulable space — something that you can add a triangulation to, which, in our modern parlance, is almost any data space that you will ever encounter — and you have a continuous tame function, that is, a function that has only a finite number of critical points, so nothing really degenerate or ill-behaved, then the corresponding persistence diagrams satisfy a relationship: their bottleneck distance is bounded by the Hausdorff distance between the functions. And that's a very surprising result to me, because if you let that sink in for a minute: on the right-hand side you have a Hausdorff distance, which is a fundamentally geometrical property, but on the left-hand side you have a topological distance, a distance between topological features. So here geometry and topology go hand in hand, and one bounds the other. That is a really nice way to think about these computational topology methods because — and it's good that this is live-streamed as well — I think the field is misnamed. It should also contain some form of geometry in the name. When people hear "computational topology", they think, oh, we're doing just discrete stuff and we throw geometry out of the window, but that's actually not true. As you can see here, geometry is being kept around and is being used. Now, moving on a little bit: this is a slide that I discovered recently when I did some historical digging, because persistent homology is actually an idea that has been around for some time, and it affords a very generic view on your data — which is something that I'll try to convince you of in the second part of this keynote lecture, when I show some applications. And I'll try not to butcher this quote too much. There's a quote by Victor Hugo: "On résiste à l'invasion des armées; on ne résiste pas à l'invasion des idées." Very freely translated, this means you can't resist the power of an idea whose time has come. And you can see this when you go through the papers that I mention here, because there were already some precursors of persistent homology back in the '90s. They called it "a distance for similarity classes of submanifolds of a Euclidean space", or "the framework complex and its invariants", or "size functions from a categorical viewpoint". All of these things are, if you look at them mathematically, precursors to the things that I showed you before — precursors to this idea of looking at data at various scales and seeing how its properties change. But the fundamental or seminal paper that I grew up with, so to speak, as a researcher, is called "Topological Persistence and Simplification". Let me just give you a quote so you can see that these notions are applicable in general settings. In this paper, Edelsbrunner and colleagues write: "We formalize a notion of topological simplification within the framework of a filtration, which is the history of a growing complex." So already here, you only need some way to order your data in some shape or fashion, and this can always be done. They then go on to say: "We classify a topological change that happens during growth as either a feature or noise, depending on its lifetime or persistence within the filtration. We give fast algorithms for computing persistence and experimental evidence for their speed and utility." And this was done in 2002.
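For reference, the stability result mentioned at the start of this passage — due to Cohen-Steiner, Edelsbrunner, and Harer — is usually stated with the supremum distance between the functions playing the geometric role; a sketch of the statement in LaTeX:

```latex
% Stability of persistence diagrams (Cohen-Steiner, Edelsbrunner, Harer):
% X triangulable, f, g : X -> R continuous and tame. Then
\[
  d_B\bigl(D_f, D_g\bigr) \;\le\; \|f - g\|_{\infty},
\]
% a topological quantity (left) bounded by a geometric one (right).
```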
And I would say it sparked this whole field of topological data analysis, whose fruits we're now starting to reap in the context of machine learning. Now, this is the last slide before the applications, actually, and I was arguing with myself whether I should name this slide "Why should you care?" — so I didn't; instead I ask it here as a rhetorical question. This is the slide that tells you why you should care about these topological features. This all looks nice; it can be like a mathematical game — mathematicians like to play, they like to develop new stuff; that's kind of cool, okay, fair enough. But why should you care about this in the context of machine learning? Well, there's a bunch of evidence, going back to work I did with Karsten and colleagues, and even some PhD students that are sitting here, and this shows a lot of nice properties. For instance, in the context of machine learning, you can think of topological features as constituting an additional set of inductive biases. You already know that a paradigm shift has recently started to happen in deep learning: instead of saying, okay, we can learn everything we need from the data, people are now also using specific inductive biases for specific tasks — for instance, when you want permutation invariance or permutation equivariance. Adjusting the model to respect certain properties has become a staple of modern machine learning research. In another fashion, topological features can also be shown to complement existing machine learning algorithms and endow them with an expressivity that cannot be achieved otherwise. For instance, our recent paper on topological graph neural networks showed that, with topological features, we were able to escape the Weisfeiler–Lehman hierarchy. Together, we are more expressive than the individual parts. And last, but certainly not least, topological features also have some advantageous theoretical properties. For instance, in our paper on topological autoencoders — the last one on the slide — we were actually looking at subsampling conditions, and we were able to show that, provided the subsampling of your data, so your mini-batches, is kind of well-behaved — not going into the details here — then your topological features, and also your reconstruction, are well-behaved too. So this is essentially why you might want to care, and why these features are really useful. Now, let me show you how to use this in applications. There is a generic topology-driven machine learning pipeline, and recently we've started to upgrade this pipeline quite considerably; let me show this to you. Ordinarily, people would take a point cloud, do persistent homology, and get persistence diagrams out of it — these topological descriptors. Then they would look at those diagrams and say: okay, this diagram tells me something about the data. So the diagrams were used as static features in an exploratory data analysis context. But at some point, people realized that, hey, wait, we can also use them as input features for machine learning. I mean, that's not surprising to anyone here, I guess: if you have something that you can calculate and represent somehow, then you can, of course, also throw it in as additional features for your machine learning algorithm.
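As an illustration of this "diagrams as static input features" step, a hedged sketch: the three summary statistics, the helper name, and the random data are my own minimal choices, not the features used in any of the papers mentioned:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def diagram_features(diagram):
    """Turn a persistence diagram (array of (birth, death) pairs) into
    a fixed-size feature vector of simple summary statistics."""
    pers = diagram[:, 1] - diagram[:, 0]          # lifetimes
    return np.array([pers.sum(),                  # total persistence
                     pers.max(initial=0.0),       # most prominent feature
                     len(pers)])                  # number of features

# Hypothetical data: one diagram per sample, plus a binary label.
diagrams = [np.random.rand(20, 2) for _ in range(50)]
diagrams = [np.c_[d.min(1), d.max(1)] for d in diagrams]  # birth <= death
labels = np.random.randint(0, 2, 50)

X = np.vstack([diagram_features(d) for d in diagrams])
clf = RandomForestClassifier().fit(X, labels)
print(clf.predict(X[:3]))
```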
But the really interesting thing — and this is a really brand-new result that started to appear about two years ago, I would say — is that we can actually backpropagate information through this whole pipeline. Specifically, a gradient from the machine learning algorithm — whatever that might be, a deep network, for instance — exists under certain mild conditions. For instance, in the topological autoencoders paper, we were able to state this condition as saying that all the distances in your data set have to be well-behaved: you're not allowed to have infinite distances, and you're also not allowed to have distances that are too close to each other. What I mean to say here is that it is possible to go back, to traverse this pipeline in the other direction as well (see the sketch after this passage). And therein, I would say, lies the true value of topological machine learning at the moment. You don't only get static features out of it — so that you can say, oh, I'm looking at my coffee this morning and the coffee grounds look a little bit different today, so this means something — no, no, you can use those features in classification scenarios, for instance, or for reconstruction purposes and for many other tasks as well. Now, let's take a look at one specific application in the life sciences. We used this back in the day for characterizing fMRI data sets. Let me give you a brief rundown — and I'm sorry for being a little bit cursory here, because I'm not an expert in fMRI, of course, so this is also just my understanding of the technology. fMRI, to my understanding, measures some kind of blood-oxygen-level-dependent (BOLD) activation in your brain. So if you think very, very hard about something and certain brain areas are involved, then, I'm being told, you need more oxygen in there, and this lights up in the machine. It is a technique that has temporal and spatial components, so you do these measurements not only for a single time step, but for a longer time — people are lying in the MRI for, I don't know, 45 minutes, or maybe shorter, but you can do this for as long as you want and measure this signal all the time. However, since every one of us — and that's actually a very philosophical issue, so maybe we're at the right place here — perceives the world differently and has a different way of thinking about things, fMRI data analysis is plagued by a large degree of inter-subject variability. Even if you and I have ostensibly the same hardware — or I guess I should say wetware in this case, because it's a brain — it will still behave a little bit differently. Of course we know where the eyes are, and we know how the cortex works, sort of, but still, how I perceive a certain stimulus is probably different from how you perceive it. Moreover, there are also issues with geometrical alignment — and this is already where maybe the alarm bells should start ringing, and you could say: ah, okay, let's take something that is invariant to certain geometrical transformations. And yes, this is what we did: we used topological data analysis to characterize time-varying fMRI data.
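Returning for a moment to the backpropagation claim above: here is a pure-PyTorch sketch for the 0-dimensional case, whose death times coincide with the edge lengths of a minimum spanning tree. The function name and the toy loss are mine, and this illustrates the general trick (fix the combinatorial pairing without gradients, then read the values off the differentiable distance matrix) rather than the topological autoencoders implementation itself:

```python
import torch
from scipy.sparse.csgraph import minimum_spanning_tree

def zero_dim_persistence(points: torch.Tensor) -> torch.Tensor:
    """Death times of 0-dimensional Vietoris-Rips persistence.
    The MST pairing (which edges pair with which merge events) is
    combinatorial and computed without gradients; the distance values
    are then looked up in the differentiable distance matrix, so
    gradients flow back to the point coordinates."""
    dist = torch.cdist(points, points)                  # differentiable
    mst = minimum_spanning_tree(dist.detach().cpu().numpy())
    rows, cols = mst.nonzero()                          # fixed pairing
    return dist[torch.from_numpy(rows).long(),
                torch.from_numpy(cols).long()]

points = torch.randn(30, 2, requires_grad=True)
deaths = zero_dim_persistence(points)
loss = deaths.sum()       # e.g. total persistence, usable as a loss term
loss.backward()           # the gradient reaches the input point cloud
print(points.grad.shape)  # torch.Size([30, 2])
```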
Specifically, we characterized them using cubical persistence — that's in a NeurIPS paper from 2020. Cubical persistence — not to go into the details here — illustrates one of the nice points about this whole topological data analysis framework, namely that it indeed works for all kinds of data if you're able to rephrase your problem in a specific manner. In this case, we were able to reframe our problem by saying that, well, fMRI is dealing with volume data, but volume data can be considered a special type of topological complex, in this case a cubical complex. And so, with minor modifications, everything I said before — this tracking of topological features, cycles, voids, and so on — works in this setting as well. This technique is built on previous work by Wagner and colleagues, published in 2012, on the efficient computation of persistent homology for cubical data. Now, what we did specifically in this project is we looked at this BOLD activation function, the blood-oxygen-level-dependent activation function, and considered it to be a time-varying function on some manifold. And in one of the few cases where this actually works out quite well, we were able to understand the manifold directly, because the manifold was just the volume data that we got. So we were able to calculate topological features of this manifold, measured via this function f, and obtain stable topological summaries at different resolutions of this function. Now, the main advantage is that this works on the "raw" data — and I'm putting "raw" in quotes here, because it's not really the raw data; my collaborators did a great job in cleaning this up and aligning it for us, of course. But it is as raw as you can get without building auxiliary representations. For instance, if you look at some fMRI publications, you will find that people often use an atlas — so they think about which regions should be present in the data — or they use a correlation graph, something like this. We don't need any of these things. In particular, we don't need to make certain modeling choices; we can use the data as is, as some kind of time-varying volume. And let me show you the pipeline, with the cubical complex highlighted as the central or pivotal element. We start with an fMRI stack on the left-hand side; we obtain an fMRI volume from this by putting all of it together. It's time-varying — this I can't show, because otherwise this slide would make you a little bit queasy, I guess. We transform all of this into a cubical complex, and from this we extract persistence diagrams — again, these nice diagrams that characterize topological features. Now, if you're still attentive at this late hour, you might see that the persistence diagrams I'm showing you here have a third dimension. Well spotted: the third dimension is time. We are really lazy here and just stack them on top of each other, using the time dimension as an individual axis. Notice — for those of you that are interested in time series analysis in general — that for this approach, at least, we're not using any relations between time steps. Rather, we are parallelizing everything and treating every time step as an independent instance for the topological description. We could do smarter things here; in fact, we're still working on this, and you will find some references to it, but this is what we did back then.
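As a sketch of how cubical persistence looks in code — again using the GUDHI library as one possible implementation, with a random volume standing in for a preprocessed fMRI volume:

```python
import numpy as np
import gudhi  # pip install gudhi

# A hypothetical stand-in for one fMRI volume: voxel values on a 3D grid.
# In the paper, the BOLD activation plays the role of this function.
volume = np.random.rand(16, 16, 16)

# Every voxel is a top-dimensional cell of a cubical complex; the
# filtration sweeps through the voxel values instead of a distance.
complex_ = gudhi.CubicalComplex(top_dimensional_cells=volume)
diagram = complex_.persistence()

# Same descriptor as before: (dimension, (birth, death)) tuples.
print(diagram[:5])
```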
And the data set that we looked at comprised about 155 participants, who were all watching the film Partly Cloudy. I do want to stress that this was not a distressing study for anyone, because 122 of our participants were children and only 33 of them were adults. They were just watching the movie; nothing else was done, and they didn't have to solve any tasks. But what this amounted to, as I'm told, is a continuous stimulation of the participants. So it's not what is known as resting-state data or resting-state fMRI; no, they had to watch the movie. Of course, we didn't force them to watch it — they could have closed their eyes and dozed off; we didn't actually enforce anything here — but they all had the same stimulus, which is great, because now we can compare their responses to certain things in the movie. And what we did first is we tried to predict their ages. This is, as I'm being told, in neuroscience — or, in the parlance of computer science, I would say — a smoke test: a test of whether the representations that we are extracting are of any use at all. And it turns out that they do work, which we checked with an age prediction task. For this, we had to evaluate the norm of persistence diagrams. That's another neat mathematical property that you can have for these diagrams: it's essentially just the maximum of the points' distances to the diagonal. This norm is also stable, and it's highly useful in particular when you want to obtain simple descriptions of time-varying data sets, because by calculating the norm, you turn your high-dimensional topological descriptor into a single time series, which you can then evaluate in other ways. Now, let's feast our eyes briefly on this table. This is one of the nicest results of the study, I would say: the age prediction based on the summary statistics of all the participants. You can see that high scores are favorable here, because it's a correlation coefficient. Ideally, you would want a correlation coefficient of about, I guess, 0.9 — we're of course not there yet, but still, it's pretty nice. I think the mean squared error we had was about 2-point-something years, which is not too shabby. You can see that, in particular, if you compare this with shared response models — this SRM-based technique, which we only had available for a specific subset of our data set — we still outperform them considerably. And I'm mentioning this because it's surprising that a data analysis process like ours, which just looks at raw data without any bells and whistles, and which also doesn't include any prior biological or neuroscientific knowledge, still works that well. That, I think, shows how expressive the topological representations can be for these tasks. Of course, age prediction is not something that you want to do in practice; you could do a lot more things. One thing that we tried — and this is still ongoing work, because it is really, really complex, and no pun intended with the complexity analysis here — is a complexity analysis based on the actual brain states that participants went through. It turns out that the data are very noisy, and we had to aggregate these complexities by cohort.
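The diagram norm described above reduces, up to a constant factor, to the largest lifetime in the diagram; here is a minimal sketch with made-up diagrams, one per time step, collapsing into a single time series:

```python
import numpy as np

def diagram_infinity_norm(diagram):
    """Infinity norm of a persistence diagram: up to a constant factor,
    the largest distance of any point to the diagonal, i.e. the
    lifetime of the most persistent feature."""
    if len(diagram) == 0:
        return 0.0
    births, deaths = diagram[:, 0], diagram[:, 1]
    return float(np.max(deaths - births))

# One (hypothetical) diagram per fMRI time step...
diagrams_over_time = [np.sort(np.random.rand(40, 2), axis=1)
                      for _ in range(200)]

# ...collapses into a single scalar time series per participant.
series = np.array([diagram_infinity_norm(d) for d in diagrams_over_time])
print(series.shape)  # (200,)
```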
So we had to aggregate by age, and what we were looking for is to what extent a younger cohort exhibits lower topological complexity than an older cohort — with the idea being that if you're a very young child watching this movie, you probably don't understand a lot of what is going on; you just see that there's lots of sound and fury, signifying nothing. And if you're an adult, you probably have an emotional context and other context that you can relate this to. Now, again, stressing this: this is ongoing work. For instance, one thing we're looking at nowadays is trying to relate such trajectories to actual events and the emotional context in the stimulus — but this is, of course, hard to do. There's all kinds of interesting annotated data, where people are tracking facial expressions, or where they ask people: in this movie scene, which emotion are you predominantly experiencing? Anger, shame, disgust, joy, surprise, apathy, whatever. Actually, there are more negative ones in there than positive ones, but that's not on me — I'm not responsible for this. So this would be one of the next steps where we want to take this, and then we would characterize the actual shape of our brain-state trajectory as we watch or encounter a stimulus. Now, from these macroscopic considerations of a brain, let me zoom in quite a lot and talk briefly about the prediction of the shape of cells. Now we have a quite different task — now we actually have something where we can measure the outcome quite concretely. This is relatively recent work, and also still ongoing, because it's a complicated problem; we'll see why that's the case. My collaborators from Helmholtz have a lot of nice cell images, and they say these are images of single cells — but in case you're as confused as I was: this has nothing to do with single-cell data analysis or scRNA-seq analysis. This is something completely different. What they mean is really that they take pictures of individual cells under the microscope. They use a confocal fluorescence microscope for this, and then they're interested in predicting the 3D shape of a cell from a 2D image. This is also known as morphological analysis, and it's a crucial way to detect certain pathologies. One related paper in this area is a paper by Ford on red blood cell morphology, which states that, when used properly, RBC or red blood cell morphology "can be a key tool for laboratory hematology professionals to recommend appropriate clinical and laboratory follow-up and to select the best tests for definitive diagnosis". So in some sense — and maybe some of you have already recognized this — this is what a company called Theranos tried to do. We're not claiming that we can do even 5% of what they claimed to do, because it turns out that they were a hoax company — bad for them, potentially good for us. But the goal, if this works, would really be that we take some blood sample, reconstruct shapes from individual microscopy images, and then know something about the patient's state of health. And I'm stressing this because — I hope no one is too queasy in the audience here — blood is a really nice substance in that it's almost always available in patients. Unless a patient is really, really sick, you can probably spare at least a drop of blood in the hospital.
So it's really a substance that you can easily get, and you can easily analyze it. Drawing inferences from small quantities of human blood would be a very nice, well, technology to have in the future. Spoiler alert: we're not quite there yet, but we're making some progress, and I'm going to show you what we were able to achieve with topology. Let's first start without topology, namely by taking a look at the pipeline that we had before adding topological information. We start with a 2D input on the left-hand side, and we throw a machine learning model at it — that box, by the way, is dotted because it's of course something you can easily replace if you find something better. We let the model predict a 3D shape, and then we use a geometrical loss term — more about that in a minute — and compare it with the ground truth. The mathematicians or computer scientists in the audience might appreciate this: this is really hard, because it's a complicated inverse problem. We're going from 2D to 3D. I mean, you already know that from my shadow alone, you cannot reconstruct all kinds of interesting things about me. So it is essentially an ill-posed problem with a large number of potential solutions, and we need a lot of input data to make this work semi-reliably. I can already tell you that there will, of course, be cases — depending on how you look at the cell — where this reconstruction can never work, because you're just missing features. But we would be content with capturing maybe 80% of the cases quite well and then raising a flag for the cases that we can't handle. That would already be a nice result. Now, what we have here is the so-called SHAPR, the shape reconstruction autoencoder. This is a fairly simple machine learning technique that employs a convolutional neural network with some fully connected layers. As you can see, it first downsamples the image and then blows it up again into a volume: it starts with a 64-by-64 image, and it gives you back a 64-cubed voxel volume. What this essentially does under the hood is learn a likelihood function — a function that maps every voxel of this grid, so every point in R³, to some scalar value. And this scalar value indicates the likelihood of a specific voxel being part of the true volume. That is what this method does, and how it tries to reconstruct the images. Now, for the normal loss function — the geometry-based loss function — the SHAPR method uses a loss that consists of two components: one is a Dice loss, the other is a binary cross-entropy loss. Without going into the details, let me just give you the intuition: essentially, it compares the geometry of the resulting volumes on a per-voxel basis. It checks whether the reconstructed volume is well aligned with the ground-truth one. But there's at least one issue: on their own, these losses are not sufficient to capture shape variation, because if I modify the shape a little bit, its topological characteristics of course don't change — I can rotate my icosahedron, my Platonic solid, in space all I want, and it's still a Platonic solid — but these losses, which are very much tied to the voxels themselves, will raise an alarm and say: no, no, no, this deviates.
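For concreteness, here is a common form of such a Dice-plus-binary-cross-entropy objective in PyTorch; this is a generic sketch of the kind of per-voxel loss described, not necessarily SHAPR's exact implementation, and the function name is mine:

```python
import torch
import torch.nn.functional as F

def geometric_loss(pred, target, eps=1e-6):
    """Dice + binary cross-entropy on a per-voxel basis.
    `pred` holds per-voxel likelihoods in [0, 1], `target` is binary."""
    bce = F.binary_cross_entropy(pred, target)
    intersection = (pred * target).sum()
    dice = 1 - (2 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return bce + dice

pred = torch.rand(1, 64, 64, 64)                    # predicted likelihoods
target = (torch.rand(1, 64, 64, 64) > 0.5).float()  # binary ground truth
print(geometric_loss(pred, target))
```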
So you're learning a very restricted set of shapes, or fewer shape features than you could in practice. But now let's add topology to the mix and hope that this helps solve some problems. This is what we did in our recent MICCAI paper. Essentially — if you recall the previous slide — what we added is the three components below. We calculate topological features of both the 3D prediction and the 3D ground truth, and then we have a topology-based loss that we can combine with the geometry-based loss above, to obtain a joint loss and to balance the geometry-based reconstruction against the topology-based reconstruction. To go into some more detail about this loss: it's something that you have encountered before in these slides. It's the sum of Wasserstein distances between the persistence diagrams, plus a term that I haven't introduced before, but that I would call here "total topological variation" — sometimes it's also known as total persistence. Now, these terms have two different components, or two different raisons d'être, if I can call it that. The first aligns the ground-truth likelihood function f and the predicted likelihood function f′: you want your ground truth and your prediction to be as close as possible — in the topological sense, mind you. The second term, the total topological variation term, is of course applied only to the predicted likelihood function, because we can only change the prediction, right? We cannot change the ground-truth data. And it's added there to reduce the topological variation of the predicted likelihood function. Essentially — and you can try this out; I have a website for this, if you want to check it out — this reduces the wiggles in the surface that you get. Because if you have a nice surface, a nice dodecahedron, one of the issues with topological features is that I can add a lot of wiggles all over the surface, and this won't change the overall topology, but it is really, really undesirable. Adding this additional persistence term, shown in blue here, gets rid of that. And then, of course, we can combine them — this is what you're all familiar with — and we can choose a lambda parameter to… oh, what was that? Just a chair. Oh, okay, okay, great — so nothing, I didn't destroy anything, that's good. So we obtain a combined loss by adding those terms together and weighting them accordingly. Now, two interesting things here. First, we of course tried out what happens if you only use the topology-based loss, and everything explodes again, because topology on its own is not powerful enough to regularize your shape nicely — but geometry on its own is also not powerful enough. So there you have it: this is really something that needs to be optimized jointly. On the other hand, one interesting tidbit that we found in this paper is that it's actually sufficient — and this is one of the moments where, well, in hindsight you're always smarter, right? But in hindsight it was clear to me why this should be the case; I found a theorem in some topology book that explains this — it turns out that we don't actually need the sum over these Wasserstein distances, but it's sufficient to use one Wasserstein distance, in dimension two specifically, because cells have kind of nice features, and there's a duality in topological persistence going on that we can exploit for these purposes.
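One way to write down the combined objective just described — the lambda weighting and the exact grouping here are illustrative, not necessarily the paper's precise formulation:

```latex
% Combined geometric-topological loss (sketch):
\[
  \mathcal{L}
  \;=\;
  \mathcal{L}_{\text{geom}}
  \;+\;
  \lambda
  \Bigl(
    \sum_{d} W\bigl(D_d(f), D_d(f')\bigr)
    \;+\;
    \operatorname{pers}\bigl(D(f')\bigr)
  \Bigr)
\]
% First term in the bracket: Wasserstein distances align ground-truth
% and predicted topology in each dimension d. Second term: total
% persistence penalizes spurious wiggles in the predicted function f'.
```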
I'm showing you the nice, beefed-up version here, though, because that's what we initially trained with; the rest is more of an empirical result that might not hold for generic data sets. Now, let us briefly look at the results. I don't want to delve into the details of what these error metrics mean, but essentially, what we found is that just by adding this simple loss term, we're able to reduce errors in all relevant metrics quite substantially — except, of course, for the odd one out; there's always one experiment where this doesn't work, namely the surface roughness for the nuclei data set. The results are still very, very close there, and I think this is more of an initialization question: if we ran this multiple times, we might end up with something that is relatively close. One thing I do want to stress, though — and this is one of the reasons why I'm excited about this project — is that typically you might run into scalability issues if you have very high-dimensional topological features. That's not something I mentioned too often here, because it didn't appear in these cases. But for this type of data set, we have almost no performance decrease whatsoever, because it turns out that the topological features are by themselves expressive enough to be calculated on a very, very coarse version of the data. We don't have to use the 64-cubed volume data to calculate this part of the loss term; we can use a downsampled one, and the downsampling is almost free in terms of computational performance. That's really nice, because it ties this all together and also demonstrates that these features bring in complementary perspectives. Now, that's all I have for you today. I hope I was able to convince you a little bit that topology can provide useful inductive biases, for shape reconstruction tasks in particular. I do want to stress another takeaway — so I already gave you two: don't forget about Euler being the man, and persistence diagrams being the topological descriptors — and the third one is that they actually encode geometrical and topological properties of the data. So don't get fooled by our bad advertising. Mathematicians are really bad at advertising and naming things — Karsten knows this from our work together; I really am bad at this. We name it computational topology, but it's actually more: it also encodes some geometrical properties — not all of them, but some. And moreover — and this is the fact that I'm most excited about — the integration into standard machine learning methods is now possible. We have something for autoencoders, we have something for graphs, we have something for shape reconstruction tasks; more is hopefully to come — I mean, knock on wood, right? If you want to learn more, there's a recent survey that I co-authored with Felix Hensel and Michael Moor, also of Karsten's lab, and now, I think, a postdoc at Stanford. It's called "A Survey of Topological Machine Learning Methods", and it's an open-access publication in Frontiers in Artificial Intelligence. And last but not least — this is now the advertisement part; I hope that's okay.
If you're interested in topological machine learning and you want to check this out on your own: my lab and I are trying to make the software work. It's called PyTorch Topological — I know, very creative name; you see, I'm kind of true to form here. It's not a creative name, but it works, and it gives you the power of topology at your fingertips. At least at the moment of me saying this — I mean, depending on how fast the others are with a pull request — we can do the fMRI data analysis, and we can do the shape reconstruction stuff. What we can't do yet — and maybe someone in the audience wants to do that — is the graph neural networks, but that's just a matter of time until we have rescued our old code and put it into a nicer framework. It can already do quite a few things, and I'm happy to discuss more and answer any questions about the software. And now, of course, I'm very happy that I could be here, and I'm looking forward to some of your questions. Thank you very much.

Thank you very much, Bastian, both for jumping in and for this inspiring talk. So, are there questions for Bastian? Yes, please.

So at a certain point I missed a step, I think, so you kind of lost me — maybe because you're totally in that field. You started out explaining that you measure the shape of these red blood cells, and you mentioned that your colleagues have a confocal, and somehow the next slide was about how you want to go from a two-dimensional to a three-dimensional reconstruction. But if you have a confocal, you have the three dimensions. So at that point I lost the connection.

Very good point — and I'm sorry, this is also my lack of biology talking. The way I understood their problem is that doing these 3D reconstructions, and doing a lot of them, is time-consuming for them, and doesn't…

So they do it in two dimensions in a real application?

Exactly, yeah. So they can do it in 3D. In fact, the data that we got here — and I want to stress this, because this is actual ground-truth data; I didn't make this up or anything, this is really one of the ground truths next to one of the predictions from our algorithm — were produced by our collaborators. But they are telling me that this is a time-consuming process and it doesn't scale very well. So in the meantime, what they are looking for is something that gets them 80% of the way there, and that maybe can tell them: oh, this is a blood cell that looks very, very anomalous, just from the 2D slices. I'm sorry, I should have made this clearer, but this is a very, very good point. Thank you very much.

Further questions for Bastian? Giovanni?

Thank you — it was incredibly fascinating. I have a curiosity: you showed quite a few results for images in 3D space. How far can you push dimensionality in this setting? Or, to put it another way, how badly is this topological setting affected by the curse of dimensionality?

Yes, that's a very good question. Okay, I don't want to take a politician's stance and give you a half answer, so let me disentangle this. First of all, it is not affected as much by the curse of dimensionality as other methods are, because fundamentally it's built upon the idea of having good distances in your data. And of course, yes, the Euclidean distance suffers from this, but you can throw your own distance in there — you can even throw learned distances or metrics, or the Mahalanobis distance, or whatever you want in there.
So in that sense, this can be mitigated. But where the curse of dimensionality really hits us — let's hope that this works; this is the first time, no, the second time, actually, that I'm giving a talk in real life again since 2020 — is if you look at this V-epsilon expression at the bottom of the slide. You have all these subsets whose points are within epsilon or less of each other, and if you don't restrict the size of your subsets, in the worst case you get two to the power of the number of your points. So it is a very, very bad scaling in this sense, and this is where the curse of dimensionality hits us. This, again, can be mitigated by saying: okay, we're only interested in topological features of a certain dimension. For instance, we can say that, empirically speaking, for graphs it's sufficient to use 0-dimensional and 1-dimensional features to already get nice performance improvements; for higher dimensions, 2D might still be feasible. I personally haven't encountered a data set where really much more beyond 2D was required. I mean, you can build data sets that you can only characterize by having higher-dimensional features, but it's rare. And now, to give you a very precise answer: this scalability is abysmal; you have to cheat yourself a little bit around it. It is possible, but it's hard to do. But at least if you have a 1,000-dimensional point cloud and you're only interested in a bunch of low-dimensional topological features — which can still be very expressive, mind you — then it's still something that can be applied. I hope this was good.

Thank you.

You're welcome. Sure, Philippe?

Yeah, thanks — a very great talk. My question is in terms of applications. You mentioned a few of them, but I'm sure there are plenty of others. What about structural bioinformatics — structures of molecules, either small molecules or proteins? In particular, since you mentioned that you can backpropagate, it sounds like things like AlphaFold et cetera, which predict a structure using some loss functions, could potentially also use some of your stuff. Did you look at that?

Absolutely — well, no, not yet, to be honest. But I would very much love to. In fact, I think that our success with the topological graph neural networks kind of gave us the motivation to dig a little bit deeper into this realm. I do think that this is one of the application areas where we would require a joint optimization, or kind of a joint view on the data, because I don't think that the topological features on their own are any more powerful than what is already out there. But potentially, if we phrase this right and set up the task right, then we could have something that is really complementary, because it can capture things that you cannot capture in other ways. So yes, I would definitely be interested in this. There's also some — I only mentioned this briefly, I think; let me go back to this. Yeah, it is actually the right slide. So no, it's actually not — there is work with Leslie, but okay, Leslie is on the slide, that's good. There's other work with Leslie where we're looking at evaluating generative models for graphs. And here, I think, the topological perspective would also be very much warranted, because graphs are already topological objects. So it would be very interesting to characterize the expressivity of a generator, of a distribution, in terms of its topological properties.
So to say: can we get all the modes that live in this space, or are we restricted to a certain class of graphs? But I also have to say that this is living a little bit in the future for now — ongoing, or rather planned, research, but definitely interesting and, I think, definitely worthwhile.

Can I ask a second question, or someone else? — Yeah, so, a super technical question: when you talk about the differentiability of the topological representation with respect to the input data, can you say a bit more about that? I suspect the function is not differentiable everywhere, but maybe almost everywhere differentiable. The question is when points appear: there's an area where there is no point, and suddenly one moves away from the diagonal.

Exactly. So one thing that we exploit here — maybe just to go back to this diagram. There are multiple ways of going about this; some of my colleagues, for instance Elchanan Solomon, have been looking into this, and Mathieu Carrière as well. For this diagram here, we would exploit the fact that every point has at least a very, very small neighborhood around itself that doesn't contain any other points — so there are no overlapping points in this diagram. If we have this condition, then we can show that the pairing underlying the mapping from the space to the diagram is locally constant, and then the chain rule for gradients tells us that we can ignore this part. There are also more technical results if you use different representations, because a lot of things exist in this space that I haven't told you about, unfortunately, in this talk — apologies for this. If you use a different representation of your topological features, then you can show, for instance, that the mapping is Lipschitz, and then you can appeal to — now I'm blanking on the name of the theorem — but you can appeal to a theorem that tells you that the mapping is differentiable almost everywhere. And then you need some computational tricks to make this gradient actually unique in practice. But it can be done, and it works — even kind of out of the box with PyTorch, it works surprisingly well, even if you ignore some of the degenerate cases. Yeah.

Thank you.

You're very welcome.

So let's thank Bastian again for this very inspiring keynote. Thank you, Bastian. Thank you.