But yeah, let's boldly go forward, and I can tell you the worst, or at least the most complex part, is already behind you. Thank you, looking forward to it. All right, okay, without further ado, let's move on. Again, this is the slide that I'm repeating everywhere because I have no way of knowing how many people are dropping in or out. So for any questions, write me an email or ping me on Twitter, and you can also find the slides and additional information on this website. There's also a workshop going on, but let's discuss that in the last lecture, maybe. So, quick recap: what did we see? What did we try to understand? We wanted to distinguish between topological objects, and for this we used the Betti numbers, which essentially count the number of high-dimensional holes in an object. It's a very coarse invariant, but it turns out to be really, really useful. The calculation requires a simplicial complex and some linear algebra. So in this lecture, let's try to bridge theory and practice. First of all, let's look at how we can bring this into the real world. Real-world objects are typically not described in terms of a simplicial... Sorry? Yeah. I think you're not sharing a screen. Oh, whoa, whoa, whoa. This is a very good point. Okay, so you missed the cool bridge. Anyway, we're going to bridge this gap in this lecture, and we're trying to figure out how real-world objects work and how we can describe them in terms of simplicial complexes.

All right, so here's an example. This is what we typically get when we work with real-world data, or, well, contrived real-world data that someone invented for a lecture. What we see as humans is this torus, right? So this is what we get, and this is what we see, or what we want to see. We want to understand why this is a torus, or why this point cloud is likely to have been sampled from a torus. But that is really hard to say. So how can we get there? Moving from a point cloud to a simplicial complex is actually something that goes back a long time, I think over a hundred years now. Even back then, before computers, people thought about these things. So we will calculate something called a Vietoris-Rips complex, and we will see a lot of examples of it. Given some set of points x_1 to x_n in a metric space, together with a metric such as the Euclidean distance, we pick a threshold epsilon and build the Vietoris-Rips complex, defined as all the subsets of X such that the distance between all the points of the subset is less than or equal to epsilon. So we just take a look at the pairwise distances in every subset, and if all of them are less than or equal to epsilon, we add the subset to the complex. Notice that this has a nesting property, which we will formalize in a second, because if this holds pairwise for, let's say, five points, then of course all their subsets satisfy it as well; going down is easy, going up is hard. Equivalently, you can think of it like this: the Vietoris-Rips complex V_epsilon contains all the simplices whose diameter, so the maximum pairwise distance between their points, is less than or equal to epsilon.
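To make this definition concrete, here is a minimal brute-force sketch of the construction just described; the function and variable names are purely illustrative, and a real implementation would use the inductive construction mentioned in a moment rather than enumerating all subsets.

```python
from itertools import combinations
import numpy as np

def vietoris_rips(points, epsilon, max_dim=2):
    """Brute-force Vietoris-Rips complex: keep every subset of at most
    max_dim + 1 points whose pairwise distances are all <= epsilon.
    Returns simplices as tuples of point indices."""
    n = len(points)
    # Pairwise Euclidean distances; any other metric would work just as well.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    simplices = [(i,) for i in range(n)]          # every vertex is a 0-simplex
    for k in range(1, max_dim + 1):               # a k-simplex has k + 1 vertices
        for subset in combinations(range(n), k + 1):
            # diameter = largest pairwise distance within the subset
            diameter = max(dist[i, j] for i, j in combinations(subset, 2))
            if diameter <= epsilon:
                simplices.append(subset)
    return simplices

# Toy example: four points on a unit square.
X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(vietoris_rips(X, epsilon=1.0, max_dim=2))
# At epsilon = 1.0 only the four vertices and the four sides appear; the
# diagonals (length ~1.41) and hence all triangles are still missing.
```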
And notice that on this slide I make my life a little bit easier: I will only write V_epsilon to indicate the Vietoris-Rips complex of some space, and I will hope that it's pretty clear which space it's calculated from. All right, so first example: how does that look in practice? We take this point cloud here, we take certain epsilon values, and I'm indicating those as discs or balls around the individual points. Whenever two of them intersect, we draw an edge; when three of them intersect, we draw a triangle, and so on and so forth. And you can see that as we modify this threshold, the point cloud becomes progressively more triangulated, and progressively higher-dimensional simplices are being added. So here we only add edges, and here we add some triangles because we have the first intersections of three balls, and then we add more and more and it gets higher-dimensional. Just a second, I saw a question pop up in the chat, so I should open this again. Okay, now it works. So essentially, we just draw these Euclidean balls of diameter epsilon and we create a k-simplex sigma for each subset of k + 1 points whose balls intersect in a pairwise fashion. And the cool thing is, you can already see that this looks roughly circular, I would say, for most people, and if you have the right threshold, you see the circular structure here in the simplicial complex, because this is a simplicial complex that has, and we will validate this of course, one hole, like a circle, and this hole remains here and then at some point it is closed. This is the whole idea of this calculation.

And I just have to mention this because it's so awesome: it dates back to a 1927 article by Leopold Vietoris, "Über den höheren Zusammenhang kompakter Räume und eine Klasse von zusammenhangstreuen Abbildungen" (on the higher connectivity of compact spaces and a class of connectivity-preserving mappings), which is still readable today, if you speak German of course. He was the first mathematician to describe this sort of calculation for metric spaces. And then it took until 2010 and a paper by Afra Zomorodian to describe several construction algorithms; that would be the paper "Fast construction of the Vietoris-Rips complex". The basic idea is that we build higher-dimensional simplices inductively from lower-dimensional ones: we first handle the neighbours, and then we take a look at whether those neighbours intersect in the right way to build a higher-dimensional simplex, and so on and so forth. I also have to stress that in the worst case this complex will contain all 2^n subsets of the underlying point cloud X. So it's not the most efficient way of describing a space, but it is one of the most convenient ways to calculate with, because it just requires a distance, or a metric, and a set of points, and everything else follows automatically from there.

All right, so now let's couple this with the Betti numbers; just a second, I have to drink something, it's kind of tough to talk all the time. Anyway, coming back to this example, let's look at the Betti numbers of the Vietoris-Rips complex. So let's connect what we learned in the first lecture to this calculation here. We could look at Betti numbers in dimension zero as well, so at the connected components, but let's only take a look at the tunnels. And again, let's do this intuitively.
So let's not calculate anything; let's just say that whenever we find this cycle here, this circle, then we say, okay, we have a Betti number of one, and whenever we lose this circle, the Betti number drops again. This is the only thing we are looking for now. And you can see that as we move along these different thresholds, at some point, for epsilon equals 0.2, the Betti number jumps from zero to one, because at this threshold all the points are finally connected with each other and they form a circular structure, a cycle. And this cycle remains for quite some time, until at epsilon equals 1.0 it is closed again, because everything is now connected to everything else. So there is no tunnel anymore, no cycle anymore, nothing anymore, so it drops back to zero. This already gives you a rough glimpse of what we want to do, namely evaluating Betti numbers along this complex for all thresholds.

But there are certain issues with this approach. The first obvious issue is: how can we pick this epsilon parameter? Because there might not be one correct value for epsilon. Do we want to pick epsilon equals 0.2? Do we want to pick 0.21? Do we want to move even higher? Do we want to pick something that is as close to 1.0 as possible? In essence, it's also very hard and very intractable, because we would technically have to run the matrix reduction algorithm that we learned about, or at least glimpsed, in the last lecture. This would have to be performed for every simplicial complex that we get, so for every value of epsilon here; this is really not smart, it's really inefficient. So nonetheless, what can we do now? The answer may surprise you: we just calculate topological features for all possible scales, so for all epsilon parameters at the same time. And this might sound crazy, because why should this be more efficient than doing it individually? But it actually is, at least because we can constrain the calculations somehow.

So let me give you an intuition here without going into the details of what all of this will mean; we will of course discuss it subsequently. Let's just say we have such a point cloud here, it also looks a little bit like a yummy bagel, and we just start growing a simplicial complex, growing a Vietoris-Rips complex, by increasing this epsilon threshold, and we let it grow and grow and grow. And we always take a look at what the resulting simplicial complex has in terms of topological features. So if we can do this, if we can go through all these potential scales and track topological features, then at some point we end up with a descriptor that tells us at which scale a topological feature is created and at which scale it is destroyed. We will make this much more formal and much more clear, but the underlying intuition is nothing but that: we just go through all the scales and we look at which scale a feature appears and at which scale it disappears. If you go back to this example with the cycle: at some point the cycle has been filled in, and then it is destroyed again, right? And this is nothing but tracking all topological features over all potential distance thresholds. So why does this work? It works because, as I already insinuated, the Vietoris-Rips complex is nested. It has a nesting property.
So given two distance thresholds epsilon_1 and epsilon_2 with epsilon_1 less than or equal to epsilon_2, the Vietoris-Rips complex at epsilon_1 is a subcomplex of, or equal to, the Vietoris-Rips complex at epsilon_2. And this nesting property is the key towards these improved calculations. Essentially, you can see it like this. This complex, and I repeat the same one here to tell you that they do not necessarily have to differ by additional simplices, it's also perfectly fine if we just keep repeating the same complex, but this complex is definitely a subset of, or equal to, this complex here. This complex is a subset of, or equal to, this complex here, because we just add some edges. And likewise, going from here to here we just add some more triangles, maybe, and going from this guy here to this guy here we add even more triangles and even more simplices. So all of them are nested. And again, I'm showing these Betti calculations here on the bottom: for all these thresholds we can calculate the Betti number, and at some point it will remain for a while, we will have this hole, and at some point everything is closed and it goes down again. So this is the idea of this nesting relationship, and it leads us to something that is called a filtration.

So recall that this Betti number of the data persists over a range of this threshold parameter. There's quite a big range of the threshold parameter epsilon here for which the Betti number has the "right", and I'm using air quotes here, the "right" value, because we would expect a point cloud like this to be roughly circular, at least if we look at it from certain scales, right? So for a long range of the threshold parameter epsilon, it has the right value. So how can we formalize this? How can we formalize that a topological feature like the cycle persists over a range of the parameter? To formalize this, we make a very, very simplifying assumption, namely that the simplices in the Vietoris-Rips complex are added one after the other. We just pick some ordering of the simplices and add them in one at a time. And this gives rise to a filtration, a sequence of nested simplicial complexes K_0 nested in K_1 and so on and so forth, until we end up with the original Vietoris-Rips complex. And each K_i in this sequence is a valid simplicial subcomplex of the Vietoris-Rips complex. And that kind of makes sense: we saw that if we increase the distance threshold, more and more simplices will be added, right? So in a way, there is always a valid simplicial complex in there, regardless of which threshold we pick. And now if we have this...

Why would these all be nested? I mean, if, for example, some points are making a complex because of my epsilon and some others are making a complex somewhere else, how is one nested inside the other? Let's say we have two points very much to the right and three points very much to the left; then they are not really nested now, are they? And they all make one Vietoris-Rips complex. They are nested if we just keep growing the threshold parameter. So if we start with a very low distance threshold and only connect points that are very close to each other, and if we increase this parameter, then more and more points will be connected to each other. This is what I mean by nesting. Also, K_0 and K_1, are those the V_epsilon that came from a smaller epsilon? Exactly, yeah.
So if we increase the epsilon parameter, then we add more and more information, but we never lose any information that comes from a lower threshold, because once a simplex has been added to the Vietoris-Rips complex, it can never be removed: its vertices satisfy a pairwise distance of less than or equal to epsilon, and if we increase epsilon, the smaller-scale simplices that we've already seen remain there. If something is connected already for a threshold of, let's say, 0.2, then it will of course also be connected for a threshold of 0.5. This is just a consequence of the ordering, right? If something is less than or equal to 0.2, it's also less than or equal to 0.5. So what I mean is that we never lose information that we've already seen at a lower scale. This is what I mean by the nesting.

And this is what this graphic two slides before shows? Yes, yes. So when we... No, no, one, one before. Yes. This one, to the right, no, one slide, go. One slide back? No, forward, forward, forward. Forward, okay. Okay. One more. One more. Okay, now the one with the creation and destruction. What is each dot here? Each dot is a topological feature. This is just an intuitive view; we will formalize this later. Okay, sure. Each topological feature, each cycle for example, is created at one creation threshold, and it can be destroyed at one destruction threshold. The point I'm making here is that we can track topological features alongside this filtration. All right, so moving on. Oh, whoa, one question. Is every K_i in this filtration necessarily a Vietoris-Rips complex? In this filtration, yes, because you can always find a parameter that gives you a Vietoris-Rips complex for it. Although of course there's this simplifying assumption that consecutive complexes only differ by one simplex, but we can leave that aside for practical purposes and just pretend that these are all Vietoris-Rips complexes calculated for certain thresholds. We will see later on that we actually don't need this; it's more like syntactic sugar, in a sense, to make these computations work. In essence, later on we will never have to calculate more than one Vietoris-Rips complex. We only have to calculate one big one, and we get the smaller ones automatically. This is kind of the intuition, or the takeaway message, of this sort of theorem: since everything is nested, you already get all the information on the right-hand side; you don't have to look at all those individual complexes here. It's still confusing that these are called K and the other one is called V. Oh yeah, but this is by design here, to be honest, because I want K to stand for an arbitrary simplicial complex and I want V to represent a Vietoris-Rips complex for a certain threshold. But I agree. So the V is the big one and the K are the ones we get for free. Yeah, the K are the ones that we get for free, exactly. That's a very good point. I hope to give lectures of this kind at some other conferences as well, and then I should probably change this and make it a V throughout, or something like that. That's a good point.
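To make this "one big complex, smaller ones for free" idea concrete, here is a minimal sketch in the same spirit as the brute-force construction above (again, all names are just for illustration): we attach to every simplex the scale at which it first appears, its diameter, and sort by that weight; every prefix of the sorted list is then one of the nested K_i.

```python
from itertools import combinations
import numpy as np

def rips_filtration(points, epsilon_max, max_dim=2):
    """Attach to every simplex the threshold at which it first appears (its
    diameter) and sort: ascending weight, and lower dimension first on ties.
    Every prefix of the result is a valid subcomplex K_i, and the prefix for
    a smaller cut-off is nested inside the prefix for a larger one."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    filtration = [((i,), 0.0) for i in range(n)]  # vertices enter at scale 0
    for k in range(1, max_dim + 1):
        for s in combinations(range(n), k + 1):
            diam = max(dist[i, j] for i, j in combinations(s, 2))
            if diam <= epsilon_max:
                filtration.append((s, diam))
    # ascending weight; in case of a tie, faces precede co-faces
    filtration.sort(key=lambda sw: (sw[1], len(sw[0])))
    return filtration

X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
for simplex, weight in rips_filtration(X, epsilon_max=1.5):
    print(simplex, round(weight, 3))
# Vertices enter at 0.0, the four sides at 1.0, the two diagonals at ~1.414,
# and only then the triangles: taking any prefix of this ordering gives one
# of the nested complexes K_i discussed above.
```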
So anyway, before we can finally reap the fruits of our labour, we have to make one jump back, but maybe I'll skim a little over this, because if you found this confusing in the algebraic topology setting, it's not going to be a lot easier in this setting now, but bear with me for at least a few slides; I'm essentially just formalizing what we get. Since we observed that these simplicial complexes are nested in the filtration, so K_i is a subset of or equal to K_j for i less than or equal to j, we also obtain a sequence of homomorphisms connecting the homology groups of each simplicial complex, right? What we learned in the last lecture is that we can apply homology group calculations to all these individual simplicial complexes; this is why they're called K_i, because each is just a simplicial complex and we can calculate its homology groups. But they are also nested, and this gives rise to a sequence of homology groups that are connected over this filtration. So you have two indices i and j here, referring to the fact that you can connect the homology group in dimension p of the simplicial complex K_i to the homology group in dimension p of the simplicial complex K_j. And so again, this works, and we can reformulate this. I thought about how an audience might perceive this, and it might look a lot like abstract nonsense, but the fact that we can take the concepts we already observed and reformulate them in the context of a filtration illustrates that this is a very generic formulation. It makes a lot of sense, and it's just a way to formalize the idea of subset relations and of the maps induced between them. And I would urge everyone to look at the excellent blog post by Tai-Danae Bradley called "What is a Functor?"; it's an excellent explanation of what we're actually looking at here. Functors are also increasingly becoming important in machine learning, by the way.

But anyway, to finally describe what is actually going on here, we can now define a so-called persistent homology group along the filtration. We take two indices i less than or equal to j and define the persistent homology group H_p^{i,j} as the cycle group coming from K_i modulo the boundary group coming from K_j intersected with the cycle group coming from K_i, so H_p^{i,j} = Z_p(K_i) / (B_p(K_j) ∩ Z_p(K_i)). And now we see why I tried to introduce this vocabulary in the previous lecture: this can only be read, or understood, if you know what these boundaries and cycles actually mean. But for us, it should just mean that we have all the homology classes, all the topological features, of the simplicial complex K_i that are still present in the simplicial complex K_j during the filtration. So the running example in your head should always be this cycle that gets increasingly closed by higher-dimensional simplices. And while this process happens, while these cycles grow, you can calculate topological features and see how long they survive, in a sense, over the filtration. And the implication of this formulation is that we can calculate a new set of these homology groups alongside the filtration and assign each topological feature a duration. So previously, to close this loop, we were only able to count; we could say, ah, this is a torus, it has, let's say, one connected component.
But now we also have a duration: since we have this concept of a scale at which to look at the simplicial complex, we suddenly incorporate geometrical information that was previously not available. Now we can say, ah, we have a point cloud, and if we view this point cloud at a certain scale, it has this kind of topological features. And this is the novelty, the new thing that will help us to actually describe real-world datasets, because real-world datasets don't come with proper scale information attached. We will always have to guess the scale somehow. But this guessing is not a mug's game; this guessing is smart, because we assume that most of the real-world datasets we encounter are actually samples from some unknown high-dimensional manifold, right? That's the whole hypothesis in deep learning or in image processing, for example: the things that we observe come from some unknown manifold. We just see the discrete samples, but we're interested in describing them, in figuring out their topological features. And for this we need this scale representation, and this is where persistence comes in, this persistent homology group.

So I'm not going to torture you with any more details, but the takeaway message should be: we can assign a persistence to every topological feature, every class, that is created in some complex K_i and destroyed in some other complex K_j, for example because the cycle was closed by too many higher-dimensional simplices. We define the persistence as the difference in weights of the j-th simplicial complex and the i-th simplicial complex, so |w_j - w_i|, and we take the absolute value for good measure because we want to make sure that this is always a positive number. And w in this case is just some weight function that assigns each simplicial complex of the filtration, here described via its index, a weight. You can think of this weight as, for example, the associated distance: if we have this distance parameter epsilon, then w_j could just be the epsilon parameter of one simplicial complex. And the idea is that the persistence of a topological feature measures the scale at which it occurs, or at which it endures, persists, in the simplicial complex. And I'm not making this terminology up. The people who invented this, in particular one mathematics professor from Austria, Herbert Edelsbrunner, actually had this image of persisting in mind. This is why it's called persistence: the classes, the topological features, persist over a range of thresholds, over a scale of different thresholds.

All right, so before we look at how to calculate this in practice, let me discuss two or three standard filtrations. Most of them you already know; they're super easy to implement and super easy to extend to a simplicial setting, so that's really no problem at all. It's more about creating a shared vocabulary. Given a distance metric dist, such as the Euclidean metric, the distance filtration assigns weights based on pairwise distances between points. If we take a simplex sigma, we give it a weight of zero if it is a vertex, because a vertex does not have an associated distance. So in essence, all the vertices are there at the beginning of the filtration, for a threshold of zero.
Whereas if we have an edge, so sigma is an edge between two vertices u and v, we assign it a weight of dist(u, v), the distance between u and v. And for any higher-dimensional simplex, we take the maximum weight over all subsets of that simplex. So essentially, once we have defined the distances between pairs of vertices, we just extend that to higher dimensions by taking the maximum. And this gives us a simple way to sort simplices, namely in ascending order of their weights, and in case of a tie, faces should precede co-faces. What I mean by this is that the lower dimension comes first: if an edge, a 1-simplex, and a triangle, a 2-simplex, have the same weight, then the edge must come first and then the triangle. This ensures that our simplicial complex is built in a consistent manner, because the idea is that you first have to add all the edges before you're allowed to add the triangles of the same scale. I think that kind of makes sense, right? It's the intuitive way in which you would build your object. And just as a side note, this distance filtration has some very neat properties; it's related to things like the Johnson-Lindenstrauss lemma, for example. Persistent homology turns out to be capable of preserving distances under random projections; this is a very neat result by Donald Sheehy.

There's also a different kind of filtration that you will encounter later on, in particular in the graph setting. If we have a scalar function f that maps the vertices of the simplicial complex to the real numbers, for example, it could be a temperature measurement, but it could also be a vertex degree, then the sublevel set filtration propagates those values through the simplicial complex by setting the weight of a vertex to the function value at that vertex, and the weight of any higher-dimensional simplex to the maximum weight of its faces. So again, this is a way to ensure consistency of the complex. And again, we can sort simplices in ascending order of the weights, and in case of a tie, faces precede co-faces. This is called the sublevel set filtration because it grows in this direction; conversely, you can also calculate a superlevel set filtration by using a minimum instead of a maximum and sorting simplices in descending order of the weights. So this is the neat thing: there are all kinds of dualities happening in persistent homology, or in algebraic topology. By duality I mean that oftentimes flipping the sign of an operation will of course change the underlying computations, but it will not change the meaning too much. We won't have time to go into this, but I'm at least dropping some hints in the slides so that you can look this up in case you're interested.

All right. So now we return to our previous example. It turns out that it's super easy to extend the boundary matrix calculation along a filtration, because we can just build it up while we add the additional simplices. So just as in the previous case, we start adding simplices: we add a vertex, so it doesn't have anything in its boundary; we add an edge AB, it has two vertices on its boundary; we do this for the other edges as well, and so on and so forth. And here we added a triangle, a 2-simplex, and it has three edges on its boundary. So also notice that there is a neat shift going on in this calculation.
So the 2-simplices have a boundary that consists of 1-simplices; it always goes one dimension down. And this is our boundary matrix. Now we can also take a look at how to reduce this boundary matrix by column operations. Interestingly, and let me give a brief side remark here, in the previous setting of algebraic topology everything was a little bit complicated, if you recall these calculations; it was like, oh yeah, it's spanned by these simplices and we have to calculate this and that. But here, this additional information of the filtration makes everything a lot easier. We have one big boundary matrix to look at, and we can reduce it with a very simple algorithm that just looks at the lowest non-zero entry in every column. The algorithm works like this. We start with a boundary matrix M, and then we repeat the following as long as we find columns to reduce: as long as there exists a column i' that is less than i such that the lowest one in column i' is in the same row as the lowest one in column i, we add column i' to column i. But we do this addition over the field with coefficients zero and one, so if we add one plus one, we get zero. This is why I have this ⊕ notation here and not the regular plus notation. This also works for different coefficient fields, as mentioned in the previous lecture, but we do it for this field because it's easier.

So let's take a look at how this works; I went a little bit too fast there. We first take a look at this matrix and see where we can find two columns that satisfy this. Here are two such columns: the lowest one in this column is in the same row as the lowest one in this column, and of course this column here precedes this column here. So, following the algorithm, we add this column to this column, with mod-2 coefficients. If you do this, you can see that one entry gets cancelled and the other one gets transferred here. So this is what we end up with; this other column is untouched. Now let's see whether we can find another such pair. There we have it: this column has its lowest one here, and this column has its lowest one here. So we add these two columns together, and as you can probably imagine, they cancel each other out because they are the same, so this column vanishes completely. And now we don't have any more such columns, so we have obtained a reduced boundary matrix. And this is the matrix from which we can finally read off all the information about the filtration, all the information about the simplicial complex. And it's a very simple way of reading off these things. Namely, we look at column i: if column i is empty, then the corresponding simplex is a so-called positive simplex that creates a topological feature. So in this example, simplex A is positive, simplex B is positive, simplex C is positive, and simplex AC is also positive; they all create topological features. I should have said that by empty I mean that there is no one in there, no other simplex in there. It's just an empty chain, because it doesn't consist of any simplices; there are no values in there.
If column j, on the other hand, is non-empty and the lowest one of column j is in row k, then we call sigma_j a negative simplex that destroys the topological feature created by sigma_k. So in this case, simplex AB, for example, is a negative simplex; it destroys a feature, and if we look at the lowest one in its column, the feature it destroys is the one created by B. Or, as another example, the simplex ABC destroys the topological feature created by AC. So again, notice this nice duality: AC is a positive simplex because it has no entries in its column, and ABC is a negative simplex that destroys the feature created by that positive simplex. Intuitively, this means the following. The addition of the simplex... I think both of you... Please go ahead. So the areas will not work. Sorry, I don't get that. Okay, maybe, sorry, I don't hear anything; maybe let's move this question to the Q and A if you want. Okay, so what I mean by this example is that the addition of the triangle, the simplex ABC, destroys the cycle that we've previously seen. In a sense, it closes the hole that we have seen. If we only have the edges of the triangle, so only the skeleton of the triangle, then it has a hole; but if we add the triangle itself into the simplicial complex, this hole vanishes. And this is what this notation expresses. And notice that the algorithm is very simple: it just involves matrix operations, right? It just involves additions of columns, it can be implemented super easily, and it only requires this boundary matrix.

So again, let's use an illustrative example and calculate the persistence of some features so that we can connect this. Here, our topological feature is the circle that underlies this point cloud, and since it persists from 0.2 to 1.0, its persistence is 1.0 minus 0.2, which is 0.8. So this circle has a persistence of 0.8 along this filtration, because it's present here and it's not present here anymore. You have to believe me that it would be present for any parameter smaller than 1.0, but let's just assume that this is the case. So now let's formalize this and see how to track those topological features correctly. Again, we saw that there are different types of topological features: in dimension zero we have the connected components, in dimension one we have the cycles, and in dimension two we have the voids. Whenever we have a topological feature with associated simplicial complexes K_i and K_j, so the creator and the destroyer of this feature, we store a point (w_i, w_j), where w is the weight function, in a descriptor called a persistence diagram. We will see an example of this in a second, but first notice that it's possible that a feature is never destroyed; it's possible that a feature persists until the end of the filtration, because of course no one forces us to add a triangle, for example, to our simplicial complex. So in some simplicial complexes there are holes that we can never close. And if that's the case, we assign such a feature a destruction value of positive infinity. There are also other ways of assigning it a value, but this is the simplest one. So how does that look in practice? For our simple filtration here, we would place a point at 0.2 on the x-axis and 1.0 on the y-axis, indicating that this cycle is created at a threshold of epsilon equals 0.2 and destroyed at a threshold of epsilon equals 1.0. So this is all there is to it.
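Since the reduction and the read-off really are this mechanical, here is a minimal end-to-end sketch over Z/2 coefficients for the filled-triangle example: build the boundary matrix, reduce it column by column, and pair positive with negative simplices. The function names and the illustrative weights are my own; this mirrors the textbook reduction just described, not any particular library.

```python
import numpy as np

def low(column):
    """Row index of the lowest non-zero entry, or -1 if the column is empty."""
    nz = np.flatnonzero(column)
    return nz[-1] if len(nz) else -1

def reduce_boundary_matrix(M):
    """Standard column reduction over Z/2: while some earlier column has the
    same lowest one as column i, add it to column i (addition modulo two)."""
    M = M.copy() % 2
    for i in range(M.shape[1]):
        while True:
            pivot = low(M[:, i])
            if pivot == -1:
                break
            j = next((j for j in range(i) if low(M[:, j]) == pivot), None)
            if j is None:
                break
            M[:, i] = (M[:, i] + M[:, j]) % 2
    return M

def persistence_pairs(M_reduced, weights):
    """Empty column i: positive simplex. Non-empty column j with lowest one in
    row k: negative simplex destroying the feature created by simplex k."""
    pairs, destroyed = [], set()
    for j in range(M_reduced.shape[1]):
        k = low(M_reduced[:, j])
        if k != -1:
            pairs.append((weights[k], weights[j]))    # (creation, destruction)
            destroyed.add(k)
    for i in range(M_reduced.shape[1]):
        if low(M_reduced[:, i]) == -1 and i not in destroyed:
            pairs.append((weights[i], float("inf")))  # essential feature
    return pairs

# Filtration of a filled triangle: A, B, C, AB, BC, AC, ABC.
simplices = ["A", "B", "C", "AB", "BC", "AC", "ABC"]
weights   = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0]       # illustrative scales
M = np.zeros((7, 7), dtype=int)
M[[0, 1], 3] = 1      # boundary of AB is {A, B}
M[[1, 2], 4] = 1      # boundary of BC is {B, C}
M[[0, 2], 5] = 1      # boundary of AC is {A, C}
M[[3, 4, 5], 6] = 1   # boundary of ABC is {AB, BC, AC}

print(persistence_pairs(reduce_boundary_matrix(M), weights))
# One connected component lives forever, two die when the edges AB and BC
# arrive, and the cycle created by AC at 1.0 is destroyed by ABC at 2.0.
```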
This descriptor can carry all the information of all the topological features in all dimensions if we want it to. Of course, here we only have a single topological feature, but as you saw previously, there can also be more, and it's easy to extend this. Let's take another look at an illustrative example that only requires counting connected components and cycles. Suppose we have the following graph here, and I really hope that I counted this correctly; let's assume that it's right, and if not, this has to be updated. Anyway, for epsilon equals zero, we have 16 connected components and nothing to do here. For epsilon equals 0.25, there are only 11 connected components, because let's assume that those things are connected already. So we have to mark the destruction of certain connected components in this persistence diagram. All of them were created at epsilon equals zero, so we just add a few points at the coordinate (0, 0.25). You can also see one disadvantage of the persistence diagram here, namely that we have multiple points to add at the same coordinate, but the multiplicity is not apparent. We'll talk about how to circumvent this later on; for now there appears to be a single dot, while the multiplicity is actually higher. Now let's move on. At epsilon equals 0.5, we have only one connected component, so again we have to mark this here, and we have 12 cycles. I marked this in the persistence diagram on the right-hand side with a dashed line, indicating that at this threshold cycles are suddenly being created. And if we move on, we get even more cycles, 57. And at some point we don't know where to put them, because now we're done with our filtration, but we still have a single connected component and 57 cycles. Technically they should all be added at positive infinity, but positive infinity is a little bit too large for these slides, so I added all those cycles at a destruction threshold of epsilon equals one and set their creation thresholds to be the dashed lines. So moving back, we can see that at epsilon equals 0.5 some cycles are being created, hence a line here; at epsilon equals 0.75, cycles are being created here; and at epsilon equals 1.0, other cycles are created, so again a dashed line here. And since we're now done with the filtration, I'm setting all those destruction thresholds at a value slightly higher than one, or maybe it's exactly one, depending on how you look at it. So this is essentially the whole magic, the whole descriptor, for a simple graph: we just take a look at how many topological features we have, and whenever this number changes at certain thresholds, we mark it in the persistence diagram. And the great thing is that, in contrast to the illustrative way we were doing it just now, we can actually do this by reducing the complete boundary matrix of the simplicial complex. There's no need to count individual topological features for every one of those complexes; we get the whole thing by reducing one big matrix. So these red dots, these red dots are the points themselves, when we did not have any edges? Yes, every point creates a connected component, or every vertex, I should say, because "point" is a little bit overloaded now. So every vertex creates a connected component.
And if that vertex is connected to another connected component using an edge, we have to decide which connected component persists. From two connected components, if you connect them with an edge, only one connected component remains. We make the decision based on some arbitrary criterion, but this can also be formalized. And then we add a point to the persistence diagram. So in case this gets a little bit confusing: the cool thing about this diagram is that its entries will always be points. It will always be a two-dimensional diagram, potentially with infinity on the second axis, but everything that is added here can be represented as a point in this plane. Does that answer your question? Yes; you could also scale the epsilon axis in such a way that you transform it so that one is infinity. Yeah, you could also do that. I don't know whether that makes sense, but with such a function you could put it in there. Yeah, that would also work.

And in fact, this is just a convention: the topologists call for using this infinity notation in order to distinguish between what they call essential features of a simplicial complex and non-essential features. But in data analysis, when we're dealing with real-world data, it actually makes no sense to think in that way, because we almost never know what kind of manifold gave rise to the data set. So in some sense, all features are essential, or all features are non-essential, right? This is why most people just use a value beyond the normal filtration values to indicate the features that cannot be destroyed in this filtration. There are also filtrations that automatically do this for you, in a sense, by doing different passes through the simplicial complex; they're a little bit more complex, even, so I did not add them here, but just bear in mind that all of this is possible. And of course, in data analysis we have to make certain adjustments. But we will later on encounter a deep learning approach that actually makes use of those guys here: it takes the essential features and handles them in a different fashion than the non-essential ones. So people have thought about this, and nowadays it is really easy to integrate into deep learning architectures. We'll see that in a second.

Anyway, moving on a little bit. So how does that work in practice? And I urge every one of you to try out this implementation, because it's really fast, it's really great. The calculations for persistent homology with the Vietoris-Rips complex are essentially bounded by the matrix reduction complexity. Since we have to reduce a matrix, there is no way to get around a certain worst case, and in fact you can generate examples that take a long time. But in practice, there are certain speed-ups that you can make, and smart implementations of this calculation really make a difference. In the early days of persistent homology, or computational topology, people scoffed at this and said, ah well, a matrix has to be reduced, so we're at O(n³), where n is the number of simplices, and this will never scale.
But this just turns out not to be true, because in practice smart implementations make a difference, and you can always gain additional speed-ups by reducing your simplicial complex or by restricting yourself to features in certain dimensions. You could say, oh, I only calculate features up to dimension two, even though I have 10-dimensional data or whatnot. So in practice this can be done and it works. And Ulrich Bauer created a very efficient tool, essentially a single C++ file called Ripser, and it's extremely fast. It even has a web interface, so you can try it out yourself. There are certain speed-ups that I won't go into in detail here, but essentially they involve smarter orderings for the column reduction steps: the idea being that if you figure out which columns you have to reduce in a certain manner, then you can also reduce them from your cache, and the whole algorithm is faster because it doesn't have to jump between different memory locations. Then there are also implicit representations of the simplicial chains, and those can be made much more efficient than explicit ones, and much more. So a lot of thought has gone into these calculations, I would say approximately ten or maybe even more years of research, and it really is at a point where it scales relatively nicely, at least if we constrain the dimension to, let's say, five or ten or something like that. And in fact, in practice it turns out that for graph classification you don't need to go higher than dimension two, which we will see in the third or fourth lecture. So practically it's already there and can already be used. Of course, deep learning itself, maybe something like a convolutional net, scales much better, because you have this embarrassingly parallel calculation that you can distribute over different GPUs, things like this. But I think it's only a matter of time before these sorts of things come to the persistence realm. And I just wanted to drop this paper here because it's really remarkable. There are also Python bindings now, but I'm not sure whether they are endorsed by the author of this package, because Ulrich put a lot of effort into making the C++ code really efficient, and I think if you compile it incorrectly, it might not be so efficient anymore, but that's beside the point.

All right, I want to end this lecture on something that is very dear to my heart, namely that there are also ways to calculate topological features of non-simplicial domains. Previously we only saw this triangular setting, right? We added a triangle and then we added a tetrahedron and so on and so forth. But it's also possible to define a filtration over cubical domains. I modelled this so-called cubical complex here after Robert Ghrist's excellent book, Elementary Applied Topology. This is also something that I would recommend to anyone; I will link it in the notes of this lecture. And in fact, it's free to read, or something like 20 bucks on Amazon, so I would do both, essentially. And these cubical formulations, these cubical filtrations, actually crop up in a lot of other domains. So there are potentially a lot of data sources: any image with pixels is essentially a cubical complex, any finite element simulation that numerical analysts do is a cubical domain, and voxel-based spaces, so fMRI or MRI data, are cubical as well. And in fact, I even have a preprint out, which you might want to look at, which does that sort of thing.
So it shows you how to calculate topological features over cubical domains and thus assess things that are going on in fMRI data. And the nice thing is, and this is where again you have to believe me, but it all works: all the considerations and all the concepts apply virtually unchanged. Of course, the fact that this works is surprising and should be celebrated, but it's also really good to know that all of these concepts translate over to other domains; you just have to remember this idea of calculating topological features at different scales. And this is all you get: whether it's a cubical complex or a simplicial complex, this is all you get in the end.

All right, so with this, I want to give you two current research directions in this field, because these are questions that I get a lot; they are the two things people are most interested in right now. One is the properties of filtrations: are there filtrations that are more robust to noise, that are easier to calculate, that are more expressive? We'll see some examples of these issues in the next lecture. But there's also this other important question, which always comes back to bite you when you're doing deep learning, namely: can we find a sparse representation? Not all the simplices, not all the things that we calculate, are equally important, and there might not be sufficient memory to represent all of them. So can we find sparse filtrations that have fewer simplices but essentially the same topological features? Because if we could find those, we could scale up the calculations to massive amounts of data and profit a lot. We could scale persistent homology to really big data sets at a fraction of the cost that we currently have, because, I want to be honest with you, despite the advances in complexity, there are still some very strange things happening as you move into deep learning, for example, with topology. We'll encounter some of those in the fourth lecture, where I'm showing off some work on topological autoencoders, and there we have a very strange scaling property, but more about this later.

So, the takeaway message, and then we can move into the Q and A and into a short break: point clouds can be converted into simplicial complexes, and what we can get out of this is something called persistent homology. It's the multi-scale equivalent of the simplicial homology that we saw. Instead of counting the number of topological features, we take a look at the scales at which they appear and track them over different scales. And again, the calculation of persistent homology also boils down to linear algebra; in fact, it's even easier for some reason. It's really neat if you think about it, that making the underlying representation more complex makes the algorithm easier. There should be a word for that; the word I would use is surprising. And essentially, filtrations, so these orderings, are the key to tracking topological features. So again, that's all I have for you. Take a look at the website if you're interested in more, and I would now open the floor for a brief Q and A. Yeah, sorry, I'm still confused about the filtrations and the relation to the Vietoris-Rips complexes. So for example, on your slide 20, which is a bundle of a few slides, if you go to the one, for example, which has A and B connected and C just being a vertex. Yeah, how is that?
So I don't quite understand how this could be a Vietoris-Rips complex, because I don't think I can find an epsilon such that I can connect A and B but not C. You are absolutely right. This is an example of where I said you have to think of adding simplices in some order, and in case of a tie, you have to add one simplex after the other. So I'm using this example, you're absolutely right, all these edges will be added at the same time, in a sense at the same epsilon, that is what you mean. If we assume that the triangle is really equilateral, they will all be added at the same scale. That is what you mean, right? Yeah, so I mean, I see how it's a valid filtration, but if you add one simplex at a time, I don't see how I could have a filtration consisting only of valid Vietoris-Rips complexes. Yes, no, you're absolutely right. This is where my sort of hand-waving explanation breaks down a little bit; I just wanted to make this easier for the listener. So of course there is a threshold that makes it into a Vietoris-Rips complex, but that Vietoris-Rips complex contains all the edges of the triangle, that's true. Nevertheless, if we have a filtration, we can pretend, in a sense, that these simplices are added one after the other, right? We just have to define which simplex precedes which other simplex, and then we can still do this calculation. Of course it's a matter of preference whether AB comes before BC, for example, but notice that this does not change the end result of the calculations. But you're absolutely right, this is where the intuition breaks down, yeah. Okay, thank you.

So in this case you're adding, you say you add AB and you put a one and a one there because AB is destroying A and B? Right now... no, okay. So what was that before? Now you're adding, at this point you're adding BC and you're destroying the B and C components, something like this? Actually, I have to correct you here. This is just the boundary, so this is not about destruction. This is more about saying that the edge BC has the simplices B and C on its boundary. So this is just keeping track of the boundary matrix, more like evaluating the boundary operator for all the simplices that we observe. Okay, and then how does this reduction work? The reduction is the other step. So if we take, let's move here, if we take this whole boundary matrix, it now captures all the boundary relationships in the simplicial complex, and we can start reducing it with the matrix reduction scheme. And this matrix reduction scheme, this would lead a little bit into very, very deep waters, into very deep theorems, but essentially it's related to this idea of a normal form, of a Smith normal form. I'm sorry that I can't give any more details here; we would really have to uncover a lot of the topology, but essentially it mimics this idea of looking at features that are created by certain simplices and of tracking them along the filtration. So basically we do this semi-magical method, and the result of the semi-magical method gives us a matrix, which we can use to read something out about the structures that are created and not created. And why this magical result matrix gives us these readings, and why we are able to read this out of it as we do...
That is, as you said, a little beyond the course. Yeah, I'm really sorry about this. This is really a little bit unfortunate, because to cover it properly we would need more time; well, no, we would still be finished by six PM anyway, but we wouldn't cover the interesting machine learning part. Yeah, sure, sure, sure. So this is kind of the trade-off that I make, but you're absolutely right. No, no, don't be, I mean, you're absolutely right, this is the trade-off that you have to make here. I can only offer an intuitive, hand-wavy explanation in terms of Gaussian elimination, where you try to turn this matrix into a matrix that consists of, in a sense, independent columns, and this is why you're looking for these lowest ones, for these lowest entries that are the same: having two of those entries the same is kind of weird, because it means that you could theoretically express one column through the other column, and you want to avoid that. So you want to find out how many columns there are in this matrix that are really independent from each other. Does that make sense? Does that help a little bit? Yeah, maybe as a sort of gut feeling, yeah. Yeah, I mean, this is kind of the thing that I'm aiming for here, because otherwise we won't be able to get to the actual part of the... So you're kind of trying to find out where the independent features are, like a circle and something like this, and once you get to this matrix, which kind of represents those independent features, then you are able to read them out more easily from the matrix, sort of. Yeah, yeah, exactly. So we're trying to look at how the filtration changes our space, or how the filtration gives rise to certain topological features, and for this we need to figure out which features are actually independent in the complex and which features are related via this creator-destroyer relationship.

And usually this is a very large matrix, and this Ripser library is able to do that pretty fast, but, as I understand, not by changing the runtime complexity, right, just by being fast, by improving the constants. Exactly, yeah, absolutely. That's a very succinct summary. Of course it cannot change the nature of the universe, so certain operations take a certain time, but what it does, for example, is use a smart order in which columns are being added, things like this. And then you can have smart data structures that keep track of these things. In one of the implementations, I think, though I'm not sure whether Ripser does it too, they have a delayed cancellation: they wait for a certain number of operations to happen to a column before they actually look at what is going on in that column, because they think it's easier to do this. Let's say you add the columns ten times to each other or something like that, and only after the tenth time do you look at what is actually coming out of there. So all kinds of careful hacks, I would say, that are really well thought out and really smart, and they make it possible to make this calculation really, really fly. Whereas if you try this in a very naive way, it's going to take a long time. But yeah, this is what Ripser does for you.
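Just to make that concrete, here is a minimal usage sketch with the ripser.py Python bindings mentioned earlier; the keyword arguments follow the package's documented interface as I remember it, so treat the details as something to double-check against the documentation rather than as gospel.

```python
import numpy as np
from ripser import ripser  # Python bindings for Ripser (pip install ripser)

# Sample a noisy circle, the running example from the slides.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 100)
X = np.column_stack([np.cos(angles), np.sin(angles)]) + rng.normal(0, 0.05, (100, 2))

# Persistent homology up to dimension 1, straight from the point cloud ...
diagrams = ripser(X, maxdim=1)["dgms"]

# ... or, equivalently, from nothing but a distance matrix.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
diagrams_from_distances = ripser(D, maxdim=1, distance_matrix=True)["dgms"]

# dgms[0] holds (creation, destruction) pairs for connected components,
# dgms[1] those for cycles; the single long-lived point in dgms[1] is the circle.
print(diagrams[1])
```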
And the great thing is that it only requires a distance function on your data. We will see in lecture number four, where we have the machine learning applications, that this is all you need to calculate persistent homology: you only need distances in your space. You don't even need the actual objects, you only need a distance matrix. But the whole thing is then quadratic in the size of the matrix, and since this kind of explodes with the number of features, that is the problem, right? Yes. If I add more data points, I get probably something like an exponential number of features, and then it's squared on top of that because the matrix is squared. So this is where the bad stuff comes in. Yeah, exactly. I mean, this is very much putting the finger in the wound. By the way, you should see the lecture slides for number three now, right? Just checking. Yeah, we see them. Okay. So yes, your matrices get bigger and bigger, and this is also what I meant by the scaling property. This is a little bit different from the usual scaling that you would get in, let's say, batch processing for deep learning, right? Topological methods essentially scale worse with increasing batch size instead of better, because you have to account for more interactions. Thank you. Yeah, you're welcome. Thanks for being so interactive.