This is a good point. So my name is Bastian. I am a senior assistant in the Machine Learning and Computational Biology Lab of the Department of Biosystems Science and Engineering at ETH Zurich, and it's my pleasure today to talk about topological data analysis for machine learning. You're more interested in the subject than you are in me, I hope. But one key aspect of my career is that I used to be a mathematician, then I switched tracks a little bit and added computer science to my portfolio. I have been doing machine learning since the start of 2018. I'm particularly interested in topological data analysis because I did a lot of topology during my mathematics studies, and I think that this is an exciting topic that can be extremely helpful for machine learning research. We will see why that is in the course of this lecture. I'm going to start from the very basics here. We're starting with algebraic topology. You don't need any prior knowledge whatsoever, except for maybe a few courses in high school mathematics or something like that; I tried to make this course as self-contained as possible. So without further ado, these slides are available on topology.rocks/ecml_pkdd_2020. You can also just go to topology.rocks; I think I configured that correctly. There you'll find other resources. The slides and the resources are continuously being updated, so if you find any errors, I will, of course, rectify them and make a new version. And I would love to hear from you about this course. During the course itself I will, of course, not be able to answer any emails, but afterwards I would be happy to answer any questions that you have. So I have an email address. You can also reach out to me on Twitter with the handle @pseudomanifold. And the slides are freely available. They can also be freely disseminated, so they are under a Creative Commons license.
And if you want to share them with your friends, with your colleagues, be my guest. I can also make the sources available on request, if that is of interest, because sometimes people ask me about the pictures in there. Apparently, the pictures are pretty, so you can have those as well. All right. So with that out of the way, let's get into algebraic topology. And more to the point, let's get into computational topology. Algebraic topology is a sub-branch of mathematics that deals with shape analysis, and computational topology is the part of algebraic topology that tries to make these things actually computable on an actual computer and not on a piece of paper. The classical example that many of you have by now already seen is this idea of distinguishing between different spaces. So in algebraic topology and also in computational topology, we are interested in figuring out what makes a cube, so this thing on the left, fundamentally different from, let's say, a torus. Or more to the point, why a cube and a sphere are more alike in some sense than, say, the sphere and a torus. So again, let's focus on this cube. The cube has some edges, and it's certainly not smooth. But in the end, it shares some characteristics with, let's say, the sphere, because both of them enclose a certain kind of space, and they don't have any holes by themselves. Whereas the torus has at least this hole in the middle. And as we will see, it also has some other characteristics. So without any further information, we can tell a torus apart from a sphere by just figuring out that one of them has a hole in the middle and the other one doesn't. Of course, that's kind of obvious to us human beings. But it is certainly not obvious from the point of view of an analysis step, from the point of view of something that we want to calculate. So in essence, we're asking ourselves which qualities of the sphere make it different from the torus.
And one of the qualities of the sphere that we can look at is the so-called Betti number. It's named after a mathematician called Betti. So it's not "Betty". It's really "Betti" if you want to pronounce it in the Italian way. And the d-th Betti number counts essentially the number of d-dimensional holes. You will see in a second what that means. So if you have this information, you can use it to distinguish between different spaces. And it turns out that this Betti number actually has a very intuitive explanation in lower dimensions, and you've all encountered at least some of those concepts before. Namely, the zeroth Betti number corresponds to the number of connected components in a space. The one-dimensional Betti number corresponds to the number of tunnels in a space, a tunnel being something that you could roughly put your finger through, as in the torus, for example. And the two-dimensional Betti number, beta 2 here, corresponds to the number of voids that are enclosed by a space. Of course, this lecture is all about how to calculate those Betti numbers, and we will see how beautiful this is. But if we now tabulate them just by our intuition or by our human understanding, then let's see whether we can distinguish some of those spaces already. So this thing here is just the point. It's a little bit magnified, so it's a little bit bigger than you would usually draw a point. But it's a point, meaning that it has no shape information whatsoever. It's just one single dot in the plane somewhere. And obviously, I would say this constitutes one connected component. So the Betti number in dimension 0 of the point is 1, because there's only one point. If there were two points, the Betti number would be 2. But there's only one, so it's 1. So now let's move up the ladder a little bit. Let's take the cube. The cube also has one connected component, because you can move from any face to any other face.
You can move around on this object, and nothing will change. You will always remain on the same object. Hence, it also has a Betti number of 1 in dimension 0, so one connected component. But it turns out that there's more. It certainly has no tunnels, at least the cubes that I'm familiar with don't have any tunnels, because otherwise they might not be usable as dice, for example. But it certainly encloses something, at least when we only consider the outer faces of the cube. Then it encloses one void. And hence, it gets a two-dimensional hole, and beta 2 is 1. So now we can move even higher, and we go to the sphere. And interestingly, in this paradigm, with this view, the sphere is completely the same, at least in terms of the Betti numbers, because it also has one connected component, no tunnels, and one void that it encloses. So nothing ventured, nothing gained. It's essentially the same. And this tells you a lot about the kind of analysis that we will be doing in this lecture. Namely, we are really looking for very, very fundamental properties. Topology is not about the geometrical aspect of shape that you might expect, because obviously the curvature of a sphere is different from the curvature of a cube: the cube has these sharp edges that stick out, the ones where the different faces meet, whereas the sphere has no such thing. The sphere is a completely smooth object and has the same curvature all over the place. But this is not what we're looking for here. So in this lecture, or at least in this part of the lecture, we will only be looking at extremely fundamental properties of a space, namely just the connectivity and just this number of holes that we can find. All right, now moving on towards the elephant in the room, in a sense, namely the torus. So the torus, and I think we all agree here, is also one connected component. You can move from one point to any other. It's the same.
Let's skip a dimension for a second and let's talk about the void that it encloses. The right intuition that you should have here is that of a bicycle. The tire of the bicycle contains a tube, a hollow tube that contains some air, so that the bicycle can actually be used as a bicycle. And hence, like this bicycle tube, the torus also contains one two-dimensional void. But there's more. There's something very fundamentally different from the sphere. Namely, the torus has at least one tunnel that we can find, namely this one here, because you can use the torus as a ring and put it on your finger if you want. You can do this with a bagel and try to eat the bagel like this. It's a marvelous way of eating a bagel. You just have to be careful not to eat your finger. And so this accounts for at least one of the holes, one of the tunnels in dimension one. The other one is a little bit harder to see. Maybe if you already know this, you've guessed it, but the other tunnel is created by moving around like this on the torus. So you can move around the surface. And if you think of it like this, you can put it on your finger, so that's one tunnel, but you can also use a piece of thread to put it on a Christmas tree as a Christmas tree ornament. It's a very geeky ornament, but it works. And this thread has to be looped around like this. And this accounts for the other tunnel. Now again, you may ask yourself why there is not an infinite number of such tunnels, and you are in fact right to ask. But the thing is that we're only interested in qualitative differences, and there are only two kinds of tunnels that are qualitatively different on the torus: the one being this guy here, and the other one being all the ones that loop around like this. We will see a more precise way of expressing this fundamental property in the rest of the lectures. Right.
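As a minimal sketch, the Betti numbers tabulated by intuition so far can be written down as triples (beta 0, beta 1, beta 2) and compared. The dictionary and the helper `same_betti` are illustrative names, not part of the lecture material; the zeros for the point in dimensions one and two are filled in here because a single point has neither tunnels nor voids.

```python
# Betti numbers (beta_0, beta_1, beta_2), tabulated by intuition only.
# Nothing is computed here; the values are the ones read off in the lecture.
betti = {
    "point":  (1, 0, 0),  # one component, no tunnels, no voids
    "cube":   (1, 0, 1),  # hollow cube: one component, one enclosed void
    "sphere": (1, 0, 1),  # same numbers as the cube
    "torus":  (1, 2, 1),  # two qualitatively different tunnels
}

def same_betti(a, b):
    """Return True if the two named spaces have identical Betti numbers."""
    return betti[a] == betti[b]

print(same_betti("cube", "sphere"))   # True: indistinguishable by Betti numbers
print(same_betti("sphere", "torus"))  # False: beta_1 differs
```

The comparison only says that Betti numbers cannot tell the cube from the sphere; it makes no statement about their geometry.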
Why do you say that a tunnel is a one-dimensional hole, a void a two-dimensional hole and a connected component a zero-dimensional hole? This is more or less a convention. We will see why it makes sense in a few seconds, once we have introduced a way to actually capture those holes. The idea is that these holes capture hierarchical properties of an object. For an object to have a void, it needs to be a more complex shape, because it needs to enclose something, right? Whereas for an object to have a connected component, you only need some thing, some piece that creates this object, to exist, like a point. So creating connected components is in some sense easier than creating voids. But we will formalize this in a second. Does that answer the question? Yeah, that's good. Okay, great. All right, so again, feel free to interrupt at any time. We can make this as interactive as possible. That's why we're here. All right, so let's move on and let's see what is on the agenda to actually formalize this. In this lecture, we will be primarily following a very simple recipe. Namely, we will be using a so-called simplicial complex to model these spaces, because as I said in the beginning, it's about computational topology, so we need some way to actually compute stuff, right? We will do this by defining so-called boundary operators and maps. And if this sounds very complicated, do not be afraid, because it will all boil down to simple matrix algorithms. In fact, we will be able to show how to calculate these Betti numbers using matrix reduction algorithms. And that will bring us roughly to, I would say, the end of many concepts in algebraic topology. It might be a little bit heretical, and maybe I should censor myself if I really want to put this lecture online.
But essentially, a lot of the things that have been developed for algebraic topology boil down to these things. Not everything boils down to the calculation that I will show in this lecture, but a lot of the things in algebraic topology boil down to something where you can actually turn the crank in some sense and get a number out, get a calculation out. So that's the whole point of algebraic topology. It's different from point-set topology or differential topology. Algebraic topology really tries to reduce complex geometric questions to questions of algebra: Can this matrix be inverted? Can I run this algorithm here? What happens if I reduce this matrix like this? Things of that sort, which makes it a prime target for an implementation in computer science. All right. So having said that, what is a simplicial complex? A simplicial complex is a non-empty family of sets K with a collection of non-empty subsets S. And we call that an abstract simplicial complex if, first, the singleton sets {v} are in this collection for all vertices v, and second, whenever we have an element sigma and a non-empty subset tau of sigma, then tau also needs to be in the collection. This sounds a little bit abstract, but we will see an example in a second. What this essentially means is that condition one tells us that the vertices, so the singleton sets, are part of the simplicial complex. And condition two tells us that whenever we have an object that is already part of the simplicial complex and we take one of its subsets, then this subset also needs to be part of the simplicial complex. That's the condition that ensures that the simplicial complex is closed under taking faces. And we call the elements of the simplicial complex simplices. We note that a k-simplex consists of k + 1 vertices. So the vertices themselves, the single points, if you will, are the 0-simplices of a simplicial complex.
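The two conditions can be checked mechanically. The following is a minimal sketch, assuming that simplices are represented as sets of vertex labels; the function names `is_simplicial_complex` and `dimension` and the two example families are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def is_simplicial_complex(simplices):
    """Check that a family of vertex sets is closed under taking faces.

    Every non-empty proper subset of every simplex must itself be in the
    family. This also covers the singleton sets of all vertices that occur.
    """
    K = {frozenset(s) for s in simplices}
    if not K or frozenset() in K:
        return False  # the family and all of its members must be non-empty
    for sigma in K:
        for k in range(1, len(sigma)):
            for tau in combinations(sigma, k):
                if frozenset(tau) not in K:
                    return False
    return True

def dimension(sigma):
    """A k-simplex consists of k + 1 vertices."""
    return len(sigma) - 1

# A filled triangle with all of its faces (strings are read as vertex sets).
filled = ["a", "b", "c", "ab", "bc", "ac", "abc"]
# The same family with the edge "ac" missing: not closed under taking faces.
broken = ["a", "b", "c", "ab", "bc", "abc"]

print(is_simplicial_complex(filled))  # True
print(is_simplicial_complex(broken))  # False
print(dimension(frozenset("abc")))    # 2
```

Representing a simplex as a `frozenset` deliberately forgets any ordering of the vertices, which matches the unoriented setting used in this lecture.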
The edges are the 1-simplices of the simplicial complex, and so on and so forth. And this is how it looks in practice, because we can decompose these simplicial complexes into their skeletons. So we can build them from the ground up, dimension by dimension. This is only the points, in dimension zero. And now we can start to add the edges. This is what you would probably call a graph. And if you think that this is a graph, you are absolutely right, because one-dimensional simplicial complexes and graphs are essentially the same. They consist of edges, they consist of vertices, you can do things to them. For the purposes of this lecture, at least, we will treat them as exactly equivalent. And this will be very, very useful later on in part four, where we will see how to use topology to classify graphs, for example. But moving on through the hierarchy here, we can also add triangles. So these are some triangles here. Notice that I didn't add all the triangles here because I didn't want to make this more complex than it has to be. And we can also add tetrahedra. This is why I colored this in a different color. You have to bear with me for a second. This is actually the tetrahedron, in the sense that it consists of the four triangles themselves and the enclosing tetrahedron. So it's not only the triangles that we're adding, but we're actually adding one simplex that consists of all the triangles. This makes it great for modeling things in a hierarchical manner. And in case you're familiar with this sort of perspective, think of cliques in a graph. This is nothing but adding a different element for every clique in the graph. So this is a 2-clique here and this is a 3-clique here and so on. All right, now a non-example. And I know this is pedagogically maybe not the best thing to do, but this is a non-example.
It's not a simplicial complex because the higher-dimensional simplices do not intersect in a lower-dimensional one. Or rather we have, for example, this triangle here, and we're not sure whether this is now an edge of the simplicial complex or whether this is an edge of the simplicial complex. And there's a triangle that does not contain this point here, and there's an edge that intersects with certain things and that has no intersections here. So this is a non-example. This is not what we mean by a simplicial complex; rather, a simplicial complex has a neat hierarchy. Think of it like a triangular mesh, for example, if you're familiar with that sort of terminology. All right, there are more examples, of course. As I said, graphs can be considered low-dimensional simplicial complexes. Simplicial complexes can also be obtained from point clouds. This will occupy us for quite some time in the second lecture, starting at around, I don't know, three-ish, I guess. And we can also convert a hypergraph into a simplicial complex by adding more relations. If you're familiar with the terminology, a hypergraph is essentially what you would get if you took away the condition of being closed under taking faces. So a hypergraph can model arbitrary relations between its different vertices, but it doesn't have the added property that all the faces of a simplex are part of the simplicial complex. But there's a way to generate one from the other. So just to mention this. All right, and now for some mathematics. We need some concepts that will be helpful later on. And maybe you've already heard about this. I don't know if I can get a quick show of hands. No, probably not. This won't work. But anyway, I'm assuming that you haven't heard of it. So we need groups to understand how simplicial complexes can be used and how we can calculate their holes. A group is a set G with a binary operation that we denote by a dot.
And this operation combines two elements to yield another one, such that the resulting structure, G with this dot, has the following properties. The operation needs to be closed. That means for A and B from the group, the result A dot B needs to be part of the group. This operation also needs to be associative, meaning that we can move the parentheses as we want. So (A dot B) dot C is the same as A dot (B dot C) for all elements of the group. Furthermore, there needs to be an identity element E in the group such that E dot A is the same as A dot E, which is just A. And moreover, every element A also needs to have an inverse element, A inverse or A to the power of minus one, in this group, such that A dot A inverse is the identity element, and likewise if we change the order. The identity element is also sometimes called the neutral element of the group. And notice that this operation is not necessarily required to be commutative. So in general, A dot B is not the same as B dot A. If this holds, however, if the operation is commutative, then we call the group abelian, after the mathematician Abel, who was able to derive those groups and who was able to describe those groups, two puns in a row. And you have all experienced groups before. There are a lot of examples around. The classical one that you will definitely have encountered in school is the set of integers Z. The set of integers Z with the usual addition is a group, because there are positive integers and there are negative integers, there's zero as the neutral element, and every integer has an inverse element that is obtained by just flipping the sign. A slightly more complex example would be the set of real-valued square matrices with element-wise addition. This is also a group, so you can add two matrices together if they have the same dimensions. You can add them element-wise and you will get another matrix.
So the result is again a matrix, the neutral element of this group is the matrix that consists only of zeros, and of course, since it's real-valued, you get the inverse elements as well. Now the set of invertible real-valued square matrices with matrix multiplication as the operation is also a group, because if two such matrices have compatible dimensions, then this operation results in another invertible matrix, so this is also well-defined, and here the neutral element is the identity matrix. Now one non-example, and maybe I'll ask this as a participant question. The natural numbers with addition, however, do not form a group. Does anyone care to venture why this is the case, maybe? "You don't have the negative numbers?" Perfect, yeah, exactly. You don't have an inverse, because the structure doesn't contain them. If we added the inverses of all those elements, then we would end up with the set of integers, but that's exactly right. So this is why in mathematics there's this hierarchy of things that you can define, and then it's not a group, it's something else. But we want to work with groups because groups are neat, and in fact, we want to work with commutative groups. But anyway, let's see. So now back to the simplicial complexes. Why did we define this? We defined groups in order to talk about the elements of a simplicial complex in a more formal manner. Namely, given a simplicial complex, we define the so-called chain group in dimension p to consist of all formal sums of p-simplices in the complex. And we assume that the coefficients of these sums are in Z2. This means that in each element, a given simplex either occurs or it does not. So basically, Z2 refers to the group that only has two elements, zero and one, and the group operation that we define on this chain group is addition with Z2 coefficients.
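For a finite set, the group axioms listed above can be verified by brute force. This is a minimal sketch under that finiteness assumption; `is_group` is a hypothetical helper, and the second example (multiplication on the set {0, 1}) is not from the lecture and is included only as a small finite non-example in the same spirit as the natural numbers.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of the group axioms on a finite set."""
    elements = list(elements)
    # Closure: combining two elements must stay inside the set.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Associativity: (a . b) . c == a . (b . c).
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # Identity: some e with e . a == a == a . e for all a.
    identities = [e for e in elements
                  if all(op(e, a) == a == op(a, e) for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # Inverses: every a needs some b with a . b == e == b . a.
    return all(any(op(a, b) == e == op(b, a) for b in elements)
               for a in elements)

def z2_add(a, b):
    """Addition with Z2 coefficients: 1 + 1 = 0."""
    return (a + b) % 2

print(is_group([0, 1], z2_add))              # True
print(is_group([0, 1], lambda a, b: a * b))  # False: 0 has no inverse
```

An infinite set such as the natural numbers cannot be checked this way, so the argument about missing inverses from the lecture still has to be made by hand.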
And this sounds a little bit weird, but it's very convenient for implementation purposes because we can implement the addition of two of these simplicial chains as a so-called symmetric difference. I have to admit that this is where we boil it down a little bit for a four-hour lecture format, because other choices of these coefficients would be possible. In fact, it's also possible to have coefficients in certain fields, for example in the field of real numbers, things like this. But we're not covering this here. So in this lecture, everything is done over Z2 coefficients. Either a simplex is there or it's not. And we need these chain groups later on to algebraically express the concept of a boundary in a simplicial complex. Before we do this, let's first take a look at some valid simplicial chains of a simplicial complex. So suppose that we have the following simplicial complex that consists of the vertices A, B and C, of the edges AB, BC, AC and of the triangle ABC. And now let's look at some valid simplicial chains that we can build. So one would be AB, the other one would be AC. The next one would be BC. Then we can add some together. And notice that this is what mathematicians like to call a formal addition. So it doesn't really mean that we can add them and end up with some result, as over the natural numbers or over the integers; it just means that the chain group contains all of those combinations. So in essence, these things are the members of the chain group, but they cannot necessarily be evaluated. However, we can of course put two of them together. For example, if we were to add, let's say, this chain here, AB plus AC, to this other chain here, then you would see that these simplices occur twice and so they cancel each other out, because recall, we are calculating over Z2 coefficients.
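A minimal sketch of chain addition as a symmetric difference, assuming that a chain is stored as a set of simplices and each simplex as a set of vertices. The second chain, AC plus BC, is an assumed example, since the chain shown on the slide is not recoverable from the recording; `add_chains` is an illustrative name.

```python
def add_chains(c1, c2):
    """Add two chains over Z2: simplices occurring in both cancel out."""
    return c1 ^ c2  # symmetric difference of the two sets

# Edges of the example complex, each stored as a set of two vertices.
AB, AC, BC = frozenset("AB"), frozenset("AC"), frozenset("BC")

chain1 = frozenset({AB, AC})  # the chain AB + AC
chain2 = frozenset({AC, BC})  # assumed second chain AC + BC

total = add_chains(chain1, chain2)
print(total == frozenset({AB, BC}))                  # True: AC cancels
print(add_chains(chain1, chain1) == frozenset())     # True: c + c = 0
```

Every chain is its own inverse under this operation, which is why no signs are needed anywhere.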
So a simplex is either in the chain or it is not. And if we add it twice, we essentially remove it. This is what I meant by the symmetric difference of two simplicial chains. All right. I think I might be seeing some questions in the chat. Let me check this. I don't monitor this directly. Maybe we can take those to the Q&A session as well. So I now have the chat window open on my secondary monitor and I can monitor this a little bit. But if you have something that you want to ask during the lecture, you can also just unmute yourself and then we will tackle it. All right. So moving on with the chain groups, we can use those to define something called a boundary homomorphism. Given a simplicial complex K, the boundary homomorphism in dimension p is a function that maps each simplex sigma to its boundary. And this is a very simple operation again. Notice that we will be summing with Z2 coefficients, so each term will be either there or not. We take the simplex sigma, which consists of those vertices here, and we sum over all the subsets of this form, where in the i-th term we leave out the i-th vertex. So v i hat means that we leave out the i-th vertex. This seems like a very strange notation, but what it boils down to is that we take our simplex and we take the formal sum over all the subsets that are obtained by dropping exactly one vertex. And this is a function that goes from the p-th chain group to the chain group in dimension p minus one. And it's a homomorphism between the chain groups. I'm not proving this, by the way. This is actually kind of a very deep result in algebraic topology. A homomorphism means that it behaves linearly, like a linear function. But you have to trust me on this. It is a homomorphism, and this will make certain calculations a little bit easier. And of course, another caveat here.
In our case, the boundary homomorphism is relatively easy because we don't have any signs to take care of, but if we had a different coefficient set, we would also have to take care of the signs in this equation. So just so you know, this can be made more complex, but it doesn't have to be, at least not for this lecture. All right, so now an example, because that's more illuminating than the algebraic properties in there. I want the take-home message to be that this coincides with our intuition and not something else. All right, so taking again our example simplicial complex that contains everything including this triangle here, let's take a look at the boundary of the triangle. I just saw that this is unfortunately missing one simplex. The sum should actually be BC plus AC plus AB, so it should be the sum over all the edges, as here. This should be the result of this calculation. And we notice that this is a non-trivial boundary, by which I mean that the sum doesn't go to zero. However, when we now apply the boundary operator a second time, to this simplicial chain here, then we notice that the result is a trivial simplicial chain because we obtain all the vertices twice. So BC evaluates to C plus B, AC evaluates to C plus A, and AB evaluates to B plus A. And if you look at this, you can see that all the individual vertices appear twice, so they cancel each other out and we are left with the zero chain, the trivial chain. This is always denoted by convention with zero. It means the empty chain. By convention, the idea is that people know which zero is being talked about, because there could be a zero in dimension one and a zero in dimension two and so on and so forth, but I'm just using zero here to tell you that nothing else is going on.
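The calculation for the triangle can be replayed in a few lines. This is a minimal sketch over Z2 coefficients only, with chains stored as sets of simplices; the names `boundary_of_simplex` and `boundary` are illustrative and not taken from any library.

```python
def boundary_of_simplex(sigma):
    """Faces of sigma obtained by leaving out one vertex at a time."""
    sigma = frozenset(sigma)
    if len(sigma) == 1:
        return frozenset()  # a vertex has an empty boundary
    return frozenset(sigma - {v} for v in sigma)

def boundary(chain):
    """Extend linearly over Z2: add up the boundaries of all simplices."""
    result = frozenset()
    for sigma in chain:
        result ^= boundary_of_simplex(sigma)  # faces seen twice cancel
    return result

triangle = frozenset({frozenset("ABC")})  # the chain consisting of ABC

d1 = boundary(triangle)  # AB + AC + BC
d2 = boundary(d1)        # every vertex appears twice and cancels

print(sorted("".join(sorted(s)) for s in d1))  # ['AB', 'AC', 'BC']
print(d2 == frozenset())                       # True: boundary of boundary is 0
```

With other coefficients the faces would carry alternating signs; the set-based representation used here only works because signs are irrelevant over Z2.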
All right, so you can see that this coincides with our idea of the boundary, because if we take this triangle here, A, B and C, we would say, ah, perfect, we can walk around this: we first have to walk along this edge here and then this edge here and then this edge here. And this is exactly what is expressed by the idea of the boundary operator. So we first say, for example, we move along BC, which accounts for the first edge, then we move along AC and then along AB. Notice that the orientation does not make a difference here because these are just sets. Again, you could also do it with orientations, but that's beside the point. Anyway, since the repeated application of the boundary homomorphism is trivial, we can state that boundaries do not have a boundary themselves, and this makes it possible to define a so-called chain complex. By the way, as a fair warning, I think this is the most algebraically dense of the lectures. The remainder will be a lot more hands-on and will cover a lot more examples, but we first have to get these fundamentals squared away somehow, and I want you to have an intuitive understanding of what is actually going on. So anyway, this chain complex is the complex of chain groups that we obtain by evaluating the boundary operator over all dimensions, essentially, and this leads us to a digression that we need to understand before we can finally codify the idea of a hole in this complex. Bear with me for a second and then everything will become clear, I hope. For this we require the kernel of a homomorphism between groups. The kernel of a homomorphism F going from some group A to some other group B is the set of all elements that are mapped to the zero element. So the kernel is just the set of all a from A such that F of a is zero. And this is a subgroup, and in particular a subset, of A because it contains those elements of A that get mapped to zero.
The image of F, on the other hand, is the set of all its outputs. So the image of F is just F of a for all a in A, and this is a subset of B. So kernel and image are two ways of looking at a homomorphism. And with that squared away, we can finally calculate some interesting groups from our boundary operators. Since the boundary operator is a homomorphism, we can define its kernel and its image. And we call the resulting groups the cycle group and the boundary group. More specifically, the cycle group Z p is the kernel of the boundary homomorphism in dimension p. And the boundary group in dimension p is the image of the boundary homomorphism in dimension p plus one. This is not a mistake here. The idea is that a boundary is something that comes from a higher dimension, and a cycle is something that is created within the dimension itself. And notice that in the group-theoretical sense, the boundary group is part of the cycle group. In other words, every boundary is also a cycle. This is what we saw earlier. Recall, when we calculated the simplicial chain of the simple simplicial complex that we had, we saw that if we apply the boundary operator twice, then we get a zero simplicial chain, a trivial simplicial chain. So in other words, every boundary is a cycle. And again, we cannot cover all these things here, unfortunately, but this is a really fascinating subject, and if you want to look it up, I have some literature on the website for this. The fact that these sets are actually groups is a consequence of some very deep theorems in group theory. It's really fascinating, but I won't have time to cover that. In fact, we would need even more things to fully describe what is going on in algebraic topology, and this is why I cannot give a lot of the proofs here, unfortunately. So you have to believe me to some extent, at least. But if you find an error, please let me know.
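Written out in symbols, the definitions so far read as follows. This is a recap sketch: the names $Z_p$ for the cycle group and $\partial_p$ for the boundary homomorphism follow the lecture, while $C_p$ for the chain group and $B_p$ for the boundary group are notation assumed here for readability.

```latex
\[
  \partial_p \sigma = \sum_{i=0}^{p} [v_0, \dots, \hat{v}_i, \dots, v_p],
  \qquad
  \partial_p \colon C_p \to C_{p-1},
  \qquad
  \partial_p \circ \partial_{p+1} = 0
\]
\[
  Z_p = \ker \partial_p, \qquad
  B_p = \operatorname{im} \partial_{p+1}, \qquad
  B_p \subseteq Z_p \subseteq C_p
\]
```

The inclusion $B_p \subseteq Z_p$ is exactly the statement that every boundary is a cycle, and it follows directly from $\partial_p \circ \partial_{p+1} = 0$.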
All right, so having squared away those kernels and those images, we now need one other definition for all of this to make sense. Namely, we need a normal subgroup and a quotient group. And again, this sounds very formal, but we will see that this works out and that it actually corresponds to something that is extremely intuitive. Namely, take a group G and let N be a subgroup of that group. Then we say that N is a normal subgroup if this expression here, so g n g inverse, is again part of the subgroup N for all g in the group and for all n in the subgroup. And for an abelian group, every subgroup is normal. So we don't need this extra condition, because we can just flip the order of things around. Now this is really neat, because all the groups that we have considered before are abelian. With the way we described the addition of simplicial chains, it does not really matter whether we have A plus B or B plus A, because it's a formal addition anyway. We just look for two terms, two simplices, that cancel each other out, and we are not restricted to any ordering. So it's an abelian group, it's commutative, which is great. Further on, if we have such a normal subgroup, then we can define the quotient group of this group as G modulo N; this is how we write it. It consists of all the cosets of the form gN for g in the group. And this partitions G into equivalence classes. So in some sense, the intuition that you should have here is that you remove certain parts of the group. Informally, G modulo N is what remains of G once everything in N is treated as trivial. So you take away N, you take away this normal subgroup, you make it trivial. And this will finally lead to something that we all know and that we've all worked with. So take, for example, the subgroup 2Z, which is of course a subgroup of the integers. That's just the subgroup defined by being a multiple of two, right?
So hence, if we remove the subgroup here, Z modulo 2Z, then it consists of only zero and one, because this is the only thing that is left. And why are we doing this? We do this because a quotient group makes it possible to reduce a group by partitioning it into equivalence classes. And these equivalence classes are defined by another subgroup in the end. And this will be extremely, extremely useful. All right, and now finally, we can put all of this together. So the so-called homology group in dimension p is a quotient group defined by removing cycles that are boundaries from a higher dimension. So we define this group H_p as Z_p modulo B_p. So by using the kernel of the boundary operator in dimension p and removing the image of the boundary operator in dimension p plus one. And this is really a group. Again, the fact that this is the case is a consequence of a lot of other theory. But with this definition, we finally have a way to calculate the Betti number. The Betti number becomes the rank of this group. So the p-th Betti number is the rank of the p-th homology group. And the rank here is just the cardinality of the smallest generating set. Or if you're familiar with this other notation, it's just the size of a basis of the group. We will see how to calculate this. I mean, basis is kind of an overloaded term, but I think it should be clear. And anyway, the intuition behind this, and this is the main takeaway here: we calculate all the cycles, then we remove the boundaries that come from a higher-dimensional object, and then we count what is left. So let's see how this looks in practice. So let's take a simplicial complex here, and notice that it's a little bit different from the one before. So this simplicial complex only contains the vertices A, B, C and the edges AB, BC, AC. It does not contain a 2-simplex. So it does not contain this triangle itself, right? It just contains the edges.
And now we will see how to calculate the boundary matrix of this simplicial complex and its homology groups. And then we will finally have this connected to something that we're familiar with, namely matrices. All right, so first of all, the boundary matrix: you can think of it as a matrix that is indexed over the individual simplices of the simplicial complex. And it contains a zero if the respective simplex does not occur in the boundary, and it contains a one if it does. So it's a square matrix defined by simplices times simplices. And we just mark a one or a zero. So first let's do that. For the vertex, it's kind of easy. A vertex does not have a boundary, because there is nothing that could be its boundary. So it's a zero column. Likewise for B, likewise for C, all is zero, because that's easy. Now it gets a little bit interesting. We add the first edge, and since it's an edge, AB, its boundary is defined by the simplex A and by the simplex B, right? Because the edge AB consists of the two vertices A and B. Similarly for the edge BC. And similarly for the edge AC. It consists of the two vertices A and C, and so we mark ones for A and C and a zero for B here. All right, so this is the boundary matrix of our simplicial complex. And this gives us the first hint as to what we can do, because now we have a matrix, and so what can we do with matrices? Well, we can do all kinds of crazy things. We can do things like Gaussian elimination. But what we want to calculate here is the kernel of the boundary operator in dimension zero and the image of the boundary operator in dimension one. This is all that we require to calculate the homology group in dimension zero. Recall that H_0 is defined by Z_0 modulo B_0. So this is all we have to calculate here.
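The construction of the boundary matrix described above can be sketched in a few lines. This is only an illustration under stated assumptions: simplices are represented as tuples of vertex labels, coefficients are taken in Z modulo 2Z, and the function name `boundary_matrix` is made up for this example.

```python
from itertools import combinations

# Simplices of the hollow triangle: three vertices, three edges, no 2-simplex.
# Each simplex is a tuple of vertex labels.
simplices = [("A",), ("B",), ("C",), ("A", "B"), ("B", "C"), ("A", "C")]


def boundary_matrix(simplices):
    """Return the 0/1 boundary matrix; entry [i][j] is 1 if simplex i is a face of simplex j."""
    index = {s: i for i, s in enumerate(simplices)}
    n = len(simplices)
    matrix = [[0] * n for _ in range(n)]
    for j, simplex in enumerate(simplices):
        if len(simplex) == 1:
            continue  # a vertex has an empty boundary, so its column stays zero
        # The boundary consists of all faces obtained by dropping exactly one vertex.
        for face in combinations(simplex, len(simplex) - 1):
            matrix[index[face]][j] = 1
    return matrix


D = boundary_matrix(simplices)
for label, row in zip(simplices, D):
    print("".join(label).ljust(3), row)
```

Each column belongs to one simplex and lists its faces, so the three vertex columns are zero and each edge column has exactly two ones.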
And it turns out that this is relatively easy to do because, if we go back to the previous matrix, the kernel of the boundary operator in dimension zero, so Z_0, is just the span, so the algebraic span, of the vertices A, B and C, because each one of those simplices is mapped to zero. So we already know that they all will go to zero, so this is easy. And since we cannot express any one of those simplices as a linear combination of the other simplices, this means that our cycle group in dimension zero is just three copies of the group Z modulo 2Z, or Z_2 if you like that more, and there's nothing else, because we can always add those three simplices together but we cannot express any one as a linear combination of the others. And similarly, we can do this for the boundary group in dimension zero. We have to calculate the image of the boundary operator in dimension one, which is the span of the three simplicial chains that we find here. However, notice that you can express one of the simplicial chains in terms of the others. So let's take this one here, A plus B, and let's add B plus C. And you can see that the B cancels itself out because we are calculating with Z_2 coefficients. And so we are left with one simplicial chain, A plus C, which we already have. But this means that there are only two independent elements, and so the image of this boundary operator is actually just the group given by two copies of Z modulo 2Z. So for the cycle group, we have three copies, and for the boundary group, we have two copies, because we can express one simplicial chain by adding two of the others. But this means that we can now calculate this similar to what we know from school, and I really have to say that this is not normal division; it's a quotient by a normal subgroup. So this is why we're allowed to do this. But anyway, we can do this. So by definition, if we want to calculate the zero-dimensional homology group, we take Z_0 modulo B_0.
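The counting argument above can be checked by listing the groups explicitly. The following is a small sketch, assuming chains over Z modulo 2Z are represented as sets of vertices, so that addition becomes the symmetric difference; the helper name `span` is an assumption for illustration.

```python
vertices = ["A", "B", "C"]
# Boundaries of the edges AB, BC and AC, written as sets of vertices.
boundaries_of_edges = [frozenset("AB"), frozenset("BC"), frozenset("AC")]


def span(generators):
    """All Z/2Z-linear combinations of the generators, each represented as a set of vertices."""
    elements = {frozenset()}
    for g in generators:
        # Adding g over Z/2Z is the symmetric difference of the vertex sets.
        elements |= {e ^ g for e in elements}
    return elements


z0 = span([frozenset(v) for v in vertices])  # cycle group in dimension zero
b0 = span(boundaries_of_edges)               # boundary group in dimension zero

# (A + B) + (B + C) = A + C, because B + B = 0 over Z/2Z.
print(frozenset("AB") ^ frozenset("BC") == frozenset("AC"))

# The quotient Z_0 / B_0 has |Z_0| / |B_0| elements.
quotient_size = len(z0) // len(b0)
print(len(z0), len(b0), quotient_size)
```

Eight elements correspond to three copies of Z modulo 2Z, four elements to two copies, and the quotient with two elements is a single copy, which is the rank-one group behind the single connected component.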
And this is three copies of Z modulo 2Z, modulo two copies of Z modulo 2Z. And so we're left with one copy of Z modulo 2Z, and hence the Betti number in dimension zero is the rank of this group, and it's the group that only contains two elements. So its rank in the algebraic sense must be one. And so this whole complicated calculation tells us that this simplicial complex has a single connected component. So all of this work just for telling us that a triangle has a single connected component. So clearly this needs to be done in a much more automated way, right? Because otherwise we would not be able to handle more complex data sets, more complex structures. But bear with me for a second, because we will do the same thing now for the cycle group in dimension one, pardon me. And then we will finally see how to automate it, and thus we will be ending the lecture on a positive note. So again, to compute the one-dimensional homology group, we need to calculate Z_1, so the kernel of the boundary operator in dimension one, and the image of the boundary operator in dimension two. And so to calculate Z_1, we just verify what is going on. We say that Z_1 is the kernel of this operator here. It's the span of AB plus BC plus AC. And you can verify that this is the only cycle that we can find in the simplicial complex. The only cycle of dimension one, I should say. And we can verify this either by inspection or by pure combinatorics. So hence Z_1 contains only a single cycle. And B_1 is luckily very easy because there are no 2-simplices in K. It's almost as if someone had planned the example that way. And so B_1 is the image of the boundary operator in dimension two. And this is just the trivial simplicial chain, or you could also say the empty set. And now something very, very weird. And this is why I put the dangerous bend sign here, because this is not normal division. Don't do this at home.
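The claim that Z_1 contains only a single non-trivial cycle can be verified by pure combinatorics, since there are only eight possible chains of edges. A brute-force sketch, assuming Z modulo 2Z coefficients and a made-up helper `boundary`:

```python
from itertools import product

edges = [("A", "B"), ("B", "C"), ("A", "C")]


def boundary(chain):
    """Boundary of a 1-chain over Z/2Z, returned as a set of vertices."""
    vertices = set()
    for edge in chain:
        # Symmetric difference: a vertex that appears twice cancels out.
        vertices ^= set(edge)
    return vertices


# Enumerate all 2^3 chains of edges and keep the non-trivial ones with empty boundary.
cycles = []
for coefficients in product([0, 1], repeat=len(edges)):
    chain = [e for e, c in zip(edges, coefficients) if c]
    if chain and not boundary(chain):
        cycles.append(chain)

print(cycles)
```

The only chain that survives is AB plus BC plus AC, so Z_1 is generated by that one cycle. This enumeration only scales to tiny examples, which is exactly why the matrix-based approach is needed.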
Don't do this if you're doing your algebra homework or whatever, don't do it in your code. But here we are allowed to divide by zero. How great is that? Thanks to normal subgroups. So by definition, we want to calculate the one-dimensional homology group, which is defined by Z_1 modulo B_1. And this is Z modulo 2Z, modulo zero. And so what could that be? Well, zero is the group that contributes nothing to our calculations. And so we're just left with the original group here, with Z modulo 2Z. Hence, beta one, the Betti number in dimension one, is the rank of the homology group, and it's also one. And finally, the intuitive view of this is that our calculation tells us that this simplicial complex has a single cycle. So again, all of this work for telling us something that seems to be obvious, but again, this is how you would have to do it before matrix reduction algorithms. This is what topologists are actually doing with more complex examples. Of course, they're just looking for some ways to express these operators in a slightly smarter fashion. And let me again mention this. This is one of the few situations where division by zero is actually well defined in mathematics, because following the definition of the quotient group, we are not removing any elements from the group at all. And so we're allowed to write it like this, but see this a little bit as abstract nonsense if it helps, because it's not the usual definition. All right, so how do we finally do it in practice? This is more like an outlook, because we will be covering the actual calculation algorithms in the next lecture, but the basic idea of doing homology calculations in practice involves something called the Smith normal form. It takes an n times m matrix M with at least one non-zero entry over some field. So notice that we are going to an arbitrary field here. So it could be the real numbers, it could be the rational numbers, it could be Z modulo 2Z.
Then there are invertible matrices S and T such that the matrix product S times M times T has the following form. So it's a diagonal matrix, with nothing but zeros off the diagonal. And at some point it starts to become zero on the diagonal again. And all the entries b_i have to satisfy that b_i is greater than or equal to one. And they have to divide each other. So b_i has to divide b_i plus one. We're allowed to talk about division here because we're over a field, and all b_i are unique up to multiplication by a unit. So that's the Smith normal form. I'm not going into details on how to calculate this normal form, because this is a little bit more algebra than algebraic topology, and we actually want to move to higher dimensions in the second lecture, and there we won't need this calculation, but just so you're acquainted with this. And the way you would do this in practice: you would calculate the boundary operator matrices as I showed you, you would bring each matrix into the Smith normal form, which is similar to a Gaussian elimination. And then you could read off this description of the p-th homology group. And you could read it off like this. You could say that the rank of the cycle group is the number of zero columns of the reduced boundary matrix of the boundary operator in dimension p. And the rank of the boundary group in dimension p is the number of non-zero rows of the reduced boundary matrix of the boundary operator in dimension p plus one. And so this gives you a very neat way to calculate those two things together. All right, now to end this lecture on a positive note, after all these very abstract definitions, let's take away that homology groups can characterize topological objects by the Betti numbers. They can be, okay, "easily" is maybe stretching it a little bit, but they can be easily expressed via linear operators. So I mean, everything that we did boiled down to computations with matrices.
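The read-off described above can be sketched for the hollow triangle from earlier. This is a minimal illustration, not the algorithm covered in the next lecture: it assumes Z modulo 2Z coefficients (so every non-zero diagonal entry is 1), and the function name `smith_normal_form_gf2` is made up for this example.

```python
def smith_normal_form_gf2(matrix):
    """Reduce a 0/1 matrix to Smith normal form over Z/2Z using row and column operations."""
    m = [list(row) for row in matrix]  # work on a copy
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    k = 0  # position of the next diagonal entry
    while k < min(n_rows, n_cols):
        # Look for a 1 in the lower-right submatrix to use as a pivot.
        pivot = next(
            ((i, j) for i in range(k, n_rows) for j in range(k, n_cols) if m[i][j]),
            None,
        )
        if pivot is None:
            break
        i, j = pivot
        m[k], m[i] = m[i], m[k]          # row swap
        for row in m:                    # column swap
            row[k], row[j] = row[j], row[k]
        for r in range(n_rows):          # clear column k by adding row k
            if r != k and m[r][k]:
                m[r] = [x ^ y for x, y in zip(m[r], m[k])]
        for c in range(n_cols):          # clear row k by adding column k
            if c != k and m[k][c]:
                for row in m:
                    row[c] ^= row[k]
        k += 1
    return m


# Boundary operator in dimension one: rows are A, B, C; columns are AB, BC, AC.
d1 = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
snf = smith_normal_form_gf2(d1)

rank_z1 = sum(1 for j in range(len(snf[0])) if not any(row[j] for row in snf))  # zero columns
rank_b0 = sum(1 for row in snf if any(row))                                     # non-zero rows
rank_z0 = 3  # the boundary operator in dimension zero is the zero map on A, B, C
rank_b1 = 0  # there are no 2-simplices, so nothing maps into dimension one

beta_0 = rank_z0 - rank_b0
beta_1 = rank_z1 - rank_b1
print(snf, beta_0, beta_1)
```

Working over Z modulo 2Z matters here: over the real numbers the same matrix has full rank, so a generic floating-point rank routine would give a different answer.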
And essentially the calculation of the homology groups themselves can be boiled down to linear algebra. And so this concludes the first lecture. You can find the slides and some more information at this address. And maybe it's already a good idea to take some questions. I'll stop sharing for a second and open the second lecture notes. All right, are there any questions? I have to admit I'm already a little bit lost. Can you give sort of an idea of what we need for the following parts, what is totally essential, or maybe an intuition, so that I and maybe other people who are also lost can move on without being lost for the rest of the lectures? Oh yeah, of course. That's a very good point. I should have shown like a learning curve at the beginning of the lecture. So from now on, congratulations, you made it. You're through the hard part. The rest will be much easier, because I just had to square away these foundations and this sort of thing so that we at least all have the vocabulary of what is going on. But the remaining lectures will be much more hands-on, and they will be much more about the actual underlying concepts and how to intuitively calculate those sorts of things and how to visualize them and so on and so forth. So I think this is definitely the lecture that has the most mathematical concepts in there. And the remaining lectures will be much more hands-on, much more intuitive. In particular lectures three and four, where we will be discussing topological descriptors and how to use them in practice. And lecture four will basically be a summary of current cutting-edge research in topological data analysis and machine learning. And I will show some very cool papers on how to calculate, pardon, how to classify graphs, how to have an autoencoder that respects topological information, and so on and so forth.
So this first part was the part where most bravery was needed, and it gets easier from here. Which concept must I have understood in order to understand the rest? So I was able to understand the simplex; from then on, already the chain was a little bit too fast for me. So that's approximately where I got lost. Pretty early on, to be honest. I realized this. I'm really sorry for the steep incline that I had to set there. It's really tough to cram all of this into a few hours, because I definitely wanted to end up with a very cool overview of current machine learning topics. And so I had to explain some of the details here, but right now, from the concepts, you only have to recall that we are looking for holes in a high-dimensional space, essentially. And you only have to recall that these holes have kind of a dimension, like dimension zero, dimension one, dimension two, and so on and so forth. But that's all there is. So I think mathematically we now have the hard part behind us. Thanks. But I do feel you, and I mean, this is also why I have additional sources and additional books linked. I think that, even if I say so myself, you could probably take a book and follow along with this lecture one again, because the calculation is really not that hard. And I'm not saying this to dissuade anyone. I think that it's easy to get lost there if you hear it for the first time, but just trust me that essentially it's really not that hard. You just do some matrix algebra, essentially, and then you end up with something that tells you how many objects or how many generators you have. But I can totally understand that it's a little bit confusing at first. So this is why the second part will be much more visual, and we'll have many more examples of how to do this in practice, because this is also the, I should probably move on so that we're not getting too far behind, although I think we still have time.
This is also the reason for this tutorial. The reason is to at least tell people what this field can do and what things are out there. And if you are only interested, and I mean this "only" not in a negative sense, but let's say if you're merely interested in applying this in your own algorithms, then, as we will see in lecture four, there's a cool ecosystem already in place. There are some algorithms, there are some libraries that make this relatively easy to include in Python together with PyTorch and whatnot. So you just have to have a basic vocabulary of what is going on. But other than that, you should be fine. But again, I see this, and maybe we can also handle this in the proper Q&A session from 3:30 to 4:00.