And we will give the floor to Robin Winter, with his talk on unsupervised learning of group invariant and equivariant representations. The floor is yours.

Yeah, thanks for the introduction. And yeah, I'll talk today about our recent work. So actually, there's a recent preprint already out that you might want to check. Yeah, so the work is about unsupervised learning of group invariant and equivariant representations. And this is joint work with Marco and Tuan from the Bayer Machine Learning Research department. And actually, Tuan is giving the next talk here, where he also presents his follow-up work on conformation prediction. So let's see if this works here. Do I have to do something? No, it's working. OK, perfect.

So I first have some motivation slides. But as you probably followed the last talks and the last days' talks, this might already sound really familiar to you. So the main idea is basically that many properties of interest of the data you might work with have some inherent symmetries. So, for example, most prominently, the label of an image of a digit does not change under rotation or translation of the image. The composition of a set of balls, like you can see here, does not change under permutation of the elements in the set. And properties like the binding affinity or the ground state energy of a molecular conformation also do not change under rotations and translations in three dimensions.

The problem, however, is that the data is often expressed in a way that does not respect the symmetry. So, for example, images are mostly represented by these matrices of pixels. And if we translate or rotate them, the values of the pixels change. Sets are represented by matrices where different rows correspond to different elements in the set. And if we now permute the elements, the matrix will also change. And as you saw in the previous talks, if we represent molecular conformations by the Cartesian coordinates of the atom centers of the molecule, this representation will also change under operations like rotations and translations. So these different transformations can be described by symmetry groups. And we have here, for example, the special Euclidean group in two dimensions, or the special Euclidean group in three dimensions.

And the aim is, so in contrast to the previous talks, where we basically either define or use some equivariant neural networks and supervised learning to tackle these representations, or use some invariant atom descriptors, for example, here we try to learn these representations that respect the symmetry in an unsupervised way, by utilizing autoencoders. And if you think about it, you might wonder, OK, how do we actually do this? Because autoencoders are trained on this reconstruction objective, right? So you have an encoder and a decoder, which are randomly initialized. And you compress your input with the encoder and try to reconstruct the input by training the autoencoder on this objective. The problem is now, if you represent your data in this way that does not respect the symmetry, the reconstruction objective will also not respect the symmetry. And thus, you have to encode this non-invariant information in the representation. And you will not be able to learn an invariant representation.
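To see why, here is a minimal sketch of this problem, assuming a toy 28x28 image and a plain mean-squared-error objective; this is an illustration, not code from the talk or the paper:

```python
# Minimal illustration (not the authors' code): even a "perfect"
# reconstruction in a different orientation incurs a large pixel-wise error.
import numpy as np
from scipy.ndimage import rotate

# Toy "digit": a bright horizontal bar on a 28x28 canvas.
x = np.zeros((28, 28))
x[12:16, 4:24] = 1.0

# Suppose the decoder outputs the same content, rotated by 90 degrees.
x_rotated = rotate(x, angle=90, reshape=False, order=1)

print(np.mean((x - x) ** 2))          # 0.0: identical reconstruction
print(np.mean((x - x_rotated) ** 2))  # large, although the content is the same
```

So even an orientation-agnostic decoder that reproduces the content perfectly gets punished by this objective, which is exactly why the group transformation has to be handled explicitly.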
You might design an encoder that is invariant. But then you have the problem in the reconstruction, right? Because you can only reconstruct the input up to the group transformation you want to be invariant to. And then your reconstruction loss will be high even in the best-case scenario.

And to put this into more formal terms, because in the paper we actually show this not only for rotations, but make it general for any kind of group: what we in general want to do is encode orbits of a group G acting on some data space X to one point in a latent representation. So again, to make this more tangible, I put up again the simple example of images with rotations. So here we have three different types of fruit, like apples, bananas, and pineapples, in different orientations. And one orbit would now be, for example, the apples in all the different orientations. So this is what we want to find one representation for. So in essence, our encoder has to be invariant if we want to find only one representation per orbit.

And the problem, again, is now that in the decoding part, the best thing we can do is to map to one element of the orbit. But we do not know in advance which element this will be. And we don't want to force the decoder to decode to a certain element. This is basically a degree of freedom of the decoder that is learned during the process. We call this learned element we decode to the canonical element. The problem again is that if we calculate the reconstruction objective, even in the best-case scenario, where we encode maybe this apple but reconstruct that apple, this is the best the method can do, but it will still get a high reconstruction loss, right? Because the reconstruction loss is not invariant.

So what we do here to solve this problem is to basically learn the group transformation that aligns those two objects, input and output, to each other again. So we want to learn the group element that aligns the decoded canonical element with the input element. So we define this learnable function psi, which maps the input element x to an element of the group we want to be invariant to. In the cartoon, this looks like this. So we have an encoder that encodes the invariant part, and a decoder that reconstructs the canonical element, right? But now we also have this psi encoder that encodes the equivariant part, so basically the group element. And we then apply this predicted group element to the canonical element, reconstruct the input in the right orientation, and then we can calculate the reconstruction loss again. And again, in the paper, we show how to define this psi for any group and what kind of properties psi has to fulfill to make this work. But here, due to the time limit, I cannot really go into the details of the math.
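Schematically, though, one training step of this setup could look like the following sketch; encoder, decoder, psi, act, and optimizer are hypothetical placeholders for the networks and group action just described, not the authors' actual implementation.

```python
import torch

def training_step(x, encoder, decoder, psi, act, optimizer):
    z = encoder(x)                   # invariant part: one point per orbit
    x_canonical = decoder(z)         # decoder's learned canonical element
    g = psi(x)                       # predicted group element (equivariant part)
    x_hat = act(g, x_canonical)      # align the canonical element with the input
    loss = torch.mean((x_hat - x) ** 2)  # element-wise comparison is now fair
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key point is that the reconstruction loss is computed only after the predicted group element has moved the canonical element back onto the input, so the objective itself no longer penalizes the decoder's choice of canonical element.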
I just show two examples here. And the first one is for rotated MNIST. So images, again, of digits. This is like the common benchmark, right? But here we have different rotated versions of these MNIST images. And we want to learn a representation that is invariant to these rotations. So it should not matter how you rotate the input image, the representation should be the same. And we utilize here these nice networks proposed by Weiler et al., which are equivariant CNNs in two dimensions, to define the invariant encoder and the equivariant group function psi.

And for you to kind of get the idea, what we are doing here is we use psi basically to predict a point y on the unit circle. And you can maybe see that if we predict such a vector here, we can then, with some reference point, calculate an angle, which we can then use to construct a rotation matrix, which is then used in the decoding process to recover the original orientation. And since we use here fully equivariant neural networks, if we rotate our input, the predicted group element, so the rotation in this case, will rotate equivariantly. So we are equivariant by design, which is a necessary property here.
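Concretely, for this rotation case, the mechanics could be sketched like this, assuming psi outputs an unnormalized 2D vector and taking (1, 0) as the reference point on the unit circle; again an illustration of the idea, not the code of the paper.

```python
import numpy as np

def rotation_from_prediction(y):
    """Turn psi's predicted 2D vector into a rotation matrix."""
    y = y / np.linalg.norm(y)        # project the prediction onto the unit circle
    theta = np.arctan2(y[1], y[0])   # angle relative to the reference point (1, 0)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])       # applied to the canonical element when decoding

R = rotation_from_prediction(np.array([0.3, 0.4]))
print(R @ np.array([1.0, 0.0]))      # [0.6, 0.8]: the normalized prediction
```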
And if we look into the results, we can see that for different orientations of the input image, like here one, three, or six, the predicted canonical output is always in the same orientation, right? Because the representation is always the same, the output digit is always the same, the canonical element. But once we apply the predicted rotation, the input and output align again. And if you now also compare the embedding of our proposed architecture to a classical autoencoder, we see, as expected, different clusters for different orientations of a digit in the original autoencoder. And in our case, we only see one cluster, because the rotation does not affect the representation that we learn.

Maybe a more complicated example is now if we go to three dimensions, and here we also include the symmetric group to handle permutations. Because here we look at a toy dataset of these Tetris shapes in three dimensions. And here we want to encode different orientations and rotations of these Tetris shapes. And we utilize SE(3)-equivariant networks, the SE(3)-Transformers proposed by Fuchs et al., or the equivariant GNNs which Tuan developed, which he will talk more about in the next talk. But here it's slightly more complicated, because now we not only have to predict the rotation, but also the translation and the permutation matrix with this psi. Because these are basically the group elements we want to be invariant to in our representation.

But indeed, this also works. So here, maybe a bit harder to see because it's in three dimensions, we plotted again the input, the canonical output, and the output after applying the group transformation. And you can see that after applying the group transformation, the input and output align again. And we can see that for all the different orientations and rotations, only one representation is learned, here in a two-dimensional latent space. And actually, this is not a projection.

And yeah, maybe just a quick spoiler for the next talk. We also applied this now to molecular conformations, which is maybe most relevant for this audience. And it also works for this slightly more complicated case compared to Tetris. So here we encode molecular conformations by their Cartesian coordinates. So no further invariant descriptors or internal coordinates are needed. So we can autoencode these conformations and reconstruct the predicted versions, while funneling through an invariant representation. Yeah, that's all I wanted to show you. So thank you for your attention.

Thank you very much for the nice presentation. Are there any questions? Do we have another microphone? Yeah, so I will take one here.

Hi, thank you for the nice talk. I have two quick questions. One is, what is the dimension of the bottleneck of the autoencoder relative to the size of the molecule? So for Tetris it was two dimensions. Here, I think we used like 32 dimensions, but this is really a hyperparameter. I mean, at some point, if you go too low, then this will start to hurt. But the embedding, again, is for the whole molecular graph, right? So we don't have atom-centered representations.

Yeah, and the other question: can you somehow make this independent of the size of the molecule, of the number of atoms you have? You have one representation for the whole conformation, for the whole molecule, and this is independent of the size. So that's the main reason. Maybe my question is, can you develop one universal autoencoder? For the whole chemical space? Yes, yeah. OK. So I mean, at some point it will break if you go out of the training regime, of course, yeah. We tried up to 32 atoms, so we haven't tried it now for, like, products, for example, but in theory, yes.

Other questions? Yeah, so when you said for any group, do you mean for any compact group or any group? So, yeah, we showed it both for discrete and continuous groups, yeah. But is compactness a necessary condition for your argument? That's a good question. I think so, yeah. I mean... Yeah, but the way they deal with translations maybe makes it all right. And also, is it only representations of finite dimension that you can learn? Sorry? Only representations of finite dimension. Yes, yeah.

I have a quick question. So the encoder part, that's doing the equivariant bit, right? Did I understand correctly? It's effectively learning a sort of normalization, some canonical direction, say, if it's a rotation group. Yeah, that's correct. I mean, you can even see, for the digit part, it's nice to analyze it, that it basically aligns the main axis of variance along one axis. I think this is the easiest way for it to learn to reconstruct. So then, you said that you didn't want to impose this, because of course, maybe for some symmetry groups it's difficult to think of a normalization, but say for rotations, you can always say, well, the first principal axis should be x, and so on. Exactly, yeah. Did you not want to do that because it's not so easy for other groups? Yeah, exactly. So we didn't want to impose any canonical orientation in advance, we wanted the network to learn the best way for it to solve this problem and to represent it, yeah. Nice, thank you. Thank you.

In the interest of time, I would suggest keeping further questions for the coffee break or the poster session. So, another round of applause for the speaker.