Okay, and the next speaker is Tuan Le, who will give us a talk on unsupervised learning for molecular conformations. Thank you.

Okay, thanks for the introduction. Hello, everybody. So I'm Tuan, and I will continue with the follow-up to the theoretical paper that Robin described earlier; essentially, I will talk about unsupervised learning on molecular conformations, which is maybe more interesting than the toy datasets we showed previously.

To motivate generally why we are interested in data-driven methods (and we have already heard excellent talks on this during this workshop): essentially, we want to understand the structure-property relationship of atomic systems. From a machine learning perspective, this is usually framed as supervised learning, where we want to predict certain molecular properties given a structure. If we assume that the mappings we want to implement are differentiable, i.e. some kind of neural network, then the descriptors are usually learned within the model. This learning paradigm assumes that we already have labels for the molecules of interest, which we can obtain through experiments or through computational methods, but these can still be expensive in terms of money and time. So another way of learning meaningful representations is to look into unsupervised learning.

Since we are dealing with molecular systems, I just want to reiterate that we can represent a molecular or atomic system as a point cloud G, which is basically a tuple. We can store our point cloud as a set of vertices, so we have n atoms in our molecular system. Additionally, as was also presented before by Elias and his co-workers in BotNet, we have attributes, which are basically atom features, and we also have geometric quantities such as positions, i.e. spatial coordinates.

Now, in unsupervised learning we want to train a mapping, some kind of neural network architecture, that can reconstruct its own input. The idea is to obtain a descriptor which describes the entire physical system in one embedding, so we are not operating on a node level but really get one embedding for the whole physical system. This embedding is assumed to lie on a lower-dimensional manifold that should describe the physical system, and it should be informative enough that another network, the decoder network, can reconstruct the structure from it. So this is the framework we are interested in.

Since there were already several talks on representations at this workshop, I will be very brief here. As said, we are working on point clouds, and they can essentially be regarded as a set: a set with certain attributes, which could be atomic charges, atom types and other atom features, and in addition geometric quantities stored as spatial coordinates. Now, the straightforward and convenient way to store these point clouds in a computer is simply as matrices.
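To make that representation concrete, here is a minimal sketch in Python of a point cloud as a tuple of attributes and coordinates, together with the one-embedding-per-system idea; the array shapes and the encode placeholder are illustrative assumptions, not the actual model from the talk.

```python
import numpy as np

# Minimal sketch of the point-cloud representation described above. All names
# and shapes here are illustrative assumptions, not the actual model: a
# molecule with n atoms is a tuple of per-atom attributes and coordinates.
n_atoms = 5
atom_types = np.array([6, 6, 8, 1, 1])      # e.g. nuclear charges (C, C, O, H, H)
features = np.eye(10)[atom_types]           # one-hot atom attributes, shape (n, 10)
positions = np.random.randn(n_atoms, 3)     # spatial coordinates, shape (n, 3)

molecule = (features, positions)            # the point cloud G as a tuple

# The unsupervised goal: an encoder maps the whole system to ONE embedding z
# (not per-node features), and a decoder reconstructs the structure from z.
def encode(feats, pos, dim=64):
    # placeholder pooling; the real model is a learned, symmetry-aware network
    return np.zeros(dim)

z = encode(*molecule)
print(z.shape)                              # (64,): one descriptor per system
```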
And there, we have already assumed some ordering pi; you can think of pi as just the node ordering, i.e. how we store each atom in our matrix. So we can say, all right, this representation can also be stored as a matrix of attributes, for example our atom types, and since we have n atoms we can store them in this matrix. Additionally, we have the Cartesian coordinates, the geometric quantities, stored in a Cartesian matrix P.

Now, since physical systems come with atomic coordinates, they are also sensitive to SE(3) transformations. SE(3) transformations are essentially just rotations and translations in three dimensions, so this is the group, and we can represent one element of this group as the tuple (R, t) shown here. Mathematically, whenever we rotate and translate our physical system, i.e. one group element acts on the system, this can be described by a matrix multiplication with the rotation matrix plus a translation: we rotate the system and additionally translate each particle in the point cloud, so each position r_i is mapped to R r_i + t.

Note that many features of interest, for example the potential energy, as often mentioned in this workshop, are invariant features, or to be precise SE(3)-invariant features. The attributes, i.e. the atom features, are in fact SE(3) invariant as well: whenever a group element of SE(3) acts on them, they do not change. You can think of this as the trivial representation acting on them, so they stay the same.

In our work, we leverage message-passing neural networks as function approximators, which is a very general framework. The reason we utilize this class of models is that, since we are working on sets, we require permutation equivariance, and this is obtained in MPNNs because the functions are shared among all nodes. We don't implement a separate function for each node; rather, we use one network that processes all nodes simultaneously.

Okay, so having talked about the representation, I want to say a little about the method. We utilize an SE(3)-equivariant graph network, and since in our problem we are only dealing with Cartesian coordinates, we basically only require, in terms of equivariant features, l = 1 features, i.e. vector features; we are not interested in higher-order features here. Whenever we have a point cloud or a graph, we assume that some hidden representation is learned within the network. We can write this hidden representation as X_i, which is basically a tuple consisting of a scalar feature s_i and a vector feature v_i. A very common ingredient in message-passing networks is the local inductive bias. Graph neural networks can of course also be used for other types of graphs: on plain molecular graphs without any 3D structure, on social networks, or even on larger networks that you might work with in bioinformatics, such as protein-drug interaction networks or drug-drug interaction networks.
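As a small illustration of the group action and the invariance of the attributes just described, here is a minimal sketch assuming a random proper rotation and translation; the function names are mine, not from the talk.

```python
import numpy as np

# Minimal sketch of an SE(3) group element (R, t) acting on a stored molecule:
# the coordinates transform as r_i -> R r_i + t, while the attribute matrix is
# left untouched (trivial representation). Function names are assumptions.
def random_rotation():
    # proper rotation via QR decomposition, with the determinant forced to +1
    Q, _ = np.linalg.qr(np.random.randn(3, 3))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1
    return Q

def act(R, t, features, positions):
    # (R, t) . (V, P): rotate every atom, then translate each particle
    return features, positions @ R.T + t

features = np.random.randn(5, 10)           # SE(3)-invariant atom attributes
positions = np.random.randn(5, 3)           # Cartesian coordinates

R, t = random_rotation(), np.random.randn(3)
features2, positions2 = act(R, t, features, positions)

assert np.allclose(features, features2)         # attributes do not change
assert not np.allclose(positions, positions2)   # coordinates do change
```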
In MPNNs, you usually exploit this inductive bias of locality and implement a message and an update function, as was also said earlier. In the most general and simplest form, you can implement them as a composition where you have a self-interaction, which is the single-body term f0, and additionally a two-body function f1, which computes the interaction between neighboring nodes. Notice in the example shown here, this molecular graph, that we can consider A2 to be the target node, so A2 is highlighted in red, and its neighborhood set is defined here, consisting of its neighboring nodes. We can then implement any functions f0 and f1, but they also have to preserve certain symmetry constraints, such as equivariance, which I will talk a little bit about.

These equivariant interactions are obtained through simple compositions, basically using the tensor product, often also called the outer product, of relative positions. In this case, we use normalized unit vectors: we compute relative positions and normalize them, so we are just describing points on a sphere, and we are still operating in Cartesian coordinates. You could also use other methods, such as spherical harmonics as described earlier, but in our case we just wanted a simple model to try things out. So we compute these equivariant interactions by this tensor product: we take the relative position and compute the tensor product with this m12 feature. The m12 feature is essentially just an SE(3)-invariant embedding, and there are many ways to compute it. For example, as described in Kustoschitz's tutorial or lecture, you could use radial basis functions, and the other term here is basically just a linear transformation of the neighboring node. But there are also other ways: coming purely from a deep learning perspective, you could utilize other functions, for example an attention-like mechanism, which already includes some normalization. Whenever you want to compute a filter between two neighboring nodes, you can interpret it as a kind of edge filter, and with an attention-like mechanism you get for free that the filter values range between certain bounds. Then you can also use a linear transformation of the neighboring node; in the terminology of attention mechanisms, this inner component would be the query and key transforms, whereas on the outer part you just have the value transformation.

Okay, now that I've talked a little bit about the method, I want to reiterate what we are actually interested in. We are interested in implementing an autoencoder-like architecture which operates on point clouds, and in particular on conformers. Autoencoders are usually trained to reconstruct their input data, and our input data itself exhibits certain symmetries, such as permutation symmetry and Euclidean symmetry.
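Before moving on to the autoencoder part, here is a minimal sketch of the kind of two-body message just described: an invariant filter built from radial basis functions and a linear transform of the neighbor, turned into an l = 1 (vector) feature via the outer product with the unit relative position. This is an illustrative sketch under my own assumptions, not the authors' exact layer.

```python
import numpy as np

# Illustrative two-body message: an SE(3)-invariant filter is built from
# radial basis functions of the distance and a linear transform of the
# neighbor's features, and the l = 1 (vector) part is the outer product with
# the unit vector, so it rotates together with the input coordinates.
def rbf(d, n_basis=16, cutoff=5.0):
    centers = np.linspace(0.0, cutoff, n_basis)
    return np.exp(-((d - centers) ** 2) / 0.5)      # Gaussian radial basis

def message(x_j, r_i, r_j, W_rbf, W_val):
    diff = r_j - r_i
    d = np.linalg.norm(diff)
    unit = diff / d                                  # a point on the sphere
    # invariant part m_ij: radial filter times a linear map of the neighbor
    m_ij = (rbf(d) @ W_rbf) * (x_j @ W_val)          # shape (F,)
    # equivariant part: tensor (outer) product with the unit vector, shape (F, 3)
    v_ij = np.outer(m_ij, unit)
    return m_ij, v_ij

F = 8
x_j = np.random.randn(F)                             # neighbor's scalar features
r_i, r_j = np.random.randn(3), np.random.randn(3)    # positions of atoms i and j
W_rbf, W_val = np.random.randn(16, F), np.random.randn(F, F)
m, v = message(x_j, r_i, r_j, W_rbf, W_val)
print(m.shape, v.shape)                              # (8,) (8, 3)
```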
In most cases, if you read the literature on graph autoencoders, what we are actually interested in is obtaining one descriptor for the entire point cloud. So we need some kind of pooling operation so that we obtain permutation invariance. What we get is an Sn-invariant, i.e. permutation-invariant, and E(3)-invariant embedding z. If you are familiar with autoencoders or variational autoencoders, this is often also referred to as the latent code; it is basically the descriptor of our physical system. The decoder's task is then to reconstruct the input, and to reiterate, the input is this point cloud, so it is a set. The usual input for the decoder is just this latent descriptor z, so the assumption is that it really captures the most salient information.

Then the question is: because our input could come in any permutation, since we are operating on sets, the decoder doesn't know in which order the set should be reconstructed. You can basically think of it as a discrete node assignment problem. Additionally, since we want to do the reconstruction on absolute Cartesian coordinates, we don't know in which coordinate frame the input or the output point cloud lies.

The answer, as described in the previous talk by Robin, is that we implement a group-equivariant encoder network. It predicts a group-invariant embedding, which is the descriptor of our physical system, but additionally we need to predict certain other quantities to obtain the group actions. Once we have these three components, we use a group-equivariant decoder. (Okay, I don't know if someone can draw, that's weird.) Anyway, we have this group-equivariant decoder, which takes as input the latent descriptor and additionally the group actions for the reconstruction.

All right, so this is the framework. You can see we have as input this point cloud, where we have the attributes, which could be atom types, and additionally the Cartesian coordinates. We pass it through our group-equivariant encoder and get this permutation- and SE(3)-invariant embedding, which is the one embedding that should represent the entire system. We also get two other quantities in order to obtain the group actions: we use V to construct the rotation matrix, and we use M to construct the permutation matrix. Once we input these three quantities into our decoder network, we can apply the group actions to the output; since the decoder is group-equivariant, we can apply the group actions either before or after the network, because equivariance preserves this. The benefit is that this autoencoder can then be trained end to end.

On the next slide, I just want to show you some examples. It might not be too visible; I don't know if I can show you, okay, it's maybe too much, I'll go back. Okay. So we have here as input this molecule, right? The green dots show the spatial coordinates of the input point cloud in this coordinate system, and the blue crosses show the reconstructed coordinates before applying the rotation. Notice that because our molecule could lie in any orientation, the output is the same up to a group transformation. So whenever we have this output, we can apply the group transformation, which is this rotation.
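To make the pipeline concrete, here is a high-level sketch of such a forward pass with placeholder components; the real encoder and decoder are equivariant message-passing networks, and every name, shape and the Gram-Schmidt frame construction below is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

# High-level sketch of the autoencoder forward pass, with placeholder
# components standing in for the trained equivariant networks.
def encoder(features, positions, latent_dim=32):
    z = np.zeros(latent_dim)             # permutation- and SE(3)-invariant code
    V = positions[:2].copy()             # equivariant vectors -> rotation frame
    M = np.eye(len(positions))           # assignment scores -> permutation matrix
    return z, V, M

def frame_from_vectors(V):
    # Gram-Schmidt on two equivariant vectors gives a proper rotation matrix
    a = V[0] / np.linalg.norm(V[0])
    b = V[1] - (V[1] @ a) * a
    b = b / np.linalg.norm(b)
    return np.stack([a, b, np.cross(a, b)], axis=1)

def decoder(z, n_atoms):
    # placeholder: reconstruct coordinates in a canonical pose and node order
    return np.zeros((n_atoms, 3))

features = np.random.randn(6, 10)
positions = np.random.randn(6, 3)

z, V, M = encoder(features, positions)
R = frame_from_vectors(V)
x_hat = decoder(z, len(positions))
# apply the predicted group actions to the decoder output:
# permute the nodes back, then rotate into the input coordinate frame
reconstruction = M @ x_hat @ R.T
loss = np.mean((reconstruction - positions) ** 2)    # reconstruction objective
```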
And then we can see that once we have applied the group transformation to these blue crosses, the green points and the red crosses are well aligned. We can see that it also performs well for larger molecules. These were just some preliminary results on the QM9 dataset, and we can see that for larger systems the reconstruction accuracy deteriorates; we believe some more hyperparameter tuning is required. This was just an initial run we did a while ago, we just saw that it works, and we are currently still working on the follow-up.

To conclude and summarize, I just want to reiterate that in many cases the input data exhibits certain symmetries, and whatever machine learning you want to do, be it supervised or unsupervised learning, respecting the symmetry in the model architecture is crucial and gives you efficient architectures. In our work, we presented an end-to-end trainable, data-driven autoencoder, where the recipe is basically to implement group-equivariant encoding and decoding networks; the encoder outputs a group-invariant embedding, which is the latent code, and additionally group-equivariant embeddings that are used to obtain the group transformations and then align the input with the output. There are of course several questions we can tackle next, because why are we actually interested in this kind of autoencoder-like architecture? We want to obtain a molecular descriptor which can be used for several downstream tasks, such as supervised learning, clustering or generative modeling. This is what we are currently doing, and I hope we can share a preprint soon. This is just the follow-up application of the theoretical paper which we have put on arXiv and which Robin presented. So with that, thank you for your attention, and I'm happy to answer questions.

Thank you very much for the nice presentation. Are there any questions? Okay, so I do have a curiosity: how sensitive are your latent representations to reflections, for example?

Yeah, that's a good point. We always just assume that we have proper rotations. If you also have reflections or inversions, then you actually have improper rotations. We haven't really looked into this, but in order to also capture these use cases, if we are working with vector features, i.e. l = 1, we would need to include parity constraints. We would then need to define whether we really have vectors or whether we have pseudovectors. There might be other model architectures we could use, but we haven't investigated that yet. In general, it's a follow-up we want to do. We also want to analyze the latent space being learned: can we use it for other downstream tasks? Because this is essentially the idea behind why we are interested in this. Yeah.

Thank you very much. Did any question pop up in the meantime? Yes, one there.

Thanks. Thank you very much for the talk. My question is: how do you evaluate the quality of your embedding, the quality of the descriptors you obtain after you encode your molecule?

Yeah, that's a good question. This is also still a bit preliminary. Right now, this network is just trained on reconstructing its input, so we kind of assume, all right, this embedding has to be informative.
So next we really need to utilize this embedding, for example by training on other supervised learning tasks, to see whether it is beneficial. But as I said, currently this latent space is not really restricted. I have formulated this framework more from a deep learning perspective, but you could also take a more probabilistic point of view, where you assume some kind of generative modeling process, as in latent variable models. Then we could also do more generative modeling: sampling conformations, doing some linear interpolation in the latent space and seeing what the decoder outputs. This is all follow-up work we want to do; right now, the model was just trained on reconstructing its input. And of course, you can always include other auxiliary tasks in the network. If you read the literature on semi-supervised and self-supervised learning, you also want to enforce the embedding to be informative with respect to some other tasks, not just reconstructing the input. But this is what we haven't done yet and what we are planning to do in follow-up work.

Maybe a follow-up question: do you have any theoretical guarantee that your embedding is unique, so that you don't have two different structures overlapping in the latent space?

Yeah, that's a good question. If there were an overlap, the decoder would also output the same molecule for both. We haven't looked into this, and right now this space itself is not really regularized; we don't know if there are, for example, holes in the latent space, and we don't even know whether it is really a continuous space, so we need to analyze it further. But coming back to your point: since right now everything is deterministic, if two different molecular structures mapped to the same embedding, the decoder would map them to the same output structure. Since we are monitoring the reconstruction loss and can see that it is fine, it is very likely that different molecular structures are actually placed separately in the latent space. But we still don't know exactly how this latent space is structured.

Thank you very much. In the interest of time, I suggest we keep further discussion for the poster session. Another round of applause for the speakers. Thank you very much.
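To illustrate the latent-space analyses mentioned in the discussion (linear interpolation between two codes, and a simple probe on frozen embeddings), here is a minimal sketch; all functions, shapes and data are placeholders assumed for illustration, not the authors' code or results.

```python
import numpy as np

# Sketch of latent-space analyses: linear interpolation between two latent
# codes, and a simple linear probe that uses frozen embeddings to predict a
# supervised property. Everything here is a placeholder, not trained models.
def encode(features, positions, dim=32):
    return np.random.randn(dim)            # stand-in for the trained encoder

def decode(z, n_atoms):
    return np.zeros((n_atoms, 3))          # stand-in for the trained decoder

z_a = encode(np.random.randn(5, 10), np.random.randn(5, 3))
z_b = encode(np.random.randn(5, 10), np.random.randn(5, 3))

for alpha in np.linspace(0.0, 1.0, 5):
    z = (1 - alpha) * z_a + alpha * z_b    # walk along the line between two codes
    coords = decode(z, n_atoms=5)          # inspect what the decoder produces

# downstream probe: fit a linear regressor from frozen embeddings to a property
Z = np.stack([encode(np.random.randn(5, 10), np.random.randn(5, 3)) for _ in range(100)])
y = np.random.randn(100)                   # e.g. a molecular property label
w, *_ = np.linalg.lstsq(Z, y, rcond=None)  # least-squares linear probe
```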