Okay, hello everyone. I'm Ilyes Batatia, I'm working with Gábor Csányi at the University of Cambridge, and today I will present our work on unifying the design space of equivariant interatomic potentials; it is a direct follow-up to David's talk. This work started from the observation that a very large number of new architectures for interatomic potentials appear each year, and we wanted to understand the connections between this myriad of new architectures. Particularly notable among them are the atomic cluster expansion (ACE) introduced by Ralf Drautz, which David mentioned and which unified the atom-density-based descriptors, and NequIP, a message passing neural network that uses equivariant features and outperformed the existing approaches by a large margin. So we wanted to understand the formal relationships between these different approaches: on one side you have the atomic descriptors, on the other side the equivariant neural networks, and are they connected in any way? This is something a lot of people in the community have been thinking about, and we are sharing our point of view on it. We also asked whether this connection can tell us which choices in this design space really make these approaches successful: in all these new machine learning papers, what are the key ingredients that make them work better than previous approaches? So we will provide a unified framework that puts all these approaches on the same footing in order to compare them, then highlight the key choices of the different approaches in the literature within this framework, give a special case study of NequIP with a set of experiments that shed some light on the design choices that make NequIP so successful, and finally introduce a new model, BOTNet, a modification of NequIP that reaches excellent accuracy with a much simpler architecture.

To recap on MPNNs, message passing neural networks: they are graph neural networks that map labeled graphs to vector spaces, and in atomistic potentials the labels of the atoms are usually the states. As David introduced, the state of an atom describes the configuration of the molecule or material you are looking at, and it has three components: the Cartesian position vector, a set of attributes (for example the chemical element), and some learnable features. We call the features semi-local because, through message passing, they can depend on quite a long receptive field, up to about 14 Angstrom if you have many message passing layers, so they contain semi-local information about the atom. A typical MPNN has three steps: the message passing phase, the update phase and the readout phase. In the message passing phase, a message is collected from the neighboring states of an atom: each atom i has neighbors j, the states of the neighboring atoms are embedded by some function, and then they are pooled around the central atom. This pooling, the big direct sum in the equation, needs to be permutation invariant: if you swap atoms of the same type within the local environment, the message must not change. And λ is a normalization constant that will become very important later in the experiments.
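As a minimal sketch of this pooling step (with hypothetical function and variable names, not the actual code of any of the models discussed), the per-atom message is a permutation-invariant sum of edge embeddings divided by the normalization constant λ:

```python
import numpy as np

def pooled_message(positions, features, neighbors, i, embed, lam):
    """m_i = (1 / lambda) * sum over neighbors j of phi(sigma_i, sigma_j)."""
    msgs = [embed(positions[j] - positions[i], features[i], features[j])
            for j in neighbors]
    # Summing over neighbors makes the pooling permutation invariant.
    return np.sum(msgs, axis=0) / lam
```

Any other permutation-invariant reduction (mean, max) would preserve the symmetry; the choice of the constant lam is exactly the normalization experiment discussed later in the talk.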
Then, after the message passing phase, you have the update phase: the state of an atom is updated based on this aggregated message via an update function, and you get the state of the atom at the next layer. At the end there is a readout phase, which maps this collection of states to a quantity that is usually a scalar, the energy. Some models only read out the last state of the network, others read out multiple states; there is freedom in that choice. I will skip over equivariance, but as Boris and David highlighted it is very important to build the symmetries of the physics into the model, and in an equivariant MPNN the way to do that is to put constraints on the messages so that they transform in a particular way under the action of the symmetry group on the states: if you rotate the configuration, the message has to transform in a very specific way.

As David mentioned, a very successful approximation of the potential energy surface is the body-order expansion, which expands a function in hierarchical terms. This concept can be transferred to message passing, and we can define what a body-ordered message is: the message has a hierarchical form, and its body order is the largest number of neighbor states with respect to which the derivative is non-zero, meaning the expansion has been truncated at that order and all higher-order terms are zero. That is the concept of body ordering for a message.

So now we have reviewed message passing, and David has introduced the general ACE framework; how can we connect them? The central idea is that all atom-centered message passing models can be understood as a kind of multi-ACE, a stacking of ACE layers, and the crucial point is that you construct the messages of the message passing network from the ACE basis, because the ACE basis gives you a systematic way to construct symmetric features. The key requirement is that the messages are symmetric, so you take the ACE basis and construct the message from it. This is a generalization of classical MPNNs, because now the message is not restricted to be two-body: in most MPNNs the messages are only two-body, whereas here the messages can be many-body. The update then connects the ACE layers together, and we need the one-particle basis, which provides the edge features, to incorporate the previous ACE features. The way to do that is through the one-particle basis functions, which are no longer only functions of the attributes but also of the features generated by the ACE formalism at the previous layer. In this way you can iteratively re-expand in a new ACE basis and reconstruct the message. The connection between MPNNs and ACE is that the edge embedding of an MPNN is the one-particle basis function of ACE, and the pooling operation corresponds to the tensor product and symmetrization of ACE. This gives a single equation that unifies the atom-centered and the MPNN descriptions, governed by a few key parameters.
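A rough illustration of this multi-ACE construction is sketched below, with hypothetical names, keeping only the permutation-invariant part and ignoring the rotational symmetrization: the pooled one-particle basis plays the role of the edge embedding, and products of it raise the correlation order of the message.

```python
import numpy as np
from itertools import combinations_with_replacement

def ace_style_message(one_particle, neighbors, nu):
    """one_particle: dict v -> callable phi_v(j), already a function of the
    attributes and of the previous layer's features; nu: correlation order.
    nu = 1 recovers a classical two-body MPNN message."""
    # A-basis: permutation-invariant sum of the one-particle basis over the neighbors.
    A = {v: sum(phi(j) for j in neighbors) for v, phi in one_particle.items()}
    # Products of A's up to order nu give up to (nu + 1)-body features; a full ACE
    # layer would additionally symmetrize these products with respect to rotations.
    feats = [np.prod([A[v] for v in combo])
             for order in range(1, nu + 1)
             for combo in combinations_with_replacement(sorted(A), order)]
    return np.array(feats)
```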
The crucial hyperparameters of this framework are the following. First, the number of message passing layers, that is, the number of times you re-expand the ACE basis. Then the maximum order of the spherical expansion in the one-particle basis, which controls how much angular information you put on the edges. Then L_max, the symmetry order of the messages passed between the ACE layers. Then ν, the correlation order of each message: because the messages are no longer restricted to be two-body, you have to decide what correlation order to use at each layer. The final crucial decision is the coupling. There is a large spectrum of coupling choices that influence the speed and the computational cost of your model, described by an index that defines how much you couple things within one layer. Usually in message passing nothing is coupled, but that is only because classical MPNNs are two-body, as David mentioned; once you create tensor products of higher order, you have the choice of coupling things or not, for example coupling the chemical species or not, and this is what that index specifies.

With these hyperparameters we have classified a large range of message passing and also atom-centered models, and you can see that they are all just special choices of this one operator. For example, SchNet takes only scalars in the one-particle basis, has invariant messages, and uses correlation order one at each layer; NequIP has an equivariant one-particle basis and equivariant messages, but again only correlation order one at each message; and linear ACE, at the other end, is just one layer, with high equivariance in the one-particle basis and invariant messages. So you can classify most of the previously published models in a single table through these hyperparameters. One important additional aspect of this classification concerns the models that use Cartesian vectors, such as PaiNN: they are equivalent to this framework up to a change of basis. Their one-particle basis is expressed in Cartesian form, but because they only use vectors, a simple change of basis brings them into the same framework.

Okay, so now that we have classified all these models, can we try to understand what is really important within this design space? Can we probe the design choices of this myriad of architectures and extract what matters? To do that we created an architecture called BOTNet, the Body-Ordered Tensor Network, which sits halfway between linear ACE and NequIP. It is a message passing network, so it has several layers, but it is completely body ordered. It is completely body ordered because we removed all the nonlinearities that break the body ordering in the usual MPNNs, and the body-order expansion comes from the iterative layers: at each layer the features that are created are only two-body, but because you do message passing recursively, you build up a higher-order expansion of the potential energy surface. At the end, you map each of these features to one contribution to the energy, which is exactly the contribution of that body-order term. So the hierarchical body-order expansion of the potential energy surface is built directly into the network, which gives an interpretable architecture.
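To make that interpretability point concrete, here is a minimal sketch (hypothetical names, not the released BOTNet code) of how per-layer readouts turn each layer's features into one body-order contribution to the site energy:

```python
def botnet_site_energy(h0, layers, readouts):
    """h0: initial two-body features of one atom; layers: linear, body-order-preserving
    message passing layers; readouts: one linear readout per layer mapping features
    to a scalar energy contribution."""
    h, energy = h0, 0.0
    for layer, readout in zip(layers, readouts):
        h = layer(h)          # one more message passing step raises the body order by one
        energy += readout(h)  # this term is exactly one body-order contribution
    return energy
```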
BOTNet therefore serves as a probe on the line that connects linear ACE and NequIP in this design space, and we used it to probe several design choices. The first is the nonlinearities. Most MPNNs use nonlinearities, but in BOTNet we do not use any, because we want to preserve body ordering. We showed that not using any nonlinearity can still give very good performance: we reach the same performance as NequIP without a single nonlinearity in the model, apart from the coupling in the tensor product. So what really matters in these networks is the tensor product, not the pointwise nonlinearities. One other key result is that if you put just one nonlinearity in the last readout, to account for the truncation error of the expansion, you keep a well-defined expansion plus a residual part that is not body-ordered but that you control, because it only sits in the last readout. This way you gain an extra 20 percent, outperforming NequIP, with a single pointwise nonlinearity in the whole network. So the understanding we built helps us target precisely the architecture we want.

Another thing we found to be very important in these networks is normalization. Normalization is essential for reaching high accuracy, in particular in the regime with a very small number of configurations in the training set. We found that simply changing the λ in the aggregation equation can change the accuracy by over 50 percent: setting λ = 1 in NequIP loses 50 percent of the accuracy, setting it to the square root of the average number of neighbors recovers NequIP's accuracy, and setting it to the average number of neighbors itself gains another 10 percent over NequIP (see the short sketch below). So normalization inside the network is key in the low-data regime. The data normalization is also very important. Most networks use a scale-shift data normalization, which subtracts the mean energy of the training set from the targets; other approaches, like the atom-density approaches, do not do that and instead reference the energies to the isolated-atom energies of the density functional theory used. This choice changes the loss and the training dynamics, and you can gain or lose another 10 to 20 percent by choosing the right normalization. So normalization is key, but it also affects the generalization of the networks: with scale-shift normalization, the scale-shift models in this plot, you get very good in-domain accuracy, but when you go out of domain, to very extreme abstraction or bond-stretching modes, they completely fall apart and cannot reproduce the potential energy surface. The reason is that you have built in the wrong asymptotic limit. So the normalization also needs to have the right physical behavior, which matters even more when you know you will break bonds that your data cannot cover. You can see that in both the stretching mode and the abstraction test, the scale-shift models, which were more accurate in domain, completely fall apart.
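The two normalization choices discussed above amount to a few lines of code; the sketch below uses hypothetical names and simply contrasts the settings compared in the talk:

```python
import numpy as np

def aggregate(messages, avg_num_neighbors, mode="sqrt"):
    """Sum the neighbor messages and divide by lambda; the three settings compared in the talk."""
    lam = {"one": 1.0,                              # loses roughly 50% accuracy in the low-data regime
           "sqrt": np.sqrt(avg_num_neighbors),      # recovers NequIP-level accuracy
           "avg": float(avg_num_neighbors)}[mode]   # reported ~10% better still
    return np.sum(messages, axis=0) / lam

def reference_energies(energies, compositions, e0, mode="e0"):
    """Scale-shift (subtract the training-set mean) versus isolated-atom reference energies e0."""
    energies = np.asarray(energies, dtype=float)
    if mode == "scale_shift":
        return energies - energies.mean()           # accurate in domain, wrong limit when bonds break
    return energies - np.array([sum(e0[z] for z in comp) for comp in compositions])
```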
Now some benchmarking on the 3BPA dataset that Boris and David talked about. You can see that BOTNet reaches state-of-the-art performance with a very simple architecture. We also note, as Boris said, that BOTNet and NequIP are far ahead of the other models, and there is further work in this direction to understand what makes these two stand out. A final extrapolation test: here we took the potential energy surface of 3BPA, obtained by rotating the three dihedral angles of the molecule, took one of the black slices of this surface shown in the 2D heat map, and plotted the predictions of the models along it. You can see that the models do extremely well, in particular BOTNet and NequIP, even with a very small amount of data. The middle cut has no training data near it, so it is a very extreme extrapolation, but NequIP and BOTNet still give reasonably good accuracy, which is very impressive, and they all produce smooth potentials with good predictions.

In conclusion, we have proposed a mathematical framework to unify most existing ML potential approaches, including symmetric atom-centered potentials and equivariant message passing, and we have introduced a new model, BOTNet, which with an entirely interpretable architecture combines state-of-the-art accuracy with good extrapolation. I would like to thank all the co-authors on this project, which is joint work between Gábor Csányi's group in Cambridge and Boris Kozinsky's group at Harvard, together with Ralf Drautz and Christoph Ortner. The preprint should be out soon, we hope, if you want to dive into the mathematical framework, and we will give a much longer talk, in a seminar that we invite you to follow, where we also go into other aspects of this framework. So thanks a lot for your attention.

Thank you very much for this amazing talk. Now we have time for a few questions from the audience.

I'm quite impressed by the whole architecture, but it's hard to grasp the computational scaling of these objects in terms of the total number of atoms, the average number of atoms in a neighborhood, and the number of species. Do you have any specific idea or actual numbers on these?

So BOTNet suffers from the same problem as other message passing networks, which is exactly what Boris said: the receptive field is very large, so it is not that easy to parallelize, and equivariant message passing costs a lot because of the symmetrization part. But these models are also much more accurate, and the design space gives you the freedom to lose a bit of accuracy in order to gain speed or to reduce the computational cost a lot. So across all these models the design space lets you choose between very accurate models and very fast ones. The models behind the bold numbers in the table are actually quite slow, but you can still build very accurate models with few hyperparameters that will probably compete with the other approaches in terms of speed while remaining very accurate. And this is also the point of mapping out the design space: in the future we want to target the best approach for speed, computational cost and accuracy at the same time, because we have not explored the full design space yet.

So we have one question from New Orleans.

Thank you. I have a question related to this, basically the contrast between what Boris was saying about Allegro, about not needing message passing, and what you just talked about.
I guess the question is, there are two things that message passing does: on the one hand it gives you richer information about the environment around an atom, and on the other hand it gives you longer range. To decouple these two things, you could also look at the influence of the cutoff on a non-message-passing model. Do you have any insight on that?

I can answer this. If you look at the message passing in NequIP and ignore the nonlinearities, which seem not to be very important, then each message passing layer increases the local body order by one. So the question is basically: does NequIP need five layers to reach its very good accuracy in order to build up the body order, or in order to get some long-range information? What you see from Allegro, and maybe what you see from ACE as well, is that the high local correlation order is actually much more important than the semi-locality. Allegro is building up this local information from tensor products, very similar to what ACE does, but it has a lot more multilayer perceptrons inside, so it is a much more over-parameterized version: there is no MLP in the traditional ACE, so to say, whereas Allegro has them, and NequIP too to some extent. These are very related approaches, and they have introduced a lot more flexibility into building up this local, high-body-order description.

One additional comment on that: if you look at the Allegro paper, you still need to use a rather large cutoff on some molecules. So maybe you do need something a bit semi-local; it depends on what you call semi-local. Is semi-local 10 or 12 Angstrom, or is it 20 Angstrom? Usually the atom-centered descriptors use a cutoff of around four or five Angstrom, and Allegro, for example, uses in some cases a seven or nine Angstrom cutoff. Whether you call that semi-local or local is up for discussion.

Okay, thank you very much to Boris Kozinsky, Dávid Kovács and Ilyes Batatia for these amazing talks. I'm sure they will be happy to discuss them with you further. Now we have a coffee break and we start again at 40 sharp. Thank you very much.