Okay, so our first speaker is Jata Gamm, and the talk will take us toward message passing with atom-centered representations. So, please.

Thank you, and good morning to everyone. So I realize that I promised in my abstract to talk to you about Hamiltonian learning, but I thought I'd take this opportunity to talk about something that I've recently been finding even cooler, and since Julia will in any case talk about Hamiltonians after me, I don't think you'll miss out on much. Instead, today I'd like to tell you a story, which I think links a bit with the panel discussion we just had: a story about identifying the ingredients of machine learning, so that once you have pinpointed which step of your machine-learning pipeline is giving you what, you can take those ingredients, arrange them in whatever order you like, and extend your framework to more complicated targets.

Okay. So, just as a quick recap, because you've probably seen this a bunch of times already: a supervised machine-learning framework goes as follows. You are building models for a particular property of interest; it could be interatomic potentials or any energy that you might be interested in studying. You begin with some structures that you have sampled from the relevant part of phase space you want to study. Most people, myself included, spend a lot of time constructing representation functions, which are intermediate inputs to your model, and then you provide these to a model. Here, for example, we have a linear model, so the model in turn will learn weights which, when combined with your representation function, will give, to the best of its ability, the closest reconstruction of some reference calculations, and then you use this framework for prediction. At the same time, there exists an alternative set of schemes that combine the representation and the modeling into one whole step; these form the end-to-end learning approaches, and we'll see today whether they are really disconnected or not. So wait until then.

Before I tell you why so many people spend so much time constructing these intermediate functions, let's quickly recap how one goes about constructing them. You start with your structure, and at each atomic position you assign a Gaussian function or some localized function. In a talk yesterday you saw localized functions that were delta functions; here we work with Gaussian functions. Then around each atomic position you consider a localized environment, and within this local environment you tag each of the neighbors, one by one, by its relative position. Since you have this pairwise representation of each of your neighbors with respect to a central atom, I would call this a pair density. Then you sum over all the pair densities in your neighborhood and construct what would be a neighbor density, which is a function of your central atom. Clearly you've seen this in many, many forms; here, since we're talking about Gaussians at atomic positions, this is a function in position space, but it could be in any space that you can imagine.
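As a schematic of the construction just described (the notation here is a generic choice, not necessarily the one on the speaker's slides): with a Gaussian $g$ of width $\sigma$ placed on every neighbor $j$ within the cutoff of center $i$, the neighbor density reads

$$\rho_i(\mathbf{r}) \;=\; \sum_{j \in A_i} g(\mathbf{r} - \mathbf{r}_{ij}), \qquad g(\mathbf{r}) \propto e^{-|\mathbf{r}|^2 / 2\sigma^2},$$

where $\mathbf{r}_{ij} = \mathbf{r}_j - \mathbf{r}_i$ is the relative position of neighbor $j$. Writing everything in terms of the relative positions $\mathbf{r}_{ij}$ is what encodes translational symmetry, and the sum over neighbors is what encodes permutation symmetry.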
So the procedure that I have just described very quickly in words incorporates a series of mathematical steps, and these endow your representation function with certain symmetries: by identifying relative positions you've encoded translational symmetry, and by taking the sum you have incorporated the permutation symmetry of your system. These are further rotationally symmetrized, and you've seen in the talks before mine that whenever we start talking about rotational symmetries, you see a lot of these lm indices, or spherical harmonics. That's not a surprise, because spherical harmonics provide the most natural language for rotational symmetries, so you see all these coefficients with lm indices. Then you take this density, and you can symmetrize over one single copy of it, or over multiple copies, which would be a tensor product, and we'll see why this is helpful. In general this forms a framework that is applicable to any system: the ACDC, or atom-centered density correlation, framework. Just for all the rock fans in the audience, I have a reference to the band AC/DC.

So what is the advantage of this procedure, which seems quite involved? The advantage is this: maybe you start with these three molecules here, three water molecules that are just rotated copies of each other, and you're trying to learn the energy, which should be the same for each of these structures because they are just rotated copies in vacuum. They should in fact map to the same representation function, because it incorporates all the symmetries that these water molecules have. So the model does not have to learn that the inputs you're presenting to it are in fact equivalent; instead it can spend its capacity learning the things which are truly distinct.

And of course this neat architecture can be extended to targets which are more complicated, like dipole moments or tensorial properties. In yesterday's talk it was also mentioned, I think, that a dipole moment, which is a Cartesian vector, has a direct mapping to a spherical harmonic. So if you want to learn something like this, you start again with your structures, and here of course the properties are going to be different, because the structures are rotated copies and the dipole moment rotates with the structure. You follow the same procedure, constructing the atom-centered densities for each of these structures, which would be the same, and then you have this additional function here, which is a spherical harmonic, and averaging over this leads to a representation which now rotates in the same way as your target property. What happens, in the spirit of the Wigner-Eckart theorem, is that you've separated the learning of your equivariant target into something that captures the symmetry, which here is your representation, and something that is independent of the symmetry, the invariant weights. So with the same set of weights you've captured the rotation of your dipole moments.

I like to think of this as adding spices to some recipe that you're making. Incorporating different kinds of symmetries is like adding pepper: some people like pepper, because maybe you're worried about symmetries, and it also matters how much pepper you add, because it depends on how much symmetry you want to encode. And so this framework is more and more widely employed, because it provides a completely linear basis in which to expand your target property, which is something you also saw yesterday.
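To make the Wigner-Eckart separation above concrete, here is one hedged way to write it (generic notation, assuming a linear model as in the talk): if $\langle q \,|\, \overline{\rho_i^{\otimes\nu}}; \lambda\mu \rangle$ denotes a symmetrized feature that transforms under rotations like the spherical harmonic $Y_\lambda^\mu$, then an equivariant linear model for a rank-$\lambda$ target, e.g. the dipole with $\lambda = 1$, takes the form

$$y_{\lambda\mu}(A) \;=\; \sum_{i \in A} \sum_q w_q \, \langle q \,|\, \overline{\rho_i^{\otimes\nu}}; \lambda\mu \rangle,$$

where the weights $w_q$ are rotationally invariant: the features carry all of the rotation of the target, and the same set of weights serves every rotated copy of the structure.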
I would like to present an analogy here. I will talk about a central character whom I'm calling Bob, and we can think of Bob as a person whose energy, or his mood during the day, changes based on the interactions within his friend circle. We can think of Bob's interaction with Jim, his interaction with Sally, or his interaction with Alice, and these together, when you sum them up, control his mood to some extent. These pairwise interactions, with one neighbor at a time controlling the energy of Bob, would be captured by a one-neighbor correlation, like the pairwise displacement if you want to think in terms of atoms, and you get this naturally by averaging one single copy of your density over rotations. In the same way, you can consider energy contributions from two neighbors. For example, here we have Bob again, and maybe Bob gets along well individually with Jim and Sally, but when the three of them are together they fight a lot, and so Bob's energy decreases. You need to capture this effect as well, and you capture it by averaging two copies of your density over rotations, because this naturally gives you your environment in terms of two distances and an angle. And then you can go on and on to higher-order contributions to your energy: triplets and so on.

One of the things that is NICE here, and not only by name, is that it is a very neat framework. For a long time it was thought that when you want to construct higher-order representations that capture these higher-order contributions to the energy, you need to start by averaging over the atom densities from scratch. But it turns out that you can reuse the information that you have at the previous order. In terms of this analogy, you can think of the contribution of triplets of atoms as recycling the information about Bob's interactions with Sally and Alice from the second order, and then considering an additional contribution from the third neighbor. This scheme leads to linear scaling of your descriptors. And as I said before when I talked about spices, since I come from a land that is infamous for its spicy food, I like to think of these as ingredients that, when combined in the correct order and in appropriate proportions, give you any recipe you want.

Having said that, here is a figure from a paper that is already a year old. It may seem a bit overwhelming, but it shows all the representation functions that have appeared over the past decade. All of these basically start from the same Cartesian coordinates, and they reflect similar information about your structure, just in different languages. One of the key points is that this tree unifies all these languages, while at the same time there exist, again, the alternative end-to-end models, and we'll come back to those shortly. The key ingredients for the success of all these representations are symmetry, which we've talked about extensively; completeness, which means that the intermediate representation function you're constructing maps distinct structures to distinct functions, otherwise you've lost the advantage of constructing this intermediate function in the first place; and locality.
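To spell out the recycling of lower-order information mentioned above, schematically and in loose notation (the exact bookkeeping in the paper differs): if $\langle nlm \,|\, \rho_i \rangle$ are the expansion coefficients of the neighbor density and $\overline{\rho_i^{\otimes\nu}}$ the symmetrized order-$\nu$ features, one new order is obtained by contracting the previous order with one more copy of the density through Clebsch-Gordan coefficients $C$,

$$\overline{\rho_i^{\otimes(\nu+1)}}\Big|_{\lambda\mu} \;\sim\; \sum_{l'k,\; lm} C^{\lambda\mu}_{l'k;\,lm} \;\, \overline{\rho_i^{\otimes\nu}}\Big|_{l'k} \;\, \langle nlm \,|\, \rho_i \rangle,$$

where indices like $n$, $l$, $l'$ are carried along as feature labels. Each additional body order then costs one contraction rather than a full $\nu$-fold tensor product built from scratch, which is the linear scaling of the descriptors mentioned above.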
So one of the problems that one might be interested in studying further is this: what if the property we're trying to learn is not just atom-centered? Until now we were learning a property as an atom-centered decomposition, but what if it is N-centered? For the last few years there has been interest in the community in targeting more fundamental quantum-mechanical outputs, such as effective single-particle Hamiltonian matrices, which are indexed by two centers, or rather by orbitals on two different centers, so one would need representations that treat two centers on an equal footing. We start again from the atom-centered framework we have so far, and we combine it with the pair density that I talked about a couple of slides ago, where we explicitly tag one other center. We combine these two using tensor products, which again you have to symmetrize, and you get what is an atom-pair correlation.

If we go back to our analogy of Bob, and we consider his environment described by one neighbor, which is Alice, I now tag a special atom, Sally, so that Bob and Sally form a special couple, and you would like to describe how the energy of this couple changes with respect to the couple's environment. One way to do this is to define the couple itself, which is the interaction between Bob and Sally, and then choose to describe the relation of this couple to Bob's environment, so Bob's friends; this would be the joint representation. But you could equally have chosen Sally's friends to define the interaction. So there is some asymmetry in how you define this two-center correlation, and it comes in handy.

For example, if we have a water molecule whose Hamiltonian is expressed in some atom-centered orbital basis, we can have three different kinds of blocks in this Hamiltonian. We can have orbitals that are centered on the same atom, which would correspond to learning something using just the atom-centered density. We can have orbitals on two atoms of different species, shown in different colors here, so you can think of Bob and Sally again: the red one is Bob and the blue one is Sally, and you think of the interactions between orbitals on them. Here the atoms being distinct means you explicitly know which center is Bob and which center is Sally. But then comes the complicated case where both atoms are of the same species, so now we have Bob one and Bob two, and you don't know beforehand which atom is which. This is where the asymmetry in tagging the centers comes in handy, because now you can construct symmetric and antisymmetric combinations of your two-center correlation, and this is what would mathematically be called permutation equivariance. In this way you've built rotational symmetry, translational symmetry, and permutation symmetry into your representation.

What this buys you is the following: since all the molecular symmetry groups that you know of are subgroups of the rotation and permutation groups, this automatically encodes the rules of molecular orbital theory. For example, here I trained a model on one random Hamiltonian matrix of a benzene molecule, just random data, so the prediction is of course also random, because it's learning useless stuff. But because of this representation, the output automatically captures the degeneracies of the eigenvalues, and furthermore it captures the effect of distortion: if you distort your structure, it captures the degeneracy breaking of your eigenvalues, which is the Jahn-Teller effect, and this you get for free; you don't need to do anything else.
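A minimal sketch of the symmetric and antisymmetric combinations described above, again in generic notation: if $\rho_{ij}$ is the two-center feature built by tagging center $j$ from the environment of center $i$ (Bob's side of the couple) and $\rho_{ji}$ the one built the other way around (Sally's side), then for two centers of the same species one keeps

$$\rho^{\pm}_{ij} \;=\; \tfrac{1}{\sqrt{2}}\left(\rho_{ij} \pm \rho_{ji}\right),$$

which are respectively even and odd under swapping the two centers. Matching these to the correspondingly symmetrized combinations of the same-species Hamiltonian blocks is what builds permutation equivariance into the model.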
Now I would like to switch gears and come back to how this framework naturally extends to message passing. In the framework I've described so far, we have a way of describing a central atom and a way of describing a pair of atoms, and in the language generally used in message passing, the latter could be considered a message between two nodes along the edge between them. You can extend this to the other nodes in your neighborhood and sum over them, which gives a message from all your neighbors, and then you use this message to update the representation of your central node; this is usually called the update function.

Going back to our analogy of Bob, Jim, and Sally: so far I showed you how we can construct higher-order correlations by considering Jim, then Jim and Sally, then Jim, Sally, and Alice, and basically summing over them. Now, for the message-passing case, we again start with the one-neighbor correlation, shown here just for Alice, but remember that you sum over all the pairwise neighbors. Then you tag a special other center, which is Sally, so we are back to representing a couple, and this is our two-center correlation. Now we can think of Jim, who is Sally's neighbor but not directly related to Bob. If there were no message passing, Bob would have no information about Jim. But since Bob knows Sally, and Sally knows Jim, and Sally is a gossip monger, she can bring information about Jim to Bob. So here you have constructed a single-neighbor correlation on Sally and a single-neighbor correlation on Bob, and combined the effect of the two. There might also be a case where Jim is a simultaneous neighbor of both Sally and Bob. What does message passing bring us then? It brings us more information about Jim: because Jim is a distant neighbor of Bob, you don't trust the information you get from Jim directly very much, but since you have an intermediate relay point, you maybe trust the information about Jim that you get through Sally more.

Okay, if we look at the case of the degenerate methane dataset from the paper I showed before: a three-neighbor correlation, which looks at a central carbon atom and three of the neighbors around it, and which is centered only on the carbon atom, does not capture to the full extent the interaction between these two hydrogens, and that's why the learning curve saturates quite early. If instead you center yourself on all the atoms in your system, you bring in information about all sorts of correlations, so you get a lower error. Now consider the message-passing case, where you center yourself again on the carbon atom but now have information about the neighbors' neighbors: you get information about the interaction between H3 and H4 through the message that H3 passes to your carbon. With just this carbon-centered message-passing representation, you basically recover the performance of your all-center three-neighbor correlation. Another way to see how much more informative this message-passing scheme is would be to look at the feature reconstruction error, but I will not go into the details of that, since we're running out of time.
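As a toy illustration of the message and update functions just described, here is a minimal sketch in plain Python/numpy. It uses only invariant per-atom features and linear message/update maps (the scheme in the talk is equivariant and far richer), and all names and shapes here are assumptions of mine, not the speaker's implementation.

```python
# Minimal message-passing step on an atomic graph (toy, invariant-only sketch).
import numpy as np

def message_passing_step(h, positions, cutoff, W_msg, W_upd):
    """One round of message passing.

    h          : (n_atoms, n_feat) current per-atom features
    positions  : (n_atoms, 3) Cartesian coordinates
    cutoff     : atoms closer than this exchange messages
    W_msg, W_upd : (n_feat, n_feat) weights of the (linear) message
                   and update functions
    """
    n = len(positions)
    m = np.zeros_like(h)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = np.linalg.norm(positions[i] - positions[j])
            if r < cutoff:
                # Message from j to i: neighbor features through W_msg,
                # damped by a smooth distance weight that vanishes at the
                # cutoff. Over several rounds, atom i thereby picks up
                # information about its neighbors' neighbors.
                m[i] += np.cos(0.5 * np.pi * r / cutoff) ** 2 * (h[j] @ W_msg)
    # Update: combine each atom's own features with the pooled message.
    return np.tanh(h @ W_upd + m)

# Toy usage: a water-like triatomic with random initial features.
rng = np.random.default_rng(0)
pos = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
h = rng.normal(size=(3, 8))
W1, W2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
h = message_passing_step(h, pos, cutoff=2.0, W_msg=W1, W_upd=W2)
```

The point of the sketch is only the structure: a pairwise message function pooled over neighbors, followed by a per-node update, which is the pattern the talk maps onto atom-centered and two-center correlations.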
So I will just conclude with the three trees that I've talked to you about today. We started with the tree of representations, and on top of this tree we built different models; the models could be linear, kernel, neural networks, and so on. For a long time neural networks seemed to be disconnected, but they were doing almost the same things in a different language, and you can relate the tree of neural networks, which also diverges into feature-based, message-passing-based, and other neural-network schemes, back to the tree of your representations through these message-passing atom-centered correlation functions. And of course, since the neural networks have their own spices, like nonlinearities or attention schemes, you can borrow those, include them in your atom-centered correlation scheme, and choose how to combine all these ingredients to get whatever recipe you desire. With that I'd like to thank you, and yeah.

Thank you, thank you for the nice talk, and we have time for one question.

Thank you. So, just to help my understanding: you symmetrized over two centers, right? So you've got like two SO(3) tensors, and you symmetrized over the permutation, right? That's different from, say, message passing between the two, or an environment that's centered in the middle, because that would be permutation invariant for that system; yours is equivariant, so it can really tell you that this thing is going on in terms of the occupation matrix on this guy and that is going on on the other guy, right? Okay, I understand. Very, very nice, thank you.

Thank you. Okay, so I'm sorry I didn't leave you with time to ask questions, but I'll of course be available through the end of the workshop, so let's talk outside. Thank you, thank you again, and thank you.