So good morning and welcome everybody to the second day of the Young Researchers Workshop on Machine Learning for Materials. Thank you for joining, both in presence and online. This first morning session is a tutorial session that will take place on Google Colab, so you will find on the conference website ml4m.xoz the link to the first Colab notebook. Presenting today is Anton Buscherev from ICAMS; he is a researcher working on atomic cluster expansion representations and their applications to machine learning potentials. Without further ado, I give you the word. If you need any help, reach out to me or the other organizers and we will try to help you as much as we can with running and using the Colab. Thank you.

Hi, thanks for the introduction, and good morning everyone. I'm Anton Buscherev from ICAMS in Germany, so without too much introduction, let's immediately go and see what we have. First, before we start anything in the Colab, please go to the Runtime tab, click "Change runtime type", and make sure that you have the GPU selected there. If you do, just close the dialog; if not, change it and proceed. Then we need to install a few things. I tried to make the notebook as package-independent as possible, but we need a couple of packages like the Atomic Simulation Environment (ASE) and pandas. Also, closer to the end of the lecture, we'll talk about the atomic cluster expansion, and for that we have to install a couple of tools that we develop: you have to clone some repositories from Git, the python-ace package and tensorpotential for running the fitting. All of this is set up; just run all of these cells. After you run them, a button to restart the runtime will appear, so please click it to restart, and after that do not execute the installation cells again; just go directly here and start with the imports. So much for the technical details.
I won't run any of this now. Okay, so now we can talk about something that is called structural descriptors: what are they, what are they needed for, and what do we want from them? We want to use them for representations, basically to build machine learning potentials. So what is the problem that we want to solve? Usually we have some structure that is represented by the coordinates and atomic types; constructing some Hamiltonian and minimizing with respect to the wave function, we can produce the energy, basically by solving the quantum mechanical equations. In a perfect case that is what we do, but these are expensive methods, so we want to replace them, and we want to replace them with some generic function that is preferably cheaper to evaluate than the quantum mechanical Hamiltonian. That function should also depend on the same configurations, plus some parameters of the function or model, and by minimizing with respect to these parameters the model would produce an energy that is hopefully close to the true one. The minimization over these parameters is carried out by minimizing some loss function that usually takes a form similar to this: basically the root mean square difference between the true energy and the one produced by the model, minimized with respect to the parameters. We're not going to talk too much about the minimization algorithm, nor about what exactly the function is; there are multiple methods of regression. What we are mostly going to talk about is the representation of our structures. So let's just take some structure, for example this little molecule here, which looks like this: two carbon atoms, one oxygen, four hydrogens. How can we represent it?
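As a rough sketch, the loss just described can be written like this; the linear `toy_model` and all the numbers are purely illustrative assumptions, not the notebook's actual model:

```python
import numpy as np

def rmse_loss(params, model, structures, true_energies):
    """Root-mean-square difference between predicted and reference energies."""
    preds = np.array([model(s, params) for s in structures])
    return np.sqrt(np.mean((preds - np.asarray(true_energies)) ** 2))

# Toy check with a hypothetical linear "model": E = w * x
toy_model = lambda x, w: w * x
loss = rmse_loss(2.0, toy_model, [1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # 0.0
```

Fitting then means searching for the `params` that drive this loss down, with whatever optimizer the regression method uses.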
Can we just take the coordinates of the structure and feed them into some function? We can actually do that: take the coordinates, which form a matrix like this, assign the types of the atoms somehow, put this into a generic function, and try to learn the energy. It's actually possible, and I think people tried exactly that at the beginning of these machine learning activities. The thing is, you can be successful with this to a certain degree; you can actually learn some energy from just the Cartesian coordinates of the atoms. The problem is that you would need a massive amount of data, even for this particular simple system. And why is that? Because if we just rotate this molecule, which we do by simply executing this command, providing the rotation angle and the axis of rotation, and look at the positions again, they have changed almost completely, except for the z axis, because we rotate around it. But we know that the molecule didn't change, so we want to have the same energy for this rotated molecule. You can sort of fix it: you can add rotated molecules with exactly the same energy to your data and hope that your generic function learns the rotational invariance of your system. The same thing happens if you rearrange the indices of the atoms, that is, the order of the atomic coordinates in this list. There is nothing to it: we take some array of indices, shuffle it, and rearrange the positions according to these mixed indices. Look at the positions again: they are in a totally different order compared to the previous ones. We didn't do anything; we just relabeled the atoms, and again we have a totally different representation. So this is not particularly convenient. And I stress "not convenient": it doesn't mean that it is impossible.
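A minimal NumPy sketch of this observation (the three-atom coordinates are made up for illustration): raw Cartesian coordinates change under rotation and permutation, while interatomic distances do not.

```python
import numpy as np

# Toy set of Cartesian coordinates (a hypothetical 3-atom fragment).
positions = np.array([[0.0, 0.0, 0.0],
                      [1.1, 0.0, 0.0],
                      [0.0, 1.5, 0.3]])

# Rotate by 90 degrees around the z axis.
theta = np.pi / 2
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
rotated = positions @ Rz.T

# The raw coordinates change almost completely under rotation ...
print(np.allclose(positions, rotated))  # False

# ... but the interatomic distances (internal coordinates) do not.
def distance_matrix(pos):
    diff = pos[:, None, :] - pos[None, :, :]
    return np.linalg.norm(diff, axis=-1)

print(np.allclose(distance_matrix(positions), distance_matrix(rotated)))  # True

# Permuting atom indices likewise changes the coordinate array itself.
perm = np.array([2, 0, 1])
print(np.allclose(positions, positions[perm]))  # False
```

This is exactly why internal coordinates, rather than raw Cartesian ones, keep reappearing in the descriptors below.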
It would just require a massive amount of data, and it would require your generic function, say a neural network, to be complex enough to learn all of this, which is not very practical and not very efficient. What you want instead is to take your atomic coordinates and types and pass them through another function that represents them in some different way, including some domain knowledge of yours. Because domains differ, these representations can differ too, and that is also important; but in general, the domain knowledge here is just physics. So what are the requirements on this function, which is usually called a descriptor? The main requirements are the ones we just discussed: it should be rotationally and translationally invariant, because we know that the energy of the system does not change under these transformations. We also want it to be invariant under permutations. Preferably, it should also be size independent. I say preferably because size independence is not really a physical requirement; it's a technical requirement, and a physical model can be perfectly size dependent. It would even be natural for a property of the system to depend on how many atoms it consists of. But for modeling applications, because we talk about all of this in the context of doing simulations with these structures, fast and efficient long-time, large-scale simulations, we want our representations to be size independent so we can scale up: not a single molecule but, say, hundreds of these molecules, or this molecule incorporated into something else, without affecting the representation. So now let's take a look at a couple of simple examples, including historically one of the first examples of structural descriptors.
One of them is the Coulomb matrix. The definition is right here: basically, you take your atoms and construct a square matrix whose diagonal elements are just the atomic number of an atom raised to some power, while each off-diagonal element represents a pair of atoms, the product of their atomic numbers scaled by the distance between them. It couldn't be simpler than that. The energy of our system will then be some function that depends on this matrix. So here we build a function that constructs the Coulomb matrix given the atomic structure. As input it takes the atoms; for further experiments we allow this function to accept a couple of extra arguments: whether we want to permute the atomic positions and whether we want to rotate them. Here we get the number of atoms in our structure; if permute is true, we rearrange the atomic indices, and if rotate is true, we rotate by some angle. And here we construct the Coulomb matrix itself. This is not the best way to do it, because the matrix is obviously symmetric, so in principle you could compute only the upper part and mirror it, and so on, but let me try to make it bigger so everyone can see. Alright, back to the Coulomb matrix. Again, this particular algorithm is not necessarily the most efficient one, but we want to see explicitly what we're doing here. We loop over all our atoms, basically over all the elements in the matrix; we take the atomic numbers at particular indices of the list, we take the positions, and we compute the distances between the atoms. If the indices are not the same, we compute the off-diagonal element; if they are the same, so it's a diagonal element, we take the atomic number of this atom and raise it to the power. Then we write the result into the particular element of the matrix and return it.
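A minimal sketch of such a construction might look as follows; the `0.5 * Z**2.4` diagonal is the conventional choice for the Coulomb matrix, and the deliberately naive double loop mirrors the explicit style described above (the notebook's version additionally takes permute/rotate flags):

```python
import numpy as np

def coulomb_matrix(numbers, positions):
    """Build the Coulomb matrix for a set of atoms.

    numbers   : sequence of atomic numbers Z_i
    positions : (N, 3) array of Cartesian coordinates
    """
    n = len(numbers)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: 0.5 * Z_i^2.4 (the conventional choice)
                M[i, j] = 0.5 * numbers[i] ** 2.4
            else:
                # Off-diagonal: Z_i * Z_j / |R_i - R_j|
                r = np.linalg.norm(positions[i] - positions[j])
                M[i, j] = numbers[i] * numbers[j] / r
    return M
```

Exploiting the symmetry (computing only the upper triangle and mirroring) would roughly halve the work, as mentioned above.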
By the way, why am I explaining all this: I'm just assuming that everyone is somehow familiar with Python and how it works, the indexing and all this stuff. If not, please raise a hand. Okay. So that's our function. Let's take our molecule again and build this matrix for it. And we get exactly what we expected: a square matrix, which is symmetric, so it doesn't matter whether you read rows or columns; they just represent the interactions, or some sort of interactions, of a particular atom. The first row would be oxygen: first with itself, then with carbon, hydrogen, another carbon, and another three hydrogens. So that's the matrix of numbers, more or less what we expected. Now let's check whether it's rotationally invariant and do exactly the same, but this time set rotate to true. As you can see, the matrix did not change at all; it's all the same elements in all the same places. So at least one of the requirements that we want for this descriptor is fulfilled. But now, what happens if we permute the positions of the atoms? Now the matrix looks totally different. It's all the same numbers, but they are in different places. The problem is that we technically require that this representation does not change under such operations, because we will attach some learning machinery to it, some function that actually produces the energy out of this descriptor, so the dimensions must always come in exactly the same way. Otherwise, it cannot distinguish whether something changed in the structure or whether it's the same structure and you just relabeled it. So it's a very simple descriptor, and it appears to be almost working; can we somehow lift this small restriction? There are actually many ways of doing this; there is no unique way.
I will repeat this a few times during the tutorial: there is a lot of arbitrariness in machine learning, and in these representations and machine learning potentials in particular. There are only these few physical requirements that you want your representation to satisfy, but you can come up with any number of ways of fulfilling them exactly. Here we will see two or three of them, but there are many more. Even here, given this representation and the fact that it is not permutationally invariant, how do you deal with that? Again, there is no unique way. For example, you can sort the matrix itself and use the sorted matrix directly as a descriptor. Another way is to compute the eigenvalues of this matrix and then sort them, say by magnitude. If we do this for all three matrices that we produced, we get lists of exactly the same numbers in exactly the same order. So even though the matrix itself changes under permutation, the eigenvalues don't. The only thing we have to make sure of is that we arrange those eigenvalues in the same way, regardless of which comes first. This would then be a suitable descriptor for learning, for assigning an energy to it with some particular method. Usually the Coulomb matrix is used with kernel methods; I guess there was a lecture yesterday about that, and a tutorial on kernel methods is coming after this one, so I won't go into how you learn the energy from the descriptor, but rather stick to the representation itself. There are also variations of this method for periodic systems, because obviously you cannot do it so easily if you have periodic boundary conditions, but there are ways to do it, and it basically doesn't become more complicated than that.
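The eigenvalue trick can be sketched like this; the matrix values are made up, and sorting by magnitude is one of several possible conventions:

```python
import numpy as np

def sorted_eigenvalues(M):
    # Eigenvalues of a symmetric matrix, sorted by magnitude (descending).
    vals = np.linalg.eigvalsh(M)
    return vals[np.argsort(-np.abs(vals))]

# A symmetric "Coulomb-like" matrix and a relabeled copy of it.
M = np.array([[0.5,  8.0, 1.2],
              [8.0, 36.9, 2.0],
              [1.2,  2.0, 0.5]])
perm = np.array([2, 0, 1])
M_perm = M[np.ix_(perm, perm)]  # relabel atoms: permute rows and columns together

# The matrices differ, but their sorted eigenvalue spectra coincide.
print(np.allclose(M, M_perm))                                          # False
print(np.allclose(sorted_eigenvalues(M), sorted_eigenvalues(M_perm)))  # True
```

Permuting rows and columns together is a similarity transformation, which is why the spectrum survives the relabeling.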
It just requires that the interatomic distances that enter the Coulomb matrix are computed taking care of the boundary conditions, and that's basically it. So what are the advantages of this particular representation? It is very simple; arguably it couldn't be simpler. You literally have just atomic numbers and positions in the descriptor, and that's it. It's also global, meaning the descriptor carries information about the entire structure: every atom interacts with all the atoms, so you have information, directly or indirectly, about all the interactions of every atom with every atom. In principle this is something you want to have, so that's a plus. The minus is that it is simple, which means there will potentially be interactions that this descriptor just cannot capture, simply because something more is going on than just positions and atomic numbers, and you would need some additional degrees of freedom to account for it. And the flip side of this quality is that the descriptor is size dependent. Even though this array of numbers comes in exactly the same order for the same molecule, or for molecules of the same size, so you can learn the difference between them, the amount of numbers that you have here changes if the size of the molecule changes. Here we have, what, seven atoms, and if the molecule had ten atoms you'd have ten. That is actually a restriction and a big inconvenience. In this case, when preparing your data, you first have to assume a maximum size, and all your descriptors have to be of that maximum size, regardless of whether a structure actually has that many atoms or not.
You would have to insert something, usually zeros, to account for the fact that a particular molecule doesn't have as many atoms as the others. This means it doesn't really scale that well: if you wanted such a descriptor for a structure of a few thousand atoms, you would have a matrix of a few thousand by a few thousand, so the computational cost of this method scales quadratically with the number of atoms. And the whole point of having a machine learning potential is to have linear scaling with the number of particles in the system. That doesn't mean the method is particularly bad; again, it's about the domain of application and what exactly you want. I put here a few links that you can take a look at for further study on the topic, and as far as I know it's a successful method in its domain of application. So now let's move to the other type of descriptor and address this size dependency, which is crucial if we want to simulate systems of several thousand atoms or more. The classical way of doing this, inherited from classical empirical potentials, is to say that the total energy of the system is a sum of atomic energies. And that already puts us in a lot of trouble, because this is a very ill-defined problem, simply because there is an infinite number of ways you could fulfill this requirement. The fact that we are talking about structural descriptors, plural, and not the descriptor, already shows this arbitrariness: there are multiple, close to infinitely many, ways of doing this. Still, it allows us to separate the energy into individual contributions, and now, instead of writing some function that describes the total energy of the system, we write a function that describes the energy of a given atom.
And now, while just discussing this, before we have even done anything, we have to note that our generic function takes a descriptor as input and immediately involves some other function that depends on some distance, r_c. How does this distance appear here, and why? It comes from the locality requirement we just introduced. If we want this sum to make sense, we have to limit the energy of a particular atom to some local environment; we have to limit it in space somehow, otherwise we just have a global descriptor again, and then there is no reason to do this separation at all. So we have to put a limit, and since we have to, it has to appear here. Usually it's done like this: you have your central atom and you draw a sphere around it at a distance r_c, which we call the cutoff distance. Everything inside this cutoff distance contributes to the energy of the central atom; everything outside does not. That is the local atomic environment. Introducing this cutoff distance immediately also requires us to introduce a cutoff function. The role of this cutoff function is to ensure the continuity of your energy surface, simply because if you don't have it, at some point an atom outside the cutoff distance can cross into it, and that would introduce a stepwise change in your energy. That would mean discontinuities in the first derivative as well, so you'd have trouble with forces and everything else. You want to avoid this. So whatever happens when you add an atom to, or remove one from, the local environment, you want those changes to be smooth.
Also, if an atom is very far away, it should, not obviously but intuitively, contribute less to the energy of the central atom than those which are closer, and the cutoff function takes care of this too. For this particular example we look at such a function: a cosine that decays from one to zero as the distance goes from zero to r_c, and is simply zero outside the cutoff distance. It does exactly what we just described. Let's look at what this function looks like; we can implement it simply with a function that takes two arguments, the interatomic distance and the cutoff distance, and everything beyond r_c we just set to zero. Let's set the cutoff to 10, span our distances from zero to something slightly bigger than 10, and plot the function. It doesn't quite fit on the screen, but this is basically the idea: it smoothly goes from one to zero, everything outside is zero, and anything that comes in starts to contribute slowly and smoothly to the total value. There are, again, multiple ways of defining cutoff functions. Some of them don't decay immediately; they stay at one almost all the way, and only closer to the cutoff do they start to decay rapidly. All sorts of functional forms are possible, polynomial functions and so on, and that again speaks to the certain degree of arbitrariness in the things that we do. So, with the cutoff function in hand, let's take a look at the structural descriptor called atom-centered symmetry functions. The idea of this descriptor is to start from the interatomic distances.
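A sketch of the cosine cutoff just described, assuming the common form f_c(r) = 0.5 (cos(pi r / r_c) + 1) for r <= r_c and zero beyond:

```python
import numpy as np

def f_cut(r, rc):
    """Cosine cutoff: decays smoothly from 1 at r=0 to 0 at r=rc, zero beyond."""
    r = np.asarray(r, dtype=float)
    fc = 0.5 * (np.cos(np.pi * r / rc) + 1.0)
    return np.where(r <= rc, fc, 0.0)

# Evaluate on a grid slightly past the cutoff, as in the plot described above.
rc = 10.0
r_grid = np.linspace(0.0, 11.0, 111)
values = f_cut(r_grid, rc)
```

Both the function and its first derivative vanish at r_c, which is exactly the smoothness property needed for well-behaved forces.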
Let me formulate it another way. Imagine you have a structure, say the molecule you've seen before, and it has a set of interatomic distances present in it: some atom is at a particular distance from another. For each atom you have the list of such distances r_ij. Since we've now moved to the local picture, we are mostly going to talk about the descriptor, or the energy, of a particular central atom i. For this central atom i you have several neighbors j, and for each of them a distance. What you can do then is take a Gaussian function, basically a probe at distance R_s, and check whether the local environment of this atom has a distance located there. These functions are nothing but a probe placed at some distance, checking whether such a distance exists or not, and the parameter eta determines how wide your probe is. Then the cutoff function comes in. Let's just see what this looks like; I guess it will be clearer from there. We implement this descriptor here; the inputs are the interatomic distance, the parameter eta, the probe position, and the cutoff radius. This is nothing but this formula implemented directly: the exponential of minus eta times the interatomic distance minus the probe position, squared, times the cutoff function that we implemented before. Let's choose the cutoff to be 10 again, pick an eta parameter and the probe positions; or first, let's take a look at a range of interatomic distances from zero to the cutoff, place several probes at positions that also range from zero to the cutoff distance, and plot it.
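Putting the pieces together, a G2-style radial probe might be sketched as below; `f_cut` is the cosine cutoff from before, and the neighbor distances are made-up numbers for illustration:

```python
import numpy as np

def f_cut(r, rc):
    r = np.asarray(r, dtype=float)
    return np.where(r <= rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)

def g2(r, eta, rs, rc):
    """Radial probe: Gaussian centered at rs (width set by eta), damped by the cutoff."""
    r = np.asarray(r, dtype=float)
    return np.exp(-eta * (r - rs) ** 2) * f_cut(r, rc)

# Descriptor value for one central atom: sum the probe over its neighbor distances.
neighbor_distances = np.array([1.1, 1.5, 2.4, 5.0])  # illustrative r_ij values
G2_value = g2(neighbor_distances, eta=4.0, rs=1.5, rc=10.0).sum()
```

A full descriptor vector then stacks one such sum per probe position R_s.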
That's basically how it looks: at some position R_s you have a Gaussian peak whose width is determined by the parameter eta. That's what you're doing by applying this function: you probe the space of this atomic environment. You insert a probe at some distance R_s and ask whether there is an atom around there. If there is, you get from this peak some value that reflects how close the atom is to your probe position: higher or lower. And that would be your descriptor. Let's see how the eta parameter influences the width: the smaller the eta, the wider the probe. By varying this parameter you can adjust the precision of your probing of the space. If you take it even smaller, say one, then the peaks start to overlap, and then, at say distance five, even if you don't have atoms which are five angstroms from each other but, say, four, you would still get a contribution from that probe to your descriptor. The descriptor itself is then the sum of these probe functions across the neighbors. You can select several of these probes, like here, and say: I will probe different places in my local atomic environment, and for each probe and for each neighbor that hits it, I accumulate the contributions and see how it looks. We can also look at how the width changes with the eta parameter like this; again, the smaller the parameter, the wider the peak. This is one of the hyperparameters of the model, and we'll come back to it a little later. So let's now build this descriptor for particular crystal structures and see how it looks, say for the BCC and FCC lattices. We do this by using these functions: we pass in the atomic structure and build several of these descriptors.
Here we pass the positions of our probes, the eta parameter, and the cutoff parameter. The first thing we should do is build the neighbor list: for each atom in our structure, we construct the list of neighbors that it has. There is a function that does it; nothing really to do here, just a couple of parameters that might be important to explain, given the particular form of the cutoff we have here. self_interaction=False means that we exclude the atom itself from its neighbor list, because the descriptor describes how a given central atom interacts with its local environment; we don't consider self-interaction, so we have to exclude it. And bothways=True means that if atom j is a neighbor of atom i, then atom i is also listed as a neighbor of atom j. Again, for each atom in the structure we have to build the full environment, because our descriptor depends on this full sphere around it. In, say, classical interatomic pair potentials, where you depend only on the total set of pair distances, you would set this to false, which would mean that if j is a neighbor of atom i, then i would not be listed as a neighbor of j again, because you counted this distance already; that would be the so-called half neighbor list. For our purposes we need a full neighbor list. So we do that. Then we prepare the list of our G2 descriptors, which we are going to fill now, and iterate over all the atoms in our structure, getting the neighbors of a particular atom i; this returns the indices of these neighboring atoms.
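The role of the offset vectors returned by such a neighbor list can be illustrated without ASE itself; this hypothetical two-atom cell shows how `offset @ cell` recovers the distance to a periodic image:

```python
import numpy as np

# A cubic cell of side 4.0 with one atom at the origin and one near the face.
cell = 4.0 * np.eye(3)
positions = np.array([[0.0, 0.0, 0.0],
                      [3.5, 0.0, 0.0]])

# Suppose the neighbor-list routine reports atom 1 as a neighbor of atom 0
# through the periodic image at offset (-1, 0, 0).
index, offset = 1, np.array([-1, 0, 0])

# Displacement to the periodic image: real-cell position plus offset @ cell.
r_vec = positions[index] + offset @ cell - positions[0]
r = np.linalg.norm(r_vec)  # 0.5, not the in-cell distance of 3.5
```

The index always refers to an atom in the real cell; the offset alone tells you which periodic image is actually the neighbor.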
In the list of positions, these indices give you the positions of the atoms, and then there are the offsets: the offset vector, which takes the periodic boundary conditions into account. If a neighbor of the atom is actually located in a different cell, the offset vector accounts for this: the index returns you the index of the atom in your real cell, and the offset tells you whether this atom is in this cell or in a periodic image of it. Given that, we compute the interatomic distances: we take the position of our central atom and repeat it as many times as it has neighbors, just to compute all the distances at once. Then, for the neighbors, we take the positions of our neighbors in the real cell and add the offset displacement. The @ symbol here is matrix multiplication: the offset is a vector, atoms.cell is the cell of the system, and multiplying them produces the displacement vector from the position in the real cell to the periodic image. Then you just compute the interatomic difference vector and take its norm. And here, yes, we do exactly this: we apply our previously defined radial function that computes the descriptors; we pass in the distances we just computed, the eta parameter, and the probe positions, which come from the rs_list that we provide later; we sum across the neighbors of the central atom and return the result at the end. Let's set up a couple of constants: n_g is the number of G descriptors we are going to produce, and r_c is the cutoff. The rs_list, exactly this list here, we choose to range from one angstrom to r_c minus 0.5, simply so as not to place a probe exactly at r_c, where it would be zero by definition of the cutoff function. We take n_g of those and just set some arbitrary eta parameter.
Then we build an FCC copper structure, set the lattice parameter to some arbitrary number, say 4, build our G2 descriptors for it, and plot them. This is how they look: something funny, but later we will try something else and see more clearly what it is. For the moment, it's some descriptor. That's basically what it was supposed to do: it gives you some information, which says that at a distance of, say, three angstroms, your interatomic distance density looks like this. I connected the dots here, but in reality you only have exactly this number of points. You have the information that your atomic environment has such-and-such a density of neighbors at this distance; it does not necessarily mean anything immediately by itself. You can do the same for BCC: just change FCC to BCC here and compute it. It's again something, but different from the FCC; if we put them together in one plot, we see that they are different. And this is basically the purpose of what we are doing: we want a representation that fulfills the requirements we listed and is able to distinguish different structures. It seemingly does the job: FCC and BCC are different. So let's increase the number of functions that we have here, and then we get a picture like this. And this is basically nothing but the radial distribution function. With these atom-centered symmetry functions, we probe the radial distribution of the atoms with a smooth function and accumulate it accordingly. In the end, our descriptor is just a radial distribution function, and what we did is just one way of computing a radial distribution function.
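To see the FCC/BCC comparison without ASE, one can brute-force the neighbor distances of a cubic lattice and feed them into the G2 probes; the lattice parameter, eta, and probe grid below are illustrative choices, not necessarily the notebook's values:

```python
import numpy as np

def f_cut(r, rc):
    return np.where(r <= rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)

def g2_vector(distances, rs_list, eta, rc):
    # One G2 value per probe position: sum the Gaussian probe over all neighbors.
    d = np.asarray(distances)
    return np.array([np.sum(np.exp(-eta * (d - rs) ** 2) * f_cut(d, rc))
                     for rs in rs_list])

def neighbor_distances(basis, a, rc):
    """Brute-force neighbor distances from an atom at the origin of a cubic lattice.

    basis: fractional coordinates of atoms in the conventional cubic cell.
    """
    n = int(np.ceil(rc / a)) + 1
    dists = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                for b in basis:
                    r = np.linalg.norm(a * (np.array([i, j, k]) + b))
                    if 1e-8 < r <= rc:  # skip the central atom itself
                        dists.append(r)
    return np.array(dists)

a, rc = 4.0, 6.0
fcc = neighbor_distances([(0, 0, 0), (0, .5, .5), (.5, 0, .5), (.5, .5, 0)], a, rc)
bcc = neighbor_distances([(0, 0, 0), (.5, .5, .5)], a, rc)
rs_list = np.linspace(1.0, rc - 0.5, 10)
g_fcc = g2_vector(fcc, rs_list, eta=4.0, rc=rc)
g_bcc = g2_vector(bcc, rs_list, eta=4.0, rc=rc)
# The two descriptor vectors differ: the probes distinguish the lattices.
```

The nearest-neighbor shells come out as expected (12 neighbors at a/sqrt(2) for FCC, 8 at a*sqrt(3)/2 for BCC), which is exactly the structure the smooth probes pick up.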
It's the same for BCC, shown here for comparison; they look clearly different, although actually you will see that at certain distances FCC and BCC look very much the same. Just to make sure it works, we can create several FCC structures with different volumes, assigning different lattice parameters over a certain range, build the descriptor for each structure and see what it looks like; again, we see that they are all different. So we have built something that meets our requirements. It is rotationally and translationally invariant, because the only coordinates we use are interatomic distances: internal coordinates that do not change under rotations or translations. It is also permutationally invariant, and that is achieved by summing over the neighbors: because of the summation, it doesn't matter in which order the neighbors are listed; they all contribute in the end to one accumulated function. And the function produces something that looks different for different structures, which is exactly what we want. The only thing left is to add some learning machinery to assign an energy to such a descriptor. But that was just a two-body descriptor: we only considered interactions of pairs of atoms, never more. In a similar way you can build a three-body interaction, considering three atoms. Having a central atom i, you consider not only one atom j from its neighborhood, but also some other atom k. You take the angle between these atoms and multiply by a similar exponential, which is scaled by the two interatomic distances r_ij and r_ik between atom i and the other two. And you multiply by two cutoff functions, one for each of these individual distances, which again make sure that everything is smooth and nice. That brings a couple of additional parameters.
One of them is ζ (zeta), which again determines the width of our probe. To see what this function does compared to the previous one, let us just look at it and discuss from there. Here we implement the function, nothing fancy, for a particular cutoff distance and a range of ζ values, over a range of angles. I insert the angles directly; I do not compute them from the two distance vectors, this is just for illustration. We compute it, and this is how it looks. Again it is a probe, but now instead of distances we probe angles, and the parameter ζ controls the width: the bigger it is, the narrower the probing window. With this descriptor, the closer the angle is to zero, the larger the contribution, and the bigger ζ is, the closer to zero you have to be: you only pick up contributions from angles near 0 or 360 degrees, and you can see the function is symmetric over the full rotation. There is also another parameter, λ (lambda), which can be either +1 or −1. If you flip it, the function is inverted and you start probing angles close to 180 degrees, with ζ again controlling how close to 180 degrees an angle has to be to contribute. So just as before we built a radial distribution function, this builds us an angular distribution function. These are the descriptors, and there are a few variations of them as well: the atom-centered symmetry functions. I did not draw attention to it, but the indices used here were 2 and 5.
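A minimal sketch of this angular probe, assuming the standard Behler-Parrinello form 2^(1−ζ)(1 + λ cos θ)^ζ for the angular part (the ζ value and grid are arbitrary choices for illustration):

```python
import numpy as np

def angular_probe(theta, zeta, lam):
    # angular part of the G4/G5-type symmetry functions:
    # 2^(1 - zeta) * (1 + lambda * cos(theta))^zeta
    return 2.0 ** (1.0 - zeta) * (1.0 + lam * np.cos(theta)) ** zeta

theta = np.linspace(0.0, 2.0 * np.pi, 361)               # full rotation, 1-degree steps
near_zero = angular_probe(theta, zeta=8.0, lam=+1.0)     # peaked at 0 / 360 degrees
near_pi = angular_probe(theta, zeta=8.0, lam=-1.0)       # peaked at 180 degrees
```

Increasing ζ narrows the window around the peak; flipping λ moves the peak from 0/360 to 180 degrees, exactly the behavior described above.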
Then you can imagine there is also a 3 and a 4, a 6 and a 7 and so on; there are quite a few of those. But they all share the same problems and advantages. The advantage is that this is relatively simple. In fact it is undeniably simple, since we just implemented it here in a few lines of code. "Relatively" because there are quite a few parameters, the ζ, the η and so on, and these parameters introduce a lot of the arbitrariness we talked about. You have to build your descriptors somehow: first choose the number of probes, then the parameters of each probe, then stack them together, feed them into some learning machinery (for these descriptors usually a neural network), and hope you get a good fit. But what if you want to do better? Looking at the radial distribution function probed with a few functions, you may suspect that in some regions you do not need as many descriptors, while in others, where the density is high and you want to describe it more accurately, you need more. But how do you know that in advance? How do you choose these parameters? They are usually hyperparameters, so you do not optimize them during the fitting procedure: you choose them, then you do the fit. Ideally you would choose several sets of these parameters, do several fits, and pick the best of them, which is inconvenient and quite arbitrary. There is also one property that is both an advantage and a disadvantage: the descriptor is local and size-independent, which is actually what we wanted.
But locality is also immediately a disadvantage, because interactions that span the whole system, like Coulombic interactions, are not accounted for, so you have to treat them separately somehow. The bigger disadvantage is that it only includes two- and three-body contributions, the radial distribution and the angular distribution, and moreover, for the three-body symmetry functions you have to explicitly enumerate the three-body terms: every triangle of atoms has to be listed. These explicit three-body contributions are quite expensive, and they get more expensive the larger the neighborhood, because the number of triplets grows combinatorially with the number of neighbors. Here are a few references to look at about this descriptor. Now, back to the arbitrariness. There is this, I guess, famous picture, which you might have seen before, that shows quite a few of the existing structural descriptors and some connections between them. I am not going to discuss it much, except for one thing: the ones listed here as complete many-body representations. There are only a few of those, one, two, three, four, and we are now going to talk about the ACE descriptor, where ACE stands for the atomic cluster expansion. I am a little short on time, so I will skip this part and go directly here. In the atomic cluster expansion we use the so-called density trick. Similar to what we did for the symmetry functions, we define the neighborhood density of an atom, but instead of some smooth probe function we use delta functions; this is the definition, we define it like this.
Then we define the one-particle basis functions, which depend on the interatomic distance through some radial functions and on the orientation of the bond through the spherical harmonics. The spherical harmonics are fixed; the radial functions you can choose from a variety of options. Having defined these, we project the neighborhood density onto the chosen basis functions, which produces what we call the atomic base: a summation over the neighbors of the one-particle basis functions (I have compressed the indices into a single one here). By itself this gives us basis functions for two-body interactions only. But in terms of these functions we can produce an n-body contribution to our energy, or to whatever property we want to expand, simply by taking products of the atomic base as many times as we want. Pictorially it can be represented like this: the two-body terms are summed directly across the whole neighborhood, so the whole atomic density in the region contributes immediately; the three-body term is simply a product of two such densities; the four-body term is a triple product, and so on. The atom-centered symmetry functions are limited to the first of these terms, a heavily truncated expansion, whereas here we can easily extend it further. Also, unlike the symmetry functions, to build three-body interactions you do not have to explicitly compute the contributions from triplets of atoms: you only build pair quantities, and the higher-body contributions are obtained from products.
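In symbols, the construction sketched here reads roughly as follows (standard ACE notation, with the radial, angular, and magnetic indices compressed into one multi-index v = (n, l, m)):

```latex
\rho_i(\mathbf{r}) = \sum_{j} \delta(\mathbf{r} - \mathbf{r}_{ji}), \qquad
\phi_v(\mathbf{r}) = R_{nl}(r)\, Y_{lm}(\hat{\mathbf{r}}), \qquad
A_{iv} = \langle \rho_i \,|\, \phi_v \rangle = \sum_{j} \phi_v(\mathbf{r}_{ji}),
\qquad
A_{iv_1} A_{iv_2} \cdots A_{iv_N} \;\rightarrow\; (N{+}1)\text{-body terms}
```

The key point is that each A is a single sum over neighbors, so an N-fold product delivers (N+1)-body information without ever looping over triplets, quadruplets and so on.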
There is another nice picture that shows how and why this works. It is just a schematic, but: taking these projections of the neighborhood density, summed across neighbors, and multiplying two of them, we get not only the pair contribution but contributions of higher order as well. The only catch is that these products are not necessarily rotationally invariant, because of the spherical harmonics. We have to make them invariant, and this is achieved by summing over the m indices of the spherical harmonics in such a way that only the contributions with zero angular momentum survive, that is, the rotationally invariant ones. This is done by summing with the appropriate Clebsch-Gordan coefficients, but I will not go into the details. Then we can define an atomic property that is simply an expansion in terms of these basis functions, with expansion coefficients that are now our trainable parameters. We can say that our energy is some function of these atomic properties, defined like this, and we can choose several of them. Again, to the arbitrariness argument: if we choose only one, we can say that it is already our atomic energy, this φ_i here, and that would be a linear expansion. We can represent our energy as a linear expansion of basis functions, and we can build as many basis functions of as many different orders as we want, for each order as many functions as we want, because this expansion is complete: you can go as far as is required to represent the energy as accurately as possible. In practice, however, we use a nonlinear function here, the so-called Finnis-Sinclair type, which is a combination of two densities.
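Spelled out, and hedged, since the exact signs and prefactors of the embedding depend on the parameters chosen in the config, the invariant basis, the atomic properties, and the Finnis-Sinclair-type embedding look roughly like:

```latex
B_{i\boldsymbol{v}} = \sum_{m_1 \ldots m_N} C^{\,0}_{m_1 \ldots m_N}\, A_{iv_1} \cdots A_{iv_N},
\qquad
\varphi_i^{(p)} = \sum_{\boldsymbol{v}} c^{(p)}_{\boldsymbol{v}}\, B_{i\boldsymbol{v}},
\qquad
E_i = F\!\big(\varphi_i^{(1)}, \varphi_i^{(2)}\big) \sim \varphi_i^{(1)} + \sqrt{\varphi_i^{(2)}}
```

Here C^0 denotes the Clebsch-Gordan contraction to zero total angular momentum; choosing a single property p and the identity for F recovers the purely linear expansion mentioned above.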
One is a linear expansion and the other is the square root of such an expansion, which allows for better efficiency, but again, not too much about that. This ACE expansion lets us build descriptors which, because of their completeness and because the higher-body contributions are built just from the pair bonds, give very efficient potentials. We can show that ACE essentially forms the Pareto front among many other potentials, meaning it is at once the most accurate and the fastest among quite a few of them. So that is the advantage of this representation: it is a complete basis. With the symmetry functions, if you want to extend your representation and make it more detailed, you have no particular guidance apart from your own experience. You want to add another probe: where exactly do you add it, which parameters do you choose, what even is the "next" function? There is no definition of what the next function is. Here that problem of non-completeness does not exist: there is always a natural next function that you can add. And again, this representation is local, which is in a way what we want, but it keeps us from describing long-range interactions; those just have to be taken care of separately. Here are references to look at for more details. One thing I want to draw your attention to is this GitHub page, which has the implementation of this ACE method. Now let us do some fitting with ACE, and then we will look at what the ACE basis functions, the ACE representation, look like.
You downloaded a data file at the beginning. (On the Colab you have there is a different Python version, so I had to rearrange some things.) It is a data file I prepared for you that contains DFT calculations for copper in different crystal structures, with various lattice parameters, volumes and so on. We can briefly plot the volume per atom against the energy per atom for the structures we have. The energies here are cohesive energies: these are FHI-aims calculations, a full-electron code, so the raw total energies are much larger, and they are normalized by the energy of the isolated atom to produce the cohesive energy. The structures in the set lie within about one electronvolt above the minimum; they include volume scaling for several crystal structures, with the atoms shaken so they move around a little, and some structures with larger volume, which are presumably open surfaces. Now what you have to do here on the Colab is simply run this command, which starts the fitting. While it runs for you, I will go elsewhere and do exactly the same thing in a different place. The command starts the pacemaker code, which runs the fitting of this potential with some config file. While it runs, we can have a look at the config file. There are a few parameters similar to what we discussed before, but also some specific to this particular form of the expansion. The first is the cutoff, the global cutoff for the cutoff function: everything inside contributes, everything outside does not. Then we define our elements; for this particular example that is just copper. Then the embedding.
These are what we call the embedding functions. There are some specifications for them: we use the Finnis-Sinclair type, which is the combination of two densities, one linear and one under a square root, plus some parameters for these densities. There are two of them, and the parameters basically state that one is the linear term and the other the square root; those are not important for us at the moment. Then there is the bonds section, which specifies how we treat the radial part of the basis functions: the type of the radial basis. As I mentioned, this function can in principle be whatever you like. We use, for example, Chebyshev polynomials, or rather linear combinations of Chebyshev polynomials, or, in this particular case, a linear combination of simplified spherical Bessel functions. As I said, this part again contributes a little to the arbitrariness: you can choose it according to your preference; you could for example choose Gaussian functions, as in the previous example. Then the cutoff again, which may differ from the global one; it determines the sphere within which we look for atoms to represent. Then the type of the cutoff function, which I did not talk about, but it is not particularly important. What is important is this section: the one that determines the basis, how deep we go into the products, basically how many times we multiply these functions together. The first entry describes only the pair interactions, so only the single atomic base function, and we just choose how many of them we want. Unlike the previous representation, since this is a basis expansion you simply specify how many functions you want, and it will choose the next ones for you.
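Purely as an illustration of what "Chebyshev polynomials times a cutoff" means as a radial basis (the notebook relies on pacemaker's built-in bases; the function name and parameter values below are made up):

```python
import numpy as np

def radial_basis(r, nmax, rc):
    # Chebyshev polynomials T_0 ... T_{nmax-1} on x in [-1, 1],
    # with r in [0, rc] mapped linearly onto that interval,
    # multiplied by a smooth cutoff so every function vanishes at rc
    x = 2.0 * r / rc - 1.0
    fc = 0.5 * (np.cos(np.pi * r / rc) + 1.0)
    return np.array([np.polynomial.chebyshev.Chebyshev.basis(n)(x) * fc
                     for n in range(nmax)])

r = np.linspace(0.0, 6.0, 200)
R = radial_basis(r, nmax=5, rc=6.0)   # shape (5, 200): 5 radial functions
```

Swapping this for Gaussians or simplified spherical Bessel functions changes only this one building block; the rest of the construction stays the same.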
So there is a definition of what the next function is, and you can just say: I want this many. If at some point you decide you want more, you just put in a bigger number. You do not have to decide where the additional functions should go. Then the next entries in this list define, for each order, the maximum n for the radial functions and the maximum l, the maximum angular momentum, for the spherical harmonics. Because the first entry is only the pair interaction, it has no angular contribution, so l is zero. For the three-body interaction the angular part already contributes, and we can select the maximum n and maximum l for the functions that go into the product: that is the three-body order, a double product of these functions, with its own maximum n and maximum l. You can change these numbers to anything, and again, if you decide at some point that you need a bigger basis, you just put in a bigger number and it will produce the functions. It is very simple in this way, and the same goes for the four-body and five-body contributions; very straightforward. Then you have to specify the file name with the data, which was already done automatically with the uploaded file. Then there is a section that concerns the fitting. I do not want to talk much about it, because we are not touching the fitting problem here. Basically the only parameter of interest determines the relative contribution of forces and energies to the loss. Since I did not include any forces, I put zero here, which means we fit only energies, no forces. Then the optimization method and the number of steps. Another thing that ACE allows, as I can illustrate here, is choosing the trainable parameters: you can select which part you train and which you do not. This is particularly useful.
Say you have a unary representation and now want a binary compound. This is possible because of the delta function in the density, which acts not only on the radial part but also on the chemical one: we distinguish the functions contributed by each particular combination of species. So you can in principle combine different potentials. Say you have a representation trained for pure copper, you have data for aluminum somewhere, and now you need copper-aluminum. You combine the two unary representations and then train only the binary part, leaving the unary parts untouched. The other thing the completeness allows is continuous growth of the basis. Here we specify the maxima and thus the maximum size of our basis; we can select very big numbers here, for example. By setting the ladder step, we can do the so-called ladder fitting, a hierarchical fitting, where we add functions one after another. Again because of the ordered basis, it is defined which functions come first and which come second, and you can add them in portions, say five at a time as here, and watch how your description converges. At some point, when you are satisfied with the accuracy, you can just say: fine, I am done, and stop; you do not necessarily have to keep growing the basis. So unlike the previous representation, this allows you some flexibility. We are getting close to the end, so just a few words: there is one additional package that you installed, TensorPotential, which runs this fitting on the GPU. This setting determines how many structures you process at once, and so on; there are some further technical parameters, and for more details you can again visit this page.
There is also a link to the online documentation for all of this, with instructions you can go through to see for yourself how it works and how to use it for your particular case: how to prepare the datasets that you now have at hand for your own data, how to start the fitting, and so on. Now, okay, I am having trouble here, but you should not be. What you see there is the fitting in progress, and that is exactly how the fitting procedure goes. You start with some selected basis, and at first the accuracy of your representation is very low, indicated by this error here. As the fitting proceeds, it gets more and more accurate. At some point, though, the convergence stops and you do not improve anymore. That is where the hierarchical ladder fitting steps in: it basically says, okay, I am done with this basis, and extends it. It started with just five basis functions, and now it has extended them to ten, because we selected the ladder step to be five basis functions. After the additional functions are added you immediately see that the convergence, the improvement, starts again, up to some point where it stalls once more and further functions have to be added. Here it has already converged, so additional functions are added and we continue with fifteen, and the convergence resumes. In the process the fitting produces a few files containing the ladder-step potentials: at each basis-extension step it saves the potential to these files, and this one contains the current representation. We can quickly take a look at it; it holds the information about the basis functions.
It has the pair contributions: you see there are only two copper atoms, and the type of this interaction is copper-copper only, since that is all we have in the data; in principle we could have copper-aluminum here, or something else. Then the three-body contributions, then the four-body and the five-body ones. These here are the expansion coefficients corresponding to each particular basis function, and these are the n and l indices of that basis function. That gives you some idea of the structure of the potential file: it is literally a list of the basis functions with their parameters and expansion coefficients. We can try to do something with this potential. We can load the calculator prepared for it, basically an ASE calculator for this potential, and do some computations. Let us again set up copper in FCC and compute the lattice parameter, essentially an energy-volume curve of the Murnaghan type: we continuously change the volume, compute the energy with this potential, and see what we get. It predicts a lattice parameter of 3.63, which is very close to the PBE DFT value, so we did a pretty good job. We can also do this for BCC copper and again get a value that is reasonably close. We can combine the two curves for the two phases, and we see what is expected: copper is an FCC material, so FCC is lower in energy than BCC. Finally, we can quickly look at the basis functions that ACE produces. You can access them when you create the structures: after you attach this calculator and perform a computation, the calculator has the attribute "projections", which contains the basis function values for these structures.
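In the notebook this volume scan runs through the trained ACE potential via its ASE calculator, which needs the fitted files. Purely to illustrate the scan logic itself (change the lattice parameter, compute the energy, locate the minimum), here is the same loop with a stand-in Morse pair potential; every parameter value below is invented for illustration, this is not a fitted copper model.

```python
import numpy as np

def fcc_neighbour_distances(a, rc):
    # FCC lattice sites within rc of an atom at the origin
    n = int(np.ceil(rc / a)) + 1
    grid = np.arange(-n, n + 1)
    cells = np.array(np.meshgrid(grid, grid, grid)).T.reshape(-1, 3)
    basis = np.array([[0, 0, 0], [0, .5, .5], [.5, 0, .5], [.5, .5, 0]])
    d = np.linalg.norm((cells[:, None, :] + basis).reshape(-1, 3) * a, axis=1)
    return d[(d > 1e-8) & (d < rc)]

def energy_per_atom(a, rc=7.0, D=0.4, alpha=2.0, r0=2.55):
    # toy Morse pair potential summed over neighbour shells,
    # half-counted so the result is an energy per atom
    d = fcc_neighbour_distances(a, rc)
    v = D * ((1.0 - np.exp(-alpha * (d - r0))) ** 2 - 1.0)
    return 0.5 * np.sum(v)

a_grid = np.linspace(3.0, 4.2, 61)
energies = np.array([energy_per_atom(a) for a in a_grid])
a_opt = a_grid[np.argmin(energies)]   # equilibrium lattice parameter of the toy model
```

With the real potential the only change is that energy_per_atom is replaced by an ASE energy call on a rescaled cell; the minimum of the resulting curve is the predicted lattice parameter (3.63 for the fitted copper ACE).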
Because I created a two-by-two-by-two supercell, it contains a few atoms, and for each atom there are the values of 65 basis functions. We can look at them, but there is no particular information to be gained just by eye. That is a difference from the atom-centered symmetry functions, which are intuitive and illustrative: you can look at them and see what they mean, which is also why I showed them; their meaning is quite visual. These ACE projections are not so meaningful to the eye: you cannot really tell the difference just by looking, except that they tell you something about symmetry. You would see that some functions have zero or near-zero values, near-zero only because I put some random noise on the positions. If the positions were perfectly ideal, these values would be exactly zero. So the only thing you can read off is that they reflect the symmetry: these functions are zero by symmetry, and by having them you incorporate the symmetry into the description. If you break the symmetry, these functions start to take values and start to contribute to the energy with their corresponding expansion coefficients. That is basically what ensures the accuracy that you can reach. And I guess that could be the end of it, so thank you very much for your attention. If there are any questions, I guess you could ask now or afterwards. Thank you. Thank you very much, Anton, for the exceptional tutorial, very clear and detailed, going through all of the most widely used descriptors in machine learning.
Due to time constraints we are going to end now and have a coffee break, but I am sure Anton will be happy to answer your questions one on one. Also remember that the Colab notebook remains accessible to you, and a recorded version of this session will go online. So we now have a coffee break, and we start again at 11 with Milica Todorović on kernel methods. Thank you very much.