Are you the chairman? Yes, sure, so you can have this. So welcome back, everybody! My name is Alessandro Laio, I work here at SISSA, close by, and I have the honor to chair this session. Its topic is a field which is important and really booming in many communities. In fact it is a field at the border between different communities: the field of using and developing machine learning techniques in the context of solid state physics and materials science. In this session we will hear two of the protagonists of this field, Michele Ceriotti and Gábor Csányi, whose milestone ideas are now stimulating a lot of research everywhere. I am among the ones that have now learned to use these techniques, developed by these two guys, in my own research, for several different problems. So I am very, very happy that they are here. So here is Gábor.

[inaudible opening remarks]

What are interatomic potentials? Really, there are three communities that are actually disjoint, and that has been a problem, a learning curve for me, and I hope that this approach, the machine learning approach, is the one which can bring them together. The first community is really the community of force fields, organic force fields in biochemistry. These have fixed functional forms, and their great strength is transferability: the parameters are fitted once and then reused widely, but the forms are simple and not reactive. Then there is the second community, solid state physics and materials, where empirical potentials are made for specific materials and their particular phases; they are created anew for each problem, they do not transfer, and they cannot describe organic molecules or other bonding forms. And there is the third community, the quantum chemistry community, which fits the potential energy surface of small systems — typically a cluster — from quantum chemistry, extremely accurately, but it is not clear how to scale that up to larger systems.

The goal, the long-term vision I have, is really to unite these three corners of this triangle: to retain the accuracy and many of the attitudes of these people, but make the potentials transferable to large systems, and make them reactive and applicable to the materials that lots of us are interested in. That is sort of the end goal. First, I think, one has to ask — and I didn't ask it for many years while doing this — is it actually possible?
If we think about density functional theory, or any other description of quantum mechanics: is it actually possible to reduce it to a potential energy surface, and then fit that with analytic functions, accurately? Really, all of us start from, and never really transcend, the separation of short-range and long-range interactions when we do that. The rest of the talk will focus on the short-range interactions, so I do have to say something about long-range interactions, because quantum mechanics is long-ranged. When you solve the Schrödinger equation, in whatever approximation, you do have charge transfer, you have polarizable electrostatics and van der Waals interactions, and atoms feel each other from far away. So we have to deal with these long-range interactions somehow. If you study a system in which they matter little, you can simply ignore them. Or there are existing methods and potentials that describe them quite well, and you can subtract them. Or you can parametrize them with analytic forms of these interactions. [partially inaudible] What I will talk about, what is left, is the remainder.

Either after ignoring the long range, after subtracting it, or after parametrizing it, the question is: Walter Kohn's nearsightedness of electrons — how does that extend to atoms? Are atoms also nearsighted? One of my favorite systems in which to think about this is amorphous carbon. You can have very low density amorphous carbon, full of sp and sp2 bonds, and you find, if you test this, that atoms are really not short-ranged: you perturb an atom, and the forces on atoms very far away change. And you can go to very high density, diamond-like carbon, which is a very, very short-ranged material: if you perturb an atom here, only the first few neighbors have forces that change.
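A minimal sketch of that locality test, under toy assumptions: it uses the ASE library with a Lennard-Jones calculator as a stand-in for DFT (Lennard-Jones with a cutoff is short-ranged by construction, so it behaves like the diamond-like case; the parameters here are arbitrary).

```python
# Toy locality test: perturb one atom, then look at how the forces on all
# the other atoms change as a function of distance from the perturbation.
import numpy as np
from ase.build import bulk
from ase.calculators.lj import LennardJones

atoms = bulk('C', 'diamond', a=3.57).repeat((4, 4, 4))
atoms.calc = LennardJones(sigma=1.2, epsilon=1.0, rc=4.0)
f0 = atoms.get_forces()

perturbed = atoms.copy()
perturbed.calc = LennardJones(sigma=1.2, epsilon=1.0, rc=4.0)
perturbed.positions[0] += [0.05, 0.0, 0.0]   # small displacement of atom 0
f1 = perturbed.get_forces()

# Maximum force change beyond a given distance from the perturbed atom.
d = atoms.get_distances(0, list(range(len(atoms))), mic=True)
df = np.linalg.norm(f1 - f0, axis=1)
for r in (2.0, 4.0, 6.0):
    print(f"max |dF| beyond {r} A: {df[d > r].max():.2e}")
```

With a short-ranged stand-in potential the force changes vanish beyond the cutoff; for a long-ranged material like low-density amorphous carbon they would not.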
So the rest of the talk is about short-range interactions, assuming you have dealt with the long range in one of these ways. Now I want to spend the next few slides telling you about function fitting. It will look very elementary at the beginning, and the surprise is that that's really all you need. One key idea is dimensionality. If you want to fit functions in a few dimensions — say this is one dimension, this is a one-dimensional function, and I give you some data — then from a physics perspective it is not a deep problem to interpolate these points. If I ask you to draw the function, you will all come up with something like that. I don't mean to say that there are no mathematically interesting issues in this; you can write books about the accuracy of various approximations in various limits. But from a physics perspective, these functions are not going to look very different, and the real reason for that is that we can get enough data to really fill this space. Once we fill that space, mathematicians can tell us about universal basis functions — polynomials, splines, Gaussians — and they can study their convergence properties, but they all essentially lead to the same answers.

That is in contrast with fitting functions in many dimensions, which is what we are going to want to do. To get the short-range part of quantum mechanics accurately, we will want to fit functions of, say, the energy of an atom as a function of its 20 nearest neighbors. That's a 60-dimensional space, so it's very large. In high dimensions you get data, but the data doesn't fill the space; it's very sparse. And when the data is very sparse, really the only thing you can do — and this is at the very heart of every machine learning method — is to use basis functions that live where the data lives to construct your function. If I try to use a regular grid in 60 dimensions, I have far too many grid points: even ten points per dimension would give 10^60 of them. So the splines and polynomials that work in low dimensions don't work in high dimensions. That is the essence of machine learning: finding suitable basis functions, induced by the data. Typically the basis functions are local, and so I can fit the function I want where the data lives.

OK, so how do we do this? I like to do linear regression in many dimensions. Suppose you have atomic configurations indexed by curly letters A, B, C, and so on, and I have some observations in the space of atomic configurations; call them y, and say I have n observations. I'm going to use not all of them, but some of them, to represent my function, inducing basis functions using some kernel. You can think of the kernel as a function of two arguments, taking two atomic configurations; by placing one of them where your data is, it induces a basis function. I'm going to tell you later what the kernel is; for now, just think of it abstractly as some local function in atomic configuration space. And any function that I want to fit — be it the energy of an atom, the polarizability of something, or a charge induced on an atom — is going to be written as a linear combination of these kernel functions: f(A) = Σ_B x_B k(A, B). Here A is the free variable — I can choose any atomic configuration and evaluate the function there — B ranges over some representative configurations, and the k(A, B) are the values of the induced basis functions. You can see that this is essentially a dot product: I have an unknown vector of coefficients x, and a vector of the basis-function values over all of my representative configurations, evaluated at the configuration of interest. And that vector is a row of the n-by-m design matrix K: all the basis-function values connecting the observations and the representative configurations.

So this is an easy problem to solve for the unknown coefficients — it's a linear least-squares problem, and you are looking for the best fit. Kx should equal y; it's not going to equal it exactly, but I'm looking for the coefficients x which minimize the error, and that has an analytic answer. If m equals n — if I use all of my observations to induce representative configurations — then all I have to do is invert K, and K⁻¹y is my x. Typically we don't want to do that: there are many, many observations, and we don't need all of them to induce basis functions. So if m is much less than n, then instead of K⁻¹ we use the pseudo-inverse — just a slightly more complicated linear-algebra expression for a non-square matrix — multiplying the observations, and that gives you the coefficients. If you implement this directly, it's extremely unstable.
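A minimal sketch of this kernel linear regression in NumPy, under toy assumptions: the "configurations" are just one-dimensional points, the kernel is a simple Gaussian similarity rather than the atomic-environment kernel defined later, and all names are for illustration only.

```python
# Kernel linear regression: basis functions induced by representative
# configurations, coefficients from a least-squares fit.
import numpy as np

def k(a, b, sigma=1.0):
    # A local basis function in configuration space, induced by placing
    # one argument at a representative configuration.
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))   # n = 50 observed configurations
y = np.sin(X[:, 0])                        # n observations to fit
R = X[::5]                                 # m = 10 representative configurations

# Design matrix: basis-function values connecting observations and representatives.
K = np.array([[k(a, b) for b in R] for a in X])   # shape (n, m)

# Least-squares coefficients via the pseudo-inverse, x = K^+ y.
# As noted in the talk, implementing it this directly is numerically unstable.
x = np.linalg.pinv(K) @ y

def f(a):
    # The fitted function: a linear combination of kernel basis functions.
    return np.array([k(a, b) for b in R]) @ x

print(f(np.array([0.5])), np.sin(0.5))     # prediction vs. target
```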
Oh, and just notice that this one is order n cubed, whereas this one is linear in the number of observations, which is a much nicer place to be. But it's unstable. It can be stabilized, though, and that's a fifty-year-old theory: Tikhonov invented a way of regularizing these inversions. Essentially, what Tikhonov says is that instead of only minimizing the misfit, you should also minimize the magnitude of the coefficients. There are very deep geometric reasons why this is the right thing to do. If you minimize this combination, then you get pretty good fits — not the best possible fit, but a good fit that is also very smooth. The amazing thing about Tikhonov regularization is that you retain the analytic answer. This is an analytic solution of the minimization problem; if I add the regularization term to it, there is still an analytic answer, and it's this simple when m equals n, and in the pseudo-inverse case, when m is not equal to n, that is the analytic answer to the minimization problem. Λ here is a diagonal matrix that contains weights, so you can weight different configurations by different amounts. And that, in fact, is what we have in our codes. The only things I'm hiding, in the interest of time — the difference between the talk and the detailed paper — are how we pick the representative configurations, and questions of derivatives. I'm just talking about observations y, whereas in reality we observe energies and forces, gradients. That's all that makes the real code slightly more complicated than this: we have derivatives, and the selection algorithm.

Now I want to talk to you about error bars, and the way we get them. There's a very nice theory, which I'm just going to quote here: this linear least-squares fitting has a counterpart in probability theory, the theory of Gaussian processes. There is an alternative derivation of the previous result, in which you think about having a probability distribution over functions, conditioned on all the observations, and then ask: what is the probability distribution of the next prediction? That theory results in the same answer as the linear least-squares solution — the mean of the posterior probability has the exact same formula — but when you go through the probabilistic derivation, you also get the covariance. I'm illustrating that here. Suppose these are the observations that I make; this is just a one-dimensional example. I can draw multiple samples from the posterior distribution, and you see that they all agree very near the data points, but they disagree where I don't have data, and we can consider that as an error bar. The shaded region is the variance around the mean; you don't get that from the linear-algebra view, only from the probabilistic view. And this is what originally inspired us to call the application of all of this to interatomic potentials "Gaussian approximation potentials" — GAP — because the distributions that you assume here are multivariate Gaussian distributions. And the reason people really like this is that it comes with a theorem: if you use Gaussian kernels, then as the number of observations goes to infinity — as long as the function you're trying to represent is reasonably regular, and the number of representative configurations m is of order n — you're guaranteed that the error goes to zero as n goes to infinity. And that's a really nice thing.
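Continuing the toy one-dimensional sketch from above: a minimal illustration of the Tikhonov-regularized solution and of the Gaussian-process posterior variance that gives the error bars, in the m = n case, with a scalar lam standing in for the diagonal weight matrix Λ.

```python
# Tikhonov-regularized kernel fit and Gaussian-process error bars (m = n).
import numpy as np

def k(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(20, 1))
y = np.sin(X[:, 0])
lam = 1e-6                                    # regularization / noise level

K = np.array([[k(a, b) for b in X] for a in X])   # n x n kernel matrix
A = K + lam * np.eye(len(X))                       # Tikhonov-regularized matrix
x = np.linalg.solve(A, y)                          # analytic solution, m = n case

def posterior(a):
    ka = np.array([k(a, b) for b in X])
    mean = ka @ x                                  # same as the regularized fit
    var = k(a, a) - ka @ np.linalg.solve(A, ka)    # GP posterior variance
    return mean, var

# The predicted variance is small near the data and grows far from it.
for point in (0.5, 2.5, 8.0):
    mean, var = posterior(np.array([point]))
    print(f"at {point}: mean = {mean:.3f}, std = {np.sqrt(max(var, 0.0)):.3f}")
```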
And the function that you obtain is going to be a sum of your kernel functions — a sum of Gaussians — so it's infinitely differentiable, and arbitrarily close to your target function, whether that target is infinitely differentiable or not; you can get very close to it. I want to close this section by quoting my favorite public speaker and mathematician, Hannah Fry from UCL, on a radio program last year. She said: there's a lot of talk about a revolution in artificial intelligence. There is no such thing — but there is a revolution in computational statistics, and it's fantastically exciting. And that's what lots of us are working in. So don't believe the hype — that hype; believe the other hype.

So let's get back to physics, and the anatomy of an interatomic potential. These are the things that you need. You need a representation: we need to think about how we're going to say where the atoms are. We need to think about what function we're going to approximate, and where we're going to get data. Traditionally in potential making, people represent configurations using bond lengths and angles, and maybe torsions. I want to tell you in the next couple of slides how we represent configurations by their neighbor density: we want many-body representations, not just bond lengths and angles — we want the complete environment of an atom. Then, when it comes to regression: traditionally parametric functions have been used; in the quantum chemistry community, various polynomials have been used, and the body-order expansion with two-, three-, and four-body terms; artificial neural networks are one way of regressing data; and the Gaussian process that I outlined before is just another way of doing regression. As for the function that you're trying to fit: we could do a body-order expansion, and very often people do that — think about pairs of atoms, triplets of atoms, pairs of molecules, energies of triplets of molecules — but in the condensed phase we really want to think about entire periodic systems, the total energy of many atoms together, and that's what I'm mostly interested in. And then we target data. If you're going down the first of these routes — simple parametric functions, body-order expansions, a couple of molecules — very often you're able to incorporate experimental data in the fits. But if you're going towards the bottom of these choices — complicated neighbor-density descriptors, kernel regression, total energies — then quantum chemistry and, in solid state physics, density functional theory is where the data is going to come from.

Now I want to tell you what our kernel function is, and in order to do that, we need to think about the representation of atoms. So we introduce the atomic neighbor density function. Think about an atom in the middle, with some neighbors: you could put a delta function at the location of each of the neighboring atoms and sum them up, and that density ρ(r) is a representation of the neighborhood.
Very, very quickly one realizes that it's much, much nicer to work with smeared atoms, because that leads to much smoother functions when we build functions out of this neighbor density. So imagine Gaussians centered on your neighbors, and maybe even a cutoff function that smoothly cuts them off as they exit some radius. Then we have a representation that is translation invariant — it doesn't matter where my atom is, because this ρ(r) is centered on each atom. There is permutation invariance, because the sum doesn't care in which order I add up my neighbors. It's continuous, especially if I have a continuous cutoff function. But it's not rotation invariant — and I like my atomic energy functions not to change when I rotate the entire system, and this function certainly isn't rotation invariant. Everybody solves that, essentially, by projecting this density onto a rotationally invariant basis set. The Behler-Parrinello symmetry functions, which were part of the original proposal, can be considered a projection of this neighbor density, and there are many, many others: you can make histograms of bond lengths and angles as projections of it. In the next talk, Michele Ceriotti will go into more detail about these connections.

But our favorite choice is to take this atomic density and construct a kernel directly. Remember, we need a kernel function. We can take two of these densities, for two different neighbor environments, multiply them and integrate: that's an overlap, and that's a kernel, but it's not rotationally invariant. We can make it rotationally invariant by taking this overlap, squaring it, applying all possible rotations to it, and integrating the squared overlap over all possible rotations. That is a six-dimensional integral — three spatial integrals and three rotational ones. It's a nice kernel, but it would be pretty difficult to calculate: for each of the basis functions you would need to perform a six-dimensional integral. It turns out, though, that if you expand this neighbor density in spherical harmonics — taking your favorite radial basis set, I'm calling it g_n here, and the spherical harmonics Y_lm, so you have some expansion coefficients c_nlm — then this integrated overlap is exactly equal to the dot product of what we call the power spectrum p, which is just C†C. Contemplate this for a few seconds: you take the neighbor density, you expand it in spherical harmonics, you "square" it — C†C gives you a rotational invariant. In fact, some elements of this are the well-known Steinhardt bond-order parameters Q4 and Q6, which are used to characterize crystals. And if you take dot products between these power spectra, that is exactly equal to the kernel that I defined here. So that's sort of a profound result, and it's in fact what we use: the kernel functions and kernel matrices are just a small integer power of this rotationally integrated overlap. The nice thing about this, from a physicist's point of view, is that there are very few free parameters: really just the cutoff, this σ, which is the smearing of the atoms, and the small integer exponent, which is typically 4 — and that's it. There is very little to tune.
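A minimal numerical sketch of this power-spectrum construction, under simplifying assumptions: the neighbor density is kept as a sum of delta functions projected onto Gaussian radial functions g_n and spherical harmonics Y_lm (the smearing is folded into the radial basis), the cutoff is hard rather than smooth, and the function names are hypothetical. The final line checks rotational invariance: rotating an environment leaves the kernel value unchanged.

```python
# Rotationally invariant power spectrum of a neighbor density, and a
# SOAP-style kernel as a small integer power of its dot product.
import numpy as np
from scipy.special import sph_harm

def power_spectrum(neighbors, n_max=4, l_max=4, r_cut=4.0, sigma=0.5):
    # Expansion coefficients c_nlm of the neighbor density; for a sum of
    # delta functions, c_nlm = sum_i g_n(r_i) * conj(Y_lm(theta_i, phi_i)).
    r_centers = np.linspace(0.5, r_cut, n_max)   # Gaussian radial basis centers
    c = np.zeros((n_max, l_max + 1, 2 * l_max + 1), dtype=complex)
    for pos in neighbors:
        r = np.linalg.norm(pos)
        if r > r_cut:
            continue                              # hard cutoff (smooth in practice)
        theta = np.arccos(pos[2] / r)             # polar angle
        phi = np.arctan2(pos[1], pos[0])          # azimuthal angle
        g = np.exp(-(r - r_centers) ** 2 / (2 * sigma ** 2))
        for l in range(l_max + 1):
            for m in range(-l, l + 1):
                # SciPy convention: sph_harm(order m, degree l, azimuthal, polar)
                c[:, l, m + l] += g * np.conj(sph_harm(m, l, phi, theta))
    # Power spectrum p_{n n' l} = sum_m conj(c_nlm) c_n'lm: the sum over m
    # removes the rotational degrees of freedom.
    p = np.einsum('nlm,klm->nkl', np.conj(c), c).real.flatten()
    return p / np.linalg.norm(p)

def soap_kernel(env_a, env_b, zeta=4):
    # Small integer power (typically 4) of the power-spectrum dot product.
    return float(power_spectrum(env_a) @ power_spectrum(env_b)) ** zeta

# Two environments (neighbor positions relative to the central atom):
env_a = np.array([[1.2, 0.0, 0.0], [0.0, 1.5, 0.3], [-0.8, 0.9, -1.1]])
env_b = np.array([[1.0, 0.2, -0.5], [-0.3, 1.1, 0.8]])
co, si = np.cos(0.7), np.sin(0.7)
Rz = np.array([[co, -si, 0.0], [si, co, 0.0], [0.0, 0.0, 1.0]])
print(soap_kernel(env_a, env_b), soap_kernel(env_a, env_b @ Rz.T))  # equal
```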
So now I want to show you that this really works. The first few demonstrations are that if I give density functional data to this framework, I get density functional accuracy at much, much reduced cost. Here are a few examples. Here is tungsten. Tungsten is difficult — it's a BCC metal — and these are three different interatomic potentials from across the decades: a Finnis-Sinclair potential, MEAM, and the bond-order potential. This is the error with respect to the DFT reference for a bunch of properties, and you can see that not a lot of accuracy progress has been made. Understanding progress, yes — but not accuracy. If you put this through the SOAP kernel — the smooth overlap of atomic positions that I just described — and the Gaussian approximation potential, you get these points. And I didn't cheat; that's real data. It isn't exactly zero: it's milli-eV-per-atom accuracy. The database is small unit cells — MD on typically fewer than 128-atom unit cells.

We can go further. Here is iron, an even more difficult system; it's also BCC. These results are from Daniele Dragoni and Nicola Marzari, who are also here, and it was much harder to do than tungsten. It was so expensive to get the DFT data that we had to use non-uniform k-point sampling, and they took extreme care when setting the DFT parameters. When we did this first and computed the thermal expansion — this is experiment, and this is the DFT, which gets the volume slightly wrong but otherwise is qualitatively correct — our GAP model was okay at low temperature but unstable. It turned out that we weren't honoring our own promise of being careful enough: when we used the noise, the weight in the regression, to compensate for the inadequate k-point sampling of the large unit cells, we were slightly inconsistent in the small ones. Once we fixed that, we got a perfect reproduction of the thermal expansion.

The last example is what I call the silicon challenge. Ten years ago, when we made all these promises, we said: let's make a potential that just does everything. Can we, in fact, do that? Silicon is the material in which to try; lots and lots of different crystal structures have been described in the literature. Can we have a database of DFT calculations that covers all relevant configurations? The answer is that we can. This is just from the end of last year. Here are a bunch of different things you'd like to compute — elastic constants, surface energies, point defects. Here are the best potentials on the market, up to and including a tight-binding model, and you can see that the errors with respect to DFT are really all over the place. The machine-learned model is very, very good. There are some untargeted properties that are really very far from what we had in the database, and it's really not bad for those either — not as good as for the targeted properties, but not much worse.

I want to finish with transferability and future thoughts in the last minute. The most stringent test that we can think of is crystal structure search. Here is crystal structure search from random initial conditions with all the different potentials, and instead of letting you stare at those points, let me just give you convex hulls. The machine-learned model is the only one that's vaguely sensible compared to the density functional theory — black and red are those. And I can't skip the 7×7 reconstruction, especially for this audience: this is the first interatomic potential that gets the extremely delicate 7×7 and 2×1 reconstructions of silicon surfaces. Here is what the other potentials do, and here is what you get for these DAS reconstructions as a function of unit cell size. You can see that the density functional theory answer — that the 5×5 and 7×7, with a difference of 0.01 joules per square meter, are the lowest-energy states — is correctly reproduced.
So I'm not going to talk through these, but in the last year alone, these are the materials for which we produced potentials — and not just showed that they work, but actually did science. All of these materials had outstanding scientific questions: the growth of carbon films, defect-free amorphous silicon structures, screw dislocation glide in BCC iron, medium-range order in germanium-antimony-telluride. All of these are very active research fields, and accurate interatomic potentials make the difference. So here's the vision for making interatomic potentials for the world. You start with some baseline, maybe incorporating the strong internuclear repulsion; you may need to have electrostatics, depending on the system. I think we've solved the short-range bonding problem, and also many-body dispersion — I didn't talk about that. And we can add on to this body-ordered corrections from wave-function chemistry; again, that's old work that's been done. But I think if you put all of these things together — with the missing piece really being the fitted electrostatics — we do have the possibility of making potentials that can do large-scale MD very, very accurately. I'm just going to leave these up; these are the things we're currently working on. Thank you.