It's a real pleasure to be able to present my work in front of all of you, and I'm grateful to the organizers for the opportunity. Before I start, I also want to properly acknowledge the funding bodies that supported my research and the many people in my group who have been working on and generating the results we are going to see today. I actually started playing around with machine learning for making predictions only a couple of years ago. The reason why, at this point, pretty much everyone in my group has been and is doing something related to machine learning is really my background: I did my PhD with Michele Parrinello, and what I'm interested in, in the end, is doing statistical mechanics. And if you want to do statistical mechanics not with models but with real materials, your life is a hell of compromises between doing very accurate reference energetics and having to cut corners in terms of the sampling, the system sizes and the time scales that you can afford. So the salesman's pitch for machine learning is that it allows you to hit a very sweet spot: accuracy that matches the reference electronic structure calculation, at a cost, and perhaps more importantly a scaling with system size, which is comparable to that of a force field. Gabor has already given an excellent, thorough introduction to the specific kind of machine learning that he and I have mostly been doing. But from a very general perspective, the idea behind machine learning is that you take inputs that are just structures, with labels associated with them that are just properties, and you feed them to a correlation-finding engine that then allows you to make inferences for structures that you haven't seen before. From this point of view, machine learning is a deeply, one could say naively, inductive approach in which you assume that data is king and you don't, strictly speaking, make the kinds of considerations of elegance and conciseness that are so integral to the mindset of a physicist, or of a natural scientist in general. So what I'm going to try to show you is how you can incorporate physical considerations in the construction of a machine learning model and, perhaps less obviously, how you can also extract some insight about the chemistry and the physics of your problem by looking critically at the machine learning exercise. In doing this I'm trying to argue that there is no contradiction between a machine learning approach and a physical approach, and I will try not to slide into a science-versus-religion kind of debate. So what kind of physical considerations do I want to incorporate in a machine learning model? To me, one of the beautiful aspects of physics is the notion of universality, that the laws of physics hold true everywhere in the universe. From a machine learning perspective, the best you can ask for, in my opinion, since the method is deeply rooted in the data, is not so much that a model fitted on certain data will let you solve all the problems in the world, but that the formal framework you use is not fine-tuned to a specific problem. So I want to build a machine learning scheme that doesn't rely on, I don't know, bond recognition or anything that is very specific to one class of problems.
And second, and Gabor has already stressed the importance of this, you want to incorporate physical principles in the building of your model. For instance, something which is fairly obvious but was not recognized by everyone in the field initially is the importance of symmetries: the way you feed the data to your machine learning scheme should already incorporate all of the correlations that are trivial because they are related to symmetries. Also, concepts like locality and additivity of the properties that you're trying to learn can and should be taken into consideration when you build a model. And finally, there is the aspect of trying to use the machine learning exercise not only to predict properties but also to understand, in a more intuitive manner, the structure-property relations in the materials that you're working with. So for starters, and here there is a kind of spoiler alert, the end result is pretty much going to be the SOAP power spectrum and kernel, but I want to get there in a very abstract fashion, because this also shows how different representations that have been used for machine learning are deeply related to one another. Let's say that our objective is to represent a structure as a vector in a Hilbert space, and I will purposely use a Dirac notation to stress that this is largely independent of the basis set that you use to represent this vector. Using the Cartesian coordinates of the atoms would clearly be a bad idea, because that representation is not even invariant to the order in which you label your atoms. So the first thing you can do is put a Gaussian on each atom. And here, unlike in Gabor's slide, I'm not sitting on an atom and looking at distances; I'm really just putting a Gaussian centred on the position of each atom i. And how do you take care of the chemical differences between different species? I decorate these Gaussians with a vector that, a little bit like a spin component, tells me whether this Gaussian is a hydrogen, an oxygen or a carbon. Now, this object, seen in the position representation, is not translationally invariant: if I translate my molecule, it becomes a different function of r. So how do I make it translationally invariant? One possibility is to average it over the translation group, which means integrating over all possible translation vectors, and if your Gaussians are normalized you end up with something that simply counts the number of atoms of each species in your molecule. So you get something which is translationally invariant, but you have thrown away all the information on the geometry. How can you obtain something which is symmetric but doesn't discard all of the geometric information? The idea is that you evaluate your decorated atom density at two different points, and effectively you symmetrize tensor products of your ket. I'm writing this in the position representation, but it is something you can do formally as a symmetrization of a tensor product of these vectors. If you do this, you end up with something which is essentially the convolution of the two densities, and the convolution of two Gaussians is a Gaussian, so out of this you obtain very naturally a representation of your entire structure which is invariant under translations, but which is written as a sum over all pairs of atoms of Gaussians centred on the interatomic distance vectors.
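To make this concrete, here is a minimal Python sketch of the object just described: the translation-averaged two-point density written as a sum of Gaussians centred on interatomic displacement vectors, resolved by pairs of species. The function names, the Gaussian width and the normalization are illustrative assumptions of this sketch, not the actual implementation behind the talk.

```python
import numpy as np
from itertools import product

def pair_density(positions, species, sigma=0.3):
    """Translation-averaged two-point density of a decorated Gaussian field.

    Symmetrizing the tensor product of two atom-centred Gaussian densities
    over translations leaves, for each pair of species (a, b), a sum of
    Gaussians centred on the interatomic displacement vectors r_j - r_i.
    Purely illustrative; widths and normalization are assumptions.
    """
    positions = np.asarray(positions, dtype=float)

    def evaluate(r, a, b):
        # r: (3,) displacement vector at which to evaluate the (a, b) channel
        total = 0.0
        # convolving two Gaussians of width sigma gives one of variance 2*sigma^2
        norm = (2.0 * np.pi * 2.0 * sigma**2) ** -1.5
        for i, j in product(range(len(positions)), repeat=2):
            if species[i] == a and species[j] == b:
                d = r - (positions[j] - positions[i])
                total += norm * np.exp(-np.dot(d, d) / (4.0 * sigma**2))
        return total

    return evaluate

# usage sketch: g = pair_density(coords, ["O", "H", "H"])
#               g(np.array([0.96, 0.0, 0.0]), "O", "H")
```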
It is then again very natural to split this sum of pair contributions into a sum over vectors that now describe atomic environments, and then, invoking some kind of nearsightedness principle, to cut off these densities so that you make your representation local. So this sum-over-pairs form arises just from a symmetrization, and the finite-range cutoff arises from nearsightedness considerations. Now, of course this is not rotationally invariant, and what we found is that you don't need to go through a kernel to incorporate rotational invariance: you can directly symmetrize this object over the rotation group. If you just symmetrize the object itself, you lose all the information on angular correlations and you end up with something which is, modulo some scalings, just a pair correlation function. If you symmetrize a tensor product, you get something which is precisely equivalent to a three-body correlation function. And so on and so forth: if you take higher tensor products, you get vectors in a Hilbert space that describe n-body correlations. Of course you don't need to do this in a position basis; you can choose any basis that you like, and if you are going to symmetrize over rotations you might as well choose a basis that transforms in a simple way under rotations. So if you represent your local atom density in radial functions and spherical harmonics, lo and behold, you obtain exactly the same power spectrum that you would obtain going through the construction of SOAP as a symmetrized kernel with an exponent of 2. Once you consider your representation in these terms, and I don't want to get into too much detail about this, you see very clearly how, for instance, Behler-Parrinello symmetry functions are just projections of this object onto certain two- or three-body functions, and you find that many other descriptors can actually be seen as special cases of this general formulation.
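As a rough illustration of that last step, this is a hedged sketch of how such a power spectrum could be assembled once the local density has been expanded in radial functions and spherical harmonics; the array layout and the 1/sqrt(2l+1) normalization are assumptions of this sketch.

```python
import numpy as np

def power_spectrum(coeffs):
    """Rotationally invariant power spectrum from expansion coefficients.

    coeffs[n, l, l + m] holds the coefficients c_{nlm} of a local atom
    density expanded in radial functions and spherical harmonics; the
    invariant combination is p_{n n' l} = sum_m c_{nlm} * conj(c_{n'lm}).
    Array layout and the 1/sqrt(2l+1) normalization are assumptions.
    """
    n_max, l_plus_one, _ = coeffs.shape
    p = np.zeros((n_max, n_max, l_plus_one), dtype=complex)
    for l in range(l_plus_one):
        c_l = coeffs[:, l, : 2 * l + 1]              # shape (n_max, 2l + 1)
        p[:, :, l] = c_l @ c_l.conj().T / np.sqrt(2 * l + 1)
    # for a real density the +m and -m terms combine so that p is real
    return p.real
```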
So how well does this work? In the end we use SOAP, so Gabor has already shown you that this can work very effectively for the study of hard materials, but it also works extremely well for quantum chemistry kinds of applications or, for instance, to study molecular solids. If you want to predict the relative stability of different polymorphs of these molecular compounds, that's a difficult problem, and you need very high accuracy because you have very many of them within a small range of energies. These learning curves show you how the predictive accuracy of your model improves as you increase the number of structures in your training set, and you see that already with fewer than a hundred reference configurations you can predict relative stabilities well; this is in kilojoules per mole, so these are very small errors. You can also predict properties other than the energy: since you have an atom-centred decomposition, you can for instance predict the NMR chemical shieldings of the atoms in your system. This is actually very interesting for experimentalists, and in fact we are doing this in collaboration with a group of experimentalists, because in order to determine a structure by NMR crystallography you need to do the experiment, and then you need a set of candidate configurations and to compute their chemical shieldings with GIPAW DFT, which is considerably more expensive than computing the energies and doesn't scale very well with system size. We have a machine learning model that you can actually try for yourself, this is a shortened URL, and it is hosted on MARVEL's Materials Cloud. Using just 2,000 reference structures taken from the Cambridge Structural Database, we can take two compounds, cocaine and AZD8329, that are not part of this database, and we can predict the chemical shieldings of different polymorphs accurately enough to identify the polymorph that is seen in experiments. So basically it can be used as a drop-in replacement for DFT to identify the correct polymorph in NMR crystallography. The final example that I want to show you, to really give you an idea of how broadly applicable this is, is one I did in collaboration with Gabor: you can try to predict whether a molecule will bind or not bind to a protein, as a proxy for whether it might be a promising candidate as a drug to target that protein. You have to do a lot of subtle things here, because for instance binding affinity is not an additive property, not all the atoms in the molecule contribute in the same way, and so you need a trick to combine the environment information in a way that leads to non-additive behaviour; but then you can reach a prediction accuracy in excess of 98 percent with just a handful of reference calculations.
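Since learning curves play such a central role in these examples, here is a generic sketch of how one can be produced with plain kernel ridge regression on nested training subsets; the kernel, the feature matrices and the hyperparameters are placeholders, not the actual models behind the numbers quoted above.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=0.1):
    """Plain Gaussian kernel between rows of two feature matrices."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def learning_curve(X, y, X_test, y_test, sizes=(25, 50, 100, 200), reg=1e-8):
    """Test error of kernel ridge regression trained on nested subsets.

    Returns (training size, mean absolute error) pairs; kernel choice and
    regularization are placeholders rather than tuned values.
    """
    results = []
    for n in sizes:
        K = gaussian_kernel(X[:n], X[:n])
        weights = np.linalg.solve(K + reg * np.eye(n), y[:n])
        y_pred = gaussian_kernel(X_test, X[:n]) @ weights
        results.append((n, np.mean(np.abs(y_pred - y_test))))
    return results
```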
So the fact that this symmetrized, body-ordered picture implies that pretty much all the representations we are using are just different ways of looking at the same object also suggests that it might be possible to systematically optimize a representation to improve the regression accuracy. An obvious parameter that you might want to tune is the length scale of these local environments. If you try to build models, and these are for the atomization energy of small organic molecules, using a very short-range cutoff, just two angstroms, you actually obtain something which does best with a small number of reference configurations. This is not so surprising: when you think about it, the complexity, the dimensionality of a two-angstrom environment is relatively low, so with just a few reference configurations you can capture the essence of it. But if you then try to increase the size of your training set, the accuracy saturates, because you don't have enough information to represent the dependence of the energy on the relative positions of the atoms; so then you switch to a longer-range cutoff, and so on and so forth. Now, this is incredibly hand-waving, but I maintain that the envelope of these curves tells you how much energy you can represent effectively using information within a 2.5- or 3-angstrom cutoff. So in a certain sense you can use machine learning to get information on the multi-scale nature of interatomic forces. And not only can you gain this insight, you can also use it to improve your regression: if you build a kernel that is a linear combination of kernels with different cutoff distances, or, and this is something that follows from this representation construction, equivalently introduce a scaling function that weights the contribution to the representation from atoms at different distances, you obtain a considerably better model than any of the single-cutoff models, and on this QM9 data set you can get down to a ridiculously small residual error, which is certainly an order of magnitude less than the error of the reference electronic structure calculations. Someone asked how you deal with chemical species, and I explained how you can see this as assigning an abstract vector to each species, but what does it mean in practice? Well, you can represent these vectors on a finite basis, and you don't need to make that basis correspond to orthogonal vectors, one for each species. This is of course just a metaphor, but you could regard it as, rather than learning the behaviour of hydrogen, carbon and oxygen, learning the behaviour of some classical elements, and learning simultaneously the composition of the actual elements in terms of fire, earth and water. And this works very well. This is a database of quaternary materials that contains a total of 39 elements, so you can learn, and actually learn quite efficiently, the stability of different compositions using a fully 39-dimensional species space; but if you reduce the dimensionality to two or to four, you obtain a much more efficient learning process, because you learn the fact that different elements behave in a similar way. I think this will be very important if we want to extend machine learning to multi-component alloys and to systems that span more than a handful of elements. And a cute side product of this construction is that if you stop at two, each element is represented as a point in two dimensions, and you can look at what these points look like: the clustering of the elements induced by the data mimics very closely the partitioning into groups of the periodic table.
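Here is a toy sketch of that idea of low-dimensional species embeddings: instead of one orthogonal axis per element, each element gets a learnable d-dimensional vector, and the similarity between two species enters the kernel through a dot product of embeddings. All names, shapes and the random initialization are illustrative assumptions; in practice the embeddings would be optimized against a validation error.

```python
import numpy as np

# Toy "alchemical" species embedding: each of the 39 elements is mapped to a
# learnable d-dimensional vector, and species similarity is the dot product
# of the embeddings rather than a Kronecker delta between element labels.

n_species, d = 39, 2                          # e.g. compress 39 elements to 2D
rng = np.random.default_rng(0)
u = rng.normal(size=(n_species, d))           # one embedding row per element

def species_similarity(a, b):
    """Similarity between two chemical species (indices into the embedding)."""
    return u[a] @ u[b]

def environment_kernel(env1, env2):
    """Compare two environments given as lists of (species index, weight)
    pairs, coupling structural weights through the learned species similarity."""
    return sum(w1 * w2 * species_similarity(a1, a2)
               for a1, w1 in env1 for a2, w2 in env2)
```

With d = 2, plotting the rows of u gives exactly the kind of data-driven map of the elements mentioned above, with chemically similar species ending up close together.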
The last thing that I want to discuss is how you can learn properties that are not scalars. Learning properties that are not scalars, as was shown very nicely by Aldo Glielmo and Sandro De Vita for the case of forces, or vectors in general, is tricky, because the condition that your kernel is invariant under the application of a symmetry operation becomes more complicated: your property is not invariant under rotation, it transforms under rotation, and since your kernel represents the correlations between properties, if you want to incorporate the trivial geometric correlations you have to impose that your kernel transforms in a prescribed way under rotations. Now, if you want to do this not for vectors but for general tensors, you need, well, you don't strictly need to, but it's convenient to break down your tensor into irreducible spherical components: every tensor can be broken down into components that transform like spherical harmonics. Then all you need to do is build a matrix kernel that is the symmetrized overlap between these atom densities, but where you now also include in the integral the associated Wigner matrices; effectively this tells you simultaneously how similar two environments are and what the best alignment is to bring one density onto the other. And, this is getting a little bit silly, but you can also obtain this kernel starting from a representation, from a vector, that is now a symmetrized product of two densities and a spherical harmonic; it's quite elegant how you can use this to compute the symmetrized kernel very efficiently. So what can you do with this? For instance, one thing you can start trying to do is learn the dielectric response properties of atoms in molecules, and here, since DFT is not particularly accurate if you want something like the polarizability, I've been teaming up with Rob DiStasio at Cornell, who is a proper quantum chemist and knows what the good basis sets are to do CCSD, and we are building a CCSD-level model of the polarizability, using as the training set small organic molecules with up to seven heavy atoms. We then put these models to the test by computing the largest molecules we can afford at the coupled cluster level; the largest one is acyclovir, with 15 heavy atoms, and this actually works incredibly well: the error of the model relative to CCSD is basically half the error of DFT relative to CCSD, and here we are in a largely extrapolative regime, we are really trying to predict molecules which are twice as large as the molecules we have trained on. There are of course limitations: every time you have non-local physics you are pretty much ruined. For instance, if you're looking at the polarizability of delocalized conjugated systems, you have a vanishing gap, and particularly in DFT you would get a diverging value of the polarizability; if you use a machine learning model, you are bound to saturate to the polarizability of the largest molecule you are considering. But at the same time this is also a tricky system for DFT, and even for coupled cluster, because when you get to the metallic limit you would really need a multi-reference method. So in a certain sense, yes, the machine learning model is failing because the underlying assumptions are failing, but still, getting something which is relatively stable is not bad, and we can extrapolate up to fullerene with an error which is smaller than the error between CCSD and experiment. Another system in which you really see this problem of locality very clearly, and where we see another way of solving the problem, is if you want to learn, this is epsilon infinity, the electronic dielectric constant of water. If you try to learn the dielectric response of water, you get a learning curve that saturates, which is a sign that the machine learning is not working very well, and if you try to make a prediction for solid ice using training done just on liquid water, you get a much larger error, which is another sign that you're not really capable of extrapolating. The reason this happens is that, as I guess many people in this room know very well, the dielectric constant is not a local property: it really describes the response of your system to the electric field that is also polarized by the continuum that you have around. But you can use classical electrostatics to approximately map the dielectric constant onto an effective local polarizability. So what you can do is take the dielectric constant that you have computed, use Clausius-Mossotti to map it onto a local polarizability, learn the local polarizability, predict the local polarizability, and then go back to the dielectric constant. If you do that, learning works much better and you can extrapolate to ice. So in a certain sense here we're using a very simple and completely inexpensive relation from classical electrostatics to get around the non-locality of the property that we're trying to learn, and I think that this is another way of incorporating non-local physics into your model.
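The Clausius-Mossotti round trip just described is simple enough to write down explicitly. This is a minimal sketch in CGS-style units, with the polarizability expressed as a volume and n the molecular number density; it illustrates the mapping rather than the actual workflow used for the results.

```python
import numpy as np

def eps_to_alpha(eps, n):
    """Map a dielectric constant onto an effective local polarizability using
    the Clausius-Mossotti relation (eps - 1)/(eps + 2) = (4*pi/3) * n * alpha,
    with n the molecular number density and alpha a polarizability volume."""
    return 3.0 / (4.0 * np.pi * n) * (eps - 1.0) / (eps + 2.0)

def alpha_to_eps(alpha, n):
    """Invert the Clausius-Mossotti relation after the ML prediction."""
    x = 4.0 * np.pi * n * alpha / 3.0
    return (1.0 + 2.0 * x) / (1.0 - x)

# sketch of the round trip: map epsilon_infinity of the liquid-water training
# frames to local polarizabilities with eps_to_alpha, learn and predict alpha
# for ice, then map the prediction back to a dielectric constant.
```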
The final example I want to show, something that we are very excited about, is that as a side product of being able to learn tensors of arbitrary order, we can now learn complex properties such as the electron density, the full three-dimensional electron density. The reason you need tensor learning for this is that the only way you can learn the electron density in a way that transfers to larger molecules is by breaking it down into local contributions, and the way we break it down is by writing the density as a sum over local basis functions centred on the atoms; but then the coefficients must transform like spherical harmonics, and so you need all of this tensorial framework to learn them efficiently. Another subtlety is that the only physical quantity is the total density, so we don't even try to break down the reference density into local contributions: we directly learn the total density. This is a massive pain, because these orbitals are non-orthogonal, so you have to deal with that and incorporate the non-orthogonality into the machine learning scheme; but you can do it, and it works very well. So well, actually, that we are limited by the accuracy of the basis set and not by the machine learning itself: the machine learning saturates, with a few tens of reference configurations, to the limiting accuracy set by the basis set. So we are now speaking with people who know about basis sets, and about resolution-of-the-identity kinds of basis sets, to try to improve on that front; but from the purely machine learning point of view this works beautifully, and it's very transferable. We can learn on molecules that contain four carbon atoms and make a prediction for a molecule that has eight carbon atoms, and you cannot distinguish the prediction from the true density if you look at them.
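As a hedged illustration of that decomposition, here is a sketch that reconstructs a density from atom-centred basis functions and predicted coefficients, using only one s-like and three p-like Gaussian functions per atom; a real resolution-of-the-identity basis is much richer and non-orthogonal, which is exactly why the overlap matrix has to enter the learning problem.

```python
import numpy as np

def reconstruct_density(grid, positions, coeffs, sigma=0.5):
    """Reconstruct rho(r) = sum_i sum_k c[i, k] * phi_k(r - r_i) from
    atom-centred basis functions and predicted coefficients c.

    Here phi_k are just four Gaussian-type functions per atom (one s-like
    and three p-like, i.e. l = 0 and l = 1 real spherical harmonics times a
    Gaussian), so shapes and widths are illustrative assumptions only.
    """
    grid = np.asarray(grid, dtype=float)           # (n_points, 3)
    rho = np.zeros(len(grid))
    for r_i, c_i in zip(positions, coeffs):        # c_i has shape (4,)
        d = grid - np.asarray(r_i)
        g = np.exp(-np.sum(d**2, axis=1) / (2.0 * sigma**2))
        basis = np.stack([g, d[:, 0] * g, d[:, 1] * g, d[:, 2] * g], axis=1)
        rho += basis @ c_i
    return rho
```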
So, to wrap up: it's certainly not my idea that you can use physics to inform a machine learning model, but it's an idea that I fully subscribe to, and I think there are many ways in which you can benefit from it. People coming from a purely machine learning background have a tendency, and this is actually informed by past experience in that community, to say that you are much better off just feeding data to your algorithm and letting it learn all the underlying laws. But I think that in the case of physics, chemistry and materials science, trying to obtain something which is elegant and incorporates symmetries, conservation laws, smoothness and locality gives you an incredible edge over more sophisticated machine learning schemes; everything I have told you is based on Gaussian process regression, which is one of the simplest, and also oldest, statistical regression schemes. I also hope I could convince you that if you do this machine learning exercise critically, you can extract some information about locality, about correlations between elements, and about structure-property relations in general. And I fully agree with Gabor that everyone in this room should be very excited about machine learning, because it allows you to take the bleeding-edge electronic structure method you're working on and bring it from being useful to predict a couple of small structures at zero kelvin to being usable for proper statistical mechanics. So thank you, and if you have questions...