Thank you. So I'm a postdoc in the Laboratory of Computational Science and Modelling, the COSMO group at EPFL, and I'd like to talk to you about some work we've been doing very recently on learning some scalar, but mostly tensorial, properties of molecules, though hopefully I'll convince you it's also scalable to materials. This was all carried out under the supervision of Michele Ceriotti. The tensor learning was done with Andrea Grisafi, a PhD student; also on the scalar learning side are Sandip and Félix, and Michele isn't in this picture. And these are our collaborators.

As you will know, given a general molecule or material, in principle you can solve the Schrödinger equation and find any property that might interest you. But in practice there comes a point, when you need too high a degree of accuracy or too large a system, when you just can't do this, and you have to turn to a cheaper method. Traditionally, perhaps, you might use an empirical force field, but obviously I'm here to talk about machine learning, so what we do is machine learning. This is basically a reiteration of what you've seen before: machine learning has the potential to give an accuracy close to that of ab initio methods, but with a cost much closer to that of an empirical force field. That's the basic idea.

Now, what do I mean by machine learning? The kind I'll be talking about today is called Gaussian process regression, or kernel ridge regression, and it's nothing more than a fancy kind of interpolation, basically. Say you want to calculate the energy of one of these small molecules. The way you do it is to collect a training set of such molecules, for which you calculate the energy using your ab initio method. Then, to predict the energy of a test molecule, you compare it to each member of the training set and predict the energy just like this: a weighted average over this function, which we call the kernel, and which gives the similarity between your test molecule and each member of your training set. And you get these weights w_i by a matrix inversion, a regularized matrix inversion, basically.

So the big challenge we work on is how to calculate these kernels. No, sorry, I'm getting a bit ahead of myself: what is a kernel? A kernel just gives you the similarity between two objects. So we have this sharp star here and this fluffy star, and they're pretty similar, so their kernel is pretty much one; whereas this fluffy star is completely different from this blue spiral, so their kernel is basically zero. And what you see is that when you're making a prediction of the energy, members of the training set that are quite close to your test molecule help you more in making that prediction. That's what's encoded here, basically.
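To make that regression step concrete, here is a minimal sketch of kernel ridge regression in the form just described, using a generic Gaussian kernel on placeholder feature vectors; the names and the toy data are illustrative, not the actual SOAP machinery:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # k(a, b) = exp(-|a - b|^2 / (2 sigma^2)): close to 1 for
    # similar inputs, close to 0 for very different ones
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-d2 / (2.0 * sigma**2))

# toy training set: feature vectors X_train with "ab initio" targets y_train
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 8))
y_train = np.sin(X_train).sum(axis=1)

# the weights w come from a regularized matrix inversion of the
# kernel matrix between all pairs of training points
K = gaussian_kernel(X_train, X_train)
w = np.linalg.solve(K + 1e-8 * np.eye(len(K)), y_train)

# a prediction is a weighted sum of kernels between the test
# point and every member of the training set
X_test = rng.normal(size=(5, 8))
y_pred = gaussian_kernel(X_test, X_train) @ w
```

The only two ingredients are the kernel function and the regularized inversion; everything that follows is about building better kernels to feed into exactly this machinery.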
So given all this, how do you compute the kernel between two molecules? And it's the same question for materials, basically. The way we calculate, say, the kernel between molecule A and molecule B is to look at local environments: the kernel between A and B is a combination of the kernels between local environments on molecule A and local environments on molecule B. What I mean by a local environment here is, for example, that we take an oxygen atom and look at its surrounding atoms, here the carbon atoms. This carbon atom has a local environment made up of these three atoms; this carbon has one made up of these two carbons and this blue atom, whatever we want it to be.

The way we then calculate the kernel between two local environments is with what's called the smooth overlap of atomic positions, or SOAP, method. Very briefly, it works like this: if you take these two local environments, one around the oxygen and one around the nitrogen, you can define a local density for each by superposing Gaussians centred on all of the neighbouring atoms. Multiply the two local densities together and integrate over all space, and that gives you the kernel between this environment here and this environment here. There are a couple of subtleties to do with symmetry that I'll get to a bit later, but this is the picture you should have in your head.

A key observation we've made is that once you separate this kernel into kernels between local environments, the question is how to combine them. If you combine them additively, so that the kernel between molecule A and molecule B is the sum over all pairs of kernels between every environment on A and every environment on B, then the property you predict, here the energy, is given by a sum of predicted properties for the individual local environments. I'll come back to this later; it turns out to be a very useful property to have. You don't have to add the kernels together in this perhaps simple-minded way; there are other ways to do it, for example maximizing the entropy of a coefficient matrix you could put in here, so as to get the most information possible.
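As a schematic sketch of this density-overlap picture, under the simplifying assumptions that an environment is just a list of neighbour positions and that we skip the rotational averaging and species bookkeeping of real SOAP:

```python
import numpy as np

def density_overlap(env_a, env_b, sigma=0.5):
    # each environment is a superposition of Gaussians of width sigma
    # centred on the neighbour positions; the integral of the product
    # of two such densities reduces to pairwise Gaussian overlaps,
    # which have the closed form used below
    k = 0.0
    for ra in env_a:
        for rb in env_b:
            d2 = np.sum((ra - rb) ** 2)
            k += (4.0 * np.pi * sigma**2) ** -1.5 * np.exp(-d2 / (4.0 * sigma**2))
    return k

def molecule_kernel(envs_a, envs_b, sigma=0.5):
    # the plain additive combination: sum the environment kernels
    # over every pair of environments on the two molecules
    return sum(density_overlap(ea, eb, sigma)
               for ea in envs_a for eb in envs_b)
```

With this additive choice, a predicted molecular energy automatically decomposes into one contribution per environment, which is the useful property mentioned above.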
I'll give you a couple of examples of the scalar learning the group has been doing, and I'll go into a little more detail about this first one. Basically, it uses density functional theory and empirical force field structures to predict the energy of a molecule at the coupled cluster level, so you can save an expensive calculation by doing a much cheaper one. And it's not just molecules, it's also materials, or in this case crystals. This is a silicon (111) surface. The lowest-energy reconstruction of this surface is the 7x7, which was found with STM experiments and also with DFT. If you apply these four different empirical potentials, all of them predict the unreconstructed surface to be the lowest in energy, which is incorrect. But if you train a SOAP model using structures that are only 3x3 reconstructions or smaller, about 2,000 training structures, then you correctly predict that the 7x7 reconstruction is the lowest in energy. The method has also been used to train a support vector model to predict whether a given ligand will bind to a protein, with something like 99% accuracy, using the entropy-matching kernels I described earlier. These first three examples are in this Science Advances paper. There's also more recent work these people are doing at the moment, predicting the energies of some pentacene derivatives, which you may or may not be able to see; but the results, I can tell you, are promising.

In a little more detail, what was done here was to take the GDB9 database, which contains 134,000 small molecules like this, relax their structures using either DFT or the semi-empirical PM7 method, and then, using these relaxed structures, calculate their coupled cluster energies. There are a few things you could learn from this. You could take the DFT-relaxed structure and learn the coupled cluster energy, or take the PM7-relaxed structure and learn the coupled cluster energy. Or, perhaps more interestingly, you could take the DFT structure and learn the error that you make, that is, the coupled cluster energy minus the DFT energy, and similarly for PM7. What I'm showing you here is a learning curve: the mean absolute error of prediction versus the number of training points, so at the far end that's about 20,000 of the 134,000 molecules in the training set. What you see is that using the PM7 geometry to learn the coupled cluster energy gives an error of about 2 kcal/mol, which is OK. If you weight members of the training set differently according to how far their PM7 structures are from the DFT structures, so that you throw out the ones where the two structures are completely different, you can predict with 1 kcal/mol accuracy. And using the DFT geometry, you can predict the coupled cluster energy with about 0.2 kcal/mol accuracy, which is much better. But the message here is basically that training this SOAP model on the very cheap semi-empirical PM7 structures gives you these energies with reasonable accuracy, which can be improved by some clever tricks.

Now I move on to the things that I myself am working on, which is taking into account the symmetry of the problem. In Gaussian process regression, your kernel between two environments, X and X', is a measure of the correlations in whatever property you want to predict; here that's y, which might be the energy or, as we'll come to later, say a force or a dipole moment. What this means is that because your property transforms in some way when you apply a symmetry operation, your kernel also has to transform in a prescribed way. For example, if you want to predict a scalar property like the energy, which is invariant to rotations and translations, your kernel also has to be invariant to rotations and translations. But if you're working with a tensorial property it's a bit more difficult, because if you take an environment and rotate it, the tensor corresponding to it has to rotate covariantly with this rotation. And what this means is that your kernel now has to be a tensor, and this kernel also has to rotate covariantly with any rotations you apply to the two environments. Here the subscripts λ and μ stand for any number of indices: if you've got a vector, μ is one index; if you've got a third-order tensor, μ is three indices.

It turns out that defining a tensorial kernel is very easy for specific types of problems. If you want to predict something for a rigid molecule like water, it's easy because, as we saw earlier, for a water molecule you can unambiguously define an axis system. If you rotate your test molecule onto your training molecule and align their axes, you can predict the property you want in this molecule-fixed reference frame. That's basically what I'm showing you here: this tensorial kernel is given by the rotation that takes the test onto the training molecule, then whatever invariant kernel you like, and then, once you've predicted your tensor in the molecule frame, you just rotate it back. And this gives you a prediction for the tensor.
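For a vector (λ = 1) property, one schematic way to write this rigid-molecule recipe down, assuming R(X) denotes the rotation matrix from the laboratory frame into the molecule-fixed frame of X and k_0 is any rotationally invariant scalar kernel, is:

```latex
\mathbf{K}(\mathcal{X}, \mathcal{X}')
  \;=\;
\mathbf{R}(\mathcal{X})^{\top}\, k_0(\mathcal{X}, \mathcal{X}')\, \mathbf{R}(\mathcal{X}')
```

The two rotation matrices carry vectors into and out of the two molecule-fixed frames, which is the "align, predict in the molecular frame, rotate back" procedure in a single line.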
We've already applied this to learn the first hyperpolarizability β of water molecules, which goes into the second harmonic scattering experiments performed by the Roke group at EPFL. And what you see, comparing the results of coupled cluster calculations to this simplistic machine learning model, is pretty good agreement, with an error of 10% or less, let's say. But if we wanted to apply this same method to, for example, a concentrated solution of ions, we'd already be stuck, because it's very difficult, impossible really, to define an axis system for each ion. And because we want to split our overall kernel into a sum of local environment kernels, and some of these local environments are ions, we have to go a little bit beyond this.

So what we did, just for fun, was a tensorial generalization of the SOAP kernel. If you remember, the kernel between local environments, this one here and this one here, is given by the product of their local densities integrated over all space. Now, you don't just want to compare this one orientation of this environment to this one orientation of that environment, because you won't get as much information; this orientation won't tell you as much as this mutual orientation, for example. So what you want to do is average over every possible mutual orientation of the two environments, or in practice, average over all orientations of one environment while keeping the other fixed.

If you have a tensor, you can split it up into components that transform as spherical harmonics: a scalar transforms as an L = 0 spherical harmonic; a vector like a force or a dipole moment transforms as an L = 1 spherical harmonic; and a symmetric polarizability has an L = 0 component, which is the trace, and an L = 2 component, which is everything else, basically. And we know how these spherical tensors transform: if you have one that corresponds to an environment X and you apply a rotation R to the environment, all you have to do is multiply the tensor on the left by the Wigner D-matrix corresponding to this rotation. This is basically a complicated function of the Euler angles that describe the rotation, but these are known, tabulated, computable mathematical functions.

This gives us our recipe for defining a tensorial kernel. Our original scalar kernel is an average over all possible orientations of the environments. To define a tensorial kernel, all you do is take this usual average over orientations, but weight it by the Wigner D-matrix as you rotate. And this gives you a kernel that is tensorial and has the correct L = λ transformation properties.
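Written out schematically, assuming ρ_X is the local density of environment X, taking a squared density overlap as the nonlinearity, and dropping normalization, the scalar kernel and its symmetry-adapted generalization can be sketched as:

```latex
k(\mathcal{X}, \mathcal{X}')
  = \int \mathrm{d}\hat{R}
    \left| \int \mathrm{d}\mathbf{r}\,
      \rho_{\mathcal{X}}(\mathbf{r})\,
      \rho_{\mathcal{X}'}(\hat{R}\mathbf{r}) \right|^{2},
\qquad
K^{\lambda}_{\mu\mu'}(\mathcal{X}, \mathcal{X}')
  = \int \mathrm{d}\hat{R}\; D^{\lambda}_{\mu\mu'}(\hat{R})
    \left| \int \mathrm{d}\mathbf{r}\,
      \rho_{\mathcal{X}}(\mathbf{r})\,
      \rho_{\mathcal{X}'}(\hat{R}\mathbf{r}) \right|^{2}
```

Setting λ = 0, where the Wigner D-matrix is just 1, recovers the ordinary scalar average over orientations, so the D-matrix weighting is the only new ingredient.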
That's enough theory; how well does it do? I should say that we've called this method SA-GPR, symmetry-adapted Gaussian process regression, which is a mouthful. The first thing we applied it to is oligomers of water: the monomer, the dimer, and up to this undecamer. Again I'm showing you learning curves, with the root mean square error for the dipole moment, for the L = 0 and 2 components of the polarizability, and for the L = 1 and 3 components of the hyperpolarizability. In each case the horizontal line is the intrinsic variation of the training data set, and we see that with on the order of hundreds of training points, in this case 500 at most, we get very good errors compared to this intrinsic variation.

Furthermore, because we're splitting our kernels up into local environments as before, we can predict our tensorial properties as a sum of local environment predictions. In our case the local environment is each oxygen atom together with all of its surrounding hydrogen atoms, so you have a predicted dipole moment for this environment and a predicted dipole moment for that one, which sum up to a total predicted dipole moment.

And again, this isn't just for molecules; we also applied it to bulk systems. Here I'm showing you the dielectric tensor of bulk liquid water, and there are two things to see. The red and green curves are what we got when we tried to machine-learn the L = 0 and 2 components of this tensor directly. The blue and grey curves are what happened when we transformed the dielectric tensor to a polarizability tensor, learned and predicted the polarizability, and converted it back to a predicted dielectric tensor using the inverse of the tensorial Clausius-Mossotti relation. This gives much better results, certainly for L = 0. The reason is that while the dielectric tensor is a bulk property, the polarizability splits much more naturally into local environmental polarizabilities, which is what we want to predict. The squares in these plots are what we get if we take the model trained on liquid water and use it to predict the tensors of ice, so this is already a very transferable model.

I think I've got time to show you a couple of things we're working on at the moment to take this a little further, some interesting examples. The first is a benchmark of polarizability learning. With the DiStasio group at Cornell, they're doing the calculations and we're doing the machine learning; they've run DFT and high-end coupled cluster calculations for the QM7b database of fairly small molecules, about 7,000 of them. What I'm showing you on each x-axis is the polarizability calculated with CCSD, the xx, xz and zz components; here is the DFT polarizability, and here the machine learning prediction. While the DFT results aren't quite on the straight line, especially at the extremes of their values, the machine learning results match a lot better. So basically, a machine learning model is able to predict the CCSD polarizability of these small molecules better than DFT can.

The final example is learning the electron density, at the moment for small molecules, but we'll see where we can take this, of course. This is carried out with the Corminboeuf group at EPFL. While the electron density is a complicated function of space, you can write it in terms of localized components: a sum over all atoms of radial functions times spherical harmonics, with coefficients c_nlm that transform like spherical harmonics, and those are exactly the things we can learn with SA-GPR. This here is the contribution of the various L channels to the total density: the sum of the L = 0 density contributions from the oxygen and the two hydrogens, then L = 1, 2 and 3. This is a series that truncates quite quickly, and we can use it to decompose the density.
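As a sketch of the expansion being described, assuming atom-centred radial functions R_n and spherical harmonics Y_lm:

```latex
\rho(\mathbf{r}) \;\approx\;
\sum_{i \in \text{atoms}} \sum_{n l m}
  c^{(i)}_{n l m}\,
  R_{n}\!\left(\lvert \mathbf{r} - \mathbf{r}_i \rvert\right)\,
  Y_{l m}\!\left(\widehat{\mathbf{r} - \mathbf{r}_i}\right)
```

Each coefficient c^(i)_nlm rotates like a spherical harmonic of order l, so it is exactly the kind of λ = l object that SA-GPR is built to learn.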
And so, without further ado, some learning curves: using 500 training molecules, we can get errors of 5% or less for these various L components, averaged over all atoms and over the values of n and m.

And just to show you where this error comes from: this is a slice through the xy-plane of the density of quite a distorted water molecule. This is the error you get when you decompose the density into spherical components, which is about 0.02, and this is the error you get when you learn the decomposed density, which is one to two orders of magnitude smaller. If you put all of this together, you get an error of about 1% in predicting the ab initio density of these water molecules with this machine learning model, and most of that error comes from the decomposition. So this is an area we will shortly improve, but the machine learning itself is sound.

I think that's a decent place to stop. Hopefully I've convinced you all that splitting your kernels into local environments is a great way to predict scalar and tensorial properties of molecules and materials. Thank you.