So it's a great pleasure to welcome our next speaker, Mark Casida. Mark is a theoretical chemist based in Grenoble. He's a world expert in excited-state theory as well as applications. And given the broad spectrum of topics that we have for this workshop, we thought it would be very interesting to hear from Mark about his perspectives on the role of excited-state calculations in biology. Okay, so the floor is yours, Mark.

Okay, thank you, Ali. So it's a pleasure to have been invited to participate. So far I've been able to see three talks and I've been learning a lot. This is not really my area. My father and his father were university professors, as was my uncle, but they were all biologists, and I'm not a biologist. I am somebody who is interested in quantum chemistry methods. So I wasn't quite sure whether there hadn't been some mistake in inviting me to give this talk, but I was assured that somehow I fit in, and that's presumably why the list of topics includes spectroscopy and excited-state dynamics in biology. Because I'm coming a little bit from outside the field, I'm going to try to keep this fairly general.

So I thought I would begin with how we think about photochemistry. This is an example of old data for a small-molecule reaction: you shine light and you get various products in various yields. You can then try to do mechanistic experiments and write down Lewis representations of how you think the mechanism goes. But if you really want to understand, then Woodward and Hoffmann said, you should think in terms of quantum mechanics and orbitals. The Woodward-Hoffmann way of thinking about things works very well for the ground state. It turns out it doesn't work as well for excited states. One of the reasons is that excited states actually involve not just orbitals, but configurations built up of orbitals. So in the simplest case of a two-electron, two-orbital model, you would have to build singlet and triplet states and their excitation energies. And then you would start to think about how the different ground- and excited-state energies vary with all the nuclear coordinates of your system; those are the potential energy surfaces. In the modern approach to photochemistry, you would be thinking about potential energy surfaces, you would be investigating various pathways to find photochemical funnels, ways to get from one surface to another, and you would think about doing dynamics. And then ultimately you would want to take that and maybe go back to the orbital models and to the Lewis structures so that you could talk with experimentalists.

I want to give you some more examples of what I mean by photochemistry, because what I mean by photochemistry is more general than what some people mean by it, and I think you'll appreciate why this is the case. My photochemistry includes photophysics: that is where you excite a molecule A, you make an excited state, and when it relaxes it returns to the same molecule, without any bonds broken and without any bonds formed. This includes, for example, luminescence, as in the green fluorescent protein. You can also have real photochemistry, where bonds are broken and made during the photochemical process, so that you return to a different structure than the one you started with. One canonical example is in vision, where you have the cis-trans isomerization of retinal.
If you run the photochemistry backwards, so that your products become reactants, they come together, make an excited-state molecule, and that falls apart, giving a product and emitting light, then you have chemiluminescence, and there are lots of examples, fireflies, glow-worms, et cetera, in biology. One thing which is probably less appreciated, and which as best I recall first turned up in biological applications, is that there are reactions that look like they ought to be photochemical reactions, they have the same reactants and the same products, but which are thermal reactions: light is not involved. So sometimes this is called photochemistry in the dark, or photochemistry without light. Finally, and I think this is very relevant for biological applications, we need to think somehow in terms of localized excitations, or excitons. These can be energy-transfer excitons, these can be charge-transfer excitons. They can be real, because you're exciting part of the molecule and the energy transfer or the charge transfer is going throughout the molecule. Or they could be fictitious, simply because our mind is not capable of understanding what's going on unless we're able to break the molecule into subsystems and understand how they're interacting.

Okay. My specialty over the years has been density functional theory and especially time-dependent density functional theory. Around 1995, density functional theory was becoming very impressive to ab initio quantum chemists. They were afraid that they were about to be buried by the low cost and efficiency of this method, but there was a way out: it was believed that density functional theory could not handle excited states. This slide was a slide from somebody else's presentation that a student gave to me, and it shows two of the people who helped to show that you can get excited states from DFT. Hardy Gross, whom you saw, together with Erich Runge, proved a theorem which basically says that, analogous to what happens in ordinary density functional theory, in the time-dependent case the external potential, that is, the potential seen by the electrons, is determined by the time-dependent charge density. And this helps to justify the time-dependent Kohn-Sham equation. So this means you take your ordinary, familiar Kohn-Sham equation and you turn it into a time-dependent equation. And if you do that the naive way that most people do it, you have time-dependent DFT in the adiabatic approximation, or what I call conventional time-dependent DFT. And if you worry about what is missing, then you need to worry about what is in equation three, the exchange-correlation action. But we won't go into that; we'll stick primarily to conventional time-dependent DFT, where the exchange-correlation potential is more or less what you think it is.

Okay, we can get excited states from that by using response theory. So what we do is we look at the response of the dipole moment to a time-dependent electric field, which is a model of a photon. And from that, you can show that the poles, that is, the singularities of the polarizability, give you the excitation energies, and the residues give you the intensities in your absorption spectrum. This is the famous sum-over-states theorem of optical physics. A quarter of a century ago, I made that practical by writing down some equations that look like the random-phase-approximation equations that are common in quantum chemistry software.
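For reference, the two statements just made can be written compactly; these are generic textbook forms (atomic units, closed-shell notation), not equations copied from the slides:

```latex
% Sum-over-states form of the dynamic polarizability: poles at the excitation
% energies \omega_I = E_I - E_0, residues proportional to the oscillator strengths f_I
\bar{\alpha}(\omega) \;=\; \sum_{I \neq 0} \frac{f_I}{\omega_I^{2} - \omega^{2}}

% RPA-like (Casida) eigenvalue problem of adiabatic linear-response TDDFT
\begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^{*} & \mathbf{A}^{*} \end{pmatrix}
\begin{pmatrix} \vec{X} \\ \vec{Y} \end{pmatrix}
= \omega
\begin{pmatrix} \mathbf{1} & \mathbf{0} \\ \mathbf{0} & -\mathbf{1} \end{pmatrix}
\begin{pmatrix} \vec{X} \\ \vec{Y} \end{pmatrix},
\qquad
A_{ia,jb} = \delta_{ij}\delta_{ab}\,(\epsilon_a - \epsilon_i) + K_{ia,jb},
\quad
B_{ia,jb} = K_{ia,bj}
```

Here i, j (a, b) label occupied (virtual) Kohn-Sham orbitals and K is the coupling matrix built from the Hartree and exchange-correlation kernels.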
And this meant that it could be implemented very quickly in both quantum chemistry software and quantum physics software. Since then, it has been tested, and we know where it works best. Time-dependent DFT works best when DFT works well for the ground state, without any symmetry breaking. It works best for low-energy excitations of dominant single-excitation character. It works best when there's not too much charge transfer, and I will be talking about that a bit more in this talk especially, and when the excitations are reasonably localized. So that's the safe place. The place that interests me is the red zone, the danger zone, where everything breaks down and you need to develop the method further. The place where most users actually apply the method is the blue zone, where there's a reasonable risk: they want to go outside the safe zone, but they don't want to go really into the danger zone. The method is fairly simple to use, and that is one reason people like it, and also because it works well enough that they can treat many problems that they cannot treat with other methods. Something that is also important from the point of view of this talk is that density functional theory has an approximation to it which is called density-functional tight binding. It looks like DFT: it takes methodology from semiempirical theories, it is fitted to DFT, it can be developed along lines that look very, very much like DFT, and it should behave like DFT whenever possible. It can be extended to time-dependent DFT.

Okay. Murphy, who was an aeronautics engineer, said, as we all know, that if anything can possibly go wrong, it will. And as an aeronautics engineer, he needed to try to keep things from going wrong. One of the things that goes wrong is that charge-transfer excitations with the typical functionals can be underestimated by huge amounts, by one or two electron volts. This was pointed out by Dreuw and Head-Gordon; actually it was pointed out earlier, but their paper is particularly clear, so that's why they get the credit. We can often see when there is going to be a problem within a time-dependent DFT calculation by using some sort of criterion as a measure of charge transfer. The most popular is the lambda criterion, which says that for small values of lambda you should expect a problem with underestimated charge-transfer excitation energies. If that's the case, what you need to do is use range-separated hybrids. These are special density functionals where you divide the Coulomb repulsion between the electrons into a short-range part and a long-range part. The short-range part is handled by density functional theory. As Kohn once said, electron correlation is nearsighted. That's not entirely true, and that's why the long-range part is also there and needs to be handled by wave-function techniques, such as techniques using exact exchange.

Okay, now I'm going to talk about some applications that we have done. It's going to be a little bit superficial; I hope it will be clear. One of the applications is remotely inspired by photosynthesis. I'm not going to go through all the details of photosynthesis, but here we have the main structure of the most basic parts of the working units in photosynthesis. And chemists will ask themselves: can I make something, through bio-inspired chemistry, that will somehow look like the photosynthetic unit but will do something else?
And one of the popular ways to do this begins with the magic of the tris(bipyridine)ruthenium complex, which has a long-lived excited state because of phosphorescence, but which also has easy charge transfer. And so then you can start building something with chemical wires and a chemical reaction center that starts to look like your photosynthetic unit in biology. The problem is that when you start to put on those chemical wires, you can interfere with the excited-state lifetime and you can lose some of the desirable properties of the ruthenium complex. So what you want is to be able to maintain the long luminescence lifetime. And we know from studies of tris(bipyridine)ruthenium in particular that what can happen is a quenching of the excited state due to an avoided crossing of the triplet metal-to-ligand charge-transfer state, which is the one that phosphoresces, with a triplet metal-centered state. And so one of the things that we wanted to do in this project, and I should mention that this has a close link with the ICTP through the help that we've gotten via the African School for Electronic Structure Methods and Applications, was to see whether we could understand the barrier height there from orbital energies. Fortunately, there's a bit of a gold mine in this paper, where about 100 pages of photochemical data have been published, and that allowed us to extract not a real barrier height, but an empirical barrier height. And from that, and about 100 calculations on about 100 molecules, Denis Magero was able to construct this correlation between our empirical barrier height and our orbital-based luminescence index, based upon the pi-star orbital energies of the ligands and the e_g orbital energy of the ruthenium. There are outliers, but what is interesting is that if you focus in on those things that look most like tris(bipyridine), that is, they have bidentate ligands and they're symmetric so that both sides look the same, you get the blue squares, and they lie very much on a line. So what that's telling me is that the other things have other mechanisms, and that would be interesting to investigate in the future.

Okay. Mark, I had a question. So what did you extract exactly to get the potential energy barrier from the simulations? Okay, we got the potential energy barrier based upon experimental data, in essence, lifetimes measured at room temperature and at 77 K, so liquid-nitrogen temperature. Okay. And that's just a crude estimate, because there are many things that go on between 77 K and room temperature. I see. Even better would be to go and calculate the barrier heights from first principles, and we have an ongoing project that's doing exactly that, but that's actually more time-consuming. Yeah, okay, got it. Okay. All right.

So I want to tell you now about a project that, well, one day Claudia Filippi called me up and said, Mark, I think we need some help. And it led to this very nice work involving her then doctoral student, Omar Valsson, and this is about the vision mechanism, the cis-trans isomerization. In particular, they were looking at various models that could be treated at a high level. And this is a bit of a complicated transparency, so I'll try to explain it.
And they discovered that the gold standard for modeling this type of reaction, which is the complete active space self-consistent field method, CASSCF, and which had also been used to criticize density functional theory, particularly time-dependent density functional theory, as giving the wrong answer, was itself giving the wrong answer, in the sense that if you put correlation corrections on top of CASSCF, so either CASPT2 or CC2 or even quantum Monte Carlo, then in fact the initial relaxation of the excited state in the retinal model involves some single-bond rotations that take you down into a minimum, which can lead to the fluorescence that is actually observed in bacteriorhodopsin. And that turns out to be well described by some density functionals. What is not well described is when you return to the double-bond rotation, and that's something that we can't describe at the present time with typical, commonly accessible time-dependent DFT methods. So they came to me and they said, why? And we looked at the lambda parameter for the charge transfer, and we found that where the lambda parameter is large, so that's on the bottom, our curves matched the red curves for the CASPT2 relaxation, and depending upon the functional, that would last for more of the dynamics or less of the dynamics, but it always correlated with the lambda parameter, and that was very, very satisfying. So we understand what's going on.

Now I want to talk about excitonic effects. This is one of those papers where I'm fairly proud of what we were able to accomplish, but looking back at it, I learned a few things. Yes? Mark, sorry, we lost like a minute of what you said. Oh, I'm terribly sorry. Oh, dear. When I was talking about this, did you lose us? Yeah, just when you started excitonic effects. Oh, okay. So now I want to talk about excitonic effects. Excitonic effects are probably the way that one has to think about biological systems: you need to be able to break them down into smaller systems. In this case, we were looking at excitonic effects for a somewhat different reason. When molecules come together and form aggregates, as they do for some dye molecules in solution, you will see spectral shifts, and these can be understood. This is an example of spectral shifts for pentacene in solution, which is the blue curve, and in thin films, which is the red curve. And this can be explained in terms of shifts and in terms of certain splittings, and all of that can be explained in terms of what I call Kasha's exciton model. This is a very old theory; it's actually a theory that was presented in the last chapter of Davydov's book on excitons. Michael Kasha translated that into English, and it just involves assuming excitations in individual molecules and then taking into account the interaction between the two molecules as a perturbation. I'm not going to go through all the equations. I'm just going to say that if you interact an excitation on one molecule with an excitation on another molecule, you get a two-by-two matrix problem to solve, a configuration-interaction problem to solve. And so you get two solutions, which gives a splitting. But on top of that, you get information in this model about the intensities. So you can determine which states are going to be present in the spectrum and which states are going to be absent, or which states are going to be bright and which states are going to be dark.
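As an illustration of the two-by-two exciton problem just described, here is a minimal numerical sketch for a dimer of identical chromophores; the site energy, coupling, and transition dipoles are made-up placeholder values, not numbers from the talk:

```python
import numpy as np

# Minimal Kasha-type exciton model for a dimer of identical chromophores.
# All numerical values below are illustrative placeholders.
E_site = 2.0                          # excitation energy of an isolated molecule (eV)
J = 0.1                               # excitonic coupling between the two excitations (eV)
mu1 = np.array([1.0, 0.0, 0.0])       # transition dipole on molecule 1 (arbitrary units)
mu2 = np.array([1.0, 0.0, 0.0])       # transition dipole on molecule 2 (parallel stacking)

# Two-by-two configuration-interaction problem between the two local excitations
H = np.array([[E_site, J],
              [J,      E_site]])
energies, coeffs = np.linalg.eigh(H)  # exciton energies and mixing coefficients

# Transition dipoles (hence intensities) of the two exciton states
for E, c in zip(energies, coeffs.T):
    mu = c[0] * mu1 + c[1] * mu2      # coherent combination of the site dipoles
    strength = float(np.dot(mu, mu))  # proportional to the oscillator strength
    label = "bright" if strength > 1e-8 else "dark"
    print(f"E = {E:.2f} eV, |mu|^2 = {strength:.2f} -> {label}")
```

For parallel dipoles this gives a splitting of 2J, with the in-phase combination carrying all the intensity and the out-of-phase combination dark, which is the standard H-aggregate picture.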
And ultimately, that tells you something about how the different molecules are oriented with respect to each other. Now, Kasha's model included energy transfer, but it didn't include charge transfer. So what we did was to generalize it to include charge transfer. At the time the paper was published, I thought we were the only ones who had done that, but I have since realized there are other people who had done it as well. And then we were able to look at what is, from my point of view, an interesting phenomenon, and that is where you have excitations of these aggregates and you see no obvious charge transfer, and yet Kasha's model, modified to include charge transfer, is telling you that there really is charge transfer going on. And it turns out that we could analyze that, and we could see for different functionals that indeed the excitations associated down here at the middle bottom, the dashed lines associated with charge transfer, were underestimated, and that when you put in more Hartree-Fock exchange, using some range-separated method or just time-dependent Hartree-Fock, the charge-transfer excitation increased. So even when you don't see charge transfer, you have to be careful, because it might be there in a hidden way.

Okay, this brings me to dynamics. We have a little experience in dynamics. It's not my specialty; mine is the electronic structure. But the way to handle larger systems is usually to use some mixed quantum-classical dynamics. In these methods, the electrons are treated quantum mechanically in the field of the moving nuclei, and the nuclei are treated classically. There are essentially two families of methods for doing this. One is Ehrenfest dynamics, where we note, via the Ehrenfest theorem, that Newton's laws hold for expectation values. The advantage of Ehrenfest dynamics is that it is simple; the disadvantage is that the nuclei are now moving on an average potential energy surface. So it's not a dynamics which is any longer associated with a well-defined electronic state, and it leads to physically incorrect results in some instances because it violates microscopic reversibility. The other way to do it is called the surface hopping method. Here, the nuclei move on the potential energy surfaces for well-defined electronic states. You regain microscopic reversibility. You can even, in principle, obtain relative yields for competing products in a chemical reaction, but it's also much more difficult to implement. The idea is somehow simple: you have a time-dependent Schrödinger equation for the electrons. At any given time, you have a particular molecular geometry. You find the stationary states, you expand your time-dependent wave function in the stationary states, and then from the squares of the coefficients projected onto each of those stationary states, you get the probabilities of hopping to another surface. In practice, you can't really do that directly because there is too much hopping. Tully's genius was to figure out a way, using a Monte Carlo method, to minimize the number of hops and make this feasible, and this is a method that's very commonly used. It includes some important details regarding how you adjust, for example, the nuclear kinetic energies when you hop, but there's another thing which is not a detail, and that is that, because of the probabilistic aspect,
you should not try to read too much information into a single trajectory. You have to do ensemble averages over large swarms of trajectories before you've got something physical, and I think we'll see an example of that. Another thing, which I was a bit slow to realize, I didn't realize it fully until last summer, was the importance of adding a decoherence correction. Now, what that means is that we have a quantum part to our calculation and we have a classical part to our calculation, and the switching from the quantum part to the classical part is a little bit like what you have in the Copenhagen interpretation of quantum mechanics, where you're making a measurement with the classical apparatus of your quantum system. You should find that you are moving on the right potential energy surface after you make that quantum measurement; otherwise you're over-coherent. Ehrenfest's method is by definition over-coherent, because the system is moving on an average potential energy surface, and that by definition can't be right except possibly in the crossing region. Tully's method a priori doesn't seem like it would be over-coherent, but Tully showed early on that it also has aspects of over-coherence, not as much as the Ehrenfest method, but it's still over-coherent, and that's because after you go through a crossing region, you have components of your electronic wave function corresponding to the excited state and the ground state which shouldn't be there when you're far from the crossing region. And so you need to correct that by introducing decoherence corrections, and if you do that in the Ehrenfest method, you get something that looks more like Tully's method; if you do that in Tully's method, you get something that starts to look a little bit more like the Ehrenfest method.

Okay, let me show you a couple of examples. The first example is, to my knowledge, the second application of the method with time-dependent DFT, to the photochemistry of oxirane. We see the Gomer-Noyes mechanism. We also saw other products that correspond to experimentally observed products, which was very satisfying, but we could only run 30 trajectories, so we couldn't get the relative quantum yields. Now, I want to go to a case which is not biological, but which has some resemblance to biology, because we're dealing with a large, complicated system. And this has to do with photovoltaics, organic solar cells. We had a student come from the United States to do an internship with us, and she brought with her this wonderful analogy. She said, silicon solar cells are like your grandma's crystal: it's really nice, but it's expensive, and it's breakable. You also want to have some Tupperware in your kitchen, which is cheap and versatile, and that's like organic solar cells. So this is a niche application which is developing quite rapidly. And we wanted to look at one aspect, one tiny aspect, of a particularly simple organic solar cell. It's not the most efficient one. This diagram is showing you how it works. You need an electron donor, in this case pentacene, and an electron acceptor, in this case C60, a buckyball. You need a transparent electrode, and you need another electrode on the bottom. The photons come in through the transparent electrode, and they go to the donor-acceptor boundary region where something happens: there's an excitation, there's charge separation. And we wanted to know whether we could model the charge separation with the fewest-switches surface-hopping method.
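To make the fewest-switches idea concrete, here is a schematic sketch of a single hop decision for a two-state model; the amplitudes, coupling, and time step are placeholders, and sign conventions for the nonadiabatic coupling vary between implementations, so this should be read as an illustration rather than as the exact algorithm used in the work described here:

```python
import numpy as np

# One fewest-switches surface-hopping step for a two-state model (schematic).
# c[k]: time-dependent electronic amplitudes; 'active': surface the nuclei move on;
# v_dot_d: velocity-dotted nonadiabatic coupling between the two states.
rng = np.random.default_rng(0)

c = np.array([0.9 + 0.1j, 0.3 - 0.2j])
c /= np.linalg.norm(c)
active = 0
other = 1 - active
dt = 0.1          # time step (placeholder units)
v_dot_d = 0.05    # v . d_{01} (placeholder value)

# Tully-style hopping probability from 'active' to 'other' over one time step:
# g = (2 dt / |c_active|^2) * Re(c_other * conj(c_active)) * (v . d)
g = 2.0 * dt * np.real(c[other] * np.conj(c[active])) * v_dot_d / np.abs(c[active]) ** 2
g = max(g, 0.0)   # negative values mean "no hop"

if rng.random() < g:
    active = other            # hop; in a real code the nuclear velocities would be
                              # rescaled here to conserve total energy
print(f"hop probability {g:.4f}, active surface now {active}")
```

Running many such trajectories, each with its own random numbers, is what produces the ensemble averages mentioned above.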
There are lots and lots of studies of this particular system, both experimental, because you can build it, and also theoretical, but there are very, very few studies with the fewest-switches surface-hopping method. So we decided that we would try to study this using the simplest model: just one molecule of C60 and just one molecule of pentacene. And we needed to define observables. So our observables here are the total charge on the pentacene and the total charge on the fullerene, evaluated from particle and hole charges on each molecule. This corresponds physically to how charges are transferred. But if you have a situation, as represented in this picture, where a hole and a particle are transferred at the same time, then there is no net charge transfer. There is charge transfer in Kasha's sense, but there's no net charge transfer. That's not something that we're going to see with the way we set up the calculation. There were some other things we did that were particular to the organic-electronics model. But we set it up and we ran swarms of about 100 trajectories so we could get ensemble averages. And we did it using time-dependent density-functional tight-binding theory with a long-range correction, so that we could describe charge transfer. That involves a parameter, the R_LC parameter, that you will see, and we wanted to understand what was going on in the calculation as we varied it. So first I want to show you the importance of the decoherence correction. You can calculate the charge transfer and the energy transfer either, as you have it here, from the time-dependent electronic wave function, or from the adiabatic wave functions that correspond to each particular potential energy surface, depending upon which is the active potential energy surface. Without the decoherence correction, you get two very different results. With the decoherence correction, you get essentially the same result. So that's just one illustration of why it's very important to include a decoherence correction in these calculations. I'm not going to go through this transparency; it is just showing you that for different values of the range-separation parameter, we get different results for the charge transfer and the energy transfer. This summarizes things much better. Here we're seeing the energy-transfer and charge-transfer times as a function of the range-separation parameter. If we compare with experimental estimates of the charge-transfer time, taken from different systems, and also theoretical estimates, then what's physically reasonable is when the range-separation parameter is greater than about 10 bohr. We can't do high-quality ab initio calculations for this system for a lot of geometries; we can't do fewest-switches surface hopping with high-quality ab initio methods for this system, but we can do it for one geometry. For that geometry, we can make a comparison between what we see with this semiempirical method and the high-quality ab initio calculations. We can see that we get the right ordering of states, but only when the range-separation parameter is larger than about 10 bohr. It's interesting to think about what that means. Ten bohr is about the distance between the two molecules, which makes some sort of sense, but on the other hand, if we were doing time-dependent DFT instead of time-dependent density-functional tight binding, that is, TD-DFTB, we would expect a much smaller range-separation parameter.
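Since the range-separation parameter keeps appearing, it may help to recall the generic way long-range-corrected functionals split the electron repulsion; the erf/erfc form below is the most common choice (LC-DFTB implementations sometimes use an exponential switching function instead), and a length-like parameter such as R_LC can be thought of, roughly, as the inverse of the omega used here:

```latex
% Generic range separation of the Coulomb interaction in long-range-corrected functionals
\frac{1}{r_{12}}
  \;=\;
  \underbrace{\frac{1 - \operatorname{erf}(\omega\, r_{12})}{r_{12}}}_{\text{short range: (semi)local DFT exchange}}
  \;+\;
  \underbrace{\frac{\operatorname{erf}(\omega\, r_{12})}{r_{12}}}_{\text{long range: exact (Hartree--Fock-like) exchange}}
```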
This is indicating that there's still some difference between the two methods. That brings me to the end of my talk. I hope that I'm not running over. I've been very superficial in some ways; I'm just giving you a glimpse of the tip of the iceberg. I've shown you that we can look at factors that influence luminescence lifetimes. I've shown you the importance of describing charge transfer correctly for biological systems, and that there are ways to describe it more correctly, and also the importance of defining charge transfer correctly if you're looking at an exciton model. I've shown you that we can do surface-hopping dynamics, and that we can start to do this for systems that begin to have very large numbers of atoms, not as large as the proteins that I've seen up to this point, but still large. We will probably go to systems where we have five pentacene molecules and five C60 molecules, but you could imagine taking this to a biological system, looking at the important part of the molecule with this semiclassical dynamics, and then having outside of that some sort of molecular dynamics model to describe the rest of the biological system. In fact, the method that I've seen used the most for biological systems is the Ehrenfest method with time-dependent DFT; that's been used to study the green fluorescent protein and also to study DNA damage. This brings me to the end of my talk. I want to thank you, and I want to thank the organizers. I'm glad that we can have this workshop. There is one small thing that I do miss from the ICTP, and you can see that in the picture. Thank you.

Thank you very much, Mark, for your excellent talk, and I certainly miss the coffees at the ICTP too in these circumstances. Okay, very good. So we have time for some questions. Okay, we have Uriel. You can raise your hand, and you can ask your question yourself.

Okay, thank you very much for your talk. It was super interesting for me. I have in fact many questions, but one of them, which is a general question, is this: all of the applications that you mentioned are about TDDFT in the linear regime, in the sense of linear response, single excitations. There are now many experiments with very, very interesting nonlinear studies. What do you think about the future of linear-response TDDFT in trying to tackle these kinds of experiments?

Okay, that would probably depend on the precise experiment. Time-dependent DFT can be used in the nonlinear regime. You don't have to use linear response theory; you can use higher-order response theory, so second order or third order. One of the places where that's important in biology is in spectroscopy, that is, two-photon spectroscopy. I was surprised, when I was looking for information about two-photon fluorescence spectroscopy, to discover that it's one of the most used spectroscopies in biology, because you can use low-energy photons that penetrate into biological samples in a non-destructive way, and there are lots of things in biological samples that will then fluoresce. That has the additional advantage that, because the light is coming directly from the molecules in the biological sample, you get much increased resolution. That is something that has been worked out. I'm sure there are many other examples that we could use at higher energy, but that's the one that came to mind first as probably the most important, already implemented application. Thank you very much.
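For context on what "second or third order" means in this answer, the induced dipole can be expanded in powers of the applied field; this is the standard response hierarchy (schematic, not from the slides), with two-photon absorption obtained from the quadratic, second-order response:

```latex
\mu_i \;=\; \mu_i^{(0)}
  \;+\; \alpha_{ij}\, E_j
  \;+\; \tfrac{1}{2}\, \beta_{ijk}\, E_j E_k
  \;+\; \tfrac{1}{6}\, \gamma_{ijkl}\, E_j E_k E_l \;+\; \dots
```

Here alpha is the (one-photon) polarizability whose poles were used earlier in the talk, and beta and gamma are the first and second hyperpolarizabilities obtained from quadratic and cubic response.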
You're welcome. Okay, we have another question. Yavar, please go ahead. Yeah, now we can hear you.

Hi everyone, thanks for this nice presentation. Professor Casida, I have a couple of questions. The first one is quite general. When we work with these biological molecules, mainly zwitterions or anions, we have a large HOMO-LUMO gap, and also we have positive eigenvalues. It seems these LUMO states are unbound, and this leads to lots of spurious excitations, and changing the exchange-correlation functional, for example, changes and affects our results. There are some suggestions in the literature to tackle this problem, but I'd like to hear from you: what's the best benchmark to check this problem?

The best thing to do right now with existing software is to use the range-separated hybrids. They will correct the asymptotic behavior, which will correct your eigenvalues. They're not totally a black box; there are things you have to adjust, but they're designed to fix that problem.

Another problem is that even when we use these range-separated exchange-correlation functionals, the excitation has to be described by a lot of unoccupied levels, and that makes it difficult to visualize, to show the shape of the excited states, and sometimes we have big diffuse orbitals in the visualizations. What's your idea about this?

Okay, I'm not sure that there is a perfect solution to that. The one that we often use is to look at natural transition orbitals. This is a rotation of your orbitals to simplify the analysis as much as possible. Often that works. As you go to larger and larger systems, you probably need to use special techniques, which I am not as up on as I would like to be; that's on my reading list. There are special techniques that represent all the excitations as a sort of matrix, which is then color-coded to show you where the major excitations are occurring in space, or in your molecule. That's very helpful. Thank you.

Ali, can I ask one question? There is a question in the chat, so let me ask that. The question is the following: is the Hartree-Fock potential taken at long distance because it has something to do with the overestimation of charge transfer, or some related issue? The Hartree-Fock exchange is taken at long distance because, if you think about Koopmans' theorem, Koopmans' theorem tells you that the Hartree-Fock orbitals, the Hartree-Fock method, are set up to describe ionization and electron attachment. That's charge transfer, and that turns out to be a long-distance effect. Hartree-Fock is somehow correct at long distance, but when you have things that are locally excited and you don't have charge transfer, then you don't want to describe things in terms of ionization and electron attachment. You want to describe things as if the underlying potential hasn't changed very much, because the electron density doesn't change very much, and there DFT works very well. I've given you a very simplistic, intuitive explanation, I think, of why you want to use Hartree-Fock exchange for charge transfer at long distance, but you still want to keep density functional theory at short distance, with its dynamical correlation and its a priori better description of localized excited states. Thank you.

Are there any other questions from the audience? There's a question on slide 42. Yes. The question is, what is the range of the order of time on slide 42? The order of time? Yeah, I guess, how long?
There's no time on this transparency. I think we're talking then about phosphorescence lifetimes; these go from nanoseconds to microseconds. Okay. Thank you.

There's another question from Uriel. Go ahead, Uriel. Thanks. I have a technical question. When you mentioned the decoherence correction, when it is used with surface hopping, does it help in the preservation of detailed balance, or does it have nothing to do with that? Does it help in the preservation of what? I didn't catch the word. Detailed balance. Detailed balance. It doesn't have to do with detailed balance. It's necessary in order to avoid artifactual oscillations between states. It's necessary in order to recover Marcus theory in the limit where Marcus theory should be valid. It's necessary so that the different ways of calculating expectation values are consistent with each other. I think most people who are really deep experts in this area understand this, but that's probably only a tiny, tiny fraction of the people who use these methods. I think it's important to get the word out that it is important. Yeah. Okay. Thanks. You're welcome.

Okay. Well, thank you for the questions, and thank you for your talk, Mark. We'll basically close the morning session and we'll be back at 2 p.m. in the afternoon. So thanks again, Mark, and thank you all, and the first speaker, Max, for today. Bye-bye. Thank you. Thanks a lot. See you later. Bye-bye.

Mark, actually, are you there? Yeah, I guess. No, so I had a question for you regarding when you're doing excited-state simulations, like the surface hopping, in the context of biology. One of the things that happens, of course, is that when you excite a molecule and it begins to relax to its potential energy minimum, there's also going to be a lot of vibrational cooling, so the system is going to heat up. Right. And so when you're modeling small clusters, there's not a heat bath for this energy to go into. And this sort of relates to your last point in your conclusions, which is that if you couple that with a molecular mechanics force field, then you'll be able to dissipate that heat into the surrounding environment. So I'm just wondering, what are the artifacts that one has by not allowing for sufficient degrees of freedom in the system for that vibrational cooling to happen?

That's complicated by the timescales that we're dealing with. Most of the interesting photochemical phenomena that we're studying are happening on the order of sub-picosecond timescales. Okay. Okay, so if it doesn't happen in 100 or 200 femtoseconds, then probably something else is going to happen to quench the photochemistry. That's also about the limit of the calculations that we can do right now. Okay. Now, that said, the coupling with the vibrational motions, with the nuclear motions, is still very important, and I think the coupling with the bath is going to be important. It's going to be important in determining somehow the decoherence rate, too. Yeah. As far as the literature goes, we don't know. Yeah, okay. Okay. I see. We'd like to have the bath there; we know the bath should be in. We know it's going to affect the decoherence rate, but I don't think we know very much, and we certainly don't have exact calculations against which to compare. Yeah, sure. Makes sense. Okay. Thanks. Thank you. Okay. So I guess I'll say goodbye. Thanks a lot, Mark. Okay. Thank you. Goodbye. Thanks, Max, if you're still there. Yeah, yeah, I'm there. Okay. Thanks. Bye. Ciao. Ciao. Okay, perfect.
Do you also admit people, or is someone else doing it? I think Sabrina is doing it, right? Sabrina, are you admitting? Yeah, I can do it. It's two o'clock, but maybe we can wait a couple more minutes; people are coming in right now. So I think that most of the people are here. We have 60 participants and some are still coming, so maybe I will start saying a few words.

So good afternoon to everyone and welcome back. My name is Giovanni Bussi and I am one of the co-organizers of this event, although actually, as you probably guessed, Ali and Angelo are the ones who really did most of the work. Still, I have the pleasure to be the chair for this session, and it's really a pleasure to introduce the first speaker, Michele Ceriotti. I have known Michele for a very long time, because when I was a postdoc in the group of Michele Parrinello, so big Michele, small Michele, Michele Ceriotti joined for his PhD, and so we overlapped for a few years. After that, Michele went for a postdoc in Oxford in the group of Manolopoulos and then finally landed with an independent position at EPFL, where he is now a professor. So Michele will tell us about equivariant representations for atomistic machine learning.

Thank you very much, Giovanni, as well as all the organizers for having me here, and all of you for convening here virtually. Since I have 30 minutes to give you an overview, an introduction to machine learning, I thought that rather than giving a big blah-blah introduction to everything about machine learning, I would focus on one specific ingredient, and this ingredient is the representation. So whatever you want to do with your machine learning model, be it inference, so predicting a potential or properties or whatever, be it classification, or dimensionality reduction to be used in accelerated sampling, in all of these cases the first and, in my opinion, most important step is that of mapping the Cartesian coordinates of your atoms onto some mathematical object, which can be a vector, a distance, or a kernel. And the way you perform this mapping plays a crucial role in determining the efficiency of your machine learning scheme. So I thought that I would focus in particular on this aspect of machine learning. You need to find a way of mapping a structure to a representation, and actually the point is that there is a huge number of different ways of doing it. You start from the Cartesian coordinates of your atoms and then, as this tree sort of summarizes, by applying different transformations you can get to a very large number of different families of representations. You get SOAP, you get Behler-Parrinello symmetry functions, permutation-invariant polynomials, SPRINT coordinates. There is a huge zoology, but actually the reason why these are represented in this tree is that they are very strongly connected with each other, and they are strongly connected with each other because ultimately the requirements that you have for these representations, for them to be effective and to work well, are pretty much always the same. So what do you require for a representation to be an effective way of mapping a configuration of your system onto a mathematical object that you can use for machine learning? First of all, you have several requirements on how the structures are mapped into this feature space, which is basically the mathematical space in which the machine learning algorithms will operate. Some very fundamental requirements are, first of all, that your mapping reflects physical symmetries.
This means that if you just change the reference system by translating or rotating your structure, or permuting the order of the atoms, the mapping should always bring you to exactly the same place. Otherwise you will have structures that should have exactly the same properties but have different representations, and your machine learning algorithm will have to learn that these different representations correspond to exactly the same properties. Second, you want this mapping at the same time to be complete. This means that if you have two structures that are not equivalent, that is, not related to each other by one of these transformations, they should be mapped to different points. Third, since nature dislikes discontinuities, at least at the microscopic scale, the mapping should be smooth, meaning that if you smoothly transform one structure into another, the feature-space mapping should also be continuous. A fourth requirement is that the way you build these representations incorporates some ideas of additivity and locality, meaning that if you want to represent a large structure, this should, in a certain sense that we will try to discuss, be built as the sum of representations that correspond to chunks of your system. Now, why is this useful and important? Because most of the physically inspired models of the properties of molecules and materials are based on a, usually atom-centered, additive decomposition: you postulate that the property that you want to learn, let's say the potential, can be written as a sum of terms that are attached one way or another to each atom in your system. The point is that for many different families of models, the assumption that the target of your regression scheme can be broken down into atom-centered contributions implies that the way you represent your system, for instance the feature vector for your structure, should also be just the sum of feature vectors that describe atom-centered contributions. If you are using a kernel framework, the kernel that you use to compare two structures should just be the sum of kernels that compare individual atomic environments. So why is this important and why is this useful? Because, and this is the same idea that underlies so many linear-scaling electronic structure methods, if you have this additivity, the possibility of decomposing your system into atom-centered environments, then you will be able to train your model on relatively simple systems and then break the pieces apart, put them back together in a different form, and make predictions, understand something, about systems that are more complex and potentially more interesting than the very simple cases that you used for the training phase. Now, to try to formalize all of these concepts, I have been working together with people in my group and also collaborators outside my group on a formalism to describe atomistic representations, and we have actually gone to the length of preparing a LaTeX package to help you typeset these if you want to use them in your documents. The idea is that the way you represent a structure should have the same degree of abstractness as what you use when you do quantum mechanics to describe a wave function: a vector in Hilbert space that exists and is well defined irrespective of the basis set that you use to represent it.
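A compact way of writing the additivity assumption described above, in a ket-style notation close to the one introduced next (the symbols are schematic, not necessarily those of the actual papers):

```latex
y(A) \;=\; \sum_{i \in A} \varepsilon(A_i),
\qquad
|A\rangle \;=\; \sum_{i \in A} |A_i\rangle,
\qquad
k(A, A') \;=\; \langle A | A' \rangle \;=\; \sum_{i \in A} \sum_{j \in A'} k(A_i, A'_j)
```

Here A_i denotes the atom-centered environment of atom i in structure A, so the property, the feature vector, and the kernel are all sums of atom-centered contributions.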
So basically this is a notation that is meant to separate the way you represent your structure, which is encoded in the ket, from the discretization that you use to actually and practically compute these vectors, which is associated with the indices in the bra. And then you will see that we use this notation with a lot of flexibility: you can use specific kinds of indices in the bra because you want to express how these indices relate, for instance, to your basis-set expansion, but you can also just clamp all of them into a single compound index, much like you would flatten an array in Fortran or Python. And the point is that this Dirac notation is very useful for the same reasons why Dirac notation is useful in quantum mechanics, because it makes for a very natural notation when you want to describe linear operations. So for instance, if you want to change the basis that you use to represent your molecule, this is just written as a projection of your representation written on a basis x onto a basis y, and the matrix elements between y and x give you the transformation from one basis to the other. If you want to build a kernel, a kernel is just a scalar product in its reproducing-kernel Hilbert space, and this is written in a very natural way as an integral, over the discretization, of the product of the features of the two structures that are being compared by the kernel. If you build a linear regression model, once again this is very naturally written with this Dirac notation, because you can write it as the integral, over the discretization, of the linear regression weights times the features that represent the structure A. So let me now try to use this notation to show you how you can go from a representation, the Cartesian coordinates, that doesn't fulfill most of the requirements that we ask of an effective representation, to something that actually does. The problem with the Cartesian coordinates is that if you rigidly translate your system, or you swap the labeling of two atoms, you get a different vector of Cartesian coordinates, and then your machine learning model would have to learn that these are actually just the same structure. So the first thing that we do is represent a structure as a density field. Rather than having all of your atoms tagged by their individual coordinates, we represent the entire structure as a function in three-dimensional space that is built by superimposing Gaussians centered on each atom. And to incorporate information on the chemistry, we effectively just use a different channel for each element, for each chemical species. Now, you see quite clearly that this doesn't depend on the order in which you list the atoms, so it is already permutation invariant. But it's not translationally invariant: if you rigidly translate your structure, this density will translate rigidly with it. So what you can do is symmetrize this density by integrating over the translation group. And you can show that, in order not to lose all of the information that you have, what you should do is integrate tensor products of your density. So effectively you are already, at this level, computing a two-point correlation function symmetrized over translations. And then what comes out as a very natural side effect of this symmetrization is that your symmetrized representation can be written as a sum over representations that are centered on individual atoms.
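Schematically, the construction just described looks like this; the notation is simplified relative to the actual papers, with g a Gaussian and a labelling the chemical species:

```latex
% Decorated atom density: one Gaussian per atom, one channel per species a
\langle a\, \mathbf{r} | \rho \rangle \;=\; \sum_{i \in a} g(\mathbf{r} - \mathbf{r}_i)

% Averaging a tensor product of two such densities over rigid translations \hat{t}
% yields atom-centred two-point correlations:
\int \! d\hat{t}\; \bigl(\hat{t}\,|\rho\rangle\bigr) \otimes \bigl(\hat{t}\,|\rho\rangle\bigr)
  \;\longrightarrow\;
  \sum_i |\rho_i\rangle,
\qquad
\langle a\, \mathbf{r} | \rho_i \rangle \;=\; \sum_{j \in a} g(\mathbf{r} - \mathbf{r}_{ij})

% Anticipating the rotational average discussed next: expanding \rho_i in radial
% functions and spherical harmonics, \langle a\,nlm|\rho_i\rangle = c^{(i)}_{anlm},
% the rotationally symmetrized two-point correlation is the SOAP power spectrum
p^{(i)}_{an,\,a'n',\,l} \;=\; \sum_{m=-l}^{l} c^{(i)}_{anlm}\, \bigl(c^{(i)}_{a'n'lm}\bigr)^{*}
```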
So you see that just the process of incorporating translational symmetry gives, as a side effect, an atom-centered representation, which is what we need in order to exploit the additivity of the model. So effectively we are now at the level where our structure is described by an atom-centered density built of Gaussians that depend on the distances between pairs of atoms. Now, this is not rotationally invariant, so what you need to do then is one further symmetrization, and if you symmetrize this density over rotations you get something which is effectively equivalent to a pair correlation function. And indeed, by taking tensor products of the atom-centered density before you do the symmetrization, you can build a whole hierarchy of n-point correlation functions that are pretty much the same as those you would use in the statistical description of liquids, which is something that I really like as an isomorphism, as an analogy. And another side effect, which is very pleasing, is that if you build a linear model on top of these n-point correlation functions, what you get are essentially n-body potentials, n-body models of the atom-centered energy, of the atom-centered property that you're trying to predict. So what you can show, and I don't have time to do this in a very thorough manner, is that pretty much all of the representations that we had at the leaves of that phylogenetic tree can be re-derived as appropriate projections of these objects. For example, you can take the translation-symmetrized density and express it on a plane-wave basis, and what you obtain is a scattering representation of the crystal structure that has been used by Ziletti et al. as a fingerprint of atomic structures. They didn't need rotational invariance, and so this was good enough, but basically that's just the same thing, the two-point correlation function, just expressed on a different basis. You can symmetrically project this density on a set of pair or three-body functions, and what you obtain are the atom-centered symmetry functions à la Behler-Parrinello, which are actually also behind the DeePMD scheme. And actually my favorite version of these arises when you realize that you can expand the atom-centered density on a basis of radial functions and spherical harmonics, because what is very nice is that spherical harmonics are a very natural object to deal with when you want to incorporate rotational symmetries, and indeed you can show that the two-point density written in these bases is nothing but the SOAP power spectrum that has been used so successfully by Gábor Csányi and collaborators over the past decade, essentially. And you see here, if you're familiar with the framework, how the notation that is usually used in SOAP papers maps onto this Dirac notation. And then this Dirac notation becomes useful when you want to expand and extend these to higher body orders, because you can then see that all that you need to do, which is not necessarily trivial, is essentially to write this integral over rotations and exploit the properties of Wigner D-matrices to turn this integral, which is a continuous sum, into a discrete summation over the nlm expansion coefficients. And a little aside that I want to point your attention to is that even though this construction gives you a representation that fulfills the requirements of smoothness, symmetry invariance, and additivity, it doesn't necessarily grant you completeness. This is very easy to show, and it has been known for quite a long time.
If you just consider two-body correlation functions, it's very easy to build examples of structures that have essentially the same g(r) but are different. And we have been able to show, thanks mostly to an extremely smart student, actually an undergraduate when he started contributing to this, that you can also build counterexamples showing that SOAP and also the four-point equivalent of SOAP are not complete. So you can build pairs of structures that are not equivalent but have the same three- and four-body representations. So this is still an open problem; how relevant this is for actual models is another open problem, but it's kind of interesting. So, just to say that the problem of representing structures efficiently is not yet fully solved. Now, all that I discussed so far dealt with the problem of predicting properties that are invariant to symmetry operations. But if you want to describe something like a vector, or more generally a tensor, you don't want your model to be invariant to rotations. If you want to describe a vector, well, actually, general tensors can be expanded in a way that corresponds to spherical harmonics again, and so if you want to learn a component that transforms like a spherical harmonic of order lambda, what you really need is a set of features that transform under rotations like a spherical harmonic of order lambda, because then, when you build a linear model, this linear model will naturally have the rotation properties that you need to predict a tensor. And actually you can do this in the same framework of these symmetrized density correlations by incorporating what is essentially a Y_lm, a spherical harmonic, in the integral that you need to do. You see here, formally, we are doing a correlation of two densities, two atom-centered densities, multiplied by a spherical harmonic, and that provides the basis to expand a property that transforms like a spherical harmonic. And once again, this is done much more naturally, much more easily from a computational standpoint, if you expand the density on a basis of spherical harmonics, because then the object that you want to compute can be evaluated in a discrete way. And if you look at this, those of you who are familiar with the summation rules of angular momentum will recognize that what I'm doing is basically taking two angular momenta, combining them with the Clebsch-Gordan coefficients, and getting an object that still transforms like an angular momentum. So something that we realized recently, again thanks to a super smart master's student who is now a PhD student in my group, is that you can build this up: you can build a whole hierarchy of equivariants that can be obtained by coupling spherical equivariants, and you can basically build them recursively using the summation rules for angular momentum. And this is also a very efficient way of computing invariants. So even if you're interested in just learning scalars, and you say, oh, why are you talking about all of this covariant stuff, I don't need it: this is actually also a very efficient and elegant way of computing invariant features. The problem that you encounter if you start doing this is that the number of features that you need to evaluate explodes exponentially with the body order that you want to consider.
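As a concrete illustration of the Clebsch-Gordan coupling step just described, and of why the number of features grows so quickly with body order, here is a small sketch; the random coefficients stand in for real atom-density expansion coefficients, and real implementations work with real spherical harmonics and more careful bookkeeping:

```python
import numpy as np
from sympy.physics.quantum.cg import CG

# Couple two sets of spherical-harmonic expansion coefficients (e.g. atom-density
# coefficients c_{lm}) into new features that still transform like spherical
# harmonics of order 'lam'. Placeholder random coefficients, complex-SH convention.
rng = np.random.default_rng(0)
lmax = 2
c1 = {l: rng.normal(size=2 * l + 1) for l in range(lmax + 1)}   # index m = -l..l
c2 = {l: rng.normal(size=2 * l + 1) for l in range(lmax + 1)}

def couple(c1, c2, l1, l2, lam):
    """Combine c1^{l1} and c2^{l2} into a feature of angular order lam."""
    out = np.zeros(2 * lam + 1)
    for m1 in range(-l1, l1 + 1):
        for m2 in range(-l2, l2 + 1):
            mu = m1 + m2
            if abs(mu) > lam:
                continue
            cg = float(CG(l1, m1, l2, m2, lam, mu).doit())  # <l1 m1; l2 m2 | lam mu>
            out[mu + lam] += cg * c1[l1][m1 + l1] * c2[l2][m2 + l2]
    return out

# Every allowed (l1, l2, lam) triple gives a new feature vector, and iterating this
# to higher body order multiplies the number of such combinations at every step.
features = {(l1, l2, lam): couple(c1, c2, l1, l2, lam)
            for l1 in range(lmax + 1)
            for l2 in range(lmax + 1)
            for lam in range(abs(l1 - l2), min(l1 + l2, lmax) + 1)}
print(len(features), "coupled feature vectors; lam = 0 terms are rotationally invariant")
```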
And so what we are proposing is that you can come up with an iterative contraction scheme, which we call NICE, that basically allows you, step by step, every time you increase the body order, to sub-select or contract your features so that you keep the complexity of your model, both computational and in terms of number of parameters, under control. And you can show that this allows you to build linear models that compete in accuracy with, and actually are better than, the fanciest deep neural network that you can come up with. So if you have very, very good features, they can actually perform better with a linear model than when you use a complex nonlinear model, because you have essentially much less risk of overfitting. Now, up to this point I have worked with an expansion of an atomic density, and locality has been a fundamental ingredient of all of this. In the end you want to predict the properties of a large system, and you are expanding it in atom-centered contributions. So a very important question that you should be asking yourself is: how local is local? A naive thing that you can do is to treat the cutoff radius as a convergence parameter. Unfortunately, this doesn't really work as well as you would expect. What you can see here is that I'm trying to build models of the cohesive energy of some small organic molecules, building models using different cutoff radii. And what you can see is that if you have a small training set, a short cutoff radius works very well. But then the accuracy of this model saturates as I increase my training set size, because you can't describe anything that relates to interactions that extend to more than about two ångström. So you can increase the cutoff distance, and this will make your model more expensive, because you need to sum over more neighbors when you build it. And yes, your model will be more accurate if you have a large training set, but it's not necessarily the most efficient model: if you have a small training set, or if you want to extrapolate, this might not be the best model that you can use. So the idea of making a regression model better just by increasing the cutoff radius is not only an issue of computational complexity; it's also not necessarily the best thing that you can do in order to get the best accuracy across the training-set sizes that you could have. So one possibility is to combine multiple models, as we are doing here, and the weights are effectively hyperparameters that we tune to reflect the importance of different length scales in determining the properties. And if you do this, at the price of computing multiple models, so this is more computationally expensive, you can beat at least the model-complexity issue, and you can get something that, for these small organic molecules, allows you to predict the cohesive energy with an accuracy of 0.14 kilocalories per mole, which is ridiculous because this data set is based on B3LYP, and so the reference energetics is no better than 1 kilocalorie per mole; so we are predicting very well a reference energy that is itself not very good. The problem is that if you now want to go to a system that is instead dominated by long-range interactions, let's say electrostatics, you are in for a lot of trouble, because electrostatics has this very long-range behavior that can lead to very paradoxical effects.
So, for instance, this is a typical example from something like Kittel, an undergraduate textbook on solid-state physics. If you try to compute the cohesive energy of an ionic system just by summing over spheres of larger and larger size, or cubes of larger and larger size, the two series converge to different results. And this is a consequence of the fact that the series you obtain is not absolutely convergent, and so you get all of these paradoxical effects. And the point is that if you try to use a local machine learning model to predict something that depends strongly on electrostatics — let's say the binding curve of two charged residues — that's never gonna work. You can see that, of course, a short-range cutoff won't be able to predict the tail, and the tail is going to have a very substantial effect. I mean, these errors are huge — oh my god, this is more than an electron volt. And this is a log scale, so you really would need to increase your cutoff to 50 ångström before you can even approach convergence of these interactions. So what we propose to do here is basically to try and keep a local representation, but one that incorporates the long-range physics. So how can you do that? The idea that we are exploring is to start from this decorated atom density. Here you really see how useful this Dirac notation is. Basically, we just do a non-local transformation and we transform this into an atom-density potential, which is the electrostatic potential generated by this density. Of course, this is not a real potential, because this is not a real density — for instance, each atomic species will give rise to a different atom-density potential. But the important thing is that once you symmetrize this potential, you end up with a local description: this is the potential computed around just one center, but it has the proper asymptotic behavior. It can actually be computed very efficiently in reciprocal space; this is basically just solving the Poisson equation, which is something that has been studied to death. When you use this to predict something like the cohesive energy of charged molecules, you can see that even though we are only training on short-range information, since we capture the correct asymptotic behavior, we get very nice extrapolation up until the fully dissociated limit. More recently, we have been playing around with a multi-scale family of features that basically combines local and non-local features in a fully equivariant, symmetrized way. What you can see is that, first of all, you have an extremely elegant mapping onto — sorry, somebody said something. I mean, I'm almost done. So you get a very nice mapping onto multipole electrostatics, but you can actually use these to learn all sorts of long-range physics. This also works well, for instance, to learn the dispersion interactions between apolar residues. I'm very excited about this. You can also use these to learn the interaction of a molecule with a metal surface, which is, you know, the most polarizable system that you can think of, and still these objects give you a functional form that is flexible enough to learn this kind of physics. So with this I just want to wrap up, so we have time for questions. So, overall, the message that I want to give is that representations are perhaps the most important step in the construction of a machine learning model, so you should not choose them randomly, or because of, you know, personal history; you should think very carefully about what it implies to choose one representation over another.
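To give a flavor of the reciprocal-space construction mentioned above, here is a minimal numpy sketch (a toy version under my own assumptions — Gaussian units, arbitrary normalization, a single cubic box — not the actual implementation): Gaussian-smeared charges are placed on a periodic grid and the Poisson equation is solved by FFT, which yields a potential field with the correct long-range asymptotics.

    import numpy as np

    def atom_density_potential(positions, charges, box, n=64, sigma=1.0):
        """Gaussian-smeared atom density in a cubic periodic box, with the Poisson
        equation solved in reciprocal space. Returns the potential on an n**3 grid."""
        k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box / n)
        kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
        k2 = kx**2 + ky**2 + kz**2
        # structure factor of the charges, smeared by a Gaussian of width sigma
        rho_k = np.zeros_like(k2, dtype=complex)
        for q, r in zip(charges, positions):
            rho_k += q * np.exp(-1j * (kx * r[0] + ky * r[1] + kz * r[2]))
        rho_k *= np.exp(-0.5 * sigma**2 * k2)
        # Poisson equation: V(k) = 4*pi*rho(k)/k^2; the k=0 (average) component is dropped
        with np.errstate(divide="ignore", invalid="ignore"):
            V_k = np.where(k2 > 0, 4.0 * np.pi * rho_k / k2, 0.0)
        return np.real(np.fft.ifftn(V_k)) * n**3 / box**3

    # toy usage: a +1/-1 pair of Gaussian charges in a cubic box of side 20
    V = atom_density_potential(np.array([[5.0, 10.0, 10.0], [15.0, 10.0, 10.0]]),
                               charges=[1.0, -1.0], box=20.0)

Symmetrizing such a potential field around each atomic center, in the same way as the atom density itself, is what gives a local descriptor that nonetheless carries the 1/r asymptotics.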
Luckily for you, most of the representations are actually different ways of looking at the same object, and so in the end I think it's not by chance that many competing approaches end up giving relatively comparable performances when applied to the same kind of system. I think that one of the most interesting open challenges lies in understanding completeness, and also in how to incorporate long-range behavior, because most of the features that are currently being used are, deep down, local descriptions that are built upon some kind of divide-and-conquer mindset. Finally, yeah, I just want to leave you with a few — well, I certainly want to thank all the people in my group. I highlighted the people who contributed most to these efforts, but everyone has contributed one way or another to understanding and applying these to systems that pushed us to develop all of this machinery. And so thank you for your attention, and I hope that you have questions. Thank you. Thank you, Michele, for the nice talk. So now we have time for questions, and as usual I invite people that have questions to either write them in the chat or to just write in the chat that they want to ask it live. Okay, so we have a first question from Srabani: is it appropriate to talk about irreducible representations? So there are — you can talk about this in two ways. Okay, let me go back a second. Okay, so there are two ways in which you can talk about irreducible representations. So if you look at this object, okay, the rotational symmetries enter both in the ket and in the bra, and they mean two different things. In the ket it has to do with the representation of the target property: if you want to describe a target property that has a certain symmetry, then you need to choose a family of features that has that symmetry, and in this sense it's very useful not to use Cartesian tensors but to map them — and this is something that is very well known — onto irreducible spherical tensor representations. And then you can ask yourself what is the most compact possible way to represent the features with a certain set of rotational properties, and this has to do with the indices on the left side, and this is another very complicated thing. So seeing these as elements of angular momentum is super useful, because you can actually use angular momentum recoupling theory to derive which are the linearly independent features. So in terms of the Clebsch-Gordan coefficients? In terms of the Clebsch-Gordan coefficients, actually — so it's a little bit complicated. Basically, every time you combine a new set of spherical coefficients — this might be getting a little bit too technical — but basically, it's like when you sum angular momenta: the order in which you sum them matters, and you have indices that keep track of this. And all of this is very well understood in angular momentum theory, and since you can map these features onto angular momentum theory, you can exploit all of this to figure out which are the linearly independent features of a certain body order. But it's complicated, so this paper actually has 50 pages of SI that try to go a little bit more in depth into this, and Jigyasa Nigam is really the person you want to speak to — she understands this much better than I do, I think. Other questions? Don't be shy, come on. Maybe I will ask one in the meanwhile. So when do you foresee that it will be possible to apply these techniques to small biomolecules in water?
So, personally, I think that one big hurdle towards doing this for something like solvated systems is going to be the long-range part. So far people have happily lived without long-range interactions, and I think that Bingqing is going to show later how you can actually get away with it and get very accurate results without long range. I think that most of these successes are connected with the fact of working with bulk systems. When you start having something like a surface, I think it will be very, very hard to get away without a proper description of long-range terms. Now, Tristan Bereau and collaborators — I mean, okay, let's go in order of history. So Nongnuch Artrith and Jörg Behler have done perhaps the most simple thing that you could think of, which is to learn point charges and then use an electrostatic model based on point charges. I think that the fact that this hasn't been used much afterwards, even though they did this I would say almost 10 years ago, says something about the fact that perhaps it doesn't necessarily work so well. I mean, when we tried to baseline with an electrostatic model, it didn't actually work so well. Then Tristan Bereau has done something which is more in the direction of doing a multipole model, where you learn the multipoles and build an electrostatic model with multipoles. I think that in general, on a sort of philosophical level, the issue is: what's the point of doing machine learning if in the end you go back to doing models that people have been doing for 40 years? And my answer to that is that I think it's very useful to have a machine learning framework that reduces to these limiting models in some sense. For instance, most of the stuff that we are playing around with right now reduces to time-tested functional forms when you build a linear regression model. But then you have more flexibility, because you are not trying to fit multipoles, which is always a pain, because if you learn multipoles of isolated molecules you have all sorts of double-counting issues. So if you use this functional form but in the end you train on the target that you care about, which is the cohesive energy and the forces, I think that's a very promising way to go. But I would say five years, perhaps less. Okay. I'm taking notes. Okay. Other questions or comments from the audience? There's a question from Lorenzo. So these representations depend not only on the geometry of the system but also on its topology, and therefore on the choice of the force field — what is the storage burden? What's MB and what is GB? Sorry. I don't know. Oh, megabytes. Okay. I thought GB was me. So it depends. It's not a question I can answer with a number. Some of the stuff that we're doing now with these NICE features — I want to be honest with you — is running on a custom-built node with one terabyte of RAM. But this is more... So it all has to do with how you select your features. The full set of features that you need to compute grows exponentially with the basis that you're choosing. But the ones that you need in practice are usually much, much, much smaller. So we need a lot of RAM to quickly go through all the possible features and select the ones that we need. But once we have selected them, you are speaking of, I would say, on the order of one kilobyte per atomic environment. So it's not much. But there is a lot of...
There is a lot in between an implementation that works for demonstration purposes and an implementation that you can actually deploy without special hardware and without knowing exactly what you're doing. So this is one of the things that we are actually currently working on: there is a lot of implementation effort to make sure that you don't need one terabyte of RAM. There is no fundamental reason why you do, but you have to be very smart in how you implement this body-order contraction to avoid that. Okay, good. So I don't see any more questions in the chat. So we still have a couple of minutes. If anyone has a question, don't be shy. If nobody has questions, I have five fantastic slides about alchemical learning, which is kind of cool. Do you want to? I don't know. We are supposed to have a break now. But anyway, I mean, this is just to say that there is a lot more that you can play with. And for instance, you can play with how you represent different chemical species, and how you, for instance, cross the periodic table. So — if you have a system with many, many species — I focused mostly on the problem of describing geometries, but if you are doing materials science and you want to work with the entire periodic table, you have another dimension of complexity, which is the chemical complexity. And there is a lot of very fun stuff that you can do on that front. Okay, just that. Okay, great. So I think we can thank Michele again. And then we will reconvene in 13 minutes, so at three o'clock. See you in a while. See you. Thank you. Okay, good. It's three o'clock. So I think we can wait maybe another minute or two so that people come back from the coffee break. Okay, so maybe we can start with the second talk of this session. So the next talk will be given by Bingqing Cheng. So Bingqing did her undergraduate studies at the University of Hong Kong, and then she moved to the group of Michele Ceriotti for her PhD, which she completed a couple of years ago. And now she has a junior research fellowship at Trinity College in the University of Cambridge. So apparently she will answer my question to Michele Ceriotti about using these methods on solvated molecules, and indeed the title of her talk is modeling water with the help of machine learning. Thanks, Giovanni. And I thank all the organizers for putting this together. I too miss the coffee in Trieste very much. So before I start my talk, I just want to thank this very long list of collaborators, especially Michele, my PhD advisor, and in particular Alex Reinhardt, with whom I have collaborated a lot over the past two years. So, modeling water with the help of machine learning. Before I go into the sort of glorious details about the system of water, which Ali has talked about extensively in his wonderful talk yesterday, I want to look at the big picture and ask the question: what are we trying to achieve, and what is the biggest question that we are trying to answer here? And actually, one of the greatest scientists of all time, Dirac — who of course invented the Dirac notation that has found many uses in various fields as well — gave the answer to this question almost a century ago. As it turns out, the fundamental laws for understanding most of the materials and molecules on planet Earth are almost completely known. The problem is really that the solutions to these equations are just too complex to be attained.
So the solution, according to Dirac — the Holy Grail of computational physics — is just to develop approximate practical solutions to make the computation tractable. And with that in mind, let's look at the picture. So this is a very standard picture of the various computational tools that we routinely employ today. We have the Schrödinger equation, which is intractable. And then, at one level below, we have the quantum Monte Carlo methods and coupled cluster, which are usually of reference quality. And then the workhorse of the field is more or less density functional theory. One can of course use empirical force fields or model force fields, such as Lennard-Jones, but in doing that we gain a lot of efficiency but also sacrifice accuracy. And to make matters worse, when we do simulations, we're not just interested in one system at a given snapshot; we want to do extensive statistical sampling. And the reason we want to do that is because, for any given system, the microstate is sort of a frozen snapshot of the state of that system — where the atoms are, what the velocities are — and that's not enough to describe the thermodynamic state. So one of the most famous equations, like this one, which is actually engraved on Boltzmann's gravestone, basically states that the entropy is the Boltzmann constant times the log of the number of microstates. That sort of provides the basis of modern statistical mechanics. And of course, what people don't always say is that this particular equation, which is on Boltzmann's grave, is actually wrong. Because indeed, when we account for the microstates and when we consider entropy, it's not just the sort of number of microstates that matters; we also need to apply the Boltzmann weighting, which is the exponential of minus the enthalpy of the system divided by kB T. We have to do such weighting to compute this relative entropy. And it's really this entropic part that makes the task of sampling, and the task of doing thermodynamics, so difficult in atomistic simulations if you are aiming for ab initio accuracy. So with that in mind, let's go into this specific system, the system of water. Although it's a seemingly very mundane, very simple system, there are many, many mysteries about water. So for example, we know water is densest at four degrees Celsius; ice floats on water, which is quite unusual, because solids are usually a little bit denser than the liquids. It has a quite high melting point, and very intricate nuclear quantum effects that come from the fact that we have a lot of hydrogen atoms, which are very light, so they can behave very much like waves rather than classical particles. And another mystery is that we know ambient-pressure ice has two different polymorphs, the hexagonal form and the cubic form. Now, we know in nature the hexagonal form is more stable, but there hasn't been any sort of accurate and satisfactory explanation of why this is the case. So with that in mind, in this talk, within the 30 minutes, we will talk about how machine learning potentials help us solve this problem, also with the big picture in mind — the challenges in doing ab initio thermodynamics. And then I want to talk a little bit about locality; Michele has talked very extensively about this notion of locality, or actually the lack of locality, in many systems.
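For reference, the two ways of counting just contrasted can be written down explicitly (standard statistical-mechanics notation — my transcription, not the speaker's slide):

\[
S = k_B \ln W
\qquad \text{versus} \qquad
S = -k_B \sum_i p_i \ln p_i ,
\qquad
p_i = \frac{e^{-H_i / k_B T}}{\sum_j e^{-H_j / k_B T}} ,
\]

i.e. microstates are not simply counted but weighted by the Boltzmann factor, which is what makes the sampling problem hard.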
So I'll explore this extent of locality a little bit more, but more from a sort of practitioner's, empirical point of view. And we'll talk about several applications that have a relationship to this locality. Okay, so the starting point is that the key machinery that makes this ab initio thermodynamics possible is the use of the machine learning potential. When we're using density functional theory — which is the workhorse of ab initio calculations these days, because it actually consumes one sixth of all the supercomputing power worldwide — it can handle a system size of hundreds of atoms on the timescale of picoseconds. But the worst part is that it usually has a cubic scaling, which is not ideal. And machine learning potentials nicely overcome these limitations. In particular, on the system-size side, they also have a very nice linear scaling, thanks to the locality, which we will discuss a little bit later as well. Now, why do machine learning potentials work? I want to give a very intuitive explanation, because I'm like a simple person and I like simple explanations, without using Dirac notation. So the way I rationalize why this works is to think in terms of atomic environments. If I ask you the question — we have these two systems here, what do you observe? — you will tell me that on the left we have an amorphous, liquid-like system, and on the right we have a solid system. And the reason why you think this is a solid-like system on the right is because all the atomic environments in the right configuration are very similar. And indeed, if you look very closely, you can even find very similar local environments in the liquid configuration as well. And moreover, you probably also find certain local environments in the liquid that resemble the solid. I'll also delve into this point a little bit later. And what this means is that similar atomic environments are encountered in your simulations over and over again. And that's what makes ab initio molecular dynamics sort of a bad idea, because you will encounter these atomic environments over and over again, but every time you see each environment you have to turn on the quantum chemistry engine, you have to solve the quantum mechanical equations over and over again. And this is a little bit of a waste. So the core idea of machine learning potentials is that we sort of build a memory of all the relevant atomic environments that we are going to encounter. And we also make this assumption, which is the locality assumption: we assume the energy and forces of each environment have this nearsightedness — they only depend on that small environment within a finite cutoff. And with that, we have a very simple workflow for building a machine learning potential. It's basically a two-step process. Step one, we collect a bunch of relevant environments, and then we can do some interpolation between them so that we get a smooth potential energy surface. So this is a very simplistic view, but of course there are many algorithms to do this. In the end, though, the performances are quite comparable in my experience. So, back to water. What we did is that we first selected a good density functional that was benchmarked in AIMD simulations as well as against reference data. And then we collected bulk water; these are all liquid water configurations. Each snapshot has 64 water molecules, and we have about 1,000-plus configurations.
A thousand come from quenches with classical water, and then the rest come from path-integral molecular dynamics simulations. And then — I'll just show you this standard 45-degree parity line that all people doing machine learning show — we fit the energy and the forces for these configurations. And then we use the machine learning potential to do some predictions. So the first thing we did is to predict the density isobar of ice and liquid water. First of all, the dashed lines here are the classical water — so classical ice and classical liquid water. And when I say classical, it means the hydrogen and oxygen are treated like point particles, so we don't consider nuclear quantum effects on them. And then the solid line is when we switch on the nuclear quantum effects. And it's quite interesting, because the nuclear quantum effects actually make water a little bit denser. Now, we also compare with experiments. The stars here are from experimental data. You can see the machine learning potential actually captures the density maximum, the four-degree-Celsius density maximum, in the liquid water. And then we are more or less within a few percent of the experimental values. And this type of performance is on par with the sort of best water models available. We also computed the radial distribution function. In this case, we also considered the classical as well as the quantum-mechanical water, and only when we switch on the nuclear quantum effects do we get good agreement with experiments. Now, I'm interested in thermodynamic properties, and in particular the chemical potential of different phases. The standard workflow for doing this is that we start from a harmonic reference, and then we do a thermodynamic integration to switch this harmonic system, which has an analytic free energy, into our physical system. And then, to go from there, we turn on nuclear quantum effects to embrace the wave-like nature of the hydrogen and oxygen nuclei. And I want to go a little bit deeper into the thermodynamic integration method. Thermodynamic integration is really a very broad concept. It really means that when you have two systems, you can construct a thermodynamic path between them — either a physical or a virtual path — and the free energy difference between these two systems can be evaluated by going along this path and accumulating the infinitesimal differences in free energy along the way. Now, in practice we have a recipe for doing this that works for most cases. It basically goes like this: we do a switching from the harmonic reference to the anharmonic system, as mentioned before, sampled in the NVT ensemble under constant volume at low temperature. Then we make a jump to NPT sampling to get a measure of the Gibbs free energy. And then we go up in temperature to get the Gibbs free energy at the desired temperature. Now, this is what we use. However, I'm now going to show you something that may make you feel a little bit uncomfortable about machine learning potentials. So I did this computation for the chemical potential difference — this is the classical chemical potential difference between cubic ice and hexagonal ice — using two different fits of the machine learning potential. And here you can see the results. Okay, these are very fine numbers, on the order of milli-electron-volts per molecule. But you see, we don't really get consistent results: in one case this is positive, in the other case it is negative.
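Schematically, the thermodynamic-integration step just described corresponds to the standard textbook expressions (assuming a linear switching path between the harmonic reference U_0 and the physical potential U_1, and a Gibbs-Helmholtz integration in temperature):

\[
F_1 - F_0 = \int_0^1 \big\langle U_1 - U_0 \big\rangle_{\lambda} \, d\lambda ,
\qquad
\frac{G(T_2)}{T_2} - \frac{G(T_1)}{T_1} = -\int_{T_1}^{T_2} \frac{H(T)}{T^2}\, dT .
\]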
This really tells us that, for various reasons — perhaps because of the small residual errors in the fitting, or maybe because we didn't account for long-range effects when we constructed the machine learning potential — we actually do not get a consistent answer when it comes to computing the very fine details of the thermodynamic properties. So how do we overcome this and go to the actual ab initio level? What we do is that, if we think of the machine learning potential energy surface and our ground truth — which is DFT in this case, the DFT potential energy surface — what we want to do is to promote our machine learning potential results to the DFT level. And this is not just for one data point, but for the many possible configurations on this potential energy surface. So, to put it mathematically, what we are really doing is free energy perturbation. We are saying that if we write out the Gibbs free energy of the system described by DFT, and we do the same for the Gibbs free energy of the system described by the machine learning potential, we can show that the Gibbs free energy difference between them — the correction term that we need to apply — is really just an exponentially weighted average of the difference in potential energy between these two levels of theory. So we did that. It's actually much easier than we thought, because usually free energy perturbation gives you very bad statistics. However, in this case, because the machine learning potential energy surface and the DFT potential energy surface are exceedingly similar, I was able to converge this correction delta mu using only on the order of a hundred snapshots. So we did that, and you can see that with these two fits of the neural network potential, after we correct them and promote them to the DFT level, we are getting consistent results. So this is, in the end, what we used: the thermodynamic integration part I talked about before, but in the end we always promote our results to the ab initio level. Now, coming to the results. This is the chemical potential difference between hexagonal and cubic ice. We did the usual thing: we computed it at the neural network level and then we promoted the results to DFT accuracy. And in the end we also added nuclear quantum effects. In this case, surprisingly, nuclear quantum effects are quite significant in stabilizing hexagonal ice — without them, it might be the case that cubic ice ended up being more stable. We also did it for ice and liquid water. We did interfacial pinning simulations to compute the chemical potential difference; this is also averaged over different proton disorder states. And we bring it to the DFT level and add nuclear quantum effects for both D2O, which is the heavy water, and H2O. And in the end, the agreement with experiments is remarkable: we are within about 2% of the experimental melting point, and even the difference between the melting points of H2O and D2O is very well captured. Now, here comes the second part of the talk, about this notion of locality. And I was really curious: what is really the extent of it? A very easy thing to check, since we have different fits of neural networks anyway, has to do with the first step, which is to compute the atomic energy associated with each atomic environment.
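As a concrete illustration of the free-energy perturbation promotion described above — a minimal sketch with made-up numbers, assuming hypothetical arrays of ML and DFT energies evaluated on the same snapshots sampled with the ML potential:

    import numpy as np

    def fep_correction(u_ml, u_dft, kT):
        """Free-energy perturbation estimate of G_DFT - G_ML from snapshots sampled
        with the machine learning potential. u_ml, u_dft: potential energies of the
        same configurations at the two levels of theory; kT: thermal energy (same units)."""
        du = np.asarray(u_dft) - np.asarray(u_ml)
        du0 = du.mean()                       # shift for numerical stability
        return du0 - kT * np.log(np.mean(np.exp(-(du - du0) / kT)))

    # toy usage (energies in eV): a small, nearly Gaussian mismatch between the two surfaces
    rng = np.random.default_rng(1)
    u_ml = rng.normal(0.0, 0.05, size=200)
    u_dft = u_ml + rng.normal(0.02, 0.01, size=200)
    print(fep_correction(u_ml, u_dft, kT=0.025))   # roughly 0.02 eV in this toy example

The estimate converges quickly precisely because the two potential energy surfaces are so similar, so the exponential average is dominated by small, well-sampled fluctuations.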
I was wondering: the atomic energies for the same environment, but predicted by different fits of the neural network — are they the same, or are they even related? So here you see the parity plot, and they are not related at all. And these are the atomic energies of oxygen or hydrogen, right? And then I was like, maybe the molecular energy — you know, if I sum up the atomic energies of the hydrogens and the oxygen in each single molecule and compare this molecular energy — how does that look? Still the correlation is very poor. Now, this is not a problem per se, but at the time I was exceedingly interested in computing the heat conductivity of the liquid. This was partially inspired by the fantastic work of Stefano Baroni at SISSA on heat conductivity, this very nice notion of gauge invariance and so on. So the traditional way of computing the heat conductivity is from the Green-Kubo relation, which basically says that you do an integral of the autocorrelation function of the heat flux. Now, what's the problem here? First of all, when you do this integration of the autocorrelation function, Green-Kubo says that you should integrate to time equals infinity. In reality this is not possible, because the longer you integrate, the more you also accumulate this Gaussian noise in the long-time tail, so eventually your estimate will diverge. Which means that in practice you have to set a cutoff time for this autocorrelation function, and that also introduces a bias, because the function may have a long decay tail that you are not capturing, since you cut off at a finite time. Now, the second problem is about the definition of J. Its first part involves the atomic energy, which is not unambiguously defined in this machine learning potential picture. And also there's this mutual force between two atoms, Fij, which is also not fully defined if we are looking at a many-body potential. And then suddenly I had a revelation: it turns out you don't need Green-Kubo to compute the heat conductivity. Actually, I was going around in Cambridge, in my office and other offices, asking: do you believe that you can compute the heat conductivity just from a molecular dynamics trajectory, without heat flux, without any energy argument, but from the MD trajectories alone? Nobody believed me at the time. But here it is. So it turns out that you have a density field in space — this is actually a particle density field in space — and you can take a Fourier expansion of this density field. Now you are getting a density amplitude of this field at each wave vector k. This rho of k actually fulfills the hydrodynamic equations. And if we look at the autocorrelation function of this rho of k, it has two modes. The first mode is the heat mode; it's related to the heat conductivity. And the second mode is due to the propagation of sound. So basically, what that means is that if you solve the hydrodynamic equations for rho, it has poles: one at zero, on the imaginary axis, that is related to the heat dissipation, and another one that is related to the speed of sound and the sound attenuation. So I'll skip the detailed mathematical derivation. But in the end, this is what we see. We first did a benchmark on the Lennard-Jones system, because for the Lennard-Jones system there's a pairwise potential, and you can use Green-Kubo without any problem.
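To make this density-fluctuation route a bit more concrete, here is a minimal sketch (my own toy version, with a made-up random-walk "trajectory" just to show the shapes involved) of computing rho(k, t) and its time autocorrelation from stored positions; fitting the hydrodynamic model and extrapolating k to zero would come on top of this:

    import numpy as np

    def density_mode(traj, k):
        """rho(k, t) = sum_j exp(-i k . r_j(t)) for a trajectory of shape (n_frames, n_atoms, 3)."""
        return np.exp(-1j * np.einsum("taj,j->ta", traj, k)).sum(axis=1)

    def autocorrelation(x, nmax):
        """Time autocorrelation <x*(0) x(t)>, averaged over time origins, for lags 0..nmax-1."""
        n = len(x)
        return np.array([np.mean(np.conj(x[: n - t]) * x[t:]) for t in range(nmax)])

    # toy usage: hypothetical positions (a random walk, not real MD) and the smallest k of a box of side 12
    rng = np.random.default_rng(2)
    traj = np.cumsum(rng.normal(0.0, 0.1, size=(500, 64, 3)), axis=0)
    k = 2.0 * np.pi / 12.0 * np.array([1.0, 0.0, 0.0])
    rho_kt = density_mode(traj, k)
    C = autocorrelation(rho_kt, nmax=200)   # the heat mode and the sound mode live in this function / its spectrum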
And so this is the autocorrelation function for the density wave, and you can see the simulation and the fit. Now, if we look at the spectrum, we can see the very typical two modes: the one at the center, at zero, is the heat mode, and the other mode is related to the sound propagation. And then we can extract the heat conductivity at different values of k, different wave vectors, and we extrapolate this value to k equal to zero, which then gives us the macroscopic heat conductivity. So for Lennard-Jones we can do a comparison with Green-Kubo, and we get a perfect sort of parity over a very wide range of thermodynamic conditions. We also did this for water — I will not go into details — as well as for a high-pressure hydrogen system that uses a recently developed machine learning potential. So we don't care what the functional form of that potential is; all I need is just a pure molecular dynamics trajectory. Now, in the next five minutes I'll just very quickly go through this other aspect of locality: extracting ice-like local environments from liquid water. This is a very recent work. So remember that for the water training set we only have liquid water configurations, but we were able to describe hexagonal ice and cubic ice very well. Why is that? And can it be extended to other ice phases? So back then, Edgar and also Michele had a scheme: they used a generalized convex hull approach to screen a very large, very extensive ice data set and selected out about 57 or so ice phases that are very promising — they have the potential of being experimentally realized. So I was like, how well does ours do? They were using a sketch-map to plot it; you can also use a KPCA map. Alex talked about this kind of dimensionality reduction during the talk on clustering yesterday. So if we take these ice phases and plot them together with our liquid water configurations on the same map, unsurprisingly the PCA map is able to distinguish the ice and water phases very clearly — and for the obvious reason: they should be very different. However, if we instead visualize not the whole structures but only the local environments, we can see that the local environments that are covered in those ice phases are almost included in the range spanned by liquid water. This means liquid water contains these ice local environments. So what's the implication of this? The implication here is that we can train a machine learning potential based only on the liquid water configurations and use that to predict a very diverse set of ice phases. So this is what we got. We compare the density predicted using the machine learning potential against various DFT methods, using different packages or different functionals, as well as experiments, and we got very good agreement — and also for the lattice energies. And we even computed — this is super tedious — the vibrational density of states, and this is for all the ice phases we have considered. That's a little bit hard to see, but if we zoom in on different phases at various density values, we get very good agreement for the phonon density of states. I'm almost done. So this is the last slide, and this is something that I'm very excited about: we are computing the ab initio phase diagram — it's actually mostly Alex Reinhardt — we are computing the ice phase diagram.
This is more tedious than I anticipated, because of nuclear quantum effects and because we have to average over different proton disorder states, but it's looking good so far. So with that, I would like to end my talk. And I just want to give a shout-out to this very new Python package that we have developed; it is a command-line tool to do very simple tasks related to machine learning in materials and chemistry. And thank you so much. So thank you, Bingqing. So now we have time for questions or comments. As usual, just type your question in the chat, or just write that you would like to speak, and then speak. Don't be shy. Okay, there's a question from Srabani: what about water near an interface, like that of a protein? I think it's likely not to work, because of the reason Michele mentioned, right? The long-range effects are going to be more important. Because in water the long-range effects are obviously important — it's a sort of polar molecule — and the reason we were able to sort of safely ignore them, or just simply correct for them, is because in bulk water they kind of cancel out somehow. But when you have an interface, I don't think that would be true anymore. So I would be very cautious about it. Maybe I have a question related to this: what about the difference in long-range interactions in water and ice? So when we compute the ice phase diagram, we actually did this free energy perturbation correction, so there, I would say, yes, we do consider the long-range effects. And surprisingly, it's not even that large. Okay, Ali, you would like to ask a question? So thanks, Bingqing, for a great talk. I have a question related to Max's presentation this morning, where he was using information from experiments as well as simulations to build more refined models for biological systems. So there's so much data on things like the variation of the dielectric constant as a function of temperature, as a function of pressure. Could one think of — and how would you, if it's possible — coming up with a machine learning model that's reverse-engineered from experimental data like that? So I think two things, right? First of all, it's not clear, if you just use ab initio, what kind of accuracy you are going to achieve, right? I think to do this more refined correction, we first need to know what it is like without the correction, right? And secondly, I remember that Giovanni Bussi has a scheme that uses this sort of reweighting, this mutual-information kind of scheme, to bring in experimental insights, experimental observations. I think Giovanni did that for RNA sorts of systems, right? But I can see this also being applicable to water. Thanks. Okay, there's another question from Elham. Thanks a lot. Can you explain why you did not use another functional, like BLYP instead of PBE0, for the water behavior? So back then we used revPBE0-D3. This is because, first of all, Thomas Markland has some ab initio molecular dynamics papers using this particular functional to show that it works. And also Gerit Brandenburg did some benchmarks against diffusion Monte Carlo as well as coupled cluster data, showing that revPBE0-D3 is a very good choice. And also I want to mention that we actually have some work in the pipeline: we have a training set of water, right? And you can recompute the same training set using a different density functional, and then you can fit, without much effort, a new machine learning potential based on this new functional.
What this means is that, because one can easily run various benchmarks and simulations using machine learning potentials based on different functionals, this can be a new way of benchmarking DFT. Okay, thanks. Then there is another question from the chat, from Suman: could you identify different phases of ice locally equally efficiently, or is the efficiency different for different phases? So I would like to understand the question a little bit more. So in this case — in practice, in this case — I think I'll switch the slides to that particular one. So in this case we have the ice phases available, and then we are projecting them onto the same map. So I kind of know a priori which ones are from ice phases and which ones are from liquid water. I think if I didn't have that knowledge, it would be quite difficult to distinguish the local environments, I would say. Of course, for bulk phases, it seems to be quite possible. Then Ali has another question. So Bingqing, I have another question for you. I think this observation that you find fingerprints of local ice-like environments in water is very interesting; it's an interesting way to characterize fluctuations in water. Just to extend and think about this even further: have you thought about whether the same argument would hold when you introduce other impurities, like ions? So, you know, if you have an ion in the liquid, will you still be able to see those environments being created through the fluctuations in water? I have never tried, but it would be very interesting to try. Especially, I think, because my intuition would be that some ions kind of strengthen part of the hydrogen-bond network, so that might even have some effect in promoting ice. Right, exactly. Because there's all this very interesting physical chemistry on structure makers and breakers. It could be very interesting to look at this, but we'll discuss this later. Okay, other questions or comments? Don't be shy, we still have a couple of minutes. Ah, yes, there's a question: could the overlap of the ice and water be just a result of your projection into a lower-dimensional space? I think it's referring to the PCA. Yeah, this is certainly possible, because when you do any type of dimensionality reduction you lose information. So that's why we treat it with some caution. But the interesting thing here is that with the machine learning potential, we only feed it the information coming from the liquid water, and then it predicts the properties of the ice phases. So I see that as a stronger argument about this coverage of the ice-like environments. I see this PCA map as a more visual way of demonstrating what's going on, what I'm trying to say, but I view the success of the machine learning potential here as a much stronger proof. And also, by the way, a related point: Stuart Rice, I think — I actually learned this after I did this work — in the 70s or 80s, I would say, they were looking at this oxygen-oxygen-oxygen angle, you know, the angles and that kind of stuff. And their conclusion was that if you do the probability distribution of angles and bonds in the liquid, that covers ice. So that's sort of — and of course, that doesn't capture all the information of the very rich local environments either. So there's another question, also related to functionals. Marcelo would like to know if the generated potential is sensitive to the different functionals, or does it not change too much? Right.
So actually, we are doing some work on this. And I think for now I can say that, mostly, changing the functional does not seem to have a major effect on the behavior of the phase diagram. But then we are already operating at the hybrid-functional sort of rung, so maybe that's why; I imagine if we just did PBE, that might have a major effect. Okay, good. So any more questions or comments? Again, we have a minute. Don't be shy. Don't lose this opportunity. Okay, so if there are no more questions, maybe we can close the session here. So thanks to everyone, to all the speakers and to all the participants. I'm very happy as an organizer that there are so many interesting questions. And it's a pity, yeah, that the chair has to read all the questions; maybe it would be nicer to also see the faces and hear the voices of those asking the questions. But anyway, we are hearing a lot of interesting discussion today. Okay, good. So see you tomorrow morning at quarter past nine as usual. And have a nice day, everyone. Bye. Ciao. Bye bye. Bye.