starts now. So welcome, everyone, to the second day of the ICTP "Hitchhiker's Guide to Condensed Matter and Statistical Physics", which is dedicated to machine learning in condensed matter. Before today's lecture by Bingqing Cheng from the University of Cambridge, I'd just like to remind you that we will have the next two appointments on the next two Wednesdays. In particular, next Wednesday, the 27th of January, we will start earlier, at 12:30 European time, so please check the program to be sure that you are there. The final lecture, on the 1st of February, will be by Hungras Gila and will start at the regular time of 2 PM. That's it for me. Today I leave the floor to Alex Rodriguez, who will introduce Bingqing and moderate the discussion. So Alex, please.

Hello to everybody. First of all, thank you, Bingqing, for being here. Let me introduce Bingqing, and please, Bingqing, correct me if I get something wrong. She is now an early career fellow in Cambridge, which I think is exactly your position. She has done a lot of work on machine learning for atomistic systems, and today we are going to hear about some of that work, preceded by an introduction. For the Q&A, Bingqing will stop every quarter of an hour or so and reply to your questions. Please write them in the Q&A box, because in this way we can keep track of them. That's all from my side.

Thanks, Alex, for the introduction. Let me share my screen first. As Alex mentioned, I will talk about machine learning, but more importantly about the application of machine learning to materials modeling, and I will stop every 15 minutes to answer questions. The first hour of the lecture is about basic notions, the fundamentals, and during the second hour I will talk a bit more about applications, which are basically my own work. I estimate the first part is a little longer than the second, but that is okay; we can shift some time into the second part as well. The fundamentals are, in any case, more important than my recent work.

Okay, so let's get started, from the very beginning: what are first-principles calculations? What are ab initio methods? Ab initio, in this context, means that we predict material properties, the motion of electrons and nuclei, starting from the Schrödinger equation. The significance of quantum mechanics and the Schrödinger equation is reflected by a famous remark from Paul Dirac, often paraphrased as "the rest is chemistry". This is not meant to be dismissive towards chemistry: what he really meant is that the fundamental laws and equations necessary for predicting material properties are completely known; the difficulty is that these equations are just too complex to be solved exactly. So what is the solution? According to Dirac, we can develop approximate practical methods so that the solution of quantum mechanics becomes tractable. With that in mind, let's look at the methods we have. The Schrödinger equation cannot be solved exactly except for the simplest systems, such as the hydrogen atom. Then we have reference methods such as quantum Monte Carlo or coupled cluster methods.
The workhorse of the field is density functional theory, with different levels of approximation. Typically we are able to model a system of hundreds of atoms on a timescale of about 10^-12 seconds using density functional theory. Beyond that, we can model systems using empirical force fields, meaning we simply assume that two atoms interact with each other via some empirical or simple analytic functional form; but the empirical methods lack quantitative accuracy.

With these preliminary slides in mind, here is what we are going to cover during the first hour of the lecture. We will first talk a little about the fundamentals of statistical mechanics and atomistic modeling. Then we will change gears slightly and move to the machine learning part. I think David and co-workers already gave some lectures on the basics of machine learning, so here we are going to focus on how we translate our physical systems, our materials and molecules, into the mathematical language that a machine learning model can use; in other words, how we translate the physical problem into a design matrix. Finally, I will talk a little about machine learning potentials: using machine learning methods to approximate the interactions between atoms.

So let's start with a little thermodynamics. From a thermodynamic point of view, the Gibbs free energy of a system has two contributions: an enthalpic contribution and an entropic contribution. As the simplest example, consider a solid and a liquid, whose free energies can both be expressed in these terms. When the temperature is low, the free energy of the solid is lower, so the system stays solid. As the temperature increases, the entropic contribution becomes more and more important, so the solid melts and the stable phase becomes the liquid phase.

From a statistical point of view, the free energy is actually a measure of probability: the free energy difference between the solid and the liquid can be expressed in terms of the logarithm of the probability of observing the liquid divided by the probability of observing the solid, $\Delta G = -k_B T \ln (P_{\mathrm{liquid}} / P_{\mathrm{solid}})$. This is a very simple expression, but bear in mind that the probability of observing the liquid or the solid means the probability of observing any of the possible configurations that belong to the liquid state, and there are many, many of them. They all look similar, but they all differ by where the atoms actually are.

The founding father of statistical mechanics, Boltzmann, has an expression for the entropy engraved on his grave, which reads: the entropy of the system is Boltzmann's constant times the logarithm of the number of microstates, $S = k_B \ln \Omega$. A microstate is just a snapshot like the ones I have been showing: a specific realization of the coordinates and velocities of all the particles. The expression engraved on the tombstone is actually not quite the one we need, because in the correct expression we cannot just count the microstates; we also have to weigh them properly by their probabilities, following the Boltzmann distribution. The entropic term can then be expressed as the weighted sum $S = -k_B \sum_i p_i \ln p_i$.

By now you might have noticed that we already have a conflict: when computing the energy of the system, we have a whole spectrum of quantum mechanical methods with different levels of accuracy as well as cost.
To compute the energy accurately, we want to go to the more expensive, more accurate methods. However, to sample the microstates in a comprehensive, satisfactory manner, we want the methods to be cheap. So there is a conflict, and during this talk I will explain how machine learning can resolve it.

Now, some very fundamental and very important information on how we actually sample the microstates, because under no circumstances do we want to enumerate them: with N atoms in the system, phase space is a 6N-dimensional object, and sampling it exhaustively is not tractable. There are two ways forward, and the first one is Monte Carlo sampling. The end goal is always to sample from the Boltzmann distribution: Ω denotes a microstate, and the microstates are populated according to the Boltzmann distribution, weighted by the Boltzmann factor of their energy, $P(\Omega) \propto e^{-E(\Omega)/k_B T}$. The fundamental principle of Monte Carlo importance sampling is that, instead of generating uncorrelated microstates, we generate a sequence of correlated states: I start from a certain point in phase space and make a move from, say, Ω to Ω' with a certain probability. It can be shown that if the distribution P(Ω) is invariant under this move, we eventually end up sampling the correct Boltzmann distribution. However, this invariance condition is quite difficult to implement in practice, so in reality we usually impose a stronger condition called detailed balance. Detailed balance says that for two microstates Ω and Ω', the ratio of the transition probabilities for the forward and backward moves must equal the ratio of the probabilities of the microstates themselves, $T(\Omega \to \Omega') / T(\Omega' \to \Omega) = P(\Omega') / P(\Omega)$.

One possible sampling scheme that satisfies detailed balance is the famous Metropolis sampling. Assuming a symmetric proposal, it says: if the probability of Ω' is higher than that of Ω, we always accept the move; otherwise we accept it with a probability equal to the ratio between the probabilities of the two microstates. To show this graphically, here is a very simple 2D scenario with a limited number of particles: each time we propose a move, if the energy of the system goes down, meaning the probability goes up, we always accept the move; otherwise we accept it conditionally, depending on how much the energy increases. So this is one way of sampling our microstates. Honestly, Monte Carlo is not that common these days, but the underlying principle is very nice.

The other method to sample the system is molecular dynamics, which in fact relies on Newtonian mechanics. The picture is very simple: we compute the force on each particle and then propagate the trajectory using the classical Newton's equations of motion, with some additional tricks to control the system's temperature as well as pressure. The bible for this type of sampling is the book by Daan Frenkel and Berend Smit, Understanding Molecular Simulation.
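Coming back to the Metropolis rule for a moment, here is a minimal Python sketch (my own illustration, not from the lecture; the harmonic "energy" function is just a stand-in for a real interatomic potential):

```python
import numpy as np

def metropolis_step(positions, energy_fn, beta, max_disp=0.1):
    """One Metropolis move: displace a randomly chosen particle, then accept or reject."""
    trial = positions.copy()
    i = np.random.randint(len(positions))
    trial[i] += np.random.uniform(-max_disp, max_disp, size=positions.shape[1])
    dE = energy_fn(trial) - energy_fn(positions)
    # Accept if the energy goes down; otherwise accept with probability
    # exp(-beta * dE), which is exactly the ratio P(new)/P(old) of Boltzmann weights.
    if dE <= 0 or np.random.random() < np.exp(-beta * dE):
        return trial
    return positions

# Toy usage: ten particles in a 2D harmonic trap, beta = 1/(kB*T)
energy = lambda x: 0.5 * np.sum(x**2)
pos = np.random.uniform(-1, 1, size=(10, 2))
for _ in range(1000):
    pos = metropolis_step(pos, energy, beta=1.0)
```

In a production code one would evaluate only the local energy change of the displaced particle rather than the total energy, but the acceptance rule is the same.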
However, that is not the whole picture. The reason is that when we do molecular dynamics simulations, the accessible time scale is fairly short, and the system may have two minima in its free energy profile. Going back to the example of the solid and the liquid: they can be understood as two equilibrium states on the free energy profile. If we just run plain molecular dynamics starting from the liquid state, the system will remain in the liquid state, and if we start from the solid, the system will stay trapped in the solid state. This is because the thermal fluctuations are often not enough to overcome the very high activation barrier between them.

To overcome this, there is a method called metadynamics, developed by Alessandro Laio and Michele Parrinello, and I would like to use a hiking analogy to explain how it works. Can you see which mountain this is? This is the Matterhorn, which can be viewed from both the Italian side and the Swiss side. The profile of the mountain can be likened to a free energy profile: the valleys are the equilibrium states, and the peak is similar to the activation barrier. If we want to travel from A to B, we have to spend a lot of energy to climb up the peak, and wait a very long time. But there is an alternative way to hike: what if, as I travel on this landscape, I deposit a heap of sand wherever I go? If you do this long enough, what eventually happens is that we even out the free energy landscape and make it flat, so the system can go back and forth without much resistance. Another nice aspect of the method is that at the end of the simulation, just by checking how much sand has been deposited at each spot and taking the negative of that, we recover the actual free energy profile.

There is an alternative method for performing free energy calculations, called thermodynamic integration. Let's start from slightly cruder approximations first. At zero Kelvin, or at low temperature, we can neglect the entropic contribution altogether and just use the minimum potential energy as a proxy for the free energy; or we can assume that our system behaves like a set of harmonic oscillators, take the harmonic approximation, and add the harmonic contribution to the free energy. And then there is the option of doing everything properly and taking anharmonicity into account as well. In that picture, we perform a thermodynamic integration. The idea, and this is a very general concept, is that if we have two systems and we can somehow connect them by a reversible thermodynamic path, then the free energy difference between them (sorry, here I flipped the sign; it should be from B to A) can be expressed as the integral of the free energy derivative along this path, $\Delta F_{A \to B} = \int_0^1 \langle \partial U(\lambda) / \partial \lambda \rangle_\lambda \, d\lambda$. The path can be anything: it can follow a thermodynamic variable such as temperature or pressure, or it can be a switching parameter λ between different Hamiltonians. In practice, we usually do a switching between a harmonic system and our actual system, because the harmonic system has an analytic free energy that we can write down very easily.
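To make the switching idea concrete, here is a minimal sketch (my own illustration, not code from the lecture) of thermodynamic integration along a linear switching path H(λ) = (1 - λ) U_reference + λ U_target; the sample arrays in the usage are placeholders for what would come out of MD or Monte Carlo runs at each λ:

```python
import numpy as np

def thermodynamic_integration(lambdas, dU_samples):
    """Free-energy difference F_target - F_reference along a linear switching path.
    lambdas    : switching-parameter values in [0, 1]
    dU_samples : dU_samples[i] is an array of (U_target - U_reference) values
                 recorded while sampling the mixed Hamiltonian at lambdas[i].
    For linear switching, dH/dlambda = U_target - U_reference."""
    integrand = [np.mean(s) for s in dU_samples]   # <dH/dlambda> at each lambda
    return np.trapz(integrand, lambdas)

# Hypothetical usage, with placeholder data standing in for real simulation output
lambdas = np.linspace(0.0, 1.0, 5)
dU_samples = [np.random.normal(loc=lam, scale=0.1, size=1000) for lam in lambdas]
delta_F = thermodynamic_integration(lambdas, dU_samples)
```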
So in practice we typically follow a recipe: we start from a harmonic reference, integrate to the real crystal, and then choose to go up in temperature or across different pressures. This recipe comes with some justifications. We have to do the switching from harmonic to anharmonic at relatively low temperature, because when the temperature is high this integral becomes divergent: at high temperature diffusive behavior sets in and makes the system very anharmonic. Another thing we typically do, if we want to compute the Gibbs free energy, is to first work in the NVT ensemble, at constant volume and temperature. This is because the pressure is not well defined for the reference harmonic system, so we cannot place the harmonic system in the NPT, constant-pressure, ensemble. There are also a number of other tricks one can play, but in the interest of time I will not go through them. I just want to point out that some time ago we wrote a tutorial-style paper explaining the tricks as well as the fundamentals of computing Gibbs free energies using thermodynamic integration, accompanied by Python notebooks for the data analysis and sample input files.

I will quickly show a couple of examples of why accurate free energy estimation matters, and then I will try to answer a couple of questions. The example here shows the free energy of vacancy formation in BCC iron. We have the approximation that just uses the potential energy difference, which is the black line here; then the one using the harmonic approximation; and then the accurate estimate that takes anharmonicity into account via thermodynamic integration. At low temperature they are very similar, but as the temperature goes up, even the harmonic estimate is not sufficient to capture the vacancy formation free energy accurately. Here is a similar example, computing the stacking fault free energy in FCC metals. It is the same idea: at high temperature, the harmonic and potential-energy approximations are not only quantitatively inaccurate, they can even get the overall trend, even the sign, wrong.

Okay, I think now is a good time to stop and answer some questions. There is a question in the Q&A. Do you want to speak up and ask the question yourself? Are the students allowed to speak with the Zoom setup? Okay, so the question asks: will we only be dealing with equilibrium systems? That is mostly the case, although strictly speaking, when we are doing metadynamics, the system is in a quasi-equilibrium. But yes, this is mostly the case.

And Raj, do you want to speak up? Yes, please allow Raj to speak; I think that is better. Hello ma'am. I would like to ask: are detailed balance and Metropolis sampling linked to each other? Or is detailed balance applicable in all conditions? It has the condition that when we go from one microstate to another microstate, it has the same probability as when we move from the second one back to the first. But it seems that this condition is not favorable for every experimental condition.

You mentioned experimental conditions; can you elaborate a little more? What does this mean?
What I mean is, the probability of a material going from one phase to another phase, and it is not able to come back the same way. Say a material has two phases, one at low temperature and one at high temperature, and when it is moved from low temperature to high temperature a phase transition takes place. In that case there is a certain probability that it moves from one state to another, but the reverse may not be possible. So I doubted whether the strong condition of detailed balance that you applied in Monte Carlo sampling, where we equate these two products of probabilities, can hold in those situations.

So I want to clarify two things. First of all, here we are talking about microstates. A microstate is basically a list of vectors specifying where the atoms are and what their velocities are. A phase, say the liquid phase, is a different concept: a phase contains many, many such microstates. The concept of detailed balance applies when we are talking about microstates. It is also worth pointing out that detailed balance is more of an assumption than a statement about what happens in reality: it is an assumption that enables us to do sampling, particularly Monte Carlo sampling, in an easy way. And the other thing: Metropolis sampling is one special implementation of detailed balance. There are many ways of designing moves that satisfy detailed balance, and Metropolis sampling is a particularly simple one. Okay, thank you. Thank you.

Okay, so back to where we were. So far I have talked about classical systems, the classical free energy. "Classical" here refers to the fact that we assume our particles, our nuclei, are classical point particles that can be characterized simply by the position of their center of mass in space, and that this is sufficient. However, in reality, for many nuclei, especially the light ones such as hydrogen, helium and lithium, the classical-particle treatment breaks down. These are the nuclear quantum effects, and they affect many aspects of our systems: the particle momentum distribution, isotope fractionation (meaning that the equilibrium concentrations of deuterium or oxygen-18 differ between, for example, the gas phase, the liquid phase and the ice phases of water), the pH, heat capacity, diffusivity, and many other things. A nice intuitive example of the difference is that light water is perfectly drinkable, but heavy water is poisonous at high doses.

How do we take this into account in our simulations? We use the path integral formalism. I will leave this slide mostly for reference, and I will also upload the slides to the website later, so let me just gloss over it. The essential idea is that because the nuclei are not classical particles, the kinetic and potential energy operators do not commute. So what do we do in practice? In order to compute the partition function, we decompose the system into separate replicas.
Each replica effectively lives at a much higher temperature, because in the high-temperature limit the two operators approximately commute. In practice, we use the ring polymer molecular dynamics formalism. As I mentioned before, instead of treating each nucleus as a single classical particle, we represent it by many replicas, connected into a ring polymer by harmonic springs. The total Hamiltonian of the system is the sum of the Hamiltonians of the individual replicas plus the harmonic springs connecting them; schematically, $H_P = \sum_{j=1}^{P} \big[ \sum_i \tfrac{p_{i,j}^2}{2 m_i} + U(\mathbf{q}_j) \big] + \sum_{j=1}^{P} \sum_i \tfrac{1}{2} m_i \omega_P^2 (\mathbf{q}_{i,j} - \mathbf{q}_{i,j+1})^2$, with $\omega_P = P k_B T / \hbar$ and bead P+1 identified with bead 1.

To give a more intuitive example: in the classical picture we have the equipartition theorem, which says that each degree of freedom carries a kinetic energy of $k_B T / 2$. Because of the quantum mechanical nature of our nuclei, this is not always true. For example, in this water molecule we have three types of motion involving the hydrogens: the O-H bond stretch, the bending ("breathing") mode, and the out-of-plane motion. Because of nuclear quantum effects, because of the zero-point energy and so on, each mode actually carries a kinetic energy that massively exceeds $k_B T / 2$, and each mode carries a different amount of quantum mechanical kinetic energy.

When we want to characterize the free energy difference between the classical system and the quantum mechanical one, we can again use thermodynamic integration, performing a reversible switching between the classical and the quantum mechanical system. In practice, when we write the expression down, the integrand is a function of this quantum mechanical kinetic energy, which we can compute from ring polymer molecular dynamics.

Okay, I see no questions in the Q&A, so I will continue. So far we have talked about atomistic modeling as well as a little bit of statistical mechanics. I think I have painted a pretty grim picture of atomistic modeling: there are many ingredients that need to be taken into consideration, and each step means a lot of computation. Thermodynamic integration is not cheap; metadynamics simulations are not cheap; and if we want to consider nuclear quantum effects on top, we need to simulate not just one system but many replicas of the same system, which increases the computational cost by another factor of around 20.

Now here comes the machine learning part: how can machine learning help? First, let's talk about representations. How do we represent our molecules? We have many types of systems: a peptide or a protein, say; or a crystalline system with different arrangements, different symmetry groups; or a bulk system with certain defects, which is a bit more difficult, for example a dislocation somewhere. The usual starting point, instead of looking at all the atoms of the whole system at once, is to first divide the system into a set of atomic environments. Each atomic environment is obtained by sitting on a central atom and cutting out a sphere with a predefined cutoff radius; that is my atomic environment. The reason we want to do this is that comparing two systems with different numbers of atoms directly is very difficult.
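As a small illustration of what "cutting out an atomic environment" means in practice, here is a sketch of my own (with periodic boundary conditions deliberately ignored) that collects the displacement vectors to all neighbours within a cutoff sphere:

```python
import numpy as np

def atomic_environment(positions, center, cutoff):
    """Displacement vectors from atom `center` to every neighbour inside the
    cutoff sphere (toy version: no periodic boundary conditions)."""
    disp = positions - positions[center]
    dist = np.linalg.norm(disp, axis=1)
    mask = (dist > 1e-10) & (dist < cutoff)   # exclude the central atom itself
    return disp[mask]
```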
By decomposing the system into a set of atomic environments, we can focus on representing the atomic environments instead of a system with a varying number of atoms. There are many popular representations, which I will talk about. The idea is that once we do this decomposition, any observable of the system can be represented in terms of local contributions. Say φ here is the descriptor, the representation of the local environment; then we can have an observable associated with that local environment. For example, we can have an atomic energy that is characterized by the local environment, and the total energy of the system can then be expressed as the sum of these local contributions.

Let's dig further into how to represent a local environment. This picture does not just apply to representing materials and molecules; it is a very general idea in machine learning. At the end of the day, we want a representation that tells us how similar our samples are. We can characterize the similarity by a kernel matrix K, or by a distance metric: things that are similar have a kernel value close to one and are close in distance space, and vice versa. So the idea is that we want such a kernel or distance metric for our atomic environments.

Now, look at these two atomic environments: how do we compare them? Remember that we are sitting on a central atom, so we can characterize the local environment as a list of displacement vectors, basically a list of neighbors and the displacement vectors pointing to them. Here is a problem: in this case the neighbors are all hydrogen atoms, and imagine I swap two of them. The physical system does not change, because the hydrogen atoms are indistinguishable, but the list of displacement vectors does change when you permute two atoms, and we do not want that. So instead of keeping this list of displacement vectors, we put a smearing, a three-dimensional Gaussian distribution, on top of each neighboring atom. Now, instead of a list of vectors, we have a density field, and the advantage is immediate: it no longer matters if we swap two atoms, as long as they are of the same species. We can then overlap the two density fields and compute the degree of overlap to characterize how similar the two atomic environments are.

However, there is another problem: if I rotate one of the molecules, it is still the same molecule, the physics does not change, but the overlap integral would change. How do we incorporate rotational invariance? The trick is that we do not compute the degree of overlap for one particular orientation only; instead we average the (squared) overlap over all possible rotations, schematically $k(\rho, \rho') = \int d\hat{R} \, \big| \int \rho(\mathbf{r}) \, \rho'(\hat{R}\mathbf{r}) \, d\mathbf{r} \big|^2$. In this way we remove the rotational degrees of freedom as well. And, as you might have guessed, this integral is quite unpleasant to evaluate for each pair of atomic environments.
Suppose you have N atomic environments; computing this quantity for every pair would scale quadratically, and each integral is costly, which is not ideal. However, Albert Bartók, Gábor Csányi and Risi Kondor have a very nice formalism showing that one can compute this kernel, this similarity, very efficiently by expanding the individual density fields in terms of spherical harmonics; computing K then amounts to simple operations on the spherical harmonics expansion coefficients.

So far we have been talking about individual atomic environments, but eventually we are interested in bulk materials. What happens there is that we need to combine the atomic descriptors into a global descriptor, and there are many ways of doing this. The easiest is to take the global descriptor, the global feature vector, as the average of the individual contributions from each atomic environment. That is the simplest thing we can do; obviously, many other choices are available.

Okay, so now I think it is a good time to stop and answer some questions. Oscar asks: is the atomic environment somehow an application of the renormalization group? I don't think so, but then I am not an expert in the renormalization group; there could be a way of expressing atomic environments in that language, I do not know. Then there is a question: does "local environment" mean up to nearest neighbors? That depends on how you define nearest neighbors. Typically we take a cutoff, and you can go to the first neighbor shell or the second neighbor shell; it is really up to you. It does not have to be the first neighbor shell, but it is the set of nearest atoms within a certain cutoff.

Then there is an anonymous question: how large should the variance of the Gaussian associated with each particle be? This is a fantastic question, because it is actually very deep. Intuitively, the sharper the Gaussian you use, the more sensitive your kernel, your similarity measurement, is to differences between atomic environments: if two environments differ very slightly and you use a very sharp Gaussian, the computed similarity will still be noticeably below one. Intuitively that sounds like a good thing, because you want the measurement to be as sensitive as possible. In reality, this is not quite the case, because eventually we need to use our representations to do some machine learning, to do some regression, and for that we want to incorporate a bit more smoothness into the representation. So it is a balance between the two.

There is another question: what about interactions between atomic environments? Does simple summing mean they do not interact? That would be a correct statement. However, when we come to machine learning potentials, the architecture is actually more complicated, and we do consider interactions. Okay, in the interest of time I will stop the questions now; any questions that are not addressed here I will answer during the last part.
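To connect these pieces, here is a toy sketch of the whole pipeline: a deliberately simple per-atom descriptor (a histogram of neighbour distances, not SOAP, but already invariant to permutations and rotations), averaged into a global descriptor, with a normalised dot product as the similarity kernel. The function names and parameters are my own illustration, not anything from the lecture.

```python
import numpy as np

def atom_descriptor(positions, center, cutoff=3.0, bins=8):
    """Toy descriptor of one atomic environment: histogram of neighbour distances."""
    d = np.linalg.norm(positions - positions[center], axis=1)
    d = d[(d > 1e-10) & (d < cutoff)]
    hist, _ = np.histogram(d, bins=bins, range=(0.0, cutoff))
    return hist / max(len(d), 1)

def global_descriptor(positions, **kwargs):
    """Simplest global descriptor: average of the per-atom descriptors."""
    return np.mean([atom_descriptor(positions, i, **kwargs)
                    for i in range(len(positions))], axis=0)

def kernel(phi_a, phi_b):
    """Normalised dot-product similarity between two descriptor vectors."""
    return float(np.dot(phi_a, phi_b) /
                 (np.linalg.norm(phi_a) * np.linalg.norm(phi_b) + 1e-12))
```

A real SOAP implementation replaces the distance histogram with the spherical-harmonics expansion of the Gaussian density field, but the overall workflow of per-atom descriptor, global average, and kernel is the same.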
Okay, so we have talked about representations; now we can use them. There are many ways to do so: we can build a low-dimensional map using dimensionality reduction so we can visualize our system; we can do some pre-processing and sparsify the data set; we can do clustering; or we can do regression. I will talk about dimensionality reduction first.

Dimensionality reduction, at its core, means: I have high-dimensional data, and I want to find a low-dimensional representation that best preserves the relationships within the high-dimensional data. Notice that the terminology I am using here is kind of vague, and this is intentional, because depending on how you define which relationships you want to preserve, you end up with different dimensionality reduction algorithms. The more popular ones are PCA (and if you do it on a kernel matrix, it is called kernel PCA), t-SNE, and UMAP, which is getting very popular these days. I will just talk about PCA, principal component analysis, because it is really the mother of all the other dimensionality reduction algorithms, and the principles are usually quite similar.

So the question is: what is preserved during the PCA analysis? In a simple example, I have two-dimensional data, I find the principal components L1 and L2, and I project the data set onto the first principal component. In PCA we have the data in the high dimension and the data in the low dimension, and it is all about the covariance of the data: we try to preserve the covariance, which can be expressed as $X^T X$ in the high dimension, and the same quantity in the low dimension. Mathematically, one can show that this is achieved simply by finding the first d eigenvectors of the covariance matrix C of the high-dimensional data, sorted by descending eigenvalue: we look for the d eigenvectors associated with the largest eigenvalues. There is a short derivation, and of course we can find the eigenvalues and eigenvectors using simple linear algebra, by solving $\det(C - \lambda I) = 0$.

Graphically, looking at the same picture again: we are looking for the L1 and L2 that preserve the covariance of our data. This can of course be done in higher dimensions; here is a simple illustration of how it works in 3D, where we take the two principal components and project down. We have written a simple Python package, ASAP, that does these analyses automatically: with a single bash command you can generate a low-dimensional map as well as do some other analyses automatically, and these are some of the bash commands one might want to use. Here are some examples. This is alanine dipeptide, and typically people visualize this system using the Ramachandran plot, i.e. the two dihedral angles φ and ψ. We can do this type of analysis automatically using the ASAP package, and we come up with something quite similar to the Ramachandran plot, with principal component one and principal component two.
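The PCA step itself is only a few lines; here is a minimal sketch (my own, not the ASAP implementation) of projecting a design matrix of descriptors onto its top principal components:

```python
import numpy as np

def pca_project(X, d=2):
    """Project rows of X (n_samples x n_features) onto the top-d principal components."""
    Xc = X - X.mean(axis=0)                  # centre the data
    C = Xc.T @ Xc / (len(Xc) - 1)            # covariance matrix
    eigval, eigvec = np.linalg.eigh(C)       # eigh, since C is symmetric
    order = np.argsort(eigval)[::-1]         # sort eigenvalues in descending order
    W = eigvec[:, order[:d]]                 # top-d eigenvectors
    return Xc @ W                            # low-dimensional coordinates
```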
However, remember that we did not know a priori that the dihedral angles are important: you do not need prior knowledge to come up with such an automated map. Here is another example: this kind of map can distinguish classical water from water simulated with nuclear quantum effects. And here is a projection of the QM9 dataset, where the map is able to distinguish small molecules with different compositions, branched molecules, long carbon chains, and so on.

Okay, so there is still the last part on machine learning potentials, and as I expected, this first part is running a little longer than planned. So maybe I will stop now, talk about machine learning potentials during the second hour, and answer any of the remaining questions for the first two parts now.

Bibi asks a question. Bibi, do you want to ask it live? Hi. What decides which representation of molecules we should choose? On what basis do we choose the representation of molecules?

Right. So the SOAP representation that I introduced earlier has the advantage of being completely general: all of these maps were generated with the SOAP representation, and they are very different classes of materials. It is applicable to many, many things, and we do not have to think much about it. Of course, there is also the option of handcrafted representations, which is often done in the cheminformatics community. Handcrafted representations have the advantage that you can incorporate your prior physical understanding of the system. When you think about it, the dihedral angles φ and ψ can also be regarded as representations of our system, and by using φ and ψ we are incorporating our prior knowledge about peptides: we know that the dihedral angles are often the important coordinates for characterizing these systems, and that the positions of the side chains are probably not as important. Does that answer your question? Yes, thank you. Yeah.

There is another question, which I will read out instead: how do we deal with long-range interactions in the atomic-environment picture, and how do we incorporate them? The answer is that typically people do not incorporate them. There is some ongoing work, for instance from Michele Ceriotti and from Jörg Behler, on schemes for incorporating long-range interactions, but as of today this is not the norm; typically people do not, which is a bit of a shame. Surprisingly, though, even without accounting for long-range interactions, the machine learning framework seems to be rather accurate for many things.

Hello. Yes. So actually I had a question. While you were explaining the PCA, about the data there, I was wondering whether that data has something to do with the information you explained in the earlier slide about the atomic environments, where we have the displacement vectors. Is the PCA data set related to that, so that in the PCA analysis we are now using the information about the atomic environments? I just wanted to relate it to the previous slide. Which slide? Do you have a slide number we can refer to? You were explaining the PCA, maybe the one before this. Okay. You were talking about this data set, and I think you explained it in the slide previous to this one. Yeah.
So here, yeah, you were explaining these data points, and I was wondering if these data points are related to the information about the atomic environments from the previous slide. Right, right. So this depends; let me explain with this example. Here, each point represents a particular alanine dipeptide configuration. What actually happens is that we take each configuration of the molecule, we compute the descriptor for each of its atomic environments, and from those we compute the global descriptor for this small molecule, in this case by taking the average of the atomic contributions. Now I have a vector for each molecule in my data set. Say I have 10,000 molecules in my data set: then I have a matrix of size 10,000 by the dimensionality of the descriptor, and I project that matrix down to two dimensions. Okay, okay. That is what we see here: each point is the low-dimensional representation, the 2D coordinates, of the descriptor vector of one small molecule. Okay, okay, okay. Okay, thanks. Thank you.

Okay, I think we are a little bit over time. I think we can take the break now, and when we come back I will go through the machine learning potentials as well as some applications. Yes, how long a break do you want? On the schedule it says a 15-minute break, so maybe we come back at 2:15. Perfect, let's be back at 2:15. Thank you.

Hi. I think we will give people one or two minutes to come back, and then you can continue with the lecture. Let me start sharing my screen. Yeah, thanks. I think almost all the people are there, so whenever you want. So I will slowly start. I hope people had a chance to grab a nice coffee.

Just a brief recap: we talked about atomistic simulations, and about translating materials and molecules into design matrices using representations. Now comes the machine learning potential part. A machine learning potential is basically reputed to be this device that has accuracy on par with ab initio methods, but at a cost that is just a little higher than force fields. To compare with density functional theory: DFT can handle hundreds of atoms on a timescale of picoseconds, and with machine learning potentials we can do much, much more. This is mostly because of the favorable linear scaling, compared with the cubic scaling in the case of DFT, and I often run these simulations just on a laptop.

Now, how does it work? First of all, I would like to use a black-box view. We have certain configurations of atoms in our training set; we label them, meaning we compute the energies and forces using DFT, although it can be another electronic structure method; and then we feed this information to a neural network, although it could also be a Gaussian process or something else. When new configurations come in, the machine learning model can give us speedy predictions of the energies and forces associated with those structures. Of course, the black-box view may not be very satisfying to you, so here is an alternative view that starts from the atomic environments. If I invite you to look at these two configurations here, what do you see? Your answer might be: on the right we have a solid configuration, and the one on the left looks amorphous.
The reason why you think this one looks like a solid is that if we look at the individual atomic environments, if we sit on an atom and look at our neighborhood, we see very similar atomic environments over and over again; in this case, FCC. Actually, even within the liquid we have these similar, recurring atomic environments. It is very hard to see, but it is there; we can even find solid-like environments in the liquid. I will elaborate on this point later when we move on to the examples. Of course, this is very hard to identify with the naked eye, and therefore we rely on these popular representations, including the SOAP representation that we talked about during the previous hour, to characterize the atomic environments so we can compare them.

What does it mean in practice that we have similar atomic environments over and over again? It means that computing all the configurations with quantum mechanical methods is quite wasteful. The reason is that if we take the locality, the near-sightedness approximation, and assume the energy associated with each environment is almost completely determined by the environment itself, by the nearest neighbors, then we encounter the same environments over and over again, and recomputing them each time by solving quantum mechanics does not seem wise. What we can do instead is keep, in our memory, a collection of atomic environments together with the energies and forces associated with them. When a new configuration comes in, when a new environment comes in, we can just compare this new environment to the existing ones in our memory and then make a prediction.

So, to summarize: to construct a machine learning potential we basically follow a two-step process, regardless of which machine learning algorithm or which representation you actually use. We first collect a bunch of environments, and then we do interpolation. That basically sums up the machine learning potential.

With that, I would like to move on to the applications. Let me share my screen again. Okay. Cool. The first application is the system of water. Water is ubiquitous, but it has many mysterious properties that we often take for granted. For example, ice floats on water, which is quite unusual, because typically we expect the solid to be denser than the liquid; and liquid water is densest at four degrees Celsius. There is also a significant difference between heavy water and light water. We have many ice phases, at least 18 of them. And another mystery is that we have two ambient-pressure polymorphs, the hexagonal ice Ih and the cubic ice Ic. Energetically speaking, their enthalpies are essentially degenerate; however, in nature we only see hexagonal ice, which is why all snowflakes are hexagonal.

So we trained a machine learning potential, based on hybrid DFT, revPBE0 plus the D3 dispersion correction, using the Behler-Parrinello neural network, with about 1,500 configurations in the training set, with both energies and forces. The training set is publicly available, and you are more than welcome to look at it and play with it. And this is the standard 45-degree line that all machine learning papers show.
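The potential used here is a Behler-Parrinello neural network, but the "memory of environments plus interpolation" idea can be sketched even more simply with kernel ridge regression; the snippet below is only my own illustration of that idea (in a real fit one regresses total energies and forces, since per-environment energies are not directly observable):

```python
import numpy as np

def fit_krr(Phi_train, E_train, sigma=1.0, reg=1e-8):
    """Fit kernel ridge regression: Phi_train (n x d) environment descriptors,
    E_train (n,) reference energies associated with those environments."""
    d2 = np.sum((Phi_train[:, None, :] - Phi_train[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2 * sigma**2))                 # Gaussian kernel matrix
    return np.linalg.solve(K + reg * np.eye(len(K)), E_train)

def predict_energy(Phi_new, Phi_train, weights, sigma=1.0):
    """Predict energies of new environments as kernel-weighted sums over the 'memory'."""
    d2 = np.sum((Phi_new[:, None, :] - Phi_train[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma**2)) @ weights
```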
Then we can use the machine learning potential to do actual simulations. Here I am showing the density isobars for three phases of water: the liquid, shown in red, from simulation, as well as cubic ice and hexagonal ice; cubic and hexagonal ice have essentially the same volume. There are two sets of lines here, so what are they? The dashed lines are from classical simulations, treating the nuclei as classical particles, and the solid lines account for nuclear quantum effects, using the path integral molecular dynamics formalism we explained before. You can see that nuclear quantum effects actually make liquid water a little denser, by about 1%, and they also make the ice a little denser, which is quite counterintuitive. We also capture the density maximum of liquid water very nicely, at about four degrees Celsius. The experimental results are marked here with stars; we are within a couple of percent of the experimental observations. And here I show the radial distribution functions, oxygen-oxygen, oxygen-hydrogen and hydrogen-hydrogen, again from classical molecular dynamics simulations as well as path integral molecular dynamics simulations. For oxygen-oxygen, nuclear quantum effects do not play an important role, but for oxygen-hydrogen and hydrogen-hydrogen we really have to turn on the nuclear quantum effects to match the experimental observations nicely.

Now, a brief recap of thermodynamic integration; and again, I got the sign wrong here, it should be flipped. In practice, what we do is a thermodynamic integration from a harmonic system to the classical system, which is the first step of the integration, and then an integration from the classical system to the system with quantum mechanical nuclei, taking nuclear quantum effects into account. As a reminder, for this last step we are just integrating using the quantum mechanical kinetic energy.

Now I am going to show something that often makes people feel uncomfortable. I computed the classical chemical potential difference between the aforementioned ices, the cubic ice and the hexagonal ice, and I did that using two different fits of the neural network potential. You can see the results are different, and not just quantitatively different: the sign is different. Admittedly, the energy scale here is very small; we are looking at milli-electron-volts per molecule. But still, this tells us that just using the machine learning potential may not be enough to capture very fine free energy differences.

So what do we do? To present the problem schematically: we have a potential energy surface which is our ground truth, DFT in this case, and then we have the machine learning potential energy surface. The two are very similar, but there will inevitably be some small differences, and the differences can come from different sources. For example, the machine learning potential does not incorporate long-range interactions, but they are obviously there; the difference may also come from the training set being a bit sparse at certain points; and there is also the residual error of the fit. Now, how do we promote the machine learning potential results to the DFT level? How do we do this correction?
And we do not just want to do this correction for one particular configuration, but for all the relevant configurations. To write this down mathematically: the Gibbs free energy of the system described by DFT is minus $k_B T$ times the log of the partition function, and we can do the same for the machine learning potential. The difference between the actual (DFT) Gibbs free energy and the machine learning one can then be written in free energy perturbation form, $G^{\mathrm{DFT}} - G^{\mathrm{ML}} = -k_B T \ln \big\langle e^{-(U^{\mathrm{DFT}} - U^{\mathrm{ML}})/k_B T} \big\rangle_{\mathrm{ML}}$, i.e. an ensemble average, over configurations sampled with the machine learning potential, of the exponential of the energy difference for each configuration. Typically, free energy perturbation converges rather horribly; however, in this case, because the two potential energy surfaces are very similar, we can converge this estimator very rapidly, typically using fewer than 100 configurations.

So we computed this term for different phases of water under different thermodynamic conditions. Here I divide the Gibbs free energy by the number of molecules so we can plot the chemical potential. The correction is small, on the order of one milli-electron-volt per molecule, but it makes a difference: after I put this correction term on top of the graph I showed before, the chemical potential difference between cubic ice and hexagonal ice, I get converged results, and the predictions from the two fits of the neural network become consistent.

To summarize, here is the workflow of ab initio thermodynamics. The first part is what we have talked about before: we do thermodynamic integration to compute the classical and quantum mechanical free energies. And at the end we always add a correction term on top, to promote the neural network results to the ab initio level of theory.

Here are the results. We have the cubic ice and we have the hexagonal ice; we compute the neural network results, we add the correction, and then we add nuclear quantum effects. Here we can see that nuclear quantum effects actually have a major effect: they significantly stabilize hexagonal ice, making it ever so slightly more stable than the cubic one. So without nuclear quantum effects, maybe the snowflakes we see in nature would not have this nice hexagonal shape. Another result is the chemical potential difference between ice and liquid water, which we computed using umbrella sampling on coexistence systems. It is the same story: we first compute the neural network results, we correct them to the DFT level, and we add nuclear quantum effects. We can even consider not just H2O but also D2O, heavy water. Comparing with experiments, we are really within a hair of the experimental values, and not just that: even the difference between the melting points of D2O and H2O is predicted very accurately. Notice that classical water, the red line, and D2O, the green line, almost overlap: classical water and D2O have nearly the same chemical potential. Why is that? Because when we do the thermodynamic integration and look at the integrand, it changes sign along the path, so there is a partial cancellation of the nuclear quantum effects.

So this was water. The next example is hydrogen, and there we will also dig a little deeper into this locality argument, this near-sightedness.
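Before moving on to hydrogen, here is a minimal sketch (my own, units and names assumed for illustration) of the free-energy-perturbation correction just described, given the DFT and machine-learning energies of configurations sampled with the machine learning potential; dividing the result by the number of molecules gives the chemical-potential correction shown in the plots:

```python
import numpy as np

def fep_correction(u_dft, u_ml, temperature, k_B=8.617333e-5):
    """G_DFT - G_ML from configurations sampled with the ML potential:
    Delta G = -kT * ln < exp(-(U_DFT - U_ML)/kT) >_ML   (energies in eV)."""
    beta = 1.0 / (k_B * temperature)
    du = np.asarray(u_dft) - np.asarray(u_ml)
    du0 = du.mean()                    # shift by the mean for numerical stability
    return du0 - np.log(np.mean(np.exp(-beta * (du - du0)))) / beta
```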
Hydrogen is the dominant component in the interior of giant planets such as Jupiter. Near the surface the pressure is low and hydrogen takes the familiar diatomic molecular form, but as we approach the center the pressure goes up and the hydrogen starts to dissociate: it becomes atomic as well as metallic. Experimentally this regime is very difficult to probe, so it is a deeply controversial topic: under what pressure and temperature does this transition from molecular to metallic hydrogen happen, and what is its nature, first order or smooth? Using DFT molecular dynamics, because we are restricted to small systems and relatively short simulation times, the transition can mostly only be inferred from a kink in the equation of state. With the neural network potential, however, we are able to scan the whole phase diagram.

Here the color scale shows the average order parameter, defined as the fraction of bonded hydrogen. At low pressure and low temperature we have mostly molecular hydrogen, and at higher pressure and higher temperature we have atomic hydrogen. The black line here is the melting line that we computed, and the purple and orange lines are the locations of the density maxima and the heat-capacity maxima: if we plot the density and the heat capacity of the system along isobars and trace the locations of the maxima, we get the purple and orange lines on the phase diagram.

From this graph the transition looks smooth, but we want to characterize it a bit more, and we explain the system in terms of a regular solution model. In this picture, we are saying the system can be understood as a mixture of two liquids, the atomic liquid and the molecular liquid. In the regular solution model, which some of us might have studied during the undergrad, the total Gibbs free energy of the system as a function of the fraction x of one of the components can be written as the sum of the chemical potentials of the components, the mixing entropy, and an enthalpic penalty for mixing, schematically $G(x) = x\mu_A + (1-x)\mu_B + k_B T [x \ln x + (1-x)\ln(1-x)] + \Omega\, x(1-x)$. Under this regular solution model, when the temperature is high our system mixes perfectly, but when the temperature is below the critical point the two liquids phase-separate. So now the game is to compute this free energy profile for our system, so that we can fit it to the regular solution model. We did just that: we computed the free energy profile as a function of the molecular fraction using metadynamics simulations, fitted this free energy profile to the regular solution model, and obtained the parameters as well as the critical point. Here is the critical point that we located: it sits just on the melting line, so above the melting line the system is supercritical according to us. And not just that: the machine learning potential also correctly captures the ground-state crystal structures at different pressures. Solid hydrogen is known to be very complicated and can form many, many polymorphs at low temperature and different pressures. The melting line also looks okay compared with previous experimental measurements.

Okay. Are there any related questions so far? Yes.
Could you read out the questions? I think now is the moment. Yes. Okay. There is a question from Andre; maybe Andre wants to speak up?

Hi. Yes. I was just wondering: you were saying at the beginning of the second part that for the neural networks we essentially collect the different environments and then use them to estimate the energy instead of recalculating the environments again and again. So I was wondering, in the data set that you collect, do you collect specifically the environments of single molecules, or are you storing snapshots of a bulk system at a particular temperature, or what?

So the systems in the training set are all bulk structures. Okay. In this particular case they are all configurations of liquid water, and the reason behind that I will actually explain in a bit. Okay, thank you.

And then there is a question from Yuxi; do you want to speak up? Yeah, thanks. I want to ask: is the correction to the machine learning model, the term U minus U_ML you mentioned, obtained by training a residual neural network? So in principle, I think one can do that. I have also seen people who train the difference not between the neural network and DFT, but between two different levels of electronic structure calculation: for example, you can train the difference between a hybrid DFT functional and PBE. So in principle it is possible, although, are you thinking about training the difference in the potential energy surface, or training the difference in the free energy? Because they are different: one is a high-dimensional object, and the other is a number, a scalar, as a function of pressure and temperature. Okay, so here you might require DFT calculations for this correction term, is that correct? Or maybe my understanding is wrong. So let me explain how this is done in practice. At a certain thermodynamic condition, let's say I am interested in this correction term at 300 Kelvin and one gigapascal, I run an MD simulation using the machine learning potential at that condition and collect uncorrelated configurations. Then I take these selected configurations, generated from the machine learning Hamiltonian, back to DFT, so I can compute this difference and from it the correction Δ. I see, I see. Thanks. Thank you.

Maybe I will take one more question from the Q&A: between PBE and the hybrid functional, how do we find the best reference for the machine learning potential? This just depends on the system, and there are two things here. First, the underlying electronic structure calculation: for all machine learning fits it is garbage in, garbage out. If the underlying theory is not great, then obviously we will not have a good machine learning potential. So the first step is always to benchmark the DFT so that we select a good reference. That selection is not always possible: for water it is clear which functionals are better; for high-pressure hydrogen it is a little bit of guesswork. Then, once we have selected the underlying theory, the question is the quality of the fit, and as of now this is still a little bit of an art; one needs to validate and refine the potential and so on.

Okay, so for now I will move back to the talk.
There's one last part about this argument of locality, because we have been explaining everything in terms of atomic environments; we stress this concept over and over again, but how good is it as an approximation? How local are things really? So we are going to explore this problem. Again, just as a brief recap: a machine learning potential starts from atomic environments, each atomic environment gives us an atomic energy, and we sum these up to get the total energy of the system. So let's look at the atomic energies. Here, on the x and y axes, I'm plotting the atomic energies from two different machine learning potentials. This is for water, and they are not correlated at all. At first I thought, okay, maybe this is due to how the energy is partitioned between oxygen and hydrogen. So then I compared the molecular energies, which is the sum of the atomic energies of the oxygen and hydrogens in each water molecule, from the two fits of the machine learning potential. Still they are not correlated. This basically tells us that the atomic energy that we rely on so heavily in machine learning potentials is really a mathematical device; it doesn't carry a deep physical meaning. Now, the reason why I started looking into this is that back then I was thinking about the problem of heat conductivity. Heat conductivity is a very important transport property, and it is also an input parameter for fluid dynamics and other types of continuum modeling. The typical way of computing the heat conductivity is to use the Green-Kubo relation, which is basically obtained by taking the time integral of the autocorrelation function of the heat flux. Now, what is the problem here? The integral runs to infinite time, but the autocorrelation always carries noise, a Gaussian noise, so if you actually integrate to infinite time the result diverges; but if you cut off prematurely, and the signal does have a long decaying tail, then you bias the estimate. Moreover, the conventional expression for the heat flux relies on the atomic energies that we have just talked about, and also on pairwise forces between pairs of atoms, and none of these are well defined in the machine learning potential setting, as well as in many other settings. Okay, so luckily we have actually found a formulation that allows us to compute the heat conductivity independently of the Green-Kubo relation, independently of the heat flux. How it works is that we start from the particle density field, which is a well-defined quantity, and then we take a Fourier expansion of this field in space, which gives us the density component, the rho-tilde field, at each wave vector k. Now, if you work through the hydrodynamic equations, which my math was not good enough to do, but these things were solved in the sixties by the fluid dynamics people, it turns out that the autocorrelation function of this rho-tilde field has two modes. One mode is an exponentially decaying mode, the heat mode, which carries the information about the heat conductivity; and the second mode is an oscillatory one related to sound propagation. Here are the hydrodynamic equations, which we will skip.
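To make the density-fluctuation idea concrete, here is a minimal numpy sketch of the central quantity: the Fourier component of the particle density, rho_k(t) = sum_j exp(-i k . r_j(t)), and its time autocorrelation, which is what gets fitted to the hydrodynamic two-mode expression (the fit itself is not shown). The random-walk "trajectory", the box size and the wave vector below are placeholders.

```python
import numpy as np

def rho_k(trajectory, k):
    """trajectory: (n_frames, n_atoms, 3) positions; k: (3,) wave vector."""
    phases = trajectory @ np.asarray(k)       # (n_frames, n_atoms) dot products k . r_j
    return np.exp(-1j * phases).sum(axis=1)   # rho_k(t), one complex number per frame

def autocorrelation(a, max_lag):
    """Time autocorrelation of a (possibly complex) signal, up to max_lag."""
    a = a - a.mean()
    return np.array([np.mean(a[: len(a) - lag] * np.conj(a[lag:])).real
                     for lag in range(max_lag)])

# Placeholder "trajectory": a random walk in a cubic box of side L, probed at the
# smallest non-zero wave vector compatible with that box.
L, n_atoms, n_frames = 20.0, 64, 2000
traj = np.cumsum(np.random.normal(0.0, 0.05, (n_frames, n_atoms, 3)), axis=0) % L
k = np.array([2.0 * np.pi / L, 0.0, 0.0])
acf = autocorrelation(rho_k(traj, k), max_lag=500)  # fit this to the two-mode form
```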
So then we did just that, but first of all we wanted to do some benchmarking. We benchmarked on Lennard-Jones systems, because for Lennard-Jones there is a pairwise potential and we can compute the heat conductivity very easily using the Green-Kubo relation. We compute the autocorrelation function and fit it to the hydrodynamic expression; you can see that the fit and our actual simulation basically overlap perfectly. We can also look at the power spectrum, where we see two peaks: the first peak is the exponential heat mode that we talked about, and the second is the sound propagation. Then we compute the heat conductivity at different wave vectors k and extrapolate to k equal to zero, which gives the macroscopic heat conductivity; this can also be computed from the Green-Kubo relation, and the two agree. We did this at different thermodynamic conditions, so basically what we call the wave method gives estimates consistent with Green-Kubo for Lennard-Jones systems at many different conditions. With such validation, we can use this method for other systems; for example, we computed the heat conductivity of high-pressure hydrogen. Again, we compute the autocorrelation function and from it we extract the heat conductivity. Okay, so for the last part: this is a little bit of a bittersweet story, but the next example may build more confidence about the locality of the machine learning potential. This is related to the question that was asked previously: what is in the training set? We have bulk liquid water in the training set of the machine learning potential. But remember that we actually used the model to compute cubic ice and hexagonal ice, and they worked fine. So I was thinking: how much can we extrapolate from this machine learning potential? Is it applicable to other ice phases as well? So we took, from a study that collected many ice phases, all the experimentally confirmed ones as well as many hypothetical ones. They made a map using sketch-map; you can also do a PCA map of the ice phases. We took 54 representative ice phases and, using the same framework that we have talked about, we compared them with the liquid water configurations in our training set. You can see that ice and water appear at different places on the PCA map, which is understandable; they should be different. However, the interesting thing is that if we do not compare the global structures but instead project down the atomic environments, we find that the local environments in liquid water almost completely cover the environments encountered in the 54 ice phases. What does this mean? It means that we have collected all the relevant atomic environments for these ice phases, although our training set is built entirely on liquid water. Because of that, this machine learning potential trained on liquid water is able to predict various properties such as the density, the lattice energy, as well as the full phonon density of states. These are the 54 phases, and we can zoom in on individual ones, and for each one the agreement is excellent. And because of that, we are also able to use this machine learning potential to compute the phase diagram of water.
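The locality check just described can be sketched as follows: fit a PCA on descriptors of the liquid-water training environments and project the ice-phase environments onto the same axes, to see whether they are already covered. The descriptor function below is a stand-in that returns random vectors so the script runs; in practice one would use real local-environment descriptors (for example SOAP vectors), and the numbers of configurations are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

def describe_environments(structures):
    """Stand-in descriptor: one vector per atomic environment (random placeholder)."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(structures) * 64, 30))

water_desc = describe_environments(range(100))  # liquid-water training configurations
ice_desc = describe_environments(range(54))     # environments from the 54 ice phases

pca = PCA(n_components=2).fit(water_desc)       # axes defined by the liquid only
water_xy = pca.transform(water_desc)
ice_xy = pca.transform(ice_desc)
# If the ice points fall inside the cloud of water points, the training set
# already contains the relevant local environments for the ice phases.
```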
So there we have, again, the machine learning prediction, but we always add the correction terms on top, and we can choose to correct it to revPBE0-D3, which is the level of theory we used to fit the machine learning potential, or we can correct it to a different DFT level of theory such as PBE0-D3 or B3LYP-D3. Those give slightly different phase diagrams, and overall the agreement with experiment is very good, better than the existing empirical water potentials. And again, nuclear quantum effects play a very important role here in shifting the phase boundaries around. And that's basically it. The take-home message is that machine learning potentials are a very powerful tool: we can now compute ab initio phase diagrams. There are still a lot of things we do not fully understand about machine learning potentials, and I think there will be a lot going on in that direction, particularly regarding long-range interactions. It is probably also a good time to revisit the typical simulation tools that we use, to better utilize state-of-the-art machine learning potentials. With that, that's the end of my talk, and I would like to answer more questions. Okay, so back to the Q&A. There's a question from Muhammad. Hi. Hello. I want to know: is this correction to the MLP just for light elements, because of nuclear motion? Thank you. So these are actually two separate things. The nuclear quantum effects correction is needed for light elements: even if you run an ab initio MD simulation, you still need to consider nuclear quantum effects. The correction term, on the other hand, is needed if you want to correct the residual error in your machine learning potential, the error that comes from the potential energy surface being slightly different from the ground truth. And that is the case regardless of whether you run classical MD simulations or path-integral molecular dynamics simulations. Thank you. Thank you. And then there's a question from William. William, could you please turn on your audio? Yes, thank you. Sorry. If I understood correctly, when you train the neural network on the data, you need to define some local environments for the particles, which seem to be very similar across the sample. What happens when you have a phase transition, where the local environments can become really large? How can you define that kind of local environment? Right, thanks for the question. So in practice, how do we decide on the local environment? It's a little bit of trial and error. There is a trade-off: if you select a smaller environment, the neural network is much cheaper to train and to use; if you select a larger environment, that captures more long-range interaction, but it's also more expensive to train and to use. So in practice, what we do is select different local environments, train the network separately with each of them, see what happens, and pick an optimal combination. Now, related to your question about the phase transition: personally, I'm not sure a phase transition would dramatically change the required size of the atomic environment. For example, in the case of liquid water we always used six angstrom for the cutoff throughout, and as I showed earlier in the talk, the machine learning potential describes both liquid water and the ice phases very well. Thank you, and thank you for the very interesting talk. Thank you.
I think we are a minute or so over time. Maybe I'll take two more questions. Okay. So there's a question from Juan, if that's how the name is pronounced. Yes. Juan, we cannot hear you. I think maybe you can read the question, because Juan is unmuted, but... Okay, I'll do that. So Juan asks: when our results don't fully coincide with experiments, can we reverse-engineer the neural network potential to reconstruct the neighboring environment? I'm thinking about a distribution function, like a g(r), or something like that. So, from what I understood from this question: there is a residual difference between the machine learning prediction and experiments, which comes from different sources, the most important probably being that the DFT functional we use involves approximations. I do think there will be a lot of opportunity to add another correction term on top of the machine learning potential to make it match experiments a little bit better. I don't think this has been done yet, although in principle, since people routinely do that when they build force fields for proteins, RNA and DNA, it seems to be possible; I just haven't seen anything in that direction yet. Okay, so let's take one last question, from Robson. Hello. Hello. Thank you for the talk. I'm just wondering, since we can map the phase diagrams, can we also determine the nature of the phase boundaries from the MLPs? Yeah. So first of all, personally I'm a little bit on the cautious side when it comes to the nature of the phase transition. In the case of ice and liquid water, when we go from one phase to the other, this typically happens through nucleation, so there will be an interface between ice and liquid, for example. Intuitively, I think that if you have an interface, long-range interactions are going to be more important than when you just have the two bulk phases. Because the machine learning potential is short-ranged, I feel a little bit uneasy using it to characterize interfacial phenomena, although maybe it's not a problem. So that's my sense. Thank you. Bing Ching, as you prefer, you can go ahead; we have time on Zoom, but if you cannot continue, we can stop here. Okay, so maybe another two questions then. Okay, let's see. There's a question from, I'm sorry if I mispronounce your name, My Tane. My Tane, yes. Hi, you actually answered my question already; I had some Zoom problems, but you covered it later on. So thank you. Okay. And then there's a question from Mauricio. I hope nobody is keeping a scoreboard of how many names I have mispronounced today. Let's see, Mauricio. Mauricio, you should unmute yourself. He's not replying, so let me read it myself. Okay. So Mauricio, if that is the name, asks: I understand machine learning potentials are very hard to generalize, so there cannot be a machine learning equivalent of CHARMM or OPLS, which work well for some families of materials. Can you elaborate on this? So, the machine learning potentials that I have trained, also because I am on the lazy side, are each for a single system. But I have seen machine learning potentials for a whole class of molecules; the one that comes to my mind is ANI, I think it's ANI-1ccx, something like that. I think they were trained on QM9.
They were trained on a data set of small molecules, and as a result the potential is applicable to a very large collection of small molecules. And I believe SchNet, from Klaus-Robert Müller, Alexandre Tkatchenko and their co-workers, should also be applicable; I think it can also be trained on collections of small molecules. Now, back to ANI: they also use the Behler-Parrinello neural network architecture, so it is actually the same architecture as the one I have used. So it is really by choice that I didn't train a neural network that can be generalized to other systems; it is possible. Okay, so let's go to the last question, from Leonardo. Can you hear me? Yes. Okay. Since you have many experimental phase diagrams, can you use those as a target for your training data set? For example, you could take many studies and artificially create a data file with those data, and use that as a target when training, as a shortcut for your training data set and your results. You could, for example, take the structures already in your database, compare them with published results, and use that as a way to, how can I say it, smooth out the differences between your predictions and the DFT and such, or am I being... So what type of experimental observation are you referring to? For example, the specific heat: you can pinpoint a phase transition from a peak in the specific heat. I work with other models, and generally a peak in the specific heat indicates a phase transition. Could you use the experimental specific heat as a training target, as a way to create phase diagrams? Or, for example, take experimental phase diagrams from different papers and use those; or am I making a mistake? So I think this basically brings us back to the issue we have already raised. My thinking on that problem is: let's say we have a machine learning potential energy surface that we trained from DFT, and we also have experimental observations; how do we build a framework that utilizes both types of data? This hasn't been done, and my hunch is that one way of doing it is basically to include the experimental observable in the loss function when we train. But this is not obvious, because when we train the machine learning potential against DFT, we are basically matching energies and forces. The experimental observables, particularly the heat capacity, or the heat diffusion, that you mentioned, are not a simple function of the atomic configurations; they are not directly related to the atomic environments, but are related to the atomic configurations in a very, very complex way. So it is not completely obvious how to build a loss function that also incorporates experimental observables, although in principle this can be done. Okay, thank you. That answered my question. Thank you. Okay, so I will say that's all for the day. Do the organizers have something else to say? No. Thank you very much. Yes. Remember the next session starts earlier, right, Asya? Yes, exactly. The next session is at 12:30 European time, so just check your time zone; it is one hour earlier. Okay, thank you very much. Thank you very much for organizing. Thanks to you, Bing Ching. Thank you very much again. It was a nice talk.
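Returning briefly to the last technical point: the idea floated in that answer, putting experimental observables into the training loss alongside DFT energies and forces, could look schematically like the function below. This is only a sketch of the speaker's hunch, not an established recipe; the weights, the observable and all argument names are illustrative assumptions.

```python
import numpy as np

def combined_loss(e_pred, e_dft, f_pred, f_dft, obs_pred, obs_exp,
                  w_energy=1.0, w_force=0.1, w_obs=10.0):
    """Energy/force matching against DFT plus a penalty on one experimental observable."""
    loss_e = np.mean((np.asarray(e_pred) - np.asarray(e_dft)) ** 2)
    loss_f = np.mean((np.asarray(f_pred) - np.asarray(f_dft)) ** 2)
    # The observable (e.g. a heat capacity) is a functional of whole trajectories,
    # not of single configurations, which is what makes this term hard in practice.
    loss_o = (obs_pred - obs_exp) ** 2
    return w_energy * loss_e + w_force * loss_f + w_obs * loss_o
```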
I think the participants enjoyed it, because we are getting really a lot of messages, so you can read them afterwards. Okay, bye. Bye. Bye. I'll be here next week.