Okay, so welcome back to this last session of the MACS conference. As you will see, the program has changed slightly: Alessandro Cullione apologizes but is unable to be here today, so we have to shift the program up, essentially. The session is on deep data analytics and materials science, so we will hear a lot about machine learning in these last three talks. There will be some brief closing remarks by Elisa Molinari at the end, after which we are all encouraged to stay: we are going to have a very short ceremony to award the diplomas to the students of the joint ICTP-SISSA master's program in HPC, so you are all of course invited to stay. It will take just a few minutes to give the certificates to the students. So we start with the first talk of the session, which is Roberto Car on deep neural networks and molecular dynamics. Please.

Thank you, Sandro. I must say that this work is the thesis project of an excellent student, Linfeng Zhang, and in fact I learned from him everything I know about neural networks.

So, what are deep neural networks? They are, let's say, a class of algorithms that can learn a very complex functional dependence of some physical property on suitable descriptors. Of course, in order to learn, they have to be properly trained. In some sense one can say that deep neural networks are a sort of interpolation technique, but they are much more powerful than standard interpolation approaches such as splines or Fourier series. In particular, they seemingly do not suffer from the curse of dimensionality. That seems to be their most important property, and one may ask why. I put a question mark there, because there are mathematicians trying to understand more fundamentally why they work this way, but so far my mathematician collaborators tell me there are no theorems that really prove it.

So let's see what we do with deep neural networks and molecular dynamics. I will talk about two applications. In the first one, I will discuss how these networks can learn the many-body potential energy surface obtained from ab initio molecular dynamics simulations, and in that way boost the accessible size and time scales, because they make possible simulations of ab initio molecular dynamics quality at the cost of an empirical force field. Remember that ab initio molecular dynamics is a simulation in which the potential energy surface is derived from the instantaneous electronic ground state. The second application is to coarse graining, where deep neural networks open new perspectives. Let me make a comment here: in the first case the potential energy surface is available from what I call the underlying microscopic physical model, but when we coarse-grain, that is, when we eliminate some variables, the potential energy surface is a free-energy landscape, and we do not know its explicit form; we can only obtain it from simulation. So I will discuss two schemes that we have developed at Princeton: one is called deep potential molecular dynamics (DPMD) and the other is called deep coarse graining (DPCG).
Now, what does "deep" mean? It means that the neural network is made of multiple layers, and that is what produced the big improvement over shallow networks, those with a single hidden layer, because having more layers allows one to capture rather complex nonlinear dependences on the inputs.

So let me discuss the DPMD approach first. In this approach, the atomic coordinates of a many-atom system are transformed into descriptors, and these descriptors are given as input to a deep neural network. They have to be given in a way that preserves the translational, rotational, and permutational symmetries of the system. In output, the deep neural network gives the potential energy surface of the system as a sum of atomic contributions: the energy of the system is a sum of atomic energies, where each atomic energy depends on the coordinates of the atoms in the local environment of that atom, within a cutoff radius Rc. Obviously these atomic energies do not individually have a physical meaning, but their sum does: it is the potential energy of the system. There is a paper on the arXiv, and also a paper in this journal, that describe the general concept of the representation that we use.

The representation is indicated here. I will use water as an example through most of this talk, so these are water molecules. For each atom, for instance this hydrogen atom here taken as the origin, we define a local reference frame: one axis connects it to its oxygen, another axis is orthogonal to the plane of the molecule, and the third axis is orthogonal to the other two. In this frame we have the coordinates of all the atoms within the coordination sphere, and when we want to give the full information we give essentially the distances and the angles. We give them with some redundancy: there is a factor 1/Rij, where i is the tagged atom at the origin and j runs over the atoms in the environment, and this 1/R factor helps because it automatically makes the more distant atoms count less. We give the x, y, and z coordinates scaled in the same way. And what do we do with the atoms in this reference frame? We sort them in order of ascending distance from the atom at the origin. Moreover, when we optimize the parameters of the network, the weights, we assign the same weights to atoms of the same species. So in the end the energy is given as a sum of atomic energies, and because of this additive form, and because of the way we sort the atoms, the scheme automatically preserves the symmetries of the system.
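[To make the construction concrete, here is a minimal sketch in Python of the descriptor for one tagged hydrogen, as just described. The function names, the exact descriptor layout, and the cutoff value are illustrative choices of this transcript's editor, not the actual DPMD code.]

```python
import numpy as np

def local_frame(r_tag, r_O, r_H2):
    """Local frame of a tagged hydrogen (positions as float arrays):
    e1 points from the hydrogen to its oxygen, e3 is normal to the
    molecular plane, e2 completes the right-handed triad."""
    e1 = r_O - r_tag
    e1 = e1 / np.linalg.norm(e1)
    e3 = np.cross(e1, r_H2 - r_tag)
    e3 = e3 / np.linalg.norm(e3)
    e2 = np.cross(e3, e1)
    return np.stack([e1, e2, e3])            # rows are the frame axes

def descriptors(r_tag, frame, r_env, species, rc=6.0):
    """For each neighbor j within the cutoff rc, build the descriptor
    (1/R, x/R^2, y/R^2, z/R^2) in the local frame; the 1/R weighting
    makes distant atoms count less, and sorting each species block by
    ascending distance enforces permutational symmetry."""
    blocks = {}
    for s in sorted(set(species)):
        rows = []
        for rj, sj in zip(r_env, species):
            if sj != s:
                continue
            d = frame @ (rj - r_tag)         # coordinates in the local frame
            R = np.linalg.norm(d)
            if 1e-10 < R < rc:               # skip the tagged atom itself
                rows.append((R, 1.0 / R, *(d / R**2)))
        rows.sort(key=lambda t: t[0])        # ascending distance
        blocks[s] = np.array([row[1:] for row in rows])
    return blocks
```

[Each species block is then fed to a sub-network whose weights are shared among atoms of the same species; the outputs are the atomic energies, and their sum is the total energy. The additive form plus the sorting is what preserves the symmetries, as described above.]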
So, essentially, this is the schematic of how it works. The coordinates are transformed into descriptors for each atom; these are all the atoms in the system, here are the descriptors of the environment of one atom, and they go through a series of hidden layers in the network, a sequence of well-defined linear and nonlinear transformations that depend on the weights, these parameters. The parameters that define the network are optimized by minimizing this loss function. In the water example that I am considering, the reference data come from path-integral ab initio molecular dynamics at constant pressure and temperature. The first term is the deviation in the energy per atom between what we get with given values of the parameters and what we would like to have from the reference calculation. These are the forces. The forces are very important: in principle one could use only the energy, because the energy is an analytic function of the coordinates and the forces follow from it, but then one would need enormously long trajectories in order to learn the potential energy surface. Why keep the energy at all? Because by including the energy we make sure that we also get the average energy right; the forces determine the potential only up to an arbitrary constant, and we do not want that constant left undetermined. Then there is the virial tensor, again divided by the number of atoms. The prefactors p are parameters that we adjust during training so that initially the forces play the more important role, and at the end the energy becomes important.

The optimization is done with the stochastic Adam method, which is essentially a local optimizer; from what I understand it is a sort of preconditioned gradient descent, but stochastic, because given the large amount of data one cannot use all of it at once, so at each step one uses a subset of the data selected stochastically. From what Linfeng told me, the programs to do this are available from Google, so one does not have to code all that machinery oneself. At the end of this procedure we get a potential energy surface of this form. One can see it as a sort of realization of an embedded-atom method, but one in which the dependence of the local energy on the coordinates of the neighboring atoms is very general and complex.
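[In symbols, a plausible form of the loss just described, reconstructed from the spoken description (the notation is the editor's: $\Delta$ denotes the deviation of the network prediction from the ab initio reference, $N$ is the number of atoms, and $p_\epsilon, p_f, p_\xi$ are the adjustable prefactors scheduled from force-dominated to energy-dominated during training):

$$
L(p_\epsilon, p_f, p_\xi) \;=\; p_\epsilon\,\Delta\epsilon^2 \;+\; \frac{p_f}{3N}\sum_{i=1}^{N}\left|\Delta\mathbf{F}_i\right|^2 \;+\; \frac{p_\xi}{9N}\,\left\lVert\Delta\Xi\right\rVert^2,
$$

where $\epsilon$ is the energy per atom, $\mathbf{F}_i$ the force on atom $i$, and $\Xi$ the virial tensor.]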
So now that I have described how it is done: how well does it work? In this case we do path-integral ab initio molecular dynamics for water, and the comparison here is at the PBE0-TS level. We compare the pair correlation functions obtained with, I have to come closer because I cannot read, this is DPMD, the red one, this is also DPMD; the continuous lines are DPMD and the dashed lines are those obtained from the DFT calculation. You see that the agreement for all these correlation functions is extremely good. This is a three-body correlation function, the bond-angle distribution between oxygen bonds, and again the agreement is very good; and this is the Steinhardt Q6 order parameter, which actually includes information on four-body correlations as well.

Now, I think the little discrepancies one sees there are actually due to the fact that the original path-integral AIMD trajectories were too short, so there is noise in them: those trajectories are, I think, about 10 picoseconds, whereas the DPMD curves have been obtained by running for, I think, 300 or 500 picoseconds, something like that, so there is much less noise. One can see here that they have also simulated data for ice, but let us just consider liquid water: this is the average energy we get, this one I think from DPMD and this one from the original microscopic model, and this is the average density in the two calculations.

Now let me move on. As I said, the approach scales linearly, because each local energy depends only on the coordinates of the atoms within the environment. Here there is a comparison: this is linear scaling, this is what we get with DPMD, this is an empirical force field, but a force field with rigid molecules, so it is cheaper than DPMD; yet DPMD is much cheaper than the original density functional calculations, these at the PBE-TS level and these at the PBE0-TS level.

So much for the first part of the talk; now I want to discuss coarse graining. This is work in preparation, and again I use two examples. The first one is again water: essentially we want to generate a coarse-grained potential for water that retains only the oxygens, eliminating the hydrogens. For the experts in the water community, there is the Molinero water potential, which is a sort of Stillinger-Weber potential with parameters adjusted for water, and our coarse-grained model plays a similar role. Now, this coarse-grained potential, as a function of the coarse-grained coordinates, is obtained by projecting the Boltzmann weight onto given values of the coarse-grained coordinates: it is the potential of mean force, which is a free energy, and these are the forces corresponding to this potential. The problem is that in the previous case we had the explicit form of the potential V, but now we do not, because it is a free energy. So what can we do? By definition, the force corresponding to the free energy, where I now write psi without the CG subscript for the coarse-grained coordinates, is given by an average over a constrained trajectory in which the system evolves keeping the value of the coarse-grained coordinates fixed. For the instantaneous estimator of this force there are various choices; we use a formula given by Ciccotti, Kapral, and Vanden-Eijnden, which I do not report here, but it is simple to compute. Now we can do the following: in principle, the loss function depending on the weights, to be minimized again with the Adam optimizer, would be given by the deviation of this mean force, which involves a constrained trajectory, from the corresponding derivative of the network potential, and this quantity has to be minimized. Here we have a sum: M is the number of coarse-grained coordinates, so if the coarse-grained sites are the oxygens it is the number of oxygens in the system; lowercase d is the dimensionality of space, so three; and uppercase D refers to the data set from these constrained simulations.
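[Written out, this loss could take the following form; this is the editor's reconstruction from the spoken description, with $w$ the network weights and $\boldsymbol\psi^{(k)}$ the $k$-th recorded coarse-grained configuration:

$$
L(w) \;=\; \frac{1}{M\,d\,D}\sum_{k=1}^{D}\sum_{I=1}^{M}\left|\,\overline{\mathbf{F}}_I\big(\boldsymbol\psi^{(k)}\big) \;+\; \nabla_{\psi_I} V^{\mathrm{CG}}_{w}\big(\boldsymbol\psi^{(k)}\big)\right|^2,
$$

where $\overline{\mathbf{F}}_I$ is the mean force on coarse-grained coordinate $I$ estimated from a constrained trajectory, and $V^{\mathrm{CG}}_{w}$ is the network free-energy surface.]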
Now, the point is that if we try to use this formula for water, it does not work: it requires simulations that are extremely long. Think about it: if you constrain the coordinates of the oxygens, there are various configurations of the hydrogen bonds that are compatible with that constrained oxygen configuration, and for the system to explore these different configurations one needs extremely long time scales. So we certainly cannot do this at the level of AIMD, but what we do is use DPMD, because there we can run much longer simulations. And in this case we find it much more convenient to use, rather than the mean force given by that formula, the instantaneous estimator of the force. Doing so amounts to an ergodicity requirement on the system, which is satisfied whenever the system samples, at equilibrium, a pure thermodynamic state.

Let me go quickly. This is again the comparison between AIMD, which was on a small system, and the coarse-grained model; here we have also done a somewhat larger system. This is DPMD in green, the coarse-grained model in red on the small system, and the coarse-grained model on the large system; again the agreement is good. These quantities are the angular correlations at different distances, and again the agreement, I think, is pretty good. Since I do not have time I will not talk about this, but let me say that equilibrium sampling with this DPCG is accelerated by approximately a factor of eight relative to DPMD in this system: we no longer have the hydrogens, so the Pauling ice rules have been coarse-grained away. The Molinero potential, for instance, has been very useful to study crystallization, and that is the kind of thing one could do with a potential that in principle has the same accuracy as the original ab initio molecular dynamics.

There are issues related to random noise and multiple minima; let me make just one comment on the multiple minima. As I said, Adam is a local optimizer, so if the loss has multiple minima it will get to one minimum, but there are others. In fact, if we do exactly the same training but start with a different initialization of the parameters, we get to different minima. However, what comes out is that the properties of the system are essentially insensitive to which minimum we reach, which means there must be some sort of profound invariance property that we do not yet understand; but that is what comes out.

Since I have to finish, let me present very briefly the last example. In the case I presented before, we used a number of coarse-grained coordinates that is still linear in the system size: we kept all the oxygens and just took out the hydrogens. But in many applications, when we want to study conformational changes or structural transformations, one has to map the free-energy landscape as a function of a few coarse-grained coordinates, or order parameters. So here I take as an example alanine dipeptide solvated in 342 water molecules; the water is described with a TIP model and the Amber force field is used for alanine. These two dihedral angles are the coarse-grained coordinates, and the coarse-grained description corresponds to this system without the solvating water, learning the potential from a simulation that includes the solvating water.
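[As an aside on how such a two-dimensional free-energy map is obtained: once the two dihedral angles have been sampled, the surface follows from the standard inverted-Boltzmann relation F(phi, psi) = -kB T ln P(phi, psi). A minimal sketch in Python, with the angle time series assumed already available as arrays; the variable names, bin count, and temperature are the editor's assumptions:]

```python
import numpy as np

kB_T = 2.494  # kB*T in kJ/mol, assuming T = 300 K

def free_energy_surface(phi, psi, bins=72):
    """Estimate F(phi, psi) = -kB*T ln P(phi, psi) from sampled
    dihedral-angle time series (in degrees, in [-180, 180))."""
    P, phi_edges, psi_edges = np.histogram2d(
        phi, psi, bins=bins, range=[[-180, 180], [-180, 180]], density=True)
    with np.errstate(divide="ignore"):   # empty bins give F = +inf
        F = -kB_T * np.log(P)
    F -= F[np.isfinite(F)].min()         # shift the global minimum to zero
    return F, phi_edges, psi_edges
```

[Note that poorly sampled bins end up at very high or infinite free energy, which is exactly where, as mentioned below, the statistics are bad near the edges of the barrier.]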
We did not use any fancy enhanced sampling technique; we just did a very brute-force calculation, though of course one could use such techniques. The brute-force calculation here is five microseconds to sample the microscopic model, and 600 nanoseconds are used for the coarse-grained model. This is the result for the free-energy surface, the model with solvent and the solvent-free model; you see that they look pretty similar. The differences, you see, accumulate at the edges of this barrier, because in this region we have bad statistics in the simulation, so we think that by using some enhanced sampling technique and improving the statistics this should definitely become even better. But this is just to show the kind of thing one can do.

Now, very quickly, the open issues; I just quote a few here. What is the transferability to different thermodynamic conditions? That is something that has to be explored. What to do with non-uniform systems? So far so good if the system is uniform, but non-uniform systems, with defects or interfaces, are actually the most interesting: there one would have to combine consistently potentials obtained by learning in different regions of space. And, well, I can skip this, but there is also the issue of what the dynamics of the coarse-grained system should be: in the microscopic system we have a well-defined dynamics, Newtonian dynamics in this case, but when we coarse-grain there are no closed deterministic equations of motion for the coarse-grained system, and that is something we intend to investigate. Typically one makes some approximation, for example invoking a separation of time scales, and then one can use techniques like the Mori-Zwanzig projection technique.

With that I want to conclude with the acknowledgements: this is the person mostly responsible for this work, Linfeng Zhang, and I should also mention Weinan E, a professor in the mathematics department at Princeton with whom we are collaborating on these issues. And I thank you for your attention.