And we start with Professor Boris Kozinsky from Harvard University and Bosch, who will present his work on, I don't want to mess it up, symmetry- and uncertainty-aware models of interatomic interactions for molecular dynamics. So, again, all these talks are recorded and will be available online, both in streaming and later on the YouTube page. The talk will be approximately 40 minutes; please keep your questions for the end. Thank you very much.

Thank you. Thank you to all the organizers; it's great to be here. Let me tell you a little bit about the work we've been doing over approximately the past three years, building on some of the techniques you've seen in the introductory lectures. I'll show some applications and maybe some additional techniques that are being developed to enable machine learning to enter the field of materials simulation and accelerate our calculations. The big picture is to go from fundamental first principles, the many-body quantum problem, the Schrödinger equation, all the way to large, complicated structures that we want to simulate at quantum accuracy, such as large organic molecules or catalytic surfaces. But you cannot do it in one step; it's just very difficult. First you have to make some approximations. The common approximation to the quantum problem is density functional theory, about which I'll say very little: there are some efforts where we are using machine learning techniques to develop new density functionals that make calculations cheaper and at the same time more accurate. The next piece, which I'll spend most of my time on, is interatomic potentials, which are of course very useful if you want faster simulations. Those simulations may be very large and still not give you good intuition, or they might still be too slow to access certain timescales, so you also need to reduce dimensionality, do some coarse-graining, and find collective variables. Again, if I have time, I'll mention a little bit of an effort there. Once you put all this together, you start from the very fine picture of the quantum interactions, learn density functionals, then learn machine-learning potentials, then reduce dimensionality, and then you can possibly get to large-scale simulation. So this is the overall vision of the pieces connecting together to enable realistic materials simulations and the understanding of complex phenomena.

Okay, so very briefly, I won't spend too much time on this, but I just want to mention the effort we started on learning exchange-correlation functionals for DFT, trying to do better than the existing semi-local functionals, which are not very good for a lot of problems involving, say, localized electrons or partially occupied orbitals that are not doing the right things. On a high level, what we want to do is bring machine learning into this field in a way that lets you run faster and at the same time more accurate density functional theory calculations. In this project, we're trying to learn one piece of it, the exchange energy, which is a non-local quantity of the density, and we're trying to learn it as an explicit non-local functional of the density. That can hopefully give you as high an accuracy as hybrid calculations without having to do all the expensive exact-exchange calculations, which are prohibitive in plane-wave settings.
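To make the general recipe concrete, here is a minimal sketch of regressing exchange energies on density-derived features with Gaussian process regression. The features are random placeholders standing in for the actual nonlocal CIDER descriptors, and the whole setup is illustrative only:

```python
# Minimal sketch: regress exchange energies on (placeholder) density-derived
# features with Gaussian process regression. The features below are random
# stand-ins, NOT the actual CIDER descriptors.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Pretend each system is summarized by a few nonlocal density features
# (e.g. convolutions of the density at different ranges -- here random).
X_train = rng.normal(size=(50, 4))                     # 50 systems, 4 features
E_x_train = -np.abs(X_train @ [1.0, 0.5, 0.2, 0.1])    # placeholder exchange energies

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X_train, E_x_train)

X_new = rng.normal(size=(5, 4))
E_pred, E_std = gp.predict(X_new, return_std=True)     # prediction + uncertainty
print(E_pred, E_std)
```

The real machinery differs in the feature construction and the kernel, but the regression-with-uncertainty structure is the same.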
We do this by introducing a set of features on the density and learning a model based on the Gaussian process regression formalism. The features on the density are non-local, and they're built up through these somewhat complicated-looking convolutions. You take as a reference the exchange energy, which you can calculate exactly, and ask: can you come up with a non-local model, explicitly a functional of the density, that captures the exchange energy? And it seems the answer is yes: we're able to capture the exchange energy better than some of the leading density functionals out there, so there's progress in this direction. That's all I wanted to say here, basically to show that the most difficult problem in this whole game is probably getting a good description of the quantum states and getting DFT to a higher accuracy level without sacrificing too much in cost, and this kind of approach seems to show promise, as indicated in these comparisons of performance between exact-exchange calculations and what we call CIDER calculations, which are based on this formalism.

Now, on to the main part of the discussion. Here we assume we have a quantum model which is good, so we use as a truth model some DFT or quantum chemistry method, and we want to learn an interatomic potential; in other words, we want to learn how the energy depends on the positions of the atoms. This is a many-body energy, and that's a difficult thing to learn: all the quantum mechanics is hidden in the dependence of the energy on the atomic positions and, of course, the species. We cannot do it with explicit quantum mechanics, which is just too slow for realistic calculations, and existing empirical potentials, as you well know, are not very good: they're very fast, but they're not transferable. What we want is a fast model that at the same time is, hopefully, as accurate as quantum mechanics, and this is where machine learning comes in. The question is how we learn this energy, from which we can then derive the forces and drive our molecular dynamics. We start, and pretty much everybody in the field starts, with a description of the atomic environment, its geometry. How you describe the geometry is very important, because on top of that description you'll build a learning model, either Gaussian process regression or a neural network, that tries to describe the potential energy as accurately as possible while being as cheap as possible to evaluate, so you can do large-scale, fast calculations. I'll introduce two methods we've been developing: one is called NequIP, based on neural networks; the other is called FLARE, based on Gaussian process regression. Let me start with the point that these models need to encode as much as possible of the geometry of the atomic structure: if you sacrifice something in the description, the accuracy will suffer. So what is the most complete way of capturing geometry? First of all, you want to make sure that the model satisfies all the physical symmetries. What we're talking about here is that the energy should be invariant with respect to rotation, inversion, and translation; these are the symmetries of 3D space, which together form the E(3) symmetry group, and in addition we have permutation of atoms of the same species.
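As a concrete illustration of the invariance requirement just described, here is a small sketch (all names hypothetical) showing that a descriptor built from sorted pairwise distances is unchanged under rotation, inversion, and translation:

```python
# Sketch: pairwise distances are invariant under rotation, inversion, and
# translation, so any descriptor built from them inherits E(3) invariance.
import numpy as np
from scipy.spatial.transform import Rotation

def pair_distances(positions):
    """Sorted pairwise distances: a simple E(3)-invariant descriptor."""
    diffs = positions[:, None, :] - positions[None, :, :]
    d = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(positions), k=1)
    return np.sort(d[iu])

rng = np.random.default_rng(0)
pos = rng.normal(size=(5, 3))                    # a random 5-atom cluster

R = Rotation.random(random_state=1).as_matrix()  # random rotation
t = rng.normal(size=3)                           # random translation
transformed = (-pos) @ R.T + t                   # inversion + rotation + shift

assert np.allclose(pair_distances(pos), pair_distances(transformed))
print("descriptor unchanged under E(3) operations")
```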
Once you build these symmetries into your model, into your description of the structure, you're learning a model that knows the right environment, and you're trying to build as much complexity and richness as possible into this description. Just to emphasize, there are various ways of building models. You can start by building features, and these features can satisfy these symmetries or not. If you don't impose any constraints, the features of your model can behave arbitrarily as you transform your structure. Obviously that's not what you want: the energy, at least, must satisfy these symmetries. The features from which you build the energy model can be invariant, meaning they do not change at all if you rotate the structure; or, the richer way of encoding more information, they can be equivariant, which means that as you rotate your structure in 3D, the features from which you build the model also rotate. So this is one way of saying that you have more encoding of the symmetry behavior in your features. This is what we call covariance, and covariance technically means that the feature map commutes with the operations of the symmetry group. This is something that Christoph already presented, so I won't go into too much detail.

And I want to show that if you want to build a neural network model of interatomic potentials, the typical way of doing it nowadays is to build a graph neural network with what's called message passing. Again, this was already covered in the introductory lecture, so I won't go into much detail, but the basic idea is that atoms talk to each other, sending each other messages through this graph neural network, and these messages carry what the atoms see around them. So this atom sees some atoms within a certain distance around itself; these are invariant properties, distances or angles, and you can pass these invariant messages to other atoms. Through multiple layers, these networks can encode a lot of information and can reach quite good accuracy when asked to learn the energy.

A generalization of this is to say: we know that building equivariant features gives more resolution of what the geometry is doing, because you now have the ability to carry in your features vectors or even tensors that know about orientation and about transformations of the space. So you can build an equivariant model, and this is one model we introduced, called NequIP, which instead of scalars, basically invariants, passes around equivariant messages that contain tensor information in general. So the features of this network consist of scalars, vectors, and tensors, and technically speaking, these are related to what are called irreducible representations of the symmetry group you're trying to operate in. The way we label them is using spherical harmonics. Spherical harmonics are a convenient basis for irreducible representations in 3D space; they basically allow you to expand any function on the sphere, just like you would expand a periodic function in terms of sines and cosines.
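To illustrate the difference between invariant and equivariant features, here is a minimal sketch, assuming a toy l = 1 feature built from unit bond vectors: rotating the structure rotates the feature, which is exactly the equivariance property f(Rx) = R f(x).

```python
# Sketch: an l=1 (vector) feature -- the sum of unit bond vectors around a
# central atom -- is equivariant: rotating the structure rotates the feature.
import numpy as np
from scipy.spatial.transform import Rotation

def l1_feature(neighbors):
    """Sum of unit vectors to neighbors: transforms like a vector (l = 1)."""
    units = neighbors / np.linalg.norm(neighbors, axis=1, keepdims=True)
    return units.sum(axis=0)

rng = np.random.default_rng(0)
neigh = rng.normal(size=(6, 3))                  # neighbor positions around origin
R = Rotation.random(random_state=1).as_matrix()

f = l1_feature(neigh)
f_rot = l1_feature(neigh @ R.T)                  # feature of the rotated structure

assert np.allclose(f_rot, R @ f)                 # equivariance: f(Rx) = R f(x)
print("l=1 feature rotates with the structure")
```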
You can then make the neural network operate in a non-trivial way, passing messages around, with convolutions that involve some way of multiplying, some way of coupling these features, which are tensors. The proper way to do it, the most general way, is using what's called a tensor product, an outer product, which takes two tensors and produces another tensor (a small worked example appears at the end of this passage). You build this network in such a way that everything inside, unlike a normal neural network which carries plain numbers, carries tensors, and the convolutions are tensor-product based. So this is basically taking a spherical harmonic, multiplying it by a tensor feature, and that gives you a message which you pass on to another atom. This way tensor information is passed around, the network becomes much more aware of the geometry of the structure, and it is much more accurate.

This is, as far as we are aware, the state of the art right now in accuracy on a bunch of benchmark data sets, such as the MD17 data set shown here. The comparison is made with invariant neural networks and other invariant models, basically showing that equivariance is essential for bringing higher accuracy and, at the same time, learning efficiency. In fact, these models compare well in data efficiency with kernel-based methods, which you've heard about as being the more data-efficient option; these neural networks seem to do as well, if not better. So this is a change of paradigm in some ways, saying that deep neural networks do not require millions of data points to be trained: in fact, they can train on rather small data sets, as long as you build the architecture in a way that respects the symmetries. There's another example for water, which shows that compared to the DeepMD model, which does not have equivariance, one is able to learn on 1,000 times fewer data points and still get higher accuracy in the forces. An interesting thing about these networks is that as soon as you go to higher rank, in other words from scalars to vectors, you change not only the accuracy but also the rate at which the network learns: on a log-log plot of force error versus training set size, the slope, the scaling law of learning, actually changes, which is something that's a bit puzzling and still not understood theoretically.

This is all good: the accuracy is high, but these networks are slow. Why are they slow? Because message passing inherently involves iterative interactions between atoms: one atom sees what another atom is seeing, that atom sees what yet another atom is seeing, and so information propagates. Why would you want this propagation over multiple steps? Well, you have multiple layers in the network, and this essentially induces many-body interactions: if you only have an interaction with your neighbor, that's a two-body interaction, but if a nearest neighbor sees its own nearest neighbors, you induce higher-order interactions and get a many-body character in your energy, which is what you want, because the energy, quantum mechanically, can be quite complicated.
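Here is the small worked example promised above: the tensor product of two l = 1 (vector) features is a rank-2 tensor whose 9 components split into the familiar l = 0, 1, 2 irreducible pieces.

```python
# Sketch: the tensor (outer) product of two l=1 features decomposes into
# irreducible pieces l = 0, 1, 2 -- a scalar (dot product), a vector
# (cross product), and a symmetric traceless rank-2 tensor.
import numpy as np

u = np.array([1.0, 2.0, -0.5])
v = np.array([0.3, -1.0, 2.0])

T = np.outer(u, v)                       # full rank-2 tensor, 9 components

scalar = np.trace(T)                     # l = 0 part: u . v   (1 component)
antisym = 0.5 * (T - T.T)                # l = 1 part: ~ u x v (3 components)
sym_traceless = 0.5 * (T + T.T) - (scalar / 3.0) * np.eye(3)  # l = 2 (5 components)

# The three pieces reassemble exactly into the original tensor product:
assert np.allclose(T, (scalar / 3.0) * np.eye(3) + antisym + sym_traceless)
print("1 (x) 1 = 0 (+) 1 (+) 2")
```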
As a side effect of this message passing, you have long-range communication between atoms. Maybe that's a good thing, because that way you can also capture longer-range interactions, and maybe it's not necessary; but what it does do, for sure, is increase the effective number of neighbors. If you use six layers, for instance, with a six-angstrom cutoff in, let's say, a bulk water simulation, you suddenly end up with roughly 20,000 effective neighbors for every atom on which you're trying to calculate a force (a back-of-the-envelope check appears at the end of this passage). And that makes it very difficult to scale: it's very, very difficult to parallelize over a large number of computational devices, because you have to carry all this information around. So this is not a scalable approach: you can do it on one GPU, but doing it even on two GPUs is not possible.

So here is an effort that tries to get around this. The goal is to keep equivariance, with its almost magical accuracy-inducing properties, but without the message passing that propagates information over long distances. This is done in the Allegro model, recently introduced on arXiv if you're interested. Instead of trying to absorb all the information into one atom, which may be limiting and forces the atoms to carry this information around and pass it to each other, this model assigns features to pairs of atoms. There are more pairs than atoms, so that costs you a little more, but as a result you can encode enough information on the edge, for instance between these two atoms here, and one edge can communicate with another edge, another pairwise interaction. This way you can still induce many-body character by convolving over these features in your network, but you can make the math work out so that information never propagates out of the immediate neighborhood. I won't spend too much time on the math here, but it is quite similar to NequIP. You have features on the edge between atoms i and j that interact with the spherical harmonics, basically information about the direction of another edge. So this has directionality; it has all the tensor information about the positions of the atoms other than atom j, so everything the other atoms k in the neighborhood are doing is encoded. And essentially, because you only have interactions between atoms i and j, i and k1, i and k2, and so forth, no information is passed directly between atom j and atom k2, for instance. So this is a strictly local model that is still equivariant and still many-body, and as a result it can scale.

Before we get to the scaling, I just want to show that the accuracy is no worse than NequIP on all the benchmark data sets, like MD17, and on the transferability test on the 3BPA molecule, which was explored in detail in the group of Gábor Csányi. The question there is: if the model is trained on low-temperature dynamics of this molecule, which has interesting torsional dynamics, can it extrapolate and predict what happens at higher temperature? Both the NequIP and Allegro models are able to do that quite well, which is good news: they can generalize out of domain in this kind of test, and they're comparable in accuracy.
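Here is the back-of-the-envelope check mentioned above, using an approximate atom density for bulk water (an assumption on my part):

```python
# Back-of-the-envelope check of the receptive-field problem: with 6 message-
# passing layers and a 6 A cutoff, the effective cutoff is 36 A. Assuming
# bulk water at roughly 0.1 atoms per cubic angstrom (counting H and O),
# count the atoms inside that sphere.
import math

n_layers, r_cut = 6, 6.0          # layers and per-layer cutoff, in angstrom
rho = 0.1                         # approximate atom density of water, atoms/A^3

r_eff = n_layers * r_cut                          # 36 A effective cutoff
n_neighbors = rho * (4.0 / 3.0) * math.pi * r_eff**3

print(f"effective cutoff: {r_eff} A, ~{n_neighbors:.0f} atoms in range")
# -> roughly 2e4 neighbors, consistent with the ~20,000 quoted above
```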
Now, on to what's special about Allegro: you can scale it, you can deploy it on multiple GPUs. Just to show you some strong-scaling data, this goes up to 400,000 atoms or so; I forget on exactly how many GPUs. But basically you can get very good performance on large-scale simulations, going even up to 100 million atoms in the case of 128 GPUs, and you still have the same accuracy, which is basically state of the art compared to quantum mechanics. There are complicated simulations, like glass dynamics and the diffusion of lithium ions in a realistic solid electrolyte consisting of multiple species, showing that these agree very well with DFT dynamics.

Okay. So that was neural networks. Neural networks are accurate, and they are now scalable. What they don't have is uncertainty, at least not yet. We need it, especially for simulations that involve rare events, such as reactions or phase transitions, and sometimes you don't know ahead of time what kind of situations you will end up in as you run your molecular dynamics. When we train our models, we often find, when we go to large-scale simulations, that something goes wrong and the model just explodes, and we don't know why, but it's probably outside of the training set: some forces got predicted incorrectly, and the model has no physical priors other than symmetry, so it doesn't know what to do when it's really out of domain. So you need to know the uncertainty; you need to know how unreliable the model is based on the geometry. One way to build this in is to use kernel methods. A kernel method allows you to compare one configuration with another, so if you can compare the configuration for which you're predicting with the structures in your training set, you can say whether it's close or far away. That way, you can build an algorithm that can complain, saying: I cannot predict this force, it's too far from my training set, I'm uncertain, I'm going to give you junk, so I'm not going to do it; I'm going to do something else instead, I'm going to run quantum mechanics and get that data point. This is what's called active learning, and you can automatically add configurations as you go, making sure that you don't get into trouble without knowing it. Basically, you're learning on the points that matter as you go through your dynamics, instead of the points that happen to be available, which you may have seen a million times and which don't add any more information.
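Schematically, the active learning loop looks like the toy sketch below: a 1D double well stands in for DFT, a scikit-learn Gaussian process stands in for the force field, and ground-truth calls are triggered only when the predictive uncertainty exceeds a threshold. This is a hypothetical illustration, not FLARE's actual API.

```python
# Toy active-learning loop: request "DFT" data only when the GP is uncertain.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def dft_force(x):                        # stand-in for an expensive DFT call
    return -4.0 * x**3 + 4.0 * x         # force of the double well V = x^4 - 2x^2

X, y = [[0.0]], [dft_force(0.0)]
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                              alpha=1e-8, optimizer=None)
gp.fit(X, y)

x, v, dt, threshold, n_calls = 0.0, 1.2, 0.05, 0.1, 1
for step in range(500):
    f, sigma = gp.predict([[x]], return_std=True)
    if sigma[0] > threshold:             # uncertain -> call "DFT" and retrain
        X.append([x]); y.append(dft_force(x)); n_calls += 1
        gp.fit(X, y)
        f, _ = gp.predict([[x]], return_std=True)
    v += f[0] * dt                       # simple unit-mass integrator step
    x += v * dt
print(f"{n_calls} 'DFT' calls over 500 MD steps")
```

Early in the run almost every step triggers a call; once the visited region is covered, the model runs on its own, which is the behavior described for the aluminum example below.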
How do we do it, technically speaking? This starts from another thing you've seen, a very complete way of representing atomic environments called the atomic cluster expansion, introduced by Ralf Drautz a few years ago. This again starts by resolving a configuration in radial functions and spherical harmonics; it's one choice of basis. From these functions you can construct equivariant descriptors, where the equivariance comes from the fact that spherical harmonics are equivariant. And then, because you want to construct a kernel, some sort of scalar needs to come out, so you form an invariant by taking a product of these equivariant coefficients with themselves and summing over one of the indices, using mathematical theorems about spherical harmonics to arrive at these invariants. From the invariants, which end up being the descriptors of your atomic environment, you can build a kernel. For these kernels you can use a dot-product kernel, which is what we use, or the dot product of the descriptors raised to some power, as inspired by the GAP potential formalism, and with that you can build a model of the local energy. In other words, you give it as input a description of the local environment, where the atoms are around the central atom, build the descriptors from the ACE formalism up to a certain body order and a certain resolution in the angular and radial basis, and then do Gaussian process regression. I won't go through the details; you've seen this. Basically, you construct kernels between the point on which you're predicting, via its descriptor, and the points in your training set; you can use sparse Gaussian processes, which reduce the number of training points, but the principle is pretty much the same. The key point is that Gaussian process regression is a Bayesian method that gives you not only a prediction of the value of the energy but also a posterior distribution on the prediction, from which you can derive the variance, and the variance is the uncertainty (a minimal sketch of this appears below). As illustrated here, if you're close to a training point, your variance is low and your uncertainty is low; if you're far away, the variance grows, and this gives you the ability to judge how far you are from the points in your training set and to assess whether you're comfortable with the reliability of the prediction.

And you can automate this. Every prediction now comes with an uncertainty; if the uncertainty is too high, you say: okay, no more, I'm going to call DFT on this exact structure and make sure I have a reliable label. This is the active learning loop introduced in the FLARE software. And here it is in action, trying to learn a phase of an aluminum crystal from scratch: nothing at all is given to the model, the interactions are entirely unknown, and it starts requesting data. These black points are the DFT calculations being called because the uncertainty is still too high. At some point the uncertainty gets lower, because the model now has a lot of points covering the space of configurations, and it says: okay, I don't need any more DFT. Then, after this is done, we turn up the temperature to melt it, a completely different state that it hasn't seen before, and it reacts automatically, saying: the uncertainty is high again, I'm seeing new things. So this is a very simple example of how active learning gives you an automatic way of developing interatomic potentials, essentially without you having to think about what to put in your training set.
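Here is the minimal sketch of the Gaussian process predictive variance referred to above: it is small near training points and grows as you move away, which is exactly the quantity the active learning loop thresholds.

```python
# Minimal sketch of where the uncertainty comes from in GP regression:
# the posterior variance  sigma^2(x) = k(x,x) - k_x^T (K + s^2 I)^-1 k_x
# shrinks near training points and grows away from them.
import numpy as np

def kernel(a, b, ls=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

X = np.array([0.0, 1.0, 2.5])           # training inputs
noise = 1e-6
K_inv = np.linalg.inv(kernel(X, X) + noise * np.eye(len(X)))

for x in [0.0, 0.5, 5.0]:
    k_x = kernel(np.array([x]), X)[0]
    var = 1.0 - k_x @ K_inv @ k_x       # k(x,x) = 1 for this RBF kernel
    print(f"x = {x}: predictive std = {np.sqrt(max(var, 0.0)):.3f}")
# near a training point (x=0) the std is ~0; far away (x=5) it approaches 1.
```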
Now you might say this is going to be slow, because the more points I add to my training set, the more I have to compare every new point against everything in my training set, so my cost will grow linearly with the size of the training set. This is in principle always true for Gaussian process regression, but it can be bypassed with a trick, thanks to the particular structure of the kernel. As I said, we use a dot-product kernel raised to the second power. What this allows you to do is reshuffle the indices in the summation for the prediction, precompute certain quantities, precompute the correlations between points in your training set, so that the prediction becomes essentially a matrix-vector multiplication; in other words, constant cost as a function of the training set size. You freeze the model at some point, say: okay, I'm done, this is my training set, I'm going to compute this matrix beta, which then gives me a second-order polynomial model in the descriptors, at a cost that is constant in the training set size (this trick is sketched at the end of this passage). This allows the calculation to be very fast, even though it comes out of a Gaussian process regression model, and it's a trick inspired by earlier work of Claudio Zeni and Aldo Glielmo with a different kernel structure; this is exactly the trick they used to get a very fast model out of Gaussian process regression. In our case we do it for a many-body interaction instead, but it's very much the same in spirit. One thing I also want to mention is that we can map not only the predictions of the forces and energies but also the uncertainties. So now your model becomes a fully Bayesian force field: at essentially constant cost you get a prediction of energies, forces, stresses, whatever you want, plus uncertainties on those quantities. So you can run a simulation and monitor the uncertainty without having to go through the full Gaussian process machinery every time.

We train models like this one, for instance: this is silicon carbide going through a phase transformation as a function of pressure. You run in one direction, compressing, and then decompress, and you see a phase change, and you see the uncertainty spike above a threshold, which triggers DFT calculations to be called and added to the training set. Eventually, by going in this direction and then that direction, you're able to learn the entire model that describes the phase transformation of silicon carbide, and it matches DFT much more accurately than any existing interatomic potential. It essentially proceeds automatically to learn the model. What you can now do is explicitly simulate the phase transformation, pre-nucleating the second phase within the large simulation, which is helpful for reducing hysteresis. And this basically shows that you get the right phases; going to rock salt, you actually get some interesting twin-boundary behavior emerging automatically because of symmetry breaking, but the point is that it matches the experimental observations, which are these colored points, very accurately. In other words, you get a DFT-accurate description that you can take to larger scales. You also very quickly get good vibrational properties, and you can do thermal conductivity. If you're in the field of thermal conductivity, I just want to quickly advertise another code we're developing, called Phoebe, which allows you to do thermal transport for electrons and phonons. I'm not going to talk about it, but just to show you that it can achieve quite reliable thermal conductivity predictions in comparison to experiment.
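And here is the sketch of the mapping trick promised above, under the simplifying assumption of a plain squared dot-product kernel with given regression weights: the sum over the training set collapses into a precomputed matrix, so the prediction cost no longer depends on the training set size.

```python
# Sketch of the mapping trick for a squared dot-product kernel:
#   E(d) = sum_i alpha_i (d . d_i)^2  =  d^T B d,   B = sum_i alpha_i d_i d_i^T
# so after precomputing B the prediction never touches the training set again.
import numpy as np

rng = np.random.default_rng(0)
D, N = 8, 200                           # descriptor size, training-set size
d_train = rng.normal(size=(N, D))       # training descriptors
alpha = rng.normal(size=N)              # GP regression weights (given)

# Direct GP-style prediction: O(N * D) per evaluation.
def predict_direct(d):
    return np.sum(alpha * (d_train @ d)**2)

# Precompute the "beta" matrix once: prediction becomes O(D^2), independent of N.
B = (d_train.T * alpha) @ d_train       # sum_i alpha_i d_i d_i^T

d_new = rng.normal(size=D)
assert np.isclose(predict_direct(d_new), d_new @ B @ d_new)
print("mapped quadratic model reproduces the kernel prediction exactly")
```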
For glasses or liquids, things that are not periodic, where you cannot use phonons, you can still produce thermal conductivity predictions with these fast force fields using the Green-Kubo formalism, which looks at the autocorrelation functions of the heat flux. You take local energies and local stresses and compute the autocorrelation, and these are very long, very large simulations, for which molecular dynamics with machine learning potentials is actually the appropriate method, because they're expensive and you cannot afford very many of them with density functional calculations (a minimal sketch of the Green-Kubo idea appears at the end of this passage). One example is getting to experimental accuracy for sodium chloride, which is a very anharmonic crystal for which you essentially need to do Green-Kubo calculations.

Now, on to some more fun applications, in surface science and catalysis. With an early version of FLARE, we investigated a system of palladium islands deposited on a silver substrate. Experimentally, what people were seeing was an interesting evolution of these palladium islands, getting thicker and mixing with the silver, producing interesting catalytic properties that are not available in either palladium or silver alone: one metal is too active, the other too inactive, and the sweet spot in catalysis is usually some mixture. So this is one experimental way of making such structures, but it was not clear what was happening. These calculations with the FLARE force field were able to explain it and give you the mechanism of how the mixing actually happens: the silver atoms, the blue ones, start getting into the island from the sides, the island starts growing and mixing, and you end up with these two-layer structures where silver is sucked in to mix with the palladium at the expense of the surface layers; you actually start etching away the surface layer of silver. This is exactly what the STM images show, so this explains the mechanism of a very complicated surface restructuring process that produces these catalytically active structures.

Another example of what you can now do with these force fields is, I think for the first time, to see the herringbone reconstruction of the gold (111) surface, shown here, basically spontaneously emerging from a large-scale simulation of about 400,000 atoms with these FLARE models. It compares quite well with the experimental pictures, where you see these beautiful herringbone patterns caused by the lattice mismatch between the surface layer and the sub-surface layer of gold, and this allows you to investigate the entire surface-reconstruction phase diagram of metals, not only the (111) surface but any index, which is what we're trying to do right now. You can also look at the dynamics of nanoparticles. Again, for catalysis applications you often want to know what happens if you make a core-shell nanoparticle and then expose it to some gases to bring the active metal back to the surface. These very complicated bimetallic dynamics can now be probed with these methods, and both the heating and the cooling simulations show the behavior of the surface and of the mixing of the metals. I'll skip this. Okay, how much time do I have? Ten minutes. All right.
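This is the minimal Green-Kubo sketch referred to above: integrate the autocorrelation function of a heat-flux time series. The "flux" here is a synthetic, exponentially correlated signal, and the physical prefactors (volume, k_B T^2) and units are deliberately omitted.

```python
# Sketch: Green-Kubo from the autocorrelation of a heat-flux signal.
import numpy as np

rng = np.random.default_rng(0)
dt, n, tau = 0.001, 50_000, 0.05           # step (ps), length, correlation time (ps)

J = np.zeros(n)                            # synthetic stand-in for a flux component
a = np.exp(-dt / tau)
for i in range(1, n):
    J[i] = a * J[i - 1] + np.sqrt(1 - a**2) * rng.normal()

# Autocorrelation via FFT (zero-padded), then the running Green-Kubo integral.
F = np.fft.rfft(J, 2 * n)
acf = np.fft.irfft(F * np.conj(F))[:n] / (n - np.arange(n))
gk = np.cumsum(acf) * dt                   # running integral of <J(0) J(t)> dt

print(f"ACF(0) ~ {acf[0]:.2f}; GK plateau after 2 ps ~ {gk[2000]:.3f} (expect ~tau = 0.05)")
```

The practical difficulty mentioned in the talk is visible even here: the plateau of the running integral has to be identified against growing statistical noise, which is why these runs need to be so long.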
Now, reactions. This is the main thing we're after: reactive systems in heterogeneous settings. All the complexity that is very difficult to achieve with interatomic force fields based on classical formalisms, like ReaxFF for instance, is now starting to become accessible. An example we started looking at is hydrogen interacting with platinum. This is a prototypical catalytic reaction where hydrogen gets to the surface, splits, and then recombines with another hydrogen, so a couple of very different kinds of reactive events are happening, and things are much more complicated because of the high coverage of the surface, which you need to explore at large scale. What you traditionally do in catalysis is DFT calculations in the gas phase, in other words in vacuum, of a single molecule, as shown here, interacting with the surface; but things get much more complicated when there are a lot of these molecules and they're all strongly coupled and correlated. So we don't want to assume anything about the mechanism; we want to simulate the full reactive dynamics explicitly. This is done, again, with active learning with FLARE, starting with the hydrogen gas, because you need to describe the gas phase, then the bulk platinum surface, and then hydrogen interacting with platinum. The red atoms showing up in these pictures indicate atoms that are uncertain during the molecular dynamics run while it's actively learning: atoms occasionally blip red, saying this is a new environment I haven't seen, add it to the training set. So this is the active learning going on, trying to see if it can capture all the configurations relevant for describing this kind of reaction without any prior assumptions.

Interestingly, most of the learning actually happens in the combined hydrogen-platinum simulation, which is not surprising: that's where most of the novelty is for the system, and it spends most of its time learning there, compared to the pure platinum or the pure hydrogen gas. One second, I think somebody must have drawn something on Zoom. Anyhow, this is a summary of what is actually requested by the active learning process, and you can see that, as expected, most of the time, in terms of wall hours, is spent on the hydrogen-platinum interaction, and in fact only a small number of calculations is requested by the active learning procedure. Overall, this whole training process takes about a week: DFT calculations run in sequence, not parallelized, just running DFT and doing active learning, with most of the computation again being DFT on hydrogen on platinum. A week, compared to months of manual tweaking of something like a reactive force field, is pretty good. This is something you can then run at a larger scale: after the mapping procedure, with about 1,000 atoms, you can run longer dynamics and collect statistics on the actual reactions to see if the reactions make sense. So we can collect the number of reactive events as a function of time, do that at a bunch of temperatures, and see if we can extract the activation energy from the Arrhenius behavior.
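To make this last step concrete, here is a minimal sketch of the Arrhenius analysis, with synthetic rates generated for illustration rather than the actual hydrogen-on-platinum data:

```python
# Sketch of the Arrhenius analysis: fit ln(rate) vs 1/T to extract an
# activation energy. The rates below are synthetic placeholders.
import numpy as np

kB = 8.617e-5                               # Boltzmann constant, eV/K
T = np.array([300.0, 400.0, 500.0, 600.0])  # simulation temperatures (K)

# Synthetic reaction rates generated with Ea = 0.5 eV for illustration.
Ea_true, prefactor = 0.5, 1e12
rates = prefactor * np.exp(-Ea_true / (kB * T))

# ln(rate) = ln(A) - Ea / (kB T), so the slope vs 1/T gives -Ea / kB.
slope, intercept = np.polyfit(1.0 / T, np.log(rates), 1)
Ea_fit = -slope * kB
print(f"fitted activation energy: {Ea_fit:.3f} eV")   # recovers 0.5 eV
```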
We can build this Arrhenius plot on a log scale and compare with experiment, and encouragingly, the experimental value measured in hydrogen-deuterium exchange reactions, which have essentially the same kinetics, gives a number very close to what this kind of fully explicit reactive simulation produces.

Okay. Now the question: we did this with 1,000 atoms, so how big can the simulations get? We decided to stress-test this in the last couple of weeks: we requested some time on the biggest machine we could find in the US and tried to run these simulations, after spending some time optimizing the code for multiple GPUs. This was done with the Kokkos library and LAMMPS, which are developed at the national labs in the US, by putting FLARE, the mapped version of FLARE, into the code as a pair style, if you're familiar with using FLARE and LAMMPS, and making sure that the memory access patterns on the GPUs are efficiently implemented so that you can access these very large scales. We can actually see that there's a big benefit to using GPUs compared to CPUs, and the scaling can be quite efficient: if you have access to 27,000 GPUs, you can run up to half a trillion atoms, and that's this little point here, right there. You can see that you don't actually lose that much performance; the scaling is pretty efficient. It's not a very fast simulation in real time; I don't know exactly how many MD steps we did, not too many, but it was just to show that you can reach these size scales. Of course you don't need them for this kind of simulation, but in order to explicitly simulate the reactions we want, we do need millions of atoms for a long time, because some of the reactions on catalytic surfaces may not be very fast. So this is one way of finding out what's actually happening on catalytically relevant structures. And this, by the way, is the movie; it's a small snapshot where you can actually still see the atoms, but in the actual simulations there are so many that you don't see them anymore. Anyhow, this basically shows the number of reactions proceeding linearly, as you'd expect, with the number of time steps taken, and that the uncertainty is well behaved. That was not immediately the case: some of the first simulations were giving high uncertainty in some cases, an indication that we still hadn't learned everything we needed to learn, and the large-scale simulation did encounter a couple of events it was still uncertain about, so we had to go back and retrain on a few more snapshots. Importantly, this is enabled by the fact that even at large scale, in the 500-billion-atom simulation, you could in principle get an uncertainty on every atom; and on billion-atom simulations we do get an uncertainty on every atom, so we can see whether every atom is well behaved or not.

Just a few words on other things we're doing, because I don't want to spend too much time on them. Now that you have this formalism, you can go to larger scales, and why do you actually need very, very large simulations? Because of things like biology, interactions of drugs with proteins, and so on.
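As a quick sanity check on the numbers quoted above (my arithmetic, not from the talk):

```python
# Back-of-the-envelope check on the half-trillion-atom run: how many atoms
# does each of the 27,000 GPUs have to handle?
n_atoms = 0.5e12        # half a trillion atoms
n_gpus = 27_000

atoms_per_gpu = n_atoms / n_gpus
print(f"~{atoms_per_gpu / 1e6:.0f} million atoms per GPU")  # ~19 million
```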
Those biological problems are large-scale and long-time, so you do need to reduce dimensionality and speed things up even further. Coarse-graining is one idea, and you can develop coarse-grained force fields using the FLARE formalism as well. One example, being done by Blake in our group, is trying to learn coarse-grained force fields for alkane liquids, trying to capture the radial distribution function, the structure of the liquid, looking at the distribution of chain lengths, for instance, and making sure that the uncertainty actually tracks the standard deviation of these models, so that we actually have predictive uncertainty. The loop is just slightly more complicated: if you encounter something uncertain, you go back to the fine-grained model, retrain it, and come back to the coarse-grained model, so this is a double active learning loop that you can implement here. And finally, I want to mention, very quickly, an effort to reduce dimensionality by finding collective variables. The point here is that if you have large amounts of molecular dynamics data, you can reduce dimensionality and actually automatically discover the low-dimensional manifold that is driving your reaction, on which you can then put biases and run much faster accelerated MD, with umbrella sampling and things like that. I won't spend time discussing this, but it is done with an autoencoder-like approach, compressing the dimensionality nonlinearly, and trying to, again, bring this sort of third level of machine learning into accelerating the high-dimensional simulations. With that, I'd like to thank you for your attention, and thanks to everybody who actually did this work, and to the funding sources and the computational resources. Thank you.

Okay, thank you very much, Boris, for this amazing talk. We have time for a couple of questions from the audience.

Thank you very much for the interesting talk. You showed a figure about evaluating ionic migration of lithium in an electrolyte. So the software could be used to calculate lithium migration; how could it do that?

Actually, the main reason we got into machine learning potentials is exactly the question of how to discover new electrolytes, in other words, how to compute diffusion coefficients. This is done with molecular dynamics, observing the mean square displacement, and the figure I showed is compared to DFT: it's as good as your DFT, so this will essentially match DFT accuracy. This is shown for lithium phosphate here: the blue curve is ab initio molecular dynamics, the red curve is a bunch of trajectories run with Allegro, and it matches as well as you would expect. So yes, you can train the model appropriately, run a much faster simulation, and look at diffusion coefficients with the mean square displacement, or maybe some more complicated Green-Kubo approaches, which don't assume any uncorrelated regimes.

Okay, one last question from the audience. In your active learning slide, I don't remember the slide number, where you had the active learning loop going on: why not benchmark against randomly sampling points, so not using the uncertainty at all, just sampling randomly?
Yeah, I don't have a plot here, but we have a plot in the npj Computational Materials paper that introduced FLARE, comparing the two approaches. You do see an efficiency gain going to active learning, as you would expect from maximally using the information that's coming in, compared to completely random sampling. It's not always true, though; it depends on the system and on whether there is a benefit to learning actively. In some cases, homogeneous liquids for instance, you could probably do just as well with random sampling, but when rare events are actually driving the dynamics, then you do get a benefit from active learning.

Again, I'm sorry we don't have time for further questions, but Boris will be here all day at least, and probably all week, so you can ask your questions one on one. So thank you again to Boris. Thank you.