 on the PANNA code to train neural network potentials. Okay, so please. Thank you. Yes, welcome to this last tutorial, which will be about training a neural network. We have heard a lot these days about different techniques for creating interatomic potentials, and here we will look at one of the most common ones, a neural network, and actually carry out a training. We will do this with our package, called PANNA, which is a complete package that takes you from your materials simulations, through the training of a neural network, to the creation of a potential that you can then use to run molecular dynamics or any other application. The focus here will be on how to train a neural network. In principle, nowadays there are very powerful codes and anyone can train a neural network in five minutes with a few lines of code. What we have tried to do here, and what I will try to show you, is something that, although it will be done on very simple data and in a Colab notebook so that you can all follow along, uses exactly the same pipeline you would have to follow if you wanted to train a more complex network on a lot of data, in a much more expensive endeavour. So what you will see in the Colab is mostly a series of calls to codes and scripts that, in a more realistic scenario, you would run on a cluster through a remote shell and let run for a very long time. The Colab will therefore be a little less interactive than it has been for other presentations, because we think the really important part of training a network is how you would do so in a real case, with more challenging data and a more complicated setup, and we did not want to show something that runs nicely in a notebook but would be of no use to you in real life.
In terms of setting up the problem, we have had very good presentations already in the past couple of days, so I will go through this quickly, just to refresh the name of the game; you have seen all of this before, so it will just be a refresher. The idea is that we start from some simulation and we want a network that acts, ideally, as a black box that gives us some output about a molecule or a crystal. Here I write the energy, but it could be forces or other quantities. There are multiple ways of going about this, but the most common one, as we have seen these days, or at least one of the first to be proposed, is to write descriptors for the atoms that compose your molecule. We have seen that there are multiple options for describing an atomic environment; we will focus on one, but within this approach you could use similar ones in the same way. The total energy of the system is then written as a sum of atomic contributions: for each atom we have a descriptor, each descriptor is the input to a network, and, as we will see, each atom gets its own network.
Each one of these networks produces a sort of local energy, under the very generic assumption that we can write the total energy as a sum of local energies, and the sum of all of these contributions gives the final result. What the algorithm then has to do is back-propagate the information about the correct total energy through this whole chain, so that the parameters of the networks become the ones that produce the correct output. One thing I will try to remind you of throughout this presentation, although I will not go into its details in the tutorial, is that if we are predicting the energy, which is the more general case for this kind of simulation, and we want to obtain the forces, we can do so because this whole machinery is very good at taking derivatives: the forces are obtained as derivatives of the energy with respect to the positions. This can be done in the code, and I will mention how, but it is considerably more expensive, so it will not be part of the tutorial itself.
Again, very briefly, the kind of descriptor we will talk about is a descriptor of the local environment. We want to sample the environment, and the one I will use in this code is the atom-centred symmetry functions, the Behler–Parrinello type of descriptor. Of course, any fixed-size descriptor for your atoms is completely equivalent as far as the network is concerned. Since, as we will see, we pre-compute the descriptors for the atoms, once you have a descriptor, which you might also have computed with a completely different code, or for which you might write your own code, the rest of the training pipeline can be executed in exactly the same way. So while we focus on one, we have a couple already implemented in our code, and more are being, or can be, implemented as desired. And again, all of this is within the framework of a simple feed-forward, fully connected neural network; I am not talking about more exotic architectures here.
The descriptor itself is exactly something you have seen before, so let me go through it very briefly. We place a Gaussian at a given distance from an atom and sample whether we find any other atom of a given species within a certain radius of the central atom. This is smooth and differentiable and everything we like. And because we want a cutoff, we multiply this Gaussian by an envelope, some sort of cutoff function that goes smoothly to zero, so that we do not have discontinuities. It is very important that changing the position of an atom by a little bit does not change our descriptor abruptly, because that would lead to inconsistencies in our network. A detail that is important, and that is easy to pass over, is that if we have one central atom and different species around it, we would like a descriptor that differentiates between these species. The simplest way of doing it, and nowadays many other ways have been proposed, but let's stick to the vanilla one, is to have a different descriptor, a different set of bins, for each different species.
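To make the radial term concrete, here is a minimal sketch in plain NumPy of a Behler–Parrinello-style radial descriptor with a smooth cosine cutoff and one block of Gaussian bins per neighbour species. This is illustrative only, not PANNA's actual implementation, and the function names and numerical values are made up for the example.

```python
import numpy as np

def cutoff(r, rc):
    """Smooth cosine cutoff: 1 at r = 0, 0 for r >= rc."""
    return np.where(r < rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)

def radial_descriptor(r_central, neighbours, species, all_species, centers, eta, rc):
    """Radial part of the descriptor for one central atom.

    neighbours: (N, 3) positions; species: list of N species labels.
    Returns one block of len(centers) Gaussian bins per species,
    concatenated into a single 1-D vector.
    """
    blocks = []
    for s in all_species:                          # one block per neighbour species
        g = np.zeros(len(centers))
        for pos, sp in zip(neighbours, species):
            if sp != s:
                continue
            r = np.linalg.norm(pos - r_central)
            # Gaussian centred on each bin, damped by the cutoff envelope
            g += np.exp(-eta * (r - centers) ** 2) * cutoff(r, rc)
        blocks.append(g)
    return np.concatenate(blocks)

# toy usage: one O atom with two H neighbours (water-like geometry, angstrom)
centers = np.linspace(0.5, 4.0, 16)
g = radial_descriptor(np.zeros(3),
                      np.array([[0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]),
                      ["H", "H"], ["H", "O"], centers, eta=4.0, rc=4.0)
print(g.shape)   # (32,) = 16 bins x 2 species
```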
These per-species blocks, as we will see, increase the size of the descriptor when we go to systems with many species, which is fine as long as we have a few of them but might become inconvenient in the future. So this is the kind of descriptor I am referring to. And just to give you an idea, because it is sometimes good to see what something like this actually looks like: if I consider this molecule (glycine, I believe) and for now only the radial part, I have a descriptor which, in this case with four species, is made of four different parts that are simply concatenated. Our descriptor will always be one-dimensional; the structure is there inside the descriptor, but it is not spelled out to the network. So, for example, for a nitrogen atom, any nitrogen atom in a database of possible crystals of this molecule, we probe the space to see whether there is a certain probability of observing a hydrogen, of observing an oxygen, and so on, and we create this histogram. Averaged over many descriptors, it has peaks corresponding to the positions of these atoms, and this histogram is really the input to our network. Then we want to do the same thing for the angular part, where we consider triplets, so pairs of neighbours around the central atom.
Again, we have seen this, so let me go quickly and just specify that the descriptor I will talk about is not exactly the standard one presented yesterday, but it is very similar. We have a radial part that considers the possible distances between two atoms, with a certain binning, and we also do a binning on the angular part. You can think of a grid that maps angles and radii: if you have two atoms, say two nearest neighbours in these blue positions, you take the angle between them and their average distance, creating a sort of fictitious point that gets mapped onto these angular and radial bins. It is just a different way of describing the same data. But again, we have to do this for every possible pair of species, so this part grows quadratically with the number of species, and it makes the descriptor bigger and bigger as we have more species in the system.
Again, just to see what these things look like, because otherwise they remain very abstract objects that enter the network: if you consider an H2O molecule and build the descriptor just for H2O, this is what the three descriptors look like for the oxygen and the two hydrogens, split into the radial parts that concern hydrogen and oxygen and the angular parts for the pairs of hydrogen and oxygen. And you see that even if we make a very small movement, many of these components change. In this high-dimensional space we are moving in a very specific and non-trivial direction; in fact, as we have already seen, most directions in this high-dimensional space do not map to a realistic environment. There are only a few such directions, and what we try to do with our neural network, in some sense, is to learn this manifold correctly and predict the correct energy as the system moves on this manifold embedded in the high-dimensional space. So it is rather complicated.
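Just to quantify how the per-species channels grow, here is a back-of-the-envelope count: the radial part is linear in the number of species, the angular part quadratic. The bin counts below are made up for illustration and are not PANNA defaults.

```python
def descriptor_size(n_species, n_radial_bins, n_angular_bins):
    # radial part: one block of bins per neighbour species (linear in species)
    radial = n_species * n_radial_bins
    # angular part: one block per unordered pair of species (quadratic in species)
    angular = n_species * (n_species + 1) // 2 * n_angular_bins
    return radial + angular

for ns in (1, 2, 4, 8):
    print(ns, descriptor_size(ns, n_radial_bins=16, n_angular_bins=32))
# 1   48
# 2  128
# 4  384
# 8 1280
```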
In this case it is still intuitive, but you can imagine that when you have a lot of atoms, when your cutoff is large and you have tens of neighbours and you move them by small amounts, your descriptor changes in a very, very complicated way. It is somewhat remarkable that these networks are able to extract any information out of this at all.
Okay, then I also need a very brief discussion of neural networks, which luckily was covered to a good extent yesterday, so I will again be brief. The input to our neural network is the descriptor I just told you about, and the output might be different properties; in this case we will have a single output, the energy. We have pairs of known examples, so known inputs and known outputs, the correct energies. We need to define a cost, which can have a fairly general form: we typically use some sort of quadratic distance, but we can use, and sometimes it is a good idea to use, more complex functions of the difference between the prediction and the expected output. And we optimize the weights, all of the parameters of this network, by following the gradient through back-propagation, as mentioned yesterday, which is just the chain rule for derivatives. In practice, when training neural networks, we do not evaluate the true gradient over the whole dataset; we estimate a cheaper proxy for it. We consider a small number of examples, a mini-batch, and over this small number of examples we estimate the gradient. This of course introduces some noise into the optimization, and there is a long discussion about whether this noise helps or hurts the network in finding its minimum. The point here is just that there are at least two hyperparameters: one is the learning rate, how much we change the weights at each step, and the other is the size of the batch we sum over. The more examples we sum over, the better our estimate of the true gradient, so the optimization will be smoother, but each step will take longer to compute, and it might not necessarily lead us to the minimum we would like to reach for the weights. So, in some sense, it can be a good idea to have a rather small batch, do many evaluations, and follow this noisy stochastic-gradient-descent path through parameter space towards a good minimum. I will comment a little more on this when we get to the parameters you need to give our code.
With that said, let me introduce the different pieces of PANNA. PANNA is a whole pipeline: the training of the network is just one step, but when you are really trying to train a neural network from your data, you need to process the data, put it in the right format, and it is very convenient to put it in a format that can be used efficiently on a supercomputer, by codes that run optimally there. The backend of PANNA is TensorFlow, which is developed by Google and has many features that make it scale well to large datasets and large models.
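Before moving on to the PANNA tools themselves, here is a bare-bones sketch of the mini-batch update described a moment ago, just to make the roles of the learning rate and batch size explicit. This is generic NumPy, nothing PANNA-specific, and all names and values are illustrative.

```python
import numpy as np

def sgd_train(model_grad, dataset, weights, learning_rate=1e-3,
              batch_size=32, n_steps=1000, rng=None):
    """Plain stochastic gradient descent on mini-batches.

    model_grad(weights, batch) must return the gradient of the cost,
    averaged over the examples in the batch (a proxy for the true gradient).
    """
    rng = rng or np.random.default_rng(0)
    for step in range(n_steps):
        # smaller batches: cheaper but noisier gradient estimates;
        # larger batches: smoother but more expensive per step
        batch = rng.choice(len(dataset), size=batch_size, replace=False)
        grad = model_grad(weights, [dataset[i] for i in batch])
        weights = weights - learning_rate * grad   # step along the negative gradient
    return weights

# toy usage: fit y = w * x with a quadratic cost
data = [(x, 3.0 * x) for x in np.linspace(-1, 1, 200)]
grad = lambda w, batch: np.mean([2 * (w * x - y) * x for x, y in batch])
print(sgd_train(grad, data, weights=0.0, learning_rate=0.1, batch_size=8, n_steps=500))
```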
What I will present to you here are the different tools that take you from your initial code, convert its output to a format that can be read by PANNA, then calculate the descriptors, pack them in an optimized format, train the network, so optimize the parameters, evaluate it, so benchmark it against other examples to see how good the network is, and eventually extract only the weights, which are the only part that matters, and put them inside a code that can be used in a different application. So you really can go from the input to the output, and in principle you can do this just by running scripts, without having to write your own code, although the code is open source, so of course you are welcome to write your own if you want to modify any of this.
Let me quickly go through the list of features, because I already mentioned some of these things. We are based on TensorFlow; you can just run scripts. We provide tools to parse the input from different codes: being based in Trieste, we started with Quantum ESPRESSO, but we also have importers for a generic XYZ format and some other codes. You can compute different descriptors, and this list is being extended; of course we would be very happy to extend it further if people need different descriptors. You can then package everything in a format that can easily be imported into the training procedure. And in the architecture, as we will see, it is very easy to define a network: as is the case nowadays in many frameworks, you just specify the architecture and the code takes care of building the network according to your wishes. It is of course divided by species, and we allow a certain amount of freedom: importing weights from one species to another, copying weights, constructing the network in a slightly different way, or having different networks for different species, all of this is possible. You can decide whether to train only part of your network, what activations to use, whether to introduce regularization, or change the cost function. All of these parameters become really important the moment you want to train a real, complex model and you are not just playing with a toy model where the defaults are good enough. And then, of course, we can not only predict forces by taking the derivative but also use the forces during training. Again, I will not show this, but we can discuss it a little: adding forces means adding not only points on the manifold you are fitting but also the derivatives at those points, in many places on the manifold, and this can be very beneficial to the training of a good neural network. Since at the end of the day we want to estimate not only the energy but also the forces correctly, even though in principle they are two aspects of the same manifold, it is very important to add this if you really want to build a potential. We will also show you a little bit. Let's see; a quick question: what is this TFR packaging? TFR stands for TFRecord, the internal format used by TensorFlow. I will show it in a second; it is just a compact format for storing the data and reading it back during training.
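As a schematic of the architecture this feature list refers to, one small feed-forward network per species and the total energy as the sum of atomic contributions, here is an illustrative NumPy version. The real PANNA networks are built in TensorFlow; everything below (function names, sizes, the random weights) is just a sketch.

```python
import numpy as np

def gaussian(x):
    # the default activation mentioned in the talk is a Gaussian; tanh or ReLU also work
    return np.exp(-x ** 2)

def atomic_energy(g, layers):
    """Forward pass of one per-species network: g is the descriptor vector,
    layers is a list of (weights, biases); the last layer is linear, size 1."""
    h = g
    for w, b in layers[:-1]:
        h = gaussian(h @ w + b)
    w, b = layers[-1]
    return (h @ w + b).item()

def total_energy(descriptors, species, networks):
    """Sum of atomic energies, each evaluated by the network of its species."""
    return sum(atomic_energy(g, networks[s]) for g, s in zip(descriptors, species))

# toy usage: random networks for H and O, descriptor size 32, hidden sizes 16 and 8
rng = np.random.default_rng(0)
def make_net(sizes):
    return [(rng.normal(size=(a, b)), np.zeros(b)) for a, b in zip(sizes[:-1], sizes[1:])]

networks = {"H": make_net([32, 16, 8, 1]), "O": make_net([32, 16, 8, 1])}
descriptors = [rng.normal(size=32) for _ in range(3)]
print(total_energy(descriptors, ["O", "H", "H"], networks))
```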
And yes, while you are training you can visualize the quality of your model and see where the training is going, and at the end export the model to be used in your favourite molecular dynamics code; again, help extending this to more codes is very welcome.
So with this said, let's go on to the tutorial itself. For this tutorial you can follow this link, or you will probably find it on the page, and we will use Google Colab. I will ask you to copy something to your Google Drive, so it will require access to your Drive; I hope you don't have a problem with that. Otherwise, of course, all of this could also be run in a single Colab, but we thought it would be convenient for you to have a copy of PANNA on your Drive, so that later you can run the other tutorials, or in principle copy it to your computer and run the code there. This first Colab is very brief: it basically just downloads the package onto your Google Drive. When you do this, Google will prompt you to connect your Drive to the machine that is running your Colab at that moment. It is a very simple thing and you should not be scared of it; sorry, because I am using a borrowed computer I am being prompted with an extra security step, but on your own computer you should have no trouble with this. After you do this, hoping that it works, you should be able to see, and again, because I will ask you to download something, please make sure that the path is right and that you are okay with creating, in your main Google Drive directory, a folder for the PANNA tutorial in which everything will be contained; you can delete it afterwards if that is a problem for you. If that is okay, then by executing this cell we create the folder and download the package; hopefully, yes. Now that it is downloaded, we can just untar it: it creates a folder, so you have the downloaded archive and then the folder that contains the code. Of course, this is just something we are giving you as a specific copy made for today; in general you will find the PANNA code on our GitLab. This version is actually a little more polished than the one currently on GitLab, but we will have a release soon, and we will also have a release of a version based on the new version of TensorFlow; for now we can work with this.
If you now go to your Google Drive, so just drive.google.com, and I assume you are familiar with this, you will see that you now have the PANNA tutorial folder. This is just a copy of our code: as you enter, you will see a source directory that contains all of our Python codes, so all of the main codes, plus some other packages; this is just to show you around a little. Inside the doc directory you will find a series of tutorials, and in particular, under doc/tutorial, there is the notebook for this machine-learning tutorial. If you want to follow along, this is the notebook, which you should now be able to run easily because it sits inside your own Google Drive; it is something you are running locally from your Drive. Again, I hope there are no problems with this.
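For reference, the Drive connection that Colab prompts you for corresponds to the standard two lines below; this is the stock Colab API, and the path in the comment is only an example of where the downloaded package might end up, not a fixed name.

```python
from google.colab import drive

drive.mount('/content/drive')   # prompts for authorization, then exposes your Drive
# the tutorial then assumes the package lives somewhere like
# /content/drive/MyDrive/panna_tutorial/  -- adjust if you chose a different path
```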
If there are problems, you can basically just follow along and look at what I am doing. As with many of these tutorials, and I think this is a general caveat, in one hour and twenty minutes it is very hard for me to give you something that is not just button mashing. You can follow along, but you will basically just execute every cell in this notebook. The more important part is that I will try to talk about the details of what I am doing as I do it, and we have tried to make the tutorial fairly self-contained, so you can read through it at a more leisurely pace on your own, try to understand the details, and we are here, available to discuss any of this.
So again, this is the first step of the tutorial. It assumes you have followed my advice of creating the PANNA tutorial folder inside your Google Drive in the standard place; if you did not, if you changed the path, which is perfectly fine, please change it here too, or nothing will work. Here we are just installing a few things and telling Google Colab that we want version 1 of TensorFlow, because the code up to now is based on TensorFlow 1; the version for TensorFlow 2 exists but is not released yet. When you execute this you will again be prompted to connect your Google Drive, and it will install a bunch of things and probably also throw a couple of errors, but trust me, in the end it works; a couple of package versions are not consistent, but they still work, so it is not a problem. As long as Google doesn't think I am doing something terrible to my account; no, this should be working. As I mentioned before, we will be using this Colab in a very non-notebook way: most of the cells are preceded by a bang, meaning that we are executing shell commands from Colab, because this code is not really meant to be imported into a notebook. We could import parts of it into the Colab, but it wouldn't make much sense; I would be showing you something different from what you would do in real life, where you would issue these commands on a remote computer that can run them for a couple of days. So that is what I am doing now: I am using Colab for convenience but executing commands mostly outside of it. This next cell is just a sanity check: if you can see what is inside the tutorial data directory at this point, then your path is set correctly and everything else should in principle work.
Okay, for the first part of the tutorial, let's finally get to the code. We assume you have run some simulations with your favourite code, which of course we hope is Quantum ESPRESSO, but it might be anything else. If it is Quantum ESPRESSO, you will have ended up with a bunch of XML files that contain the output of your simulations, namely the positions of the atoms and the energy, plus a lot of other things, but mostly the parts we care about. In this case we have put just a few of them here for you. These are molecules, I think mostly water, all small molecules, so whatever I show you today will not be relevant from the point of view of the science at all.
I will just use very simple data and train networks with very few parameters, just so that things run quickly. Yes. Ah, yes, sure; okay, I hope this is more visible now. So, you start with the XML, and we have importers for different kinds of codes. In this case we call the importer for parsing the QE XML and we just specify the input directory and the output directory: take any file from here and convert it to our internal format. This runs very quickly, and if you then check the directory we specified for the examples, you see that for each of these files it has created a copy that we call a .example file. This is our intermediate format, and it is convenient because you can write your own importer and convert to this format, which is a human-readable, very simple format: it is basically just a JSON. Let me show it to you in a more readable form, but it really is just a text file that you can open, all on a single line, containing all of the information we might require for the training. For example, a list of atoms, where for each atom you specify its species, its position and the forces acting on it; the list just runs through every atom in the simulation. Then there is some other information you might need later, like the name of the file, and, more importantly, the lattice vectors, because if you are considering a crystal (in this case it is just a big box) it is very important that the code takes care of periodic boundary conditions when computing the descriptors. And then the target, in this case, is the energy: we store here the energy of the configuration, and as you see it is in the original Hartree units, but internally we convert everything to our internal units of eV and angstrom and so on. So now you have gone from your generic code to our internal format.
From this, you want to calculate, again in batch, all of the descriptors. The way we chose to do this, and this is useful when you are processing a lot of data, reflects the fact that you will have a huge dataset of tens of thousands of configurations, you will have to calculate the descriptor for all of these configurations, and maybe the derivatives of the descriptors, and this can be quite intensive. Then you will want to train a network, which might be rather large and take many epochs, so running through the whole dataset thousands of times, and maybe you want to train different networks with different sets of parameters. So it is convenient, not in terms of disk space but very much in terms of computation, to pre-compute everything and have all of your descriptors ready, so that you can just pull them as they are and reuse them every time without recomputing them. What we are doing now is pre-computing these and putting them in an easily digestible format. And again, the reason this is done through its own script, in batch, is that this step by itself might take a few hours: it is something you run, and if you change the parameters you run it again, create other descriptors, store them, and then run your training on those. All of this is designed with that in mind.
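As a side note on the intermediate format just described, here is a sketch, written from Python, of what a single .example file roughly contains. The key names and values below are illustrative, chosen to match the description above; the exact PANNA schema may differ.

```python
import json

example = {
    # one entry per atom: species, position, force (filled by the parser from the QE XML)
    "atoms": [
        ["O", [0.000, 0.000, 0.000], [0.00, 0.00, -0.01]],
        ["H", [0.757, 0.586, 0.000], [0.01, 0.00, 0.00]],
        ["H", [-0.757, 0.586, 0.000], [-0.01, 0.00, 0.00]],
    ],
    # lattice vectors matter for crystals (periodic boundary conditions);
    # here it is just a big box around the molecule
    "lattice_vectors": [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]],
    "energy": -17.16,          # stored in the original units (Hartree for QE output)
    "name": "water_000",
}

with open("water_000.example", "w") as fh:
    json.dump(example, fh)      # everything on one line, but human readable
```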
So, yes, we go to the descriptor computation. We keep calling them G-vectors, just because of the original naming with G. This is just a reminder of the equations, which we have just seen, so I can skip through it, and an example of how the average G-vector differs when you compute it for different kinds of structures. What you want to do is run our G-vector calculator, the second script, and pass it a configuration file. Throughout this tutorial we have prepared configuration files for you, and in this particular case let's go through one and see what is inside, since these are the parameters you might want to change; the most useful thing I can do, besides telling you to push the button, is to go through them and tell you what is important. In this case the parameters are fairly clear, there is nothing you have not seen before: where to find the examples and where to write the descriptors, which will be stored in a binary format; which kind of symmetry function you are using; which species you are considering. Of course some of your simulations might not contain all of these species, but we need to know them all, because we need to create a descriptor of the correct, fixed size for the whole dataset. Then some details about parallelization, for example on how many cores you want to run; this becomes more relevant when you have a lot of configurations to go through and you want to process them in parallel. And then the list of all the parameters, which should be rather self-explanatory, because they are exactly the parameters that appear in the equations. You can specify some of them implicitly: rather than listing the bins you want, you can just say that you want to go from some initial radius to the cutoff radius with, say, 16 bins created in between, and the code will do that for you; or you can give the explicit list of bins if you want a non-linear sampling of the space, which can be convenient in some cases. So if you do this, and out of all this the only really important thing is that you are running Python on the G-vector calculator and passing the correct configuration file, then, since here we have just 10 examples, very quickly you will see that inside your G-vector directory a bunch of files has been created, one for each of your examples. These are binary files, so they are no longer human readable; you just need to remember exactly what you have been doing. Now, these files as they are could already be used as input for the network; the next step I will show you is how to compact them in a better way, for the sake of good input/output while training. Let me stop for a second in case there are any questions about the G-vector computation; okay, otherwise we just go on. There was no strange parameter here, but this is part of the work that needs to be done when you are dealing with real data.
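To illustrate the two ways of specifying the radial bins mentioned above, evenly spaced between an inner radius and the cutoff, or as an explicit, possibly non-linear list, here is a small sketch. The numbers are illustrative and the helper is not a PANNA function.

```python
import numpy as np

def linear_bins(r_min, r_cut, n_bins):
    # n_bins Gaussian centres evenly spaced between r_min and the cutoff
    return np.linspace(r_min, r_cut, n_bins, endpoint=False)

# option 1: let the code place 16 bins between 0.5 A and the 4.0 A cutoff
centers = linear_bins(0.5, 4.0, 16)

# option 2: give the list explicitly, e.g. denser sampling at short range
centers = np.concatenate([np.linspace(0.5, 2.0, 10, endpoint=False),
                          np.linspace(2.0, 4.0, 6, endpoint=False)])
print(centers)
```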
As I mentioned, if you create 50,000 of these binary files and then keep reading them all during training, you will probably create a huge I/O bottleneck, especially if they correspond to large cells and also contain the derivatives; just reading and writing the data will take a lot of time and become the bottleneck of your computation. So what we do is the following: you can organize these files however you want, maybe create some subsets of your data, but after that you package them into a few big files, and it is very convenient to have your code read such a file once and for all and then process it. There will be some parallel processes that handle reading these big files, splitting them into the individual configurations, shuffling them, and feeding them as needed to the training, and this is very beneficial. In this step there is actually very little to define; it is just a necessary step: where to find your files, where to put the output files, and how many elements to put per file. This last one is a bit of a trade-off left to you: you want to create files that are rather big, but you do not want to create files of more than, say, 10 gigabytes, so you need to find the trade-off that works best and end up with a few files. In this case I will make them very small, just for the sake of having more than one: I will put 10 elements per file and create two such files. And this TFR, as I was asked before, is the TFRecord format, just an internal binary format; there is nothing particularly deep about it, it is simply convenient for I/O inside TensorFlow, and that is why we create it. Once you have done this, you have these big files, and they are ready for your training.
So now we can go to the second part, which is the meat of the presentation: how to train the neural network. I have already mentioned many of the ingredients, so I think we can just look at the input file and see what are the minimal things you need to define when you want to train your network. Again, this is a configuration file that will be the input of our training script. Some of the entries are simply about where to find the data, where to put all of the output of the training, and how often to log information. Then you need information about your simulations: in this case we specify the atomic sequence, which needs to match the one we used when creating the examples. And then, as a first non-trivial step, we need to define a reference energy for each of the species. I believe this has already been mentioned: in principle, the last layer of any neural network can have a constant that acts as a constant offset, and that offset can be learned; but because you are learning through gradient descent, if you try to learn very small differences between very large numbers, what can happen is that your training does not work properly.
What I mean is that when you compute your first gradient, because you want to go from zero to, say, 5,000, just because your reference energy happens to be around 5,000 eV, and only then learn the small differences on top of it, all of the weights of your network will feel a very strong gradient in a rather arbitrary direction, and some of the parameters may end up in a range from which you can no longer recover good values. The training of a network, especially in more complex scenarios, is always an issue: you want your inputs and your outputs to be normalized as much as possible, so that the optimization in the complicated landscape of the parameters can proceed as smoothly as possible. One very simple first step to help with this is not to have a huge bias, which you already know, be learned by the network in its first steps. So you specify it here: you could set it to the isolated-atom energies, or even better, you can just do a fit over your dataset, and we have a tool for this if you want, to find the best average energy per atom of each species from your simulations and use that as the offset, so that you don't force the network to learn very strange values. You can also not specify it, and it will just be taken as zero, but then your training might fail.
Then come the specific training parameters. Inside this card you can put many more parameters than these; this is just a very minimal setup. The minimal things you need to decide are: how many steps of learning you want to do, where each step is one evaluation of a gradient and one step of gradient descent; what the batch size is, so how many examples are used per step; and what the learning rate is, so by how much you move the weights in the direction of the negative gradient. As I mentioned before, and I think it is worth spending a bit more time on this, for the batch size, the fewer examples you consider, the faster you can churn through steps, evaluate gradients and move; this, however, increases the noise, because what is a good gradient for one single configuration might not be a good gradient for the whole dataset. To complicate this a little with respect to the usual scenario, for one configuration what you are training here is, technically, more or less one network per atom in your configuration. Because we have one network per atom, even though we do a single evaluation of the energy, you are actually getting multiple pieces of information for the same network, whenever you have several atoms of the same species in different environments. So in this sense, even more than in other cases, if your cells are very big it might be beneficial to use a rather small batch size, because you are already averaging over the cell itself. Unless, of course, your dataset is very varied, in which case you might want a bit of everything, a little bit of solid crystal, a little bit of liquid, a little bit of other configurations, within the same batch. So this is one important parameter; the other is the learning rate.
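Going back for a moment to the reference-energy offsets mentioned a little earlier: the fit over your dataset (rather than isolated-atom energies) can be pictured as a simple linear least-squares problem, one constant energy per species chosen so that the sums over atoms best reproduce the total DFT energies. This is a minimal sketch, outside of any PANNA tooling, with made-up numbers.

```python
import numpy as np

def fit_species_offsets(compositions, energies, species):
    """compositions: list of dicts {species: count}; energies: total energies.
    Returns the best constant energy per species, in the least-squares sense."""
    counts = np.array([[c.get(s, 0) for s in species] for c in compositions], float)
    offsets, *_ = np.linalg.lstsq(counts, np.asarray(energies, float), rcond=None)
    return dict(zip(species, offsets))

# toy usage with made-up total energies (eV) for H2O, OH and H2O2 fragments
comps = [{"H": 2, "O": 1}, {"H": 1, "O": 1}, {"H": 2, "O": 2}]
print(fit_species_offsets(comps, [-466.9, -450.7, -901.4], ["H", "O"]))
# -> roughly {"H": -16.2, "O": -434.5}, the per-species baselines to subtract
```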
And the learning rate, again, is a place where neural network training is more of an art than a science, because it is a very tricky parameter to tune. In this case, let's just say we use a constant learning rate. What people typically do is change the learning rate as the training proceeds: you might want to start with a rather large learning rate and decrease it as you go on. You can do this automatically inside our code, define a schedule and its details, or you can do some training and then restart it later, once you see how the training is progressing, with changed parameters. These are the parameters that will give you a bit of a headache.
Then, of course, the other part is to define the structure of the network, and again it is hidden in a very simple line. The G size is just the size of your descriptor, which in this case is already quite large because, as I said, we have four species, and the descriptor scales badly with that. Then the architecture: the sizes of, in this case, your first hidden layer, your second hidden layer, and your output layer, which has size one because you are just predicting an energy. This trainable flag, for example, is just to show you that there are different things you can set, and there is a guide that documents all of the possibilities. For instance, you might decide that you do not want to train some layers of your network, maybe because you have already trained them on similar data and you only want to train the output layer; or you can decide to use a different activation function. If you do not specify anything, by default we use a Gaussian activation function, which is not the most common choice in the general neural-network landscape, but it is used in some neural networks for materials; you can specify a different one, a tanh activation, a ReLU, whatever you want. You can define the output activation and other details, and you can even specify different networks for different species. For instance, you might have many atoms of one species and only a few atoms of another, so you want one main network with many parameters and a smaller network, trained together with it, for the other species; in principle you can do that. These are all things you can define, and of course if you want to do a training with forces, you define that here too, as a second part of your loss function. Ah, yes, I did not mention that: another thing you can define is the loss function itself. The loss we use here, the default one, is just the square of the difference of the energies per configuration. In some cases this might not be optimal: a plain quadratic weights configurations close to the target and configurations far from it in the usual quadratic way, but you might want a function that weights more heavily the things that are very far from ideal. So you might want to have a different cost function.
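As an illustration of that point, and of the force-weighted cost that comes up next, here is a generic sketch of a plain quadratic loss, a loss that penalizes large residuals more strongly, and a combined energy-plus-force cost with a relative weight. These are not the exact functional forms coded in PANNA, just examples of the idea.

```python
import numpy as np

def quadratic_loss(e_pred, e_ref):
    # the default choice: mean squared error on the energy per configuration
    d = e_pred - e_ref
    return np.mean(d ** 2)

def heavy_tail_loss(e_pred, e_ref, scale=0.1):
    # similar near zero, but grows faster than quadratically for large errors,
    # so badly predicted configurations dominate the gradient
    d = np.abs(e_pred - e_ref) / scale
    return np.mean(d ** 2 + d ** 4)

def energy_force_loss(e_pred, e_ref, f_pred, f_ref, force_weight=0.1):
    # when training with forces, the cost is the energy term plus a weighted
    # force term; the relative weight is itself a training hyperparameter
    return np.mean((e_pred - e_ref) ** 2) + force_weight * np.mean((f_pred - f_ref) ** 2)

e_ref = np.array([-10.0, -9.5, -11.2])
e_pred = np.array([-10.1, -9.4, -12.0])
print(quadratic_loss(e_pred, e_ref), heavy_tail_loss(e_pred, e_ref))
```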
There is a certain number of cost functions coded inside, and you can just change the shape accordingly. The next simple case where you might want to change something is training with forces, where the cost function is the sum of a cost due to the energy and a cost due to the forces, and a training parameter is how much relative weight to give to these two terms when computing the cost. On top of these, other pieces can be added to the cost function, for example regularization; I will not say much about it, but the most common case is adding a small weight decay, a term that tries to keep all the parameters of your network small, and all of this can easily be defined in our input.
So after all this talking, let's actually run a training. This cell is just clean-up, if needed, when you re-execute the code; then you just run the training code and pass it the configuration file I have just described. There is a question. Yes? Could you repeat what you just said about the two networks, a couple of cells above, about training a larger network and a smaller network? Yes. So here, as you see, the definition of the network falls under the default network card; this means that in this case we are using the same network architecture for every species in your dataset. But say you are training and you realize: actually, I had to include nitrogen, but only 1% of my configurations contain nitrogen. For carbon I want to capture all the variability I can, so I use a rather large network, but for the few nitrogen atoms I find, I might want just a very small network, with fewer parameters, that takes care of them. In that case we can add a new card, which will just be called N, for nitrogen, and specify a different architecture for that network, and these can be trained at the same time; every atom already gets its own instance of a network anyway, we are normally just copying the same architecture, so it costs nothing to change the architecture for one specific species, and this can be convenient in some cases for your training. So the card is then called N_network and the package figures it out automatically? Yeah, exactly. I think I have the documentation here: in the input, instead of the default network card, you specify a card for that network, so square brackets and then the name of the network, and inside it you can also specify other things, for example that you want to load the network, because for each species you might want to load a different network from a different previous calculation, or load only some of the weights, or simply reuse any of the cards you have used before; you define the activations, the architecture, and so on. Any of this can be specified at training time. Okay, where was I? So yes, in this case we train for a very short time, let's say 1,000 steps.
And of course we train on a very small dataset of, I don't even remember, I think still small molecules, so we can do thousands of examples per second. The thing to keep in mind is that training one of these networks, for anything that is actually worth it, is a heavy computational task. If you are just training a network for one crystal and small vibrations around its minimum, then you probably don't need any of this: you could write your own code, your five lines of Keras to define the model, and do it that way. If you are training a really large network, these trainings typically run for days, especially with forces, even at the speed you can get on a cluster or on a GPU, so it is important to be able to define all of these things.
The nice part is that once you have done the training, but also while it is running, a lot of files are being created; these are in the internal TensorFlow checkpoint format, snapshots of your network as it trains. And on top of these files, a very useful tool is that TensorFlow itself provides a visualization tool, TensorBoard, which lets you watch the parameters of the network as it trains, even while the training is still running, so you can see how it is going, decide to stop it, decide to change some parameters. In general, on your own computer you would run the line that is commented out here: you run TensorBoard and point it at the log directory where the training is being done. For the specific case of Colab, because these are both Google products, there is an extension that shows TensorBoard inside the Colab notebook, so in this case, for the small training we have just done, hopefully this browser will be able to load TensorBoard. Okay. So, okay, this might be a problem with using Firefox, because I usually run this in Chrome, so it may not be visible here, and I do not have a quick way of showing it to you. Let me just give you an idea of what is in there anyway; it is not for the sake of the fireworks of the visualization, it is just convenient. It is a dashboard where you can see all of the quality measures of your training: the main one being the loss function, but also, for example, the variability of the energy contributions of the different species, the error per atom, the error on the different terms like the force term or the regularization term, and how all of these evolve throughout the training. You can also see the weights of your network as aggregated quantities, as histograms and graphs, showing how the weights of each layer of each network progress through time. So you can see, for example, not only how your error improves during training, but also how the parameters might not be evolving much for certain layers or certain species, or how the network of one species might be stuck always giving the same result while the other network does all of the work.
All of these things, in a real-case scenario, are very convenient to give you an idea, while the training is running, of what is going on and what you might want to change in your training pipeline before you spend all of that time. It is not running here at this moment, but that is the bottom line of what is in there.
After this, another important step, which you can also do while you are training, and it is a good benchmark to have, but you may also want to do it later, is validation. I am sure many of you are familiar with this, but let me give the little primer anyway. As opposed to some of the other tools we have seen these days, which come with a very nice built-in error estimate, estimating errors for these networks is rather complicated. These networks have a bad tendency of being so good at fitting your data that they just fit the data you train them with; they memorize the data, in some sense, so they know how to give the right output for the data they have seen and can do a terrible job on data they have never seen. The trick people use with neural networks, and of course with other approaches too, is cross-validation: you take some part of your data that you have not used in training, and you check whether the performance on this part of the dataset, which the network has never seen, is the same as on the data you trained on. In general the answer is no, but if you are doing things properly, the result on the validation set is still good enough. Typically, up to a certain point of the training, the error on the never-seen data keeps improving, and after that it stops improving or even gets worse, because you are overfitting; we have had many mentions of the bias-variance trade-off, and this is yet another instance of it. At that point you might want to stop, use that network for your applications, and not train it any further. So it is very important to be able to validate your network on new data as you are training it and see whether the performance holds up.
Again, we have a tool for that. When I created my TFRecord data files, I kept a couple of them aside and decided not to show them to the network, so that I could use them later; these are the two files I have here. Now I create a new input for the validation script, which once more is very simple: I have some data in this validation directory, I did a training in this training directory, please evaluate this data with this network and give me the output in this new directory. In this case I will just evaluate a single checkpoint, but you can do this for all of the networks that have been saved as checkpoints, and then you get a nice curve that shows how the validation is going on all of your data. So again, it is just a matter of running this code, outside of the notebook, with its input file: it loads the TFRecords, evaluates the network, and creates, in this case, just one file for step 1000, because that was the last step of our training. And this is again a simple text file which contains, for each configuration, the number of atoms, the reference energy and the neural-network energy. And this already looks very good.
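Since the validation output is just a plain text file with, per configuration, the number of atoms, the reference energy and the network energy, analysing it yourself takes only a few lines. The column layout and file name below are assumptions for illustration, so adapt them to the actual output; energies are assumed to be in eV.

```python
import numpy as np

# assumed columns: n_atoms, E_reference, E_network (adjust to the real layout);
# the file name is a placeholder for the validation output of the last checkpoint
data = np.loadtxt("validation_output/step_1000.dat")
n_atoms, e_ref, e_nn = data[:, 0], data[:, 1], data[:, 2]

err_per_atom = (e_nn - e_ref) / n_atoms
rmse = np.sqrt(np.mean(err_per_atom ** 2))
mae = np.mean(np.abs(err_per_atom))
print(f"RMSE {rmse * 1000:.1f} meV/atom, MAE {mae * 1000:.1f} meV/atom")

# a single figure of merit hides outliers: look at the distribution too
worst = np.argsort(np.abs(err_per_atom))[-3:]
print("largest errors at configurations", worst, err_per_atom[worst])
```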
Actually, the fact that it looks good at first glance means very little, because most of the agreement is just the bias I put in by hand; the network is not particularly good, but it is good enough for this very small dataset. And this is a plot showing an error of about 20 meV per atom, which for this dataset is probably quite terrible, but it is good enough for this presentation. From this, of course, you can compute whatever error measure you prefer. We prefer to give you all of the data so you can do your own analysis, because computing just one figure of merit like the RMSE on this data tends to be a terrible idea: the distribution of errors is not necessarily Gaussian, so you might be doing a very poor job by quoting only that, although people always report it in the literature. So look at your data. You see here that there are outliers: if I compute the RMSE, it will not be 0.015 as you might imagine from the bulk of the points, it will be more like 0.4, because this one point sits at something like 25 sigma from the rest. So there is something going on with that point, and you might want to try to understand what. Anyway, this gives you the freedom to do all of these checks on your own.
After this I just have to show you the last part, but this is a good point to stop for a second if there are questions about the training, or about the parts of the training that I did not tell you about but are available here. I also hope somebody is checking the chat; okay. Okay, if not, I will just tell you a little about the last step. If you follow the tutorial in the code, and you are running on your own machine with LAMMPS available, you can at this point use your network to run molecular dynamics. But since I am running on Colab, and doing that here would be quite involved, I will just show you how to export the network and tell you how you can then import it into that code. Again, as with many of the other steps, this is yet another script with yet another configuration file, which in this case just says: take a given step of my training from this directory, and create an output in another directory, in the LAMMPS format. We can also export in a format that is very human readable, our internal format, which is just made of NumPy objects for the weights and a JSON file for the structure of the network; this is fine if you want to import the network back into our code, or open it with your own code, because it is very easy to access. Otherwise, we have created this LAMMPS format, which is one input file, just text, which I will show you in a second, plus some binary files that can be read by the LAMMPS plugin or by an OpenKIM plugin, and we are of course planning to add more plugins. Any of these formats can then be used by other codes, so the network can be implemented in something completely outside of TensorFlow; for that, we really need to specify and reconstruct the network so that it gives the same output. Yes? So we have a question from the chat, asking whether this is applicable to any electronic property other than the energy, for a DFT-based training set. Yes, in principle you can specify other properties.
If you want another global property, like the energy is, it works out of the box: you just replace the energy with the other quantity and everything works in exactly the same way. Of course, the part that will not carry over, or for which we never found a case where it made complete sense, is the force part, the derivative of that quantity, unless the derivative actually means something for the quantity you are training on. But if you want to train on something different that is a global property, you can. There is also an option, and I am not sure at what stage of being committed to the public version it is, but we have an internal code that certainly works, to train on single atomic properties. That might be something else you want to do, and in that case the architecture makes even more sense than what we are doing now: you are not trying to predict a global property of the configuration, like the energy, but for each atom you want to predict something based on its environment, in which case each network really is a standalone network, and you can use a per-atom property as the target. I believe this is available; if it is not, you can contact us and we can make it available for you. So if you wanted to train on something local, you could do that. In principle this can also be done on multiple properties, and there are cases where you might want to: for example, something we have been working on is a network that can also add a long-range term to the estimate of the energy, like a Coulomb term due to charges, and in order to do this you need to compute a certain set of local properties and then also solve some additional equations. All of this can be added, but it makes the whole setup considerably more complicated. So, in principle, yes, you can train on other properties.
In our setup, what we are most interested in is really specific configurations. We had a talk the other day about the different tiers at which you can represent an atom, and of course for some tasks it makes complete sense to work just with the raw stoichiometry, because you only want to know which stoichiometry might give me a good band gap, say, and you do not care about the configuration. For other tasks you specifically want to know what is going on with one specific configuration. The energy is like that: I do not want a network that just gives me the minimum energy of a certain stoichiometry; I want a network that, for a given configuration, gives me the energy of that configuration. So the energy and the forces, to do molecular dynamics, are our best use case, but if you need to do something different it can definitely be accommodated within our code.
Coming back to the LAMMPS export: the text file we can look at directly, and it again has to contain all of the information, because not only do we need to extract the weights and reconstruct the network, but if you want to evaluate it on different configurations throughout your molecular dynamics, you also need to compute all of your descriptors, this time on the fly, since you cannot precompute them. This is actually one of the more expensive parts of running the molecular dynamics. So here we copy all of the parameters of the descriptor, so that it can be recalculated by the code, then we specify one network for each species, and we specify the file that will be used to load that network. Once you have this, the format is ready. Let me tell you briefly, if you are a LAMMPS user, because this is also the one we have used the most: we created a new pair style. You just need to copy these files inside LAMMPS and recompile it; then LAMMPS accepts this new pair style, which is panna, you point it at the folder where you have the network, the network gets loaded, and you can run the dynamics with it. Again, this is not something that is easy to show within this notebook, and there is not much to see there anyway.

Okay, this last cell is just there if you want to clean up all the mess we created. Otherwise, in general, I will refer you to this GitLab address if you want to use PANNA. Ruggero, who is over there, and I are among the developers, and we are here, so if you would like to use the code and discuss it further, we are available; otherwise, if there are questions I can answer them, or we can go for lunch early, which is always nice.

Yeah, I think there is time to experiment if you want, so that you can ask for any clarification or troubleshooting. Yes, and I'm sorry that this notebook was really just "execute all cells", but it was a little bit hard to give you something more to do in one hour and twenty minutes.

Hi, thank you first of all for the nice presentation. If I understood correctly, you have your own implementation of the neural network potential in this plugin for LAMMPS, right? In this what? In this plugin for LAMMPS. I was just curious how the speed of the forward pass through the network compares between your plugin and TensorFlow.

So, I have never measured the speed of a single forward pass within TensorFlow. We have our own implementation of the forward pass in NumPy, and the implementation within LAMMPS is written in C and is rather simple. One problem is that when you run molecular dynamics with something like this, you typically want to parallelize over the atoms, so each atom, or each small group of atoms, has its own core that computes its descriptor and then evaluates the network. That does not scale very well, because you cannot batch the network evaluation over many inputs at once. Also, within LAMMPS the plugin as you will find it is implemented in a rather plain way, because LAMMPS does not naturally link against, say, a GEMM routine or LAPACK, so it's not the fastest. But let me tell you that I have tried, and the overhead of pulling in a proper math library just for one small matrix multiplication per evaluation is not worth it. So it's not that bad.
And you still spend quite a bit of time in the overhead of computing the descriptor, which again is written to be relatively fast but is not a very pleasant computation, and only then do you evaluate the network. So this does not make it super fast compared to, say, Lennard-Jones or something very simple, but it was fast enough to run decent dynamics in our case. It could be tweaked and made a little faster, but there are some real issues, related to the fact that you are also running molecular dynamics, which don't let you speed it up beyond a certain point. Okay, thank you.

Yeah, thank you for the presentation. My question is this: I understand that the data you generate comes from molecular dynamics, so I don't expect it to access all possible configurations of phase space, just the most likely ones depending on the system. Could the training instead benefit from visiting very unphysical configurations, in a sort of Monte Carlo approach?

So, our data does not necessarily come from molecular dynamics. Actually, in many of the main works we did, we tried to access very varied configurations; one way we did this, for example, was with genetic algorithms. Again, there is a bit of a trade-off here, and it's not something that is discussed much in this community, I think. What you want to do with the potential matters a lot. You often find people who want to show a benchmark on very simple, very specific cases, and there is nothing wrong with that: if you just want to fit something around the minimum of one crystal structure, then you need a very simple network that you can train very quickly, and you can reach sub-micro-eV accuracy. But if you want to build something that works for any possible allotrope of an element, anything at all, also at high pressure, also completely crazy structures, I'm not sure you can do that. There is a work we did on carbon where we used genetic algorithms to try to explore the space, and again we put a certain cutoff in energy: we created crazy structures with the genetic algorithm, did a relaxation, and sampled in such a way that we were still mostly interested in regions around the minima, not just one minimum but all the possible minima up to a certain energy cutoff. At some point you need to stop, because otherwise the sky is the limit, and a network that can also handle two atoms sitting on top of each other is not doable. But you can definitely see that when you take this very varied dataset, if you concentrate on just one region you get a certain trade-off between training and validation error, and then you try to validate on something very far away and it's terrible. So yes, it's very important that you sample as much as possible, but it also depends a lot on what you are going to do with the potential. There is no single easy answer; it's a very subtle issue, especially when people just say, you know, "we trained the network and it's very good because it has this amount of error." That means nothing, because you don't know how hard that dataset is, and you don't know the distribution of the error itself, which, as I mentioned before, might be something very strange.
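Just to make concrete how fragile a single RMSE number can be, here is a tiny self-contained sketch, with purely synthetic errors chosen for illustration (not our actual data), showing how a handful of outliers can dominate the figure of merit:

    import numpy as np

    rng = np.random.default_rng(0)

    # 10,000 synthetic per-structure errors in eV/atom: a narrow bulk
    # plus 50 "bad" points; the numbers are invented for illustration.
    errors = rng.normal(0.0, 0.003, size=10_000)
    errors[:50] += rng.choice([-0.5, 0.5], size=50)

    rmse_all = np.sqrt(np.mean(errors**2))
    # Drop the 50 largest absolute errors and recompute.
    rmse_trimmed = np.sqrt(np.mean(np.sort(np.abs(errors))[:-50] ** 2))

    print(f"RMSE, all points:        {rmse_all:.4f} eV/atom")
    print(f"RMSE, 50 points dropped: {rmse_trimmed:.4f} eV/atom")
    # With these numbers the two differ by roughly an order of magnitude,
    # yet they describe the same model: always look at the full error
    # distribution (e.g. a histogram of `errors`), not just one number.

This is exactly why, as I said, we prefer to dump all of the per-configuration data and let you do your own analysis.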
And indeed this is what we found in practice: you can change the RMSE just like that. There were cases in which, out of I think 10,000 data points, by removing 50 you could change the RMSE by an order of magnitude, because there were a few strong outliers, and if you just forget about the outliers then everything looks fine. So it's very tricky. And again, I think the point of all of this is that I don't like fully automated things; I hope that comes through here. I like tools that are helpful, but they should not be too automated, because then you don't see any of these details, you end up with just a number, and you don't know what you are playing with. So I do encourage people not to just run molecular dynamics, unless they know that is the only part of phase space they are going to explore, but to do weird things, to decide what makes more sense to include in the training set, to benchmark as much as possible when they change these parameters, and then to decide what they are comfortable with. Thank you for the very detailed answer.

Okay, one more. We have a question from the chat: could you tell us about the main differences between PANNA and DeePMD, and whether it is just the descriptors?

No; I'm not very familiar with DeePMD, but I think you can also make coffee with it, so it does a lot of things. I'm not here to sell the power of PANNA against DeePMD, and I'm not sure we are even trying to do that. I know they are very strong on many fronts; they have a huge team, so they have implemented a lot of different features. We have our code, which works for our needs, and we try to stay agile and implement somewhat different things; if you find yourself comfortable with it and would like to talk with us, you can use it. DeePMD has a lot of other features and is definitely a very strong package, also if you want to run on a lot of cores, so I wouldn't even know exactly which features to list, but it's a very good package. You could definitely try both and see what works for you. Okay, thanks.

Okay, so I think we can thank the presenters; it's about time, I think, and they are still around.