Good morning everyone, both here and abroad. I would like to start by thanking Antonio for inviting me to give this lecture today. It's always a great pleasure to be back in Trieste after spending four years here during my PhD, and I am very glad to be here to tell you something about the connection that exists between the very broad field of machine learning and the multiscale modeling of soft and biological matter. The objective of this lecture is to provide an overview of some of the applications of machine learning techniques, in particular deep learning but not only, that have been attempted and carried out in the context of soft matter modeling. Let me start by acknowledging the group with which I have been carrying out the work that I will present during the final part of the lecture: most of the things you will see until then are the work of other people, and at the end I will give you a hint of the kind of work that we carry out in this context. These are the people behind that work. The outline of the lecture is then this: I will start by providing some background on the context of the work that we carry out and what machine learning has to do with it, then we will dig deeper into this topic, and finally, as I mentioned, I will give you an idea of specific applications of machine learning, broadly speaking, in this context as we carry them out.

So let's start with this invitation into the context of soft and biological matter modeling. May I ask how many of you know what proteins are and how proteins work, just to have an idea? Okay. I do not want to go too deep into this, but since I will be mainly talking about proteins, and biomolecules in general, as this is the main field of application of what I will be discussing, let me give you a very quick picture. Proteins are polymers, heteropolymers of biological origin, constituted by repeating units; these repeating units are the amino acids, and they come in 20 types. These 20 types of amino acid are combined by molecular machines in cells into chains that collapse onto themselves: typically, though not always, they collapse into globular proteins, and these globular proteins are structured objects that carry out structural and functional work in the cell. The amino acids interact with each other in a manner that depends on their specific chemical properties and their sequence, and depending on which amino acids are placed in a particular sequence you get specific structures at various levels of the architecture. For example, you see from this cartoon that the amino acid chain is arranged locally in structures that are called secondary structure (the primary structure is actually the primary sequence, that is, which amino acids you put in sequence): depending on this particular sequence you can have a winding of the chain that forms helices, the alpha helices, or a flatter structure that is the beta strand, and if you have many beta strands you can have beta sheets. These structures are the fundamental building blocks of proteins, which can arrange themselves in very complicated structures that can be single or multiple chains.
These objects eventually perform a lot of work in the cell: they catalyze chemical reactions, they form structural elements, they transport chemical material back and forth; they do a huge amount of things, and because of that it is particularly important to study them. Also because, from a pharmaceutical point of view, just to mention one thing, proteins are typically the target of pharmaceutical molecules that try to hamper or enhance their biological function. So when we study proteins, and for that matter any kind of biological molecule, and even molecules of non-biological origin, we can either do that experimentally, and this is of course the starting point of all kinds of studies, or we can also tackle the problem from the computational point of view. In order to do that we need to know what the system looks like, so we have to have an idea of its composition and structure; experimentally we can derive this information, which tells us which atoms constitute the system and how these atoms are arranged, and this provides us with, if you want, a sort of cartography of the system: we know where is what. Once we have this information, we can dress this structure with interactions: at the classical level we can provide the model with an interaction force field that can be employed in molecular dynamics simulations. We can use the structure of the system, and the interactions that its constituents are subject to, to numerically integrate Newton's equations of motion and obtain trajectories that show us how the system moves and does things. Once we have this simulation, once we have the trajectory that comes out of this numerical approach, we hope to infer properties of the system that can be subsequently employed, for example, for pharmaceutical applications; of course this is just one example that is very specific to proteins. If we are talking about polymeric materials, for instance, we want to study their mechanical properties in order to devise better plastics with specific behavior as a function of stress, of temperature, and of whatever external forcing the material can be subject to. In the biological context, then, it is of paramount importance to perform a numerical sampling of the conformations of the system by means of numerical simulations. Of course, these systems are composed of a large number of atoms. Each amino acid has on average 20 atoms; you have to multiply this by the number of amino acids that you find in a protein, which means something that goes from super small proteins of about 30 amino acids, to typical proteins of up to 300 amino acids, up to the thousands. This also requires the solvent to be simulated, that is, water: proteins live in water, together with ions and other cosolutes. Simulating these objects requires a large computational effort, so it is computationally demanding. At this stage, we can perform simulations of really large systems, for example entire viruses, which, putting together the virus itself and the solvent, are composed of something in the order of a million atoms. There are also systems that are even larger, for example cell-like systems composed of membranes with proteins on top, and other molecules inside.
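As a concrete illustration of what "integrating Newton's equations of motion" amounts to, here is a minimal velocity Verlet sketch in Python, with a toy Lennard-Jones force and made-up parameters; it only shows the structure of the integration loop under these assumptions, it is not a production molecular dynamics code.

```python
import numpy as np

def lennard_jones_forces(pos, epsilon=1.0, sigma=1.0):
    """Pairwise Lennard-Jones forces for a handful of particles (toy example)."""
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            rij = pos[i] - pos[j]
            r2 = np.dot(rij, rij)
            sr6 = (sigma**2 / r2) ** 3
            # force on i from j, expressed through r^2 to avoid a square root
            f = 24.0 * epsilon * (2.0 * sr6**2 - sr6) / r2 * rij
            forces[i] += f
            forces[j] -= f
    return forces

def velocity_verlet(pos, vel, masses, dt, n_steps, force_fn):
    """Integrate Newton's equations of motion with the velocity Verlet scheme."""
    traj = [pos.copy()]
    f = force_fn(pos)
    for _ in range(n_steps):
        vel += 0.5 * dt * f / masses[:, None]   # half kick
        pos += dt * vel                          # drift
        f = force_fn(pos)                        # new forces
        vel += 0.5 * dt * f / masses[:, None]   # half kick
        traj.append(pos.copy())
    return np.array(traj)

# Toy usage: three particles in 3D, arbitrary reduced units
pos = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 1.2, 0.0]])
vel = np.zeros_like(pos)
masses = np.ones(3)
trajectory = velocity_verlet(pos, vel, masses, dt=1e-3, n_steps=1000,
                             force_fn=lennard_jones_forces)
```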
Of course we cannot perform brute-force molecular simulations for these systems, but with the help of some modeling it is possible to infer properties of systems of that size, involving several billions of atoms; or we can focus on smaller systems but study them for longer times, and there are specific machines that have been devised expressly for the purpose of performing molecular simulations and that tackle the problem from the point of view of the simulation length. Because of course it is not enough to simulate these systems for a few nanoseconds: in order to understand something of the biology of these systems we have to go to at least hundreds of nanoseconds, and the larger the system, the more difficult this is. So typically what we do is to perform large simulations on clusters of CPUs, or of GPUs, which are gaining importance, to study these systems. Or, as I mentioned, there are special-purpose computers like the Anton machine, developed by D. E. Shaw Research, a company based in New York, that are specifically tailored for this kind of work. And of course this is the tip of the iceberg of the kind of research that is done; this is work that can be carried out by very few people in the world, since there are just a few Antons, all in the same place, worldwide. But typically researchers can have access to computational resources that are large enough to perform substantial molecular simulations, and as time passes these computers become more accessible and the software more effective. So we can produce a huge amount of data, but we do nothing with data if we cannot in some sense filter them, distill them, in order to get information out of them. I think that after two weeks of this school it is absolutely evident that data per se are not enough: we need to rationalize them, we have to extract meaning out of those data, and this goes for the results of molecular dynamics simulations as well. In the specific context of multiscale modeling, and the application of this broad field to biological molecules in particular, we need to extract information from data that entail a certain level of complication. This statement is based on the idea that we deal with complex systems and complicated systems, typically a mixture of the two properties; complex and complicated are not synonymous. Complicated systems can be simplified; complex systems are constituted by, in principle, simple elements that interact in such a way as to give rise to an emergent behavior, and this emergent behavior cannot be deduced from the properties of the constituents themselves. So we need to study the system as a whole in order to extract the important information about its complex behavior, but we would like to do that while minimizing the amount of complication: we want to simplify as much as possible. And this is the goal of an important branch of modeling, that is, coarse graining. Coarse graining, applied in this context, means to provide a description of the system that is simpler, so as to figure out what the system does without losing important information.
This can be interpreted in two ways. We can either take a very detailed description of the system, as it comes from an all-atom molecular simulation, and look at it in simplified terms so as to understand it better, so there is a filtering procedure; or we can directly model the system in simpler terms: instead of describing each single atom that constitutes a protein, we can group atoms together and describe the system in terms of fundamental units, atom-like beads that each represent a group of at least a few atoms, and we can simulate that system directly. So we have a simplification of the structure from the very beginning, and this provides us with a simpler model to simulate, which also means a cheaper model to simulate, so we can go much faster in that direction and study larger systems for longer times. The typical procedure for coarse graining starts with the knowledge of the system at a higher resolution. Then comes the representation of the structure of the system in simpler terms, that is, the mapping procedure, which takes the structure of the system from a highly dimensional configuration space to a smaller, still large but lower-dimensional configuration space. So we have a simplification of the structure. Once we have these few fundamental units in terms of which the system can be described, we have to parameterize the interactions. We might know very well how atoms talk to each other, how atoms interact, but we do not necessarily know how these effective interaction centers interact with each other, and we have to find a way of dressing those interactions: we have to parameterize the system. At the end of this procedure, we have a coarse-grained model. To perform these operations there are of course several approaches, there is a whole zoology of possible ways of carrying out this procedure. Typically we have top-down strategies, knowledge-based strategies, and bottom-up or systematic strategies. The top-down approach assumes that you already know something about the emergent behavior of the system, and you employ this information in order to tweak the interactions of the system at the fundamental level. For example, you know what the structure of a protein is without knowing how it got there, and you construct interactions, putting springs between atoms or effective interaction centers of the protein, so as to obtain that structure. This is a top-down approach because you know the structure already: you are not waiting for the protein to collapse into the structure, but you can study important properties of the system and how it fluctuates about the native structure, that is, the biologically functional structure. Or you can parameterize the interactions so that atoms that are close to each other in the native structure attract each other, while the other ones do not; they will actually bounce off each other. This is a top-down model because you are parameterizing fundamental interactions so as to observe a specific emergent behavior that you already know, which is not what you would do from a fundamental point of view, where you would have to parameterize the interactions first and then see what the system does. Here, what you do is to put the answer into the question.
The knowledge-based approach is very similar in spirit to the top-down approach, but you use a large amount of information to parameterize the interactions; in a sense this is probably closer to what you do, in the broad sense, in learning: you provide a statistical bias for the interactions, parameterizing them so as to reproduce not really the whole emergent behavior, but, for example, correlations within the structure of the system. And then we get to the strategy that is closest to the physics approach, that is, the bottom-up strategy, in which you have a reference high-resolution system and you want to simplify it so as to incorporate as much information as possible in a rigorous manner, producing a simplified representation that is aware of what goes on at the more fundamental level. Of course, there is a large number of ways in which this can be done, but there is essentially one objective that all possible methods share, and what we do can be framed in a very formal manner, as follows. Let us assume that our high-resolution system has a certain Hamiltonian, with its own degrees of freedom and interactions between those degrees of freedom; this is our fundamental representation. Then we have a mapping that simplifies the structure: for example, a group of atoms that belongs to a certain amino acid is mapped onto its center of mass, and everything that is left of that group of atoms is just the position of the center of mass. We then want to provide interactions between those degrees of freedom in a way that stays as close as possible to what we started from. Those of you who are familiar with the concept of the renormalization group might recognize something very akin to it in spirit. You take a high-resolution system, you map it onto a lower-resolution system that can formally be written in terms of a Hamiltonian, similarly to what you started from; the whole issue is to find how to parameterize the new Hamiltonian so that it does what the old system did, even though you have fewer degrees of freedom and have lost information in this procedure. In order to carry out this procedure formally, what you have to do is to figure out a probability distribution in terms of the new, fewer coordinates. This is essentially a marginal of the old probability distribution, with the constraint that the degrees of freedom you are interested in are fixed: you marginalize over all the high-resolution degrees of freedom while fixing the configuration of the new degrees of freedom. If you do that (I am focusing on the coordinates; you can do it also for the momenta, but that is not as interesting), you get to a consistency condition, which tells you that for the coarse-grained model to be sound, the probability distribution that you obtain from the coarse-grained representation has to be identical to the probability distribution that you obtain from the high-resolution model when you look at it only in terms of the coarse-grained coordinates. On one side you have a probability that only depends on those coordinates, and there is no chance it can do anything else; on the other side you have a probability distribution that is a marginal over a larger number of coordinates, retaining only the ones you are interested in.
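In formulas, using a generic notation that is not necessarily the one on the slides: r denotes the all-atom coordinates, R the coarse-grained ones, and M the mapping that connects them. The consistency condition just described then reads:

```latex
% Consistency condition for a bottom-up coarse-grained model (generic notation):
%   r = all-atom coordinates,  R = coarse-grained coordinates,  M(r) = mapping
% The distribution sampled by the CG model must equal the marginal of the all-atom one:
\begin{equation}
  p_{\mathrm{CG}}(\mathbf{R})
  \;=\;
  \int \! \mathrm{d}\mathbf{r}\;
  p_{\mathrm{AA}}(\mathbf{r})\,
  \delta\bigl(M(\mathbf{r}) - \mathbf{R}\bigr) .
\end{equation}
```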
Now the issue, of course, is how to do that, because this is very simple to write but very difficult to obtain. If you carry out the integration, what you find is that the effective potential that exactly satisfies this condition is essentially a free energy: minus one over beta times the logarithm of the partition function of the old system, integrated over all degrees of freedom with the constraint that the coarse-grained variables take specific values. If you had this potential, you would be done; all your problems would be solved. But of course this is a very complicated operation to carry out, so a large number of methods have been developed to approximately parameterize interactions that tend to this kind of potential. Focusing on the field of machine learning, one can wonder whether this complicated operation can be carried out with the help of the large and diverse machinery that has been developed in that context. And of course this question has a positive answer: recently, or not so recently depending on how you measure time, such approaches have been implemented and employed to perform this operation, which is otherwise a very complicated inverse problem in which we want to tweak the interactions so as to get a force field, or effective potential energy, that gets as close as possible to the many-body potential of mean force, as that quantity is called. So let's get into the heart of the matter: how can we use machine learning in this context to carry out this task? Given the variety of problems that we have, the first question you have to ask yourself is which machine learning. Machine learning, as you know much better than me, is a very broad umbrella term that encompasses these and many other techniques, which go from very simple linear regression to whatever is as complicated as it gets. Specific methodologies are appropriate for specific tasks; there is no method that is one-size-fits-all, no Swiss Army knife in this context, or in general, and you have to find the appropriate tool to employ in order to tackle your specific problem. In general, in the context of soft and biological matter modeling, we can use machine learning for several tasks, the vast majority of which fall into a few big chunks. One is the evaluation of quantities: we want to figure out properties of the system in quantitative terms, and we can use machine learning for that. Another is classification, which might be seen in a sense as a very particular subset of the evaluation of quantities: we want to see whether our system has a certain structure or another. Then there is the analysis of the data that we have: as I mentioned, you can perform a very large molecular simulation, you have lots of data, and then you have to dig into them to make some sense out of them, and machine learning can help in this kind of procedure. And of course there is the complementary strategy, which is to generate new structures: I can use machine learning to do, in a much faster manner, the same thing that molecular dynamics simulations do; that is, instead of spending time integrating Newton's equations of motion, I can have a method that produces new structures much faster.
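In the same generic notation as above, the many-body potential of mean force just mentioned is the free energy obtained by integrating out the atomistic degrees of freedom at fixed coarse-grained configuration:

```latex
% Many-body potential of mean force (generic notation, beta = 1/k_B T,
% u(r) = all-atom potential energy, M(r) = mapping):
\begin{equation}
  W(\mathbf{R})
  \;=\;
  -\frac{1}{\beta}\,
  \ln \! \int \! \mathrm{d}\mathbf{r}\;
  e^{-\beta\, u(\mathbf{r})}\,
  \delta\bigl(M(\mathbf{r}) - \mathbf{R}\bigr)
  \;+\; \mathrm{const.}
\end{equation}
```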
An example of the first category is the calculation of energies. To tackle the problem of computing the energies of molecules at the most fundamental level, we have to go into the quantum mechanical regime. Here, for example, you see a molecule that has its own glory in the context of condensed matter systems, that is, fullerene: carbon atoms put together in the shape of a ball. If you want to compute its properties at the most accurate level possible, you have to solve the Schrödinger equation, not Newton's equations, and to do that you have to carry out very complicated, time-consuming and resource-consuming calculations. Here you can think of employing machine learning to do exactly the bottleneck of the calculation, that is, determining the energy based on the structure of the system. You can train a deep neural network, as has been done, to produce the value of the energy based on the configuration of the system, which can subsequently be employed in a molecular dynamics simulation in which the forces that act on the atoms are determined through the network and not through the solution of the Schrödinger equation. This speeds up the calculation immensely, and it allows you to perform simulations of molecules such as fullerenes, or much more complicated stuff than that, in particular molecules of technological or pharmaceutical relevance. To do that, of course, you have to feed the network with some kind of information, and the more physically sound the information you provide the network with, the better it is for the network. I think Professor Tomasetti mentioned a couple of days ago that you have to provide machine learning methods with refined data in order to get a better outcome of the procedure: better classification, better features, and so on and so forth. If you provide the network with features that are physically sound and already contain some information about the relevant properties of the system, you simplify a lot the job that the network has to carry out. You could simply provide, in some way, the configuration of the system, the positions of the atoms, and that's it, and let the network do the whole job; you are not doing any of it. But if you provide information that is based on the structural properties of the atoms, like, for example, how many atoms a given atom has in its surroundings, how these atoms are distributed, what the angular distribution of pairs of atoms about a central one is, and so on and so forth, then these features already carry a lot of information in themselves, and a lot of physics about the system. If you do that, that is, if you provide the network with features that are based on structural correlations within the system, you can get results that are much more accurate. You can then also employ this strategy for systems that are larger, that are more classical, let's say, in order to characterize their structure. Typically people think that water is either a gas, a liquid, or a solid, that is, ice; but ice is a horribly complicated thing.
There are several phases of ice, and it might be very important to discriminate between the different phases of ice, not to mention the different forms of the liquid phase, which can be tricky as well. So what you can do is to employ these local functions, these symmetry functions as they are called, to discriminate between specific local arrangements of water molecules that distinguish the different phases of the system at a very local level. Here, for example, you see a distribution of points in a plane whose axes are two very general descriptors of the local arrangement of water molecules, the Q4 and Q6 parameters, which tell you essentially how tetrahedral and how ordered the local arrangement of the molecules is. In terms of these parameters you have several spots, many of which overlap; this tells you that there are different phases of the system that can be associated with the different clusters, but it is very difficult to discriminate between them based on these two parameters alone, because there are regions where the different phases overlap. So you can train a network based on symmetry functions.

Yes? I have a question about the previous two slides. I was wondering: do you input all kinds of features into the neural network at the same time? Because you mentioned that if we input more features into the neural network we can get better results, but different kinds of features will have different data structures. So I was wondering whether you input all of them at the same time.

Yes, generally the different features are computed simultaneously on the same structure, or set of structures, and you feed the network with all of them at the same time. In this particular case, what you do is to compute, for each atom, several features: for example, functions that count how many atoms you have in the neighborhood, and how the angles between them are distributed, and so on and so forth. Then you feed the network with all this information at the same time. You do that for all the atoms, then you combine this information: you extract a sort of local energy for each atom, you sum these local energies, and you get the total energy of the whole system. But all these features are passed to the network at the same time, and the network is trained keeping this information simultaneously. Did I answer your question? Okay.

So, if you apply this kind of strategy to a system like water, and you do that for different temperatures, you observe the formation of ice seeds, you see the growth of the crystals, and you can monitor the freezing process as it proceeds in terms of the local structure of water. This allows you to discriminate the free energetic contribution of the different phases to the whole system.
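To make the scheme just described more concrete, here is a minimal sketch, in the spirit of Behler-Parrinello-type networks: per-atom radial symmetry functions are fed into a small per-atom network whose outputs are summed into a total energy. The cutoff and G2-type radial functions are the standard ones, but the tiny untrained network and all parameter values are purely illustrative.

```python
import numpy as np

def cutoff(r, r_c=6.0):
    """Smooth cutoff function, zero beyond r_c (standard Behler-Parrinello form)."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def radial_symmetry_functions(positions, etas=(0.5, 1.0, 2.0), r_s=0.0, r_c=6.0):
    """One G2-type feature vector per atom: sum_j exp(-eta (r_ij - r_s)^2) * fc(r_ij)."""
    n = len(positions)
    feats = np.zeros((n, len(etas)))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = np.linalg.norm(positions[i] - positions[j])
            for k, eta in enumerate(etas):
                feats[i, k] += np.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c)
    return feats

def total_energy(positions, weights1, bias1, weights2, bias2):
    """Total energy = sum over atoms of a shared per-atom network applied to its features."""
    feats = radial_symmetry_functions(positions)
    hidden = np.tanh(feats @ weights1 + bias1)    # same small network for every atom
    atomic_energies = hidden @ weights2 + bias2   # one scalar energy per atom
    return atomic_energies.sum()

# Illustrative (untrained) parameters: 3 features -> 5 hidden units -> 1 output
rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
w2, b2 = rng.normal(size=(5, 1)), np.zeros(1)
positions = rng.normal(size=(10, 3)) * 3.0
print(total_energy(positions, w1, b1, w2, b2))
```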
With this approach, you can estimate the energy barrier for the nucleation process much more accurately than you would by assuming that the system is in just one of two phases, without discriminating among the different phases. In order to do that you have to discriminate the phases, because each of them has a different free energetic contribution, and the network allows you to classify the different structures locally and instantaneously, based on the structural information that you pass to it. Then, of course, in order to provide the network with features that are physically sound, you had better account for the symmetries present in the system: for example, rotational invariance, or the fact that the relationships between atoms do not depend on their absolute positions, or whatever other kind of symmetry you might happen to have. It makes a lot of sense to construct features that explicitly account for them, for example by writing the feature that corresponds to a particular structure of your system in terms of basis functions, in this particular case spherical harmonics, that are invariant under a certain number of transformations; eventually, the features that you pass are the coefficients that you employ to project the structure onto an invariant basis set. In this manner you have again carried out part of the job that in principle the network would have to carry out, and this allows you to obtain more accurate results, for example in the calculation of the energy based on the structure, because you are passing better, higher-quality information about the structure to the network. These examples refer to how you can analyze the system by means of a deep learning approach, but as I mentioned you can also generate structures with these methods. You can train, for example, a network on a set of structures that you know to be physically meaningful, to discriminate between structures that make sense from that particular point of view and structures that don't. So, for example, if you have a network trained to distinguish between configurations of a polymer that are statistically representative of a protein and configurations that are not, what you have obtained is a classifier that tells you what the likelihood is that a certain structure is a protein or not. You can use this as a method to generate new structures, in that you carry out a sampling procedure that moves the structure around, that deforms the structure: for example, you perform a Monte Carlo simulation in which you deform the structure, and then you employ the result of the neural-network-based analysis to determine whether the structure you got is a realistic structure for a protein or not. This allows you to produce structures that are consistently realistic with respect to the set of configurations that the network has been trained with, and it can be shown that you can generate structures that are consistent with what you would see in a molecular dynamics simulation of a protein.
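A minimal sketch of the sampling idea just described: deform the structure with random moves and let a trained classifier's score play the role of the log-acceptance weight in a Metropolis-like scheme. Here `log_score` and the move size are hypothetical placeholders, not the specific network or protocol discussed above.

```python
import numpy as np

def propose_move(coords, step=0.05, rng=None):
    """Randomly perturb one site of the coarse-grained polymer model."""
    rng = rng or np.random.default_rng()
    new = coords.copy()
    i = rng.integers(len(new))
    new[i] += rng.normal(scale=step, size=3)
    return new

def classifier_guided_sampling(coords, log_score, n_steps=10000, rng=None):
    """
    Metropolis-like sampling in which the trained network's log-likelihood that a
    configuration "looks like a protein" replaces -beta*U in the acceptance rule.
    `log_score(coords)` stands in for the trained classifier (hypothetical).
    """
    rng = rng or np.random.default_rng()
    current, current_logp = coords, log_score(coords)
    samples = []
    for _ in range(n_steps):
        trial = propose_move(current, rng=rng)
        trial_logp = log_score(trial)
        if np.log(rng.random()) < trial_logp - current_logp:   # accept/reject
            current, current_logp = trial, trial_logp
        samples.append(current.copy())
    return samples
```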
The network described above takes as input a set of distances between different points, and what it does is to measure local properties of a subset of the protein, a short stretch, a short strand of the polymer; then this window is moved along the structure in a sliding fashion. So it is a sort of convolution; it is not a convolutional network, because it does not operate exactly like that, but you slide along the structure to measure the local structure. So you have the polymer, and you train the network based on this kind of information. This is done at the coarse-grained level, so the information pertains only to the distances between the sites that constitute the polymer: the polymer is a bead-spring model, point-like particles connected by bonds. If you were to apply this to a more sophisticated representation, you could also introduce directional interactions; if you had, for example, side chains that can point in one direction or another, you could account for that. You can of course enrich the model as much as you want, you can enrich the information that you pass to the network, and this helps in discriminating better, and therefore also in generating better structures. As I said, the objective of modeling in the context of coarse graining is to parameterize an effective potential that gets as close as possible to the many-body potential of mean force. And this is done with specific machine learning approaches, like for example the Gaussian Approximation Potentials, which describe the effective energy, the free energy of the system, in terms of Gaussians that are localized, that are centered on configurations that have indeed been observed in the training set, for example in a molecular simulation. What you train the model with is the set of coefficients, the prefactors of the Gaussians, so as to reproduce the free energy of the system in terms of the configurations that you have sampled. This is a very smart and effective kind of interpolation because, as a first thing, it relies on a subset of configurations that you have sampled; as typically happens with interpolation, it provides a range within which you can have configurations that are interpolated between the ones you have observed, and the potential knows what those configurations are. But then, you do not feed these Gaussians trivially with the configuration of the system: you do not match an entire configuration with the instantaneous one that you have. Rather, you parameterize these Gaussians in terms of specific two-, three-, four-body terms that are computed on the coarse-grained sites of the system. This produces effective potentials that are separated in terms of the order of the interactions: you have a one-body term that is essentially an external field, two-body terms that represent pairwise interactions, three-body terms that represent three-body interactions, and so on and so forth.
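Here is a stripped-down sketch of the kind of kernel regression behind Gaussian Approximation Potentials: the predicted energy of a new configuration is a sum of Gaussians centered on descriptors of training configurations, with prefactors obtained from a regularized linear solve. The pair-distance histogram used as a descriptor here is only a placeholder for proper two- and three-body descriptors, and all parameters are illustrative.

```python
import numpy as np

def pair_descriptor(coords, bins=np.linspace(0.0, 10.0, 21)):
    """Toy descriptor: histogram of pairwise distances (stand-in for 2/3-body descriptors)."""
    n = len(coords)
    d = [np.linalg.norm(coords[i] - coords[j]) for i in range(n) for j in range(i + 1, n)]
    hist, _ = np.histogram(d, bins=bins)
    return hist.astype(float)

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian similarity between two descriptor vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def fit_kernel_model(train_descriptors, train_energies, sigma=1.0, ridge=1e-6):
    """Solve (K + ridge*I) alpha = E for the Gaussian prefactors alpha."""
    K = np.array([[gaussian_kernel(a, b, sigma) for b in train_descriptors]
                  for a in train_descriptors])
    return np.linalg.solve(K + ridge * np.eye(len(K)), train_energies)

def predict_energy(descriptor, train_descriptors, alpha, sigma=1.0):
    """Energy of a new configuration = weighted sum of Gaussians centered on training points."""
    k = np.array([gaussian_kernel(descriptor, b, sigma) for b in train_descriptors])
    return k @ alpha
```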
This separation is particularly useful to gain understanding of the behavior of the system, because you can check whether the introduction of, say, three-body terms contributes substantially to improving the quality of the result; this tells you, for example, whether the interactions that you have in the system are strongly directional or not, so that you have to account for three-body arrangements. In this work, the authors applied the method to two molecules in the liquid phase, methanol and benzene. What you can see in the case of methanol is that introducing three-body terms helps, maybe not dramatically but substantially, in improving the quality of the result, in terms of how well the three-body correlations in the system are reproduced: methanol is not a symmetric molecule, it has asymmetries, and a description in which each molecule is just a point-like particle does not intrinsically account for this asymmetry at the level of structure. If you put this asymmetry into the parameterization of the interactions by means of the three-body term, you get a much better agreement. In the case of benzene, on the other hand, you get a very good result even in the absence of the three-body term, because the level of three-body correlation within the system is not particularly strong, not substantial. In the case of methanol it is important, and in order to reproduce it qualitatively better you have to account for it: you gain accuracy by including a three-body term. Eventually, this is not just a matter of getting accurate potentials that do a good job of reproducing the structures you have parameterized them with; it is also a matter of gaining understanding about the system, because if you perform a simplification of the system and then parameterize it in a certain manner, and that manner provides results that are qualitatively better than doing something else, you have acquired information about the physical properties of the system. So the advantage of using a bottom-up approach is precisely that of gaining a larger amount of understanding about the physics of the system, and not just having something that works nicely through a black box that I cannot make sense of. A particular example of this is given by the application of machine learning approaches, with a particularly smart method that is symbolic regression, to understanding, for example, how reactions take place. Symbolic regression is a kind of machine learning in which you do not fit parameters; rather, the machine learning approach tries to build mathematically sound functions of the input: the method proposes a certain function, compares it with another, sees whether it works better or worse, and eventually what it produces is a function that can be read in a human-friendly manner, so to speak. Here, for example, you have a very simple system that has to go from here to there, crossing a free energy barrier or an energy barrier: a very simple potential that has two minima and a barrier that separates them.
The idea is not only to observe the process of going from here to there, but also to figure out what the most sensible reaction coordinate for this process is. The most sensible coordinate to study a transition from one state of the system to another is the committor. The committor is a function that takes the configuration of the system as input and, based on that, tells you the probability of reaching the product state, having classified the two states as the reactant state and the product state. So what you want is a description of the system in terms of the probability of going towards the products starting from a particular point in configuration space. This gives you a reaction coordinate, because the path that goes orthogonally to the isocommittor surfaces, that is, surfaces in configuration space that have the same value of the committor, is the most straightforward way of going from the reactant state to the product state. So the idea is to figure out the committor by means of a machine learning approach, and in this work the symbolic regression method has been applied to carry out this job. What the method produces is a function that can be read in human terms, and you can see that this function of the coordinates of the system, in the vicinity of the path that is indeed followed by the system in going from state A to state B, is very close to the isocommittor lines. So you have an excellent proxy for the committor as a function of the coordinates. Most importantly, the method realizes that only two of the possible variables are important: there are many possible variables, because you can construct an arbitrarily large number of functions of the configuration of the system, but the method realizes that only two particular coordinates are relevant in the description of the system, and in particular that one coordinate is much more important than the other. Why is that? The reason is that even though the system lives in a two-dimensional space, in practice you can tilt the space so that a nearly straight line goes from one state to the other. The perfect reaction coordinate is very much one-dimensional; of course it is not exactly one-dimensional, because the reactive path does not go perfectly straight from one state to the other, but it is very close to a straight line. The method realizes that and provides, let's say, relevances of the coordinates that are employed in the construction of the committor. If you apply this method to more realistic and interesting systems, you see pretty much the same behavior, and this is very informative. For example, here you have the typical guinea pig of biophysics, the alanine dipeptide, a very small protein-like molecule composed of two amino acids. This molecule can essentially rotate a couple of angles, and the configuration space that is relevant for the characterization of the system is essentially two-dimensional, with a strong relevance of one of the coordinates with respect to the other.
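For reference, the committor at a given configuration can also be estimated by brute force, directly from its definition; this is the quantity that the symbolic-regression expression discussed above is meant to approximate in closed form. The sketch below launches many short trajectories from a configuration and counts the fraction that reach the product state before the reactant state; `propagate`, `in_state_A` and `in_state_B` are hypothetical placeholders for the actual dynamics and state definitions.

```python
import numpy as np

def estimate_committor(x0, propagate, in_state_A, in_state_B,
                       n_trajectories=100, max_steps=10_000, rng=None):
    """
    Brute-force committor estimate at configuration x0: the fraction of short
    trajectories started from x0 that reach product state B before reactant state A.
    `propagate(x, rng)` advances the system by one step (placeholder dynamics).
    """
    rng = rng or np.random.default_rng()
    hits_B = 0
    for _ in range(n_trajectories):
        x = x0.copy()
        for _ in range(max_steps):
            x = propagate(x, rng)
            if in_state_B(x):
                hits_B += 1
                break
            if in_state_A(x):
                break
    return hits_B / n_trajectories
```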
And indeed, it turns out that one particular coordinate is deemed extraordinarily important in the construction of the committor and in discriminating different configurations of the system. Then you have a few other coordinates that also account for the presence of the solvent, because knowing how much solvent surrounds the molecule is important in order to characterize the state of the system: the system does not live in vacuum, it is simulated in the presence of the solvent, and the water molecules that surround the alanine dipeptide are, in a sense, an essential part of the system itself. Then you can study even more complicated reactions, like for example the association of two ions, and you realize that there is a set of coordinates that are gradually less and less important. The coordinates that matter account, for example, for the state in which the two ions are close to each other, let's say in direct contact, or the state in which a water molecule mediates the interaction between the two, and so on and so forth; this is a more nuanced description of the system, because the system is indeed much more complicated than the one we had before. Another interesting example of the usage of neural networks in the analysis and construction of configurations of proteins is the usage of autoencoders. Richard Feynman said "What I cannot create, I do not understand", and this applies very well to this kind of network, because these networks apparently do something very trivial: they take a structure as input, and they have to learn how to reproduce exactly the same structure as output. Apparently it is trivial, but it is not, because it goes through a bottleneck. This bottleneck is essentially a coarse-graining process: the network has to learn to reproduce the same thing it receives as input in terms of very few parameters, not all the parameters provided as input, but a much smaller number, like, say, two. The importance of this procedure is that, on the one hand, you can provide a low-dimensional representation of the system, of its configurational space, in terms of just a few parameters, and this allows you to better discriminate classes of configurations; but once you have parameterized the network, you can also forget about the encoding part, the filtering part, the funnel if you want, that takes complicated structures and funnels them into simpler representations, and you can pilot the generation of configurations based on the values that you put into that couple of neurons there.

Yes? In the middle, in the hidden layer, you only have two hidden units, right? But isn't it possible that you will lose too much information, because you only have very few hidden units?

Well, one of the important aspects to tackle in this kind of approach, and an aspect that is extraordinarily informative, is to figure out what the smallest number of components in the bottleneck is that you have to put in order to provide a realistic representation of the output with respect to the input. If I put just one neuron, it is very hard to imagine that I can reproduce all the conformations of a complicated system; if I put too many units in the bottleneck,
I can do the job perfectly, but I gain a very small amount of information, because there is no important difference between the number of neurons I have in the input layer and the number I have in the bottleneck. Because of that, the first thing you have to do is to explore what the smallest number of neurons is that you need in order to reach a certain level of accuracy. This is already an important piece of information, because it tells you, say, that you have to use five hidden units in the bottleneck, and that if you go below that you won't manage to get the accuracy that you want.

So this tells you that the space in which the configurations of your system live is essentially five-dimensional?

Exactly. So you have to make a sort of scaling analysis in which you start from a certain number that makes sense, or maybe from a very small number; if it works, you're already happy, but, as typically happens, that is not the case. You can also do this a posteriori, by trying to rationalize the differences between the structures that you get.

Going directly to the result, this is what you obtain by performing this kind of analysis. The way these maps are produced, in the case of two hidden units, but you can actually do it also with larger numbers, is by projection. You have a set of values in each of the bottleneck units, and this is your coarse-grained configuration of the system, as the network produces it. Then you can try to find the most accurate, even lower-dimensional representation of the system, that is, a two-dimensional representation. If you already have two units, then it's easy; if you have more than two units, you can find a representation of your system in a two-dimensional space that arranges points as closely as possible to how they are arranged in the higher-dimensional space: you place points so that if they are close in the five-dimensional space they are close in the two-dimensional space, and if they are far apart they stay far apart. This allows you to cluster structures, not in terms of the original configurations but in terms of the mapped configurations that you have in the hidden units. In this low-dimensional space you see that all the configurations that happen to have very close values of the bottleneck parameters have very similar structures, while if you go from one cluster to another you can appreciate structural differences between the configurations, and you can try to make sense of them. Of course this is much more complicated, because you do not have any immediate correspondence between the values you get in the hidden units and the structural arrangement of the system: these are very abstract values. So what you get in the hidden units is very abstract, and you have to make an effort in assigning a meaning to it. For example, here it is relatively easy: here I have a bunch of configurations that map onto something that I can recognize, that I can call half an alpha helix; I have a terminology for that, and then I can do this operation. Otherwise, I have to invent names for these arrangements, and a posteriori I can say, okay, this configuration is a "curl of category three", whatever that means. It is difficult to make sense of the different arrangements.
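As a concrete picture of the architecture discussed above, here is a minimal autoencoder sketch in PyTorch with a bottleneck of adjustable size, together with the kind of scan over bottleneck sizes just mentioned. The input size, layer widths, training details and random tensors (standing in for real simulation frames) are all illustrative assumptions, not the setup of any specific work.

```python
import torch
import torch.nn as nn

class ConformationAutoencoder(nn.Module):
    """Encoder funnels a flattened configuration into a small bottleneck; decoder reconstructs it."""
    def __init__(self, n_inputs, n_bottleneck=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_inputs, 64), nn.Tanh(),
                                     nn.Linear(64, n_bottleneck))
        self.decoder = nn.Sequential(nn.Linear(n_bottleneck, 64), nn.Tanh(),
                                     nn.Linear(64, n_inputs))

    def forward(self, x):
        z = self.encoder(x)              # low-dimensional ("coarse-grained") representation
        return self.decoder(z), z

def reconstruction_error(configs, n_bottleneck, n_epochs=200):
    """Train a fresh autoencoder and return its final reconstruction loss."""
    model = ConformationAutoencoder(n_inputs=configs.shape[1], n_bottleneck=n_bottleneck)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(n_epochs):
        recon, _ = model(configs)
        loss = nn.functional.mse_loss(recon, configs)   # learn to reproduce the input
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()

# Placeholder data: 1000 "frames", each flattened to 90 numbers (e.g. 30 sites x 3 coordinates)
configs = torch.randn(1000, 90)
# Scan bottleneck sizes; the elbow of this curve hints at the effective dimensionality
errors = {d: reconstruction_error(configs, d) for d in (1, 2, 3, 5, 8)}
print(errors)
```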
Still, the beauty of the strategy is that even though describing these arrangements is difficult in human terms, they do have something in common, and numerically, quantitatively, this strategy gives you the quantity that they have in common. It provides you with the low-dimensional space in which those structures share something that you can see, because they indeed are very similar, and it blurs out the differences among them while still allowing you to discriminate structures that do differ. This might be hard with standard computational physics techniques: for example, if you compute the root mean square distance between structures, structures that are very different from each other can end up at the same distance from a reference and be grouped together, even though they differ from each other. So this provides a better classification than very global quantifiers of structural difference would. It also provides an instrument to generate new structures, because now you have a two-dimensional space with certain points that you have observed in the simulation used for training the network, but you can also set your parameters to a point that has not been observed before, and the network will generate the corresponding structure. You can perform sampling by exploring regions of this parameter space that have not been visited, and you can create something new: you can actually extrapolate, or better, interpolate between points that you have seen, into regions that you have not seen. And this of course is much more important.

Yes? You were showing in the previous slide that there is an autoencoder, and then you said you applied it together with sketch-map. And you talked about your example of a five-dimensional latent space; where do you apply sketch-map? I did not understand. Is it to go from five to two?

Yes.

Okay, so in order to visualize it?

Exactly. It is a visualization procedure in which, for example, if you have two neurons here, of course you don't need it, because you already have a two-dimensional space and you can look at it. But let's say that, in order to reach a certain accuracy, you have to employ five or ten hidden units; ten units might still be much fewer than what you provide the network with, but it is difficult to visualize things in a ten-dimensional space. So what you want to do is to provide an even lower-dimensional representation, through sketch-map, that minimizes the discrepancy between the distances between points in the two different spaces, so that points that are close to each other in the five-dimensional space stay close to each other in the two-dimensional space, and points that are far apart stay far apart; then you create maps that are easier to visualize. Of course, it might happen that certain features get overlapped, because you are effectively projecting the five- or ten- or whatever-dimensional space onto a two-dimensional one, but from the point of view of interpretation it is necessary in order to make sense, from a human point of view, of the data that you get.
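Below is a simplified stand-in for the projection step discussed in the answer above: find two-dimensional points whose pairwise distances match, as well as possible, the distances in the higher-dimensional latent space, by gradient descent on a stress function. The actual sketch-map method additionally passes the distances through sigmoid-like transformations to emphasize an intermediate range; that refinement is omitted here, so this is only an illustration of the distance-matching idea.

```python
import numpy as np

def project_to_2d(latent, n_iters=2000, lr=0.01, rng=None):
    """Gradient descent on a stress function: match 2D distances to latent-space distances."""
    rng = rng or np.random.default_rng(0)
    n = len(latent)
    # Target distances in the high-dimensional (e.g. 5- or 10-dimensional) latent space
    target = np.linalg.norm(latent[:, None, :] - latent[None, :, :], axis=-1)
    xy = rng.normal(scale=0.1, size=(n, 2))          # random initial 2D placement
    for _ in range(n_iters):
        diff = xy[:, None, :] - xy[None, :, :]
        dist = np.linalg.norm(diff, axis=-1) + 1e-9
        # Gradient of sum_{i,j} (dist_ij - target_ij)^2 with respect to the 2D points
        coeff = 2.0 * (dist - target) / dist
        np.fill_diagonal(coeff, 0.0)
        grad = (coeff[:, :, None] * diff).sum(axis=1)
        xy -= lr * grad
    return xy

# Usage sketch: latent = encoder output with shape (n_frames, n_bottleneck)
latent = np.random.default_rng(1).normal(size=(200, 5))
xy = project_to_2d(latent)
```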
Okay, last but not least, in the context of machine learning we need to talk about folding, because folding is one of the main problems that we have in protein physics: if I give you the sequence of a protein, can you predict the structure into which it will fold? It is a very difficult problem. Typically, what we do is to start from the sequence, set up the protein in a molecular dynamics simulation in an unfolded configuration, then let the simulation go, let the protein fold, and eventually we get the native structure. This is the hope. Of course, it is very hard to get there, especially if you perform a molecular dynamics simulation at the all-atom level. The system can be very large, but this is not really the problem; the real problem is that you have to simulate for times that are extraordinarily long with respect to what we typically do. Anton can manage to fold relatively small proteins, but again, this is the tip of the iceberg: we want to do that, if possible, if not on a desktop computer then on a relatively small cluster. This is important for pharmaceutical reasons, for understanding diseases and the properties of biological systems, and so on and so forth. So of course we would like to rely on machine learning in order to associate the sequence to the structure without having to carry out the molecular dynamics simulation, jumping over the whole process and getting straight to the final result. I would say this is essentially a black box, in the sense that it is essentially useless (essentially, because we never know) for figuring out the process that takes you to the folded state; but at least, if you know the sequence of a new protein that has just been discovered, you can get, in a fraction of a minute, a very good guess for the structure. The network, in particular the network AlphaFold, has been trained to associate the sequence of a protein not directly to the structure, but to a contact map. This is an example of the contact map that the protein is supposed to have in the native structure: it is essentially a matrix that tells you the probability of having two amino acids in contact, within a certain proximity, in the native structure. This is used as a bias to steer a molecular dynamics simulation in which you let the protein collapse so as to minimize the deviation from the distances prescribed by the matrix produced by AlphaFold. And this is what you get: starting from a certain conformation, in a sort of steepest-descent manner, you get pulled towards the minimum of the deviation between the instantaneous contact map and the one you need to have in order to be compliant with the results of AlphaFold. The reason I said that this is only partially useful from the point of view of folding is that this is not a realistic folding process: it is just a brute-force steepest descent that forces the chain into the structure. So this per se can provide some information about the pathway, but it is very unlikely to be realistic.
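A sketch of the kind of contact-map bias and brute-force relaxation described above: harmonic restraints pull selected pairs of sites towards target distances, and a plain steepest-descent loop reduces the deviation. The target distances and weights would be derived from a predicted contact map; here they are hypothetical inputs, and the step size and force constant are arbitrary.

```python
import numpy as np

def contact_bias_forces(coords, target_dist, weights, k=1.0):
    """
    Harmonic bias towards a set of predicted pairwise distances:
    U = 0.5 * k * sum_{i<j} w_ij (|r_i - r_j| - d_ij_target)^2.
    `target_dist` and `weights` stand in for the predicted contact map.
    """
    n = len(coords)
    forces = np.zeros_like(coords)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if weights[i, j] == 0.0:
                continue
            rij = coords[i] - coords[j]
            d = np.linalg.norm(rij)
            delta = d - target_dist[i, j]
            energy += 0.5 * k * weights[i, j] * delta ** 2
            f = -k * weights[i, j] * delta * rij / d    # force on particle i
            forces[i] += f
            forces[j] -= f
    return energy, forces

def steepest_descent(coords, target_dist, weights, step=1e-3, n_steps=5000):
    """Brute-force relaxation towards the predicted contact map (not a realistic folding pathway)."""
    for _ in range(n_steps):
        _, forces = contact_bias_forces(coords, target_dist, weights)
        coords = coords + step * forces
    return coords
```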
A much better thing you can do is that, once you have the native structure, or a very good guess for it, you can perform enhanced sampling or steered molecular dynamics simulations that, based on the knowledge of the native structure, will push the protein to reach it along a pathway that is more realistic, that is physically and biologically sound. Since you already know where you want to get, you can do that in a much faster manner by means of enhanced sampling approaches. Okay, so in the last 25 minutes I would like to give you a couple of examples of the kind of work that we carry out in our group, in the analysis of data that come from molecular dynamics simulations, with methods that can be ascribed to this very broad field. Yes, there is a question.

I wanted to ask if the dynamics of the protein folding process are specific for specific sequences, or can there be different ways to reach the same final structure?

You mean the parameters, like the parameters of the network?

The process of folding, let's say in the real thing.

Yes, yes. Well, that is a tricky question that would require a tricky answer. In general, you can say that the process is the same in the sense that the physics is the same: atoms always interact in the same way. But the way a protein folds into its native structure is strongly dependent on the native structure itself, and the pathway that it follows is specific to it. Of course you can have classes of proteins that are very similar in structure and that most likely follow very similar pathways. A counterexample to that is knotted proteins, that is, proteins that have a knot in them. Let's see if I can manage... yes. So you can have a knot in the native conformation, as in the protein that I was showing here: this is a knotted protein, a protein whose backbone forms a knot. You can have proteins that have very similar structures, but one is knotted, it has a knotted backbone, and the other one is not. You would think that, based on the strong similarities between the native structures, the folding process might be the same, but it cannot be, because in order to form a knot you have to follow a specific pathway, or at least a much more restricted set of pathways, with respect to the case in which you do not have a knot. So this is a counterexample; but typically, if you do not have topological entanglements and you have structures that are similar, then it is likely that the folding pathway will be similar. If you have different proteins, with different native structures or even different lengths, the pathways can be arbitrarily different, because they really depend on the specificities of the protein. So, as a general statement, the process is different from protein to protein, but of course it is driven by the same fundamental rules, that is, electrostatic interactions, hydrophobic interactions, and so on and so forth. One specific sequence should follow one specific pathway if the structure is the same; and the pathway can be very complicated and can branch into different pathways, so you can have different intermediate states and so on and so forth.
Eventually, a given sequence folds into a given structure as a rule, although there are of course exceptions such as intrinsically disordered proteins, which do not have a single native structure and jump from one conformation to another quite easily. But for globular proteins with one single well-defined native structure, the sequence is associated to a native structure; you can have slightly different sequences with very similar native structures, and so on. So depending on the sequence similarity you can have structures that stay close, close, close, and then at some point are not close anymore; it depends on how similar the sequences are.

And the process should also depend on the solvent, the nature of the solvent?

It depends a lot on the thermodynamic environment of the system in general: temperature, for example, as well as the presence of solvents and cosolutes, the presence of other molecules. What we see in simulations is the life of a protein in absolute isolation, with a few ions maybe; what happens in a cell is that the folding process is hampered or helped by the presence of a huge number of other things that crowd the environment. This might even help proteins to fold, because they are pushed towards more compact configurations. Some specific proteins need help in folding and need to be shielded from their surroundings; because of that you have chaperones, which take the protein up into a cage, a box inside which the protein folds and from which it is then expelled. There is a large variety of strategies, but the environment definitely plays a role. So the folding pathways that we observe in molecular dynamics simulations always have to be taken with a grain of salt, because we know these conditions are not realistic compared with what happens in vivo; in vitro might be closer. Again, we have to be aware of the immense number of limitations and approximations that our methods and our models have.

One more question: here we are dealing with having a sequence and from there going to the final structure, but the intermediate process — supposing we could decode it to a great degree of accuracy, in whatever way, and simulate it — what practical use could we make of that information? What are the practical applications of knowing the correct folding process?

Well, it is absolutely key to understand how the protein reaches its biologically active state, and because of that it provides information. From the practical point of view, one example is that you can interfere with the folding, because there are intermediate states that can be interfered with; and if you know how a protein folds, you can also design a specific sequence that follows a particular pathway in order to obtain the particular structure that you want. More generally, the folding process is a brilliant example of a complex process and of an emergent property: you have interactions that are very simple, and we know how the physics at that level works, but it is very hard to foresee from there what emergent structure will come out.
And the better we understand the folding process, the better we can bridge between the basic information present in the sequence and the result we get as an emergent, collective property of how the system works all together. Thank you. Okay. So in the last 15 minutes I will try to speed up and show a couple of things; I will be very qualitative, and of course you are more than welcome to ask questions afterwards, maybe during lunch. The goals that we, like many others, would like to achieve in this context are to figure out how proteins work based on their physical properties, and to carry out a modeling of these systems that is informed by the physics of the system itself — for example, constructing models that are coarse where the system can be represented in a coarse manner, and more accurate where a certain accuracy is needed. To do that we need to simplify, but simplifying is a very complicated task, because there is no straightforward recipe for it. What we are trying to do is to figure out a sufficiently straightforward and general strategy to simplify that, as I will mention, can be applied in different contexts. I will start with the application of a methodology that actually originated in this place: the framework of resolution and relevance, developed mainly by Matteo Marsili and coworkers. This method — and I hope Matteo does not get offended by this description — can be seen as a measurement of the information content of a data set. This is what entropies do: starting from an empirical probability distribution, they quantify the information content of that distribution. We work with two probability distributions. The first is the probability distribution of the configurations of the system: if we have a certain number of configurations, we label them with some labels s, we group them together, and we obtain empirical probabilities that are essentially the relative populations of each cluster carrying a given label. The entropy of this distribution — the resolution — tells you how much detail you retain in that particular description of the system. Then we can compute the relevance: this is the entropy associated to the probability distribution of the cluster sizes, that is, to the relative number of clusters that contain exactly k configurations. This can be related to the amount of important information you have about the system. The two quantities are not the same, and you can modulate them in a continuous manner. For example, if you have a very crude labeling that discriminates only two kinds of configurations, then you have only two states of the system, and this corresponds to a very low resolution with which you depict the system. If instead you have a labeling so fine and accurate that you can actually discriminate any configuration from any other,
then the empirical probability you get is one over the number of configurations: this is the highest level of detail with which you can describe the data set, but it is not informative at all. The two quantities are such that the relevance is essentially the resolution minus a quantity that can be seen as noise. If you perform different kinds of labeling, what you get is a plot that looks like this: at the beginning the noise is very small, meaning that you group the configurations of the system into very few, very large clusters. At that level of resolution it is very hard to have different clusters containing the same number of configurations, and because of that — I will not go into the details — the noise is very small. So as you grow in resolution, the relevance grows with it, essentially linearly; but at some point the noise kicks in, the two start to deviate, you reach a maximum, and if you continue you eventually drop to zero. And this makes sense: if the relevance is the important information about the system, then a description so accurate that it distinguishes all configurations contains essentially zero useful information, the same as a description in which you lump all configurations under the same label. In between you have a sweet spot where the relevance is high for an intermediate value of the resolution. Now, you can run simulations of a protein and perform this labeling by clustering together the conformations that are close to each other from a structural point of view. You can do that changing the level of tolerance with which you group configurations: you can be very easy-going and put together frames whose configurations are only roughly similar, or you can demand that two configurations that differ even slightly be considered distinct — so you can span an entire range. If you do this for different numbers of atoms used to cluster the configurations, you obtain curves like these. What you see here is effectively the resolution: you have a certain tolerance for grouping configurations together, but you do not use the entire system — you use a variable number of specific atoms and group the configurations based on them. If you use a very large number of atoms, it is as if you were grouping configurations based on the entire structure; if you consider very few atoms, you are grouping configurations based on a couple of atoms — and if you use only two, essentially on the distance between them. If you do this operation for various numbers of atoms, and various choices of atoms at a given number, you obtain curves with this bell shape; and of course, for a given value of the resolution you can have different values of the relevance, depending on the number of atoms and on the choice, because some choices are more informative than others.
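As a concrete illustration of how such a curve can be traced, here is a minimal sketch assuming one has a (frames x features) array built from the coordinates of a chosen atom subset; the function names are mine, and hierarchical clustering with a distance cutoff is just one possible way of grouping similar conformations at a varying tolerance:

    import numpy as np
    from collections import Counter
    from scipy.cluster.hierarchy import linkage, fcluster

    def resolution_relevance(labels):
        """Resolution H[s] and relevance H[k] of a labeling of M configurations."""
        M = len(labels)
        sizes = np.array(list(Counter(labels).values()), dtype=float)   # cluster populations
        p_s = sizes / M
        resolution = -np.sum(p_s * np.log(p_s))                          # H[s]
        # P(k) proportional to k * m_k, with m_k = number of clusters of size k
        size_hist = Counter(sizes.astype(int).tolist())
        p_k = np.array([k * m for k, m in size_hist.items()], dtype=float) / M
        relevance = -np.sum(p_k * np.log(p_k))                           # H[k]
        return resolution, relevance

    def trace_curve(features, cutoffs):
        """Sweep the clustering tolerance and collect (resolution, relevance) points.

        features : (n_frames, n_dims) array, e.g. coordinates of the retained atoms
        cutoffs  : tolerances; a small cutoff means a fine labeling, i.e. high resolution
        """
        Z = linkage(features, method="average")       # hierarchical clustering tree
        return [resolution_relevance(fcluster(Z, t=c, criterion="distance"))
                for c in cutoffs]

The definition of P(k) used here, with the weight k * m_k, follows the convention I am aware of from the resolution-relevance literature; the point is simply that repeating the calculation over many tolerances and atom choices yields the bell-shaped curves just described.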
So here is how to figure out the most appropriate number of atoms to retain in order to provide a coarse-grained description of the system. If you use this number here, you have too many atoms, because the representation is too detailed; if you go over here, you have a description that is too coarse, and it is not informative either. There is a particular point, this one, where the slope of the curve is on average equal to minus one. This means the following: if you start decreasing the resolution, at each step — say you decrease the resolution by an amount of one — at first you increase the relevance by an amount larger than one; at this particular point, you gain exactly as much relevance as you lose in resolution. So up to this point you gain more than what you lose; if you continue, you might reach higher values of the relevance, but you gain less relevance than the amount of resolution you lose. For data compression this is the optimal point, because it is the last point at which you gain more than, or as much as, what you lose in resolution: you would like the most compact representation, the lowest resolution, but you want to get there by gaining as much information as possible, and this is where you can place yourself. This gives you a certain number of atoms with which to describe the system at the coarse level, but it does not tell you which ones to pick — or it might, but it is not the most appropriate measure for that. It turns out that there is another information measure that is more adequate to figure out, given a certain number of atoms to retain, how to distribute them across the structure so as to get the most informative representation. This measure is the mapping entropy, which is a Kullback-Leibler divergence between two probability distributions. One is the probability of sampling a certain configuration — if you want, the all-atom probability distribution; this is the reference, the one you have at the output of the simulation and over which you average. The other is a probability distribution that is reconstructed based on what you have at the coarse-grained level. That is to say, this second distribution is obtained from the probability of sampling the coarse-grained configuration onto which a given high-resolution configuration maps, normalized by how many all-atom configurations map onto that particular coarse-grained configuration. Said more carefully: the mapping function takes a high-resolution configuration and returns the resulting coarse-grained configuration; you take the probability of sampling that coarse-grained configuration and spread it over all the high-resolution configurations that map onto it.
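To put the definition just described into symbols — this is my reconstruction of how the mapping entropy is commonly written in the coarse-graining literature, so take the precise notation as an assumption rather than a quote from the slide:

S_map = D_KL( p || \bar{p} ) = \int \mathrm{d}r \; p(r) \, \ln \frac{p(r)}{\bar{p}(r)},
\qquad
\bar{p}(r) = \frac{P\big(M(r)\big)}{\Omega\big(M(r)\big)},

where r is an all-atom configuration, M(r) is the coarse-grained configuration it maps onto, P(R) is the probability of sampling the coarse-grained configuration R, and \Omega(R) measures how many all-atom configurations map onto R, so that \bar{p} spreads the coarse-grained probability uniformly over all of them.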
So in a sense what you are doing here is to flatten the probability distribution at the high-resolution level, assigning to all the all-atom configurations that map onto the same coarse-grained configuration the average probability among them. Trying to exploit the fact that a picture is worth a thousand words: this is our mapping procedure. Out of a high-resolution representation we apply a filter, that is, we retain just a subset of the pixels; then we reconstruct the high-resolution image by assigning to all the pixels we have discarded the value of the ones we have retained — strictly speaking it is an average over a subset of configurations, but this is the spirit of the idea (a toy version of this is sketched right after this passage). You reconstruct the image, and the objective of the procedure is to figure out which particular choice of pixels to retain minimizes the discrepancy between the low-resolution and the high-resolution representations. This one is a fairly intelligent choice; you could make a horrible choice, like retaining all the pixels at the top of the image: if you were to reconstruct the image from those, you would assign essentially a blue color to every pixel, and that would have a terrible distance from the reference image. With a uniform selection you can reconstruct something that is certainly not as beautiful as the original, but it gives you the idea of a mountain, something dark that might be interpreted as a tree, and the blue sky above. This procedure is carried out by iterating over different choices of the atoms you retain, in order to find the particular selection that produces a coarse-grained representation — here, an image — with the smallest deviation from the high-resolution reference, and therefore the smallest mapping entropy. Why do we want to do that? Because in this manner we obtain coarse-grained representations of the system in terms of a small number of atoms such that, knowing them, you essentially know everything about the system. The reason is that this particular subset is the one that optimally constrains the structure of the rest of the system: if you know which these atoms are and how they are arranged in space, you can essentially determine everything else about the structure of the protein based on them. And it turns out that the atoms that emerge as particularly important are also among the ones that are important from a biological point of view. From a procedural point of view, you do not find a single selection of atoms: you carry out a large number of optimizations, because the mapping entropy landscape in the space of selections is terribly rugged. So you perform a number of simulated annealing runs to find local minima, and then you obtain pictures like this one, where each atom carries the probability of belonging to one of the solutions of the mapping entropy minimization protocol — essentially the likelihood that the atom appears in a minimum-mapping-entropy selection.
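Before moving to the biology, here is the toy version of the picture analogy promised above — purely illustrative, with a deliberately crude reconstruction (a single average value for all discarded pixels) and names of my own choosing; with this simple gradient image a spread-out selection should score better than keeping only the top rows:

    import numpy as np

    def selection_score(image, keep_mask):
        """Retain a subset of pixels, rebuild the discarded ones from the average of
        the retained ones, and measure the discrepancy with the original
        (smaller = more informative selection)."""
        filler = image[keep_mask].mean()
        recon = np.where(keep_mask, image, filler)
        return float(np.mean((recon - image) ** 2))

    # compare a uniform grid of retained pixels with a 'top rows only' selection
    img = np.outer(np.linspace(0.0, 1.0, 64), np.ones(64))   # simple vertical gradient
    uniform = np.zeros_like(img, dtype=bool); uniform[::4, ::4] = True
    top_only = np.zeros_like(img, dtype=bool); top_only[:16, :] = True
    print(selection_score(img, uniform), selection_score(img, top_only))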
And it turns out that the atoms with a high value of this probability — the ones that typically show up in the solutions of the minimization protocol — are also the ones that carry out an important biological function: in the case of this protein, for example, they are the ones that interact with its natural substrate. This protein is a poison; it is the active protein in the venom of a scorpion. What it does is clog an ion channel on the surface of the cell and prevent ions from going through, and because of that the cell dies. In order to do that, the protein has to interact with the surface of the ion channel, and it does so precisely through these atoms, which are fairly exposed and which are identified with a higher probability as belonging to the selections that minimize the mapping entropy. This is remarkable, because we provided the simulation with no information whatsoever about the kind of interactions this protein has with its substrate — we did not even have the substrate in the simulation — yet it turns out that this information is intrinsically present in the structure and in the energetics of the system, and the mapping entropy protocol manages to bring it out. Now, of course, the minimization protocol of the mapping entropy... yes, there is a question.

How do you carry out the optimization, exactly?

Thank you, because this actually goes in the direction of the next slide. The optimization is done by selecting, in a binary fashion, a subset of the atoms to retain: you have zeros and ones, you fix the number of ones because this is the number of atoms you want to keep, and you shuffle the selection around — if you have a certain selection of atoms, at the next step you propose to discard one atom and keep another one instead, and in this way you explore the space of mappings. This is the optimization.

Sorry, can you repeat the question? I did not get it. — So you are optimizing over a discrete vector of zeros and ones; but could you optimize over a space of functions, parameterizing the mapping in some way and then optimizing over the parameters?

Well, the mappings that we employ are decimation mappings, in the sense that we either retain or discard specific degrees of freedom that are already there. When you do coarse graining, you typically take a bunch of atoms and map them onto the center of mass of the group that you lump together into a certain bead. We could perform an optimization in terms of the assignment of atoms to groups, and also modulate the coefficients of the mapping onto a particular coarse-grained site.

Say you go from the smaller to the bigger representation with a neural network, and then you optimize over the parameters of the network so as to minimize the mapping entropy?

We could do something like that, but it would enormously increase the configuration space, because we would have a continuous range of parameters with several constraints — you have to preserve the mechanical properties — and it would be computationally very hard, because at each step you propose a change in the mapping, and you have to recompute the mapping entropy.
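To make the swap-move optimization just described concrete, here is a minimal simulated-annealing sketch under the stated assumptions: the selection is a boolean vector with a fixed number of retained atoms, moves swap one retained atom for one discarded atom, and mapping_entropy is a placeholder for whatever estimator is computed from the sampled MD configurations — all names are mine:

    import numpy as np

    def anneal_selection(n_atoms, n_keep, mapping_entropy, steps=20000,
                         t_start=1.0, t_end=1e-3, rng=None):
        """Simulated annealing over binary selections with a fixed number of retained atoms.

        mapping_entropy : callable taking a boolean selection of length n_atoms and
                          returning the (estimated) mapping entropy.
        """
        rng = rng or np.random.default_rng()
        sel = np.zeros(n_atoms, dtype=bool)
        sel[rng.choice(n_atoms, n_keep, replace=False)] = True
        cur_s = mapping_entropy(sel)
        best_sel, best_s = sel.copy(), cur_s
        for i in range(steps):
            temp = t_start * (t_end / t_start) ** (i / steps)     # geometric cooling
            new = sel.copy()
            drop = rng.choice(np.flatnonzero(new))                 # discard one retained atom...
            add = rng.choice(np.flatnonzero(~new))                 # ...and keep another instead
            new[drop], new[add] = False, True
            new_s = mapping_entropy(new)
            if new_s < cur_s or rng.random() < np.exp((cur_s - new_s) / temp):
                sel, cur_s = new, new_s
                if cur_s < best_s:
                    best_sel, best_s = sel.copy(), cur_s
        return best_sel, best_s

Because the landscape is rugged, this kind of run is repeated many times from different starting selections, and the resulting local minima are what produce the per-atom probabilities mentioned above.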
But precisely at this point machine learning kicks in: it can be employed to skip the simulation step, so that you do not have to perform a simulation of the system to begin with. All our optimizations are based on the calculation of the mapping entropy, which is done through the configurations sampled in a molecular dynamics run; we can skip that and train a neural network that takes as input the whole structure together with a given selection of the atoms to retain — that is, a given mapping, which is part of the input — and outputs the value of the mapping entropy. We can then still perform the optimization over the discrete choice of which atoms to retain or not, but the calculation of the mapping entropy becomes essentially instantaneous compared with what we had before. Within this work, for two different proteins, we trained graph convolutional networks on the two systems and found very consistent results: we got a very good correlation between predicted and reference values, not only on the training set but also on the validation set, and the values of the mapping entropy we obtain are very close to the ones expected from the direct optimization. If you put together the neural network and the fact that this calculation runs on GPUs rather than CPUs, you get a speed-up factor of about 10^5 — it is way faster. Of course, the limitation at this point is that you have to train on a given system for that given system, so in a sense it is not really a great example of generalization; but this is a first, preliminary work, and it shows that a lot can be done in this direction, making use of machine learning in this context as well. Okay, now, in the interest of time, I have to cut the very last topic, so I will conclude here. If there are more questions, I am available during lunchtime. Having gotten to this point, I thank you very much for your attention. Are there any questions from the audience? Okay, anyone online? No, I don't see anything in the chat. Okay. Thank you. And, okay, so with that we can conclude our morning session. We will have a lunch break and reconvene at 2pm for the last lecture by Julia Carvanya. Thank you.