Hello, I am Attilio Vargiu, associate professor at the University of Cagliari in Italy. In this part of the school we will see how to use enhanced sampling methods to improve the in silico description of molecular recognition processes, and in particular how these methods can be useful in improving the predictive power of molecular docking. At the end, we will also see an application of machine learning approaches to improve the outcome of a classification of allosteric binders.

So, here is first of all a list of some very useful references, which you can have a look at if you want to keep updated with the challenges, developments and everything that concerns the computational description of molecular recognition processes.

The outline of my talk is as follows; I will show here the outline of both parts of the talk. First, I will introduce molecular recognition and the theories behind it, which constitute the background for the methods that have been developed so far. Then I will discuss molecular docking, which is the main subject of the talk, and then enhanced sampling techniques in relation to the improvement of molecular docking outcomes. As I said, at the end I will also discuss how machine learning techniques can be applied in this context.

So, what do we mean by molecular recognition? We mean the whole process that leads, through the interaction between two or more biological molecules, to the formation of a complex between them, that is, to the binding event. It can be useful to explain molecular recognition using a ligand-enzyme system. Suppose you have a target protein; you can describe molecular recognition by thinking of an inhibitor, for instance, that binds to this target enzyme. The reason why this ligand will bind to that protein, while all the others will not or will bind only weakly, is that the inhibitor has the highest affinity for that specific site on the target. So, in general only specific ligands can be recognized by the target, and this specificity is driven by the affinity of the ligand towards the target itself.

You can imagine that molecular recognition is a ubiquitous process in cellular life, so it is not difficult to understand that knowing the molecular details behind it is very useful, not only for basic research, to better understand biology at the molecular level, but also for drug design. Indeed, if you understand the mechanistic details behind molecular recognition, you can better understand how to tune the properties of ligands in order to maximize or modulate the interaction with a biological target. In other words, molecular recognition has been and will always be a key subject for drug discovery.

So, let's start to discuss the theory behind molecular recognition, focusing on the systems that will be the subject of this lecture: interactions between proteins and ligands. You see here the three main theories that were derived over the years to explain molecular recognition, and we will use these pictures in which we have the ligand in black and the protein before binding, and then the formation of the complex. We will discuss three main theories: the lock and key theory, the induced fit theory, and the conformational selection theory.
The first one, the lock and key theory, was introduced by Fischer in 1894. Basically, you can see from the image that the ligand and the protein interact as rigid bodies: the ligand can be considered as the key, and the binding site on the protein as the lock, and only the correct key will insert into the lock. The historical merit of the lock and key model has been to introduce a concept that remained very useful for roughly 50 years, until the interactions between macromolecules were better understood in terms of flexibility, as we will see later. Namely, it introduced the concept of shape complementarity, by which only a ligand with the correct shape can fit into the binding pocket of the protein. In this theory, as you can see from the energy profiles that pop up in the slide, binding is always associated with a stabilization of the bound conformation with respect to the unbound one. Clearly, the lock and key model cannot explain several binding events: if the shapes do not match each other before the binding event, this model will fail to describe it.

The first model introduced to overcome the limitations of the lock and key model is the induced fit one, introduced by Koshland in 1958. Again, we can exploit the theory of enzymes to understand how this model was developed. The basic observations that led to this model were two. The first is that, in order to catalyze a reaction, the residues in the binding pocket of an enzyme need to assume the proper orientation. The second observation was that such rearrangements of the binding pocket of a protein can be induced by the binding of a substrate of the enzyme, while a non-substrate will not induce such conformational changes. So, in short, the ligand induces a conformational change in the binding site, and in this way we need only one protein conformation before binding, labeled A in the slide, in order to explain this model. We have a conformation A of the protein before binding and a conformation B of the protein after binding: a novel conformational state is generated and enters the energetics of the process. You can see the difference between the induced fit and the lock and key model here, because we have a protein conformation A that is unbound, and a protein conformation B that is induced by the binding event and that is more stable than conformation A. So, again, we have a stabilization, but with the generation of a new conformation of the protein. Induced fit can account for minor to medium-sized conformational changes upon ligand binding, but of course it was shown soon after the proposal of this model that it cannot explain all binding events.

Then another theory was proposed in the sixties, which derives from the fluctuation fit concept by Straub and Szabolcsi and from the allosteric model developed by Monod, Wyman, and Changeux. It also derives from the so-called free energy landscape theory of proteins, and we will see better what we mean by this. In the fluctuation fit concept, one postulates that proteins are dynamic objects that exist as an ensemble of conformations at equilibrium.
The allosteric model adds to this the fact that the distribution of the populations of these conformations is tuned by the interaction of the protein with the solvent and with other molecules embedded in the solvent, and also depends on other thermodynamic parameters such as pressure, temperature, and so on. The key concept is that binding produces a shift of the population distribution. To summarize, we have an ensemble of conformations, each with a probability, and binding will change the probability of finding the protein in each of these conformations. The key point for the conformational selection theory to really work in describing a molecular recognition event is that among the conformations sampled by the protein there are also the so-called bound-like or holo-like conformations, that is, those assumed by the protein upon binding of a ligand. You can see here that in this case we have two conformations, to keep things simple, but you can imagine that in reality there will be many more: A and B, which are sampled in the absence of the ligand. Suppose that one of these conformations has a good affinity for the ligand; then the ligand will bind to this conformation, and upon selection of the most suitable unbound conformation of the receptor, which could even be, as you can see here, less stable in the absence of the ligand than conformation A, this binding will cause a shift of the distributions, and the conformations associated with state B will become more prevalent. The repopulation of the distribution will then change the stability associated with state B, and the underlying free energy landscape will be reshaped upon binding.

Now, as you can imagine, the last two models, induced fit and conformational selection, can explain many binding events, and it is clear that many people tried to derive a unified view of the two models rather than considering them antagonistic. These efforts led to theories that encompass both models, which can be coupled in particular in a serial or in a parallel way. An example of coupling the different theories of binding in a serial way is the following. First, you can imagine a binding event as a non-optimal one: unless you have perfect shape complementarity between ligand and protein, you can imagine that at the beginning there will be a non-optimal binding that does not induce large conformational changes in the protein and can be dominated by the loss of solvent from the surfaces of both the ligand and the protein. This can be described approximately with the lock and key theory of molecular recognition. But then what happens? In order to optimize the contacts and the strength of the interaction, the residues of the protein and also groups of the ligand can rearrange to form stronger molecular interactions, and this can be seen as a kind of induced fit event. This binding will also cause a shift in the populations, changing the free energies and barriers associated with the transition between unbound and bound conformations, and this shift of the distribution is typical of the conformational selection theory.

Okay. So, now that we have briefly seen the models that have been developed to explain molecular recognition events, let's introduce the so-called molecular docking method, which we have already encountered in previous lectures.
I will be brief here, just recalling the main features of the method. In molecular docking, which is by far, I guess, the most widely used method to describe molecular recognition in silico, you try to answer the following question: given the structure of a protein and of a ligand, and given a set of parameters that describe the interactions in terms of electrostatics, hydrogen bonds, and so on, will they form a complex? And if they form a complex, what will be the structure of the complex, and in particular of the most stable one?

To answer this question, in docking we generate a set of so-called binding poses, as you can see here, which are conformations of the ligand interacting with the target. To each of these conformations we assign a score that ideally should be related to the likelihood of this conformation occurring in real settings. A reliable docking algorithm should not only generate such conformations, but, among the many conformations that will be generated, the correct ones should be ranked at the top, otherwise they will be lost in the sea of generated conformations and we will never recover them.

Now, you can imagine that this is a very difficult problem, because in addition to the six rototranslational degrees of freedom that you have to explore in order to find the arrangement of the two rigid bodies, you have all the internal degrees of freedom to take into account. These are proportional to the number of atoms making up the ligand and the protein, so you can imagine that the problem is extremely challenging from a computational point of view. Several algorithms have been implemented over the years to cope with this so-called flexibility issue, and they can be roughly classified in terms of the extent and the way in which they account for flexibility: from a partial to a full treatment of flexibility, and from an implicit to an explicit treatment. You then have the four main approaches shown in the slide: soft docking, in which you account for partial flexibility of only a region of the protein in an implicit way, for instance by softening the van der Waals interactions between the ligand and the protein; docking with selected flexible regions, in which you account for partial flexibility of only selected regions of the receptor in an explicit way, for instance by using rotamer libraries; then the next step is to account for full flexibility of the whole protein in an implicit way, as in ensemble docking calculations, or to account for full flexibility in an explicit way, as in so-called on-the-fly docking methods.

Among these methods, one that gained momentum in the last decade is ensemble docking. In ensemble docking calculations, to summarize, you use a set of different conformations of your target, shown here in different colors, in order to take into account the plasticity of the binding site. As you can see on the right side of the slide, the shapes of the binding site are different in each conformation. You can understand that the method exploits the conformational selection theory: among these shapes you hope to have conformations that are able to bind your ligand (a minimal sketch of this idea is given below). Now, clearly the performance of an ensemble docking calculation will strongly depend on the quality, if you want to call it this way, of the ensemble of conformations.
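To make the ensemble docking idea concrete, here is a minimal Python sketch. The `dock_and_score` function is a hypothetical stand-in for whatever docking engine is used, not a real API: given one receptor conformation and a ligand it is assumed to return a best pose and its score, with lower scores meaning more favorable binding. The loop simply keeps the best-scoring pose across the whole ensemble of receptor conformations.

```python
# Minimal sketch of ensemble docking (illustrative only).
# dock_and_score() is a hypothetical placeholder for a real docking engine:
# given one receptor conformation and a ligand, it returns (pose, score),
# where lower scores are assumed to mean more favorable binding.

def ensemble_dock(receptor_conformations, ligand, dock_and_score):
    """Dock the ligand against every receptor conformation and keep the best pose."""
    best = None  # (conformation_id, pose, score)
    for conf_id, receptor in enumerate(receptor_conformations):
        pose, score = dock_and_score(receptor, ligand)
        if best is None or score < best[2]:
            best = (conf_id, pose, score)
    return best

# Usage (with your own docking engine standing in for dock_and_score):
# best_conf, best_pose, best_score = ensemble_dock(conformations, ligand, my_docking_engine)
```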
By the quality of the ensemble, I mean that if, among the thousand or more conformations that you use, there is no conformation with the right shape of the binding site to accommodate the ligand, it will be much more difficult to find a true binding pose. It has been shown that experimental conformations generally perform much better than those generated in silico, although it is still desirable to obtain conformations that are different from the experimental ones, for several reasons. First, the structural diversity in the Protein Data Bank is still low compared to the size of the druggable genome. Second, we have a kind of bias in the Protein Data Bank towards somewhat rigid structures, because most structures were, and still are, derived by X-ray methods. And there are many other problems, which I will not cover here, for which it is still useful to derive conformations using computational methods.

Now, another problem is that, suppose you want to use molecular dynamics to derive these conformations: you have the computational problem that ligand binding events typically occur on timescales longer than microseconds, and microseconds are still not easy to reach for all systems in all labs. So there is a problem, related in particular to the crossing of the free energy barrier from the unbound conformation, which is typically the one you consider in most cases when you have no information about the binding of the ligand you are interested in, to the bound conformation. If you have a high free energy barrier between the two and you start your simulation from the unbound state, it will be very difficult to find relevant conformations. Just to give an example, with a barrier of the order of ten kcal/mol it would take years of standard molecular dynamics, with current computational resources, to sample the holo-like conformations of interest. And the problem is not only related to large conformational changes: you can have barriers of a few kcal/mol that are related to local side-chain rearrangements, such as the one you see here, where a side chain changes its conformation in order to accommodate the ligand, up to very large hinge-bending motions, such as the one you can see here, from the grey structure to the green one, that occur during the binding of a ligand to a protein. Neglecting this flexibility can lead to very poor results.

So, really a lot of methods have been developed to enhance the conformational sampling of both ligands and, in particular, proteins, and among the reasons for doing so there is also the use of these conformations to improve the description of molecular recognition events. We will see in this lecture applications of methods from three different classes to obtain better results in docking and in virtual screening. The first example concerns the so-called replica exchange molecular dynamics method, reported in the references that you can read in this slide. In this method, basically, in order to overcome the limitations of standard molecular dynamics, you perform multiple simulations of your system in parallel. These multiple simulations are often called replicas, from which comes the name replica exchange: they are copies of the same system in which you only change the temperature (a small sketch of how a temperature ladder is commonly set up follows below).
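As a concrete illustration of the setup, here is a minimal Python sketch of one common way to build the temperature ladder, namely a geometric progression between the lowest and highest temperatures, which tends to give roughly uniform exchange probabilities between adjacent replicas. The number of replicas and the temperature bounds are illustrative assumptions, not values taken from the works discussed in this lecture.

```python
def geometric_temperature_ladder(t_min, t_max, n_replicas):
    """Return n_replicas temperatures spaced geometrically between t_min and t_max (in K)."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio**i for i in range(n_replicas)]

# Example: 16 replicas between 300 K and 450 K (illustrative numbers only).
temps = geometric_temperature_ladder(300.0, 450.0, 16)
print(["%.1f" % t for t in temps])
```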
By performing exchanges between the replicas at fixed time intervals, you can swap the temperatures, or equivalently swap the coordinates, between two replicas. You can see here that, for instance, at the first step you exchange the light blue and the violet conformations, then you exchange these two other conformations, and so on. So the temperature assumed by the system at the first step can be very different from the temperature at which the simulation finds itself at the end.

Now, in a typical setup you are interested in sampling the conformations at room temperature, so your replica of interest, let's call it this way, is the one at the lowest temperature, and you set a number of replicas and a distribution of temperatures that will depend on the system size and on the distance, in temperature space, between the lowest and the highest replica. Of course, you generally want to use very high temperatures in order to facilitate the crossing of conformational barriers. In this way, by generating a lot of replicas, and of course by making sure that you have a good probability of exchange between two adjacent replicas (in general one should aim for a probability larger than about 0.3), you obtain a continuous flow of replicas between different temperatures; you can see it, for instance, in the black curve here, which goes down in temperature here, goes up there, and so on. This flow produces a good mixing between the conformations, which enhances the sampling of structures that you would never sample in a single simulation at one temperature.

Now, of course, you have to think about how to perform the exchanges in order to produce a canonical sampling of your states. This can be done by considering that the probability of your system being in a state x depends on the potential energy associated with that state: in the canonical ensemble this probability depends on the exponential of minus the potential energy divided by k_B T. So, if you build your temperature distribution in a way that ensures overlap between the potential energy distributions of adjacent replicas, you will have a good fraction of exchanges between adjacent replicas, hence good mixing, and you will generate a good variety of conformations. The criterion used to accept a swap between two replicas is a Metropolis-like one: if the exchange moves towards lower potential energy at the lower temperature, you accept it; if not, you compare the exponential factor to a random number and decide accordingly. Here is, in formulas, what I said: the Metropolis criterion that determines the probability of exchanging x and x'. The value of Delta is given by the product of the difference between the beta factors (the inverse temperatures) and the difference between the potential energies of the two replicas. If T2 is larger than T1 and, as I said, the potential energy of x' is lower than that of x, the exchange is automatically accepted; otherwise, exp(-Delta) is compared with a random number between zero and one. In this way, systems that find themselves at higher temperatures can swap down to lower temperatures, bringing the conformations they sampled to the replica of interest. And although the trajectories are discontinuous in temperature, using this criterion ensures Boltzmann sampling at each temperature.
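Here is a minimal Python sketch of the swap acceptance step just described, written for a single pair of adjacent replicas; all names and numbers are illustrative, and the Boltzmann constant must be chosen consistently with the energy units.

```python
import math
import random

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K); use units consistent with the energies

def accept_swap(temp_low, temp_high, energy_low, energy_high):
    """Metropolis criterion for exchanging the configurations of two adjacent replicas.

    energy_low / energy_high are the potential energies of the configurations
    currently at the lower / higher temperature.
    """
    beta_low = 1.0 / (KB * temp_low)
    beta_high = 1.0 / (KB * temp_high)
    # Delta = (beta_low - beta_high) * (U_high_config - U_low_config)
    delta = (beta_low - beta_high) * (energy_high - energy_low)
    if delta <= 0.0:          # the hotter replica holds the lower-energy configuration: accept
        return True
    return random.random() < math.exp(-delta)  # otherwise accept with probability exp(-Delta)

# Example: replicas at 300 K and 320 K with energies -1052.0 and -1049.5 kcal/mol (toy numbers)
# print(accept_swap(300.0, 320.0, -1052.0, -1049.5))
```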
Among the many applications of replica exchange molecular dynamics, I found a very interesting one by Osguthorpe and coworkers, who used replica exchange molecular dynamics to generate ensembles of conformations to improve the virtual screening of compounds on three very challenging targets: HIV-1 protease, a cyclin-dependent kinase, and the androgen receptor. In particular, the authors compared the performance of an ensemble of conformations generated by standard molecular dynamics to those generated by REMD and to those obtained by picking conformations from the Protein Data Bank, so experimentally.

Here are the details of the methodology; I will not discuss them all, but I will show the results. A very interesting part of this work, in my opinion, is also the way the authors selected different shapes of the binding site, using the so-called pairwise normalized volume overlap of the binding site. To describe the concept, suppose you have two conformations of the binding site, A and B, with associated volumes V_A and V_B. You can calculate the volume overlap between the two, divide it by the sum of the two volumes, and define the normalized volume overlap according to the formula you can see here (a small numerical sketch of this measure is given below). In this way, a value of one means perfect overlap, so the shape and the volume of the binding site are the same, while a value of zero means no overlap at all. If you then perform a cluster analysis on this parameter, you are able to describe, with a very small number of conformations, a large variation of the binding site. In particular, using this approach the authors were able to select only four clusters, that is, only four structures, for each ensemble and each system.

The first result of the work is that replica exchange improves the sampling of the binding site compared to standard MD in all cases, and that was expected. You can see here why, comparing the distance between two residues that they selected as a marker for the different shapes of the binding site: the green curve oscillates much more than the red one. They also found, of course, that this improvement was consistent with respect to the conformations generated by standard MD, but with respect to the conformations obtained experimentally, for instance by X-ray, the results were system dependent. In particular, for the androgen receptor they found that the X-ray clusters feature the highest shape diversity: you can see here, for the androgen receptor, that you have many different shapes, and in particular MD and REMD were unable to find this shape of the binding site, which was found only by X-ray. But in the case of HIV-1 protease, for instance, replica exchange produced the largest variety of binding site conformations, so it performed better than the ensemble obtained experimentally.

To investigate how generating so many different conformations of the binding site reflects on the performance of docking and virtual screening, they used a well-known dataset made of active molecules and decoys, in order to see whether the structures were able to improve the retrieval of the true binders.
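Coming back to the binding-site clustering metric mentioned above, here is a minimal Python sketch of a pairwise normalized volume overlap. The exact normalization used in the original work may differ; the version below (twice the overlap divided by the sum of the two volumes) is one simple choice consistent with the description, giving 1 for identical volumes and 0 for disjoint ones.

```python
def normalized_volume_overlap(vol_a, vol_b, vol_overlap):
    """Pairwise normalized volume overlap of two binding-site conformations.

    vol_a, vol_b : volumes of the binding site in conformations A and B
    vol_overlap  : volume of their intersection
    Returns a value in [0, 1]: 1 for identical volumes, 0 for no overlap.
    (One possible normalization; the published formula may differ in detail.)
    """
    return 2.0 * vol_overlap / (vol_a + vol_b)

# Toy numbers (cubic angstroms):
print(normalized_volume_overlap(450.0, 450.0, 450.0))  # identical sites -> 1.0
print(normalized_volume_overlap(450.0, 520.0, 260.0))  # partial overlap -> ~0.54
```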
For this evaluation, they used Glide for docking and scoring, and they assessed the performance of the virtual screening through the enrichment of the database. The enrichment factor is described by this formula, and basically it tells you how many true actives are retrieved in the virtual screening among the top-ranked compounds. You can see this concept better here: on the X-axis you have the percentage of screened molecules in your database, and on the Y-axis the percentage of actives retrieved. In the ideal setting in which you have 10% of actives in the database, you will retrieve all the actives within the top 10% of the compounds, so they will be the first ones to be collected. In the ideal setting with 50% of actives, you will have collected them all after screening 50% of the database, and in the random case you will have no enrichment, in the sense that you will select actives and decoys at random. In general terms, the closer this curve is to the corresponding ideal one, the better the performance of your method.

Now, in order to estimate how an ensemble of conformations performs compared to picking only single structures, the authors propose to compare the enrichment factors for the ensembles to the average of the enrichment factors obtained using each single structure in the pool of conformations derived for each system and each methodology. You can see that the X-ray ensemble performs better than any single X-ray structure in all cases: the count of active ligands retrieved with the ensemble is always larger than the average count over the four conformations, and you can see this also in the enrichment factors. REMD also performed quite well for all systems; only for the androgen receptor did it not perform properly, and the worst performance overall was found with standard MD. So, the authors showed that using this kind of methodology improved the outcome of the virtual screening.

The second method I will discuss is the so-called accelerated molecular dynamics, which was developed in the lab of Andrew McCammon in 2004. We will see an application by Motta and Bonati from the University of Milan that was published quite recently. What you do in accelerated molecular dynamics, in order to enhance the sampling of conformations, is to add a non-negative boost, delta V, which is a function of the conformation of the system, to the potential energy surface. But you only apply this boost when the potential energy of your system is lower than a reference energy E. In this way, you do what you see in the picture here: suppose this is your potential energy and this is the reference energy E; you apply the boost in order to make the profile of the potential energy shallower only in this region. This translates into the formulas in which the modified potential V*(r) is equal to V(r) when V(r) is larger than E, and is equal to V(r) plus the boost potential when the potential is lower than the threshold energy. In this way, you increase the probability of escaping from a minimum and you can better sample other conformations. The simulated time is boosted by a nonlinear acceleration factor, the one in the formula, in which basically you use the so-called boost factor, the exponential of the boost potential: the simulated time needed to sample a certain region of conformational space is estimated by multiplying the simulation time by the boost factor.
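Here is a minimal Python sketch of the accelerated-MD boost just described: the boost potential in its usual functional form, with threshold E and a smoothing parameter alpha that is discussed just below, the modified potential V*(r), and the exponential boost/reweighting factor. All numerical values in the example are illustrative assumptions only.

```python
import math

KB = 0.0019872041  # kcal/(mol*K)

def amd_boost(v, e_threshold, alpha):
    """Accelerated-MD boost potential: dV = (E - V)^2 / (alpha + E - V) when V < E, else 0."""
    if v >= e_threshold:
        return 0.0
    diff = e_threshold - v
    return diff * diff / (alpha + diff)

def modified_potential(v, e_threshold, alpha):
    """V*(r) = V(r) + dV(r): the flattened potential used to propagate the dynamics."""
    return v + amd_boost(v, e_threshold, alpha)

def boost_factor(v, e_threshold, alpha, temperature=300.0):
    """exp(beta * dV): factor used to reweight frames / estimate the effective sampled time."""
    beta = 1.0 / (KB * temperature)
    return math.exp(beta * amd_boost(v, e_threshold, alpha))

# Toy example: potential at -5010 kcal/mol, threshold E = -5000, alpha = 50 (all illustrative).
print(modified_potential(-5010.0, -5000.0, 50.0))  # the well is made shallower
print(boost_factor(-5010.0, -5000.0, 50.0))        # large factor -> strong local acceleration
```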
An important thing is that, as with replica exchange, you can recover a canonical sampling of your system, which means that you can determine thermodynamic and equilibrium properties accurately. Now, let's come to the functional form of the boost potential. You can see in this formula that you have basically the difference between the threshold energy and the potential, squared, divided by the same quantity plus a factor alpha, which is called the acceleration factor. And you can see here that by setting the value of alpha from zero to larger values you can reshape the potential energy landscape in different ways: if you take alpha equal to zero, you basically generate a flat landscape below the threshold energy, while with larger and larger values you get closer and closer to the real profile. The boosted potential is continuous and reproduces the shape of the underlying potential even at high energies.

There are two versions of accelerated molecular dynamics that provide two different levels of acceleration of your dynamics. The first one is the so-called dihedral-boost accelerated dynamics, in which you only accelerate the torsional degrees of freedom of your system: you apply a boost to the dihedral potential, and the threshold energy is given by this formula, in which you have the average dihedral potential plus four times the number of residues of your system. In the dual-boost protocol, you add to the dihedral boost another boost applied to all the degrees of freedom of your system of interest. The averages that you see here can be calculated from a short molecular dynamics simulation: you take the average of your dihedral potential and the average of your total potential, you put them into the formulas, and in this way you can calculate the threshold energies for the two boosting schemes.

Motta and Bonati, as I said, applied accelerated molecular dynamics in order to improve the description of molecular binding events on a set of proteins, among which there is one that undergoes a very large conformational change upon binding of a small molecule in the center of the protein. You can see here, comparing the orange and the cyan conformations, that there is a very large hinge-bending motion of this protein. They compared conventional molecular dynamics to single-boost and dual-boost accelerated molecular dynamics, and by performing cluster analysis with a volumetric criterion, similar to the one used by Osguthorpe and collaborators, and by performing docking with Glide, they were able to improve the molecular docking description of the binding event.

Let's analyze what they found. The first thing to see is that the use of accelerated molecular dynamics improves the sampling of holo-like conformations. You can see this here by comparing the RMSD from the structure of the protein in the complex: in conventional molecular dynamics you basically never go below 3 Å, while in accelerated molecular dynamics with a single boost you reach some conformations with an RMSD as low as 2 Å. In addition to this, the authors defined a pseudo-dihedral angle in order to describe the hinge-bending motions; so this is an angle that also characterizes this movement.
And you can see, by comparing the conformations sampled in terms of this angle, that with conventional dynamics you go in the direction of sampling conformations that are far away from the holo one, while in single-boost accelerated molecular dynamics you sample conformations that are closer to it. However, only when the single-boost simulation was extended by 0.5 microseconds with respect to the initial run were the authors able to sample conformations that can really be called holo-like. The important thing to note is also that the improvement in the sampling of holo-like conformations translated, as expected, into improved docking. This can be seen in this slide: if you take the top 10 clusters generated by accelerated molecular dynamics and you calculate the RMSD of the binding site with respect to the holo structure, you see that several conformations have a binding-site RMSD very close to the one of the complex. And this translated into the generation of poses featuring an RMSD, as you see in this row, as low as about 1.5 Å from the corresponding complex. So, the authors showed that this method too was able to produce holo-like conformations of the protein that were very good at reproducing the conformations of the complex itself.

Okay, so this is the end of the first part; in a while we will see the second part, in which I will describe an algorithm developed in our lab and an application to molecular recognition processes.