Hi, everyone. Welcome back to the second part of my talk on advanced methods for computational drug discovery. The third application we will discuss today was developed in our lab at the University of Cagliari, in collaboration with Fabio Pietrucci at Sorbonne University in Paris and Alexandre Bonvin at Utrecht University. The method is based on metadynamics, a technique introduced by Laio and Parrinello in 2002 that enhances sampling by pushing the system away from the free-energy minima it has already visited. In metadynamics, a history-dependent bias potential, built as a sum of Gaussians deposited along a few collective variables (functions of the coordinates of the system), discourages the system from revisiting points of the collective-variable space, that is conformations, that have already been sampled; in this way the exploration of the conformational space is greatly enhanced. Metadynamics has proven very effective at enhancing the sampling of protein conformations and, in particular, of binding-site geometries. An appealing feature of metadynamics is that you don't need any prior knowledge about the conformation of the holo complex of the protein. There are several flavors of metadynamics, and we use in particular the combination of two of them. The first one is well-tempered metadynamics: you can see in this movie what happens relative to plain metadynamics, namely that you fill the free-energy basins much less than you do in standard metadynamics. Why? Because the height of the Gaussians is decreased exponentially over time in the regions of the collective-variable space that you have already visited. In this way you can sample for a longer time while perturbing the system much less than you would with plain metadynamics.
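The well-tempered recipe just described can be sketched in a few lines of Python. This is a minimal one-dimensional illustration with made-up parameters (W0, SIGMA, DELTA_T are arbitrary), not the implementation used in our work; it only shows how the Gaussian height decays wherever bias has already accumulated:

```python
import math

# Minimal 1D well-tempered metadynamics sketch (illustrative parameters).
# The bias V(s) is a sum of Gaussians whose height decays as
#   w = W0 * exp(-V(s) / (kB * deltaT))
# in regions of collective-variable space already visited.

KB = 0.0083145    # Boltzmann constant, kJ/(mol*K)
W0 = 1.0          # initial Gaussian height (kJ/mol), made up
SIGMA = 0.1       # Gaussian width in CV units, made up
DELTA_T = 1800.0  # "boost" temperature deltaT (K), made up

centers, heights = [], []

def bias(s):
    """History-dependent bias potential at CV value s."""
    return sum(h * math.exp(-(s - c) ** 2 / (2 * SIGMA ** 2))
               for c, h in zip(centers, heights))

def deposit(s):
    """Add a Gaussian at s with the well-tempered height scaling."""
    w = W0 * math.exp(-bias(s) / (KB * DELTA_T))
    centers.append(s)
    heights.append(w)
    return w

# Deposit repeatedly at the same CV value: the heights must shrink.
ws = [deposit(0.0) for _ in range(5)]
print(ws)  # strictly decreasing sequence starting at W0
```

With a physical temperature T = 300 K, this choice of ΔT would correspond to a bias factor (T + ΔT)/T = 7, i.e. in the long-time limit the effective barriers are reduced by a factor of 7.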
There you see that in the exponential formula for the height of the Gaussians you have a factor ΔT that has the units of a temperature. This factor defines the so-called bias factor, which sets the extent to which we want to enhance the sampling, in terms of the height of the free-energy barriers we want to cross. The important thing about the bias factor, defined as the ratio (T + ΔT)/T, is that even using well-tempered metadynamics you can extract the free energy of the process you are investigating. This is not of direct interest to us here, but it is relevant for the validation of the method. In addition to this, we use the so-called bias-exchange metadynamics, in which you run several simulations of the same system in parallel, applying a bias on a different set of collective variables in each replica. At regular time intervals you attempt to exchange the configurations of two replicas, in the same way we have seen for replica-exchange molecular dynamics and using the same principle. In this way you can enhance the sampling of the conformational space better than with plain metadynamics. So, our approach in a nutshell is represented here. We call it EDES, which stands for Ensemble Docking with Enhanced Sampling of pocket shape, and the workflow is the following. First you identify the binding site on a protein of interest; you can use either experimental information or computational methods, and there are several servers that find the so-called druggable regions on the surface of proteins, which are putative binding-site regions. Then we enhance the sampling of different shapes and volumes of these binding sites using a novel set of collective variables that we introduced in EDES. Finally, we perform a multi-step cluster analysis in order to derive a set of conformations to be used in ensemble docking calculations.
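To make the exchange step concrete, here is a hedged Python sketch of the Metropolis criterion used to accept or reject a swap of configurations between two replicas, each biased along a different collective variable. The function and toy biases below are my own illustration, not code from EDES:

```python
import math, random

def swap_accepted(va, vb, xa, xb, kbt, rng=random.random):
    """Metropolis test for exchanging configurations xa and xb between
    two replicas biased by potentials va and vb (bias-exchange step).

    va, vb: callables returning the bias energy of a configuration.
    """
    # Energy change caused by swapping the configurations between biases.
    delta = (va(xb) + vb(xa)) - (va(xa) + vb(xb))
    if delta <= 0.0:
        return True                      # always accept downhill swaps
    return rng() < math.exp(-delta / kbt)

# Toy biases acting on different "collective variables" of x = (s1, s2).
va = lambda x: 2.0 * x[0] ** 2           # replica A biases s1
vb = lambda x: 2.0 * x[1] ** 2           # replica B biases s2

print(swap_accepted(va, vb, (1.0, 0.0), (0.0, 1.0), kbt=2.5))  # True (downhill)
```

Uphill swaps are accepted with probability exp(-Δ/kBT), which is what lets configurations diffuse between differently biased replicas without breaking detailed balance.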
Then, once you have identified the binding site on your protein, which in this picture is the ensemble of yellow spheres, you calculate the principal axes of inertia of the site. These axes also define the so-called principal inertia planes: three planes orthogonal to each other, each orthogonal to one inertia axis and passing through the geometric center of the binding site. Each plane divides the binding site into two groups of atoms, those on one side of the plane (in green) and those on the other side (in orange), and across each plane you can calculate the so-called number of contacts between the two groups. We compute a number of pseudo-contacts using a sigmoidal function: the number of contacts of each atom is calculated with respect to all the atoms of the other group, and you then sum over all the atoms of the first group to obtain a cumulative number of contacts. Biasing this quantity enhances the breathing of the binding site along three independent directions, and thus the variation of its shape along all three directions of space. In addition to this, we also force the enlargement or, conversely, the collapse of the binding site by acting on its radius of gyration. Namely, we use a window approach: we start from the apo structure, record the value of the radius of gyration, which you can see here, and put soft-wall restraints on the radius of gyration so that we sample only conformations whose radius lies within 5% above or below the initial value. Then, after a while, we select a random conformation whose radius of gyration is 5% lower than the value in the apo structure, we start another window, and so on.
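A minimal sketch of how such collective variables could be computed for a set of binding-site atom coordinates is given below. This is my own illustrative reconstruction, not the actual EDES/PLUMED implementation: atoms are treated as unit masses, and the switching-function parameters (r0, n) are made up:

```python
import numpy as np

def inertia_axes(coords):
    """Principal axes of the (unit-mass) inertia tensor of the site."""
    x = coords - coords.mean(axis=0)
    inertia = np.einsum('ni,ni->', x, x) * np.eye(3) - x.T @ x
    _, vecs = np.linalg.eigh(inertia)
    return vecs.T                        # rows = principal axes

def cross_plane_contacts(coords, r0=5.0, n=6):
    """For each principal inertia plane through the site center, count
    sigmoidal pseudo-contacts between atoms on opposite sides."""
    center = coords.mean(axis=0)
    counts = []
    for axis in inertia_axes(coords):
        side = (coords - center) @ axis              # signed distance to plane
        left, right = coords[side < 0], coords[side >= 0]
        d = np.linalg.norm(left[:, None, :] - right[None, :, :], axis=-1)
        s = 1.0 / (1.0 + (d / r0) ** n)              # sigmoidal switching
        counts.append(float(s.sum()))
    return counts

def radius_of_gyration(coords):
    """Unweighted radius of gyration of the binding-site atoms."""
    x = coords - coords.mean(axis=0)
    return float(np.sqrt((x ** 2).sum(axis=1).mean()))

# Demo on a random cloud of 20 "binding-site atoms".
rng = np.random.default_rng(0)
site = rng.uniform(-4.0, 4.0, size=(20, 3))
print(cross_plane_contacts(site))   # three per-plane contact counts
print(radius_of_gyration(site))
```

In EDES these quantities are biased with metadynamics: restraining the radius of gyration inside a window while biasing the per-plane contact counts is what drives the breathing of the pocket in the three independent directions.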
In our applications we used four windows, which correspond to an overall collapse of the binding site of 15% in terms of its radius of gyration. This value was used for all the systems that I will talk about in a moment, despite their different degrees of binding-site compaction upon ligand binding. As test systems, we used three proteins for which both the holo and apo high-resolution structures are available, and for which previous computational work using different approaches exists, so that we can compare our approach with previous ones. The first system is the ricin A chain, an rRNA hydrolase. As you can see from the comparison of the apo and holo structures, in green and yellow respectively, it undergoes very minor structural rearrangements upon binding, both at the level of global and of local conformational changes. Here is the ligand, in black, and you can see that there is only a minor change in a tyrosine of the binding site. This corresponds to a variation of one angstrom in terms of RMSD and of only 1% in terms of radius of gyration. A recent work by Forli and collaborators investigated the performance of a recently developed variant of the famous program AutoDock, called AutoDock for Flexible Receptors, in reproducing the poses of a series of systems, including ricin. Among the features of AutoDock for Flexible Receptors are the explicit selection of flexible residues and a customized scoring function. Now, despite the minor changes undergone by ricin upon ligand binding, it was one of the few systems for which no native-like pose was found using this newly developed version of AutoDock, confirming that it is a very difficult target for these programs. The second system is the β-glucosyltransferase, whose ligand is uridine diphosphate (UDP), and this protein undergoes hinge-bending-like motions upon binding. As you can see, the binding site closes on the ligand.
This is clearly visible also when you compare the movement of single amino acids, shown by magenta arrows in the picture here. Indeed, the change of the binding site amounts to almost 3 Å in terms of RMSD between the apo and holo conformations, and to about a 10% variation of the radius of gyration. Also in this case we have recent computational work for comparison, namely a paper published in 2010 by Seeliger and de Groot, who used an algorithm called tCONCOORD to generate holo-like conformations starting from the apo structures of 10 systems, including the β-glucosyltransferase. For this particular system they were unable to find binding poses featuring an RMSD from the experimental structure lower than 2 Å; actually, they found only one such pose out of more than 5,000, and this pose was not among the top 100 ones. The last target, which we have seen in the previous lecture, is the allose-binding protein. This protein undergoes even larger conformational changes upon binding, as you can also see from the pictures in this case, amounting to a change of the binding site larger than 4 Å and a variation of its radius of gyration larger than 25%. Also in this case, as I said, we have previous computational work against which to compare our algorithm, namely the work I discussed in the previous lecture. And now let's come to the simulations we performed with our approach. First, we performed a standard, microseconds-long molecular-dynamics simulation of the apo system, in order to compare our approach to the standard ensemble-docking protocol. Then we also performed a shorter, but still long, one-microsecond molecular-dynamics simulation starting from the experimental conformation of the complex, in order to verify that, provided we have a good conformation of the protein, we are able to generate native-like poses.
Then we applied our method: we performed bias-exchange well-tempered metadynamics simulations on four collective variables, and for each window we accumulated 400 nanoseconds of cumulative simulation, meaning that each replica was run from 0 to 100 nanoseconds in each window. In addition, to test the dependence of the results on the docking engine, we used two famous docking programs that differ both in the search algorithm for the poses and in the scoring of the poses themselves: HADDOCK and AutoDock. So, in terms of sampling of native-like structures, let's see how standard MD performs. Here we show the comparison for β-glucosyltransferase: in gray you see the experimental structure, and in red the best structure generated by the standard molecular-dynamics simulation. You can see that among the features not reproduced by standard MD is the displacement of this arginine, which should move apart in order to allow the ligand, in gray, to bind to the pocket. The best structure generated by our method is shown here in green, and you can appreciate that we are able to generate a conformation very close to the experimental one. This can be better quantified, for all the systems, in terms of the fraction of conformations featuring a low RMSD of the binding site from the holo experimental structure. You can see that only in the case of ricin does the standard approach perform as well as EDES. For β-glucosyltransferase and for the allose-binding protein, the standard approach was unable to generate any single conformation within 2 Å, in terms of RMSD, of the experimental conformation. In particular, I would like to draw your attention to the fact that these percentages obtained with EDES are retained, or even increased, in the set of clusters that we generated for the docking calculations.
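The sampling metric just mentioned, the fraction of frames whose binding-site RMSD from the holo structure falls below a cutoff, is straightforward to compute. A small sketch with hypothetical per-frame RMSD values (the numbers below are invented for illustration, not results from our work):

```python
import numpy as np

def native_like_fraction(rmsd_values, cutoff=2.0):
    """Fraction of frames with binding-site RMSD below `cutoff` (Å)."""
    rmsd = np.asarray(rmsd_values, dtype=float)
    return float((rmsd < cutoff).mean())

# Hypothetical per-frame binding-site RMSDs (Å) from two protocols:
standard_md = [3.1, 2.8, 4.0, 2.5, 3.3]
edes_like   = [1.4, 2.1, 1.8, 2.6, 1.2]

print(native_like_fraction(standard_md))  # 0.0
print(native_like_fraction(edes_like))    # 0.6
```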
And here you can see how this better sampling translates into better docking results: again, only in the case of ricin were we able to find native-like poses ranked among the top five when using the clusters derived from standard molecular-dynamics simulations, while for all the systems the conformations generated by EDES gave native-like poses ranked among the top two. OK, so I hope I have shown that our method performed very well for the three targets studied and discussed here. As a last example, I will now talk about how to use machine-learning algorithms to correctly classify allosteric ligands of the heat-shock protein 90 (HSP90) chaperone into activators or inhibitors of the protein. This is a work that was published a few months ago and was, let's say, led by Alessandro Pandini at Brunel University in London and Giorgio Colombo at the University of Pavia in Italy. First, a few words about allosteric binding, which is a very important process in regulating cell life. In allosteric binding, a ligand changes the functional state of a protein by binding to a site that is not the orthosteric one, i.e. not the classical site we think of when we talk about, for instance, enzymes that are activated by their substrates. In response to allosteric binding, a protein can change its conformational state and increase or decrease its affinity for another molecule, thereby regulating the function of the allosteric protein. Now, it is very complex to apply the standard theories, and also the standard methods, to allosteric binding, for several reasons. First, it is very difficult to capture the conformational changes that derive from the binding of allosteric ligands.
In addition to this, scoring functions are generally not very good at reproducing the affinity of allosteric binders for their site, and the same is true for experimental methods, which are generally good at providing information about the affinity of ligands for orthosteric sites but are usually poorly informative about allosteric ones. All these issues, let's say, add up, to the point that deriving structure-activity relationships for allosteric ligands is very difficult, much more challenging than for standard binding events. In order to develop an approach to predict allosteric binding, one should integrate information not only on the binding event itself, but also on the conformational dynamics that binding to the allosteric site causes in the system, and on how this reflects on the biological activity, in particular on changes of the functional state. How can this be done? Of course there are several recipes, and the authors here propose to use machine-learning models trained on molecular-simulation data to predict the functional effect of allosteric binding. To be precise, they propose to use docking data, and data derived from molecular-dynamics simulations, to train a machine-learning classifier that distinguishes between activators and inhibitors of the protein. The protein is shown in this slide. As I told you, it is the heat-shock protein 90, a very important protein. In its active form it is a dimer, composed of the monomers you see in this slide, and each monomer consists of three domains: the N-terminal domain, shown in green, which is where ATP binding occurs and regulates the function of this protein; a middle domain, in white; and a C-terminal domain, in red.
It has been shown that there is an allosteric site, as you can see here, at the interface between the C-terminal and the middle domain, and binding of ligands to this site is able to change the way ATP is processed; in short, it changes the functional state of the protein. The authors developed their algorithm in three steps. First they ran molecular-dynamics simulations starting from three experimental structures of the protein, one in complex with an activator and two in complex with two inhibitors. In this way they generated a structural ensemble of the protein and, at the same time, since they simulated one activator and two inhibitors, they captured potential allosteric effects. This is of course under the hypothesis, actually demonstrated by previous studies, that even relatively short molecular-dynamics simulations such as the ones performed here, a few hundred nanoseconds long, are able to describe local rearrangements that are informative of the allosteric movements of the protein. The authors then employed cluster analysis on different sets of regions of the protein in order to generate 14 representative protein structures to be used in ensemble docking calculations. The ensemble docking calculations were performed on a set of 133 compounds, chosen among those with demonstrated activity against HSP90; in particular, there are 49 inhibitors and 84 activators.
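Two ingredients of the classification pipeline described next are easy to sketch in plain Python: the cluster purity used to judge the unsupervised attempts, and the 70/30 holdout split used to train the supervised models. The data and labels below are made up for illustration; the actual work used docking-derived features and, ultimately, a support vector machine:

```python
import random
from collections import Counter

def purity(cluster_ids, labels):
    """Cluster purity: fraction of samples belonging to the majority
    label of their cluster (1.0 = perfect assignment)."""
    clusters = {}
    for cid, lab in zip(cluster_ids, labels):
        clusters.setdefault(cid, []).append(lab)
    majority = sum(Counter(labs).most_common(1)[0][1]
                   for labs in clusters.values())
    return majority / len(labels)

def holdout_split(samples, train_frac=0.7, seed=42):
    """Random holdout split into a training and a test set."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(samples))
    return ([samples[i] for i in idx[:cut]],
            [samples[i] for i in idx[cut:]])

# Toy example: two clusters with mostly consistent activity labels.
print(purity([0, 0, 0, 1, 1, 1],
             ['act', 'act', 'inh', 'inh', 'inh', 'inh']))  # 5/6 ≈ 0.83

train, test = holdout_split(list(range(133)))  # 133 compounds, as in the study
print(len(train), len(test))                   # 93 40
```

Randomizing the split (here via the seed) and repeating the train/test cycle is what limits the risk of overfitting a classifier to one particular partition of the data.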
The data obtained from the docking were used to train a machine-learning algorithm to predict the functional effect of each ligand. In particular, three features were computed for every complex: first, the score of the top pose obtained by docking; second, the root-mean-square deviation among the 10 top poses for each ligand; and third, the root-mean-square deviation of the atomic positions of the binding site over the top 10 poses, which should give information about the structural adaptation of the binding site during the docking calculations. Before embarking on machine-learning algorithms to classify the ligands, the authors investigated the performance of unsupervised approaches, such as cluster analysis, in distinguishing between the activities of the ligands they considered. They turned to cluster analysis because a first analysis of the 42 features they calculated revealed, as you can see from the graphs here, a very large overlap among the features; in other words, no single feature could tell us "this is an inhibitor, this is an activator". They therefore performed a cluster analysis using two very famous clustering algorithms, varying the number of clusters from 2 to 6, where 2 is the most relevant because the classification we are aiming for is between activators and inhibitors, so two clusters. The performance was measured in terms of the so-called purity, which takes the value of zero if the assignment is random and of one if the assignment is correct for all the ligands. The purities they obtained were pretty low, lower than 0.7 for two clusters, so these methods failed and they decided to apply supervised machine-learning algorithms.

As I said, the models were trained on properties derived from the docking calculations, and in particular the authors used the so-called holdout method, in which you divide your set of data into a training set containing 70% of the data and a test set with the remaining 30%: you train using 70% of the data and you test your predictor on the remaining 30%. You can do this by randomizing the choice of the training set, so that you can, let's say, avoid too much overfitting of your model. The performance of three different approaches is shown here. The algorithms were evaluated in terms of several parameters, including the precision, the recall and the false-positive rate, and the overall accuracy was highest for the support vector machine, so in the following we will discuss only this algorithm, namely its application to classifying the compounds into activators and inhibitors of the chaperone. Now, the performance we have seen is good; sorry, I forgot to say that the balanced accuracy is 0.89, so the accuracy is very good. But it was obtained using an approach that did not take into account the chemical, i.e. physico-chemical, properties of the compounds, because the machine-learning algorithm was fed only the docking-derived features, so there is no information about, let's say, the chemistry of each compound. The authors therefore decided to compare this performance to that of approaches based on the so-called fingerprints of molecular properties. These fingerprints identify, in a way, the properties of the different chemical groups that compose each compound, and by classifying the 133 compounds in this way, using the so-called Tanimoto coefficient, a widely used measure of the similarity between chemical compounds, they grouped the activators into four classes and the inhibitors also into four classes. The performance of the machine-learning approach was then compared with that of this classification based on the chemical similarities between all the compounds. If we plot the performance of the two approaches, chemistry-based and machine-learning-based, for each of the classes, we see that the machine-learning approach outperforms the chemistry-based one for basically all the classes but the last one. In other words, the correct classification cannot be obtained by using only fingerprints, which of course do not consider the effect of the binding on the protein structure and dynamics. It is therefore perhaps not surprising that, for this kind of system, a method that in a way includes the impact of the binding on the dynamics performs much better; and it is also worth noting that specific chemotypes emerge from the analysis with the machine-learning approach.

So, we are closing the two lectures. I think the take-home messages are several. First, I hope I convinced you that computationally assisted drug design has improved in accuracy and predictive power over the last three decades, and that enhanced sampling techniques are particularly well suited to address challenging targets, including allosteric processes, the detection of cryptic binding sites, which are sites that are generally not present on the unbound protein and open only in particular situations, and cases of polyspecific binding, such as some transporters found in bacteria and in humans. Taking into account that there are many techniques you might want to use, the choice depends of course on the application; finding the most suitable technique for your application is almost always the most difficult choice to make, and you must consider the underlying theory of the method you would like to use. I think that for the most difficult targets, hybrid approaches can be a solution, in which you account for flexibility at different levels by combining different algorithms, for instance in a sequential or simultaneous manner. Also, when possible, use experimental information if you have it, but don't forget that using experimental information that is, let's say, confined to specific chemotypes could also bias the chemical space of the ligands you will select towards the space of the ligands covered by that experimental information. I'm done, and before closing I would like to acknowledge the people who participated in the work I have shown: from our lab, in particular Paolo Ruggerone, Andrea Basciu and Giuliano Malloci, and, as I said, Fabio Pietrucci at Sorbonne University in Paris and Alexandre Bonvin at Utrecht University. Thank you all for your attention; I hope you enjoyed this lecture and of course I will be available for questions and discussions during the summer school. Thank you.