Hello everyone, here I am again for a short talk in this BioExcel summer school about our efforts using HADDOCK for drug repurposing against COVID-19. And this is yet another view of Pula, where we should all have been this week if it wasn't for COVID-19.
So let's first take a look at SARS-CoV-2 and the key proteins which might be interesting to target in the design of drugs to stop the virus. For this I'm going to use the following paper, which was published quite recently in Nature and gives quite a nice and broad public overview of the key proteins, their function and how to potentially disarm them with drugs. The first one, of which you are probably all very well aware and which some of you have even been working on, is the spike protein, which sits on the surface of the virus and interacts with the angiotensin-converting enzyme 2 to enter our cells. So a potential strategy to prevent cell entry would be to block this interaction.
Another critical protein of the virus is the main protease. When the virus enters our cells it delivers its viral RNA, and the viral RNA is then translated into proteins, long strands of protein, in our cells. One of the first proteins produced is the main protease, also called Mpro or 3CLpro depending on the nomenclature. This protease is critical for the virus because it cleaves the polypeptide chain produced into functional proteins, and these are the non-structural viral proteins which the virus needs for its further life cycle. So by blocking the main protease we can in principle block this process. Now there are of course quite a number of structures of the main protease already available. We have homologous proteins from SARS-CoV-1, but recently, as a response to the pandemic, the structural biology community has been working a lot on those systems, and you see here a structure released on the 6th of May, which is a crystal structure of the main protease with an inhibitor bound to it.
Now the third actor in this process is yet another protein, which acts further down in the life cycle of the virus. It's the RNA-dependent RNA polymerase, also called RdRp. Now what does this protein do? It produces viral RNA, and this RNA is needed for the virus to increase the production of proteins but also to produce its own viral RNA material to be packed into new virions. So if we can block this particular protein, we're again going to block this process, this RNA production, which will again stall the virus. The structural biology community has again been working hard in the last months to generate structures of it, and you see here one structure of the RNA-dependent RNA polymerase which was published a few months ago in Science. This is a cryo-electron microscopy structure at 2.5 angstrom resolution which contains the triphosphate form of remdesivir, a well-known antiviral, or promising drug in this case, which we've been hearing a lot about in the news.
So we have structures for all three of those key proteins. Now there are many more structures known for the viral proteins. If you want to look more at the structural landscape of the virus proteins, you can take a look at the covid.bioexcel.eu/proteins website, which has been set up in collaboration with MolSSI in the US and collects a lot of information about the structures, the proteins of the virus, but also all kinds of information about simulations, molecular dynamics simulations, docking models and all of that. So it's really a hub for all the structural and modeling work done on the virus.
Now how are we going to target the virus using HADDOCK? What are our strategies to perform small molecule docking in HADDOCK? We have three main strategies that have been used so far.
The classical one is what we call binding site-driven docking, and that's pretty much in line with what I told you in the first lecture introducing HADDOCK, where we define some information about the binding site. The second one is what we call the template-based docking strategy. Here we need to rely on existing structures of the target protein with some other drugs bound. This is a strategy we developed along the D3R Grand Challenge. This is a blind competition for the prediction of protein-small molecule complexes, in which we participated three times. And the last one is a rather new and recent development in HADDOCK. It's also template-based, so we need to have some structural information, but we're going to use shape information as restraints to drive the docking. So now I'm going to briefly explain those three strategies before coming to the results of the screening that we have been doing.
Now binding site-driven docking is a classical HADDOCK type of approach. In the case of a small molecule binding to a receptor, we usually know the binding site on the receptor. So we're going to target this binding site by defining the residues in the binding site as active in the HADDOCK setting. For HADDOCK this means that those residues should make contact with the ligand, otherwise there's a penalty. This is pretty similar to small molecule docking software like AutoDock, where you define a box around the binding site. You never really target the entire protein, but you direct your docking to a limited region. In the case of the SARS-CoV-2 proteins we know where the active site is, so we can define this information. Now in the rigid body docking stage the ligand is also defined as active. So what we effectively have is a set of restraints that are going to draw the ligand into the binding pocket at the rigid stage.
Compared to our standard protein-protein docking, we increase the weight of the van der Waals energy term in the scoring function to favor poses that have a good van der Waals interaction. The default for protein docking would be 0.01. Now in the second phase of the docking, where we perform a flexible refinement of the poses and a final refinement, in explicit solvent or not, we change the restraints that we are using. Now only the ligand is active and the receptor binding site becomes passive. This allows the ligand to explore the binding site. It means that if some residues in the binding site are not contacting the ligand, there is no longer an energy penalty for that. The only requirement is that the ligand should contact some residues in the binding site. So this gives freedom to the ligand to explore the binding pocket. We also slightly change our protocol for small molecule docking by removing the high temperature step in the simulated annealing part, which is done here to refine the interface of the molecules. And we also slightly decrease the temperature at which we do those simulations. Finally, and this is a rather recent setting for small molecules, we change the weight of the electrostatic intermolecular energy term to 0.1, so 10% of the score instead of 20%. This protocol has been described in the paper reporting our first participation in the D3R Grand Challenge. So this is a classical HADDOCK type of binding site-driven docking, but adapted to small molecules.
Now the template-based approach is a different one. For this we first need to have available 3D models of related receptors, homologous receptors or ideally the same receptor that you are targeting, but with other ligands bound to them. To select which receptor we are going to use for the docking, we first look at what kind of ligands are bound to those receptors, and we select the receptor which has the ligand most similar to the ligand that we have to dock.
We measure this similarity using the Tanimoto coefficient, and you see here two different ligands that share a common substructure. So we select the receptor which has a ligand with the largest common substructure to the ligand that we have to dock.
Now how do we sample ligand conformations? We do allow some flexibility during our docking, but often this is rather limited. So from the SMILES strings, which are simple text strings describing the chemistry of the ligand, we generate 3D conformers, up to 500 per ligand. Some ligands don't reach 500 if they have only a few rotatable bonds. For this we use the OpenEye OMEGA toolkit, which is free for non-profit usage. We find it performs very well at generating relevant conformations. Now the receptor that we have selected contains a ligand which has a given conformation. It is not the ligand we want to dock, but there is information in there. So we compare the shape of those 500 generated conformations with the shape of the ligand in our template receptor. We do that by matching the shapes and also by matching a kind of pharmacophore model, and this is done using OpenEye ROCS. We then select the 10 best conformers out of the 500 that we generated, those which have the highest combined score, and these 10 conformers are given to HADDOCK for ensemble docking. So we start from multiple conformations of the ligand.
Now in this template-based docking protocol, which we used in D3R rounds 3 and 4, we skip the first two stages of HADDOCK, so we don't do the rigid body docking and we don't do the flexible interface refinement. We can do that because we superimpose the selected conformers onto the crystallographic ligand in the receptor that we selected, and this superposition is done using the OpenEye Shape TK software. What we do is only run the final refinement stage of HADDOCK, basically this water refinement that you see here.
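The template selection and conformer pre-selection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the feature sets stand in for real chemical fingerprints, the names `tanimoto`, `pick_template` and `top_conformers` are hypothetical, and none of this is the OpenEye or HADDOCK code used in the actual protocol.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto coefficient: shared features divided by all distinct features."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty feature sets
    return len(a & b) / union


def pick_template(query: Set[str], templates: Dict[str, Set[str]]) -> str:
    """Return the template whose bound ligand is most similar to the query ligand."""
    return max(templates, key=lambda name: tanimoto(query, templates[name]))


def top_conformers(scores: Dict[str, float], n: int = 10) -> List[str]:
    """Keep the n conformers with the highest combined score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical feature sets standing in for real fingerprints of the ligands.
query = {"amide", "phenyl", "pyridine", "sulfonyl"}
templates = {
    "template_A": {"amide", "phenyl", "pyridine", "hydroxyl"},
    "template_B": {"phenyl", "carboxyl"},
}
best = pick_template(query, templates)
print(best, round(tanimoto(query, templates[best]), 2))
```

In practice the fingerprints would come from a cheminformatics toolkit and the combined scores from the shape and pharmacophore matching step, but the selection logic stays this simple.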
So this was a very successful strategy in D3R Grand Challenges 3 and 4. This strategy has been published in this paper, so you can look up all the details, and what you see here are the 20 ligands that we had to predict for Grand Challenge 4. In yellow is the crystal structure and in blue are our predictions. You see we did very well: the median of the top-1 poses for all those 20 is 1.5 and the best is 1.2. So this was a very successful case.
Now we didn't follow this particular template-based protocol for our efforts against COVID. We follow a new protocol which is still template-based, but now we use the ligand information from the template in a different way, to define a shape that we will use during the docking. The identification of the receptor with the most similar ligand is the same. So we have a template structure which contains the ligand that is most similar to the ligand that we have to dock. We transform this ligand information into a shape, which is basically a set of dummy atoms inside the receptor, and then we define restraints between the ligand that we have to dock and this shape. The way the restraints are defined depends on the relative size. If the shape is larger than the ligand, we define the restraints from the ligand to the shape, which means that the ligand should overlap with the shape. Vice versa, if the shape is smaller, we define the restraints from the shape to the ligand. And now we don't need to pre-select conformations, so we can use all the 500 conformations that we generated with OpenEye to dock into the receptor. So now we again do a full-fledged docking, but the advantage of the shape is that it will select the most suitable conformation to fit into it, and it will also induce more conformational changes during the refinement process. So that's the protocol that we're going to apply to some of the cases for this COVID work.
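The rule for choosing the direction of the shape restraints can be written down as a small sketch. Several things here are assumptions made for illustration: size is measured as a simple count of atoms or beads, each restraint is expressed as "this source should be close to any of these targets", the tie case goes from shape to ligand, and the function name `shape_restraints` is hypothetical rather than part of HADDOCK.

```python
from typing import List, Tuple

Restraint = Tuple[str, List[str]]  # (source, candidate targets)


def shape_restraints(ligand_atoms: List[str], shape_beads: List[str]) -> List[Restraint]:
    """Define restraints from the smaller partner to the larger one.

    Shape larger than ligand: every ligand atom must overlap with the shape.
    Otherwise: every shape bead must be covered by the ligand.
    """
    if len(shape_beads) > len(ligand_atoms):
        sources, targets = ligand_atoms, shape_beads
    else:
        # Includes the equal-size case, which the talk does not specify.
        sources, targets = shape_beads, ligand_atoms
    return [(source, list(targets)) for source in sources]


# Small ligand, large shape: restraints start from the ligand atoms.
for source, targets in shape_restraints(["C1", "N2"], ["S1", "S2", "S3"]):
    print(source, "->", targets)
```

Defining the restraints from the smaller partner keeps every restraint satisfiable: a small ligand can sit fully inside a large shape, whereas a large shape could never be fully covered by it.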
So now coming to our drug repurposing screen using HADDOCK against the three main actors of the virus. What did we do? We selected approved small molecules from DrugBank. These are approved drugs that have passed clinical trials and are in use for all kinds of different applications. We wanted to limit the system. We didn't want to take very, very small molecules, as there is all kind of stuff in DrugBank, and we didn't want to take too large ones, so that these make reasonable drugs. So we have an atom count filter. We discarded any drug containing metals, the organometallic compounds, simply because our HADDOCK server cannot handle those. We generate the topology and the parameters for the small molecules using PRODRG, and we cannot handle organometallic compounds there. This gave us about 2,000 compounds from DrugBank. We added a few more compounds that are basically the active forms of some of the drugs. Some of the drugs are prodrugs, meaning they need to be processed in the cells into the active component. For each of those we generated up to a maximum of 500 conformations, again using OpenEye, so not all ligands reach this number, and you see here the distribution of sizes in terms of number of atoms for the ligands that we selected.
Now here is the first actor, the main protease, Mpro, for which we have a crystal structure. All the work, by the way, has been done by Manon Réau and Panos Koukos. They are both postdocs in my lab, and Panos was a former PhD student. So what was the docking protocol? There are quite a number of known ligands for this main protease, also from SARS-CoV-1, and in the last months the Diamond synchrotron released quite a number of small fragment screening data sets. They have been crystallizing the main protease with all kinds of small molecule fragments, so we use a set of 58 fragments in the binding site and also some other structures with ligands.
As known structures, there are two forms, an apo and a holo form, plus there are many additional structures. So this was the starting point of information for the receptor, and this is the starting point for the ligand, and the protocol that we follow here is the shape-based docking protocol that I just explained. We also follow a slightly adapted version of our shape-based protocol where we make use of pharmacophore information. We define a shape which represents the pharmacophore information. What you see here are all the small fragments and ligands that we extracted for the main protease, and from these we derive a pharmacophore model. We can still represent this pharmacophore model as a shape consisting of fake beads, but the beads now have properties that link them to a specific type of atom, and when we define our restraints we can make specific groups of atoms overlap with specific beads. So we basically did two screenings, screening 2,000 compounds twice against the main protease.
So how did we do this kind of computation? It's a lot of computation. We could do the entire screening of 2,000 compounds in about three and a half days using high-throughput computing resources provided by the European Open Science Cloud project and also EGI. The HADDOCK web server has been operating on this kind of resources for more than 10 years now, and this is what is powering all the machinery behind the server, allowing us to serve a large user community. And you see here, this was in April, over about a week's time, the number of jobs running at different sites around the world. A lot of sites are in Europe, but you find the Open Science Grid here, and you find sites in China, for example, this is Beijing. So all those sites are supporting our efforts. Now what came out of all of this? These are the results already.
So you see here on the bottom left the top 10 compounds that come out of the screen. You see the score for the pharmacophore model and for the shape-based model, and here we report the best score of the two, and this is how we rank the models. This table can be sorted as you wish, and everything is available on the bonvinlab.org/covid site. There is also a graphical view of the results. You see the HADDOCK score in arbitrary units, which is important to remark because these are not binding affinities, and if you hover over a point here it will tell you which drug it is. Now you have to be very careful when you look at docking results. It's in silico docking, so it doesn't have to mean anything relevant. But when you look at the top compounds, it was already interesting to see that in the top 10 we are recovering 3 protease inhibitors, and if you look in the top 100, we find several drugs that are currently in clinical trials in Europe. So this was already a good sign that our scoring function is able to recover drugs that are relevant here. Now just a side note: hydroxychloroquine, together with chloroquine, was mentioned as a potentially fantastic solution to COVID-19, but it is not ranking very well in our screen. It's around 800.
Now the same was done for the RNA polymerase. The entire screen here took about six days because it was a larger system. We actually had to cut some domains that were not relevant for the binding site that we are targeting. Here we didn't follow a template, shape-driven docking protocol, because there was not enough structural information to follow a template-based protocol, so we went back to our classical HADDOCK binding site-driven docking protocol. Again you see here the top 10. Everything is available online, so you can go look at it, and you see an interesting compound, which is the active form of remdesivir, remdesivir triphosphate, which scores at number five.
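The ranking rule used for the main protease screen above, where each compound keeps the better of its two scores, can be sketched as follows. The drug names and score values are made up for illustration, the function name `rank_by_best_score` is hypothetical, and the sketch assumes the usual HADDOCK convention that a lower (more negative) score is better.

```python
from typing import Dict, List, Tuple


def rank_by_best_score(scores: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Rank compounds by the better (lower) of their two docking scores."""
    best = {name: min(pair) for name, pair in scores.items()}
    return sorted(best.items(), key=lambda item: item[1])


# Hypothetical (pharmacophore score, shape score) pairs in arbitrary units.
scores = {
    "drug_A": (-42.0, -55.5),
    "drug_B": (-60.1, -48.3),
    "drug_C": (-30.2, -31.0),
}
ranking = rank_by_best_score(scores)
for position, (name, score) in enumerate(ranking, start=1):
    print(position, name, score)
```

Because the scores are in arbitrary units, only the resulting order carries meaning, not the differences between values.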
Currently we are also targeting the angiotensin-converting enzyme receptor. The idea here is to block the receptor in a rather closed form, which will prevent the binding to the spike protein, but these calculations are running as I speak.
Now if you have looked carefully at the rankings that I presented, then you should have become aware that there is a potential problem with all this in silico work. We have what we call frequent hitters. The top-ranked compound in both cases, for the main protease and the RNA-dependent RNA polymerase, is ceftolozane, a beta-lactam antibiotic, and this is a highly polar compound. You see here the structure. So is this an artifact or is this a miracle drug? One way of looking a little bit into that is to take a completely unrelated target, and in this case we took dihydrofolate reductase, DHFR, which has nothing to do with the virus. We took the top 20 compounds ranked by HADDOCK for the main protease and the RNA polymerase. Together these were 36 compounds, and we docked them against DHFR. You see that ceftolozane is also ranking on top for DHFR, but DHFR has nothing to do with the virus. So that makes it very suspect. It's a sticky compound that always scores well. You see here the DHFR rank, the main protease rank and the RNA polymerase rank. This one is number one everywhere, all over the place. But if you take, for example, the second compound for the main protease, it's rank number two there, it's ranking pretty poorly for the RNA polymerase, which makes sense, and it's only 20 for DHFR. So by looking at this kind of analysis and data we can exclude some drugs from further testing.
So where does this all take us in terms of conclusions and perspectives? We have been doing a screening of approved drugs against the three targets, and the last one, for ACE2, is ongoing and should finish this week.
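The frequent hitter check described above, comparing a compound's rank on the viral targets with its rank on an unrelated control target, can be sketched like this. The compound names, the rank values and the `top_n` cutoff are hypothetical, as is the function name `frequent_hitters`; it is a minimal illustration of the idea rather than the analysis actually run.

```python
from typing import Dict, List


def frequent_hitters(ranks: Dict[str, Dict[str, int]], top_n: int = 3) -> List[str]:
    """Flag compounds that rank within top_n on every target, control included."""
    return sorted(
        name
        for name, per_target in ranks.items()
        if all(rank <= top_n for rank in per_target.values())
    )


# Hypothetical ranks per target; "dhfr" plays the role of the unrelated control.
ranks = {
    "compound_X": {"mpro": 1, "rdrp": 1, "dhfr": 1},
    "compound_Y": {"mpro": 2, "rdrp": 30, "dhfr": 20},
}
suspects = frequent_hitters(ranks)
print(suspects)
```

A compound that only ranks highly on its intended target, like `compound_Y` here, passes the check, while one that tops every list regardless of target is flagged as a likely scoring artifact.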
The most interesting compounds for the main protease are now being studied by molecular dynamics in collaboration with Attilio Vargiu and Giuliano Malloci from Cagliari, who are also co-organizing this BioExcel summer school. A strategy to block the virus while waiting for a potential vaccine would be to design a cocktail of drugs, as is done for HIV. We still have no vaccine for HIV, and the therapies these days use cocktails of drugs that target different proteins, like the three main actors I discussed. Of course all of these predictions will need to be validated, and we are planning to do that in collaboration with people in Utrecht and also in the context of a large European Innovative Medicines Initiative project. What could also provide insight into the validity of our predictions are epidemiological data. By analyzing patients that have suffered from COVID-19, we might see that some groups of patients have been less affected by the virus than others, and those groups were maybe already taking some drugs which are in the screen that we are doing. So if this kind of data becomes available, we can cross-check our predictions.
To finish, I want to acknowledge the people who have been doing most of the work. These are Panos and Manon in my group, you've seen their pictures. I also want to acknowledge Ed Moret from the pharmaceutical sciences department, who has all the knowledge about all these drugs and is collaborating with us in coming to some prediction of what could be a potential cocktail, which will then go into testing. And of course the entire lab for all their contributions, and the funding supporting all our work, including of course BioExcel. Thank you very much for your attention.