Good afternoon, good evening, or good morning, in case you're joining us from a different time zone. It's nice to see people joining us from so many different places. It's a great pleasure to have Vincent Zoete and Antoine Daina from the Molecular Modeling Group at the Swiss Institute of Bioinformatics with us today; they will be telling us about drug design applied to COVID-19. Just two housekeeping points. We are taping this course, so please mute your audio. And we will be using the chat for questions. As this is a two-hour course, we'll have two question-and-answer sessions, the first one somewhere halfway along. So don't hesitate to put questions in the chat and we'll take them one by one during those sessions. OK, I think that's it from me, and I'll give you, Vincent and Antoine, the floor, or rather our screens. Thank you.
So good day, everyone. Maybe we can start by introducing ourselves quickly. I'm Vincent Zoete. I am assistant professor at the University of Lausanne in the Department of Oncology, and I am also group leader at the Swiss Institute of Bioinformatics, where we work on computer-aided drug design: developing new approaches, making them freely available for academic research as websites, and maintaining those websites.
Thanks. I am Antoine Daina. I am a senior scientist in the group led by Vincent, working on all aspects of the development and application of computer-aided drug design methods.
Thank you. So this is the objective of today's lecture. We have planned many things, and we may not be able to do everything. That would be a pity, but you can also register for the lecture that we give for the Swiss Institute of Bioinformatics. We are going to start with an introduction to drug design in general and computer-aided drug design in particular. Then there will be a short theoretical introduction to molecular recognition and its implications for drug design.
And this will be followed by a visualization of a ligand-protein complex using the visualization software UCSF Chimera, with Antoine. Then Antoine will present different therapeutic strategies, or rather discovery strategies, to tackle COVID-19 today. He will also present different approaches regarding absorption, distribution, metabolism, and excretion. This will be followed by an exercise where we use SwissADME and show you how it works. Then I will take over with a short theoretical presentation on structure-based and ligand-based computer-aided drug design, and on how we can calculate molecular similarity to perform virtual screening. This will be followed by a SwissSimilarity exercise done with Antoine. Then I will introduce molecular docking, and in particular the algorithm behind SwissDock, which is the docking software EADock DSS. Again, this will be followed by an exercise with Antoine using this web tool. If we have time, which is not certain, we could also present a tool that we have developed for target prediction, SwissTargetPrediction. There is the possibility of an exercise, but it really depends on how much time we have at that point. We don't want to push too much. So the idea is to give you a flavor of computer-aided drug design in general, applied to COVID-19 in particular, and of the tools developed at the SIB to perform computer-aided drug design. Obviously, those tools are quite general: they work not only for targets linked to COVID-19 but for any other protein target. As Monique said, since we are quite numerous in this session, the best is to write your questions using the chat in Zoom. Monique will then select some questions and we will be happy to answer them. OK, so let's start with an introduction about drug design in general.
And, as I said, about computer-aided drug design in particular. First of all, we would like to recall that for us today, a drug is only the active ingredient, the small molecule that is responsible for the therapeutic effect. This effect can serve a therapeutic objective, to cure the disease; a prophylactic objective, to prevent the disease from occurring; or a diagnostic one. Again, today we consider only drug-like small molecules, and not the newer entities that have emerged recently, such as antibodies, which are outside the scope of today's presentation. Very often the public at large thinks that most drugs come from natural products, plants, or other organisms. But this is true for only 6% of them. For the other 94%, either we have modified a naturally occurring molecule or the molecule is purely man-made. And so for those 94% of cases, we use the different approaches that we will see in the next slides. The goal of drug design is to create small drug-like molecules, new chemical entities, that are able to interfere with a biomacromolecule, typically a protein such as an enzyme or a receptor, in such a way that the molecule activates or inhibits this protein, leading to the therapeutic effect. For this to occur, we need to create a molecule with two main purposes in mind. The first is molecular recognition: the small molecule must bind tightly to a binding site, for example the catalytic active site of a protein, for the effect to occur. But for the effect to occur in the body, the molecule must also reach its target in the body and stay there long enough in a bioactive form. And the molecule must not be toxic. For this, we need to address questions regarding absorption, distribution, metabolism, excretion, and toxicity, which are also very hot topics in drug design.
So you see here the typical drug design pathway, starting from the identification of the protein targets of interest to cure a given disease. Then we enter drug design or discovery itself, the three central red steps. We start by finding a first series of compounds showing some signal of activity against the target, so able to bind to this target. Then we select among those the most promising ones, those with the most potent activity, drug-like properties, synthetic accessibility, and so on. We then optimize those molecules, which we call leads, and this lead optimization will ideally end up with a couple of molecules able to enter preclinical development and, at the end of that stage, clinical development in human beings. The whole process, as you know, is very expensive. For a project that has a chance of putting a molecule on the market, we expect a budget of about one to two billion dollars. It is also a very long process, eight to 12 years, most of that time dedicated to clinical development, but also to preclinical development. So this is a very long and very expensive process, but it is also a very risky one, in the sense that there is a huge attrition rate during preclinical and clinical development. Only about five percent of the molecules that enter preclinical development will be approved, enter the market, and reach the patient. So to accelerate the process, decrease the cost, and increase the probability that the molecule ends up being accepted by the authorities, we need different approaches. And one of the possibilities, among many others, is to use computer-aided drug design.
The objective of computer-aided drug design is to use computing resources to come up with rational ideas about how to create or modify a molecule, for instance to increase its affinity for the target, and about how to make decisions, for instance on which molecules will be promoted to the next step of the drug design pipeline. Computer-aided drug design is very important in the field, so, unsurprisingly, a huge number of different approaches have been proposed by many groups and companies around the world to tackle many different questions all along the drug design pipeline, notably during drug design and discovery itself. Like all of those groups, we have developed different computer-aided drug design tools ourselves, but what we did was to create a series of tools, addressing different questions, that are free for academic research in the form of websites. And since we have the support of the Swiss Institute of Bioinformatics, we can maintain these tools over the very long term. The oldest tools, SwissDock and SwissParam, will reach 10 years of life next year, if I'm not wrong. So we have many different tools. Today we only use the ones that you see in a dark red box: SwissDock, SwissSimilarity, and SwissADME. If we have the chance, maybe we will also see SwissTargetPrediction. We have conceived those tools to talk to each other. Most of the time, with the main exception of SwissDock, the output of a given tool can be used directly as the input of the next tool without any technical burden: no file download, no file translation, no reformatting, and so on. With a one-click submission you can launch a new calculation from a molecule output by the previous tool. You will see when you do the exercises that the tools are very easy to use.
We have worked a lot to make them easy to use. We also use them a lot for teaching, but they are professional-grade tools developed for research. All the algorithms have been published in peer-reviewed articles. And we are receiving more than one million jobs a year on average on our tools right now, so many people are using them for their own research. In computer-aided drug design, we have two classes of approaches, and this is also true for the SwissDrugDesign project. We have approaches that are based on the structure of the targeted protein: we use the three-dimensional structure of the targeted protein, when it is available, to decide how to modify the small molecule, for instance for better binding. And we have so-called ligand-based approaches. Here, maybe we don't know the structure of the protein, but we are lucky enough to have a collection of molecules that are already active against this protein, and we use different techniques, such as machine learning, to extract information from those molecules in order to create better ones. Typically, we can use quantitative structure-activity relationships, bioisosteric replacement, and so on. You will see a couple of examples. On top of these two types of approaches, we also have different strategies for drug design. The typical strategy, which has been used for decades and is still the most used nowadays, is one drug with a high specificity for a given target related to the therapeutic effect that we would like to obtain. Here, we pay a lot of attention to getting the highest possible selectivity, in order to decrease the possibility that the drug will bind to another protein and thereby create potential side effects, some of which could be detrimental.
So this is the main approach that we use today: trying to get a molecule with a high affinity for a given target related to a given indication, while preventing the binding of this ligand to other targets. But if you work in the field, you know very well that it is nearly impossible to be totally selective for a given protein. Generally, drugs will bind to several other proteins, typically six to eight. One idea, which gave rise to the drug repurposing strategy, is to capitalize on that instead of trying to avoid it, and to find other possible targets, maybe related to other indications, for the same drug. Let's say the drug has been developed for indication A and binds to target one. If you discover that this molecule is also able to bind to target two, and this target could be useful to tackle another indication, another disease, then you could propose using this molecule there. The big advantage of this strategy is that you already know the flaws and qualities of your molecule. It has already passed different clinical phases, notably phase one. So you know everything regarding its distribution in the body, its toxicity, and the dose that you can safely give to a human being. You can save a lot of time this way: you already have a molecule approved for the market, one that you master very well in terms of toxicity, distribution, and so on. On top of that, more recently, another drug design strategy has emerged, which is polypharmacology. Here the idea is to capitalize on the capacity of a small molecule to bind to different targets, all of them involved in the treatment of one given disease, one given indication.
This way you can potentially decrease the concentration given to the patient, because you are tackling the disease at different levels using different targets. You can be more potent against the disease, potentially with a lower quantity of the small molecule. You can also address resistance upon mutation. If one target mutates to avoid being bound by the ligand, it is very unlikely that, at the same time and in the same cell, you will also have mutations in targets two and three that prevent binding to them too. So this was all for the general introduction to drug design. Let's now have a look at the first topic that is very important to address when we design a drug, which is molecular recognition. We have seen that for a small-molecule drug to be active against a given disease, it needs to bind tightly to a protein target and change its activity. For this, the two molecules need to recognize each other. This is done through different molecular interactions that are mainly non-covalent. Most drugs do not bind covalently to their target, although there are some exceptions. So we are dealing with different types of so-called non-covalent interactions, the two main ones being those that you see here in these two movies. On the left side, you have non-polar interactions, described for instance by the Lennard-Jones potential for the van der Waals interaction, and there are other terms to describe these non-polar effects. As we will see in the next slide, we consider that those interactions are very important for shape complementarity. On the right side of the slide, you see electrostatic interactions between the ligand and the protein.
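To make the two interaction types just mentioned more concrete, here is a minimal sketch of a 12-6 Lennard-Jones term and a Coulomb term between two atoms. It is only an illustration, not the scoring function of any of the tools discussed in this lecture: the function names, the well depth `epsilon`, the minimum-energy distance `r_min`, and the uniform `dielectric` are all assumptions chosen for the example.

```python
import math

# Electrostatic conversion factor so that charges in units of e and
# distances in angstrom give energies in kcal/mol.
COULOMB_CONSTANT = 332.06


def lennard_jones(r, epsilon=0.1, r_min=3.8):
    """12-6 Lennard-Jones energy (kcal/mol) at distance r (angstrom).

    epsilon is the well depth and r_min the distance of the energy minimum.
    Both default values are illustrative assumptions, not force-field data.
    """
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Coulomb energy (kcal/mol) between point charges q1 and q2 (in e).

    The uniform dielectric is a crude, assumed way to mimic screening
    by the surrounding medium.
    """
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


if __name__ == "__main__":
    for r in (3.0, 3.8, 5.0, 8.0):
        print(f"r = {r:4.1f} A   LJ = {lennard_jones(r):8.3f}   "
              f"Coulomb(+1, -1) = {coulomb(r, 1.0, -1.0):8.2f}")
```

The Lennard-Jones term is strongly repulsive at short distance and only weakly attractive near `r_min`, which is why it rewards a ligand fragment that fits the pocket closely without clashing. The Coulomb term decays much more slowly with distance and changes sign with the charges.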
So salt bridges and hydrogen bonds are generally considered responsible, in large part, for the specificity of the interaction between the ligand and the protein. We will come back to that in two slides. The fact that van der Waals and non-polar interactions are responsible for shape recognition can be understood from this complex between an HIV-1 protease inhibitor, an antiviral drug, which you see on the right side of this figure in ball-and-stick, and the hydrophobic pocket of HIV-1 protease. You see the tert-butyl fragment in sphere representation, where each atom has its van der Waals radius. This fragment fills the binding pocket of the target perfectly, which optimizes the van der Waals interaction between the ligand and the protein. Drug designers need to choose the best possible fragment to match the exact shape of the binding site in order to maximize the non-polar interaction. This is the reason why we consider this class of interactions extremely important for shape complementarity. On top of that, as we have seen, we also have important interactions driven by electrostatics, for instance hydrogen bonds. Again, you see here an antiviral inhibitor of HIV-1 protease binding in its active site. Here we have represented only two isoleucines of the protein, the two residues that you see as light thick lines in the upper part of the scheme. The inhibitor is shown in a black ball-and-stick representation. You see the hydrogen bond network created between the inhibitor, an active-site water molecule, and those two isoleucines in the upper part.
Here the designers created a molecule with two carbonyls at exactly the right position and orientation to respect the geometry of the hydrogen bond. A hydrogen bond, as you know, only occurs if you respect the distances and the angles between the atoms that share it. And so we consider that this is at the origin of specificity. Why? Because if you want this inhibitor to bind to another protein, one that is not HIV-1 protease, you will need hydrogen bond donor functions in the binding site at exactly the right position and the right angle to make a similar hydrogen bond with the ligand. And this is extremely difficult. Imagine that you have five to ten hydrogen bond donors and acceptors in the ligand: you need to find another binding site, in another protein, with exactly those five to ten hydrogen bond donors and acceptors complementary to those of the ligand. This is the reason why those constraints in terms of angles and distances are responsible for the specificity of molecular recognition. On top of the typical non-polar and hydrogen bond interactions, we of course also have aromatic interactions. We like this example of donepezil binding to acetylcholinesterase because in a few cubic angstroms you see three different types of interaction. First, a typical pi stacking between one phenyl ring of the ligand and a tryptophan ring; this is number one here. You also have a hydrogen bond interaction between a water molecule in the active site and the same phenyl ring. And finally a cation-pi interaction between the positively charged nitrogen and a phenylalanine residue of the protein; this is interaction number three here.
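Coming back to the distance and angle constraints on hydrogen bonds described above, the following sketch shows how such a geometric test could look for a single donor, hydrogen, and acceptor. It is an illustration only: the function names and the cutoffs (donor-acceptor distance of at most 3.5 angstrom, donor-hydrogen-acceptor angle of at least 120 degrees) are assumed, commonly quoted values, not the criteria of any specific program mentioned in this lecture.

```python
import math


def angle_deg(a, b, c):
    """Angle at point b (degrees) formed by the points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def is_hydrogen_bond(donor, hydrogen, acceptor, max_distance=3.5, min_angle=120.0):
    """Return True if the geometry is compatible with a hydrogen bond.

    Coordinates are (x, y, z) tuples in angstrom. The two cutoffs are
    assumptions for illustration and differ between programs.
    """
    close_enough = math.dist(donor, acceptor) <= max_distance
    well_oriented = angle_deg(donor, hydrogen, acceptor) >= min_angle
    return close_enough and well_oriented


if __name__ == "__main__":
    donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    print(is_hydrogen_bond(donor, hydrogen, (2.9, 0.0, 0.0)))  # linear and close
    print(is_hydrogen_bond(donor, hydrogen, (1.0, 1.9, 0.0)))  # close but bent
    print(is_hydrogen_bond(donor, hydrogen, (4.5, 0.0, 0.0)))  # linear but too far
```

Because both conditions must hold for every donor and acceptor at once, a ligand with five to ten such groups imposes many simultaneous geometric constraints on the binding site, which is the specificity argument made above.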
The common point between all those interactions is explained by the fact that above and below the plane of the aromatic cycle you have electron clouds that constitute a negatively charged region, while the plane of the cycle itself is positively charged. This explains the interactions that can occur between the positively or negatively charged regions of these aromatic cycles and different components of the protein. On top of those typical interactions, other factors play a role in the strength of the association between ligands and proteins. There is the effect of water molecules that bridge the interaction between a ligand and a protein, as we have seen in one of the previous slides. There is the effect of the solvent itself, which is able to shield the interaction between two charges, as we can see in this movie. There is the conformational change upon binding. Unfortunately, most of the time the conformation that the drug adopts within the binding site of the protein is slightly different from its preferred conformation when it is alone in solution. Bringing the ligand from this preferred conformation in solution to the one it needs to adopt in the binding site costs energy, which explains why most of the time we try to find a fairly rigid ligand, to avoid this conformational penalty upon binding. And finally there is generally an entropy penalty upon binding of the drug, which we need to control. As you can see here, the number of degrees of freedom, the disorder, of the ligand is smaller when it is bound to the protein than when it is free in solution. This freezing of degrees of freedom, of this disorder, is unfavorable to binding, and again we need to cope with that.
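The entropy penalty described above can be read off the standard thermodynamic relation for binding. This is the general textbook expression, not a formula shown on the slides, and the subscript "bind" is just a label chosen here:

```latex
\Delta G_{\mathrm{bind}} = \Delta H_{\mathrm{bind}} - T\,\Delta S_{\mathrm{bind}}
```

The more negative the binding free energy, the tighter the association. When the ligand loses degrees of freedom on binding, the entropy change is negative, so the second term is positive and works against binding. In the same way, the energy needed to bring the ligand from its preferred conformation in solution to its bound conformation adds an unfavorable contribution.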
Most of the time we do so by rigidifying the small molecule, so that the freezing is less severe because the molecule is already constrained in solution. We should not forget that all the interactions you are going to see with Antoine, when you use SwissDock and UCSF Chimera, are frozen, because they come from an X-ray structure, which is a kind of single snapshot of one possible conformation of a complex between a small molecule and the protein. In the real situation, all those molecules are moving, at room temperature and atmospheric pressure, together with water molecules, and all the interactions that we will see are continuously broken and reformed. In some circumstances it is necessary to take this into account to get a better estimate of the strength of the association between the drug that we design and the targeted protein. So now I'm going to give the floor to Antoine for the next part.
Thanks. So the next topic is just a very rapid and crude overview of the main strategies to discover therapies against COVID-19. There is a crude ranking on this slide, from the least rational design involved to the most. We start with plasma, which is a speculative yet interesting strategy: administering plasma from people who suffered from a SARS-CoV-2 infection and have recovered. We expect that they have all the immunological components, the antibodies, needed to kill the virus. Then of course we have vaccines, which I think everybody agrees is the most advanced and promising approach. But we should not forget that most of the advanced vaccines today are based on new technologies, so there may still be some surprises. Drug repurposing: Vincent presented this strategy. It is excellent for the short term. Developing a vaccine or developing a drug takes very long, so if we can take a shortcut with already very advanced molecules, it could help a lot in the meantime.
But of course, even if it is successful, and it is a difficult exercise, it cannot be ideal. A molecule is designed for a specific target; if you then use it for another one, it could work, but it is not optimized for the new target. Antibodies, of course, are also very advanced. And then drug discovery, which is the topic of today: discovering and designing small molecules against viral or host targets. You have understood from Vincent that it is a very long process. It is very risky because the attrition is dramatic; even once you enter the clinic you still have only about a five percent chance of reaching the market. But if successful, given all the unknowns and doubts about the new technologies and the sustainability of vaccines, it could be extremely effective as a backup or frontline therapy, and many institutions, private and public, have initiated rational drug discovery against COVID-19. Antiviral drug discovery is a specific case. We are in a good position in the sense that we already have well-known successes of rational design, computer-aided drug design, and structure-based drug design. If we think about anti-influenza or anti-HIV agents, structural and rational design really had a critical impact on these discoveries. To design small molecules you have to find a target. It can be a viral protein. The downside is that viruses are very good at putting resistance mechanisms in place, so if you design something, it is likely that after some time the virus will become resistant, at least partly. The bright side is that it is potentially very specific to your target, to that infection and that virus, and less prone to side effects: it is really an antiviral drug. Or you use host protein targets. You will certainly have less resistance, but most probably you will have an impact on a physiological pathway, and this can lead to side effects that are often detrimental, if not a clear no-go for the therapy.
So focusing on COVID-19, if we think about the host, the human protein targets, I don't want to go through the entire list of targets that are hot today, but we can talk about the first one, angiotensin I converting enzyme 2, ACE2, because it is the protein chiefly in charge of the entry of the virus into our cells. It is very important: it is the protein that recognizes the spike protein on the surface of the virus and lets the virus enter. There is a lot of structural biology data around it, including some very large electron microscopy structures of the complex. But in terms of strategy, a full blockade of this enzyme is really questionable, in the sense that it has a very important physiological role in the lungs, in oxygenation, and a protective role in respiratory distress syndrome. So blocking it in the context of COVID is very questionable. There are other targets on the side that help the entry of the virus into the cell and that can be smarter targets, like a few proteases that prime the spike protein of the virus by cleavage and help the virus enter. You have for instance TMPRSS2, which is very interesting because some repurposing candidates are already being tested, such as camostat or bromhexine, molecules that have nothing to do with antiviral drugs, but at least in vitro we can see some effect in reducing entry. Furin is also interesting because it is specific: it does not help all viruses enter, but it seems to be quite specific to SARS-CoV-2. It may explain the higher pathogenicity of this virus, and it is an interesting starting point for design because the drug could be more specific. There are a lot of X-ray structures with peptide inhibitors that will help the design of small-molecule inhibitors. Cathepsin L is also important to prime the spike protein and help the entry of the virus. It is very interesting because cathepsins were very well known
even before COVID-19, because they are also very important in oncology: cathepsins are involved in many cancer mechanisms. So designing a drug to inhibit these proteins does not start from scratch; you can leverage a lot of knowledge, and you have a lot of structural biology data, chemical data, and so on. Some kinases are also important. I give you just the last one, PIKfyve, for which drugs were proposed for repurposing, like apilimod, which was developed for Crohn's disease and psoriasis and could potentially be repurposed. All right, this was for the human targets. If we go to the main SARS-CoV-2 targets, again not the entire list, there is the spike (S) protein. This is the main protein on the surface of the virus, which is recognized by the cell and primed by a few proteases to allow internalization into the cell. A lot is known about which receptor-binding motif is doing what and what is specific to SARS-CoV-2. You have a lot of X-ray and cryo-EM data, and you already have some functional antibodies, peptides, and small molecules that reduce the binding to ACE2. The main protease was maybe one of the first targets evaluated. You have a lot of X-ray data because it was already studied in the previous coronavirus outbreaks, namely MERS and the first SARS. So inhibitors have already been tested, and a lot of drug design and repurposing projects have tried to translate from the previous MERS and SARS to the agent responsible for the new COVID-19. And finally, maybe, the RNA-dependent RNA polymerase, also a very early discovery, well conserved among RNA viruses. This is why you have very famous repurposed drugs like remdesivir, which was in late development against Ebola virus and has been one of the first very effective drugs tested in the clinic against the coronavirus. For the purpose of today, for the examples, the demo, the exercises, and the discussion, we decided to focus on the main protease, the 3CLpro. There has been a lot of
repurposing because it was discovered at a very early stage of the outbreak. Most of the repurposed molecules are in fact already antiviral protease inhibitors. If you look at the four molecules on the left of the red bar, you can see that boceprevir and telaprevir are already protease inhibitors, but they are used in another disease, hepatitis C virus infection. Lopinavir and ritonavir, the two below, are HIV protease inhibitors, and they have been evaluated quite far in attempts to repurpose them to inhibit the SARS-CoV-2 main protease. For research, on the right-hand side of this slide, you have two molecules that are surely more difficult to develop but that may be more interesting in the long term. GC376, at the top, is the repurposing of a veterinary drug: it is used in cats for an infection by another coronavirus, and it is a reversible inhibitor prodrug. It is very difficult to monitor all the clinical trials, but it seems that it will very soon enter serious clinical trials. The last one is carmofur, which is very interesting. It is a prodrug that is cleaved here; for treating cancer, it is this uracil analog, fluorouracil, which is the anticancer drug, but the rest of the molecule was found, by chance, to bind covalently to the main protease of SARS-CoV-2. So you have one part of the molecule which is an anticancer agent and another which could potentially help in COVID-19. Repurposing, of course, but also drug design: there is a lot of structural biology data on this main protease, which is responsible for RNA replication and in fact for many different pathways that control the response of the host cell. Among the 188 crystal structures, we propose to analyze one specific entry in the Protein Data Bank, 5MG1. It is a co-crystal with a non-covalent ligand, so the crystal is made by soaking the main protease with a
small molecule which binds non-covalently to it, and it was possible to make a crystal of the complex and resolve the coordinates of all the atoms with X-rays. This is the information that will be used to understand and do drug design. We will simply use some portals, some web pages, to access the Protein Data Bank, like the RCSB portal, but there are others, like PDB Europe for instance, where you can retrieve quite a lot of information. Here in the lab, for research but also for teaching and presentations, we use UCSF Chimera, which is free software for visualization, calculation, and parametrization for modeling and drug design. So we can go directly to Chimera. This is something you can install on a Mac, on a PC, on a Linux machine, whatever, and you can import the coordinates of the co-crystal; you only need to know the code. Sorry, yes: if you go to the PDB home page, you can type whatever you want. If you know the code it's better, but you can type coronavirus, and you have a lot of possibilities to browse and to search. You have a lot of information on the quality of the crystals, but also on the ligands, as you can see here. If you know the code, you can directly import the coordinates in Chimera, and you already have a good, fairly advanced view of what is happening. The protein is shown in cartoon, with the secondary structure depicted as helices, beta sheets, and so on, and you have all atoms of the ligands, the adjuvants, the small molecules, and the side chains of the amino acids in contact with them. One thing that makes this even more understandable is to generate the surface. Here we have the surface of the protein with the ligand inside. We can try to make it more visible. No, it's not possible here, sorry, it's not working perfectly, but I think you can see the ligand inside this little cavity here. We can select this ligand; it is highlighted here like this. We can change
the color, maybe like this. Right, here it's very visible. So here you can see the lock-and-key concept: a small cavity, a ligand inside. And you want to know, for example, what the H-bonds are. As Vincent has shown you, one of the major ways for a protein to recognize a small molecule is by H-bonds, so you can calculate and show them very easily in Chimera, like this. Yeah, exactly, you can apply, and here you have these green lines that show you the atoms of the ligand which are involved in H-bonds. If we hide the surface, like this... it's not working. Yeah, you have a view showing that the phenol here is making an important H-bond network with a serine, with a histidine and with the backbone of the molecule, and a carbonyl and an NH are also making some important interactions with the backbone here, which drive the molecular recognition. Another thing that is important to note is that here you have another carbonyl which is making interactions with water molecules. By default, you will see that normally we get rid of crystallographic water. This is a normal process, this is a first approximation, but if necessary you can of course add the crystallographic water. Antoine, is it possible to show this full screen? It's very difficult to see, it's too small. You mean... is it better? That's clear, thank you, that's better. Because full screen is not possible on the sharing here. OK, so again, without the surface, which was showing in general the lock and key, so the shape recognition, the van der Waals interactions. As Vincent has introduced to you, it's not only a question of shape, you also have all these specific interactions. Among them you have these H-bonds, and if I zoom in very much, for instance, here you have a phenol which is involved in a network of interactions around here, which explains the binding. What I was also talking about is these two H-bonds with the backbone, and the last thing I was talking about was here: you can see that, for whatever reason, the crystallographer left
two water molecules, or placed two water molecules, here, and they are involved in H-bonds here. But for normal docking, normal modeling, we normally get rid of this water, and this is one of the approximations we make. All right, so this could be a starting point for a project: either you have generated a co-crystal yourself with something that is an inhibitor, or you have retrieved this information, like we have done. Very often, when you are very early in a project, and mainly in academia, you don't have the chemical capacities or enough money to make all the molecules you want. So very often, in our context, we try to get further into the analysis, and this is exactly what we're going to do with the ligand here. We're going to see if this ligand has some properties that we can already think of optimizing, instead of going right away with this ligand, and this is the topic of the next slides. Right, as Vincent mentioned at the beginning, to develop a drug it's not only important to make a potent molecule; the molecule has to have all the properties to have the right fate in the organism, so to reach its target in the body. To give you a little bit of context, this is a simplified view of the journey of a drug in the body. If this is an oral drug, the active ingredient has to be liberated and solubilized in your gastrointestinal tract, where it can be absorbed into the bloodstream. As we have evolved to protect ourselves against foreign molecules, part will be eliminated by metabolism or physical excretion, but the rest will be distributed into the whole body, and the small molecule will eventually meet its therapeutic target, mainly a protein, and the binding of those two will hopefully generate a therapeutic effect. This is all what Vincent has talked about, so the recognition of the small molecule by the protein. But equally important is this red box, which we call pharmacokinetics. It's all about the properties to get well absorbed, to get well distributed, to be not
too much eliminated. Typically we are dealing with this earlier and earlier in projects, to focus on the molecules that have the highest probability of having the right properties, of being well absorbed, well distributed and so on and so forth, and not only trying to design something that is very potent. The percentage of active ingredient which reaches the bloodstream is the bioavailability, and this is monitored very early in projects. What is the clinical impact of ADME, so absorption, distribution, metabolism, excretion? Of course, if you do a test in vitro, the concentration of the drug at the target will be the dose, so you have this beautiful sigmoid here of the effect versus the dose. But in the whole organism, not the entire drug will be absorbed, not all will be well distributed, part will be metabolized or excreted, so at the target the concentration of the drug is no longer the dose and the curve is much more complicated. This has to be taken into account very early when we develop and discover drugs. Vincent told you about the incredible attrition rate of developing a drug: you have 10 years to get here in the preclinical phase, then you have the very costly clinical phases, and only three to five percent of what enters here will become an actual drug. If we look at the main reasons for failure at phases two and three of clinical development, so where 10 or 15 years and millions of dollars have already been spent, it's mainly toxicity. But still you have five percent, which maybe seems not too much, but it's a pity that after so long five percent of the drugs are dropped only because they are not able to reach the target in the body. Hence the estimation of ADME at earlier steps in drug discovery, very early. And because at this step of the project you have no access to physical samples of your molecules, the chemists have generated only one milligram just to test if it's active, this has fostered the development of computer models to
predict ADME from chemical structures. Computer models have to be robust to give a reliable prediction, they have to be fast because you treat a large number of structures, and they have to be descriptive enough and easy to interpret in chemical terms, because the main objective is to modify the molecule to target specific properties. Among our tools we have developed SwissADME, which gathers a collection of tools regarding ADME and physicochemical properties, a little bit of toxicity and other things about the molecules, to help in such medicinal chemistry projects to focus on the chemicals that have some chance to become your drug. I will not go into every detail. If you submit one or multiple molecules to SwissADME you have two outputs: you have this one-panel-per-molecule output and you have a graphical output, so if you have multiple molecules as input you have multiple dots here. We will discuss four things: this bioavailability radar, the BOILED-Egg, a machine learning classification model for P-gp substrates, and the filtering of some problematic fragments. So we calculate a lot of physicochemical descriptors, which can be the number of atoms, small descriptors, to generate this kind of plot, the radars, which show the range of physicochemical properties optimal for the drug to be bioavailable, to be drug-like. We calculate the properties, and the radar of the molecule has to fall into this pink zone to be considered as bioavailable. One axis can be the size: we know a range of molecular weight outside which, if it's too big or too small, the molecule has almost no chance to become a drug. Another is the polarity, so this axis is polarity. We calculate a parameter which is the TPSA, the topological polar surface area. What is it? If you take paracetamol, you generate the molecular surface, and it's the surface dedicated to the polar atoms, so more or less the non-carbon atoms and the hydrogens not linked to a carbon, and you have a surface. Nowadays we do not generate the actual surface, but we
have a very fast fragment-based approach to estimate the atomic contributions to the TPSA, and we have guidelines: for instance, for a good gastrointestinal absorption you need to have less than 140 square angstroms. Here, in this example, this molecule is at 77 square angstroms, which is right in the middle. For these specific properties everything is OK, it should be a bioavailable compound. These are only crude guidelines. In the group we also do research and development, and we analyzed the different properties further. For instance, we focused first on passive absorption. We built a data set of well-absorbed and non-well-absorbed molecules, we used multiple descriptors, and we generated, for instance, heat maps like this one. In this lipophilicity versus polarity map we can see that there is a high concentration of well-absorbed molecules, defined experimentally, in a zone here, and on the other side you have a region of the space populated with molecules that are not well absorbed. So our idea was: there is a clear signal, let's try to encompass this zone and build a predictive model to classify molecules as well absorbed or not in this very simple physicochemical space. To make a long story short, we optimized an ellipse; we thought it is the best shape to encompass the well-absorbed zone of the space. We went back to the basic geometry books to get what an ellipse is, and you have five parameters to define. We ran an optimization process with Monte Carlo: generate random ellipses, calculate how well each one separates the well-absorbed from the non-well-absorbed, iteratively make some optimization with some random numbers, and at the end of the day we ended up with 10 million ellipses to evaluate for their capacity to encompass most well-absorbed molecules and exclude non-well-absorbed molecules. We calculated the Matthews correlation coefficient, which goes from minus one to plus one for a perfect prediction, and out of these 10 million ellipses this is the best one, so the one that
encompasses the most experimentally defined well-absorbed molecules and excludes the poorly absorbed molecules. We said: OK, we are able to do so for absorption, why not deal with another very important biological membrane, which is the BBB, the blood-brain barrier, a shield that protects your brain. It's a physical barrier, as you have tight junctions to prevent paracellular penetration, but it's also a biochemical barrier: for instance you have a pump, which is the P-gp, the P-glycoprotein, that pumps substrates out of the brain. Even so, passive diffusion through the BBB is really considered the major route for drugs to access the brain from the bloodstream. So why is it so important? Of course, if you want to develop a drug whose target is in the central nervous system, like a drug against Alzheimer's or a hypnotic drug, you have to optimize passive diffusion through the BBB to access the brain. On the contrary, if your drug, or whatever molecule, is supposed to only tackle peripheral targets, it's better not to access the brain, because then you are on the safer side regarding unwanted effects. So we performed the same thing: we gathered and cleaned data for experimentally defined BBB-permeant and BBB non-permeant molecules, and we generated a heat map in the same referential. Again there is a very clear signal where the molecules that penetrate into the brain are located. We optimized a different ellipse, and among the 20 million ellipses tested, this is the one that best separates the BBB permeants inside from the BBB non-permeants outside. So we have two very simple classification models, but the good thing is that they are in exactly the same referential, so you can merge them. We chose a very fancy acronym, the BOILED-Egg, and also the right colors, with the two ellipses. By plotting whatever input molecule into this graph, you are able to predict at the same time the gastrointestinal absorption, which is absolutely important, and the access to the brain. We
implemented that in our SwissADME website, and you can see that for each molecule you have a dot in this plot telling you: if I am in the egg, I am well absorbed; if I am in the yolk, I am well absorbed and passively accessing the brain. You can see that these dots are colored according to P-gp plus or P-gp minus. This comes from another classification model, which predicts if the molecule is a substrate of this active pump, the P-glycoprotein, which pumps molecules out of the brain or back to the lumen, and gives the color according to this prediction: if it's blue, it's a substrate, it's pumped out; if it's red, it's not a substrate, it's predicted not to be pumped out. To generate this model it was not possible to be as simple as with these two parameters; we had to use a machine learning pattern recognition algorithm, which is the SVM, the support vector machine. It's a classification model, so you have two groups, P-gp substrates and P-gp non-substrates, and you try to separate them as much as possible with descriptors in a hyperspace. The prediction works the same way: once you have the hyperspace and the two groups, you take a new object, a new molecule, and if it's close in the space to the group of molecules that are P-gp substrates, the prediction is that it's a P-gp substrate, and on the contrary a P-gp non-substrate if it's close to the other group. As I said, it's really separating two groups as much as possible, P-gp substrates and P-gp non-substrates. Here it's only two descriptors, so it's a plane, but if you have multiple descriptors, like 20 or 30 descriptors, it's a hyperplane. There is a hyperparameter, which is the margin with which you separate these two groups. It has also been seen that it's better not to be linear when separating the two groups with so many descriptors, so many axes: it's very often much more efficient to transform the coordinates with a Gaussian function, and this is another parameter that you have to optimize. So to make this machine learning SVM classification model
predicting P-gp substrates, again we have to build a training set and a test set from the literature, with experimental data. The training set is to build the model, the test set is to test the model with molecules that haven't been used to build it. We calculate as many descriptors as possible, being descriptive enough and orthogonal, and then we use a Python library, which is LIBSVM, with a Gaussian kernel to let the machine learn to separate the two groups, with all the possible descriptors in all the possible hyperspaces. The two groups should be separated as much as possible, so it's a grid search, an exhaustive search of the best combination of all the descriptors and the two hyperparameters, the margin and the Gaussian. At the end of the day you have generated all the possibilities, and you take the one that has the best classification accuracy in cross-validation and with the external test set. It's very important to know how well you can extrapolate with an external test set. We always fall into the 75 to 90 percent classification accuracy range, which is enough in this context of screening databases. The last thing I want to talk about in SwissADME before using it is the recognition of problematic fragments. This is very simple, it's only pattern recognition. It is known, it's knowledge-based, that in molecules you have fragments that are problematic: they can be toxic, they can be reactive, they can make the molecule unstable, they can generate aggregates and so false positives in vitro, they can be dyes, so if the readout of your assay is spectroscopic it's problematic, and so on and so forth. Here we simply implement this knowledge base: does my molecule have this kind of problematic fragment, which can be an aggregator or generate instability in the molecule? So what I propose is just to use SwissADME, it's very quick, and then maybe we can go for a few questions, as we are more or less in the middle of the thing. SwissADME, like all our web tools, is an independent website with an independent URL, so
you type swissadme.ch, and without any login, without anything else to do, you access the input page. In most of our websites you have the possibility to draw a molecule and then generate this one-line notation, which is called a SMILES, which is what is truly input to the machinery. You have the possibility to import a molecule if your molecule is a well-known molecule, so if you type aspirin, again, you can transfer it, or if you have access to the SMILES you can directly paste the SMILES. If we go back to our example of the small molecule co-crystallized in the main protease of SARS-CoV-2, you can see that here in the PDB portal you have a description of your ligand, and you can have even more description of your ligand on this page: you have access to other dedicated databases, a name and so on and so forth, and you have the SMILES. So you can simply copy-paste the SMILES from here to here, you can type a space and give a name, and with that you have one molecule to submit to SwissADME. It should take less than 10 seconds, and this is the panel we were talking about: you have a depiction of your molecule, you have a canonical SMILES generated, you have a lot of different descriptors and predictions, and you have the bioavailability radar. So again, on these six axes of six properties, you need to fall into this pink zone for a first estimation of the drug-likeness, the bioavailability, of your molecule. Everything seems OK, everything is in the pink zone. Everything seems OK about solubility, no problem. If we look at the BOILED-Egg, this is the prediction: this molecule was mapped into this plot here, this is the dot, and you can see that it's inside the egg, so it's predicted as passively well absorbed, but it's outside the yolk, so predicted not to access the brain, which in our context is, we think, a good thing, because, OK, the understanding of the physiopathology of COVID-19 can evolve, but as far as we know today the target is not in the CNS, it's the lungs, the intestines and this kind of thing, so it can be a
good thing not to access the brain. Right, everything is OK except one alert, as you can see here. As I told you about this recognition of problematic fragments, you can see here that this triple bond here seems to be problematic, and actually it is: it can make your molecule unstable. So instead of testing this molecule right away, it can be interesting to first patch this deficiency. Instead of starting with this molecule that we know already has a deficiency, we can already make some steps forward in silico by trying to select a molecule that is like this one, but maybe without this problematic fragment that renders the molecule unstable. How is it done? If you are working in a chemistry department, if you have a lot of chemical resources, if you are in industry, you can start to understand what the role of each part of your molecule is in the binding to the protein, and try to optimize already here, and design, and convince the chemists to make the molecule you designed. Here, if I go back to the model panel, that means designing something that is accepted very well here in this sub-pocket. Looking at the interactions, you see that there is no specific interaction, it's mainly van der Waals, shape recognition, so you can design something instead of this to fill up this little cavity. But very often you don't have so much chemical capacity to try to generate hypotheses at the very beginning of a project, so you will be more pragmatic and see what is obtainable, what I can buy, what I can get, what I can purchase that is similar to this molecule. Being extremely pragmatic, instead of launching a big chemistry program, maybe you can buy something, get it, run a test on a few similar molecules, and have a first start for the project. This will be performed by virtual screening, and this will be shown by Vincent in a little bit. So up to now we have seen many different approaches that are related to structure-based design, and we have visualized an interaction
between the ligand and the protein already. SwissADME is compiling different so-called ligand-based approaches, but you will see that we can actually systematize this kind of approach a bit. So basically what you see here is the interior of the binding site of the kinase B-RAF, in which we have superimposed different inhibitors. Obviously the inhibitors are not seeing each other; we have just superimposed the binding modes of several of them to compare them all together in the binding site. What we see, when we extract from the binding site the different conformations of the small molecules, so the bioactive forms binding to the protein, is that there is a kind of common shape between those molecules, which is a kind of wave. I don't know if you see my pointer, but you have a wave going from the left to the right, which you can see for instance in the upper left compound. You will find the same shape in the other ones, with maybe an extension on the lower right corner of the molecule, but all the molecules share this global shape. So basically they need to have a shape which is complementary to the binding site of the protein. On top of that, we also find different functions that are common to all of them, like hydrogen bond donors and acceptors in the middle of the molecules, in different chemical forms, and we see why in the scheme of the binding mode of the ligand in the protein binding site. If you look at the upper right corner, you will see that you have many nitrogen atoms in the same place in different chemical contexts, but they are all there just because they need to be complementary to the hydrogen bond donors and acceptors of the protein itself. So again, a necessary condition for the molecules to dock into the protein translates into common features in the molecules, and at the end of the day it means that the fact that all those molecules are able to bind to the same protein is implicitly coded into the
chemical structures, and there are different ways to try to extract this information when you have a large number of ligands binding to the same protein. This knowledge can be used in return to design new approaches, which means that for drug design we have the choice between structure-based approaches, here on the left, and ligand-based approaches, on the right, to design the small molecule. One of the main uses of ligand-based approaches is virtual screening. It can be done structure-based, by docking: you will dock a molecule a bit further on, and you will see that you could potentially do that for many thousands of molecules. But the fastest way is to use the ligand-based approach, which is based on the similarity principle, and the assumption behind the scenes here is that if two molecules are very similar, they are prone to be active on the same target. So basically, if you hold that as true for the moment, we will demonstrate it a bit later, and if you have the chance to have in your hands a molecule which is already active against a given target, what you can do is screen a library of molecules with unknown activities. This library can be, for instance, molecules that are commercially available from a given vendor, and screening means that you look in this library of commercially available molecules for those that are very similar to the active molecule in your hands. Once you have listed all those molecules, you can buy them and test them, hoping that, due to the similarity principle, they are more likely than the others to be active on the target of interest. To perform that, we need to know at least one molecule active on the target of interest. It is not necessarily a drug-like molecule: it can be a substrate, so something which is not stable in the enzyme, it can be a natural product, it can be a molecule coming from the competition. As long as you have something, you can start the process. And also, if you have the chance at
several molecules that are available, then of course you can run one virtual screening starting from each ligand and cross the different results to get a consensus screening. So if, among the commercially available molecules, you find one that is similar to many different known active molecules, of course buy this one and test it in priority. To make the similarity principle work, we obviously need to define the similarity between molecules, to calculate the distance between them, and we have different ways to do that. The first one is what we call two-dimensional similarity, or chemical structure similarity, and it relies on the similarity between the chemical structures. Here you see that these two kinase inhibitors are based on the same scaffold, with more or less similar side chains, and even if you have not done chemistry, you see immediately that those two molecules are similar. The problem, of course, is that if you compare hundreds of thousands of pairs of molecules, you cannot do it visually like that. On top of that, we need to be able to compare the molecules, which means that if I add a third molecule to this list, is this third molecule more similar to erlotinib or to gefitinib? A human being would have difficulties answering this question, so we need some approaches to do that automatically for us, and extremely rapidly if possible. So this is the first definition of similarity. The second one is based on the three-dimensional shape of the small molecules. Here we consider that molecules are similar if, whatever their chemical structures, they have more or less the same shape and distribute the charges and the non-polar fragments in the same positions in space, like those two ABL inhibitors here. So let's have a look at the algorithm behind this. A human being can do that for one pair of molecules quite easily, but we need to do it for a huge number of pairs, and we need to be able to quantify the
distance, or similarity, between the molecules, so we need an algorithm to do that, and this is done through fingerprinting. Basically, if you have one chemical structure, for instance, we are going to construct a two-dimensional fingerprint here. This fingerprint will be a vector of 0s and 1s. Why? Because computers are able to work with binary vectors extremely fast; they have been conceived for that and they are extremely efficient in doing so. What we do is extract different chemical features from this molecule, and in our case a chemical feature is constituted by seven consecutive atoms linked together. When such a feature is present in the molecule, you put a 1 at a given position of the vector; if the feature is not present, you leave it at 0. What you have done for this first molecule A you can do for a second molecule B: again you extract all the possible features, some will be common with A, some will be different, so at the end of the day you will have these two different vectors, and now comparing the chemical structures boils down to comparing those two binary vectors, which is much easier for a computer. So basically we can calculate the Tanimoto coefficient, also called the Jaccard index: you count the number of times you find a 1 at exactly the same position in both vectors, and you divide this number by the number of times you find a 1 at a given position in at least one of the vectors. If we just take the example of the 0s and 1s that you see right now on the screen, we have three times a 1 at the same position in both vectors, and we have four positions with a 1 in at least one vector, so we divide three by four and it gives a similarity of 75% between the two molecules. You see, you can quantify the similarity this way, and this Tanimoto coefficient ranges from zero, for totally different molecules with not a single fragment in common, to one, for molecules that are totally identical and share all the chemical
features. Since this approach is based on comparing binary vectors, you can really compare one million pairs of molecules per second, which allows us to perform virtual screening. So how does it work? To get an idea of the efficiency of the approach, you can see here three molecules, and you can guess that they have a common scaffold and chemical features. Indeed, if we calculate the similarity between them using the approach that you have seen previously, so taking into account all the features this time and not just an extract, you have a similarity which is above 40%, going up to 76% between diclofenac and lumiracoxib. On the other side you have a second family. Again, you guess that those molecules have the same scaffold and are quite similar to each other, and indeed you can quantify the similarity to be around 60% in this case. If we now compare molecules between the two families, you see that the similarity never goes above 28%; it's around 17 to 25% on average. So basically the three molecules on the left are closer to each other than to the three molecules on the right, and those molecules are similar to each other within each family. You see that thanks to this you can quantify the similarity between molecules, you can cluster them, grouping together molecules that are very similar to each other, and you see the basis of virtual screening, where you try to find molecules that could share the same activity and that you expect to be chemically similar. But you can also use this information to predict the targets, because on the left here you have three molecules that are similar to each other and are all cyclooxygenase inhibitors, and on the right you have three molecules that are all EGFR inhibitors, so kinase inhibitors. So you can imagine that if I add a molecule on the screen and I see that this new molecule is very similar to the molecules on the left and quite different from the molecules on the right, this new
molecule has much better chances to be a cyclooxygenase inhibitor, and by some machine learning we can actually train a model to make such predictions when you have a collection of molecules for which you already know the targets. This is the purpose of SwissTargetPrediction, actually. But for the moment we are more in screening, and as I told you, there is a second definition of similarity, which is based on the three-dimensional shape of the molecule this time. Here we do not consider the chemical structure; we just say that the molecules are similar if they have the same shape. This is done, for instance, by superimposing the two molecules and estimating the similarity by the volume overlap. Here you can treat about 20 to 40 pairs of molecules per second, which is not huge: much better than docking, but not huge. There are alternative approaches that are much faster, where you transform the shape of the molecule into a vector again, and by transforming this 3D shape into a one-dimensional vector you dramatically increase the speed of the process, but this is out of scope for today. What we would like to introduce, before I leave the floor to Antoine for additional exercises, is SwissSimilarity. So basically you have seen the principle of screening, it's very easy: we look in a library of molecules for those that are very similar to a user compound, and if you find them, you buy them and you test them. So virtual screening means finding the most similar molecules. The principle is easy, there is just a quite large technical burden to using this approach, because we receive molecules in batches and they are not prepared: potentially we need to transform them into three-dimensional structures, we need to find the most appropriate protonation state and tautomer for each molecule, and we need to clean them, because in the databases molecules may come as a salt, with something which is not that interesting, which is not the drug-like
molecule itself. So there is a huge amount of cleaning to be done in preparation for the screening itself, so as to transform all those molecules into the right format for the screening tools, and all these technical burdens actually prevent medicinal chemists from using that type of approach. So what we did at the SIB was to prepare different collections of molecules, so bioactive molecules, drugs, commercially available molecules, for different screening tools, so that now there is no such technical burden anymore. You just draw the molecule of interest in the sketcher, you select an approach for calculating the similarities between molecules, you select a library, and you just submit the calculation to obtain a list of similar molecules at the end. So I don't know if I will present docking now. I'm sorry, because this is part of the exercise; I thought that Antoine would take over just after this slide, but I'm just going to continue and go back to structure-based drug design. We have seen the principle of ligand-based drug design; now you will see how we can perform calculations with structure-based approaches. Basically, docking means trying to find the most probable position of a small molecule in the binding site of a protein. This is done using a typical algorithm decomposed into two pieces: a sampling engine, which tries to find the most probable positions of the small molecule in the binding site of the protein, you see it in action in this movie here, and a scoring function, which tries to estimate the strength of the interaction of the molecule in those different geometric positions in the binding site, so the interaction between the ligand and the protein. The idea of this algorithm is that the binding mode with the best score corresponds most probably to the binding mode that you could determine experimentally. So this is the basic principle of docking, and this opens many doors, notably
structure-based ligand design. Let's imagine that you have the structure of a targeted protein and you have a virtual molecule, or something that you already know is active experimentally. You can dock the small molecule in the protein active site if you have no experimental structure of the complex, and this is going to give you an estimate of the strength of the interaction between the ligand and the protein, but also a lot of information about what can be done to this small molecule to enhance those interactions. So you can create another virtual compound, you dock it into the protein, and you get an idea of how it's going to bind to the protein thanks to this docking software. Again you have ideas to enhance the molecule, you create a new virtual compound, you dock it, you get ideas about how to modify it, you create another compound, and so on and so forth, in a kind of cycle. At the end of the day you can obtain a molecule that is quite satisfying for you: it fits very well in the protein binding site and seems to make plenty of very nice interactions, so that if you are satisfied enough you can synthesize this molecule physically and test it experimentally to see whether it works. So you see that docking allows you to do this exercise and to perform so-called structure-based ligand design. As I told you, the typical algorithm of a docking software works in different steps. First you create different positions of the small molecule in the protein binding site, for instance by translating it, rotating it rigidly, or rotating different dihedral angles. This way you create a cloud of different possibilities that are possible from a geometric point of view, but not necessarily from an interaction point of view. So you need to score each of those possibilities to see which one makes the best interactions with the target, and this position is expected to be very close to the experimental binding mode.
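The two-step logic just described, generating geometrically possible poses and then scoring them, can be sketched in a few lines. This is only a toy 2D illustration, not the algorithm of any real docking program: the function names (`transform`, `toy_score`, `dock`), the distance thresholds, and the scoring terms are all assumptions made for the example.

```python
import math
import random


def transform(ligand, angle, shift):
    """Rigidly rotate a 2D ligand by `angle` (radians), then translate it by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in ligand]


def toy_score(pose, protein, contact=1.5, clash=0.8):
    """Toy scoring function (lower is better). Thresholds are illustrative assumptions."""
    score = 0.0
    for lx, ly in pose:
        for px, py in protein:
            d = math.hypot(lx - px, ly - py)
            if d < clash:
                score += 10.0   # atoms overlap: strongly unfavorable
            elif d < contact:
                score -= 1.0    # atoms in contact: favorable
    return score


def dock(ligand, protein, n_poses=2000, box=3.0, seed=0):
    """Sample random rigid poses in a square box and return the best-scoring one."""
    rng = random.Random(seed)
    best_pose, best_score = None, float("inf")
    for _ in range(n_poses):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        shift = (rng.uniform(-box, box), rng.uniform(-box, box))
        pose = transform(ligand, angle, shift)   # sampling step
        score = toy_score(pose, protein)         # scoring step
        if score < best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score


if __name__ == "__main__":
    protein = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]   # made-up "protein" atoms
    ligand = [(0.0, 0.0), (1.0, 0.0)]                # made-up two-atom ligand
    pose, score = dock(ligand, protein)
    print(pose, score)
```

A real docking program also samples internal dihedral angles and uses a far richer scoring function; the sketch only shows how the sampling and scoring steps fit together.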
So there are several ways to do that, with different levels of approximation. You can have both the ligand and the protein conformationally rigid; this was used in the past, when computational power was limited. Then we have the situation of more or less today, where we consider the protein as rigid and the ligand as flexible, so the ligand really adapts its conformation to the shape of the binding site. And finally you can have, if necessary, both the protein and the ligand conformationally flexible. This is extremely costly computationally, but it can be done for a couple of molecules if you consider that the induced fit of the protein might be very important. So you have a huge number of docking software packages (you can guess that it's very important in the field), and you have different approaches based on different sampling or posing algorithms, on different scoring functions, and on different ways to handle the protein flexibility. I will just take two examples: AutoDock Vina, which is a typical example of a software used in academia (it's free and open source, actually), and EADock DSS, which is the engine behind SwissDock. For the sampling algorithm you have many different possibilities: systematic search, stochastic search, deterministic ones. I will just take one example of a stochastic search, the algorithm of AutoDock Vina basically, and one example of a systematic search, which is the one behind SwissDock. So AutoDock Vina is mainly, but not entirely, based upon a stochastic search, which is the Monte Carlo algorithm. The idea here is that for the docking we just start from a random position of the ligand in the protein binding site, so random, but something which is feasible, I would say.
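As a rough illustration of the Monte Carlo search being introduced here, the following is a minimal sketch of a Metropolis-style loop. It is written under simplifying assumptions and is not AutoDock Vina's implementation: the `energy` callable, the temperature-like parameter `kT`, and the step size are hypothetical, and the degrees of freedom are just a list of numbers standing in for translations, rotations, and dihedral angles.

```python
import math
import random


def metropolis_search(energy, start, n_steps=5000, max_step=0.5, kT=0.6, seed=0):
    """Random search with the Metropolis acceptance rule; returns the best state seen.

    `energy` maps a list of degrees of freedom to a score (lower is better).
    """
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(n_steps):
        trial = list(current)
        i = rng.randrange(len(trial))                  # pick one degree of freedom at random
        trial[i] += rng.uniform(-max_step, max_step)   # apply a random modification to it
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Always accept an improvement; accept a worse state with probability exp(-delta / kT).
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


if __name__ == "__main__":
    # Made-up energy landscape: a simple bowl with its minimum at the origin.
    bowl = lambda dof: sum(v * v for v in dof)
    print(metropolis_search(bowl, [3.0, -2.0]))
```

Occasionally accepting a worse state is what lets the search leave a local minimum; with a very small `kT` the loop behaves almost like a pure downhill search.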
Then you randomly select one dihedral angle, one complete rotation of the molecule, or one translation of the molecule, and you apply a random modification to the molecule along this degree of freedom that you have selected. So for instance, in this case we have selected one dihedral angle, we have made a random rotation around this dihedral angle, and we have obtained a new possible binding mode. If this new binding mode has better interactions than the previous one, we keep it and we apply yet another random modification to it. If it's not the case, then we apply the Metropolis criterion, which means that we will potentially keep the position even if it is worse, under given criteria. We do this because this way we can create new poses that are maybe worse than the previous one, but that open new doors. So this way you can jump from one possible binding site into another, thanks to the fact that you are going to accept, from time to time, a binding mode which is worse. So you can do that hundreds of thousands of times, creating a new pose, scoring it, creating a new pose, scoring it, and so on and so forth, and at the end of the day, by ranking all those poses, you end up estimating what could be the native complex. So this is one example of a sampling engine. Another one, which is called systematic search (and we will see why), is the one behind EADock DSS. The first step in this approach is to find the binding pockets. This is done by an algorithm that we have developed. You see here the surface of the HIV-1 protease; if you cut it, you see the interior of this protease, and you have here a splendid binding site, which is the active site of this enzyme. Thanks to our algorithm we can map different points within this pocket, and we actually map all the pockets with this approach. So this way we can find all the potential binding pockets on the surface of the protein, and we have an
idea of the size and of the shape of each pocket thanks to those different pink points that you see here. Once you have done that, the next step is to filter the binding pockets according to the size of the ligand. Obviously, if the ligand is much larger than the binding pocket, it's not going to be able to enter this pocket, and you can discard this pocket from the search and concentrate on another pocket where it has a good chance of being able to bind. So thanks to this, SwissDock can perform what we call blind docking, in the sense that if you have no idea where the molecule could bind, you don't have to define a search space limited to a box or a sphere around a given pocket, for instance. You can let the algorithm go: it's going to select the most probable binding pockets for you and dock the ligand in those pockets for you automatically. With Antoine you will see that you can accelerate the process even more when you are sure about the binding site. So once we have selected the binding pockets, we start to generate the binding modes themselves and to score them. To create the binding modes, we start from the small molecule and identify the central part of the compound that is going to be docked. Once you have identified the most central rigid fragment, which you see here in red, we find all the other dihedral angles in the molecule and we order them as a function of the branch to which they belong in the molecule and of the distance from the dihedral angle to the central fragment. So you see here, for instance, the different dihedral angles of the molecule as a function of the branch to which they belong and the distance from the central part. Once this is done, we remove everything that is flexible, we only keep the central rigid fragment, and we position it in the binding pockets that we have determined in the previous step. So we will actually use several of
the pink points to position the fragment, so that we take into account the fact that the pocket might be large and the center of the ligand is not necessarily at the center of the pocket. Once we have positioned the central fragment, we need to reconstruct the rest of the molecule, everything that was flexible and that we removed in the previous step, and we do that in a systematic way; this is why this is considered a systematic search. Here, systematic search doesn't mean that we consider all the possible geometries; this is impossible. We are systematic in the sense that when we reconstruct the molecule we try to consider all the dihedral angles in a systematic way. So basically, let's imagine that we have already reconstructed the left part of the molecule, which is in black here, and we would now like to reconstruct the right part of the molecule, starting from the first two atoms, which are colored in red here: one oxygen atom and one nitrogen. If we consider all the dihedral angles that these two atoms can adopt, you will see that they can occupy the circles that you see on the screen, and the idea of SwissDock is to consider all the possible positions of these two atoms on the circles. Once this is done for those two atoms, and you have found the best position for the oxygen and the nitrogen this way, you can reconstruct the next two atoms, those two carbon atoms that are in red here. We are going to turn those two carbon atoms around the corresponding dihedral angle, trying all the possible angles until we find the best one. Then you will have the best position for those two carbons, you will be able to position the next ones, and so on and so forth until you have reconstructed the whole molecule. So at each step you try to find the best possible position of the atoms before reconstructing the others. The next step will be to score those different positions to get rid of the worst ones. For the remaining ones, we
cluster them together. Basically, very often we are going to reconstruct the molecule in a very similar way several times, just because you start nearly from the same starting point and you have the same constraints imposed by the shape of the binding site, and there are not millions of ways to reconstruct the molecule in the binding site. So you generate very close answers several times, you group them together, and at the end of the day you clean those clusters to keep only one or two representatives, for instance. Finally, once you have done that, you score those clusters using a physics-based scoring function that takes into account a solvation model. So, contrary to the scoring function used by Vina, for instance, here we try to stick to the physics of the interaction between the ligand and the protein in order to get a better description of the interaction between the two partners. Indeed, the second part of the algorithm of a docking software, after the sampling engine itself that we have seen, is the scoring function. Here I have put a lot of information in grey because it is outside the scope of today, but we can say that a scoring function has two purposes in docking. The first one is to find out which binding mode, among all the geometrical positions, is most likely the one that corresponds to the experimental binding mode. Second, we also need the scoring function, if possible, to be able to compare different ligands with each other. So if you dock 10 molecules, the scoring function should rank them appropriately and tell you which one is actually the most likely to have the highest affinity for the targeted protein. At the end of the day, docking is not a solved issue. People have been working for decades on docking software, and when you benchmark it to see whether it is able to reproduce experimental outcomes, you see that there is still room for improvement. I was mentioning this D3R competition that is occurring every two years, and
you will see that there is still room for improvement if you have a look at those different papers. Deep learning is one of the approaches, but you have much other room for improvement, including for instance homology modeling, taking into account the tens of thousands of binding modes that are present in the PDB in order to get an idea of the binding mode, and applying that not using deep learning but using... So now I think that I am going to leave the floor to Antoine, so that he is going to be able to do the last two exercises in just one run. Thank you very much. OK. OK, so I will cut the story a little bit, because I think the purpose here is to show the websites and maybe not all the details. Again, if you want the technical details, I really encourage you to enroll in the course we are giving by the end of the year, organized by the SIB, where we have two full days to describe exactly what's going on. So here, in this little demo, the story was: we have a co-crystallized ligand that can be a starting point for a drug design project against COVID, because it's potentially an inhibitor of the main protease of SARS-CoV-2. But by running some predictions with SwissADME we have already seen that this molecule has some deficiencies. And as I told you before, if you don't have many resources to do chemistry and optimization and so on and so forth, and that's the majority of cases, you can use virtual screening, ligand-based virtual screening, to find similar molecules that you can potentially obtain, and generate hypotheses to start your project. We will use SwissSimilarity, which is our ligand-based virtual screening tool. For some of our tools, as in SwissADME, you can see these little icons there. This is the way to communicate from one tool to the other. Of course you can still type the URL of SwissSimilarity in this bar, but if you want to use the output of one tool as the input of another tool, you can click on this icon. For instance here, for this molecule, if you want to run
SwissSimilarity with this molecule as input, you click on the twins here, and automatically you reach SwissSimilarity in a new tab of your web browser. Its design, we think, is not so different. So you have a sketcher, which is already filled with the molecule for which you want to find similar ones. What you still have to define is, one, a similarity metric, a method, which can be a 3D method like Electroshape or a 2D method like the fingerprints shown by Vincent. And we have prepared for you, which is the toughest part of screening, multiple databases in which you can find similar molecules: for instance approved drugs, approved by the FDA, or the molecules that are active in binding assays in ChEMBL, and this is the number of molecules you're going to screen. What we're going to screen is this ZINC drug-like database, with more than 10 million molecules. What is this? It is a database more or less compiling vendor catalogs, so it tells you whether the molecule you have input, or molecules similar to the one you have input, can be bought. So for our purpose it can be interesting. And we will only screen with FP2, for a question of time. Of course you can screen in 3D or with a more advanced combined method, but if we go in 2D, depending on the load of the server, it should take a few seconds. I hope there is not too much... no, it's starting. So to screen more than 10 million molecules should take 20 seconds, because FP2 is this transformation of the molecule into binary vectors, and computers are made to do this. So this is running, and very soon we're going to get the result, hopefully. OK, that's a bit long, so I guess... I have run the calculation before. Yes, right. So this is... OK, I hope you can see. So this is the output of the calculation we just launched: a reminder of what molecule was input, a reminder of what you have screened, which is this collection of vendor catalogs, FP2 is these 2D fingerprints, and these are the results. The results are ranked from the most similar molecule to the
second most, the third most, and so on and so forth. You have a 2D depiction of the molecule and you have a score. This is here the Tanimoto coefficient between the two fingerprints of this guy and this guy, so we can say that, 2D chemically speaking, this one is 80% similar to this one. Then you have lower scores, and lower scores mean molecules that are less and less similar, which is not automatically a bad thing, so you can generate further hypotheses, like a cyclic system, a bigger system, things with charges, and so on and so forth. But if you look at the first molecule, it is actually exactly what we were looking for: it's exactly the same scaffold. You have this phenol here, this small peptide-like thing here, with a methyl there and there, the same, but it lacks the part that was defined as problematic by SwissADME. So this is a very interesting molecule, because you have the same scaffold, so you can potentially generate the same starting point for your project without having the problem of this instability. Here, this is the ID of the molecule in the database of origin, so here in ZINC. You have a link; if you click on this link, you have access, in a new tab, to the ZINC database. So you are out of our tool and you enter the database we have screened, and you can have a lot of information, but what is interesting for us, for instance, are the 16 vendors who are potentially able to provide you with this molecule. Let's say you want to go to Backem: you click on this, and in one click you have the information that in one day you can be shipped, in Switzerland, a fourth of a gram of this molecule for 84 francs. So 250 milligrams is more than enough to do in vitro testing. So in a few clicks you can have all the information to get a physical sample of this molecule, which can be an interesting starting point for your project. Of course, as it is pretty cheap, maybe it could be interesting to see the diversity of what you are going to buy, so it can be interesting to see what happens if you change a little bit the
scaffold, if you change a little bit the electrostatics of the molecule. If you can have, I don't know, 10 or 20 molecules from the same vendor, it will be very cheap, and from your starting point you will already generate a bit of structure-activity relationship in one shot. That's very interesting. Something that we haven't said, but I think it's indispensable, is that drug design is really a multi-objective process. So you want a molecule that binds well to the protein, you want a molecule that has all the ADME properties, you want a molecule that is cheap to buy, you want this and that, and everything you change, everything you modify, you then have to assess in the other tools. So here, by clicking here, you will assess what happens in SwissADME. So you have SwissADME launched on the new molecule, and you can see that, yeah, the radar is OK, the BOILED-Egg is OK, because it's well absorbed, not reaching the brain, it's P-gp negative, and of course you no longer have the alert for the problematic fragment. And then the final thing, before deciding if you want to go and buy this molecule: the question is, does it make sense when we look at how it binds in the protein? Because the real starting point was this, right? It was: this molecule is interesting, and this molecule is interesting because it's a potential inhibitor of the protease of the virus, and it is a potential inhibitor because it's well recognized by the protein at the atomic level. So the question would be: is this molecule binding in the same way as what we see in the co-crystal? So we will perform molecular docking of this molecule into this exact target, to see if this molecule is able to bind in a similar way to what we think is an interesting structure-based starting point. For this we use our tool called SwissDock, which is the graphical interface of EADock DSS, as Vincent has shown you. So again, if I type the wrong name it will not work. It's a bit fuzzy on my screen. It's not SwissDoc, it's SwissDock. Here we go.
So again, a website totally free. If you click on Submit Docking, you can directly set up your docking. You can either import 3D files you have generated, for instance with Chimera, or with whatever software is able to generate PDB files or MOL2 files, which are the standards for 3D structures. But when you have the chance to have a protein which is crystallized in the PDB and a ligand which is in a database like ZINC, you can search for them directly in the interface. So we know that our protein has this code, and yes, you have access to this. You select chain A, you select it for docking. This is a fast shortcut: instead of getting the protein, opening it in Chimera, and preparing it with the right protonation state, here in a few clicks everything is done in the back end, totally transparent to the user. Potentially the same for the 3D structure of the ligand: if it's a new ligand you have designed, you have to generate the 3D structure, save a file on your computer, and import the file. If you have a ZINC number, and here you have it, you can potentially search directly through the interface. Here we go: yes, it's possible to get this compound, and it is prepared automatically by the web tool, so generating the 3D structure, having the right protonation state, and so on and so forth. What you need to give is a name for your job, so whatever, "ZINC molecule in Mpro", here we go, and your email address. And you have extra parameters. One extra parameter is the speed of your docking, going from "I'm feeling lucky", being extremely fast, to accurate. The faster you go, the less precise your docking, so let's go for accurate. And something that is very important: if you don't define what we call a region of interest, which is a box in which the search will be made, the algorithm will test all the cavities around the entire protein. But here the question we ask for this specific docking is not where the binding cavity is or where the alternatives are. The question is: if I buy this ligand and escape from this
problematic fragment, will it potentially fill the same cavity as this one? So to speed up the process, and also to be more precise in the question you ask, we will define a box encompassing exactly this binding pocket, to make all the search inside it. You can visualize this in Chimera, for instance, with this kind of tool. So you give a center for your box, then you give a size for your box, and you have the box here. So I know more or less where it is. Of course you can adapt it graphically: you can move the box, you can change the size of the box. The game is to find a box small enough to answer exactly your question and large enough not to miss some of the possibilities. So we enter the coordinates here and we start the docking. When starting the docking, there is a lot of processing, totally transparent to the user, to prepare and set up the files and so on and so forth, and after one minute or so I should receive this message, meaning that the docking was set up correctly and that it has been sent to a server to perform the calculation. With a box of that size and the kind of parameters we use, it's a calculation that takes between 20 minutes and 2 hours. You will not wait in front of this screen for 2 hours, so we deal with emails, actually. I ran the thing yesterday, and this is the email I received when submitting it, saying it was submitted correctly and so on and so forth, and this is the email the process sent me 25 minutes later to say, yeah, the docking is finished and you can click here to see the result. Clicking here gets you to this result page, which has a lot of information: a very crude 3D visualization of the protein and of where the ligand is binding. It is interesting if you have made a blind docking on the whole surface; by clicking on this you get an idea. But here we have focused our calculation on this binding site, so it's not very informative. What you have to do is to download the output file of the docking by clicking here, and you have
downloaded an archive, which is uncompressed into a folder of this kind, and you can open the thing in Chimera, for instance: Tools, ViewDock, there we go. And this is the docking I ran yesterday. Normally a new window will pop up, with a warning, which is this one. It's the main window of the docking results. Every line here is a docking solution, and the docking solutions are clustered: so all the zeros are the first cluster, where all the docking solutions are the same; then you pass to cluster one, where all the solutions are the same. You can add the real score. So what Vincent has explained to you, all this calculation based on the real physics and the solvation effect, is summarized in this FullFitness: the more negative, the more favorable the binding mode. So one way to do it is to display the FullFitness and sort by this FullFitness, and you can see that you have multiple docking solutions that are very close in number, so within one kcal/mol here you have a few solutions. You can browse them; I will try to zoom as much as possible. So the yellow one is the co-crystallized ligand, and the purple one is one docking solution generated by SwissDock for the ZINC compound you can buy. This is another docking solution; actually, they are in the same cluster, they are almost the same. This is another solution, this is another solution, and you can see that the first two are the same solution; second solution, third solution. The solution ranked third by the scoring function actually has an excellent overlap with the co-crystallized ligand, and you can see that the numbers are very similar, so it's not that you have a gap in energy between the first and the third. You have to be careful with the interpretation, but the first interpretation is that it is very possible that this small molecule, which you can have shipped in one day, will bind the way we think to the protease of the coronavirus, and will not generate some problematic instability because of the triple bond to the bromide. So
this is an example of how, with a few clicks, navigating between ligand-based virtual screening, ADME prediction, PDB structures, and docking, you can select, in a few minutes or a few hours, a very, very affordable starting point for your design. Of course, you have to buy this compound and you have to test it, in vitro most probably. It's highly recommended to test not only this precise molecule, but this molecule plus different analogs of it. You hope to have a first signal of activity and to start building a rational, as Vincent showed you, iterative optimization cycle to improve molecular recognition in the binding site, but also maybe to improve some ADME properties, to improve many different things. All right, I think with that we are done. Just to let you know, we had no time to tackle SwissTargetPrediction, but I think technically it's very similar to our other websites. You can access it through the target icon: click on it, and this time, instead of having a direct screening, you have a reverse screening that gives you the most probable targets for your molecule. So I think with that I am done.
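For readers who want to see what the similarity score in the SwissSimilarity exercise measures, here is a minimal sketch of the Tanimoto coefficient on binary fingerprints and of ranking a small library against a query. The fingerprints are represented as sets of "on" bit positions; the bit values and molecule names are invented for the illustration, and this is not the FP2 implementation used by the web tool.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared 'on' bits divided by bits set in either fingerprint."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def screen(query_fp, library, top=3):
    """Rank library molecules by decreasing similarity to the query fingerprint."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # best score first, name as tie-break
    return scored[:top]


if __name__ == "__main__":
    query = {1, 2, 3, 4}                 # hypothetical fingerprint of the query molecule
    library = {                          # hypothetical library fingerprints
        "mol_a": {1, 2, 3, 5},
        "mol_b": {1, 9},
        "mol_c": {1, 2, 3, 4},
    }
    for name, score in screen(query, library):
        print(name, round(score, 2))
```

Because the comparison reduces to counting shared bits, it is very cheap per molecule, which is why a 2D fingerprint screen of millions of compounds can finish in seconds.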