Hello everyone, good morning, good afternoon, good evening, wherever you are watching this recording. I'm Alexandre Bonvin, professor of computational structural biology at the Bijvoet Centre for Biomolecular Research at Utrecht University in the Netherlands, and I'm also a partner in the BioExcel Centre of Excellence. Today I'm going to speak to you about integrative modeling of biomolecular complexes. What you are seeing here in the background is a beautiful picture of the region of Pula, where this summer school used to be held before the pandemic and where we surely hope to go back once things have returned to normal. These are the topics I want to cover today with you. I will give you a general introduction to set the stage, then cover some general aspects of docking, discussing different approaches to modeling complexes, and then focus more specifically on HADDOCK and how you can integrate information to guide the modeling process. In the second part of my lecture I will show you different application examples: how you can use different types of data to model antibody-antigen complexes, and how mass spectrometry or cryo-EM data can guide the modeling process. And I will end with the topic of membrane complexes, which are notoriously difficult to model, and to study experimentally as well. So what are we speaking about in general? We are speaking here about the social network of proteins. What you are seeing in this picture is a view of the center of Utrecht, which is a beautiful city you should definitely visit if you have the chance; you see a lot of terraces, a lot of interactions at the human level. And if you think of life at the cellular, microscopic level, pretty much everything that happens is controlled by interactions between biomolecules, and proteins play a very important role in those interactions.
So if we want to understand what's happening at the cellular, atomistic level, we basically need to shed light on those interactions. And this brings me to the concept of the interactome. What you are seeing here is a lot of dots connected by lines: the dots represent proteins, the lines represent interactions between them. If you do statistics on those, you will realize that there are many more connections than there are dots, meaning that proteins, and biomolecules in general, interact with multiple other molecules. So this is a complex network, which is also dynamic: it is rewired depending on where you are in the cell and where you are in the cell cycle. And if miscommunication happens in those networks, this usually leads to disease. So to understand what's happening in this complex network, we also need to be able to add the structural dimension to what looks here like a two-dimensional network. And this brings us to the structural biology of interactions, where you have here on the right side the more experimental methods and on the left side the more theoretical methods to study complexes and their interactions. Here of course you find the classical structural biology methods: X-ray crystallography, which has contributed the majority of structures in the Protein Data Bank; nuclear magnetic resonance; and cryo-electron microscopy, which these days is really the new star in structural biology, contributing a lot of very complex assemblies, a lot of information on complexes. But next to those, say, three key methods that provide atomistic information on the structures of complexes and proteins in general, there are a lot of other experimental methods that provide pieces of the puzzle. These might not give you the full atomistic picture.
So here you see mass spectrometry, which nowadays plays an increasing role in structural biology, but also small-angle X-ray scattering; basically any experimental method that can give you a bit of information on how molecules interact is useful. But you will have to combine this information with computational approaches if you want to get to an atomistic description of those interactions. And this brings us to the computational part, and there are different ways of looking at complexes. You can try to model those by homology modeling, provided there is enough information in the Protein Data Bank that you can find a homologous template, which for complexes is not that easy: we have over 150,000, even 170,000 structures in the PDB these days, but there are probably only around 6 to 7,000 real biological complexes represented in the PDB. So this is a small subset of the interactome, which, as I showed you before, has a size of several hundred thousand interactions in humans; it's a very complex landscape. But other methods like molecular dynamics can be used to look at complexes, and docking, which is the method central to this lecture today. To define docking in a nutshell: we are given the structures of two proteins. Many complexes that are formed in life consist of more than two molecules, so we need to be able to model larger and more complex assemblies as well, but let's start simple. Given the structures of two molecules, can we predict how they assemble? So we are trying to solve a three-dimensional puzzle by docking. What do we need to do? We need to sample the space between those molecules, meaning sampling translations and rotations, and for all the models that we generate we need to assess in some way if they are fitting nicely or not.
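As a toy illustration of this sampling problem, a brute-force rigid-body search can be sketched in a few lines of Python. The Euler-angle grid, the translation grid and the `score` callback are all illustrative choices, not the interface of any real docking program:

```python
import itertools
import numpy as np

def euler_rotation(alpha, beta, gamma):
    """Rotation matrix built from three z-y-z Euler angles (radians)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    rz1 = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    rz2 = np.array([[cg, -sg, 0.0], [sg, cg, 0.0], [0.0, 0.0, 1.0]])
    return rz1 @ ry @ rz2

def systematic_search(ligand, score, angle_step, translations):
    """Exhaustive 6D rigid-body search: the receptor stays fixed at the
    origin, every (rotation, translation) pose of the ligand is scored,
    and the lowest-scoring pose wins."""
    best_score, best_pose = np.inf, None
    angles = np.arange(0.0, 2.0 * np.pi, angle_step)
    for a, b, g in itertools.product(angles, angles, angles):
        rotated = ligand @ euler_rotation(a, b, g).T
        for t in translations:        # every grid translation
            pose = rotated + t
            s = score(pose)
            if s < best_score:
                best_score, best_pose = s, pose
    return best_score, best_pose
```

Real docking programs of course replace this naive double loop with much faster grid and Fourier tricks, as we will see later.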
This is done in different ways, and we're going to come back to that, but you can think simply that shape complementarity should be important, so the molecules should fit nicely into each other, and next to that you can add the chemistry and the physics: here you see the electrostatic energy, a Coulomb potential, and a Lennard-Jones potential to describe the van der Waals interactions. So this is docking in a nutshell. What do we have to do if we think of docking? We have some kind of conformational landscape, or maybe better said interaction landscape, and we have some measure of the interaction, the energy between those molecules. And we need to sample this energy surface in order to ideally locate the minimum, which we will assume corresponds to the native, real structure of the complex. This is the sampling phase: we need to sample the space, and in this sampling phase we are going to generate a multitude of solutions. Then we need to score them to identify the most likely native solutions; that's the scoring part. So docking consists of sampling and scoring. Now if you have data, which can be experimental or bioinformatic data, you might decide to use those data, and you can do that in two ways. You can use the data during the sampling, so we are going to bias the sampling and no longer sample the entire space, or we can use the data in the scoring, to improve the energy functions that we have and help in finding the right solution to the problem. And this is an illustration of the two strategies. If we don't use the data during sampling, then we have to do a global search of all possible arrangements of the different molecules and then score them. If we use the data during the search, the data are basically going to focus the search on a small region of the interaction space, so we can spend more time searching in this region, and we have fewer problems, hopefully, in the scoring phase.
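That physics part of a score can be sketched as follows. This toy uses a single Lennard-Jones atom type (the `epsilon` and `sigma` values are made-up illustrations) and a Coulomb term in kcal/mol; real force fields use per-atom-type parameters:

```python
import numpy as np

def intermolecular_energy(coords_a, charges_a, coords_b, charges_b,
                          epsilon=0.2, sigma=3.5):
    """Intermolecular energy between two rigid molecules: a 12-6
    Lennard-Jones term (van der Waals) plus a Coulomb term.
    Only inter-molecular atom pairs are counted, as in docking scores."""
    # n_a x n_b matrix of all intermolecular atom-atom distances (Angstrom)
    d = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    lj = 4.0 * epsilon * ((sigma / d) ** 12 - (sigma / d) ** 6)
    coulomb = 332.0636 * np.outer(charges_a, charges_b) / d  # kcal/mol
    return float((lj + coulomb).sum())
```

At the Lennard-Jones minimum, d = sigma * 2**(1/6), a neutral atom pair contributes -epsilon to the energy.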
So this is beneficial, but it also has dangers, because if the data are bad, or pointing to the wrong region of space, you are never going to find the right solution. So there are good things about an initial search that covers the entire space, and there are good things about information-driven search techniques, but for the latter we have to trust the data. When we speak of integrative modeling these days, we are basically speaking of the combination of different information sources, experimental sources, together with modeling, often docking, to try to find solutions to the structural modeling of a complex, and you see here an illustration of some of the information sources that can be combined; this is by no means covering everything. You might be able to measure some kind of distance information, and there are many ways to derive distance information: you see here paramagnetic relaxation enhancement, which is an NMR-based measurement, electron paramagnetic resonance (EPR), and so on. Those will give you some kind of information about residues that are potentially close to each other in space.
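Distance information of this kind is typically encoded as a restraint energy, of the flat-bottomed form used in NMR structure calculation: zero penalty inside the allowed range, harmonic just outside it, then a linear tail at larger violations so the forces stay bounded. A minimal sketch, with an illustrative force constant `k` and switch point:

```python
def restraint_penalty(d, lower, upper, k=1.0, switch=2.0):
    """Flat-bottom distance restraint: zero inside [lower, upper],
    harmonic just outside, and linear beyond `switch` Angstrom above
    the upper bound (value and slope matched at the switch point)."""
    if d < lower:
        return k * (lower - d) ** 2
    if d <= upper:
        return 0.0
    excess = d - upper
    if excess <= switch:
        return k * excess ** 2
    # linear continuation: constant restoring force at large violations
    return k * switch ** 2 + 2.0 * k * switch * (excess - switch)
```

The linear tail matters in practice: with a purely harmonic wall, a badly violated restraint would exert an ever-growing force on the dynamics.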
Cross-linking detected by mass spectrometry also gives you specific information about, say, the maximum distance between two residues. NMR can give you a lot of information: even if you cannot collect the classical information to solve the structure by NMR, you might still be able to map the interaction sites by doing a so-called NMR titration, so you know where things are binding, but you don't know how they are binding. NMR might also give you information about the relative orientation of the molecules; there are different experimental sources for that. Hydrogen-deuterium exchange, also a very popular technique these days in combination with mass spectrometry, will again allow you to detect interfaces without telling you what the arrangement is, but you know where the action is, where the binding is. It can be as simple as doing mutagenesis together with some binding assay, where you identify mutations that prevent the binding and interpret those residues as important for the binding, hence they must be at the interface. Then there is shape information, coming from, say, medium- to low-resolution cryo-electron microscopy (not all EM these days reaches atomic resolution) or small-angle X-ray scattering; this shape information is again valuable. And if you don't have experimental data, we still have the sequence, and you can do bioinformatic predictions: you can try to predict binding interfaces, but you can also try to predict contacts between proteins in the form of co-evolution. Co-evolution is a field which has been developing a lot these days; it is also at the basis of the success of Google DeepMind with AlphaFold for the prediction of protein structures, but you can also use this information to predict distances between residues of different molecules. So all of these can be combined with some modeling approach, docking, to generate structures of complexes. Here are a number of reviews; the last two are rather recent ones that we wrote on the field of
integrative modeling of complexes, but also modeling at the coarse-grained level, and we have some previous reviews on the topic. There is also, every two to three years, a special issue of Proteins coming out which is dedicated to the CAPRI experiment. CAPRI is a blind experiment to predict the structures of complexes; it is the equivalent of CASP for structure prediction, but CAPRI focuses on complexes. So if you want to know what the field is doing, you can look up what's happening there. This was my general introduction, so let's move now into more specific aspects of docking technology. What are the choices that you have to make when it comes to docking, to modeling complexes? First we have to think about how to represent the system. In the simple example I showed you at the beginning, we were speaking about shape, there should be shape complementarity; so do you need to represent all atoms, or can you deal only with the shape? These are choices that have to be made. Then you have to make choices on how to search this interaction space, this complex energy landscape. If we assume that the molecules are rigid, we will have to sample three rotations and three translations: you can fix one molecule at the origin of space, and what you need to do for the second molecule is sample all translations in three dimensions, and also sample all rotations, and for each rotation that you sample you need to do this translational search. So it's a six-dimensional search problem for two molecules, assuming they are rigid. But of course biomolecules and proteins are not rigid, and dynamics is part of their function, so we will have to deal with flexibility in some way as well. We will have to think about how to score all those models that we are generating: how are we going to fish for the best models, how are we going to handle flexibility and conformational changes? Some of the choices that you make here will impact what you can do there, and also, if you have experimental data or
bioinformatic data, how are you going to use those in the docking process? In terms of representation of the system, the first docking algorithms were using an explicit representation, where all atoms are described: we have the coordinates of all atoms, and when we do the translations and rotations we are rotating those coordinates. The first docking algorithm was written by Shoshana Wodak and Joël Janin in the late 70s, early 80s. So this is a real-space search, and you see here an example: this is an antibody-antigen complex, this is the antibody, and here you have the antigen. You will find docking software that uses an explicit representation, and you will find docking software that uses a grid-based representation. Grids have a lot of advantages: you can simplify the description of the system, as you don't need to work with all the atoms anymore. The concept here is that you basically put a grid around your molecule and then digitize your molecule onto that grid. Here you have an example with a tyrosine side chain: you define the surface and assign properties to the grid points, or actually to the voxels of your grid. You see here in a blueish color the surface, and the inside of the molecule has been colored grey. Once you have digitized your proteins onto those grids, what you use for the search are only the grids, and you are going to match grids against each other. In doing that, you typically want to maximize the overlap between the blue regions, because this means good shape complementarity, and you want to avoid overlap between the grey regions, because this would mean atomic clashes. There are many software packages that use this approach of discretizing the 3D structure of your molecule onto a grid; here is an example from BiGGER, an older docking software. You can also play with the resolution by changing the grid spacing, meaning that you can do your search at a lower or higher
resolution. The search can then be done in real space, where you basically move your grids, or in Fourier space. You also have systems that use a mixed representation of the molecules. This is done more in the small-molecule docking field, where you basically represent on the grid the energetic field coming from your protein, while the ligand, the small molecule, is described explicitly and moves within this grid, which represents the energy contribution of the protein. This is implemented for example in AutoDock and ICM; so this is a mixed representation. I mentioned at the beginning that maybe we only need to represent the surface of the molecule and can forget about the inside, and this is what is done in software like Hex, which was developed by the late Dave Ritchie. Here we are basically using a combination of spherical harmonics. Those spherical harmonics are mathematical functions, trigonometric functions, and you see here a number of them; they are also used, typically, to represent the electronic orbitals of atoms. You can use a linear combination of those functions to represent a 3D shape, and this is what Hex is doing: instead of working with thousands of atomic coordinates, Hex uses a linear combination of 45 spherical harmonics, optimizing their weights to match the shape of your protein. You see here a surface representation of two proteins, the receptor and the ligand, and this is the shape representation of the two: you only need 2 x 45 coefficients to represent them, and the search is then done in this spherical harmonics space, which makes this kind of approach extremely fast in terms of computing. By varying the number of components, of terms, that you use to describe your molecule, you can also change the resolution of your system. There is one last representation that I want to mention here, and this is basically the idea of representing a
molecule as a 3D puzzle. If you think of a 3D puzzle, you have all the puzzle pieces, and what you are doing here is decomposing the surface of your molecule into puzzle pieces based on the surface curvature: is it convex, is it concave? So you can basically decompose this protein into a number of puzzle pieces. When you are trying to solve a puzzle, you don't take all the pieces together and try to find the solution at once; you take one piece and search for the corresponding piece that matches. You do the same in the docking: you take one piece of your receptor and you look among all the pieces of the ligand for the one that matches in terms of shape, curvature and so on. That is a very fast process, called geometric hashing, and it is implemented in software like PatchDock, which is again very fast because of this type of implementation. So this was about how to represent the system in docking; now we get to the search methods. How do we search the space? There are different ways. First, there are the docking approaches that do a systematic search: we are going to systematically sample all rotations and all translations, and for two molecules these are three rotations and three translations. Again, you can fix one molecule in space at the origin, and you need to rotate the second one and, for each rotation, translate it; and for each orientation that you generate, you need to calculate a score. The scoring can become a very expensive part, depending on how complex your energy functions are. Now, the translational search is often carried out in Fourier space when you use grid-based techniques: if you digitize your protein onto a grid, since grids are an equal spacing of points, you can use fast Fourier transform techniques to do the translational search in one go. Hex, which uses the spherical harmonics, does the rotational search in Fourier space, because there are properties of the rotation of
spherical harmonics that make that very efficient as well. Examples of grid-based search software are ZDOCK, a very famous one, ClusPro, GRAMM and FTDock, which basically use grids to do a systematic search of all possible solutions. So how does it work? You see here a system consisting of two molecules. The receptor, which typically will be the larger of the two molecules, you are going to discretize on a grid; the size of the grid depends on the size of the largest molecule, and you are going to assign properties to the grid voxels. You see here the surface again in a blueish representation and the core in a red representation. This you only need to do once, because the receptor is fixed in space. For the ligand, you need to sample rotations, and for each rotation you are going to discretize the molecule onto a grid, a grid which has exactly the same dimensions as the grid of your receptor. Once you have those two grids, you take the fast Fourier transform of each grid, take the complex conjugate, and then in Fourier space you compute the correlation of those grids; this gives you in one shot a sampling of all the translations in the system. What you see here is a two-dimensional example: if you do the correlation in Fourier space, you get this kind of information, where you have all the possible translations and, for each combination, a measure of the correlation coefficient. How this looks depends on how you define your functions, but in this case we want to maximize the overlap between the blue regions and avoid overlap between the red regions, so there is a way of calculating a score, and the highest point here is the one that will give you the best match between the molecules. You are going to store those highest points, and then you do a different rotation of your system, you do your Fourier transform, the correlation
in Fourier space, and you store the highest coefficients. Once you have those, you can generate your 3D model by taking the inverse Fourier transform for a given combination of coefficients, and this gives you your complex. So this is how docking happens using fast Fourier transforms and grids. Now, a systematic search can be done in different ways. You can do it stepwise, where you might start at low resolution to speed up the computations and then increase the resolution, and at the same time use maybe a simple scoring function at the beginning, again for computing-time reasons, and then go to fancier scoring functions as you increase the resolution. Usually, when you do this kind of systematic grid-based search, you need to refine the solutions, because there will be clashes at the interfaces. You also have methods that are energy-driven: now we are not going to use grids and fast Fourier transforms, but conformational search techniques, with the aim of minimizing some kind of energy function. You can think of energy minimization, molecular dynamics, Brownian dynamics, Monte Carlo, genetic algorithms, swarm-based algorithms; you will find docking software that implements some of those methods, or a combination of them, often together with some kind of simulated annealing scheme. Now, if you are doing these energy-driven search methods, you are going to minimize the energy of the system, so you need a starting point for this, and that is something that needs to be carefully thought about; different software implements different methods. Here you see an example from ICM, and actually ATTRACT, another famous docking software, uses a rather similar approach, where you basically define pins on the surface of your molecule, placed equidistantly along the surface, and then you take as starting points for your minimization, for your energy sampling, basically all possible combinations
of those pins. Another approach is to apply random rotations to the system, separate the molecules, and then start a minimization or optimization process from all those different starting points, which is what we do in HADDOCK, for example. So you need to sample a lot, you need to have many starting points, and from all of these you are going to try to optimize the energy of the system. Now, what about flexibility? Flexibility makes everything more complicated; it makes the docking problem harder. We no longer have only six degrees of freedom, the three rotations and three translations, but we also need to sample internal degrees of freedom, allow for side-chain motions, allow for loop motions, and that makes everything more complex; the scoring also becomes more difficult. A big problem that we have in the field is that it is very difficult to predict a priori whether conformational changes are going to take place upon binding. If we were able to do that, it might make the docking problem easier, in the sense that we would know in which specific cases we have to play tricks to sample conformational space. If you don't need conformational changes for the binding but you still play a lot of tricks and use fancy methods, the outcome of the docking is usually worse; so knowing when you need it and when you don't is also very important. Most docking methods these days can only handle rather small conformational changes. There are two benchmarks in the field, and one of the cut-offs used to define whether a docking problem is challenging or not is only 2.5 angstrom root mean square deviation between the free conformations and the bound conformations of the complex. So 2.5 is not a lot, but 2.5 is already enough to make the problem very hard. How you are going to deal with flexibility depends on the method that you choose for the sampling; the two are interconnected. One way of dealing with it, which can be applied to many
different search methods, is what is called soft docking, where instead of having hard spheres to represent atoms or molecules, we allow some overlap between them. This is an implicit way of representing flexibility, but the outcome of this kind of docking will have to be refined to remove those bumps. Another way of doing this kind of soft docking is to shave the side chains at the surface of your protein. You see again a grid: if you have overlap of grid points during the docking, this is a bad thing, so you could empty the grid points of the side chains on the surface and say, OK, if I have overlap in those regions, it is not penalizing my solution. That's an implicit way of doing soft docking, and it is something that is done typically in grid-based docking algorithms. What you can also do is start the docking not from a single structure, but use ensembles of conformations in your docking process. Depending on the software you are using, you might have to repeat the docking for each conformation, while some software will take all the conformations at the same time during the docking. Those conformations can come from, for example, different crystal structures or NMR ensembles, or you can use techniques like molecular dynamics or normal modes to generate and sample possible conformations of your system; this is applicable both to rigid and flexible docking approaches. And then there are methods that allow for flexibility explicitly, using energy minimization, molecular dynamics or Monte Carlo, where you can have explicit flexibility in your system, which can be limited to the side chains, or include both side chains and backbone. This of course increases the computational cost, so it is typically used later in the docking pipeline to refine the models. And then, last, scoring. We have spoken about how to represent the system, how to search the space, how to deal with flexibility; we generate a lot of
models, so how do we know which ones are the correct ones? This is really the holy grail in docking: if you can figure out what the perfect scoring function is, the remainder of the problem is just computing time. Assuming we had infinite computing time, we should then always get the right solution out of our modeling; but we don't have infinite computing time, and our scoring functions are not perfect. Your score also depends on how you represent the system and how you deal with flexibility, and in the field you will see that there are scoring functions very specific to the type of system being modeled. In principle nature only has one set of laws that governs molecular recognition, but in practice people might tune their scoring function to a specific type of system because it performs better there, which doesn't make it generally applicable. What do you find in those scoring functions? In a nutshell, you will find terms related to the van der Waals interactions, where usually you calculate the intermolecular component of the van der Waals energy; the intermolecular component of the electrostatic energy; you might look for specific hydrogen bonds between the molecules; you might measure the amount of surface buried between the molecules; desolvation energies, which are some way of representing the gain or loss of energy when you remove solvent from the surface of the molecules; and you might use statistics, statistical potentials. So there are different scoring functions, and you find all kinds of combinations out there. And if you have experimental data, you can of course also use the data to filter your solutions; that's actually a very good way of selecting solutions. Surface-related terms are typically based on the solvent accessible surface area, which was introduced by Fred Richards: you measure the amount of surface of a molecule, and when it comes to complexes
what is relevant is the area which is buried upon complex formation, which is called the buried surface area. This is basically the difference between the accessible surface area of the separated molecules and the surface area of the complex; this difference gives you the buried area. Typical values for protein complexes range between 1,200 and, say, 2,500 square angstroms, but you will find complexes with way larger interfaces, and you will also find complexes with smaller ones that still bind very strongly. The desolvation energy is also an empirical function of the solvent accessible surface area of the atoms: you can calculate it per atom, times some parameter which is a function of the atom type, so an oxygen will have different properties than a carbon when it comes to calculating the solvation energy. And the desolvation energy, as for the buried surface area, is basically the difference between the solvation energy of the separate free molecules and the solvation energy of the complex. Pair potentials, statistical potentials: these were introduced quite some time ago, and you see one example, based on an analysis of the Protein Data Bank, of amino acid-amino acid potentials, but there are also atomistic potentials out there. DFIRE is a scoring function which is used quite a lot in the field, both for assessing the quality of predictions of single structures and for complexes. You basically obtain those functions by doing statistics on the PDB: you will have combinations of amino acids, in this example, that are favorable, a gain in energy, and combinations that are unfavorable. So this is based on statistics, and you are basically measuring how well a model that you generate matches what you find in the PDB; this is implemented in different software. In general, the more sophisticated the scoring function that you use for your modeling, the more time it will cost as well, so
there is always a balance with how much time you have to get results: you can do very fancy things, but it might take a very long time. Finally, when we look at the models we generate: we might be sampling millions, even billions, of solutions when we do docking, and of course we only keep the best ones, but the best ones might still be a large number, a few hundred, a few thousand. So what is also often done in the field is to cluster the solutions: you group similar models together, and instead of scoring on an individual-structure basis, you score on a cluster basis. Some software even uses the size of the cluster as a scoring measure; it depends very much on how you do the sampling whether the size is a proper parameter or not. And there are different ways of doing this clustering. You need to measure the similarity of two models, which can be done based on positional RMSDs, or based on the contacts that are made between the molecules, the fraction of common contacts, which is something we introduced. So these were all the general aspects of docking, and now I want to come to more specific aspects: information-driven docking with HADDOCK. HADDOCK has been developed for almost 20 years by now, and it was developed to make use of experimental information from the start. It was actually the limitations that we had in solving structures of complexes by NMR that gave us the idea to develop this docking approach, in which we incorporate data to guide the modeling process, and over the years we have added support for a large variety of data; I am going to show you some applications in the second part of the talk. I mentioned also that many complexes consist of more than two molecules; in HADDOCK we are currently able to dock up to twenty molecules, but going to that large a number of molecules only makes sense if you have data. If you don't have data, this is a pure waste of time. Symmetry is also
Symmetry is also something that we can leverage in the docking, because it limits the solutions that you have to generate, and, as you will see, HADDOCK is a software that allows for explicit flexibility during the refinement stage. We have been participating in the CAPRI blind docking competition over the years, with good performance.

So how do we encode the information we have at hand in HADDOCK? Much of the information we get tells you that some regions or some residues are important for the binding, but not how the binding takes place, that is, which contacts are made. The way to encode that into some kind of energy function is to use what we call ambiguous interaction restraints. We have an example here of two molecules, protein A and protein B, and we have identified, for example from mutagenesis, that those red amino acids are important for the binding, and these ones on side B, but we don't know which contacts are made. The green ones that you see here are what we call the passive residues; these are the surface neighbors of the active residues. We want a good definition, good coverage, of the binding site, which is why we usually add the surface neighbors of what has been identified experimentally, to improve the definition of the binding site a bit. Then we define distance restraints between each red residue that you see here and the entire interface on the other side. So each red residue will have one distance restraint, with a functional form like this one; this is a typical energy function used in NMR structure calculation. It has zero energy between the lower and upper limits, it is harmonic below the lower limit, and on the higher-distance side it is harmonic for a short interval and then becomes linear. This linear regime has the advantage that the forces there are constant, and the forces are what drives the molecular dynamics and energy minimization.
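The flat-bottom restraint potential just described can be sketched as follows; the bounds, switch offset, and force constant are illustrative values, not HADDOCK's defaults.

```python
def soft_square(d, lower=0.0, upper=2.0, switch=1.0, k=1.0):
    """Flat-bottom restraint energy as a function of the distance d:
    zero between the bounds, harmonic just outside them, linear beyond
    a switching offset on the upper side."""
    if d < lower:                      # harmonic below the lower bound
        return k * (d - lower) ** 2
    if d <= upper:                     # flat bottom: restraint satisfied
        return 0.0
    if d <= upper + switch:            # harmonic just above the upper bound
        return k * (d - upper) ** 2
    # Linear tail: the slope matches the harmonic part at the switch
    # point, so the force (the derivative) stays constant from there on.
    slope = 2 * k * switch
    return k * switch ** 2 + slope * (d - upper - switch)
```

The linear tail keeps the restraint force bounded even for large violations, which keeps the minimization and dynamics stable when a restraint is badly violated early in the docking.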
So how do we calculate the distance that goes into this energy function, in the case of these highly ambiguous restraints? We take all possible combinations of atom-atom distances between all atoms of a residue i and all atoms of the entire interface on the other side. All those distances we sum up as 1 over the distance to the power 6, and at the end we take the inverse sixth root of that sum. This gives you one effective distance, and this effective distance is what goes into the energy function; it is this R value here. So each residue effectively has one distance restraint, but to the entire interface on the other side, and this summation has the property that the effective distance is always shorter than the shortest distance that enters the sum. In the end, you satisfy your energy function as soon as some short distances exist across the interface. So you have this network of ambiguous distances between the interfaces, which has the property of drawing the molecules toward each other without predefining what the orientation should be: you can sample different orientations, but you make sure that the residues you defined as active end up part of the interface. That is really the basis of HADDOCK.

Since the restraints, whether from experimental or predicted information, are not always reliable, we will have false positives and false negatives. What we do by default is to randomly remove, typically, 50% of the information for each docking trial, and we might even remove up to 90% in cases where you use bioinformatic predictions.

So how do we calculate the interaction between the molecules, and how do we search this energy landscape? We define an energy function, a classical force field as also used in molecular dynamics, where we have bonds between atoms, angles, torsions around bonds, and the non-bonded interactions.
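The effective-distance summation described above can be written in a few lines (an illustrative sketch, not HADDOCK's code):

```python
def effective_distance(distances):
    """d_eff = (sum over all atom pairs of d**-6) ** (-1/6).
    The result is always shorter than the shortest individual
    distance entering the sum."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)
```

For example, a single pair at 3 angstroms gives an effective distance of 3 angstroms, and every additional pair can only shorten it, which is why the restraint is satisfied as soon as any short contact exists anywhere across the interface.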
The non-bonded interactions are the ones important for molecular recognition. On top of these classical force-field terms, we add an energy term that represents the experimental data, and the search in this energy landscape is done by a combination of energy minimization and molecular dynamics simulations. We typically have three stages in HADDOCK. In the first stage, the molecules are treated as rigid bodies, and we perform an energy minimization driven by the data that we put in. You see here the process of this energy minimization; the spheres are amino acids that were detected by NMR to be important for the interaction. This sampling is typically done between 10,000 and 100,000 times. We take the best models and subject them to a simulated annealing protocol in torsion angle space; we now do molecular dynamics in torsion angle space and add flexibility at the interface, both in the side chains and the backbone. Look at this loop: it just flipped over. So this is an optimization of the interface between the molecules, and which residues are optimized is defined automatically, by default based on the contacts made between the molecules. At the end we take those solutions, put them in a bath of water, and do a very, very short molecular dynamics simulation. In the latest version of HADDOCK we don't do that by default anymore; we do only a minimization, because in the end the models do not change that much, even if the energy changes. The option to refine in water is still there, but it is off by default.

In terms of flexibility, when we speak of HADDOCK there are different levels: we have implicit flexibility, as I explained, but also explicit flexibility, as you have seen in the movies, along side chains and backbone. In summary, we have three stages: rigid-body minimization, simulated annealing in torsion angle space, and refinement in explicit solvent in Cartesian space. And these are the scoring functions that we use at the different stages.
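To tie the stages and scoring together, here is a hypothetical pseudo-pipeline. The function names are placeholders, not HADDOCK's actual API, and the scoring weights below are the commonly quoted final (water-refinement) weights, stated here as an assumption rather than taken from this lecture's slides.

```python
def haddock_score(e_vdw, e_elec, e_desolv, e_air):
    """Assumed final scoring weights: 100% van der Waals, 20%
    electrostatics, the desolvation term, and a small weight on
    the restraint (AIR) energy. Lower is better."""
    return 1.0 * e_vdw + 0.2 * e_elec + 1.0 * e_desolv + 0.1 * e_air

def run_protocol(models, refine_stage1, refine_stage2, refine_stage3,
                 n_keep=200):
    """Three stages: (1) rigid-body minimisation of many trials, keep
    only the best-scoring models; (2) semi-flexible simulated annealing
    in torsion angle space; (3) final refinement in Cartesian space.
    The three refine_* callables are placeholders for the real engines."""
    stage1 = sorted((refine_stage1(m) for m in models),
                    key=lambda m: haddock_score(*m["energies"]))[:n_keep]
    stage2 = [refine_stage2(m) for m in stage1]
    return [refine_stage3(m) for m in stage2]
```

The important design point is the funnel: many cheap rigid-body trials, then progressively more expensive (and more flexible) refinement applied only to the survivors.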
This is the scoring function that we use at the end to select the final models: it has 20% of the intermolecular electrostatic energy, 100% of the intermolecular van der Waals energy, the desolvation energy term which I explained before, and the experimental restraint contribution. So you have the intermolecular energies in blue, you have the desolvation, basically the price or benefit of removing water from the interface, and then you have the experimental information. We also cluster the solutions, and typically we actually score on a cluster basis.

Most HADDOCK users use our web portal, available from wenmr.science.uu.nl. We have more than 22,000 registered users to date, and the portal has served a large number of docking runs since 2008. We are able to provide those resources because we have access to grid resources distributed worldwide, through projects supported by the European Grid Initiative (EGI) and the European Open Science Cloud, among others. You can also get the software to run locally, but then you will have to do much more manual intervention, while the server does a lot for you. This is a short overview of our current user base, more than 22,000 users; you see it is used all over the world, with many users in Asia but also in the US and across Europe. In the last year we have seen a significant increase in the use of our portal, because HADDOCK is being used to model a lot of complexes related to Covid; we see this Covid effect. This was the start of the lockdown in April, and since then the number of submissions to the portal has almost tripled; about one third of all submissions since April last year are Covid-related (users can tag their submissions as Covid). And this is the number of unique users per month; this again is April 2020, and you see that at the start of the pandemic it more than doubled. So we actually see the waves
of the pandemic reflected in the usage of the portal. And you find here Captain Haddock: if you thought a haddock was only a fish, Haddock is also a cartoon character. This is not a real Tintin album about the coronavirus; it is something you find on the internet, a combination of the Tintin moon adventure and The Blue Lotus, which takes place in China, so people have had some fun with the pandemic here. Now, HADDOCK, as I already mentioned, is a core software in the BioExcel Center of Excellence, and as such we operate a forum under BioExcel where you can search for answers to your questions, but also ask questions directly and get answers. And next to HADDOCK, which is our core software in Utrecht, we also operate a lot of other software relevant to different aspects of modeling interactions.

As a last, very short example, you see here again Captain Haddock, now as a pirate, and this particular example has to do with iron piracy: this is how a bacterium hijacks iron from its host, and it does that by using this receptor, FusA. FusA binds ferredoxin from the host, because ferredoxin contains an iron-sulfur cluster. Now, in this team we have NMR people, we have X-ray people, and we are the modellers. They could not manage to crystallize the complex; they did manage to crystallize this beautiful membrane protein, but they never obtained a crystal structure with the ferredoxin bound to it. By NMR, however, they could study how ferredoxin binds to the receptor. They did NMR titrations: you see the amino acid sequence and some kind of displacement of the NMR signals, and there are specific regions along the sequence that are affected by the binding. If you map those onto the surface of your protein, they map to a very specific region, so this is the region which interacts with the receptor. So this is information that we
can use in HADDOCK to guide the modelling process. On the membrane side they could not do the NMR, but we also have information, because we know which loops are the extracellular loops of this receptor, and this is where the binding has to take place. So we can define this region, those extracellular loops, as the binding region for the red region here on our ligand. That is the information we give to HADDOCK, and HADDOCK returns a number of clusters. You see here the first two clusters, which are actually quite close in space, and this is the top solution. Based on these models you should go back to the lab and devise experiments, mutagenesis for example, to try to validate which of those two solutions is the most likely correct one. This had not been done yet when the paper was published, but hopefully it will be. So with that I want to close this first part of my lecture. Thank you very much for your attention so far, and I will see you in the second part. Thanks!