Welcome back to part two of my lecture on integrative modeling of biomolecular complexes, given in the context of the BioExcel online summer school. Now that you know everything about docking in general and docking with HADDOCK, we are going to move into application examples illustrating how you can use diverse sets of data and information to guide the modeling process. We will start with the modeling of antibody-antigen complexes.

As you are probably all aware, the antibody structure consists of two chains, a heavy and a light chain, and the action is happening in the head region: here you see the light chain in magenta and, in a more grayish color, the heavy chain. The loops in these regions are the loops that specifically recognize the antigen. There are six of them, called H1 to H3 for the heavy-chain loops and L1 to L3 for the light-chain loops. H3 is typically the longest and the most difficult to model because of its conformational variability. This is where the binding usually takes place, and that of course is knowledge we can use to guide the modeling process.

To benchmark how well this kind of information can guide the modeling, we used a small data set of 16 complexes extracted from Docking Benchmark 5. This benchmark is a reference in the field: it is what you use to measure how well your method is doing at predicting complexes. We decided to test different docking approaches; this is the work of Francesco, a former PhD student in my group. We used four pieces of software that are simple to use, some of them available as web portals, and that all have specific options to deal with antibodies. ClusPro and ZDOCK are two rigid-body, grid-based, fast Fourier transform docking programs; they have options to either define these hypervariable loops or mask regions from the search. LightDock is a program that uses a swarm-intelligence optimization algorithm and also has options to define residues that are important for the interaction. And then there is our own HADDOCK software. These programs allow you to use restraints either for the scoring, which is what ClusPro and ZDOCK mostly do, or to drive the modeling process, which is what HADDOCK and LightDock do.

We tested different scenarios that represent different levels of information on the system. For the antibody it is clear: we have the hypervariable loops where we know the binding takes place. We might not know exactly which amino acid within these loops actually contacts the antigen, but in general we have this information. On the antigen side things are more complicated; it is not always easy to predict where things are binding. So we have three different scenarios. In the first scenario we assume we have no knowledge of the binding site on the antigen, so we target the entire surface; for HADDOCK this means we define all solvent-accessible surface residues as passive. In the second scenario we assume some knowledge of the epitope on the antigen, and this knowledge is derived from the crystal structure, so it is an artificial set: we define all antigen residues that are within 9 Å of the antibody as being part of the binding site. So that is a reasonably well-defined, though loose, epitope.
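To make this loose-epitope scenario concrete, here is a minimal sketch of how one could extract such a definition from a reference complex using Biopython. The file name and chain identifiers are placeholders, and in practice you would still need to turn the resulting residue list into HADDOCK restraints (and, for the entire-surface scenario, run a separate solvent-accessibility calculation).

```python
# Sketch: list antigen residues within 9 A of the antibody, as in the
# "loose epitope" scenario above. File name and chain IDs are placeholders.
from Bio.PDB import PDBParser, NeighborSearch

model = PDBParser(QUIET=True).get_structure("complex", "antibody_antigen.pdb")[0]

antibody_chains = {"H", "L"}   # heavy and light chains (assumed IDs)
antigen_chains = {"A"}         # antigen (assumed ID)

antibody_atoms = [atom for atom in model.get_atoms()
                  if atom.get_parent().get_parent().id in antibody_chains]
search = NeighborSearch(antibody_atoms)

epitope = set()
for chain in model:
    if chain.id not in antigen_chains:
        continue
    for residue in chain:
        # keep the residue if any of its atoms lies within 9 A of the antibody
        if any(search.search(atom.coord, 9.0) for atom in residue):
            epitope.add((chain.id, residue.id[1]))

print(sorted(epitope))
```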
And then we have the ground truth, which is all the residues that make contacts at 4.5 Å. This is the perfect scenario: a perfect definition of the binding site on the antigen side. We tested all three scenarios with all four programs.

How do we measure the quality of docking models in this field? We use the criteria from CAPRI, the Critical Assessment of PRedicted Interactions, which are well accepted in the field. There are three measures. The first one is the fraction of native contacts. If you have a reference structure, a crystal structure, you can list the contacts made across the interface; for a docking model, you can then calculate how many of the contacts found in the reference are present in the model. The fraction of native contacts is the number of contacts common to the two, divided by the number of contacts in the reference. So if your fraction of native contacts is one, you have predicted all contacts correctly. The other measures require superimposing the model onto the reference, and there are two of them. In one case we superimpose on the receptor, the larger of the two molecules, and then calculate the RMSD on the ligand, the smaller of the two: that is called the ligand RMSD, or l-RMSD. For the other one, we fit the model onto the reference using the interface of both the receptor and the ligand, and calculate the RMSD over the interface. Because this interface RMSD is focused on the interface, the values you get are typically smaller; the ligand RMSD values are larger because changes in regions remote from the interface can lead to high RMSD values even if the interface itself is well reproduced.

To call a prediction high accuracy, we want at least 50% of the contacts correct, and the ligand RMSD or the interface RMSD should be below 1 Å. A medium-quality solution, two stars in CAPRI terminology, has at least 30% of the contacts correctly predicted and a ligand RMSD below 5 Å or an interface RMSD below 2 Å. And you have an acceptable solution if at least 10% of the contacts are correctly predicted and the interface RMSD is below 4 Å or the ligand RMSD below 10 Å. These are the metrics we are going to use throughout my examples.

And here are already the results of all the docking we have been doing: three scenarios times four different programs. Each row in this plot corresponds to a scenario: here no information on the antigen, so the entire surface of the antigen; here a loose definition of the epitope on the antigen; and here the true interface, the perfect case. Each column corresponds to a different program. On the x-axis of these plots is the number of top-ranked models you consider from each program, and on the y-axis is the success rate, for example when you only take the top model predicted by the software. The quality of the models is depicted by colors: dark green is high quality, within 1 Å interface RMSD; light green is medium quality, within 2 Å interface RMSD; and blue is an acceptable model. If you look at all the plots together, you immediately see that the HADDOCK column has the most dark green, and in general also the highest values of all. That is the first observation.
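To make the CAPRI criteria above concrete, here is a minimal sketch of the classification logic and of a top-N success-rate calculation, using exactly the thresholds quoted in the lecture; the function names and the toy numbers are just illustrative.

```python
# Sketch of the CAPRI quality classes described above (thresholds as quoted).
def capri_class(fnat, irmsd, lrmsd):
    """Return 'high', 'medium', 'acceptable' or 'incorrect' for one model."""
    if fnat >= 0.5 and (irmsd <= 1.0 or lrmsd <= 1.0):
        return "high"        # three stars
    if fnat >= 0.3 and (irmsd <= 2.0 or lrmsd <= 5.0):
        return "medium"      # two stars
    if fnat >= 0.1 and (irmsd <= 4.0 or lrmsd <= 10.0):
        return "acceptable"  # one star
    return "incorrect"

def success_rate(cases, top_n):
    """Fraction of cases with at least one non-incorrect model in the top N."""
    hits = sum(
        any(capri_class(*model) != "incorrect" for model in ranked[:top_n])
        for ranked in cases
    )
    return hits / len(cases)

# Example: one case with two ranked models given as (fnat, i-RMSD, l-RMSD)
print(capri_class(0.62, 0.8, 2.5))                                     # -> 'high'
print(success_rate([[(0.05, 8.0, 20.0), (0.35, 1.8, 4.0)]], top_n=2))  # -> 1.0
```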
So using the information directly in HADDOCK to drive the modeling process clearly benefits the generation of models. What you also see is that the more information you put into the system, or the better that information is, the better all the programs perform. Here, with the true interface, ClusPro and ZDOCK are also doing very well; in fact all programs do quite well if you select the top 100 models, but not all have the same scoring capabilities. LightDock, for example, does very well even without information on the antigen: its success rate goes above 60 to 65% in the top 100, but it is not able to single out those models because they do not score in the top 10, where it only reaches about 6%. HADDOCK reaches about 31% in the top 10 when targeting the entire surface, which is actually slightly higher than ClusPro, and ZDOCK reaches about the same level. As soon as we start adding information, things change: all programs benefit from the information, but in general HADDOCK is among the best performing, and we now generate high-quality models because we use the data directly to drive the modeling, which is not the case in ClusPro and ZDOCK. And if you have the perfect interface, HADDOCK has a 100% success rate even for the top-1 model, while the other programs only get there after the top 10, or the top 15 in the case of ZDOCK. So there is a clear benefit in using the information in the modeling process.

Those were single-structure statistics; we can also look at cluster-based ones. If we do clustering and assess the clusters, you see here, for HADDOCK only, the three scenarios: entire surface, loose definition of the epitope, and perfect definition. The x-axis now only goes up to the top 5, because clustering strongly limits the number of models you have to look at. After clustering, if you have good data you reach a 100% success rate, which was already the case for single models. With a loose definition of the interface we are above 50% with clustering if you look at the top 1, whereas with single-structure scoring we are below 50%. So clustering also helps the scoring. But if you look at the case with no information on the epitope, we have 25% with single-structure scoring and we are below that with cluster-based scoring, probably because it is more difficult to cluster structures when you have very little information. The message here is: if you have no information at all, it is probably better to do single-structure scoring with HADDOCK; if you do have information, then clustering really is the way to go.

If you want to read all the details of this story, and many more things, I refer you to this Structure article that was published last year. So, using information (in this case, on the antibody side, the knowledge of the hypervariable loops, plus some knowledge of the epitope if you have it) clearly improves the modeling of antibody-antigen complexes.
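A small aside on the cluster-based analysis just described: HADDOCK ranks clusters by the average score of a few best models of each cluster (the choice of four below follows that convention, but treat it as an assumption). A minimal, purely illustrative sketch:

```python
# Sketch: rank clusters by the mean score of their best members (lower = better),
# as in the cluster-based analysis described above. Data layout is illustrative.
def rank_clusters(clusters, n_best=4):
    """clusters: dict mapping cluster id -> list of model scores."""
    ranked = []
    for cluster_id, scores in clusters.items():
        best = sorted(scores)[:n_best]          # HADDOCK scores: more negative is better
        ranked.append((sum(best) / len(best), cluster_id))
    return [cid for _, cid in sorted(ranked)]

clusters = {
    "cluster_1": [-120.0, -115.0, -90.0, -85.0, -60.0],
    "cluster_2": [-140.0, -70.0, -65.0, -60.0],
}
print(rank_clusters(clusters))   # -> ['cluster_1', 'cluster_2']
```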
So let us move now to another application example, which in this case is modeling from mass spectrometry data. This is the work of Adrien Melquiond, a former postdoc in my group, and we are going to look at an assembly, a complex, that has to do with the bacterial circadian clock. A circadian clock is the reason why we have jet lag if we move, say, from Europe to the US: we have an internal clock that keeps track of day and night. Some bacteria also have such a system, and we are looking here at a cyanobacterium, which uses light as an energy source. It has a fascinating system: you only need three proteins from this bacterium to generate a molecular clock. You can overexpress those three proteins, called KaiA, KaiB and KaiC, add phosphate and ATP, and the clock starts ticking; that is all you need. How do you know the clock is ticking? There is a phosphorylation and dephosphorylation process taking place between these proteins, and you can monitor the phosphorylation state of the system by mass spectrometry, which lets you follow the frequency of your clock.

So MS in this case gave us information. For now we are only going to focus on the complex formed by KaiB and KaiC. From native mass spectrometry, where we look at the full complex, we know that the stoichiometry should be 6 to 1: six KaiB molecules binding to one KaiC. And from hydrogen-deuterium exchange experiments detected by MS, the binding interfaces of these proteins have been identified. You see here KaiB, the smaller of the two proteins; the blue regions are protected from H/D exchange when the complex is formed, and this defines one surface of the protein. We also have some mutagenesis data: these three amino acids, arginine and lysine residues, so positively charged amino acids, abolish or alter the binding when mutated. Here you see KaiC, the larger of the two components; it is a much bigger system, a kind of double donut. What you see in blue are again the H/D exchange data identified by MS: those regions are protected from exchange when the complex is formed. And you see that there is a six-fold symmetry: on the top we identify six binding sites, and at the bottom we also identify six binding sites. That would be twelve in total, but we know that the binding is 6 to 1, and we also know that the binding should happen either on the top or on the bottom, not as a mixture of the two. What is also interesting is that if you open up the donut, you see that there are also protection differences at the interface between the two rings, so there seems to be communication between the top and bottom regions of the system.

So what did we do? Since we have two candidate binding sites, we did two docking experiments, one targeting the top and one targeting the bottom. We did not dock six to one; we docked a single KaiB onto KaiC, restricting the docking to one region at a time. The top region is called the C2 region and the bottom region the C1 region. And this is the outcome of the docking: we get two sets of solutions, and based on the HADDOCK score we could not say whether the top (C2) solutions were better than the bottom (C1) solutions. These are the different clusters that we obtained. So we have two sets of solutions and, based on our scoring function, we are not really able to distinguish them.

But MS comes to the rescue, in this case ion mobility mass spectrometry. In ion mobility MS the native complex moves through the spectrometer, and you measure the time it takes the molecule to travel a fixed distance while flying against a flow of buffer gas. The hydrodynamic properties of the molecule determine the time this takes.
It is as if you were swimming in a swimming pool: if you put on a sombrero hat and go swimming, you will swim much more slowly than without a hat. That has to do with your hydrodynamic properties. So by measuring this drift time you can get information about the overall 3D shape of the protein. It is a fairly long extrapolation, but it is an experiment that is quite easy to do; it has been used, for example, to study the maturation of viruses, and it also tells you something about the arrangement of a protein complex. If you want to read all about it, I refer you to this Nature Protocols paper from 2008.

Those data were measured for this KaiB-KaiC complex, which brings us to the values you see here below the complexes. These are the values we predict from the models: we can back-calculate the collision cross-section, which is a surface area (you see the units are square nanometers), and compare these predictions to the experimental values, indicated by the dotted lines. The experiment tells us that the value should lie between 133 and 140 square nanometers, and if you look at the different solutions you see directly that this favors the C2 solutions, whereas the C1 solutions have too large a cross-section compared with the experimental data. Based on this, we proposed our best-scoring model of the C2 interface as being representative of this complex, and this was published in 2014 in PNAS.
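Just to illustrate the kind of filtering used here: a minimal sketch that checks back-calculated collision cross-sections against the experimental window quoted above (133 to 140 square nanometers). The model names and CCS numbers below are placeholders, not the values from the study, and a real back-calculation would be done with a dedicated CCS tool.

```python
# Sketch: keep only models whose back-calculated collision cross-section (CCS)
# falls within the experimental window (values in nm^2; numbers are placeholders).
CCS_MIN, CCS_MAX = 133.0, 140.0   # experimental range quoted above

models = {
    "C1_cluster_best": 150.2,     # placeholder back-calculated CCS
    "C2_cluster_best": 136.8,     # placeholder back-calculated CCS
}

for name, ccs in models.items():
    status = "consistent" if CCS_MIN <= ccs <= CCS_MAX else "inconsistent"
    print(f"{name}: {ccs:.1f} nm^2 -> {status} with the IM-MS data")
```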
Now, in 2017, our collaborators managed to obtain a cryo-EM structure of this complex, and what the cryo-EM model reveals is that C1 is the right interface. So we got it wrong. This happens; in research it will happen to you for sure, at least once, it is part of research. We should not hide these kinds of failures but try to learn from them, and there are different reasons why things went wrong here. One possible reason is that we are dealing with a non-globular system: if you look at KaiC, the protein has a hole in the center. In the spectrometer, when you do MS, the proteins are flying in vacuum, so if you have something non-globular it could well be that the system compacts in the vacuum inside the spectrometer. If that happens, what you measure is an underestimate of what you have in solution, and you would have to shift those two dotted lines upward, after which the green solution becomes the better-fitting one. That could be one explanation.

But there is another explanation, and that is that nature was also fooling us. When we did the modeling in 2014 we used a crystal structure of KaiB, a perfectly fine crystal structure. When the cryo-EM complex was published, there were actually two articles published back to back in Science, and one of those described another crystal structure of KaiB, but that structure was different: the fold was different, and this different fold is also what is found in the cryo-EM structure. It is exactly the same sequence, the same construct, but if you compare the two crystal structures, the one we used for our initial modeling had the ground-state fold, the GS fold, which is beta-alpha-beta, beta-alpha-alpha-beta. The same sequence was later crystallized in a different fold: the first part is the same, but the second part is completely switched, going from beta-alpha-alpha-beta to alpha-beta-beta-alpha. The same sequence, two different folds. This is not something you can really predict: we had a crystal structure, we trust that crystallography generates good models, and there was nothing wrong with that structure. It is simply one of those rare examples where a sequence can exist in different folds.

So now we take the correct fold of the protein and repeat the docking. In the meantime we had also improved our modeling capabilities in HADDOCK by introducing coarse-graining, meaning that we simplify the representation of the system: we group roughly four heavy atoms into one particle, so there are fewer particles to deal with, which also smooths the energy landscape of the surface. For this we use the Martini force field from the group of Siewert-Jan Marrink. We do the docking at the coarse-grained level and at the end transform the system back into a fully atomistic description; we support both proteins and nucleic acids, and this was described in these two publications.

Using this approach we now want to model the full complex: six KaiB molecules binding to KaiC, so a seven-molecule docking experiment. We use the same data we used in 2014, we apply C6 symmetry because the complex has this symmetry, and we do the seven-body docking with the coarse-grained implementation in HADDOCK. What you see here are the docking models superimposed onto the cryo-EM structure. It is not perfect, but there is actually quite a nice similarity to the cryo-EM structure, around 4.5 Å, and if you look at the centers of mass of the KaiB molecules in the model, they are very close to the centers of mass in the experimental structure. What is even more interesting is that the correct solution, the bottom one, now scores much better than before, whereas previously we could not distinguish between the two. In addition, with coarse-graining we get about a seven-fold speedup of the docking process. We did not use the cryo-EM density to guide the modeling, but we did use it to validate our model: if we fit our model into the cryo-EM density, using Chimera to do that, the correlation score of the model is 0.82, while the correlation score of the structure deposited in the PDB is 0.84. So the model scores slightly lower than what was experimentally determined, but remember that we did not use these data in the modeling process. It is a nice example of how using, first, the right structure, the right fold (which is not something you can really foresee) and, second, an improved methodology, coarse-graining, which allows us to model the full assembly, leads to the correct solution. So that was a lesson for us.
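Since the cross-correlation validation was just mentioned: here is a minimal sketch of an unmasked correlation between two density grids, assuming the model has already been converted into a simulated map on the same grid as the experimental one. In practice this was done with Chimera's fit-in-map tool; the numpy toy below only illustrates the formula.

```python
# Sketch: cross-correlation between an experimental map and a map simulated from
# a model, assuming both are numpy arrays on the same grid (alignment done beforehand).
import numpy as np

def cross_correlation(exp_map, sim_map):
    a = exp_map.ravel().astype(float)
    b = sim_map.ravel().astype(float)
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

# Toy example with random grids, just to show the call signature.
rng = np.random.default_rng(0)
exp_map = rng.random((32, 32, 32))
sim_map = 0.8 * exp_map + 0.2 * rng.random((32, 32, 32))
print(round(cross_correlation(exp_map, sim_map), 3))
```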
Since I was mentioning cryo-EM, this is a good way to move into the cryo-EM topic. Cryo-electron microscopy and cryo-electron tomography are really the new stars in structural biology, and you are probably all aware of what is happening there: you basically vitrify a sample and then image it with electrons. In recent years there has been a huge development in cryo-EM, in particular because the detectors have become much better, and because of that cryo-EM has moved into higher-resolution structural biology; I think the record is now around 1.6 Å resolution. What you get are 3D reconstructions: you have 2D images that you have to interpret, you work out the orientation you are looking at, and then you reconstruct a 3D object. If the resolution is high enough, you can directly fit your proteins and build the amino acids into the density, as you would do in crystallography. But there are still cases where the resolution is not sufficient, and then you have to rely on modeling tricks. Before 2013 the typical resolution was what we used to call blobology: maybe 10 Å, 8 Å or even 30 Å resolution, where you can see the shape of your molecules and complexes but cannot really distinguish which amino acid goes where. These days we get fully atomistic pictures, which is a truly impressive improvement, and the technique was rewarded with the Nobel Prize.

Now, if you do not reach a resolution high enough to build the model, which is what happened for many years, what has been done is to use existing structural information, crystal structures of the components of a complex, and try to fit those into the density to generate a 3D model of the complex. This still happens, because not all cryo-EM maps reach a resolution sufficient for building from scratch. You should also know that resolution in cryo-EM is not constant: it depends on where you are in the map. Some regions of the map may be at very high resolution and others at low resolution, so you might be able to build from scratch in parts of your map and have to rely more on modeling for other parts. The fitting is typically done one molecule at a time: you take one molecule of your complex, you search all translations and rotations, a bit like we do in grid-based docking, you measure the correlation with the electron density, and you find the locations where the molecule fits. But because you do that one component at a time, you never take into account the interactions between the components during the fitting: you do not take energetics into account, and flexibility is usually added afterwards.

So we wanted to see whether we could use HADDOCK, which uses energetics during the docking process, to improve those cryo-EM models and generate better, more reasonable views of the interfaces. This is the work of Gydo van Zundert, who implemented EM restraints in HADDOCK. The way we do this: HADDOCK uses CNS, the Crystallography & NMR System, a program used in NMR and crystallography to calculate structures of molecules, and it has a lot of restraining functions; the ambiguous distance restraints we use in HADDOCK are actually applied through CNS. CNS also has a way of describing density, because it is used for crystallography, so you can transform an electron density into a crystallographic representation and then use the energy functions in CNS to optimize against this density. Now, if you try to dock directly against the density, the system does not converge; the convergence is bad and it simply does not work.
Our way of solving this problem is to first identify in the density the most likely locations of the molecules and place centroids at their centers of mass; there are ways of doing that, and I will come back to this. Once we know where the molecules should end up (you might not know which molecule ends up where), we define distance restraints from the center of mass of each molecule to those centroids, and we use those distance restraints in HADDOCK, together with the electrostatic and van der Waals interactions, to bring the molecules together. This basically generates an initial model of the complex. Once this model has been generated, we switch on the energy term that represents the density and optimize against it, and these optimized solutions then go into the flexible stages of HADDOCK, where these energy functions remain active. So we can now incorporate EM data into the modeling process in HADDOCK and combine this information with any other type of data we might have to drive the modeling.

Now, how do we get those centroids? There were existing ways of doing this, but we wrote our own fitting software, called PowerFit. Next to the most likely location of the molecule in the map, PowerFit also outputs the coordinates of the centroids, the most likely positions of the centers of mass, and this is the input you can then feed into HADDOCK.

Here is one application example. We are looking at the 30S ribosomal subunit, with its 16S rRNA, and a protein called KsgA binding to it. There is an EM map available at 13.5 Å resolution, and we have the crystal structure of the ribosome and the crystal structure of KsgA. On the RNA side there are hydroxyl radical footprinting data telling you where the protein binds: the RNA is protected from the hydroxyl radicals by the presence of the protein, so this points to the binding site on the ribosome side. On the protein side we also have mutagenesis data: three residues have been mutated and shown to prevent the interaction, to prevent binding. That is all the information we have for this modeling exercise. Of course, a model of this complex has been deposited, PDB entry 4ADV. It looks beautiful as long as you only look at the backbone representation, but when you turn on the side chains and look at the interface you see a lot of clashes, because the fitting was done in a rigid-body fashion; everything shown in yellow here is a clash. If you look at the three mutations that are important for the interaction, one of them contacts the RNA but actually clashes, and the other two are not even in contact: only this one makes a reasonable contact, this one clashes, and this one is not contacting at all. So the model that was built based on the EM data does not really explain the other data we have on this system.
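To give an idea of how such interface clashes can be quantified, here is a minimal sketch that counts heavy-atom pairs from different chains closer than a cutoff; the 3.0 Å cutoff and the file name are illustrative choices, not the criterion used in the published analysis.

```python
# Sketch: count inter-chain heavy-atom contacts below a clash cutoff.
# Cutoff and file name are illustrative; hydrogens are ignored.
from Bio.PDB import PDBParser, NeighborSearch

CLASH_CUTOFF = 3.0  # Angstrom, illustrative

model = PDBParser(QUIET=True).get_structure("m", "docking_model.pdb")[0]
heavy = [atom for atom in model.get_atoms() if atom.element != "H"]

clashes = 0
for atom_i, atom_j in NeighborSearch(heavy).search_all(CLASH_CUTOFF):
    chain_i = atom_i.get_parent().get_parent().id
    chain_j = atom_j.get_parent().get_parent().id
    if chain_i != chain_j:            # only count contacts across the interface
        clashes += 1

print(f"{clashes} inter-chain atom pairs closer than {CLASH_CUTOFF} A")
```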
So can we do better? That is the question: can we use HADDOCK to generate a model that does not have all these clashes, all these bumps? Here is the result of this modeling. You see here the RMSD with respect to the structure deposited in the PDB versus the HADDOCK score, which now also contains the correlation between the model and the electron density, and you see that we basically get one unique set of solutions. This is the solution we obtain, and you can now zoom in on the interface and look, for example, at those three amino acids that were mutated: they are all making nice interactions. That does not per se mean they are correct, but at least the model explains why those residues, when mutated, prevent the binding. What is also interesting is that the model reveals additional key residues: there are two arginines we identified from the model that seem to play an important role in the binding as well. That gives you an additional handle to test the model: we could do mutagenesis on these and test whether this indeed changes the binding. And all those amino acids, the ones that had been mutated before but also the ones we identified, are actually quite highly conserved; you see here a map of conservation on the surface of the protein. So everything is consistent. We now have a way of supporting cryo-electron microscopy data in HADDOCK; it is also available from the web portal, and it can be combined with any other type of data that HADDOCK supports.

Now, for the last part of my story, we are going to move into membranes, and membrane complexes in particular. About 20% of the proteome consists of membrane proteins, but if you look at the statistics from the PDB, the number of unique soluble protein structures and the number of membrane protein structures are very different: we have a lot of structures of soluble proteins in the PDB and only a small number of membrane proteins. This has been increasing in recent years, so the situation is getting better, although it is still nowhere near the numbers for soluble proteins, and a lot of the new structures are of the same type, GPCRs in particular. Those membrane proteins are very important, because about 50% of drug targets are membrane proteins, and they come with their own challenges: they are difficult to study experimentally by NMR and by crystallography, and while with cryo-EM you start seeing structures of membrane proteins, the resolution in the membrane region can sometimes be problematic. So you also need methods to model the complexes that are formed in the membrane.

If you think about what happens in the membrane, the environment is different: inside the membrane we have a very hydrophobic environment, whereas soluble proteins are of course in water. When a complex forms in water, the interface regions are typically quite hydrophobic, because you want to bury hydrophobic regions when you are in solution; in the membrane, that is not going to give you much discrimination. So the membrane imposes somewhat different energetics on the association process, and it also imposes restrictions: you know that a protein cannot simply rotate by 90 degrees inside the membrane, and there are regions of the protein that want to be in the membrane and regions that want to be outside. This is topological information that in principle we can use.
Pretty much all docking approaches developed until recently were targeted and optimized for soluble proteins only, although there is some work from different groups on using membrane potentials. The first thing we did was to define a benchmark and identify complexes. It is a small benchmark, about 35 entries, but they are unique proteins with no overlap between entries. We tested HADDOCK on it with the standard protocol, without any membrane-specific tricks, and the data set is available from SBGrid, the SBGrid data repository, so if you are into scoring function optimization you can download it and try to optimize things. If you have data to guide the modeling, HADDOCK already does quite well; in the example I showed you at the end of my first part, you saw that you can do very reasonable docking using the knowledge of which loops are exposed to the solvent.

Still, we wanted to explore this field further, and one question we had was: can we explicitly account for the presence of the membrane in the docking process? For that we combined two docking approaches: LightDock, the swarm-intelligence-based docking approach I mentioned earlier, and HADDOCK. This is the work of Jorge and Brian; Brian and Jorge developed LightDock before joining my group, and we have now been integrating the two. LightDock is based on the glowworm swarm optimization algorithm. You have an energy landscape, and glowworms attract each other depending on the amount of light they emit; for docking purposes, the light is translated into some kind of scoring or energy function. Glowworms usually come in a pack, in a swarm, so you have many different glowworms, and you see here two swarms: this is one swarm, this is the other. The glowworms tend to move in the direction of other glowworms that have a lower energy: this one here is going to move toward that one, because it has a lower energy. It is similar to flocks of birds flying, where the birds follow each other very closely; the same happens here, but in a complex energy landscape, and we use that to sample this energy landscape. LightDock also has the option of defining specific residues important for the interaction, and that was used in this work.

So how does this protocol for membrane docking work? Since we are using crystal structures from the PDB, there is a database called MemProtMD in which these membrane proteins have been embedded into a lipid bilayer using a coarse-grained representation with the Martini force field; you can find the website of this database here. We extract the receptors from this database together with their membrane, and what we keep are only the phosphate beads of the membrane: these phosphate beads are going to define a boundary for the docking process. Now, if we know that we are targeting, for example, the extracellular loops, we position the initial swarms around those loops; each blue point here is a swarm of glowworms, and these glowworms are basically different starting positions and orientations of the ligand that we are going to dock. Then we run the LightDock optimization, and the models that come out of this optimization are clustered.
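Here is a minimal sketch of how such a phosphate-based boundary could be extracted and used, assuming a Martini-style coarse-grained PDB file in which the lipid phosphate beads are named PO4 and the membrane normal lies along z; this only illustrates the idea and is not the actual LightDock implementation, where the penalty is applied inside the sampling itself.

```python
# Sketch: derive a membrane slab from Martini phosphate beads (atom name PO4)
# and flag ligand positions that penetrate it. Assumes the membrane normal is z.
import numpy as np

def phosphate_z_range(pdb_path):
    """Min/max z of PO4 beads, read from a PDB-format coarse-grained file."""
    with open(pdb_path) as handle:
        z = [float(line[46:54]) for line in handle
             if line.startswith(("ATOM", "HETATM")) and line[12:16].strip() == "PO4"]
    return min(z), max(z)

def penetration_count(ligand_coords, z_min, z_max):
    """Number of ligand beads/atoms falling inside the membrane slab."""
    z = np.asarray(ligand_coords)[:, 2]
    return int(np.sum((z > z_min) & (z < z_max)))

# Usage idea: penalize or discard poses with many beads inside the slab.
# z_min, z_max = phosphate_z_range("receptor_in_membrane.pdb")
# print(penetration_count(pose_coordinates, z_min, z_max))
```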
Then, because LightDock is more of a rigid-body type of docking approach (it does have a flexibility option as well), we need to refine those models to remove clashes, and for that we use HADDOCK. So that is the pipeline. These are all the complexes we tested: out of the benchmark, those in which a soluble protein binds to a membrane receptor. You see here the results of the LightDock docking. In gray are the docking results without accounting explicitly for the presence of the membrane; explicitly means using those phosphate beads and penalizing solutions that penetrate this phosphate layer. Once we use this membrane topological information in LightDock, you see that the results improve dramatically in all cases: these are the alpha-helical complexes, the beta-barrels, the antibodies, and this is all complexes together, and again you have on the x-axis the number of models that you consider. So there is a great improvement from using the topological information of the membrane.

Now, those models typically have clashes at the interface, so what we do is use HADDOCK, and in particular its coarse-grained-to-atomistic transformation capability, to refine the models: we do not do docking in HADDOCK, we just use HADDOCK as a refinement tool. What you see here is again all the complexes, with the left column being the models ranked by score, and the color coding indicating the number of clashes: the red ones have more than 100 clashes at the interface. So all the rigid-body LightDock models have quite a large number of clashes, but after the HADDOCK refinement you see that they nearly all turn green, meaning we have removed the clashes without modifying the structure of the complex itself very much. So we have this new protocol that combines LightDock and HADDOCK, accounts explicitly for the presence of the membrane, and de-clashes all the models at the end with HADDOCK. This was published in Nature Communications at the beginning of this year; the slide still shows the bioRxiv reference, but it is out in Nature Communications now.

In terms of perspective, the HADDOCK machinery itself can already handle an explicit membrane, so we can do docking with an explicit membrane present, and this has been demonstrated in different papers by users of the server; we needed to change some parameters to allow it, but once we had done that it worked. In this work they used NMR data to study the binding of a protein to a nanodisc, and this is a simple test case we did of a small molecule docking into an ion channel with the membrane explicitly present. So we might be moving toward explicit membrane docking in the future; this is something we are benchmarking and developing now, so that is work in progress.

Now, to conclude and give you a little bit of perspective: I hope to have given you an overview of the docking methodology in general and to have convinced you that using data when you have them, using information to guide the modeling process, is a really useful thing to do. You have to realize that what you get out of this kind of modeling are models; these are not experimental structures. But the models are very valuable for generating new hypotheses and driving the experimental work, because you can build on a model to validate it. As such, information-driven docking, integrative modeling, is very complementary to the classical structural methods. So where is the field going?
What you see here is a picture that we created to celebrate the 50th anniversary of the Protein Data Bank, and it is an illustration of what I would like to call integrative structural biology of dynamic landscapes. You see this funnel with a cryo-EM microscope, a mass spectrometer and NMR magnets; you might be doing large-scale experimental assays; you combine all of that with software, and together you get a representation of macromolecular assemblies. But we will have to capture not just one model: many of those assemblies are dynamic, so I think the challenge for the future will be to describe the full landscape of those assemblies. You see here that the same assembly is shown in different conformations, more open here, more closed there, so we will not have only a single representation of those structures, we might have a movie of them; that is the challenge the field is moving toward. These kinds of models are not usually accepted in the Protein Data Bank, but the PDB is working on ways of representing them, and what you see here is PDB-Dev, which is a prototype archiving system for integrative models. So if you are working in this field and generating such models, you can deposit them there now, and in the future this will become integrated into the PDB.

With that I want to close. I want to thank the people in my group who have contributed to the many things I have been telling you about today; this is our usual group picture, in COVID times. We have support from different national and European projects, and of course BioExcel is very key to all the software development happening around HADDOCK. And here are pictures of former group members and software developers who have contributed, some of whom you have already seen in my slides. Thank you very much for your attention.