Okay, good afternoon, everyone. This is BioExcel webinar number 36. Today's webinar is about the prediction of protein-protein interactions in CAPRI. Our presenters are Shoshana Wodak from the Vrije Universiteit Brussel and Marc Lensink from the University of Lille. I'm your host on behalf of the BioExcel Center of Excellence, [unclear] at the University of Edinburgh. Before I hand over to our speakers, I first want to give you a little introduction to BioExcel. BioExcel, the Center of Excellence, is aimed at improving the performance, efficiency and scalability of key applications that are of interest to life science researchers. The applications we are focusing on in the project include GROMACS, CP2K and HADDOCK. We're also interested in workflows to improve the usability of these applications and associated pipelines. We're developing workflows, including incorporating the Common Workflow Language and COMPSs. As well as these activities surrounding the development and improved usability of applications, we're also providing training, in-depth support and consultancy to help promote best practice and to train end users in the optimal use of the applications and workflow tools.
So, with regards to today's session and today's presenters, we have Shoshana Wodak, who is currently a visiting group leader at the VIB-VUB Center for Structural Biology at the Vrije Universiteit Brussel. Shoshana was scientific director of the Protein Engineering team at Plant Genetic Systems in Belgium. She co-directed the Center for Structural Biology and Bioinformatics and has been a member of EMBO. She has also served on numerous panels and advisory committees, including for Horizon 2020, and in the US and Canada. She is serving on the Management Committee of CAPRI, which is of primary interest for today's session. As well as Shoshana, we have Marc, who will mostly be helping to answer questions at the end.
So, Marc is based at the Université de Lille. He's also on the Management Committee of CAPRI. He is team leader of the Computational Molecular Systems Biology group at the Institute for Structural and Functional Glycobiology. Finally, just to give a little idea of the partners that are included in the BioExcel Center of Excellence: it includes a number of universities, research institutes and consultancies throughout the European Union. So, I will now hand over to Shoshana to start her presentation.
So, hello everybody. I'm going to give the presentation on CAPRI, on the history of CAPRI and, hopefully, on what we expect in the future for this important endeavor. Just a short introduction. It's about protein-protein interactions. That's what CAPRI is about. And this is not a new area of research. It has been around for 40 years. We have been studying protein-protein interactions at the molecular level, using information from protein structures and dynamics. And we have also, more recently, been studying protein interactions at the cellular level, mostly on the genome scale, asking which proteins interact with which other partners. These are the big hairballs of networks that I'm showing here. So, we have been analyzing this on both sides, at the molecular and the cellular level, yet many, many questions remain unanswered.
So, this is the plan of what I'm going to cover. I'm going to cover some fundamental principles of protein-protein interactions, and what we have learned from X-ray structures about protein-protein interfaces, which is important to understand in order to know what we are doing. I'm going to give you the background and more recent aspects of protein-protein docking, which I call then and now. And the rest of the talk is going to deal with CAPRI, the Critical Assessment of Predicted Interactions.
So, protein-protein interactions: some of the fundamental principles that everybody needs to know in order to think about protein interactions, to develop software, and so on. First, binding affinities and rates. Genome-wide studies answer by yes or no the question: do proteins A and B form a complex? But protein-protein interactions are dynamic, and they are subject to the law of mass action. That's what this slide explains: protein A plus protein B forms a complex. You have a forward kinetic constant, the association rate constant, and a dissociation rate constant. And you have the equilibrium constant, which is the ratio of the two. The very important quantities are the equilibrium constant and the Gibbs free energy, which is proportional to the logarithm of the equilibrium constant and also depends on the standard state. These are the quantities that determine the association. In other words, the values of these quantities determine whether a complex is formed, given the component concentrations. So it is very key that the formation of a complex is concentration dependent. That is relevant to crystallography, where crystals usually form at very, very high concentrations of the components, and to NMR as well.
The dynamics and time scales of complex formation are governed by the rate constants: the association rate constant, which is bimolecular, and the dissociation rate constant, which is monomolecular. And then there are the time it takes to form a complex and the lifetime of the complex. The time it takes to form a complex is inversely proportional to the concentration of the reactant, and also inversely proportional to the association rate constant. The lifetime of the complex is inversely proportional to the dissociation rate constant.
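As a rough illustration of the quantities just described, here is a minimal Python sketch, not taken from the talk. It assumes a simple two-state A + B ⇌ AB scheme and a 1 M standard state; the function names and the example rate constants are hypothetical and chosen only for illustration.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def association_constant(k_on, k_off):
    """Equilibrium association constant (1/M): ratio of the association
    rate constant k_on (1/(M*s)) to the dissociation rate constant k_off (1/s)."""
    return k_on / k_off


def standard_free_energy(k_a, temperature=298.15):
    """Standard Gibbs free energy of association in kJ/mol,
    assuming a 1 M standard state: dG = -RT ln(K_a)."""
    return -R_KJ * temperature * math.log(k_a)


def fraction_bound(k_a, free_partner_conc):
    """Fraction of A found in the complex when the free concentration
    of partner B is free_partner_conc (M); shows the concentration dependence."""
    x = k_a * free_partner_conc
    return x / (1.0 + x)


def association_time(k_on, partner_conc):
    """Characteristic time (s) to form a complex: inversely proportional
    to both the partner concentration and k_on."""
    return 1.0 / (k_on * partner_conc)


def complex_lifetime(k_off):
    """Mean lifetime (s) of the complex: inversely proportional to k_off."""
    return 1.0 / k_off


# Hypothetical example values, for illustration only.
k_on, k_off = 1.0e6, 1.0e-3
k_a = association_constant(k_on, k_off)
print(k_a, standard_free_energy(k_a), fraction_bound(k_a, 1.0e-9))
print(association_time(k_on, 1.0e-6), complex_lifetime(k_off))
```

With these made-up numbers, the same pair of proteins is half bound at nanomolar partner concentration and almost fully bound at micromolar concentration, which is the concentration dependence the speaker stresses.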
So these are really important quantities to understand and to keep in mind. Now, protein-protein interactions in the cell span a very wide range of binding affinities and lifetimes. This is a table that was put together by Joël Janin and me a number of years ago, and it is really important. It shows you the time scales and the dissociation constants, which range from one molar to one picomolar, and it positions the different complexes you find in the cell, starting from oligomeric complexes on the right-hand side, where the dissociation constants are very small and the association constants are high. The lifetimes of the complexes range from days on the right to microseconds on the left. So you see that the specific interactions in the cell are more towards the right-hand side, and the non-specific interactions are those which have weak association constants and are very, very short-lived. So this is, again, important to keep in mind.
Now, what have we learned from analyzing protein structures in the PDB? These are studies that were done a while ago, and they still hold and direct our thinking about what the interfaces of these protein-protein complexes tell us. This is the graph that shows you the interface area of protein complexes, for what is now considered a small number of protein complexes from the PDB that were analyzed. And why surface area? Surface area is important because the buried area is one of the major driving forces that bring two proteins together. The hydrophobic contribution to the Gibbs free energy of association is more or less proportional to the interface area.
So here is an analysis of known protein complexes, which tells you, for example, where the antibody-antigen complexes come, and this is in the middle here, around a buried surface area of 1,700 angstrom squared. And then you have the protease-inhibitor complexes, which span a wider range. This analysis allowed us at the time to determine what a standard interface, or recognition module, would be. The standard interface for biologically relevant complexes is about 1,600 angstrom squared. It has a good number of hydrogen bonds, the interface comprises about 24 amino acids, the interface atoms include so many buried atoms, and so on. So it really gives you a picture of what a recognition module would be.
Now, other studies have also shown that in interfaces between proteins, either in homodimers or in hetero complexes, the residues of the interface make somewhat different contributions to the interaction. In other words, if you open up the interface and look at the surface that actually interacts with the other subunit, which I show here on the left-hand side, you see a region in red, which is the core of the interface, and it comprises residues which contain at least one fully buried interface atom. And then you have, in blue, the rim residues, and these are residues where all the atoms remain accessible to solvent at some level. The properties of these residues are somewhat different, and an analysis of the properties of the core residues, which was done much more recently, in 2012, by Levy and Teichmann, showed that the amino acid composition of these core residues is actually quite different from that of the surface.
And then you have here what they call the stickiness scale, and that's quite interesting because it can be used to recognize interfaces to some level. So, for example, it shows you here that an arginine is actually quite well accepted in the core of the interface, whereas the less sticky amino acids are ones like lysine, and the more sticky amino acids are phenylalanine, isoleucine or cysteine. So this is what we learned. Another interesting finding, which is very relevant to CAPRI, is that it is not easy to single out specific interfaces, which correspond to biologically relevant complexes, from non-specific interactions. As an example of non-specific interactions, the studies that I list here at the bottom took the interfaces in crystal structures. These interfaces have not been selected by nature; they form because you increase the concentrations, and there are many contacts that form in the crystal. So the study that was undertaken at the time was to compare the buried areas in true, biologically relevant complexes with those found in crystal interfaces, and this is the graph that is presented here. The bar graph is the interfaces of true, biologically relevant complexes that I showed a little bit before, where you see that the peak of this bar graph is around 1,600 angstrom squared; that was the recognition module that I highlighted before. The green graph peaks at much smaller values, at buried interface areas around 600 angstrom squared, and those are the small interfaces that you find in the crystal. So the true biological interfaces are much larger than crystal interfaces, except that you sometimes have overlap between these two distributions, which is also shown here.
So the larger interfaces in crystal contacts are mainly associated with two-fold symmetry axes, and sometimes they correspond to true interfaces and sometimes not. That's part of the problem of identifying, from crystal structures, the true interfaces from the non-specific ones. And this problem is actually currently not really solved, and this is what I present in this slide. So how does one identify the biological unit from crystal structures? Some computational procedures are available to do this, and two that I list here are PISA and EPPIC. PISA is a computational procedure which evaluates the physical and structural properties of interfaces in the crystal, looks for interfaces which are more likely to be stable interfaces of complexes, and identifies the biological unit based on that. The other procedure, EPPIC, uses mostly geometric measures and sequence conservation to identify interfaces that have been selected by evolution rather than just haphazardly by the crystallization procedure. A more recent approach is QSalign. If you have a crystal and you want to identify the biological interface, this procedure relies on the information that is already available in the PDB on related structures and related quaternary arrangements of structures. So you do global superimpositions, global alignments of the biological unit that you think is in the crystal: you align it to what exists in the PDB, and you try to see whether the same quaternary arrangement is found in related structures. And this is how you decide whether your predicted biological unit is reasonable or not.
That's also something which is published, and it's very helpful: it recently helped to reassign more correctly 95% of the assemblies in the PDB. So, this is the background on the kind of challenges you are facing when you want to predict or model protein-protein interactions.
So, what about protein-protein docking? I would like to give a short history and then show where we are now. Protein-protein docking, what is it? It was defined at the time, and is still defined today, though more loosely, as deriving the atomic model of a protein assembly from the three-dimensional coordinates of the components. In other words, you need to know the components, and these components in principle are stable on their own, but if they are available at high enough concentrations, or if they can interact with high enough affinity, there is a particular orientation and position, a particular interface, that they form and that is biologically relevant. This is what I show here. Docking is to find out which part of the receptor, on the left-hand side, and which part of the ligand, on the right-hand side, will really recognize each other and form a stable association. As I say at the bottom, it applies to assemblies for which the individual components exist in free form. As defined then and still today, the challenge of protein-protein docking has two important components. One is efficient sampling of the rigid-body degrees of freedom and of the alternative conformations that the subunits can adopt upon association. That's very key. The other key component is to identify the stable association modes among all these sampled poses. And since you sample many poses, you need to identify stable association modes from a very large ensemble of docking solutions.
So, to be able to do that, you need to find the needle in a haystack. You need robust and reliable scoring functions, scoring or energy functions, that will tell you whether a particular association mode is really likely to be stable. Protein-protein docking really started a while ago, and these are the people who have been very influential. Cyrus Levinthal is the father of the protein folding paradox, but he was interested in protein docking as well, and he taught me a lot. He was my PhD advisor at the time, and I worked together for many years with Joël Janin. The first docking calculations were done at the CECAM workshop of 1976, where the first molecular dynamics simulations were also carried out. Everybody was doing molecular dynamics and I was doing docking. The first docking calculations were done at Columbia University, with computers that were very powerful at the time and very huge, IBM 360/91s, and with graphics that by today's standards were very, very basic, but this is how it all started. We used the computer program that was developed in the lab of Cyrus Levinthal, which was the only one at the time, together with another important effort at Princeton. The very first docking calculations that were carried out in the lab, I think, were docking a small molecule, a dinucleotide, into the active site of the enzyme RNase S. We had to do this because we couldn't really see the phosphate position of the CpA, because it was exchanging with the sulfate when the crystal was being prepared. So, we had to derive a docking procedure, but the docking programs worked only on internal degrees of freedom, in other words, on phi and psi angles and chi angles.
So, we had to devise pseudo atoms that were positioned at particular positions in space, so that a translation was actually a rotation where the axis of rotation was very far away. This is what we had to do, but it worked very well, and we managed to change the rigid-body positions and also the angles of the nucleotide as well as the side chains. So, that's the first kind of flexible docking that was done. Now, because this was really exciting, we then extended it to protein-protein docking using polar coordinates. We were interested in docking two hemoglobin molecules to one another in order to build a hemoglobin fiber. The docking calculations at the time were carried out as a function of six degrees of freedom. We had two Eulerian angles for one molecule, shown as a sphere on the left-hand side, and two Eulerian angles for the second molecule, and as you move these angles, you change the relative positions of the two spheres. Then you had a spin around the axis that links the two centers, and we had to sample the distances, which is shown on the right-hand side: you had to optimize the interaction in 1D along this inter-sphere distance. For proteins at the time, the computations were taking a very, very long time, so we had to use a very simplified model where there was one interaction center per residue, and that was the model that had just been derived at the time by Michael Levitt. So, we used this simplified interaction center, and we took a repulsive term, shown here as E non-bonded, which is Rij0 divided by Rij, to the power of 8. And then there was a very, very simple solvation energy that also allowed us to evaluate the buried area.
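The simplified model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original program: it uses a single hypothetical reference distance `r0` for every residue pair (the talk's Rij0 may be pair specific), it leaves out the solvation term, whose form is not given, and `scan_separation` is a hypothetical helper that mimics the 1D scan along the inter-center axis.

```python
import math


def repulsive_energy(centers_a, centers_b, r0=5.0):
    """Sum of (r0 / r_ij)**8 over all pairs of residue interaction centers,
    one center per residue. r0 is an assumed, pair-independent value."""
    energy = 0.0
    for a in centers_a:
        for b in centers_b:
            energy += (r0 / math.dist(a, b)) ** 8
    return energy


def scan_separation(centers_a, centers_b, axis, distances, r0=5.0):
    """Translate molecule B along a unit vector `axis` by each distance and
    return (distance, repulsive energy) pairs: a 1D scan of the separation."""
    profile = []
    for d in distances:
        shifted = [tuple(c + d * u for c, u in zip(b, axis)) for b in centers_b]
        profile.append((d, repulsive_energy(centers_a, shifted, r0)))
    return profile


# Toy example: one center per molecule, B moved along the x axis.
for distance, energy in scan_separation([(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)],
                                        (1.0, 0.0, 0.0), [4.0, 5.0, 6.0, 10.0]):
    print(distance, energy)
```

In a full calculation this scan would be repeated for each combination of the five angular degrees of freedom, and the repulsive term would be balanced by the buried-area term mentioned in the talk.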
So, with this very simple model, we docked for the very first time BPTI, the bovine pancreatic trypsin inhibitor, to trypsin. We actually sampled the entire surface of BPTI, but we limited the search region to the active site of trypsin. On the left-hand side, you see a bar graph of the buried surface area as we were docking different parts of BPTI into the active site of trypsin. It's not really energy, it's actually the buried area. You see that the buried area of the native is the highest. And then you have two other poses of BPTI relative to trypsin where the buried surface area is also high. These are the positions labeled as intermediates, I1 and I2. When you look at the BPTI surface, which is on the lower right-hand side, you see that there are a number of surface regions, which I highlight in blue, that actually give very nice buried surface areas with the active site of trypsin. The only reason for that is that the surface complementarity is not so good, but it's better than everything else. It's not as good as the native, but it still fits more or less. What you're seeing here on the bar graph is the energy landscape of the interaction. And that was the first time that such a landscape was produced.
So anyway, this was a long time ago. For a while no one was interested in protein-protein interactions, but years later things started to move again. In more recent years, the modeling of the three-dimensional structures of protein assemblies has become more and more integrated. Nowadays it involves much more template-based modeling at several levels, docking, conformational sampling and scoring.
And it is supported by a vibrant community of methods developers. These developments have really been stimulated by CAPRI and by CASP, and I will talk about that a bit more. But just to show you the evolution of the whole field: in protein-protein docking as I just described it, and even when we started CAPRI, which will come in a moment, we had to start from the unbound structures of the two components. The idea is that you need to know what the three-dimensional structures of the components are, and then you do the docking. This is what I just showed you and what it was for a number of years. Later on, as the PDB got populated more and more, we got more and more examples of homologues which had a similar structure to the structures that you wanted to model. Hence, we started what we call today template-based modeling of individual subunits. In other words, to model the complex, you didn't need to know the unbound structure. You just needed to know the sequence. Then you could search the PDB for homologues, build a homology model of your components, and then dock these components to each other. That's very much what is still done today in many, many circumstances. Even later, the PDB became more and more populated with complexes, although this has never really caught up with individual subunits. You could then also have cases in which the complex that you want to model has homologues of the entire complex in the PDB. Hence, you could actually model the interface as well as the individual subunits. So this is template-based modeling of complexes.
And even later, and more what happens today, you combine all these approaches into what you probably know well in BioExcel: integrative modeling of large protein assemblies, or integrative modeling of more complex biological components. So it's combining different things: comparative, homology-based modeling with docking and fitting, also using proteomics data and density maps from electron microscopy, and building these much more complex assemblies. This is how everything is coming together, and this is the picture in which we operate nowadays in CAPRI.
But now let me go into CAPRI and how CAPRI and CASP have contributed to all this. CASP, the Critical Assessment of Structure Prediction, and CAPRI, the Critical Assessment of Predicted Interactions, both played a very, very crucial role in fostering the progress in the field that I just summarized. They were also very important for building the communities. So, going more deeply into CAPRI: CAPRI is a community-wide, double-blind experiment. It was launched in 2001, so a while ago, but after CASP, and it was really modeled after CASP. It aims at assessing the performance of protein docking and scoring algorithms, but now we can say that it assesses the performance of predicting protein assemblies. It's about the prediction of the structure of an unpublished protein-protein complex, or protein-DNA or protein-RNA complex. Depending on the number of components, you can have two or more components, and it also covers protein-peptide complexes. It has also occasionally been extended to the prediction of binding affinity and of interface water, as well as, as I'll show you, the prediction of just interfaces.
So the CAPRI management committee is listed here. Joël Janin was directing CAPRI a while ago. He left CAPRI, I think, around 2013. These are the people currently involved and really contributing to running the challenge. What is special in CAPRI, and differs from CASP, is that the prediction rounds are held on a rolling basis. The typical number of predictors is smaller than in CASP; it's about 30 to 40 per round, which is already high. We have a number of docking servers that are participating, sometimes 15, sometimes more, and that was in 2018. The total number of rounds that have been held is 47, and the number of targets to date is 162. It's much less than CASP, because of the paucity in general of complexes, of targets that we can get, but hopefully this is going to change.
Now, what is the CAPRI experiment? We still call it a docking experiment, even though, as you heard, sometimes it's more template-based modeling than docking. The crystallographers, the structural biologists, and now also electron microscopists, submit the atomic coordinates of a target complex before it's published. Again, as I said, it's on a rolling basis: whenever a target becomes available, whenever someone has something interesting that they can offer, but we are also soliciting these targets. The predictors are provided with the sequences of the protein subunits and asked to return 100 models of the complex, but they only need to rank what they think are the best models; only five of them have to be ranked. A long time ago, only unbound structures were provided. Now we only provide sequences. The assessing team has always been the same team of assessors, and that's also different from CASP.
So they are given the three-dimensional structure of the target in confidence and all the structures of the predicted models. They have to establish the correspondence between the target and the model using established quality assessment criteria. This is very, very important, and I'll summarize these criteria in a moment. The identity of the predictors is withheld from the assessors, and the performance of a particular group is ranked on the basis of the five best predicted models.
Now, a number of years ago already, we added a scoring experiment. As you remember, I said at the beginning of the talk that docking has two components, the sampling and the scoring. Some groups are really good at sampling, and some groups are much better at scoring. So we decided to allow people to score the models submitted by others. In the prediction round, there is a deadline for submitting predicted models. After all the hundred models have been submitted, we offer these models to scorers, who identify those models which are likely to be correct. So the predictors submit their hundred models, often totaling about 3,000 models per target on average. The CAPRI management shuffles and combines all these models while keeping track of their origins, the scorers are given access to the shuffled set and asked to return their five best models, and the assessors evaluate the models as in the classical experiment. So these two experiments are really run one after the other, the scoring experiment after the prediction experiment. We have also now had three CASP-CAPRI assembly prediction rounds, in 2014, 2016 and 2018.
One was very recent, with a large number of targets. These prediction rounds are run over a period of about one to one and a half months during the summer, and it's quite a tight schedule for the predictions. Overall, we have had CAPRI evaluation meetings, and we have a number of special issues of Proteins describing the results. To date, as I said, there have been 47 prediction rounds and 162 targets, and the results were presented at seven evaluation meetings; the seventh one was just held in April at the EBI.
So the CAPRI assessment protocol is kind of a science in itself. It's not completely straightforward to say how good a prediction is. It was never straightforward, even for protein structure prediction, but for complexes it's also complicated. This is a summary of what we've done; of course, if you have questions, it has been published, so you will find the answer there. First of all, you have to define the residues present in all the predicted models. You have to align the sequences in the models to the target, and you have to reject models where the sequences differ too much. I'm not giving you the exact protocol and how it's run, just summarizing what we are looking at. Then you evaluate a rough estimate of the accuracy of the three-dimensional structure. It's the backbone RMSD of the model versus the target, usually for the individual subunits. And for evaluating the actual complex, these are the standard CAPRI evaluation measures. You evaluate multiple interfaces in the model complex and in the target.
So for each interface, let's say, on the right-hand side, you have the target. You count the number of residue-residue contacts, defined in a certain way, the number of ligand interface residues and the number of receptor interface residues, and you also need to define the interface in a particular way. Knowing this, you then superimpose the receptors of the models onto the receptor of the target. Then you evaluate the position of the ligand relative to the superimposed receptor, and this gives you an idea of the displacement of the ligand component: the angular displacement and the distance, which I'm showing here as theta L and dL. You also evaluate the number of recalled native contacts and the number of additional contacts which are not native, and we also evaluate the root mean square deviation of the interface residues themselves. So these are the various quantities that you evaluate. This slide is also just to illustrate that if you have a certain assembly that is predicted, you look at the individual interfaces, you look at the overlap of individual interfaces in the model with interfaces in the target, and, right now, you select the best predicted interfaces. We are not completely happy with this protocol yet. We need to find something better to evaluate the full predicted assembly relative to the full assembly of the target, which is still an open problem. Now, in doing all this, in evaluating the assembly in the predicted model relative to the assembly of the target, sometimes it's not obvious what the assembly of the target really is.
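The contact-based quantities described above can be illustrated with a minimal Python sketch. It is not the CAPRI assessment code: the 5 Å atom-atom cutoff, the dictionary layout of the input and the function names are assumptions made for illustration, and the model is assumed to have already been superimposed onto the target receptor, so the superposition step is not shown.

```python
import math


def interface_contacts(receptor, ligand, cutoff=5.0):
    """Residue-residue contacts: pairs (receptor residue, ligand residue)
    with at least one atom pair within `cutoff` angstroms (assumed definition).
    Inputs map residue identifiers to lists of (x, y, z) atom coordinates."""
    contacts = set()
    for rec_id, rec_atoms in receptor.items():
        for lig_id, lig_atoms in ligand.items():
            if any(math.dist(a, b) <= cutoff for a in rec_atoms for b in lig_atoms):
                contacts.add((rec_id, lig_id))
    return contacts


def contact_fractions(native, model):
    """Fraction of native contacts recalled by the model, and fraction of
    the model's contacts that are not native."""
    f_nat = len(native & model) / len(native)
    f_non_nat = len(model - native) / len(model) if model else 0.0
    return f_nat, f_non_nat


def ligand_rmsd(native_coords, model_coords):
    """RMSD over matched ligand atoms, assuming the receptors have
    already been superimposed."""
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(native_coords, model_coords))
    return math.sqrt(squared / len(native_coords))


# Toy example with one atom per residue (hypothetical coordinates).
receptor = {"A1": [(0.0, 0.0, 0.0)], "A2": [(10.0, 0.0, 0.0)]}
native_ligand = {"B1": [(3.0, 0.0, 0.0)], "B2": [(12.0, 0.0, 0.0)]}
model_ligand = {"B1": [(3.0, 0.0, 0.0)], "B2": [(30.0, 0.0, 0.0)]}

native = interface_contacts(receptor, native_ligand)
model = interface_contacts(receptor, model_ligand)
print(contact_fractions(native, model))
```

In this toy case the model recovers one of the two native contacts and adds no non-native ones.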
So, in some cases, the difficulty is to really find what the biological unit of the target is, a problem that I mentioned before. Sometimes the assessors have to go back to the crystal contacts to find some answers. One example, on the left-hand side: in A, we have a complex in the crystal where the membrane localization domains are actually interacting with each other. In B, there is another set of contacts from the crystal structure, identified by the assessors. It is a different interface, and it positions the membrane localization domains in a parallel orientation, which seems to fit into a membrane. Which biological unit is correct is not known, but at least this is something that we have to look at. On the right-hand side, in C and D, you have a similar situation: you have a dimer of similar subunits and two smaller subunits interacting on both sides of a particular dimer interface. But if you look at the crystal, you find a much more globular arrangement where the turquoise and the yellow subunits really interact with each other. So it is not completely straightforward.
Anyway, CAPRI evaluation: the scoring and ranking of models. What CAPRI has been doing over the years is being quite lenient, using ranges to define how good the interface is rather than a quantitative, continuous measure. For a high-quality model, you have a good interface where the fraction of recalled native contacts is at least 50%, and the ligand RMSD is smaller than one angstrom or the interface RMSD is smaller than one angstrom.
And acceptable interfaces are interfaces where the recall of the native contacts is at least 10%, which is not very much, and the ligand RMSD can be between five and ten angstroms and the interface RMSD can be between two and four. Also, models with too many clashes are not considered. More recently we have been looking into a more continuous score, which is a function of these different quantities, because these quantities have been very useful. But if you use thresholds like that, you are not always being very fair to people. This is something we have been looking at recently, and it is published; I forgot to put in the reference.
The targets in CAPRI have evolved a lot. There were easier targets in the early days, and more specific types of complexes. They have evolved, and on the left-hand side you see some examples. For the later targets, what we are showing every time is the receptor as a reference, and then a cloud of points, which are the centers of mass of the ligands as predicted by all the predictors taken together. Then maybe you can see a little bit of green, a little bit of red: this is where the correct position of the center of mass of the ligand in the target is. In some cases the predictions are pretty good around the red point, but there are always many decoys or incorrect poses around it. Recently there have been some success stories and some failures. In this case, target 95 was an interesting complex with a nucleosome that was correctly predicted.
You see the complex is positioned here, near the region where you see a little red and a little turquoise, and this was a successfully predicted complex. There are other cases, for example targets 96 and 97, where you see a red receptor and two positions of the ligand: one is the correct one and one is the best predicted one, and they don't overlap. Targets 99 and 100 are interesting cases of ternary complexes, where the red subunit adopts a number of conformations depending on what the interaction partner is. These changes are very difficult to predict, and usually people fail to do so. This is the evolution of the targets overall, which is not really crucial, but the number of targets has grown quite a bit in recent years.
So what have we seen during these years? Docking has become a very active field of research, and new groups are regularly entering the field: groups from China, from Japan, from other regions, from Eastern Europe as well. The performance of docking methods has remained quite robust despite the increasing complexity of the targets. The variety of the targets has increased, and the size of the assemblies offered as targets has increased. It is hard to say that the docking methods have systematically improved as such, but you can definitely say that they have been able to meet the challenges as they come. The scoring functions have improved, as we have seen in the CAPRI scoring experiments.
And what is really interesting, and actually tells us that progress has been achieved, is that the performance of automatic docking servers has much improved. This is fostering a wider use of docking algorithms by the scientific community, and this has been quite remarkable. That was written maybe two years ago. Nowadays you have many more servers, and about four or five servers are really quite reliable. They perform nearly on par with human predictors in CAPRI.
I will just show one or two more slides on some recent results on CASP-CAPRI prediction performance. Here each column represents the performance, based on this continuous score, which is called the DockQ score, on particular interfaces of the targets. So we are still evaluating targets based on how good the interface is, and the colors here are based on the CAPRI criteria: red is a high-quality prediction, green is a medium-quality prediction, blue is acceptable and yellow is incorrect. For easy targets, those that can be readily modeled using sequence information, the predictions are really quite good. This gives you an idea of what the performance is when you have enough information on homologues, and sometimes even on complexes. But for difficult targets, where little information is available for the individual components, let alone the complex, it really depends. Some of these more difficult targets are assemblies with more than one interface, and some interfaces in these assemblies are predicted better than others; these are the red bars and the green bars. So this is an overview over a limited number of recent targets, about 20, and forty-something interfaces.
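The threshold-based CAPRI categories and the continuous DockQ score mentioned above can be sketched in Python as follows. The high-quality and acceptable thresholds follow the values quoted earlier in the talk. The medium-quality thresholds (at least 30% of native contacts, ligand RMSD within 5 Å or interface RMSD within 2 Å) and the "or" combination of the two RMSD criteria for the medium and acceptable classes are assumptions, since the talk does not state them explicitly. The DockQ formula and its scaling constants (1.5 Å and 8.5 Å) are given as recalled from the published definition of DockQ, not from the talk, and should be checked against that publication. The function names are hypothetical, and the clash filter is omitted.

```python
# Colors used on the slide for each CAPRI category, as described in the talk.
CAPRI_COLORS = {
    "high": "red",
    "medium": "green",
    "acceptable": "blue",
    "incorrect": "yellow",
}


def capri_class(f_nat, l_rmsd, i_rmsd):
    """Classify one predicted interface from three quality measures.

    f_nat  : fraction of native contacts recalled (0 to 1)
    l_rmsd : ligand RMSD in angstroms after superimposing the receptors
    i_rmsd : RMSD of the interface residues in angstroms
    """
    if f_nat >= 0.5 and (l_rmsd <= 1.0 or i_rmsd <= 1.0):
        return "high"
    # Medium thresholds are an assumption; the talk does not quote them.
    if f_nat >= 0.3 and (l_rmsd <= 5.0 or i_rmsd <= 2.0):
        return "medium"
    if f_nat >= 0.1 and (l_rmsd <= 10.0 or i_rmsd <= 4.0):
        return "acceptable"
    return "incorrect"


def scaled_rmsd(rmsd, d):
    """Map an RMSD onto (0, 1]: 1 at zero deviation, decaying with scale d."""
    return 1.0 / (1.0 + (rmsd / d) ** 2)


def dockq(f_nat, l_rmsd, i_rmsd):
    """Continuous score: mean of f_nat and the two scaled RMSD terms."""
    return (f_nat + scaled_rmsd(i_rmsd, 1.5) + scaled_rmsd(l_rmsd, 8.5)) / 3.0


# Made-up example values, one per category.
examples = [(0.8, 0.9, 0.7), (0.4, 4.0, 1.8), (0.15, 9.0, 3.5), (0.05, 20.0, 8.0)]
for f_nat, l_rmsd, i_rmsd in examples:
    label = capri_class(f_nat, l_rmsd, i_rmsd)
    print(label, CAPRI_COLORS[label], round(dockq(f_nat, l_rmsd, i_rmsd), 3))
```

A continuous score of this kind avoids the unfairness of hard cutoffs: two models on either side of a threshold get similar scores instead of different labels.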
And these are the results for the same interfaces, but measuring how well the individual subunits have been predicted in terms of the three-dimensional structure of the subunit: that is the root mean square deviation of the backbone. For the easy targets these deviations are really low. For the difficult targets, some components are very poorly modeled, and this influences the result for the modeling of the assembly.
As I said, in CAPRI we have been extending the scope of what we look at. One analysis that we did, and did again this year, is the prediction of the actual interface residues: not how well the contacts of a given interface are predicted, but how well the residues of the interface are predicted by the modeling and docking that the predictors did. We find that many of the incorrect models, based on the CAPRI criteria, actually correspond to reasonably correct interfaces. In other words, you predict the region of the protein that interacts, but you do not predict the exact relative position of the two components correctly. So the prediction of interface residues seems to be easier than the prediction of contacts. That is not always true, but at least this was the result that we got a while ago.
In some cases, for the complexes that were provided as targets, the authors asked: can you predict the positions of the interface water molecules? This is something that we asked the predictors to do. They derived or inferred positions of water molecules, and then we compared these positions to the actual positions in the interfaces.
These were interesting complexes. Certain complexes had a number of homologues in the PDB that the authors knew of, and they knew which water positions were conserved in these complexes; these were the water positions that the predictors predicted much more successfully. Another area that has seen quite a bit of development recently is the prediction of protein-peptide complexes, and here you see an example of a very nice prediction of a peptide, which was very nicely positioned relative to the complex in the target.
Okay, a quick summary of the methods, because everything really depends on the methods. This is something that we discuss a lot in CAPRI meetings, and people in the CAPRI community exchange a lot of information on it. For the generation of models, of course, you can use homology modeling; sometimes you use in-house procedures, sometimes public servers like MODELLER and SWISS-MODEL. In the joint CASP-CAPRI experiments, people use models submitted by the CASP community during the CASP round. Then there is template-based modeling, which is used increasingly for homo-oligomers; in other words, you find templates much more commonly for homodimers and homotrimers than for hetero-complexes. And then of course you need to do ab initio docking in cases where you don't find templates. There you have a number of methods that vary depending on the group: fast rigid-body sampling followed by refinement with conformational flexibility, and data-driven docking integrating functional, biochemical and biophysical information, which is something that HADDOCK uses a lot, and now other groups as well.
And then you have a whole panoply of scoring functions with all types of terms in the function: shape complementarity, geometric matching or geometric hashing, van der Waals potentials, electrostatics, hydrophobic potentials, desolvation; some of them are similar but just called differently by different people. Then you have hydrogen-bonding potentials, experimental restraints, as I just mentioned, rotamer probabilities, Voronoi volumes, and some machine learning also, although not really recent. What is important is that, because CAPRI has expanded into different areas, the methods were adapted to handle diverse types of complexes, such as protein-peptide, protein-RNA, protein-DNA, protein-oligosaccharide and so on. As these new targets are being offered, people really amend their methods, and that has been a main motor for progress in the field.
So that is CAPRI, and now a few slides on where we are going. Not long ago, together with Marc, Alexandre Bonvin and the CAPRI community as a whole, we opened a web resource called CAPRI-docking, capri-docking.org. This is a portal and GitHub for assembly modeling software that is being developed by the CAPRI community, and it is not only software; it is also benchmark data sets, docking servers and all this. You can have a look and see what we offer there. This is also a portal for communication with the community at large. It is just the beginning, so it has not been there for a long time, but it is being developed. As I mentioned, an important development is the performance of the docking servers, and on CAPRI-docking you can find the performance of these servers as our challenges evolve.
And that is really important. The idea is to tell people which server is better at what, but for the time being these are the servers that really perform well in CAPRI.
So, where are we at? What we can say, across the years and especially more recently, is that the calculation of binding free energies is still not performing optimally. This concerns the force fields, the scoring functions, the conformational sampling, and the paucity of reliable structure-affinity benchmarks. All this needs to be improved: the force fields themselves and the scoring functions; the conformational sampling is also important, because you need it for free energies. And what you also need very crucially is reliable structure-affinity benchmark data, because this is what helps you improve your computational procedure. So all this is still not optimal. We still can hardly discriminate binders from non-binders, and hence evaluating specificity, what is a specific binding mode, still remains elusive. A very important shortcoming still is the poor ability to model conformational changes. To some extent, data-driven modeling procedures overcome some of these limitations, but not always.
Another aspect that we have been discussing and aiming towards is what I outlined here on the pink background: you need to consider the system, not just the building blocks. In other words, you need to better integrate the prediction of the 3D structure of the protein components with the modeling of the assembly, something which CASP actually doesn't do. And we don't do it. CAPRI doesn't do it.
Not really optimally, either. What I think is also crucial is to integrate the dynamic component: better modeling of local and global conformational flexibility, using both computational and bioinformatics approaches. To try to address all these things, we have also been discussing how to go forward in the framework of ELIXIR and the formation of the 3D-BioInfo community of ELIXIR. How does CAPRI contribute to this, and how can CAPRI be integrated into a larger effort? I am not going to go into all these details, but on the right-hand side you see that we talk about benchmark data sets and a knowledge portal. These are aspects that we would like to develop together with other partners. To be more precise, I mention here what could be a potential collaboration with BioExcel. We would like to develop a knowledge portal for tools for modeling protein flexibility that would be useful for docking. In other words, can you predict likely moves and moving parts of components? Can we model conformational ensembles somehow and then use them to dock? Maybe that could be multi-scale docking, where you represent the protein at different resolution levels and do something that will help to move forward in modeling conformational change. But what we also need is benchmark data sets to evaluate methods for building, scoring and ranking models of protein complexes. We have some of those already; this is an effort that has been ongoing in CAPRI.
For example, we have a score set, a CAPRI benchmark for scoring protein complexes, with a lot of decoys: many incorrect models, spiked with correct models. You can use this type of data set to improve your predictions and to see whether your prediction method is working correctly. Then there are a number of other very useful resources: a structure-based benchmark for protein-protein binding affinities has been developed, and a benchmark testing ground for integrating homology modeling and protein docking. But we probably need much more than that.
So this is my presentation. I hope I was not too long; I haven't looked at the time. Thanks to all the people involved, especially Marc Lensink, who is on the call here and will answer questions, because he has been the main assessor together with me on all these evaluations. And Sameer and Nurul have been running CAPRI from the EBI; I should have mentioned them way before, because without them this would not have been possible. CAPRI is an enterprise run completely on a volunteer basis, so thank you to all of these people. I'm done.
Thank you very much. On behalf of all the attendees, let me thank you indeed for this very nice history and overview of CAPRI, as well as for looking at the current challenges and proposing some ways forward, including tools and support that might be available through collaboration with BioExcel. That's great. I encourage anybody who has any questions to use the questions facility in the GoToWebinar application. I'll put up a slide just now as a visual reminder of where this lives and how you can ask Shoshana any questions in GoToWebinar. I should add that Marc Lensink is here in attendance, and I will just unmute him now in case he has any comments to make. I don't have any questions; let me check. No questions asked so far.
So, Marc, if you have any comments to make at all on what Shoshana has already said, please feel free. And meanwhile, if anybody would like to ask any questions about CAPRI or anything else that Shoshana has presented, either of her or of Marc, do enter your question in the questions tool.
I have no comments; it was a very nice presentation. I would just like to add that if people have questions later, they are free to contact us by any means.
Thank you, Marc. Any questions? No. Okay, I don't see any questions. This is a good chance to ask questions. Let's give people a minute to think.
Okay. Was Alexandre Bonvin on the call?
Yes, I believe so. I see Alexandre on the list of attendees, so he is here. Alexandre, do you have a comment? I have a question that Alexandre has just entered, even before you said his name. Anyway, I'll read it out for you, and then I will unmute Alexandre as well, because he might ask it himself. The question is: have you seen examples of AI applied to docking already? I will let you respond, and I will also unmute Alexandre so he can enter the conversation.
Yeah, I was hoping that AI would come up.
Okay. Hi there. Go ahead if you would like to respond to that.
Okay. To protein-protein docking, no, not yet. I know that AI has been used for small molecule-protein docking, with controversial results. But the problem for protein-protein docking is really very different, I think, in some ways, from protein structure prediction, because you have all these different conformational changes. And also, the same potentials, the database-derived potentials and the models that are being used for protein structure prediction, may not necessarily be valid for interface prediction.
This is something we saw many, many years ago when deriving these knowledge-based potentials. With AI, the deep learning people have used basically the same idea that was developed many years ago: use known protein structures to derive distance-based potentials for interactions between residues. They are doing very fancy smoothing of the functions and things like that with the deep learning, and they have a much bigger set of data. What I think is also hampering the application of AI to protein complexes is the data set: the small size of the data set of available complexes. And actually, maybe it is not so small, but the fact is that we usually don't really know what the biological complex is. So we need to make sure that the complexes that we use as reference to derive the models are really biological complexes. That would be my long answer.
Thank you. I have muted Alexandre because, due to background noise, he is not able to use his microphone to respond. But Alexandre, if you want to follow up with any further reply to that response, you can do so in the questions tool. Just a reminder that this webinar is being recorded, and it will be available to review online in the near future via the BioExcel website and on our BioExcel YouTube channel.
It would be nice to know, if there are some BioExcel participants or people from the BioExcel community here, how they think they can help.
Yes. The obvious member of the CAPRI community who is also part of BioExcel is, as you know, Alexandre, and I'm sure you are in touch with him. I don't know if there are any other attendees. From the BioExcel project itself, I know there is myself and Alexandre, and I am in a slightly different area.
Yeah, I mean, we are really open. I think I can speak for the CAPRI community: we are really open to any collaboration in this respect, especially on all these conformational changes and the modeling of flexibility. I think this is really important.
That's great. We can follow up on that. Okay, there is a response from Alexandre, with a link that I will email, saying that an example of a relevant BioExcel use case, or rather a demonstrator research project in this area, would be an antibody design project that is linked on the BioExcel website. The link is visible in the questions panel, but I don't know if you can see it; if not, I will send it later by email.
Yeah, that could be. Yeah, we had a contact. So go ahead.
For anybody else listening who is interested in what BioExcel is doing in this area, you should have a look at the BioExcel website. Under research projects there is a listing of the kind of projects that we are engaging in, including the one that Alexandre just linked, on antibody design through interactions engineering. That might be of interest to people in the CAPRI community, as well as in the BioExcel community, who are interested in using, for example, HADDOCK in combination with GROMACS and pmx, which are among the focus applications of BioExcel.
Yeah.
So the idea in that demonstrator research project is to use GROMACS to improve the antibody 3D structures and then accurately sample multiple alternative conformations. The resulting information will then be used as input to HADDOCK to model the interaction with its target.
This is exactly what would be useful to do.
But in a more general way. I think, why not start with antibodies, especially as nowadays you have nanobodies, which are small enough to do all this kind of gymnastics.
Yeah. And then apparently the next step, at some point, is that the models of these complexes are optimized with MD using GROMACS to select a final model, for which pmx can be used to improve the binding affinity. Anyway, it's not my area, but if anybody is interested, please do have a look at this and get in touch with us, and we can discuss it. Alexandre has responded further, saying that BioExcel would also be interested in pushing for more antibody-antigen targets and in getting interest from industry in this.
Yeah, this is something we have already been discussing with Alexandre. Thanks to him, we had some contacts with some people who will look at the PDB, not to assemble structures, but to find groups that actually work with antibody complexes, so as to have a nice list of these groups and be able to solicit targets from them. And we actually have quite a few targets at the VUB, where I work, because they are the world center for nanobodies. They turn out protein-nanobody complexes all the time. So that could be of interest as well, although these are not really full antibodies. Anyway, this is something we are looking into, and I would be very happy if we get more targets, nanobodies or anything else.
Yeah. I see one attendee has raised their hand, which may mean that they are interested in speaking to ask a question rather than entering it in the questions tool. I will just try this. The attendee is Noel Carascal. I will unmute you, Noel, and see if you want to ask a question.
If I don't hear anything shortly, I will mute you again, but go ahead. Okay, I'm not hearing Noel, so maybe it was accidental. Oh, I can hear you now.
Yes. Okay. Sorry, I'm doing it from my phone.
You're a bit soft, so please speak up.
Okay. My question is: is this a purely geometrical kind of exercise, or do we have to infer nucleotides, guanosine diphosphates or ATPs that are attached to the protein, or post-translational modifications? Are those provided for the contest, or do we have to guess that they would be there?
Okay, that's a fair question, because this is an added complexity. But for the time being, we work on targets that are being offered by structural biologists, in other words people who have already analyzed the structure of a given complex. So whatever information is necessary to build the complex is more or less communicated to the predictors. So far, we didn't have cases where there were PTMs, in other words post-translational modifications, phosphorylations or anything else, in the interface region, in other words in a region that was important for the prediction. So in general this was not a problem.
Okay. So you will provide that information on a case-by-case basis?
If there is such information, yes. But you see, if you say that there is a particular residue which is phosphorylated, and it is important for the prediction, you are already giving away where the interface would be, so it depends. So far we didn't have this problem.
Thank you.
And it is not only geometric; it is really building a model which is physically and chemically reasonable.
I see. Thank you so much.
Welcome.
Thank you for your question. I will mute you again.
So, if nobody has any other questions, then it remains for me to thank you again, Shoshana. On the final slide that is showing now, attendees can see more information about BioExcel, so if you are interested, have a look at our website. Thank you very much again.
Thank you. Okay. Good.
Thank you. Thank you, Marc, as well.
You're welcome.
Okay. That concludes this seminar. Bye, everyone.