Welcome, everybody, to the BioExcel webinar series. The series is supported by BioExcel, the European Center of Excellence for Computational Biomolecular Research. Today's speaker is Alexandre Bonvin. Alexandre will tell us about HADDOCK, in particular its new features, and he will also give us a guided demo of HADDOCK. Alexandre is Professor at Utrecht University in the Netherlands. He studied chemistry in Lausanne, Switzerland, and then moved to Utrecht for his PhD. After postdoctoral periods at Yale University and at ETH Zurich in Switzerland, he joined Utrecht University again, and he was appointed full professor of computational structural biology in 2009. He is a member of several European projects, has written more than 200 publications and has held several positions at Utrecht University; since 2019 he has been the scientific director of the Bijvoet Center for Biomolecular Research in Utrecht. Now I will give the floor to Alexandre so he can tell you about HADDOCK.
Well, welcome, everyone, to this BioExcel webinar. It's a pleasure to have the floor today to talk to you about some of the new features of HADDOCK version 2.4 and the new web server, which has been operating for a few months and has officially been in production for about a month now. I'm first going to give a general introduction to HADDOCK, in case you don't know what HADDOCK is about. Then I want to discuss a few aspects of the restraints that we support to guide the modeling process, since HADDOCK is an integrative modeling platform: we try to make as much use as possible of the available information to guide the modeling. I will give you some dos and don'ts based on what we see from our users. I will then describe the new features of HADDOCK 2.4, which cover both the local installation of HADDOCK and the web server.
In the last part I will take you to the web server and go through the different menus to highlight some of the new features, what you can and cannot do, and what you should do in some cases. We are talking about the structural biology of interactions, and this slide shows a collection of methods: on the right side the experimental methods, where you find the classical structural techniques, crystallography, NMR and, these days, cryo-electron microscopy; on the left side the modeling techniques. More and more, it is the combination of computation and experiment that gives you the full picture of macromolecular complexes. Even with cryo-EM reaching really high resolution these days, there will always be systems that escape high resolution and for which you will need some kind of modeling. HADDOCK falls into the class of docking methods, where we try to model the assembly of a complex from the known structures of its constituents. In a nutshell, HADDOCK is now almost 20 years old: the original idea was published in 2003. It started from NMR studies of complexes for which classical structure determination was not working, but where we had information about the interfaces, about where things bind. That is where the idea came from of using this information to drive the modeling process. So in HADDOCK we can use all kinds of information that tell you which residues or parts of a structure are important for the interaction, and we have a way of encoding this rather fuzzy information into restraints that guide the modeling. Over the years we have added support for many different types of restraints, which I will come back to later: different types of NMR data, bioinformatic predictions and, more recently, cryo-EM data.
Pretty much since the start of HADDOCK's development we have supported the simultaneous docking of up to six molecules, and this limit has now been pushed to 20 in the current version. Symmetry is also an important component that you can leverage: if you know that you are modeling a homo-oligomeric assembly, you can impose symmetry. We account for flexibility during the refinement, and typically the final models are refined in explicit solvent. Since 2003, pretty much the early days of HADDOCK, we have also been participating continuously in CAPRI, a blind experiment for the prediction of complexes, with consistently very good results over the years. So how do we search the interaction space in HADDOCK? You have an ensemble or collection of molecules, two in the simplest case of binary docking, and we use experimental or predicted information to guide the modeling. This is done with a classical force field, an energy function representing the chemistry and physics of the system: bonds, angles, torsions and the non-bonded interactions. To this we add energy terms representing the data that we have. This is very similar to structure calculation from NMR data or refinement against crystallographic data. The difference from NMR is that you need to account properly for the non-bonded interactions, with a full van der Waals and full electrostatic representation, because these terms are important for the interaction between molecules. Now, how do we perform the search in principle? If you have two molecules and assume that there is no flexibility, the search problem is six-dimensional (three rotations and three translations). The search is a combination of energy minimizations and molecular dynamics simulations: we use forces to drive it, which means we calculate the derivatives of the energy.
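To make the idea of a force-driven search concrete, here is a toy sketch (not HADDOCK or CNS code): a rigid molecule is translated along the force coming from a single harmonic restraint, so the gradient of the energy, rather than random trial moves, decides where the molecule goes. The function names, step size and force constant are illustrative assumptions, and rotations (the other three of the six rigid-body degrees of freedom) are left out for brevity.

```python
def restraint_force(coords, atom, target, k=1.0):
    """Force on one restrained atom: minus the gradient of E = k/2 * |x - target|^2."""
    return [-k * (x - t) for x, t in zip(coords[atom], target)]


def translate(coords, shift):
    """Rigid-body translation: every atom moves by the same vector, so internal geometry is kept."""
    return [[x + s for x, s in zip(xyz, shift)] for xyz in coords]


def rigid_body_minimize(coords, atom, target, step=0.1, n_steps=200, k=1.0):
    """Steepest descent: at each iteration move the whole body a small step along the force."""
    for _ in range(n_steps):
        force = restraint_force(coords, atom, target, k)
        coords = translate(coords, [step * f for f in force])
    return coords


if __name__ == "__main__":
    molecule = [[10.0, 0.0, 0.0], [11.5, 0.0, 0.0], [11.5, 1.5, 0.0]]
    result = rigid_body_minimize(molecule, atom=0, target=[0.0, 0.0, 0.0])
    print([round(c, 6) for c in result[0]])  # restrained atom ends up at the target
```

A Monte Carlo search would instead propose random moves and accept or reject them based on the energy change; here every step already points downhill.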
Sometimes you read in the literature that HADDOCK is some kind of Monte Carlo process. It is not. In that sense it is very different from Rosetta, which many of you will know as well, and which is based on a Monte Carlo process rather than on forces. The forces are interesting because they tell you in which direction you need to move your system to find the minimum of your energy function. So what does the HADDOCK docking protocol look like? In the current version it has three stages. In the first stage we consider the molecules as rock solid: it is a rigid-body energy minimization driven by the data that we put in. The second stage is a simulated annealing refinement in which we introduce flexibility, first in the side chains at the interface and then in the backbone. This part is done in torsion angle space, which has the advantage that you can freeze some degrees of freedom, here the dihedral angles, and let others rotate, while the molecules can still move freely in space. That is different from imposing position restraints in molecular dynamics, for example. The final stage is a refinement in Cartesian space, with or without an explicit representation of the solvent around the complex. So: first, rigid-body minimization, in which the molecules are considered rigid; second, simulated annealing with flexibility introduced at the interface, where the flexible residues are typically defined automatically as those in proximity of the other molecule(s) you are modeling; and finally, a refinement in Cartesian space. In version 2.4 we currently only perform an energy minimization without solvent. In the previous version, the solvent was included and a very short molecular dynamics simulation was run.
This option is still there, but it is disabled: the default is not to use explicit solvent, since benchmarking showed that we gained little if anything, and it only added computational time. The functions you see on screen are the scoring functions, which are different from the energy functions used during the calculations, and you can see that the function changes between the stages of the protocol. At the beginning we put a very low weight on the van der Waals interactions, because the interface is not yet optimized, together with full electrostatics, a desolvation term, the restraint term (the AIR energy or any other restraint energy you have defined) and a buried surface area term. At the end, in the final stage of HADDOCK, we scale the electrostatics down to 20%, and we have full van der Waals interactions, the desolvation energy and the restraint energy. For the two energy terms shown in green, and for the restraints, we only consider the intermolecular contributions: we use the full non-bonded energy during the calculations, but only the intermolecular terms in the scoring. The desolvation term is an empirical term derived from the work of Juan Fernández-Recio; it accounts for the bonus, or the price, that you get or pay for desolvating the interface between the molecules. For example, it is favorable to remove water from hydrophobic surfaces, but you pay a price if you remove water close to charged side chains. The last term represents the experimental data that we put into the modeling.
In terms of flexibility, we account for several levels. There is an implicit way of representing flexibility, which is docking from ensembles of structures: you can provide HADDOCK with an ensemble of models.
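The stage-dependent scoring can be written as a weighted sum of the terms listed above. The sketch below is only an illustration: the exact weights are an assumption, taken from the default HADDOCK 2.4 scoring weights as documented to the best of my knowledge (check the HADDOCK manual for your version), and the energy values in the example are made up. The buried surface area enters with a negative weight, so a larger interface lowers (improves) the score.

```python
# Assumed default HADDOCK 2.4 weights; verify against the documentation of your version.
RIGID_BODY_WEIGHTS = {"vdw": 0.01, "elec": 1.0, "desolv": 1.0, "air": 0.01, "bsa": -0.01}
FINAL_WEIGHTS = {"vdw": 1.0, "elec": 0.2, "desolv": 1.0, "air": 0.1, "bsa": 0.0}


def haddock_score(terms, weights):
    """Weighted sum of intermolecular energy terms; lower is better."""
    return sum(w * terms.get(name, 0.0) for name, w in weights.items())


if __name__ == "__main__":
    # Hypothetical intermolecular terms for one model (energies in kcal/mol, BSA in square angstrom).
    model = {"vdw": -40.0, "elec": -200.0, "desolv": -10.0, "air": 50.0, "bsa": 1500.0}
    print(round(haddock_score(model, RIGID_BODY_WEIGHTS), 2))  # -224.9
    print(round(haddock_score(model, FINAL_WEIGHTS), 2))       # -85.0
```

The same model gets a very different score at the two stages, mostly because van der Waals and electrostatics swap roles as the interface is refined.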
You should not use too many of those; otherwise the number of combinations of starting conformations explodes and you get a dilution problem: if only one combination of starting conformations leads to good results, the chance of getting those results becomes smaller and smaller as you increase the number of combinations. We do not repeat the docking for each combination; instead we do one docking run starting from multiple combinations. The explicit level I already mentioned: during the second and final stages of the refinement we add flexibility explicitly to the side chains and the backbone at the interface. Most of our users actually use the HADDOCK web portal, and you see here a snapshot of the new 2.4 interface, which I will come back to in the live demo. We don't have an official publication of this particular portal yet; the two publications you see on the left and at the bottom describe the earlier versions. The portal has been operating since 2008, with an update in 2016. So where do we stand in terms of user base? We have more than 17,000 registered users from more than 110 countries. We are able to provide these resources and all the computing associated with HADDOCK because we currently operate as a thematic service under the European Open Science Cloud (EOSC-hub) project. In the past we also operated under other EU projects that gave us access to grid computing, high-throughput compute resources distributed around Europe and also worldwide. For example, we recently again got access to the US Open Science Grid resources. We have resources in Asia as well, and several high-energy physics sites started providing us resources in the last months to support COVID-related research. You can see here the total number of users of the different services operated in Utrecht.
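The dilution problem is simple arithmetic: the number of combinations is the product of the ensemble sizes. In the sketch below, the budget of 1,000 models and the assumption that models are spread evenly over all combinations of starting conformations are both illustrative, not HADDOCK settings.

```python
from math import prod


def n_combinations(ensemble_sizes):
    """Number of distinct combinations of starting conformations."""
    return prod(ensemble_sizes)


def models_per_combination(n_models, ensemble_sizes):
    """Expected models per combination if sampling is spread evenly (an assumption)."""
    return n_models / n_combinations(ensemble_sizes)


if __name__ == "__main__":
    budget = 1000  # hypothetical total number of docking models
    for sizes in ([1, 1], [10, 10], [10, 10, 10]):
        print(sizes, n_combinations(sizes), models_per_combination(budget, sizes))
```

With three ensembles of ten conformers, a single "good" combination would be visited only about once.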
Over the years we have been processing on average around 3,000 docking runs per month on the server, which is about 100 docking runs per day. In the last couple of months you can see a strong increase: we have actually doubled our processing capacity because of COVID. We are also now monitoring COVID-related submissions by asking users to tag them, and you can see that these are coming up as well. So in April and May we clearly see a COVID-related effect on the use of the server. These submissions translate into more than 10 million jobs sent to distributed grid resources, mainly around Europe but also worldwide. HADDOCK is one of the core software packages of the BioExcel Center of Excellence, which organizes this webinar series, and within BioExcel we are improving the software and making new developments. The new portal was completely rewritten from scratch in the first phase of BioExcel; we are now in the second phase. BioExcel also operates a forum at ask.bioexcel.eu, with a HADDOCK section that has quite a large number of posts where people ask questions. The forum is freely searchable: you can search it without being registered, but to ask a question you need to register. Once registered, you can also answer questions from other users. So far it is mainly the developers who answer questions, but in principle you can contribute as well. Because the forum is searchable, it is always a good idea to check whether your question has already been asked and answered before posting; this saves us time and effort on user support.
HADDOCK is the core software of my group in Utrecht, but next to HADDOCK we operate a number of other services, all offered as web portals and all related in some way to biomolecular complexes. We have bioinformatic interface prediction methods, hotspot prediction methods, and methods to analyze the information content and consistency of distance restraints, mainly linked to mass spectrometry and cross-links. We also predict the binding affinity of complexes, because the scores I described before are not predictive of binding affinity: in general, the scores of docking software correlate very poorly with the binding affinity of a complex. Two recently developed servers are proABC, which predicts the paratope of antibodies, and pdb-tools, a web portal to our pdb-tools scripts, which let you manipulate PDB files, for example to prepare them for HADDOCK, and which now has a nice new interface. So that was the general introduction. I now want to switch to some specifics of the types of restraints that we support in the portal. The first point is how we encode interface information into distance restraints to drive the docking. You usually think of a distance restraint as a connection between two points. But when your data point to a region on the surface of a protein that might be involved in binding, without telling you what it binds to, it becomes tricky to define distance restraints that bring the interfaces together. We found a solution for that, and it comes from NMR, from the work of Michael Nilges and Axel Brünger a long way back: NMR deals with assignment problems in which signals must be assigned to pairs of protons, these assignments are ambiguous, and you need to be able to deal with that ambiguity.
This is what we use in HADDOCK to bring together interfaces that have been identified or predicted, so that they are pulled together by a distance restraining function without predefining their orientation. How do we do that? We use the concept of active and passive residues, which you will find in the server interface. An active residue is typically a residue that you have identified from experiments, or predicted, to be at the interface; it is like a critical residue for the interaction. Since you never map interfaces perfectly from an experimental perspective, you might be missing part of the interface, so we define passive residues, typically as the surface neighbors of the active residues. In the case of binary docking, two molecules, you have a set of active residues on each molecule plus the corresponding passive residues on the surface. We then define a distance restraint between each active residue of one molecule and all active and passive residues of the other molecule. That is, we calculate all combinations of distances between all atoms of this active residue and all atoms of all active and passive residues on the other side. Assume on average 10 atoms per amino acid: with one active residue on one side and 10 active plus passive residues on the other side, you calculate 1,000 individual distances. What do we do with these distances? We sum them according to the function you see here on the right: the sum, over all combinations, of one over the distance to the sixth power. This comes from NMR, where it describes a dipolar interaction, and that is actually where the potential comes from; it is also the attractive part of the Lennard-Jones potential.
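The effective distance described above, d_eff = (sum of d^-6 over all atom pairs)^(-1/6), is easy to reproduce. This minimal sketch (the function name is mine) uses the numbers from the talk: one active residue of about 10 atoms against 10 active plus passive residues of about 10 atoms each, i.e. 1,000 distances.

```python
from itertools import product


def effective_distance(distances):
    """Ambiguous r^-6 summed distance; always <= the shortest distance in the sum."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


if __name__ == "__main__":
    atoms_active = range(10)        # one active residue, ~10 atoms
    atoms_partner = range(10 * 10)  # 10 active + passive residues, ~10 atoms each
    pairs = list(product(atoms_active, atoms_partner))
    print(len(pairs))               # 1000 individual distances
    # If all 1,000 distances were 5 A, d_eff = 5 / 1000**(1/6), about 1.58 A
    print(round(effective_distance([5.0] * len(pairs)), 2))
    # One close contact dominates: a single 3 A distance among 999 at 20 A gives just under 3 A
    print(round(effective_distance([3.0] + [20.0] * 999), 2))
```

Because short distances dominate the r^-6 sum, a single contact anywhere in the partner's active and passive set is enough to satisfy the restraint.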
You sum over all those distances and take the inverse sixth root of the sum, which gives you one effective distance. This effective distance enters the restraint energy potential, which is harmonic only over a very short range and then becomes linear. That is an advantage when you have large violations of your distances, which happens in docking at the start, when the molecules are separated in space, but also when you have false-positive data in your set: at large distances the energy still goes up, but the force becomes constant, which prevents your simulations from blowing up because of numerical instabilities. So for each active residue we have one distance restraint that pulls the interface of the other molecule toward that residue. If this residue makes contact with any atom of the active and passive residues on the other side, the energy will be zero. That is how you can encode very fuzzy information to drive the modeling process in HADDOCK. There is something else we do by default, because the data you use might not be perfect; bioinformatic predictions, for example, typically contain a lot of false positives. By default we randomly delete 50% of these distance restraints for each docking trial, meaning that different models originate from different combinations of the restraints defined in the first place. In this way you discard the bad data from time to time and hope to get good results, but you also discard the good data from time to time, and then you might be searching in a wrong region of space. It is then up to the scoring function at the end to distinguish what is good from what is bad.
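The shape of the restraint potential can be sketched as a flat-bottom function that is harmonic for small violations and linear beyond a switching distance, so that the force stays constant for large violations. This is a simplified illustration, not the exact CNS potential used by HADDOCK (which has extra parameters); the force constant `k` and the 0.5 Å `switch` are assumed values.

```python
def restraint_energy(d, lower, upper, k=1.0, switch=0.5):
    """Zero inside [lower, upper]; harmonic up to `switch` of violation; linear beyond."""
    if d < lower:
        v = lower - d
    elif d > upper:
        v = d - upper
    else:
        return 0.0
    if v <= switch:
        return k * v * v
    # Linear continuation with the same slope as the harmonic part at v == switch
    return k * switch * switch + 2.0 * k * switch * (v - switch)


def restraint_force(d, lower, upper, k=1.0, switch=0.5):
    """Magnitude of -dE/dd: grows with the violation, then stays constant."""
    if lower <= d <= upper:
        return 0.0
    v = lower - d if d < lower else d - upper
    return 2.0 * k * min(v, switch)


if __name__ == "__main__":
    for d in (1.0, 2.3, 10.0, 30.0):
        print(d, restraint_energy(d, 0.0, 2.0), restraint_force(d, 0.0, 2.0))
```

Between 10 Å and 30 Å the energy keeps rising, but the force stays at 2·k·switch, which is the numerical-stability argument made above.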
In HADDOCK we have three ways of entering distance restraints into the computations. They are called ambiguous restraints, unambiguous restraints and hydrogen-bond restraints. This is just naming, since you can put anything into these categories: each category can in principle contain both ambiguous and unambiguous distances. An unambiguous restraint is one where I know exactly which pair of atoms should be in contact, like a cross-link from mass spectrometry. You can also give different scaling constants to the different classes of restraints. We could just as well have called them apples, pears and bananas instead of ambiguous, unambiguous and hydrogen bonds, but those are the names you will find in the portal interface. There are, however, a number of important differences in how the classes are handled. For ambiguous restraints, by default 50% are randomly removed; this can be turned off on the server and, of course, in the local version. Anything you put into the unambiguous class is always used; random removal cannot be turned on for that class. The hydrogen-bond class is by default not used in the rigid-body stage, only in the following stages; the idea is to define hydrogen bonds, for instance, if you want to maintain the secondary structure of your protein while giving it a lot of flexibility. HADDOCK uses CNS, the Crystallography and NMR System developed in Axel Brünger's lab, which is actually where I did my first postdoc a long time ago. So the syntax for defining these restraints is the CNS, or X-PLOR, syntax; the two are exactly the same. Here are some examples of such distance restraint definitions. The box at the top left shows a residue-based ambiguous interaction restraint: we define a distance between residue 38 of chain A and residues 15, 16, … or 57 of chain B.
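The default random removal of ambiguous restraints can be pictured as drawing a fresh random half of the ambiguous list for every docking trial, while unambiguous restraints always pass through. This is a minimal sketch assuming exactly half is kept per trial; the function name is hypothetical, and HADDOCK's internal implementation may partition the restraints differently.

```python
import random


def restraints_for_trial(ambiguous, unambiguous, fraction_removed=0.5, rng=None):
    """Return the restraints used in one docking trial.

    A random subset of the ambiguous restraints is discarded; the unambiguous
    ones are always kept, mirroring the class behaviour described above.
    """
    rng = rng or random.Random()
    n_keep = len(ambiguous) - round(len(ambiguous) * fraction_removed)
    return rng.sample(ambiguous, n_keep) + list(unambiguous)


if __name__ == "__main__":
    airs = [f"AIR_{i}" for i in range(10)]  # hypothetical ambiguous restraints
    xlinks = ["XL_1", "XL_2"]               # hypothetical unambiguous restraints
    rng = random.Random(42)
    for trial in range(3):
        print(trial, sorted(restraints_for_trial(airs, xlinks, rng=rng)))
```

Each trial sees a different half of the AIRs, which is what lets some models escape false-positive restraints.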
So this lists the relevant residues on the other chain. Then comes the distance range, given by three numbers. The lower limit is the first number minus the second; here we put a lower limit of zero, since we use van der Waals interactions and those take care of avoiding clashes. The upper limit is the first number plus the third, here 2 Å. So the distance range for this ambiguous interaction restraint is between 0 and 2 Å. This might seem surprising, considering that the minimum distance of approach of two carbons that are not covalently bonded to each other is around 3 Å: the upper limit is shorter than the shortest distance that could occur in real life. Why is that? If I go back to the energy function, it comes from the effective-distance function: the effective distance is always shorter than the shortest distance entering the sum. You can do the math: take, for example, a thousand distances of 5 Å each, sum their inverse sixth powers (you take the sum, not the average) and transform back to an effective distance by taking the inverse sixth root; what comes out is about 1.6 Å. That is why the upper limit is always quite low. On the right you find an example of a very specific distance restraint between two atoms. This is how you would define a cross-link restraint, applied here to the Cα carbons, since we might not know what the side chains are doing. This is a specific distance restraint between two points, where we specify the chain (segid in the syntax), the residue number and the atom name. And again there is the distance range, now between 0 and 23 Å.
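The CNS/X-PLOR restraint lines described above can be generated with a few lines of Python. This sketch reproduces the two kinds of examples from the slide: a residue-based ambiguous restraint with `2.0 2.0 0.0` (0 to 2 Å) and a Cα-Cα cross-link with an upper limit of 23 Å. The helper names and the cross-link residue numbers are hypothetical, and the partner list for residue 38 only uses the residues named in the talk.

```python
def cns_bounds(d, d_minus, d_plus):
    """CNS convention: lower = d - d_minus, upper = d + d_plus."""
    return d - d_minus, d + d_plus


def ambiguous_restraint(resid, segid, partners, partner_segid, d=2.0, d_minus=2.0, d_plus=0.0):
    """One residue-based AIR: an active residue against all listed partner residues."""
    selections = "\n     or\n".join(
        f"        (resid {r} and segid {partner_segid})" for r in partners
    )
    return (
        f"assign (resid {resid} and segid {segid})\n"
        f"       (\n{selections}\n       ) {d} {d_minus} {d_plus}"
    )


def atom_restraint(seg1, res1, atom1, seg2, res2, atom2, d, d_minus, d_plus):
    """Unambiguous atom-atom restraint, e.g. a cross-link between two C-alpha atoms."""
    return (
        f"assign (segid {seg1} and resid {res1} and name {atom1}) "
        f"(segid {seg2} and resid {res2} and name {atom2}) {d} {d_minus} {d_plus}"
    )


if __name__ == "__main__":
    print(ambiguous_restraint(38, "A", [15, 16, 57], "B"))
    print(atom_restraint("A", 12, "CA", "B", 45, "CA", 23.0, 23.0, 0.0))  # hypothetical residues
    print(cns_bounds(2.0, 2.0, 0.0), cns_bounds(23.0, 23.0, 0.0))          # (0.0, 2.0) (0.0, 23.0)
```

Writing the restraints from a script rather than by hand makes it easy to keep the selections and the distance triplet consistent.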
This limit comes from the chemistry of the cross-linker you are using. Here is another example that mixes things: an ambiguous interaction restraint that is specific for protein-nucleic acid interactions. The specificity of binding to DNA must come from the bases and not from the sugars, because the sugar is the same for all nucleotides. So we define a restraint from residue 27 of chain A, which is the protein, to residues 32 or 33 of chain B, which is the nucleic acid, including only the base atoms; we don't want to define restraints to the sugar. So there is a lot of flexibility in the way you can define distance restraints in CNS, and that is really quite unique to the CNS and X-PLOR software. Now, what should you not do? That's an important point. We see people using the server who have no information about where the binding sites are and who start defining the entire protein as active or passive. That is not a very smart thing to do, because it means you calculate all distances between the two molecules, which becomes an N² problem: the computations become slow, and it is a waste of CPU resources. If you don't know the interaction sites, there are other ways of dealing with that. First, you might have information on the interaction site of one of your partners but not of the other. In that case you can define only the solvent-accessible residues of the second partner as passive. As I will show you in a bit, that is what we do, for example, for antibody-antigen modeling when you have no information on the binding site on the antigen.
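To see why defining a whole protein as active is costly, count the atom-pair distances that enter the ambiguous restraints. The sketch below uses the talk's rule of thumb of about 10 atoms per residue; the 200-residue proteins and the 5 active plus 10 passive residues per side are made-up sizes for illustration.

```python
def n_distance_terms(active_a, passive_a, active_b, passive_b, atoms_per_residue=10):
    """Atom-pair distances summed over all AIRs of a two-body docking.

    Each active residue gets one restraint whose sum runs over its own atoms
    times all atoms of the partner's active + passive residues.
    """
    a_to_b = active_a * atoms_per_residue * (active_b + passive_b) * atoms_per_residue
    b_to_a = active_b * atoms_per_residue * (active_a + passive_a) * atoms_per_residue
    return a_to_b + b_to_a


if __name__ == "__main__":
    print(n_distance_terms(5, 10, 5, 10))    # focused patches: 15,000 distances
    print(n_distance_terms(200, 0, 200, 0))  # everything active: 8,000,000 distances
    print(n_distance_terms(400, 0, 400, 0))  # twice the size: four times the work
```

The all-active case grows with the square of the protein size, which is the N² behaviour mentioned above.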
If you have no information for either molecule, you can use the ab initio options in HADDOCK, which come in two modes. The first is center-of-mass restraints, which define on the fly a distance restraint between the centers of mass of the two molecules; the distance is defined automatically from the size of the molecules. The second is random AIRs (ambiguous interaction restraints): we randomly pick a solvent-accessible residue on each side, define a patch around it, and define ambiguous restraints between those patches. Since you then have to sample the entire surface, you need to increase the sampling a lot in those cases. The surface-accessibility option is something we have now built into the server, so that by simply clicking a button you can automatically select all solvent-accessible residues of the partner molecule. HADDOCK also supports multi-body docking, with more than two molecules. Here the search problem is much more complex, so it really only makes sense to use multi-body docking if you have a reasonable idea of where the binding sites are. Otherwise I would not put much trust in the results, unless perhaps you are dealing with a symmetrical system, because symmetry helps a lot in reducing the space you have to search. Here is a recent paper in which we tested different protocols for antibody-antigen modeling. The antibody side is always easy, because you know the hypervariable loops, and that is information you can give to HADDOCK. On the antigen side it is much more difficult to predict where the binding site is, since there are no good bioinformatic methods to do that. So you might target the entire surface of the antigen using the solvent-accessible residues, and if you have more specific information you can of course target the epitope directly. If you want to know more, I refer you to this paper.
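The random-AIR mode can be sketched as picking a random solvent-accessible residue on each molecule, taking the surface residues around it as a patch, and restraining the two patches together. In this hypothetical sketch each residue is represented by a single coordinate (for example its Cα), and the 7.5 Å patch radius is an assumed value, not HADDOCK's actual parameter.

```python
import math
import random


def random_patch(surface_coords, radius=7.5, rng=None):
    """Pick a random surface residue and return it with its neighbours within `radius`.

    surface_coords maps residue number -> (x, y, z) of a representative atom.
    """
    rng = rng or random.Random()
    center = rng.choice(sorted(surface_coords))
    origin = surface_coords[center]
    patch = sorted(r for r, xyz in surface_coords.items() if math.dist(xyz, origin) <= radius)
    return center, patch


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical surface residues of two molecules, spaced 3.8 A apart along a line
    mol_a = {i: (3.8 * i, 0.0, 0.0) for i in range(1, 21)}
    mol_b = {i: (0.0, 3.8 * i, 50.0) for i in range(101, 121)}
    for trial in range(3):
        (ca, patch_a), (cb, patch_b) = random_patch(mol_a, rng=rng), random_patch(mol_b, rng=rng)
        print(trial, ca, patch_a, cb, patch_b)  # each trial restrains patch_a to patch_b
```

Because every trial picks new patches, many trials are needed before the whole surface has been explored, hence the need for increased sampling.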
What other types of restraints are supported? For quite some time we have had NMR-based orientational restraints in the form of residual dipolar couplings, pseudocontact shifts and relaxation anisotropy; I am not going to go into detail here. You can give dihedral angle restraints, which of course only affect the intramolecular terms, since you cannot define dihedral angles between molecules. We have radius of gyration restraints, meant to include a radius of gyration derived from SAXS measurements. In practice they don't do much: in our tests they did not give better results than the center-of-mass restraints alone. Symmetry I already mentioned; it has been there for quite some time. To define symmetry we use the concept of symmetrical distance restraints, again introduced by Michael Nilges for an NMR problem: you define pairs of distances that are required to be equal. You don't define what the distance should be; you just require that symmetrical pairs of distances have the same value. In that way you can impose symmetry, and you can define different types of symmetry; we now go up to C6, which is new. We also have non-crystallographic symmetry restraints, which are the equivalent of requiring an RMSD of zero: you enforce that the molecules are identical when superimposed, without any symmetry operation between them. Now some of the new features in HADDOCK 2.4 that were not present before. First, as I mentioned already, we can go up to 20 molecules; the first implementation of this was described in this Nature Methods paper. We now support cryo-electron microscopy data. The protocol itself was already described in 2015, but it takes a long time to implement everything and make sure the web portal works, which is why the new portal, version 2.4, was only officially released recently.
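The symmetrical distance restraints can be illustrated with a toy C2 dimer: rather than fixing a distance, we penalize the difference between two distances that symmetry says must be equal. The quadratic penalty, the force constant and the coordinates below are assumptions for illustration only, not HADDOCK's implementation.

```python
import math


def symmetry_energy(coords, pairs, k=1.0):
    """Sum of k * (d1 - d2)^2 over pairs of distances that must be equal.

    pairs: list of ((atom1, atom2), (atom3, atom4)) atom-name tuples.
    """
    energy = 0.0
    for (a, b), (c, d) in pairs:
        diff = math.dist(coords[a], coords[b]) - math.dist(coords[c], coords[d])
        energy += k * diff * diff
    return energy


if __name__ == "__main__":
    # Chain B is chain A rotated 180 degrees about the z axis (C2 symmetry)
    dimer = {"A1": (1.0, 0.0, 0.0), "A2": (2.0, 1.0, 0.0),
             "B1": (-1.0, 0.0, 0.0), "B2": (-2.0, -1.0, 0.0)}
    pairs = [(("A1", "B2"), ("B1", "A2"))]
    print(symmetry_energy(dimer, pairs))  # symmetric: no penalty
    dimer["B2"] = (-3.0, -1.0, 0.0)       # break the symmetry
    print(symmetry_energy(dimer, pairs))  # positive penalty
```

Note that neither distance is fixed: any arrangement in which the paired distances stay equal has zero penalty.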
The key here is that we take the cryo-EM density and transform it into a kind of crystallographic density, and CNS, the computational engine of HADDOCK, has all the tools to handle that. Importantly, we don't switch on the density as an energy term from the start. We first pull the molecules into the density using the concept of centroids. You need to define the positions of those centroids, and there is a way of doing that with some other tools that we have; once the molecules are inside the density, you can refine directly against it. All of this is described in this paper. Another rather recent addition, from last year, is coarse-graining in HADDOCK through the implementation of the MARTINI force field. This has been done for both proteins and nucleic acids; the implementation is shown here and the references are at the bottom. The server automatically converts your all-atom model into a MARTINI bead model, we do the docking at the coarse-grained level, and in the final refinement step the model is transformed back into an all-atom model. We actually morph the conformational changes at this stage, again using distance restraints. So we support proteins and nucleic acids. If you use the local version, you have to do the conversion step manually; on the server it is just a matter of clicking a button. What other things have changed or become available? We have now implemented automatic detection of D-amino acids. This was a feature request from users working with bacterial peptides, which do contain D-amino acids. We support cyclic peptides, by which I mean cyclized through a peptide bond: if the distance between the C- and N-termini is short enough, and the option is turned on, we create a peptide bond to cyclize the peptide.
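The cyclic-peptide check described above amounts to a distance test between the backbone N of the first residue and the backbone C of the last residue. In this minimal sketch the atom-list format, the function names and the 2.0 Å cutoff are assumptions; HADDOCK's actual cutoff may differ, and the real code must then also add the bond to the topology.

```python
import math


def termini_atoms(atoms):
    """atoms: list of (resid, atom_name, (x, y, z)). Return N of first and C of last residue."""
    first = min(r for r, _, _ in atoms)
    last = max(r for r, _, _ in atoms)
    n_xyz = next(xyz for r, name, xyz in atoms if r == first and name == "N")
    c_xyz = next(xyz for r, name, xyz in atoms if r == last and name == "C")
    return n_xyz, c_xyz


def should_cyclize(atoms, cutoff=2.0):
    """True if the termini are close enough to close the ring with a peptide bond."""
    n_xyz, c_xyz = termini_atoms(atoms)
    return math.dist(n_xyz, c_xyz) <= cutoff


if __name__ == "__main__":
    # Toy coordinates, not a real peptide geometry
    peptide = [(1, "N", (0.0, 0.0, 0.0)), (1, "CA", (1.46, 0.0, 0.0)),
               (2, "CA", (2.0, 1.5, 0.0)), (2, "C", (0.9, 1.1, 0.0))]
    print(should_cyclize(peptide))  # termini about 1.42 A apart
```

A cutoff somewhat above a typical C-N peptide bond length (about 1.33 Å) tolerates slightly imperfect input models while still rejecting open-chain peptides.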
The server has always defined histidine protonation states automatically. We have now added an option to do this automatically based on the electrostatic energy, so that you can also use it in a local version of HADDOCK and don't need to define all your histidine protonation states manually. By default, the server still uses MolProbity for this. We switched the nucleic acid nomenclature to one- or two-letter codes, because the old 2.2 server required three-letter codes for nucleic acids. This also means that you can now have a mixed molecule, half RNA and half DNA, which was not possible before. We also offer an option to rebuild missing side chains and missing atoms in the context of the complex. This is interesting if you just want to refine a complex: HADDOCK always builds all missing side chains by default, but it does so for the isolated molecules, without accounting for the presence of the other molecule. So this is a new option, also interesting if you want to do mutagenesis, for example. We have also added options to fix molecules in their original position, so that they are not randomly rotated and moved in space. For example, if you have already fitted some molecules into the density and you don't want their orientations randomized, this is what you can use. We switched to a distance-dependent dielectric because it gave us better results. The solvent shell is no longer built by default in the final stage, but you can turn this option on if you want. We also changed the analysis part so that you can skip the full energy analysis or skip clustering; clustering only is now the default, because it saves processing time on our side of the pipeline. Now it is time to switch to the live demo, for the last five minutes or so, to leave some time for questions.
So I'm going to switch to Chrome, and if everything is okay, you are now seeing my web browser. This is wenmr.science.uu.nl, which is the official entry point for our different portals, and you see them here. First of all, what has changed compared to the 2.2 portal? When using the 2.2 portal, you would enter a lot of data into the forms and, at the end, enter your credentials to submit the job. Now, in the new HADDOCK 2.4 portal, you first need to log in before starting the submission. Through EGI Check-in here, you can log in using your home university account if you enable these options, or you can have your own login. If you are not registered, you should register. So now I'm going to log in. Okay, I'm in the system. I go to HADDOCK 2.4, and this is the entry point. Here you have the registration part, and this is where we go to submit a new job. We have links to tutorials, and I've linked here to the BioExcel forum, where you can find a lot of information. You can see user statistics and a world map of users; this is what I was showing you. What is also interesting is the list of modified amino acids supported by HADDOCK. So if you are dealing with special amino acids, it could be that we support them, but there is no guarantee. You see here, for example, acetylated lysine, that is, a lysine with an acetyl group at the end of its side chain. So this is the list. If the amino acid that you are interested in is not in this list, then the server is not going to support it. You could define it as a hetero atom for the submission, but then it is not going to be connected to the preceding and following amino acids. So let's go to submit a new job. Those of you who have used the 2.2 portal before will notice that this has a very different look and feel. So let's do a test run, and then you can submit.
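Before uploading, it can help to list which residue names in your PDB file are not among the 20 standard amino acids, so you can compare them against the portal's list of supported modified residues (for instance, ALY is the PDB code for N6-acetyllysine). A minimal sketch, assuming standard fixed-column PDB records where the residue name sits in columns 18-20; the function name is hypothetical, and waters, ions and nucleotides will also be reported.

```python
"""List residue names outside the 20 standard amino acids in PDB ATOM/HETATM records (sketch)."""
STANDARD = {"ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
            "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"}


def nonstandard_residues(pdb_lines) -> list:
    """Return the sorted set of non-standard residue names found in coordinate records."""
    found = set()
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            resname = line[17:20].strip()  # PDB columns 18-20 hold the residue name
            if resname not in STANDARD:
                found.add(resname)
    return sorted(found)
```

Typical use would be `nonstandard_residues(open("model.pdb"))`, where `model.pdb` is a placeholder for your own file.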
Nowadays the portal also has a bit of information next to each field to tell you what the field means. So I'm going to submit. I want to use all chains in the model, so let's find a model. Okay. You see here a number of options. First, the molecule type: protein, peptide or ligand, which are in the same category; nucleic acid, DNA or RNA, where before you had to specify separately whether it was RNA or DNA; and you might also upload a protein-nucleic acid complex, like a nucleosome, onto which you want to dock a protein. Then there is a shape option. This is something unpublished that we are still working on, but we can now also use shape in the modeling process. I'm not going to give much more detail here. Here you see the new options, and when hovering over them you get pop-up windows telling you what is going to happen. If you want to use coarse-graining in HADDOCK, you just need to flip that switch. By default, all molecules are then coarse-grained: when you do one action here, it is applied to all of them. There is the option to check for cyclic peptides, and you see here the option to fix the molecule at its original position. These are all new options that were not present in the previous version of the portal. By default, the termini of the proteins are always uncharged, because very often you don't have the full sequence in your structural model, but a capped model, so they should not be charged at the ends. So let's upload the second molecule so that things work, and now we go to the next step. At this stage, your model is validated. With the previous version of the server, you would submit everything and might then get an email saying that something was wrong with your input PDB. Now, if something is wrong, the server notifies you directly and tells you what is potentially wrong. You cannot submit before your input has passed all validation.
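The talk does not say which checks the validation step runs. As one illustration of the kind of problem an input check can catch, here is a hypothetical sketch that flags chain breaks: consecutive residues whose C(i)-N(i+1) distance is far longer than a peptide bond (about 1.33 Å). The 2.0 Å threshold and the input layout are assumptions.

```python
"""Flag possible chain breaks from backbone C(i)-N(i+1) distances (hypothetical check)."""
import math


def chain_breaks(backbone, max_peptide_bond: float = 2.0) -> list:
    """backbone: list of (resnum, n_xyz, c_xyz) in sequence order.

    Returns (resnum_i, resnum_j, distance) for each consecutive pair whose
    C-N gap exceeds max_peptide_bond (Angstrom); a real peptide bond is ~1.33 A.
    """
    breaks = []
    for (r1, _n1, c1), (r2, n2, _c2) in zip(backbone, backbone[1:]):
        d = math.dist(c1, n2)
        if d > max_peptide_bond:
            breaks.append((r1, r2, round(d, 2)))
    return breaks
```

A gap like this is also a hint that the termini around it are artificial, which relates to why uncharged (capped) termini are the default.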
Now the server has extracted the sequence, and you see that the secondary structure is color-coded. This is the menu where you can now define a list of active and passive residues. By default, the passive residues are automatically defined as the residues surrounding the active ones. There are now different ways to make a selection; you can even select from the sequence. Let's say I want to take this orange strand; I want this beta strand to be active. So here is again the sequence viewer. If all goes right, I should now be able to define it... yes, this works fine. You see that the selected residues pop up here. You can define the passive residues automatically. You can now also visualize what you have selected on the server, and you see that this is the beta strand that I selected. That's just a random selection, but you have this new option of visualizing your data. For the second molecule, let's say I want to automatically define all surface residues as passive. This is one of the new options that I showed you. You could define histidine protonation states manually, but the server has already figured out what the protonation states of those histidines should be; you can still override that. You have the options for semi-flexible segments, which by default are defined automatically, but you can say that the molecule should be fully rigid or fully flexible. This is the menu where you would enter the cryo-EM restraints and define the possible positions of molecules in the density. So let's go to the next step. We started with the input data, where you provide the PDB files; we went through the input parameters, and now we are in the docking parameters. Here you see many options. If you have custom distance restraints, like the example I showed you, this is where you can input them. Since you have already defined the ambiguous restraints through the active and passive residues, you can no longer upload ambiguous ones here, but you can add unambiguous ones.
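Active and passive residues are turned into ambiguous interaction restraints. In the formulation from the original HADDOCK paper, each active residue of one molecule must lie within an effective distance of 2 Å of the set of active and passive residues of the partner, where the effective distance sums d^-6 over all atom pairs. A minimal sketch of that effective distance and its flat-bottom violation, with hypothetical function names and plain coordinate tuples:

```python
"""Effective distance of one ambiguous interaction restraint (sketch)."""
import math


def effective_distance(atoms_i, partner_atoms) -> float:
    """(sum over all atom pairs of d^-6)^(-1/6); never larger than the shortest pair distance."""
    s = sum(math.dist(a, b) ** -6 for a in atoms_i for b in partner_atoms)
    return s ** (-1.0 / 6.0)


def air_violation(atoms_i, partner_atoms, upper: float = 2.0) -> float:
    """How far the effective distance exceeds the upper bound (0.0 if the restraint is satisfied)."""
    return max(0.0, effective_distance(atoms_i, partner_atoms) - upper)
```

The d^-6 sum is what makes the restraint ambiguous: it is satisfied as soon as any one atom pair gets close, so you do not need to know which partner residue actually makes the contact.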
Here you have options: for example, if you want to turn off the random exclusion of restraints, you can do that here. Here you find random patches and center of mass: these are the two ab initio options that I spoke about when discussing the ab initio mode of HADDOCK, and this is where you can turn them on. You now see a lot of menus, and each menu has specific settings, like the sampling parameters. When you first register for HADDOCK, you get the Easy access level, and depending on your access level you will be able to change parameters or not. So we limit these kinds of things here. You see here the default settings for the number of models. By default, we perform the final refinement, but we don't refine with a short molecular dynamics simulation in explicit solvent. If you want to do that, you can turn it on. I'm not going to go through all the menus here; there are simply too many of them and too many options. What you can see here is whether you want a full or limited analysis, which is also a new option: no analysis, cluster-based only, or full analysis. And finally, you can submit. These days, because of COVID-19, you can also tag your submission as being COVID-19 related or not; this one is not. Okay, so now if you go to your own account, you also have job information, and you see here a list of jobs that are still there. Let's look at this one, for example, which brings you to a results page of the server, if the page still exists... yes. So here you see the typical cluster-based HADDOCK results, which look like they did before. A new option here is that you can visualize the results directly online in your web browser. And if you go to the bottom of the page, you find plots showing the clustering statistics. All the plots are linked: you can turn specific clusters on and off and see how they compare for different energy scores.
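The "random exclusion" option can be pictured as drawing a different random subset of the ambiguous restraints for each docking trial, so that a few incorrect restraints cannot bias every model. A minimal sketch; the function name and the `keep_fraction=0.5` default are assumptions, so check the HADDOCK manual for the server's actual setting.

```python
"""Random exclusion of a fraction of ambiguous restraints for one docking trial (sketch)."""
import random


def random_subset(restraints: list, keep_fraction: float = 0.5, seed=None) -> list:
    """Keep a random subset of the restraints; keep_fraction=0.5 is an assumed default."""
    if not restraints:
        return []
    rng = random.Random(seed)
    n_keep = min(len(restraints), max(1, round(len(restraints) * keep_fraction)))
    return rng.sample(restraints, n_keep)
```

Using a per-trial seed keeps each draw reproducible while still varying the subset across trials.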
So this was my very short demo of the portal. I'm going to switch back to my presentation now to finish things. There are a number of HADDOCK-related resources that you can find online. I've shown you the web portal, with some hiccups when using Chrome. You can look at the statistics of how many jobs are being processed and running. If you are looking for special amino acids, look at this particular link. We now have an online manual for HADDOCK 2.4, which has been updated to describe all the new options. We also have a lot of tutorials on our Bonvin lab website describing different scenarios: from simple protein-protein docking, to using cross-links, to dealing with symmetrical oligomers using Cα-Cα restraints, which is a kind of template-based modeling, ab initio docking, ligand binding site docking, and antibody-antigen docking, which is going to be online any time now. There is also a fully integrative modeling tutorial making use of different tools: DisVis, PowerFit and HADDOCK. To finish, I want to thank the entire team of people in my group at Utrecht University who have contributed to the development of HADDOCK over the years, and in particular the HADDOCK 2.4 portal developers, Brian, Rodrigo, Mikael and Jörg. Brian and Rodrigo are both working for BioExcel; Mikael and Jörg were former postdocs in the group. With that, I'm finished. I will be glad to answer questions, after showing you that there are already two new webinars taking place in autumn: one about biomolecular simulation workflows using BioExcel Building Blocks, by Adam Hospital, and one about Molywood, streamlining the design and rendering of molecular movies. And now is the time for questions. Thank you very much for your attention. Thank you, Alexandre, for the very nice presentation and tutorial. We've got a number of questions, the first of which is from Thibault Freide.
Thibault, if you're there, I have unmuted your microphone and you're welcome to ask your question. Otherwise, I will read your question for you. Thibault does not appear to be there, so I will ask the question in his stead. Thibault asks: how does HADDOCK's performance compare to other similar software? We are, of course, the best. No, I think it depends very much on the problem that you have. If you want an idea of performance in blind docking, you should look at the CAPRI results; over the years, since we started participating in 2008, we have been doing very well. It also depends on the information that you have at hand. I think one of the really special features of HADDOCK is that if you have data, you can input them as restraints to really guide the modeling. So it's very different from ab initio docking software like ZDOCK and ClusPro. Now, if you want a comparison, in the Structure paper that I showed in my slides we actually compare HADDOCK with ZDOCK, ClusPro and LightDock: four different docking programs that have options to use some information either to filter solutions or to guide the modeling. HADDOCK uses the data to guide the modeling, while the other programs use it more to filter. And if you look at the results here, in this case we get the best performance. ClusPro is also doing very well, but using the data to drive the modeling generally allows you to generate better-quality models. So that would be my answer. But if you have very little data, there is really good software out there doing an excellent job. Another integrative modeling software that I should mention here is IMP from Andrej Sali, which is also doing an excellent job. Thank you for that answer. I'm going to quickly go to a rather interesting question that we've gotten: is HADDOCK free? Yes, HADDOCK is free. It's free for non-profit usage.
If you are a company and want to use it, we have some licensing conditions. We do ask people to register to obtain the software, and we also ask people to register to use the portal, because we are providing a lot of computational resources for free. So registration is required, but HADDOCK is free. Thank you very much. The next question is from Nitin. Nitin, I have unmuted your microphone if you would like to ask your question. Otherwise, I will ask it in your stead. Nitin asks: can we perform rigid-body modeling of a multi-domain protein complex using SAXS and PRE data, e.g., when all domain structures are known individually, by putting them together? Yes, that would be an interesting scenario for HADDOCK. One type of restraint that we don't have is the SAXS curve itself: we cannot input it as a restraint. But we have published a paper using SAXS data to filter docking solutions. You could use the PRE data as distance restraints; that is a natural way of using them in HADDOCK, and people have been doing that for many years, so we have several papers where this is used. So you could use your PREs to generate models and then filter them afterwards using your SAXS profile. The shape information that I briefly mentioned as an option on the portal is something we are working on now, which will be a way of including a SAXS shape in the modeling process. But that's work in progress. Thank you again for the answer. Our next question comes from Bob Schifrin. Bob, I have unmuted your microphone if you'd like to ask your question. Hi, yeah: could HADDOCK be used to dock domains onto other domains within the same protein, or with linkers between them? I know the portal says a minimum of two molecules, but is there a way around that at all?
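Filtering docking models with a SAXS profile usually means comparing the experimental curve with a curve computed from each model on the same q grid, after fitting a scale factor. A minimal sketch of a reduced chi-square with the least-squares optimal scale; computing the model profiles themselves would need a separate tool and is not shown, and the function names are hypothetical.

```python
"""Rank docking models against a SAXS profile with a fitted scale factor (sketch)."""


def saxs_chi2(i_exp, sigma, i_calc) -> float:
    """Chi-square per point after scaling i_calc by the weighted least-squares factor c."""
    w = [1.0 / s ** 2 for s in sigma]
    c = (sum(wi * e * m for wi, e, m in zip(w, i_exp, i_calc))
         / sum(wi * m * m for wi, m in zip(w, i_calc)))
    return sum(((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma)) / len(i_exp)


def rank_models(i_exp, sigma, profiles: dict) -> list:
    """profiles: model name -> computed intensities on the same q grid; best fit first."""
    return sorted(profiles, key=lambda name: saxs_chi2(i_exp, sigma, profiles[name]))
```

In the workflow described above, the PRE-driven docking run generates the models and this kind of score is applied afterwards as a filter.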
The portal has a minimum of two, but of course you can cut a molecule into pieces and treat the pieces as separate molecules. This is one way of modeling large conformational changes. We have a paper, already quite some years old now, where we do flexible multi-body docking of domains to represent large domain-motion-type conformational changes. When doing that, if you have some information about your interfaces, you can use it, and you can add additional distance restraints between the termini of your domains. Flexibility between the domains might be an issue, so you might want to define those regions as fully flexible and first use rather loose distance restraints to reconnect the domains. Or, if you are missing parts because they are not visible in the density, they are disordered, and you have no information on what they are doing, you can still dock: if you know that you are missing four amino acids, you can estimate the maximum distance that four amino acids can bridge and define distance restraints between the termini of those domains. So yes, it's perfectly possible. Great, thank you for that answer again. Our next question comes from Roberto Maya. Roberto, I have unmuted your microphone if you'd like to ask your question. Otherwise, I will ask it for you. Hi, Alex. I was wondering, especially if you want to use PRE data as, let's say, physical restraints: I know that you have some new modified amino acids, or structures that are coupled to the target protein. So how can you physically define these restraints when using MTSL on the protein? Yes, we do support a modified cysteine which has MTSL attached to it; I think it's called CYM, if you look at the library. But in general, the issue with the tags that you add to your protein to measure PREs is that they are usually quite flexible, so you don't know what they are doing.
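The "maximum distance that four amino acids can bridge" is simple arithmetic: a gap of n missing residues spans n + 1 virtual Cα-Cα bonds between the two flanking residues, each about 3.8 Å, so a fully extended linker gives an upper bound of (n + 1) × 3.8 Å. A minimal sketch; the function name is hypothetical, and real linkers are rarely fully extended, which is why this is a loose upper bound.

```python
"""Upper bound for a restraint bridging missing residues between two domains (sketch)."""
CA_CA = 3.8  # approximate distance between consecutive C-alpha atoms, Angstrom


def max_bridge_distance(n_missing: int, ca_ca: float = CA_CA) -> float:
    """Maximum C-alpha to C-alpha distance between the residues flanking a gap.

    A gap of n_missing residues spans n_missing + 1 virtual C-alpha-C-alpha
    bonds; assuming a fully extended chain gives the loosest possible bound.
    """
    if n_missing < 0:
        raise ValueError("n_missing must be >= 0")
    return (n_missing + 1) * ca_ca
```

For four missing residues this gives 5 × 3.8 = 19 Å, which would be the upper limit of a distance restraint between the flanking Cα atoms.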
So the flexibility here is an issue. I would rather recommend putting the distance restraints on the sulfur atom, if it's a cysteine that you introduced by mutation, and adding a correction to the distance that you measure to account for the group that you are not representing, and for its flexibility as well. Because if you describe the tag explicitly and the tag is in the wrong conformation, you might get into trouble during your modeling. Okay. Continuing this way: how can you define a restraint saying that some portion of the structure should be far away from the interface, the dimer interface? So you want a restraint defining that something should be far away, basically a kind of negative distance? Okay, great. Well, it's not a negative distance, but you want to say that this should not be closer than 20 Å. Is that your question? Yep. Yeah, thank you. Do you see my presentation? Yep. So here I'm looking at the MS cross-link restraints, where you have a distance between zero and 23 Å. If I wanted a distance of, say, 23 Å or more, I would define 23 and zero here, which puts the lower limit at 23 Å, and then you should add a large number as the third value, because it could be 200 Å or something. So if you want a minimum of 20 Å, you could define the three numbers that you see on screen as 20, 0, 200. Okay, great, thank you. Yep. Thank you for that answer. Before going to the next question, I would like to let the attendees know that, in the interest of time, I will only ask the next three questions, but if anyone has further questions, you're more than welcome to post them on bioexcel.eu. There will be a forum for this talk, and questions can be answered there as well. Our next question comes from Lina Rosano. Lina, I have unmuted your microphone if you'd like to ask your question. Otherwise, I'll ask it for you. Hello.
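The three numbers follow the CNS convention used in HADDOCK restraint files: a target distance d, a lower correction d-minus and an upper correction d-plus, giving an allowed range of [d - d-minus, d + d-plus]. So "20, 0, 200" means at least 20 Å (up to 220 Å), while "23, 23, 0" gives the 0 to 23 Å cross-link range. A minimal sketch that builds such an `assign` line; the segment IDs, residue numbers and helper names are hypothetical.

```python
"""Build a CNS/HADDOCK-style distance restraint line and its allowed range (sketch)."""


def bounds(d: float, d_minus: float, d_plus: float) -> tuple:
    """CNS convention: the allowed range is [d - d_minus, d + d_plus]."""
    return d - d_minus, d + d_plus


def assign_line(seg1, res1, atom1, seg2, res2, atom2, d, d_minus, d_plus) -> str:
    """One unambiguous restraint between two atoms (selections use segid/resid/name)."""
    sel1 = f"(segid {seg1} and resid {res1} and name {atom1})"
    sel2 = f"(segid {seg2} and resid {res2} and name {atom2})"
    return f"assign {sel1} {sel2} {d:.1f} {d_minus:.1f} {d_plus:.1f}"
```

For a PRE measured from a spin label on an introduced cysteine, the first selection could point at the cysteine's SG atom, with the upper correction enlarged to account for the unmodeled, flexible tag, as suggested above.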
Hi there. I can see that you can set the active residues of one protein to interact with the active residues of the other protein. Can it be very specific, like residue 31 interacting with residue 50 of the other protein, a one-to-one interaction? Yes. For that, going back again to the slide with the distance restraints: you can do that, but you will have to prepare this file yourself and then upload it in the portal in the submission tab. I'm switching back again to my browser. This is the submission tab. There is one called input parameters, which is the second tab; sorry, the third tab, docking parameters. Here you have the possibility to upload a restraints file, the kind of file I was showing you in the presentation. So if you create this file yourself and you know exactly what kind of contacts you want, you can upload it here by simply selecting and uploading a text file. So you are not going to define active and passive residues in the first menu of the portal; instead, you upload a distance restraints file customized to what you want to do. That is also what you have to do, for example, if you are using cross-links from mass spectrometry. Okay, thank you, Mr. Bonvin. You're welcome. Before we get to our last question, a couple of quick technical questions have been asked. One is whether there is a limit to the number of submissions you can have on the server at any one time. We do not limit the number of submissions, but we do limit the number of active runs per user. In recent months, because of COVID, we had some users submitting 900 jobs to the server; this was the 2.2 version of the server. If we were to let all those jobs run at the same time, it would block the server for all the other users.
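"Unlimited submissions, but a cap on active runs per user" can be pictured as a queue that only starts a user's next job when that user is below the cap. The toy sketch below is hypothetical and not how the HADDOCK server is actually implemented; the class name and cap value are assumptions, and it ignores the overall server capacity.

```python
"""Toy scheduler: unlimited submissions, at most max_active running jobs per user."""
from collections import defaultdict, deque


class FairQueue:
    def __init__(self, max_active_per_user: int):
        self.max_active = max_active_per_user
        self.waiting = deque()          # (user, job_id) in submission order
        self.active = defaultdict(set)  # user -> ids of currently running jobs

    def submit(self, user, job_id):
        self.waiting.append((user, job_id))

    def start_eligible(self) -> list:
        """Start every waiting job whose owner is below the cap; keep the rest queued in order."""
        started, still_waiting = [], deque()
        for user, job in self.waiting:
            if len(self.active[user]) < self.max_active:
                self.active[user].add(job)
                started.append((user, job))
            else:
                still_waiting.append((user, job))
        self.waiting = still_waiting
        return started

    def finish(self, user, job_id):
        self.active[user].discard(job_id)
```

With a cap of 2, a user who submits five jobs gets two running immediately, while another user's single job still starts right away instead of waiting behind the whole batch.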
So we have a maximum number of running jobs per user, so that jobs from all users always get through. You have to be patient if you upload a lot of jobs. The other technical question is whether there is an upper limit to the size of the molecules you can submit to the server. Yes, there is. This is something that we have defined in the server, but I don't remember the exact value. I know that we have been accepting quite large jobs: we have accepted jobs representing the entire spike protein of SARS-CoV-2 with the ACE2 receptor. So those went through, but you have to realize that the computing time increases with size. If you are going to very large molecules, my advice would be to turn on the coarse-graining option, because this will speed up your computations by limiting the number of particles. I don't remember the current maximum number of atoms, but there is a limit, because CPU and memory are limited. So we cannot accept huge systems; the whole ribosome, for example, would be challenging. You can do that, but then on your own resources. Thank you very much for that answer. Our last question comes from Buj. Buj, I have unmuted your microphone if you'd like to ask your question. Otherwise, I will ask it for you. Hello, Alexander, very nice presentation. I'm really looking forward to the new options in HADDOCK. My question is about antibody-antigen docking. I was wondering what kind of changes happen in the antibody structure during docking, especially in the CDR regions: the CDR3 of the heavy chain in particular can be quite long and undergo really major conformational changes upon docking. What kind of algorithms do you introduce for that? And how do you take into account the N-glycosylations that can occur on the CDR regions, and what strategies do you use to address these difficulties? Thanks. So N-glycosylation is not supported.
I think someone should mute their microphone, yeah. So N-glycosylation is not supported. You could include the glycans as hetero atoms; the server does support cofactors and small ligands as well. But this was not accounted for. In terms of conformational changes, what you get during the flexible stage of the refinement is not going to do miracles. In general, the more specific the data you have, the more conformational change you can induce. For antibodies, the H3 loop is the most challenging one. We actually discuss exactly this point in the paper, so you should read it. We don't see very large conformational changes; you see some improvement in the conformation of H3, but what we see more is that the contacts that are made improve a lot. So you might have the right contacts generated even if the backbone conformation is not yet perfect. And I think contacts are important, because if you want to do engineering or start making mutations, predicting the correct contacts might be more relevant than predicting the exact backbone conformation. But for systems that are challenging in terms of conformational changes, it might be better to pre-sample conformations using some other method and then give an ensemble of conformations to the server. Look up the paper if you want to know all the details about the antibody modeling. Thank you very much for that last answer, Alexander, and thank you again for the very interesting presentation and demonstration. Thanks to all the attendees for coming to this great talk. I just want to end very quickly by reminding everyone that we will have more BioExcel webinars (bioexcel.eu) this coming autumn. Thanks, everyone. Bye, everyone. Thanks for attending.
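If you pre-sample antibody conformations with another tool, one common way to hand them over as an ensemble is a single multi-model PDB file with MODEL/ENDMDL blocks. A minimal sketch, assuming the portal accepts ensembles in that form (check the HADDOCK 2.4 manual for the exact requirements); the function name is hypothetical.

```python
"""Combine pre-sampled conformations into one multi-model PDB ensemble (sketch)."""


def write_ensemble(models) -> str:
    """models: list of lists of PDB lines, one list per conformation.

    Keeps only ATOM/HETATM records and wraps each conformation in a
    MODEL/ENDMDL block (model serial right-justified in columns 11-14).
    """
    out = []
    for i, lines in enumerate(models, start=1):
        out.append(f"MODEL     {i:>4}")
        out.extend(line.rstrip("\n") for line in lines
                   if line.startswith(("ATOM  ", "HETATM")))
        out.append("ENDMDL")
    out.append("END")
    return "\n".join(out) + "\n"
```

Each conformation should use the same chain, residue numbering and atom set, so that restraints defined on residue numbers apply to every member of the ensemble.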