Good afternoon and welcome to the latest in BioExcel's webinar series. Today our main presenter is Gydo van Zundert, and he's going to be presenting on robust solutions for cryo-EM fitting and visualisation of interaction space. After Gydo has finished, we're going to get a demo of some of the things he's been talking about from my colleagues from BioExcel, Mikael Trellet and Jörg Schaarschmidt. My name's Adam Carter; I'm one of the people involved in the BioExcel project, so I'm going to tell you a little bit about the project, for no more than three minutes really, and then we'll get on to the main presentation today.

Before we go any further, I should let you know that this webinar is being recorded, including the question-and-answer session at the end. We will be posting it on YouTube so that you can visit it again later on, but you should be aware that you are being recorded.

So, a very brief introduction to BioExcel, because I expect that many of you are now familiar with the project, as we've been going for a while now. BioExcel is a new Centre of Excellence for computational biomolecular research. It's built on what we describe as three pillars. The first one is excellence in biomolecular software: we're actually developing and improving some important pieces of software as part of the project. We have some of the lead developers of these programmes, from GROMACS and HADDOCK for example, in the BioExcel project, and we're working to improve the performance, efficiency and scalability of these codes. But as well as the codes themselves, another important aspect of what we're doing is trying to improve the usability of these codes and other related tools. So we're looking at the whole process these pieces of software are used in, and at how workflows can be used to automate the processes around the simulations themselves. And finally, an important part of the project is consultancy and training: we want to share best practice amongst the different people in the community and train end users in subjects related to high-performance computing and the pieces of software shown here.

One of the ways we want to interact with the wider community is through interest groups. This slide shows some of the interest groups that might be of interest to you, in particular the integrative modelling interest group, which is probably related to some of the work being talked about today. These interest groups are free to join: just go to bioexcel.eu and you'll see how to join them. We have forums online, we can host code repositories, we have a chat channel, and various other pieces of software and services we can offer to the interest groups. We also have a budget for face-to-face meetings, so we can bring people from these different communities together to discuss their work.

We will have some time at the end for questions and answers. The best way to ask a question is to type it into the question box, which you'll see on your GoToWebinar control panel; it will look something like this. If you type your question in there, I'll either invite you to ask it with your microphone, if you have one, or I can read it out to the speakers.

So now it just falls to me to introduce today's main presenter, and we're very happy to have Gydo van Zundert with us today.
He studied chemistry and nanomaterials at Utrecht University in the Netherlands, and then obtained his PhD in the Computational Structural Biology group under the supervision of Professor Alexandre Bonvin, who is also involved in the BioExcel project. His research focused on new methods and protocols for integrative modelling, such as the integration of cryo-electron microscopy data into the HADDOCK biomolecular docking package. In October 2016 he joined Schrödinger, Inc. as a postdoctoral associate, working on room-temperature crystallography modelling in collaboration with Stanford University and the University of California, San Francisco. So thank you very much, Gydo, for joining us today. I'm now going to open your microphone and invite you to present, and I will hand over control to you, so hopefully you can take it from here.

All right, thanks very much for the introduction, Adam, and thanks to BioExcel for giving me the opportunity to show some of the software that I've developed. I'm now going to full screen; is it working? All right. So the title of the talk is robust solutions for cryo-EM fitting and the visualisation of interaction space, and it boils down to me discussing two different software packages. One is PowerFit, which is geared towards cryo-electron microscopy, and the other one is DisVis, which is mostly used for cross-links, for example from mass spectrometry, but it works in general for any kind of biophysical data that you can translate into distance restraints. A small note: at the moment I work for Schrödinger, which makes proprietary software, but PowerFit and DisVis were developed during my stay at Utrecht and they're fully open source, so you can just download them from GitHub.

Let's first discuss the PowerFit software. You can download it, like I said, from the GitHub page, which is at the bottom of this slide, and we also provide a web server, which Mikael and Jörg will talk more about later on. PowerFit is geared towards cryo-electron microscopy. This might be a bit superfluous, but the main principle behind cryo-electron microscopy is that you have these 2D projections, which you cut out and turn into class averages. From these class averages, using common lines in Fourier space, you can back-transform into the original three-dimensional density. Usually you see these isosurfaces, which you can see at the top right of the slide, but what we're really dealing with is a three-dimensional image that we go through slice by slice, as you can see at the bottom right. That's really the data we're working with.

Now, even though there have been major advances in cryo-electron microscopy in the last few years, building structures has remained an issue: with only the density to look at, it's still difficult and a lot of work. So what people usually do is combine high-resolution structures, obtained by X-ray crystallography, NMR or homology modelling, with the cryo-EM data. Typically the first step is rigid-body fitting of the structure into the density, which gives a local atomic interpretation of the density. This is done even at very high resolutions, I think the record is now a bit below 2 Å for single particle, and even there the first step is just rigid-body structure fitting. And this is where PowerFit comes in.
This is where it fills the gap, by doing that fitting automatically. After you've done that, you can perform real-space refinement or manual refinement, typically done with Coot, but I'm not going to discuss that; this talk is really about the rigid-body fitting. It's important because it usually constitutes the first step, and it should preferably be done automatically, because you need an objective measure for fitting your structure in the density. Usually people think they see something, but you also need an objective measure of whether it's really a good fit. And sometimes the program can see where something fits well when a human cannot, so it works both ways.

The approach that PowerFit takes is in a way extremely simple and very basic. You have the initial structure that you want to fit, for example here KsgA, and to the right you see a ribosome density at 13 Å resolution, and PowerFit performs a six-dimensional exhaustive search. It treats the subunit as a rigid body and tries to fit it into the density at every location in the map; then it rotates the subunit and the procedure starts all over. So it performs a full six-dimensional search over the three translational and three rotational degrees of freedom. What is typically done is that the subunit is transformed into a density, and that density is cross-correlated with the target map, here the ribosome.

And there are, as always, two problems. One is sensitivity: even if we use the cross-correlation, there's still a whole class of cross-correlation scores you can use, so you need to optimise that and see what works and what doesn't, with all kinds of filters. The other is speed, because it's a six-dimensional search, so the sampling can take quite some time. We tried to optimise both of them in PowerFit, and I'm first going to talk about increasing the sensitivity of the cross-correlation score.

The first thing, which I think was applied in 2002 by Chacón, is the Laplace pre-filter. It transforms your map by calculating the second derivative at every point in the map, and the effect is that it enhances the edges. To the left here you see a grey-scale image of some flowers, and if you apply the Laplace filter you see that the edges are enhanced; that's a two-dimensional example on a real-life object to show the effect. If you look at cryo-electron microscopy data, to the left you have the original data and to the right the same data with the Laplace filter applied. You see that the contrast has been increased, and also the noise, so we need to take care that the noise doesn't become too dominant. But this is one thing that has already been shown to really increase the sensitivity of the cross-correlation score: you simply calculate the cross-correlation between the Laplace-filtered model map and the Laplace-filtered target map.
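As a rough illustration of the Laplace pre-filter just described, here is a minimal Python sketch (not PowerFit's actual code). It assumes the map has already been loaded as a 3D NumPy array and uses SciPy's discrete Laplacian:

```python
# Minimal sketch of the Laplace pre-filter: enhance the edges of a 3D density
# map by taking its discrete Laplacian. Not PowerFit's implementation.
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def laplace_prefilter(density: np.ndarray) -> np.ndarray:
    """Return the edge-enhanced map used for the Laplace cross-correlation.

    `density` is a 3D array of voxel values (e.g. read from an MRC file).
    The sign convention is a free choice; what matters is that the model map
    and the target map are filtered in exactly the same way.
    """
    return laplace(density.astype(np.float64))

if __name__ == "__main__":
    # Toy map: a blurred cube whose edges get amplified by the filter.
    toy = np.zeros((32, 32, 32))
    toy[12:20, 12:20, 12:20] = 1.0
    toy = gaussian_filter(toy, sigma=2.0)   # smooth "density"
    edges = laplace_prefilter(toy)
    print(edges.min(), edges.max())          # edge voxels dominate the range
```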
Then there's another issue, and that's overlapping neighbouring densities. Imagine you have a big density map consisting of many chains, and you want to fit each chain individually in the map. If there are neighbouring structures, there will be some overlapping density, which causes systematic noise. To the left here you see two structures together with their density at a certain resolution, where they're not overlapping; but if you look to the right, where they're really neighbouring, you see that at the border there's overlapping density, so there's noise there which decreases the sensitivity of the cross-correlation score in general.

To minimise this, we can put the emphasis of the cross-correlation score on the core of the search object, because that's where there's the least systematic noise from neighbouring densities, and so we expect that to increase the sensitivity of the scoring function. This is how it works. Imagine we have a two-dimensional object, here a ball, where a one means there is density and a zero means there's nothing. We erode it, as it's called, which means we remove the outer layer of ones, and we keep eroding the structure until it's empty; if we erode the third grid here, it will be empty. After that we sum all these grids together, and what you see is that the core gets an increased value, while the further you go from the core, the lower the weight. You can build this into the cross-correlation score to boost the impact of the core voxels. This approach was implemented, I think, in 2003 in a paper by Wu, but we also implemented it in PowerFit and combined it with the other cross-correlation improvements. So these are the two major things that increase the sensitivity of PowerFit, and I will show you later on a plot of how sensitive it actually is.

Those are the two things implemented on the sensitivity side, and then we also needed to take care of the speed of the search, because it is computationally pretty demanding. The major thing is that you can calculate these cross-correlation scores using fast Fourier transforms; to the left here you see the equations, which I won't go through, but this reduces the computational complexity of calculating all the cross-correlations. The next thing is that we use optimised rotation sets, so we use the smallest number of rotations possible to scan rotation space at a certain density; in reality this is more complicated than you would naively think. We also try to minimise the size of the target by resampling and trimming it. For example, here you see a cross-section of the GroEL-GroES complex, and each square here consists of eight by eight voxels, so we can resample it because it was oversampled, and we also trimmed the data. So originally we had a very big grid, but we end up with a much smaller grid, which heavily speeds up the search with an almost negligible impact on the sensitivity. And finally, and Mikael is going to talk more about this, we also ported the code to be GPU-accelerated, so it works on multi-processor systems, but if you have a GPU with OpenCL, whether NVIDIA or AMD, you can run it on that as well.
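The two ingredients just mentioned, core weights from iterative erosion and an FFT-based translational scan, can be sketched in a few lines of Python. This is a toy illustration under simplifying assumptions (same-shape arrays, circular wrap-around, a single fixed rotation), not PowerFit's implementation:

```python
# Sketch of core-weighting by iterative erosion, plus an FFT-based
# translational cross-correlation scan for one rotation of the model density.
import numpy as np
from scipy.ndimage import binary_erosion

def core_weights(mask: np.ndarray) -> np.ndarray:
    """Sum successively eroded copies of a binary mask.

    Core voxels survive many erosions and get high weights; surface voxels get
    low weights, which down-weights overlap noise from neighbouring densities.
    """
    weights = np.zeros(mask.shape, dtype=float)
    current = mask.astype(bool)
    while current.any():
        weights += current.astype(float)
        current = binary_erosion(current)
    return weights

def translational_scan(target: np.ndarray, model: np.ndarray) -> np.ndarray:
    """Cross-correlate `model` with `target` at every translation via FFTs.

    Both arrays must have the same shape (the model map zero-padded onto the
    target grid). This is a circular correlation, so scores near the box edge
    wrap around; PowerFit handles the bookkeeping more carefully.
    """
    ft_target = np.fft.rfftn(target)
    ft_model = np.fft.rfftn(model)
    return np.fft.irfftn(ft_target * np.conj(ft_model), s=target.shape)

if __name__ == "__main__":
    # Toy usage: place a small blob inside a larger box and recover its offset.
    target = np.zeros((24, 24, 24))
    target[10:14, 6:10, 15:19] = 1.0
    model = np.zeros_like(target)
    model[0:4, 0:4, 0:4] = 1.0
    scores = translational_scan(target, model)
    print(np.unravel_index(scores.argmax(), scores.shape))  # (10, 6, 15)
    print(core_weights(target > 0.5).max())                 # core weight of the blob
```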
Now for some illustrative examples of where we applied PowerFit. A typical first example is the GroEL-GroES system; usually you see this 23 Å map to the left. What you can do with PowerFit is, for example, take the trans-ring subunit here, fit it in the density, and it will find all seven copies among the top solutions. The same holds for the cis ring. But if we go for the GroES cap, the GroES lid, we need to use the whole lid, because at 23 Å resolution there's just not enough information to fit each subunit independently. However, if you use a 9 Å map, which is also available in the EMDB, you can fit a single GroES subunit independently; there's more information there, so you can fit a small subunit in the density, and you can see the impact of the resolution. To be honest, I'm never too impressed if things work on the GroEL-GroES system, because almost every algorithm I've thrown at it works there but fails on different, typically more difficult targets, for example the ribosome. So I also applied it early on to two ribosome examples: we can fit this RsgA structure here into a 10 Å resolution ribosome map, and the same for this KsgA structure here at 13 Å.

But this is of course rather anecdotal, so we wanted to explore the actual limits of rigid-body fitting and see whether we can detect when the top fit really is substantially better than the next fit. Since PowerFit is, in a way, just a stupid computer program, it will always give you a solution, but you don't know whether you can trust that solution. In order to find out, we applied it to many cases. We downloaded from the EMDB five intermediate-resolution ribosome maps, in the range of 5.5 to 7 Å, and we tried to fit each subunit that had been fitted in the ribosome into its respective map with PowerFit. There were in total 379 subunits, and to get at the impact of resolution we down-filtered the original experimental ribosome densities all the way down to 30 Å.

What you get is the following. You see here four scores. The blue one is the regular local cross-correlation; that's the base score. The green one is if we also apply the core-weighted procedure I just introduced. The Laplace local cross-correlation is if you apply the Laplace pre-filter beforehand, and the yellow one is if you apply the Laplace pre-filter and the core-weighted approach together with the cross-correlation. On the y-axis you see the success rate over the 379 subunits, and on the x-axis the resolution of the map. As you can see, the local cross-correlation, the base score, is able to fit about 75 percent of the cases, and it falls off rapidly to something like 5 or 10 percent at 15 Å resolution. If we use the core-weighted approach, things get a bit better; the curve shifts one or two ångströms to the right, which is good, so the success rate goes up. If we apply only the Laplace filter it gets substantially better; that is by far the biggest impact. But if we combine it with the core-weighted procedure and the local cross-correlation, the sensitivity increases even further. To my knowledge this is the most sensitive scoring function out there so far.
Then the next question, of course, is: so far I only looked at the single best solution that PowerFit gave me, but can I also check whether this top solution actually makes sense? Can I deduce from some kind of metric, some objective score, whether the top solution can be trusted or not? For this we applied the Fisher z-transformation, which is a simple transformation of your cross-correlation coefficient r: z = 0.5 ln((1 + r) / (1 - r)). The reason you want to do this is that you can then put confidence intervals on this z-score, so you get an error measure on your cross-correlation. This gives you a sigma, which is normally distributed, and you can relate it back to a probability. This was introduced already in 1921 by Fisher, hence the name Fisher z-transformation, and Volkmann introduced it to cryo-EM.

If we then take the top solution PowerFit gives and the second-best solution, and we calculate the difference in sigma between those two scores, we get the following for the true-positive rate. On the x-axis you see the sigma difference between the top score and the second-best score, and on the y-axis whether the top solution is actually the correct solution. This is very interesting, because you can see that if there's a big difference in sigma between the top score and the second-best score, around two or higher, there is almost a hundred percent chance that this is indeed the correct solution. This drops off nicely towards 0.25 sigma. So if you do a PowerFit analysis, some rigid-body fitting, and you want to know whether the top solution can be trusted, just look at the sigma value of your top solution and compare it to the second-best sigma value; if the difference is, let's say, higher than two, then you can be very confident in your fitting result. This is actually a major point, because you're always looking for objective measures to validate your approach, and here you can see that such objective measures actually exist.
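For illustration, the sigma comparison just described might look roughly like the sketch below. The choice of effective sample size N and the standard error used for the difference are my assumptions; PowerFit's exact statistics may differ:

```python
# Sketch of the Fisher z-transformation and the sigma difference used to judge
# whether the top fit can be trusted. The effective number of independent
# observations N (here imagined as the number of voxels inside the model mask)
# is an assumption on my part, as is treating the two fits as independent.
import numpy as np

def fisher_z(r: float) -> float:
    """Fisher z-transform of a cross-correlation coefficient r."""
    return 0.5 * np.log((1.0 + r) / (1.0 - r))

def sigma_difference(r_top: float, r_second: float, n_obs: int) -> float:
    """Difference between two fits expressed in standard deviations.

    Each z-score has standard error 1 / sqrt(N - 3); the difference of two
    (assumed independent) z-scores then has standard error sqrt(2 / (N - 3)).
    """
    dz = fisher_z(r_top) - fisher_z(r_second)
    return dz / np.sqrt(2.0 / (n_obs - 3))

if __name__ == "__main__":
    # Example: top fit r = 0.82, runner-up r = 0.74, with 5000 "observations".
    print(round(sigma_difference(0.82, 0.74, 5000), 2))
```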
That brings me to the end of the PowerFit section. Again, it's just a simple piece of software that performs a rigid-body fitting procedure, but with optimised sensitivity and speed, and it also provides robust reliability indicators.

Now the second piece of software is DisVis, which tackles a totally different kind of problem, but it uses the same procedures and algorithms as PowerFit. When I came up with the idea for DisVis, I was looking at a case where we had two structures and there were cross-links available between them. Cross-links you can get from mass spectrometry, and they essentially define a distance between two residues. So I had these two structures together with the cross-links, and I was wondering: are these actually consistent at all? Are there any solutions out there; can I think of a configuration of the complex where these cross-links are all consistent with each other? In a way, the question I wanted to answer is: given two interacting structures and a set of distance restraints between them, are there any solutions that satisfy all restraints, or in general N restraints?

I first wanted to solve this analytically, but I wasn't really managing, and I don't even know whether it's possible; but after working on PowerFit I had the tools to approach it numerically. So what we do in DisVis is make a shape out of the receptor and divide it into a core region, in blue, and an interaction region, in grey. For the ligand, the scanning chain, which we call the ligand, we only make the core region. These regions are defined essentially by the van der Waals radii, so the core is where things would clash. What we can do then is use the same algorithms as in PowerFit, with Fourier transforms and a full six-dimensional search, to sample billions of complexes. If the core region of the ligand overlaps only with the grey region, it means the two structures are interacting; if the core region of the ligand overlaps with the core region of the receptor, it means they're clashing. So we define a complex here as a configuration where there is interaction between the receptor and the ligand but no, or hardly any, clashes between the two. We then do a fine six-dimensional search, and this results in billions of sampled complexes; I will show you some numbers later on.

The whole procedure in DisVis is actually based on just counting: we sample all the billions of possible complexes, and for each complex we simply count how many of the restraints, the cross-links, are satisfied. We just count and we get numbers, which is why you get this rather boring table, but it gives you information. In the left column you see the number of consistent restraints (I use restraints and constraints interchangeably here); in the second column you see how many sampled complexes were found for that number of consistent restraints; and to the right you see the fraction, that is, divided by all the complexes sampled. So, interesting numbers: if you start at the top, we look at complexes consistent with zero restraints or more, which just means all the complexes we found; that's this big number, I think 19 billion, and we set its fraction to one. Then we move all the way to the bottom of the table and look at complexes consistent with all eight of the restraints that we gathered; we see that zero such complexes were found, which means that at least one of the restraints in the set is actually a false positive. Going one row up, we look at complexes consistent with seven of the eight restraints, and we find about 10,000 of them. That doesn't mean that all seven of those restraints are true positives; it just means that there are at least complexes out there consistent with seven of them. So we now know that the whole set of eight restraints is not fully consistent, and the next step is of course to work out which of those restraints is actually the false positive.
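The counting that produces this first table can be sketched in a brute-force way as below. DisVis itself does this with FFTs over the full six-dimensional grid; this toy version just loops over a few explicit poses to show the bookkeeping, and a "restraint" is simplified to a pair of atom indices with a distance range:

```python
# Brute-force sketch of DisVis-style restraint counting (not DisVis's code).
import numpy as np

def satisfied_matrix(receptor_xyz, ligand_xyz, poses, restraints):
    """Return a (n_poses, n_restraints) boolean matrix of satisfied restraints.

    `poses` is a list of (rotation_matrix, translation_vector) pairs applied to
    the ligand coordinates; a restraint is (i_receptor, i_ligand, dmin, dmax).
    """
    out = np.zeros((len(poses), len(restraints)), dtype=bool)
    for p, (rot, trans) in enumerate(poses):
        moved = ligand_xyz @ rot.T + trans
        for r, (i_rec, i_lig, dmin, dmax) in enumerate(restraints):
            d = np.linalg.norm(receptor_xyz[i_rec] - moved[i_lig])
            out[p, r] = dmin <= d <= dmax
    return out

def consistency_table(sat):
    """Number of poses consistent with at least n restraints, for each n."""
    counts = sat.sum(axis=1)
    return {n: int((counts >= n).sum()) for n in range(sat.shape[1] + 1)}

if __name__ == "__main__":
    rec = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
    lig = np.array([[0.0, 0.0, 0.0]])
    poses = [(np.eye(3), np.array([3.0, 0.0, 0.0])),
             (np.eye(3), np.array([30.0, 0.0, 0.0]))]
    restraints = [(0, 0, 2.0, 5.0)]          # atom 0 of each, allowed 2 to 5 Å
    sat = satisfied_matrix(rec, lig, poses, restraints)
    print(consistency_table(sat))            # {0: 2, 1: 1}
```

The first DisVis table is then just this cumulative count, with the fractions obtained by dividing by the total number of sampled poses.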
So we then explore the data consistency: we try to detect false-positive restraints. We start again as we did for the initial DisVis step: we systematically sample all the complexes, and for each complex we count how many restraints are satisfied, are consistent. The next step is, for each complex consistent with N restraints, to count how often a specific restraint is violated. In other words, within the set of, for example, all complexes consistent with all restraints, or all minus one, we check which restraint is the one being violated. We then normalise over all the counted complexes, so we get a fraction instead of a big number, because that's more insightful.

So again we get even more numbers, and I will again help you go through the table. The first column, to the left, is the number of consistent restraints. For example, the bottom row corresponds to complexes consistent with eight restraints, and the other columns show how often a given restraint is violated among those complexes. If we look at that bottom row, we see that all the restraint violations are set to zero, which makes perfect sense: if we're looking at all complexes consistent with all restraints, none of the restraints are violated, so all of them are by definition zero. In this case it's a bit more subtle, because we didn't find any complexes consistent with eight restraints, but we decided by definition to put them at zero anyway; so no matter what you run, the bottom row will always be zero.

Then we go one row up and look at all complexes consistent with seven restraints; of those we found about ten thousand, and what we see is that restraints one through seven are never violated, but restraint number eight is violated in 100% of the cases. That's great, because we know there's at least one false positive in this set, and this tells us that restraint eight is that false positive. We continue: restraint eight is false, so we put a red circle around it. Then we look at all complexes consistent with at least six restraints. We see that most restraints are never violated in that set, but restraint seven is now suddenly violated in 99.7% of the cases, which is a huge amount, so it's very fishy; restraint seven is something to keep an eye on, and we mark it yellow. We go one more row up, until we feel safe that we can trust all the remaining restraints. Looking at restraints one through six, the numbers are still not very high. You could argue that restraint four, with a 37% violation, is high, but compared to the 94% and 100% for restraints seven and eight, I think it's more trustworthy than the others.

And this is actually great, because what I did with these eight restraints was the following: I used six experimentally determined restraints, which were all consistent with the data, and I added two false positives, and those were indeed restraints seven and eight. With restraint eight we are essentially guaranteed that it's a false positive, but with restraint seven it's a bit more difficult: you have to trust the analysis a bit, but at least you know it's fishy and that you need to keep an eye out for restraint number seven.
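The second table, the violation fractions, follows directly from the same boolean matrix, for example like this (again a sketch of the bookkeeping, not DisVis's code):

```python
# Sketch of the violation-fraction table: for poses consistent with at least n
# restraints, how often is each individual restraint violated?
import numpy as np

def violation_fractions(sat: np.ndarray) -> np.ndarray:
    """Rows = minimum number of consistent restraints n, columns = restraints.

    Entry [n, r] is the fraction of poses with at least n satisfied restraints
    in which restraint r is violated (0 if no such poses exist, mirroring the
    by-definition-zero convention for the empty all-restraints set).
    """
    n_poses, n_restraints = sat.shape
    counts = sat.sum(axis=1)
    table = np.zeros((n_restraints + 1, n_restraints))
    for n in range(n_restraints + 1):
        subset = sat[counts >= n]
        if len(subset):
            table[n] = 1.0 - subset.mean(axis=0)
    return table
```

Restraints whose column approaches one in the high-n rows are the ones flagged as putative false positives.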
We can also count other things. For example, we can explore which residues interact most often. It starts again with the same two steps as always: systematically sample billions of complexes and count for each complex how many restraints are satisfied; then, for each complex, count how often a specific residue interacts. We again look at complexes consistent with, for example, all restraints, or all minus one, normalise this over all counted complexes, and we get a quantity called the average number of interactions per complex for each residue. We then postulate that residues that form many interactions are more likely to be at the interface. For example, we applied this to the case I showed you before, and the red parts here are the residues that had a high average-interactions-per-complex ratio, and these are indeed almost all interface residues. We need to quantify this a bit more, but so far it all looks very good. So that's the part about quantifying the information content: how consistent is our data, and what can we extract from the distance restraints?

Another question is: where can the ligand be found, for complexes consistent with N restraints? Where in space does the ligand reside, given these distance restraints? For this we have made a sort of density grid, which you can visualise in PyMOL or UCSF Chimera. It's a discrete density, and you can increase or decrease the contour value; it shows you the region where the centre of mass of the ligand can be for the complex to be consistent with a certain number of restraints. What you see here to the left, this grey area, is where the centre of mass of the ligand can be if it is to be consistent with the six original experimental restraints. The orange sphere here is the actual centre of mass in the known complex, and you can see that it falls nicely inside the grey area. So this gives you a region in space where the centre of mass of the ligand can be. You lose all orientation information here; sure, you don't know what the orientation of the ligand is, but at least you know where its centre of mass should be.

Then the last question you can ask is: what space does the ligand most likely occupy, again for complexes consistent with N restraints? Which part of space is mostly occupied by the ligand? This gives you a sort of average-shape information. What you get out of the occupancy analysis, for complexes consistent with all six restraints, is this continuous density. What it means is, for example, if you put the isocontour at 25 percent, or 0.25, the space enclosed by this grey volume has been occupied by the ligand in 25 percent of all the complexes consistent with all six true-positive restraints. If you reduce the isocontour value, for example to 10 percent, the volume occupied by the ligand becomes bigger: in 10 percent of all the complexes consistent with all six restraints, this is where the ligand resides. So it really gives you an average shape.
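Conceptually, the centre-of-mass grid just described is a 3D histogram over the consistent poses; the occupancy grid is the same idea, but accumulating all ligand voxels rather than just the centre of mass. A minimal sketch of the centre-of-mass version, reusing the pose bookkeeping from the earlier sketches, could look like this. The mrcfile package, the 2 Å voxel size, the grid origin at the coordinate origin, and the axis-order conventions are all my own simplifications for the example:

```python
# Sketch of an accessible-interaction-space grid: bin the ligand centre of
# mass of every pose consistent with at least `n_min` restraints into a 3D
# histogram, then write it out as a density that Chimera or PyMOL can contour.
import numpy as np
import mrcfile

def centre_of_mass_grid(poses, ligand_xyz, sat, n_min, voxel=2.0, shape=(64, 64, 64)):
    counts = sat.sum(axis=1)
    grid = np.zeros(shape, dtype=np.float32)
    for (rot, trans), n in zip(poses, counts):
        if n < n_min:
            continue
        com = (ligand_xyz @ rot.T + trans).mean(axis=0)
        idx = np.floor(com / voxel).astype(int)   # grid origin assumed at (0, 0, 0)
        if np.all(idx >= 0) and np.all(idx < np.array(shape)):
            grid[tuple(idx)] += 1.0
    return grid

def write_mrc(grid, filename, voxel=2.0):
    # Axis-order conventions (MRC stores z, y, x) are glossed over here.
    with mrcfile.new(filename, overwrite=True) as mrc:
        mrc.set_data(grid)
        mrc.voxel_size = voxel
```

In practice DisVis writes these grids out for you; the sketch is only meant to show what the contoured volumes represent.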
So is this helpful; does it reflect the truth? What you see here in orange is the original structure, the complex as it is, and you can see that at the 25 percent isocontour level the shape matches the structure of the complex quite well, which is nice: it gives you low-resolution information about where the ligand can be found, and the most likely place for the ligand, for complexes consistent with all restraints, or all minus one, or whatever number you are interested in.

That brings me to the conclusion. I've shown you PowerFit, which combines speed, sensitivity and reliability for rigid-body fitting, a very important step in high-resolution modelling of cryo-EM data. And I've also introduced you to DisVis, for more explorative modelling: determining the accessible interaction space, the set of all consistent complexes, and how you can visualise and quantify many of these parameters. And that's it from my part.

Perfect, thank you very much indeed. We will have some time for questions at the end; you're welcome to type your questions into the question box in the GoToWebinar control panel at any time, but I'll take them at the end, once we've finished with the presentations. So thank you, Gydo. I'm now going to hand over to my colleague Mikael, who's going to give you a demonstration of some of these things.

Okay, maybe you can, yeah, share the screen. In the meantime I can quickly introduce ourselves. I'm Mikael Trellet, and I'm sitting together with Jörg Schaarschmidt, because we both work in the group of Alexandre Bonvin here at Utrecht University, where we work on the web portals for DisVis and PowerFit. So after this nice overview of the two pieces of software, we're going to go a bit into the new interfaces we developed to use them in what we think is a user-friendly way. One of the main advantages of these new web servers, which have been up for a bit more than six months now, is that they use GPU resources provided by the European Grid Infrastructure (EGI). As Gydo already said, GPU computation gives quite a speed-up over the normal CPU usage you could have on a regular computer. We put together some numbers to compare: when you use a local cluster with eight CPU cores, you get an average time, for instance for the RNA polymerase case using DisVis with a complete scan of the space, of 266 minutes, whereas on the GPU you reduce that by a factor of about six and reach around 45 minutes, which is quite significant. So that's one of the main advantages of the web servers: they use the GPU resources provided by EGI.

A few technical details on the implementation of the web server, just quickly over the main steps. Of course it starts with a web form that a user has to submit, after a very short registration step. Once you submit a web form, a DisVis or PowerFit job, we do some quick validation on the web-server side: the user credentials but also the field values are checked, and if anything is wrong we report directly to the user that something went wrong, so you can just correct the few mistakes you might have made. If a job successfully goes through this validation step, there is a preprocessing step that involves some packaging of the input files, still on the web-server side.
Once that step is done, the job goes to the master node, which is our interface between the web server and the worker nodes, and there are basically two options. You either submit to the local nodes that we have here, using CPU computation time, or, the second option and the one we want to highlight, you submit to the grid, basically to one of the GPU clusters provided by EGI. To run DisVis and PowerFit on these worker nodes we use Docker containers that hold an installation of DisVis and PowerFit. Why do we use Docker containers? First, because we don't have complete control of the GPU clusters; they can be all around Europe, sometimes the world, so we don't have any rights there and cannot really control what is installed. The Docker solution is quite handy because you just download a container, run it, and when the DisVis or PowerFit core of the job is done you simply retrieve your data and leave the GPU cluster clean. It also allows very quick updates of the software when needed, for instance when we need to catch up with a new GPU driver, because we can easily rebuild the Docker container. The containers are of course also available to users, so that's something you can use on your own side as well.

Once a job comes back from a worker node to the master node, we package the output files, and a few of the output files are processed on our local cluster with Chimera to do some image generation. You'll see that this lets us show visualisations of the results on the results page, which is something you don't get automatically from a standalone run of DisVis or PowerFit. In parallel there is a post-processing step over all the raw values output by DisVis and PowerFit; the idea is to format these results so they can be presented nicely to the users, and this is also done on the web server. At the end you have a complete results page that takes all the output of DisVis or PowerFit and presents it to the user in a nicely formatted way. I wanted to go quickly over all these steps; now we can see in a bit more detail how it goes in real life, so I'll leave it to Jörg to walk you through a live demo of how to run DisVis and PowerFit.

Okay. Both web servers are hosted here in Utrecht, so they are available via the main website of the group, which is haddock.science.uu.nl, and you can see up here that both DisVis and PowerFit are available. You can go directly to the main page, and on those main pages you get a basic description of the server, a comparison of the runtimes on the grid and on the local resources for a default job with the default settings, the references, and down here also the GitHub repository if you want to download the software. In order to run a job you should first register, because we have to keep track of who is submitting, especially to the grid resources, since these are usually reserved for academic users, so we need to know who's running the jobs. Registration is usually processed very quickly, and once you have your credentials you can go to the submission page.
The submission page basically gives you all the options that the DisVis command-line tool gives you. You can give a specific tag to your run for later tracking, so you know what you actually did; for example, "example run". Then you provide your input files, which for DisVis are the fixed chain, that is, the part that is kept fixed, the scanning chain, and the restraints file; that's pretty much all the required input for DisVis. You can also submit some extra residues for the interaction analysis. Then you have the option of quick scanning, which is basically a very fast, rough search, or complete scanning, for which the occupancy analysis is done automatically, or custom scanning, where you have fine-grained control; you can also look at the parameters used for complete and quick scanning and just adjust them if you use this option. Then you have to provide an email address, and once everything is in, you are able to submit the job. You then get the run ID and the tag you gave the run, and this page will be updated every 60 seconds until your results are available. You also get an email notification, to the address you registered with, containing the ID and the link where you can monitor your run, and another email once the run has been processed successfully.
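Since the web form mirrors the command-line tool, a local DisVis run with the same inputs might look roughly like this; the exact flag names are assumptions from memory of the README, so check `disvis --help` on your installation:

```bash
# Hypothetical local DisVis run mirroring the web form.
# Positional arguments: fixed chain, scanning chain, restraints file.
# Flag names may differ on your version; verify with `disvis --help`.
disvis fixed_chain.pdb scanning_chain.pdb restraints.txt \
    --angle 10 --voxelspacing 2 --gpu --directory run_example
```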
If you go back to the main page, you have other options to look at. There is a help page, which gives a little more detail on the expected input and output, and an email address where you can ask for support. There is also a link to a tutorial, which takes you to the bottom of the web page; at the moment we only have a PowerFit tutorial for the command-line version and for the web server, but a DisVis tutorial will be online shortly. You can also go to the BioExcel support forum, where you can ask your questions directly, so that other users can profit from questions that might be common as well, and where you can check whether the question or issue you've encountered has already come up before. On the main page you also have the link to go directly to the GPU-accelerated portal, so as soon as you see the GPU-accelerated version in the header you know that you're submitting to the GPU resources and not to the local ones.

Now for the example results you get from DisVis. Once the processing is complete you get a page like this. First, for the example, you have a description of what the example is about; then you have your results as an archive, including all the files you would get from DisVis; you can also download the images that we generate for presentation on the web server, if you're interested in those; and you see the references. For DisVis we display the accessible interaction space at different numbers of restraints; in this example this would be the accessible interaction space of the centre of mass of the complexes for six restraints, and you have six different views of the proteins and of the space, to see where it is located. Then you get the table, which first gives you the number of complexes consistent with each number of restraints; with the same settings Gydo presented, we don't find any complexes consistent with eight restraints. We also present the z-scores that were calculated, and in this table you can see the z-score of each restraint; it highlights the ones that are most likely false positives, so based on this z-score you get the putative false-positive restraints. You also see the violations, how often each restraint is violated, and here too the most likely violated restraints, the ones most likely to be false positives, are highlighted. Then you also get the interaction analysis for the residues you supplied, and here you see which ones are most often in contact with the receptor. So basically, with the colour coding we try to highlight the important information. In case you have a run where all restraints are met, you will not get any highlighting; the page will just notify you that there are actually complexes consistent with all restraints, so you might not need to discard any of them.

For PowerFit the server looks pretty much the same. We again have the landing page with all the information and the links to the grid and the local server, and you can go to the submission page, which again has all the features. Here you have the default parameters from PowerFit: you need to provide a map and the resolution of the map, then the structure that is supposed to be fitted; if the structure has multiple chains and you only want to fit one of them at a time, you can also provide chain IDs; and you can again give the run a tag for later tracking. The rotational sampling interval determines how fine-grained the sampling is; the default of 10 degrees should be fine, but you can go down to 5, and below 5 it's limited because the sampling would get too big. You also have the option of some fine control over the run, like disabling the Laplace pre-filter or removing the core-weighted scoring function, but based on the results Gydo showed, that's usually not something you want to do, so it's not displayed by default. Then you can submit the run.

Again, we present the results in a more informative way than the plain text you would get from PowerFit itself. You can download all the output files in an archive, you can download the images we generate, and we display the best 15 solutions, showing the sigma difference to the next solution, which, as Gydo was saying, if it's around two or higher means the top solution is most likely correct. By default the best 10 fitted structures are provided by PowerFit, and we provide images of those fitted structures within the density map, again in six different views, and you can also download the PDBs separately if you don't want to download all the run files again. There are multiple examples available; in this one there is a single fitted structure, and with the GroEL complex you see that, due to symmetry, you don't get such a good separation between the symmetry-related solutions, but you still get a clear top fit.
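For completeness, a local PowerFit run corresponding to the options on the submission page might look roughly like this; again, treat the flag names as assumptions and check `powerfit --help`:

```bash
# Hypothetical local PowerFit run with the same inputs as the submission page.
# Positional arguments: the density map, its resolution in ångström, the model.
# Flag names may differ on your version; verify with `powerfit --help`.
powerfit ribosome_map.mrc 13 ksga_model.pdb \
    --angle 10 --laplace --core-weighted --gpu --directory run_ksga
```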
Okay, that's pretty much all from our side, so I guess we can go to the questions.

Thank you very much, Jörg, that was great to see, stepping through it like that. We have one question already in here, and just before I read it out, I'll remind people that you can ask your questions now by typing them into the questions box. So, Prajwal, do you have a microphone? If so, I'll invite you to ask your question directly to the speaker. I'm just having a look; maybe Prajwal has dropped out of the session, so I'm going to read out the question anyway, and then at least he will get the answer on the recording. The question he asked, and I presume this is for Gydo, is: how to fit an atomic-detail structure of a monomeric protein into a cryo-EM map at 25 Å resolution of a filament of 25 monomeric units, and how reliable is it; he also gives a specific cryo-EM map ID. He's giving some quite specific details here; do you need to know the details to answer the question? So: how to fit an atomic-detail structure of a monomeric protein into a cryo-EM map at 25 Å resolution of a filament of 25 monomeric units, and how reliable is it?

Right, that can be answered. For the reliability, you can just look at the sigma score that is given; that's how you can see whether it's reliable or not. Furthermore, to fit it, you can just use PowerFit, either through the web server or locally, and you collect the top 25 solutions, or you look at the top 30 solutions, and see whether they make sense. You would assume that all 25 should be highly significant compared to other solutions. As for the full atomic detail: the structure is converted into a density anyway, but we do need that full atomic data to fit it inside the density. We also tried it once with only C-alpha atoms, but then the cross-correlation is not calculated nicely, so we really need full-atom structures in order to fit properly.

Okay, thank you very much. So, Prajwal, I hope that answers your question if you're watching this later in the recording; if not, then this gives me a good opportunity to point people at the fact that we have a question-and-answer forum at ask.bioexcel.eu. If you go there, you can ask any other follow-up questions that occur to you. So, from the people in the room just now, do we have any other questions, either about the software itself or the web-based services that are offered to access it?

Okay, maybe I have one question then, while we see if we get any others in from the floor. Gydo, I was just interested to know: before the web-based service was available, did you have any way to assess how many people were using your software, and do you think having the web-based version has broadened access to the code?

Definitely. We could see how many people downloaded it; on GitHub you can see how often your repository has been cloned, and there were several who cloned the code, but since the web-server interface I think it has increased a lot. I guess it just made it more accessible: people don't have to deal with the installation, and the results are also displayed in a way that makes more sense. So it's definitely an add-on to making your software more available.

Okay, thank you for that, Gydo. I think we have one question in now from Elizabeth. Elizabeth, do you have a microphone? I'm going to try and unmute you so you can ask your question directly, if so. Okay, no microphone, so I will read out the question in that case. The question is as follows: she says, thank you for these clear and interesting lectures; I have a question about PowerFit, namely does the software accept PDBs with UNK residues or missing residues?

Yes, definitely. As long as the elements are known, it just works.
There's no concept of connectivity, so you should look at your structure more as a collection of atoms. Whether they're connected, whether there are unknown or known residues, it doesn't matter; as long as it's a set of atoms, it should definitely work. Again, in order to make sense of the cross-correlation function you do need an all-atom view of the structure; if you only have C-alpha atoms it's probably not going to work. But UNK residues or other unknowns don't matter: as long as each atom has a known element, we can deal with it.

Okay, thank you. And Elizabeth, I hope that answers your question; yes, she says that she'll test it out soon, that's great. Okay, well, we're reaching the top of the hour now, so unless we have any other questions, and this is your last chance to type them into the question box, all that remains for me to do is to say thank you very much to all of our presenters today, and to remind you that if you have any follow-up questions you can post them at ask.bioexcel.eu. We will have more webinars coming up in the next few weeks; we don't have a precise date yet for the next one, but I'd invite you to keep your eyes open at bioexcel.eu/webinars, where you'll be able to find out about everything that's going on. And if you're interested in this kind of work, please do sign up to our interest groups as well. Thank you all very much for coming along today; I hope you found it useful, and do keep in touch with what BioExcel is doing. Thanks.