Good afternoon everybody. Welcome to today's edition of the BioExcel webinar series. Today we're going to hear about furthering our understanding of antibody structure space, and this comes from the Pistoia Alliance AbVance project. We have a couple of presenters for you today, Sebastian Kelm from UCB and Richard Norman from the Pistoia Alliance, and I am your host, Ian Harrow. [inaudible] This is to remind you that this webinar is being recorded.

First I'll start with a short overview of BioExcel. BioExcel is a centre of excellence for computational biomolecular research funded by Horizon 2020. We have three pillars of excellence. The first is excellence in biomolecular software, which focuses on our three main codes: GROMACS, HADDOCK and CPMD. Our second pillar is excellence in usability, where we devise efficient workflow environments with associated data integration, and on this slide are some of the workflow platforms that we support. And then finally excellence in consultancy and training, where we promote best practices and train end users. Just to point out, we have a number of interest groups as part of BioExcel, covering a number of relevant aspects of computational biomolecular research, and these are listed on this slide. We also host a number of support platforms, which are also on this slide.

For this webinar today, we will have an audience Q&A session at the end. For this, you should make use of the questions panel that you will find. So please enter any questions you may have as we go along during the webinar, and the presenters will try to answer your questions during the Q&A session at the end.

So let's turn to today's presenters. As I've mentioned, we have Dr. Sebastian Kelm of UCB. Sebastian has been working at UCB since 2013. He holds a position as principal scientist in the computer-aided drug design group, where he supports the development of new medicines.
And his specialty is in the creation and application of software to model 3D structures and interactions of proteins, with particular interest in antibodies, which we're going to hear about today, and membrane proteins. Our second presenter is Richard Norman of the Pistoia Alliance. Richard manages the Pistoia Alliance AbVance project, which you're going to hear about. Richard also manages his own consulting business, and he's been providing services to the pharmaceutical, biotechnology and related life science industries. He's had over 15 years of experience working in the biotech and pharmaceutical industries, supporting discovery and development of new drugs, diagnostics and nutraceuticals. So without further ado, let's turn to the presentation and our first presenter. Over to you, Sebastian.

Thanks again. Hi, everyone. My name is Sebastian. I will be presenting the first part of the talk. Before I move on, I would like to stress that a lot of the work I will be showing was done by Dr. Konrad Krawczyk, who is a postdoc in Professor Charlotte Deane's group in Oxford, the Oxford Protein Informatics Group. And towards the end, I will hand over to Richard. Okay, go ahead. Next slide.

So here's the overview of my, well, actually of the whole talk. First, I will give a very brief introduction to antibody structural biology. I should say I'm not an antibody biologist. I'm a computational person, but I will aim to give a very brief introduction for those of us who aren't working in the antibody field. I will then talk about the impact of 3D structural information on therapeutic antibody projects, specifically with three examples at UCB, the company I'm working for. And then I will comment on the application of 3D modeling, especially with a view towards large-scale applications to next-generation sequencing data sets, which can have millions of sequences.
And then move on to the problem of being able to model all of those millions of sequences and essentially figuring out where the gaps in our knowledge are in terms of our ability to model. And then I will hand over to Richard, who will present the Pistoia Alliance AbVance project, which aims to address some of the gaps in our knowledge in this field. Okay, next slide. Ian, can you? Yeah, thanks.

Okay, so introduction to antibodies. I will mention two types of antibodies in this talk, so I'm concentrating on these. IgGs are the very typical antibodies that we usually talk about and look at when we look at antibody structure. You can see the rough domain makeup here on the left: three constant domains in the heavy chain with a variable domain at the top, and a light chain with a constant domain and a variable domain. Two copies of each of these make up the antibody. The IgGs are essentially the largest part of your adaptive immune response and are secreted in the blood as a response to invasion by pathogens. The IgMs are a different version, essentially a different isotype of antibodies, which when in the blood appear as this type of pentamer you can see there, otherwise quite similar in shape to the IgGs. They are essentially an earlier version of antibodies, which is prevalent mainly immediately after invasion by pathogens. In the blood they are present as these pentamers, but they're also present as a B cell surface receptor, so as a single copy of an antibody on the B cell surface. Next slide.

If we look at these antibody structures in a slightly more realistic version, this is still a cartoon representation of course. Here on the left you have an IgG molecule with two copies of the heavy chain in bluish colors and light chains in greenish colors.
In the middle you have the variable region zoomed in, which has six loop structures highlighted here. These form the complementarity determining region loops, which as the name suggests determine the complementarity with the antigen, so they determine what protein an antibody binds to. And that's where most of the sequence variability is as well in these antibodies. On the right side you have such a V region of an antibody bound to an antigen, with the paratope highlighted in red on the left side, which is essentially the union of all of these complementarity determining regions. It is essentially those parts of the CDR loops that interact with the antigen. And on the antigen side you have the epitope, which is the residues on the antigen that interact with the antibody. Next slide.

Now we have a huge number of potential antibodies in our body. The body can make a huge variety of antibody sequences that are potentially complementary to a huge variety of antigens. Part of this diversity comes from the way these antibody sequences are created when the B cell matures. The genome of a B cell contains many copies of different gene segments, here V gene segments, D gene segments and J gene segments, which are somewhat redundant. To create a single B cell genome, the cell picks and chooses from these segments to form a close to unique final antibody sequence, which is then transcribed and translated into protein and ends up as a B cell surface receptor, an IgM antibody on the B cell surface. When the B cell later class switches to produce IgGs, it will further change the sequence via hypermutation of the variable region, which is focused on the CDR loops. This creates even more potential sequences and therefore potentially tighter and much more specific binding to the antigen of interest that's invading the body. Next slide please.

So moving on to the importance of 3D structure.
So here's a first example of a drug discovery project from the past, which is kind of a classical way of doing things. Initially there was no involvement of 3D structure in the creation of this antibody. Instead, animals were immunized with an antigen, B cells were isolated from the animals, and they were screened for antibodies that bound to the antigen of interest. Through a large screening campaign an antibody was identified that had desirable properties and bound to the antigen strongly enough. Then a crystal structure was obtained of this antibody bound to the antigen, which you can see on the right here; it's been published. The antibody is called olokizumab. It binds to IL-6 and, as the crystal structure showed, it binds roughly in the same place where IL-6 interacts with its co-receptor gp130. Therefore binding of the antibody displaces this receptor, and that's how the antibody works. So the crystal structure here wasn't used to create the antibody, but it was used to rationalize how it worked, how it performed its function, and this was vital to obtain a patent and essentially apply for approval of the antibody and move it towards the clinic. Next slide please.

A different approach, which is also becoming fairly common now, is to start from an existing antibody, which may have been identified using the screening approach I just talked about or phage display or something else, and then to try to engineer it in silico, so on the computer, to improve its binding affinity to an antigen. At UCB we have a tool which produces pictures such as the one on the right. Given a crystal structure of the antibody bound to the antigen, we can scan the surface of the antigen and generate these contact preference clouds, which tell you which kinds of atoms should be in the vicinity of each atom on the antigen surface in order to satisfy that antigen surface.
So we can then look at the overlap between the antibody side chain structures and these contact preference clouds and see where we can improve this overlap. For example, here at the bottom, mutating this threonine to glutamate would move the side chain into this contact preference cloud, where in this case it would satisfy it and generate a favorable score. And of course once you've created a design in silico you then need to go to the lab and test, usually not just one but a series of designs, and hopefully get some that improve affinity. In this case we had an antibody which bound to one particular isoform of a protein and we wanted to make it bind to a second isoform of the same protein. Through engineering in this fashion we improved the affinity to that second isoform by 200-fold and ended up with a bispecific antibody which, using the same paratope, so the same antibody CDR loops, bound to two different versions of the same protein. Next slide please. Hold on, go on back. There we go, de novo design.

So the third example is de novo design. Here we start from nothing, essentially; we do not start from an antibody. We have only the antigen of interest, and we look for natural binding partners of this antigen that we have the structure of. Here we had a structure of the protein called KEAP1 binding to a peptide fragment of another protein called Nrf2. By taking a few key hotspot residues on the surface of Nrf2 that interacted with KEAP1, we transplanted these residues onto an antibody framework and further affinity matured the antibody sequence to further the complementarity between the antibody and the antigen, and ended up with an antibody that bound at nanomolar affinities to KEAP1. This also has been published, as you can see at the bottom here.
Of course, again, usually it's not a one-shot success. You have to test various designs, and you might even do things like phage display to further explore various options and variations of your designs in order to identify your final binder. Okay, next slide.

So I've talked about the use of structures and now I'm going to talk about modeling. Of course if you can obtain a crystal structure of your antibody, that's great; that is probably what you want if you can get it within the time frame that you need it. However, it's not always possible to do this quickly and it's not always possible to do this on a large scale, which is when you need to do modeling. Our collaborators in Professor Charlotte Deane's group have made a tool called ABodyBuilder, which Richard will mention again later on. The great advantage of that tool is that it's at least as accurate as everything else, but also it is extremely fast. You can make a model in 30 seconds, and it will tell you when the model is likely to be reliable and when parts of the model are not so reliable. That is essential, because if we're relying on these models and we're having to do a lot of wet lab work to test the assumptions based on a model, then that's a big investment. If we already know the model isn't reliable, then we can just skip that part and not bother with that model.

Now we're more and more moving towards large-scale approaches where, alongside for example an antibody immunization campaign, we create these large sequence data sets. We do high-throughput sequencing of millions of antibodies alongside the immunization campaign, and ideally what we want is to use that information to improve the way that we discover antibodies.
Now, so far this field is still somewhat in its infancy, and people have tried a lot of things that are mainly sequence based to identify interesting antibodies. However, what we'd really like is to superimpose structural information on top of this, and with ABodyBuilder this should be possible in theory, given that it's so fast and we can tell which models should be accurate. Next slide please.

The kinds of things we can do here that maybe we couldn't do without structural information are exemplified in the picture here, where we have two pairs of sequences of antibody CDRH3 loops. In one case, the bottom case, the sequences are extremely similar, 88% identity. However, if you look at picture B here, their shapes are actually quite different, exemplified by the RMSD of 2.4. In the other case, which is A, you have two sequences which are quite different, but the loops have the same shape. This is in the PDB; you can look at this yourself. So if we can model the structures reliably, then we can start to pick these cases apart, where before we couldn't even tell that they existed, so we would have made completely opposite decisions to what I've just talked about. Essentially we could start to identify convergence on common shapes rather than just looking at sequence. We could avoid these shape-changing mutations that I've just mentioned, and on top of this structural information we could start trying to predict additional properties of antibodies, for example biophysical or developability characteristics, as well as actual structural complementarity with the epitope of interest on an antigen.
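As an editorial illustration of the two measures compared on that slide, here is a minimal Python sketch of sequence identity and backbone RMSD for a pair of loops. It is not the speakers' code. The function names, the toy sequences and the toy coordinates are all hypothetical, and the sketch assumes that the two loops have equal length and that the coordinates have already been superimposed (no fitting is done here).

```python
import math

def sequence_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical positions between two equal-length sequences.

    Assumption: the sequences are already aligned position by position.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)

def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumption: the two structures are already superimposed and the
    points are listed in corresponding order (e.g. one C-alpha per residue).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (p - q) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for p, q in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))

if __name__ == "__main__":
    # Hypothetical toy loops: similar sequence, different shape.
    loop_a = "ARDYYGSS"
    loop_b = "ARDYYGSA"
    print(f"identity: {sequence_identity(loop_a, loop_b):.2f}")
    shape_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    shape_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(f"rmsd: {rmsd(shape_a, shape_b):.2f}")
```

The point of keeping the two functions separate is the one made in the talk: a pair of loops can score high on one measure and poorly on the other, so neither can stand in for the other.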
So there's a lot of potential in this, but the main question that I'm now trying to answer is this: if we can only model 10% of our sequences, then maybe this kind of approach isn't that useful yet, but if we can model a large percentage of sequences, then we can actually start to work with this. So how big is this proportion of modelable sequences versus the unmodelable sequences? Next slide please.

Okay, so for millions of antibody sequences we have about 3,000 known antibody structures. Next slide. How does this huge gap affect our ability to model antibody structure? Next slide please.

So this is the work with Konrad Krawczyk. It is currently under review. We started from a dataset at UCB which contained 5 million heavy chains and 8 million light chains. These were IgM antibodies, so we like to call this the naive human dataset, from about 500 human donors. Alongside this, the Oxford guys also repeated this analysis on various immunized datasets, which contain about 36 million sequences in total. We looked at our ability to model all these antibody sequences, either the whole sequence, or just the framework region, or each individual CDR by itself, and on the right I've mentioned some of the tools used to do this. The main ones to notice are the Structural Antibody Database (SAbDab) and the loop modeling tool called FREAD. Next slide please.

So the first question to answer is how well can we model antibody frameworks? Intuitively we should be quite good at this, but how good are we? First we look at the Protein Data Bank. We look at every non-redundant antibody structure in the Protein Data Bank and at every possible pair within that dataset, and we record the sequence identity between any two antibodies as well as the structural difference between these two antibodies over the framework region. We do this independently for the heavy chain and the light chain, and then we draw a graph.
So here we have these two antibodies which looks a bit like this. Here the blue lines show you this interdependency between sequence identity and structural difference, with sequence identity on the x-axis and structural difference on the y-axis here on the left. You can see that at 80% sequence identity you have somewhere between 0.8 and 0.9 angstroms RMSD over the whole framework region, which is pretty good. Then we also look at our NGS sequence dataset, and for each sequence in that dataset we search the PDB for the closest template antibody structure, so the most sequence-identical antibody in the Protein Data Bank, and we add those up and draw this histogram in pink. So if you look at this line at 80% identity, everything to the right of this line makes up about 99% of the antibody framework sequences in our NGS dataset, which means, if you go to the next slide, that for 99% of the framework sequences we can create a model that's better than one angstrom RMSD. And the same graph that's for the heavy chains on the right side is for the light chains on the left side; it's roughly the same trend. So we are very good at modeling frameworks, and most frameworks, 99% of frameworks, we can model.

Now we're going to look at the CDR region. Again we go through our NGS dataset, and for each sequence in the dataset we consider each CDR sequence individually. We try to find loops within the Protein Data Bank that we have the structure for and that have either an identical sequence or a close enough sequence that we can model it using our FREAD knowledge-based loop prediction algorithm. Next slide please.

These are the numbers that we come up with. In summary, most of the CDRs are very modelable. For all of the CDRs except for CDRH3 we can model at least 98% of the cases. For CDRH3, which as we know is the hardest to model, we can model about two thirds of the naive sequences and nearly half of the immunized sequences in our various datasets.
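The closest-template analysis described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's pipeline, which the talk says relied on SAbDab and FREAD. All names here are hypothetical, the data are made up, and it assumes that queries and templates are plain strings that have already been aligned to a common numbering, so that equal-length sequences can be compared position by position. The 0.8 default mirrors the 80% identity line mentioned in the talk.

```python
def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def best_template_identity(query: str, templates: list[str]) -> float:
    """Identity to the most similar template of the same length.

    Returns 0.0 when no template has the same length as the query
    (a simplifying assumption of this sketch).
    """
    return max(
        (identity(query, t) for t in templates if len(t) == len(query)),
        default=0.0,
    )

def template_coverage(queries: list[str], templates: list[str],
                      threshold: float = 0.8) -> float:
    """Fraction of queries whose closest template reaches the threshold."""
    if not queries:
        raise ValueError("need at least one query sequence")
    hits = sum(
        best_template_identity(q, templates) >= threshold for q in queries
    )
    return hits / len(queries)

if __name__ == "__main__":
    # Hypothetical toy data standing in for PDB templates and NGS sequences.
    templates = ["AAAAAAAAAA", "CCCCCCCCCC"]
    queries = ["AAAAAAAAAC", "AAAAAAACCC", "CCCCCCCCCC", "DDDD"]
    print(f"coverage at 80% identity: {template_coverage(queries, templates):.2f}")
```

In the talk, the same idea is applied separately to the framework and to each CDR, which is why the reported coverage differs so much between the framework (about 99%) and CDRH3.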
I should say we're probably overestimating our ability to model, mainly because none of these analyses consider the effects of heavy and light chain pairing, because NGS datasets are typically unpaired, and ours certainly are. So you have the heavy chain by itself and the light chain by itself. And we're also not looking at interactions between CDR loops. But with these caveats, these are the numbers. So next slide please.

In summary, we can model about half of our antibody Fv sequences completely, which is mainly driven by our inability to model all CDRH3s. So if you're interested in one particular antibody towards a certain antigen and you're trying to model this one antibody, then you have a roughly 50% chance of falling into the set that we can model completely and a 50% chance that we can't model it. In reality that's not quite how it works, but at face value you have a half-half chance. So there's a large gap in our knowledge that we need to close. Next slide please.

So in summary, going back to our applications: the more structural knowledge we have, the better we can model, and therefore the higher the impact of structural knowledge on our therapeutic development will be, and the better the chances that we make good therapeutics. Even in the first example that I mentioned, where no structural knowledge was used to discover the antibody itself, if I had to redo this project today I probably would use structural knowledge. I would use modeling, and perhaps I would have discovered the antibody much more quickly, or perhaps I would have discovered a different one, simply because of the advent of next-generation sequencing and large-scale modeling.
The same goes for essentially any knowledge-based approach, whether it's the modeling itself, or this identification of contact preference clouds, or a score to assess a certain interaction between antibodies and antigens. All of these are generally knowledge-based and will benefit from having more structural information in the Protein Data Bank. Next slide please. And yeah, this is the point where I hand over to Richard. So go ahead, Richard.

Okay, thanks Seb. Great introduction. A great amount of information there in terms of antibodies and how they're used in drug discovery, and also of course around our current lack of understanding or knowledge of large parts of antibody structural space. Before I talk about the Pistoia Alliance AbVance project, I need to say a little bit about the Pistoia Alliance itself. So on this first slide: the Pistoia Alliance is a global not-for-profit alliance of life science companies, vendors, publishers and academic groups which operates in the pre-competitive space and counts many of the top global pharma companies as members. The aim of the Pistoia Alliance is to lower the barriers to innovation in life sciences R&D by facilitating what is now a pretty well established decentralized innovation model. This is distinct from the more centralized innovation model, which perhaps still operates in some areas but certainly was in operation in past years, where big organizations were conducting every aspect of their innovation internally. The decentralized model is much more based on collaboration and open sharing of ideas and information, and lends itself to greater overall success for everyone involved. On this slide I've also shown some examples of how business value is brought about in the pre-competitive space, and this includes things like building new standards and tools and sharing or defining best practices. So next slide please.
So the Pistoia Alliance value proposition is focused on three key areas. Represented in grey is the value achieved by retaining existing members and attracting new members. The way the Pistoia Alliance is set up is to serve its members' interests, and no idea or activity is progressed without member buy-in and support. Represented in blue is the value achieved by running innovation challenges and competitions, things like startup challenges and hackathons. And then finally, represented in red is the value achieved by turning member ideas into projects which deliver added value to those members or the community as a whole. In the next slide I will go into that in a little bit more detail. So next slide, thank you.

All ideas submitted by members through the Pistoia Alliance tool, so we do have a tool for this, go through a formal process which involves not only openly sharing these ideas but then, as a follow-up, obtaining buy-in and funding and creating a business case. If this is then approved, projects are normally officially launched to deliver the value proposition which is set out in that business case. The Pistoia Alliance has an active portfolio of projects, and I'm just going to pick a couple of these as examples to highlight the sort of thing that we're doing. For example, the Chemical Safety Library project has put together a database which is dedicated to sharing previously inaccessible hazardous reaction information in the interest of increased lab and personal safety across chemical industries. Previously this kind of information would only have been available within the companies where particular incidents had occurred. The second example is the HELM project. HELM stands for Hierarchical Editing Language for Macromolecules, and the project has delivered a tool which enables the representation of a wide range of biomolecules, so proteins, nucleotides, antibody drug conjugates etc.,
whose size and complexity render existing small molecule and sequence based informatics methodologies impractical or unusable. So you can think of this, in a way, as a SMILES string for all biomolecules.

As well as the active portfolio, the Pistoia Alliance also has an interesting developing portfolio, and I'm just going to say a little bit more about the community of interest around AI and machine learning, since this is a pretty hot topic at the moment. A recent survey carried out by the Pistoia Alliance found that 72% of life science professionals believe that their industry is behind other industries in terms of development of AI. To address this, the Pistoia Alliance has established an AI centre of excellence, which holds regular webinars and discussions around the theme. In addition to this there are also two work streams ongoing, one which is looking at establishing best practices in AI and a second one which is specifically developing use cases in areas of common interest where there are large amounts of data available. If you go to the website, and the link to the website was on one of my previous slides, you can find out more about the projects and initiatives that are currently ongoing.
Okay, next slide please. So moving on to the Pistoia Alliance AbVance project. We are a team of professionals from pharma, biotech and academia, consisting of structural biologists, bioinformaticians and modelers, all supported by the Pistoia Alliance. Seb is a part of that team, and Konrad and Professor Charlotte Deane are part of this initiative as well. The project is funded by pharma, by GSK, Roche and Lilly, and it's supported by the PDB. So next slide please.

Okay, so I mentioned ideas and moving those ideas through the Pistoia Alliance innovation process. The idea for this project actually came about some years ago at an EMBL-EBI workshop, to meet the challenge described by Seb earlier: how can we plug the gaps in our knowledge of antibody structural space beyond just waiting for organic growth of the PDB, for example? The thinking was that a quick win could be to release into the PDB existing proprietary structures owned by pharma companies that had no IP associated with them. Now, these structures may not provide the biggest bang for the buck from a novelty perspective, but information on any structure, especially those derived in the context of drug discovery efforts, will probably give some added value. That is one approach. What would add the most value, and this is the second approach, is if we could pick the right structures, i.e.
which structures, if solved, will deliver the most new knowledge in the least populated areas of antibody structural space. As Seb has highlighted, the biggest challenge arguably is predicting the structure of CDRH3, so we decided to focus on CDRH3, and we set about triaging the NGS data set which Seb has described, using selected criteria, down to a number of sequences which was manageable and which we could then put into crystallization experiments. This is what is represented on the panel: on the y-axis you've got the number of sequences and on the x-axis you've got the CDRH3 loop length. We went down from approximately 5 million heavy chain sequences to about 50,000 non-redundant CDRH3 sequences, and there is still a set of criteria that we could apply to get this number down into the hundreds, to then put into crystallization experiments.

In addition to this, we also did deliver to some extent on our quick win approach, and we released 8 new proprietary structures into the PDB last year. If you know how pharma operates, this is a challenge. 8 doesn't sound like a big number, but getting pharma companies to release some of their proprietary information can be quite challenging, as I said, so it's a small victory in itself and certainly a step in the right direction.

Now, having highlighted the importance of obtaining new templates, we understand that adding novel structure information into the public domain, into the PDB, is probably only half the picture. Granted, if we want to achieve the goals stated on this slide it may be sufficient to provide more templates. However, if we ultimately want to have the greatest impact on drug discovery, we also need to tackle the computational aspect, i.e.
understand as a starting point what the current state of modeling software is and where the limitations lie. So next slide please.

Okay, so basically there are two to three approaches, depending on how you look at it, to modeling antibody structures. These are knowledge-based approaches, ab initio approaches, or a combination of the two, the so-called hybrid approach. Knowledge-based antibody modeling is the most common approach, and it follows a fairly well established process which is based on the availability of an appropriate template; this is what's shown in the panel on the right. The main issues with obtaining good models are the correct orientation of the heavy variable and light variable domains and also, as mentioned previously by Seb and myself, modeling the CDRH3 loop accurately. Knowledge-based approaches, as Seb said, are fast and accurate if templates are available, whereas ab initio approaches are computationally expensive and accuracy drops with loop length. However, ab initio methods can do a pretty good job if no template is available. The problem, or the challenge should we say, is picking the good models from the bad ones in the output that's obtained. Hybrid approaches which combine both methodologies have also been developed. If you're looking to do loop prediction, ab initio and hybrid approaches are what you would go for, and the recommendation would probably be to use hybrid approach software like Sphinx, which is mentioned here, so as not to ignore the available structure information when you do your loop prediction. Next slide please.

So just to reinforce some of what Seb just said. The top three panels on the slide here show selected results from the Antibody Modeling Assessment II (AMA-II) competition, which was published in 2014, in which blinded structures were modeled using a number of different software packages. In each panel, what the rows show is the results for the 11 structures which were used in the competition, so these
are the models, and the columns show the corresponding result for the various structural elements, so these are the RMSD over the whole region, the framework alone and then the various CDR loops. The results are color coded in terms of backbone RMSD between the solved crystal structure and the model produced for the competition. It's probably no surprise that the competition highlights that the existing challenge is with modeling CDRH3 in particular. If you look at the column on the far right in each of those panels you can see a lot of red and orange, which is obviously not good, and less of the green, whereas if you look at the second column from the left, which is the framework region, as Seb highlighted, it's mostly green.

Now if we look at the bottom right panel, these are results produced by ABodyBuilder, which was mentioned by Seb, in 2016. What this suggests is that software capabilities have moved in the right direction over the few years since the AMA-II competition was held, since in theory the same database was used to provide templates for both sets of analyses. However, it's always difficult to establish whether you're doing a like-for-like comparison when you're doing this sort of retrospective assessment. As Seb mentioned, though, the biggest advantages of using something like ABodyBuilder are not only the speed but also the reliability, and by reliability we mean knowing, in this case, which row you are in in terms of this panel, because most software won't actually tell you that. As an example, if each of these rows was an output from your modeling exercise, you wouldn't actually know with other software which row you were in, i.e.
which model to pick and to take forward, whereas ABodyBuilder gives you an indication of that.

There are two important points that I want to emphasize in terms of modelling software. The first is that all these methods are based on modelling and measuring accuracy in terms of the C-alpha RMSD, so side chains are to some extent an afterthought; they're not included in the computation. There is software available that takes care of side chains, but I'm not going to go into that specifically. The second point is that all these methods are focused on modelling uncomplexed antibodies, and when you're thinking of drug discovery, arguably the most value is in understanding the structure of your antibody-antigen complex, as Seb has highlighted already. Next slide, please.

So as part of the AbVance project, at the end of last year we established additional collaborations with BioExcel, with MedImmune and with the Spanish National Research Council, and we put together a funding proposal to the European Union under the Horizon 2020 programme with a view to taking the AbVance project work forward. Now, sad to say, despite receiving a very competitive score of 79%, we were informed last month that we were not awarded the funding this time around. However, I certainly believe that we have a very innovative and competitive proposal, and as such we are likely to take it forward again this year. So what are we actually looking to do? Next slide, please.

So our integrated modelling of antibodies proposal, or IMAPS for short, includes the generation of novel structure equity and also an approach to improve our computational capabilities for modelling antibody-antigen complexes, based on integrating GROMACS and HADDOCK. Now, as I said before, understanding and improving antibody-antigen interactions is in many ways at the heart of every successful antibody drug discovery project, and this can't be done with information on the antibody alone, although you could argue that knowledge of the structure,
whether it's a crystal structure or a model, together with experimental data, would certainly help. Existing software has been used to date in a piecemeal fashion to provide complex structure information, but we believe that having a single process which produces a refined complex model, based on structure and experimental information from real drug discovery projects, could add distinct value. So to summarize, next slide, please.

So Seb and I have basically told you that having accurate structure information available throughout the life cycle of a drug discovery project is important to increase the confidence in making decisions. We currently have tools which can predict most regions of the antibody with a pretty good degree of accuracy, pretty good at the framework, and this obviously gives you high confidence in the results. We can do this fairly quickly, and we can apply this to millions of sequences at a time. However, there are important regions for which we still have considerable gaps in our capabilities and our understanding, namely around CDRH3. The Pistoia Alliance AbVance project aims to address these challenges by taking both a structural and a computational approach and, through the IMAPS proposal which I've just described, to have a direct impact on real-world use cases by focusing on structures of complexes. So next slide, please.

So if you found what you've heard today interesting, or if you have any comments or ideas, please get in touch with me. There are opportunities for those who can to get involved and participate in the project, primarily around contributing structures and/or resources, and also to join this year's funding proposal, the IMAPS funding proposal, that is. So at this stage I thank you for your attention, and I'm going to hand back over to Ian.

Hi, thank you very much, Seb and Richard, for such an interesting pair of talks. I'd like to encourage everybody still with us to ask any questions. So far I haven't
seen any appear in the panel yet. If you could enter your questions, that would be good. I think Rossen may have a question, and he is able to talk to us. Rossen?

Yes, thanks, Richard and Sebastian. It was really interesting to hear about antibodies. I was interested in the software that is currently used for modeling, ABodyBuilder and the other tools. How efficient is it? How productive can you be with it on the HPC resources that are currently available? Is the whole procedure pretty streamlined? What's the throughput you can get, and what are the opportunities for improvement there?

Seb, I think Seb is closest to ABodyBuilder. Are you able to talk to us, Seb? Seb was muted.

Hello, yes. So ABodyBuilder is very quick, as I mentioned, about 30 seconds on average per model. It is written in Python, and it's mainly written in a way where, to parallelize it, you would need to run multiple copies of the tool, one copy per core or so. It has been run in Oxford, and I think on Amazon AWS, at large scale, I think mainly via some shell scripts, on hundreds of CPUs at a time at least. I don't know the exact numbers in terms of how well it scales, but it should scale fairly well. The main bottleneck is possibly the I/O, so reading from the disk quickly, since at least the loop modeling part will perform a database search. You can also parallelize a single run if you want to, mainly, I think, by searching for multiple CDR templates at a time. But I think mainly it's meant to be run as multiple copies of the ABodyBuilder process on multiple cores. I don't know if that answers your question exactly, but I'm sure you could perform additional integration with queuing systems etc. to streamline the use on HPC clusters.

Great, thanks. I've got questions coming in now; three questions have appeared in the question panel. In the interest of speed, I'll identify the asker, read out the questions and proceed from there. So this is from Morikio Menegati-Rigio, apologies for the
pronunciation. The question is: how do you compute charges in silico to evaluate protein-antibody interactions? If we think in terms of electrostatic potential, this is very important to assess the interaction. So are you able to compute charges, I think, is basically what that question is after.

Yes, so I think this alludes a bit to the approach I mentioned with the contact clouds. That particular approach doesn't do any physics-based calculations. It is a purely knowledge-driven approach, so it's completely based on statistics. If you see atoms of certain charged amino acids interact with certain other charged atoms in another amino acid, then that would be represented in those contact clouds that you see. You can use a completely different approach, of course. You can use various energy functions that will try to compute charges in a more physics-driven manner and use that to score, and certainly people do that. We do it as well, mainly in the context of molecular dynamics simulations and free energy calculations, but that is a completely separate method from what I've talked about.

Okay, thanks. Another question, from Sid's Shrid Haran: how did you select the 50k sequences for modelling, and what is the sequence identity of these to known structures?

Hi Sid. Yes, so those were the numbers that Richard showed, reducing the 5 million sequences down to 50,000. That included quite a number of filters. The actual sequence redundancy, I think, is what you're asking about. Between the sequences in that 50,000 set, you would have at least 5 mutations between any two sequences. If you reduced that number to, say, 3, you would have a much larger sequence data set, so this is a very crude way of reducing numbers. We also did additional filtering, including this whole modelability idea that I mentioned, and a few other things, including developability problems that are easy to predict, etc. But you would probably want to prioritise this data set even more, and you could do it
certainly with a view towards essentially looking at numbers of sequences that are similar to this particular H3 sequence in the known antibody sequence space, which we could talk about more, but we're running out of time, unfortunately.

Thank you for those questions. We are indeed running out of time, and I have just one final slide, which is about the next webinar. It is entitled "BioSimSpace: filling the gaps between molecular simulation codes", and it will be given by Christopher Woods of the University of Bristol at the end of this month, on the 27th. Everybody who has called in today would be welcome to attend our next webinar. Thank you very much, everybody, for attending today, and thank you especially to our presenters. Bye bye.

Thanks. Bye bye.