Okay, hello everyone, thank you for joining this session. Before we start the session itself, maybe a word about housekeeping. You know that you can ask questions through the Q&A menu, which can be found at the bottom of the Zoom window. You can also use this option to upvote questions from others. And when you submit a question, please mention the name of the speaker to whom the question is addressed, which will be very helpful for us. You will see that we have the chance to have six speakers, which means that we have little time between the talks. We will try our best to be on time, and maybe we are going to limit the questions to one per speaker during this talk session and move the other questions to the meet-the-speakers session. Before we start the talks, I would like to present myself. I am Vincent Zoete. I am one of the two group leaders of the Molecular Modelling Group of the Swiss Institute of Bioinformatics, and I am also an assistant professor at the University of Lausanne. I don't know if you would like to present yourself as well.

Yes, just to briefly introduce myself. I am Chen Zou, a postdoctoral researcher from EPFL, in Matteo Dal Peraro's group, the Laboratory for Biomolecular Modeling. I am working on biological nanopores for molecular sensing, and I am delighted to co-chair and host this session together with Vincent today, and I am looking forward to your talks. Thank you.

So, let's start with the first talk. We have the pleasure to start with François Bonnardel, from the group of Frédérique Lisacek in Geneva, who is going to talk about the prediction of carbohydrate-binding proteins in microbial proteomes and about UniLectin, the database attached to this topic. Thank you, François, the floor is yours.

Okay, so I will start by sharing the presentation and then we can move on. Does everyone see the presentation? It looks good. So, hello everyone. I am very happy to be here to present at the SIB days. I am a PhD student working with both the team of Frédérique Lisacek in Geneva, which is the PIG team, and with Anne Imberty's team at the CERMAV in Grenoble. My work focuses mainly on developing the unilectin.eu web portal, which is dedicated to the classification of lectins and to the prediction of these lectins in all available genomes. In the following talk, I will focus mainly on my prediction results in microbial proteomes, mainly bacteria. I would also like to highlight that this work is financed by the University of Geneva and by Glyco@Alps in Grenoble.

First, I will define what lectins are, because they are only one part of the carbohydrate-binding proteins. The distinction is that lectins have at least one non-catalytic domain that binds to a glycan or to different types of mono- and oligosaccharides; a lectin is not an antibody and it does not have an enzymatic function. There is also a distinction between lectins and CBMs (carbohydrate-binding modules): lectins are alone on a protein or separated from other domains, whereas CBMs are always next to an enzymatic domain, or sometimes even superimposed with it. Lectins can be found in the extracellular matrix, in the vacuole and on the cell surface, and they are very important for host–pathogen interactions.

If we look at the UniLectin web portal, which I have developed during the last two years, there is the already presented UniLectin3D module, which contains the curated structures of lectins and provides different types of information. It is also published in NAR.
There is also the PropLec module of the website, which focuses on the β-propeller lectins, one particular type of fold. I predicted all lectins of this propeller type, we were quite lucky with the predictions, and we were able to publish this in Structure, with the identification of new types of propellers in nature. And finally, the part I will focus on is the prediction of all lectins.

If we take a step back to the classification of the lectins in UniLectin3D, there are many types of folds: for example the β-barrel, the β-prism or the α/β-barrel. All kinds of folds can be used to recognize glycans, and the fold is now the first level of the new classification that we provide on UniLectin3D, since the fold of a protein is more conserved than its sequence. The classification of the lectins works at four levels: the fold, followed by the class, which groups lectins of a same fold sharing at least 20% of sequence similarity. This criterion is based on the one used by the CATH database, which advises using at least 20% of sequence similarity within a same fold. Then we have the family level, at 70% of similarity, and finally the PDB structures in each family. For the prediction of the lectins, I use my classes, which share at least 20% of sequence similarity, to define proper lectin domains. In total, I now have 108 distinct lectin classes, based only on the 3D structures of lectins. A toy sketch of this similarity-threshold idea is shown at the end of this part.

I will explain how I generate my conserved lectin motifs using the PropLec part. If we take a look at a propeller, it can have from 5 to 7 repeated domains, and each repeat is a lectin domain able to bind a glycan. So I cut out every lectin domain repeat and align them together, and I also use all the available structures sharing 20% of similarity within a same fold. By aligning them together, I am able to obtain a proper conserved motif, which is then used for de novo lectin prediction.

The problem is that, if I look at all the predictions, I have too many: more than 500,000 predicted lectins, which is far too many. If I select only the predictions in bacterial genomes, after filtering based on the similarity score, I have 42,000 predicted lectins, which is still quite a lot. So, among all these lectins, what is interesting to do is to compare the fold distribution of the 3D structures we know with the fold distribution of all the predicted bacterial lectins. As we can see, for the 3D structures we have mostly information for the β-sandwich galectin fold, the OB fold, the β-trefoil fold and the β-sandwich calcium-binding lectins. This distribution is completely changed after the prediction, where most of my predictions are for the LysM domain and for the β-trefoil, β-barrel and β-helix domains. So, what does it mean? It means that we have quite a lot of predicted lectins in the bacterial kingdom that might be interesting to crystallize, to obtain their 3D structures and to gain more insight into how lectins work and how they interact with glycans.

The problem with the bacteria is that we have too many predictions, so we did not know where to start among all of these. Thanks to a contact and a collaboration with Imperial College London, we were able to focus on the vaginal microbiome, and to look at the parts that are interesting for health and disease.
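To make the similarity-threshold classification described above concrete, here is a minimal sketch (not the actual UniLectin pipeline, and with made-up sequences) of greedily clustering lectin domain sequences of one fold into classes at 20% identity and families at 70% identity, using Biopython for the pairwise alignments. In practice a proper pipeline would use profile alignments or HMMs over many more sequences; this only illustrates the thresholding logic.

```python
# Toy sketch (not the actual UniLectin pipeline): greedily cluster lectin domain
# sequences of one fold into "classes" (>=20% identity) and "families"
# (>=70% identity), mirroring the thresholds described in the talk.
from Bio import Align

aligner = Align.PairwiseAligner()
aligner.mode = "global"
aligner.match_score, aligner.mismatch_score = 1.0, -1.0
aligner.open_gap_score, aligner.extend_gap_score = -2.0, -0.5

def percent_identity(a: str, b: str) -> float:
    aln = aligner.align(a, b)[0]
    matches = 0
    for (s1, e1), (s2, e2) in zip(*aln.aligned):  # aligned blocks of the two sequences
        matches += sum(x == y for x, y in zip(a[s1:e1], b[s2:e2]))
    return 100.0 * matches / min(len(a), len(b))

def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first representative it matches above the threshold."""
    reps, labels = [], []
    for s in seqs:
        for i, r in enumerate(reps):
            if percent_identity(s, r) >= threshold:
                labels.append(i)
                break
        else:
            reps.append(s)
            labels.append(len(reps) - 1)
    return labels

# Hypothetical lectin domain sequences, already grouped by fold.
fold_sequences = ["MKTAYIAKQRQISFVKSH", "MKSAYLAKQRQLSFVKSH", "GSHMWQDPTTAAGGQEV"]
print(greedy_cluster(fold_sequences, 20.0))  # lectin classes
print(greedy_cluster(fold_sequences, 70.0))  # lectin families
```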
What they have at Imperial College London is information on vaginal dysbiosis and inflammation, and they know which species and strains help to achieve better pregnancy outcomes compared to other species and strains that are linked with issues, that is, with inflammation and dysbiosis. Thanks to this, we have a list of Lactobacillus bacteria, which are really interesting as they are able to protect during pregnancy, compared to Gardnerella, Prevotella, Streptomyces, Streptococcus and, in particular, Lactobacillus iners, which are linked with problems. So, the question we asked ourselves was: is there a link between these species, which show such differences, and their lectome? Because lectins are important for host–pathogen interactions, it might be very interesting to see if there is a correlation between the two.

By using the lectin classes defined previously and the conserved motifs, I predicted the lectins in these species, in as many strains as possible, selecting only the strains known to occur in the vaginal microbiome. Thanks to this, we were able to obtain a kind of barcode of lectins for each of these species. Here, the identification of lectins is restricted to the list of species I just talked about, but we also tried to enlarge it a little to other species, to see whether they show the same kind of distribution. What is very interesting to see is that the probiotic lactobacilli, in green, have a very low number of lectin classes present; in fact, only the LysM lectin is present in most of them, and it is in fact a housekeeping lectin, which is not the most important for host–pathogen interaction. If we take a look, in orange and red, at the more pathogenic species, we can see that there are many more lectin types from multiple classes. For example, the serine-rich repeat lectin class is present in many bacterial pathogens and also in Lactobacillus iners. It is really nice to see the difference between the lactobacilli in general and Lactobacillus iners, which carries different types of lectin classes. For now, this is only at the correlation level; to have proof that lectins have an effect on the pathogenicity of these species, the next step would be a binding assay, to see whether lectins really play a role in the pathogenicity of these species. But it is a really interesting first step for the prediction of lectins.

To conclude, and for the prospects: I have my UniLectin portal, with the UniLectin3D part, which now provides curated information for more than 2,000 lectin structures in 108 lectin classes. For now, this classification is not yet published, but it will be when we have the complete paper on the microbial prediction part. This classification first allowed us to predict β-propeller lectins, and we obtained a nice new propeller structure. Then the prediction of all lectins opens the door to the exploration of lectins in fungi, bacteria and viruses, and many more possibilities of exploration. In fact, we are always open to collaboration with other teams that might be interested in the lectin profile of their specific organisms. I would like to thank everyone, and to highlight that after my PhD a new position will be available, between Geneva and Grenoble, for the continuation of this project; you can check the unilectin.eu website to access this offer. Okay, I will now take a look at the questions. Sorry, yes.
Thank you very much. So I have a question: do you have any plan to share the data that you are collecting or predicting with more integrated databases such as UniProt or PDBe-KB? So, for now, in UniProt we have already put all the 3D structures of lectins that are classified. The next step will be to put all the information on the predicted lectins into UniProt as curated annotations. But we still have the issue of defining a proper score threshold to say: this one is a lectin, whereas this other one might be a lectin but we are not sure about it. Normally this will be the next step after the end of my PhD, to have all this information available in UniProt. Okay, thank you very much. Before we go to the next talk, I would like to remind all the attendees that you can ask questions through the Q&A box at the bottom of the Zoom window. So thank you very much, François; the remaining questions will be taken during the meet-the-speakers session.

Now we will leave the floor to Luciano Abriata, from Matteo Dal Peraro's group at the EPFL in Lausanne. Luciano is going to talk about state-of-the-art web services for modeling the structures of proteins that lack clear templates in the PDB. So thank you, Luciano, you can start. Here it comes; thank you for the invitation. Okay, I am sharing, it should be sharing now. Yes, I have another computer to check that everything is coming through, and it looks good. So thank you. I will try to be brief and go to the things that should be of most interest to you; I hope you will find it interesting and useful. My title summarizes what I am going to talk about: modeling the structures of proteins that have no clear templates in the PDB. Let me activate the laser pointer, because I use it a lot and otherwise you would not see what I am pointing at. Okay, yes, here. This presentation is on my website; if you want to follow along there, you can see everything I will be saying and you will have all the links.

For sure, you already know that if you have a new sequence whose structure you want to know, you go to the PDB; maybe you are lucky and you will find the structure. If not, you know you can do homology modeling if you find a protein of similar sequence; then you can do alignment and threading, all that you know already. The problem comes when you want to do a prediction without templates; then you have to use a different kind of method, which in principle is much worse than homology-based modeling. But, as I will show you here, in recent years there have been very impressive advances in our capacity to do ab initio prediction, that is, prediction without templates. This comes from our experience, Matteo Dal Peraro, my PI, and me, in CASP. For those of you who do not know what CASP is, it is a kind of contest on structure prediction. It has been going on for more than a quarter of a century already. There is a group of organizers who secure new structures that have not yet been released by the PDB. They take the sequences of these new structures and send them to the predictors; the predictors build models and send them back to the CASP organizers, and the CASP organizers contact independent assessors who compare the models to the targets, trying to come up with an evaluation: who is doing best, what parts of a protein can be correctly modeled, what parts are still very challenging, and all this kind of assessment. Right now CASP14 has started; it takes place every two years.
In CASP12 and CASP13, Matteo and I were assessors in the track of CASP that deals with modeling the tertiary structures of proteins that are very difficult to model through homology. You can read all the details in these papers in Proteins; this journal has a special issue dedicated to CASP, and if you go there you will find all the details about the evaluations, the progress and the rankings in CASP12 and CASP13. Very briefly, what we introduced in CASP12 is this kind of web app, which I can show you running live. It is a web app, implemented on a web server, that for a given target whose structure the predictors model implements lots of scoring metrics that help us navigate through the models. Everything that you see here is a model. We can compare the target structure, which we know because we are participating as assessors, with each model that was submitted, and this is coupled to a 3D view, so we can do all kinds of evaluations on the different models and then come up with our conclusions.

Going back to the presentation: one very important thing in CASP, especially in this track about very hard targets, is to track whether there is any improvement over time. As I told you, CASP started around 1994 with CASP1. What you see in this plot is each CASP edition, held every two years, and on the Y axis a kind of score that measures the quality of the best models that we had for all the targets in each CASP. This score is such that if you are above 70 or 80, then essentially your model is perfect: it is within one angstrom of backbone RMSD. If you are under 25 or 30, then the model is no better than any random spaghetti model that you could build. Starting at around 30, you define the overall shape of the protein quite well; you capture it. What you see is the median plus or minus the standard deviation of the mean, so there are some outlying points here and here. But overall, what you see is that in the first CASPs, until about CASP11 or 12, the models were barely capturing the overall topology. In CASP12, when we started being evaluators, we were lucky to see that people were starting to model these proteins much better, and in CASP13 the jump was even stronger. Suffice it to say that target difficulty was more or less similar in the last three CASPs and that the assessment is always based on the same metric, so it is not a matter of the assessment changing: this reflects that methods for predicting structures without templates are actually working better.

And why are they working better? Well, already by the time of CASP10 or 11, people started to solve a problem that had been around for some time, which allows them to predict contacts from a sequence. How does this work, very briefly? Because very likely you are already familiar with it. Suppose this is the sequence of the protein that you want to model. What you do is build an alignment, and then you look for pairs of amino acids that co-evolve. Here you see this arginine, a positive amino acid, that co-evolves with this negative amino acid here; that is because it forms an important contact. They may mutate, but when one mutates, the other one has to mutate as well. For example, here in this protein you have two hydrophobic residues that will still pack and make a contact.
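As a toy illustration of reading contacts off this column covariation, the snippet below scores pairs of alignment columns by mutual information; the CASP-grade methods discussed in the talk use direct-coupling analysis or deep networks instead, and the tiny alignment here is invented purely for illustration.

```python
# Toy co-evolution analysis: score pairs of MSA columns by mutual information.
# Real contact predictors use far larger alignments and stronger statistics,
# but the underlying signal is the same column covariation.
import math
from collections import Counter
from itertools import combinations

def mutual_information(col_i, col_j):
    n = len(col_i)
    pi, pj = Counter(col_i), Counter(col_j)
    pij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

def contact_scores(msa):
    """msa: list of aligned sequences of equal length. Returns {(i, j): MI score}."""
    columns = list(zip(*msa))
    return {(i, j): mutual_information(columns[i], columns[j])
            for i, j in combinations(range(len(columns)), 2)}

# Hypothetical mini-alignment: columns 1 and 4 co-vary (R/E versus K/D pairs).
msa = ["ARLLEV", "AKLLDV", "GRILEV", "GKILDV"]
top = sorted(contact_scores(msa).items(), key=lambda kv: -kv[1])[:3]
print(top)  # the highest-scoring column pairs are the predicted contacts
```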
So the problem that got solved around the time of CASP10 or 11 was how to infer these contacts from the correlations that you have in alignments. By getting these contacts, people could then fold proteins, essentially by trying to satisfy the predicted contacts, and that is how we got the increase in model quality by CASP12. Then there was a further increase, which we saw in CASP13, that came about with the introduction of machine learning methods for molecular modeling. I go very quickly here because there is a lot of bibliography, and probably I am missing some important paper. But how does this work? These machine learning methods essentially take a sequence and, from the sequence, build an alignment. From the alignment you can compute residue co-evolution, just as I showed you before, but you also get these kinds of linear features, also derived from alignments, which measure, say, the probability of being solvent-exposed or buried, secondary structure propensity, different kinds of things. And the key thing that came up in CASP13 is that the best predictors were actually learning contacts, distances and even orientations between different residues from PDB structures. So the best predictors in CASP13 were integrating all this information to predict contacts, distances and orientations that they could then use for folding. Directly for folding, or you can also use the patterns of contacts to search the PDB, not at the sequence level, which may well not give you anything, because by definition these are very hard targets, but maybe you will find a pattern of contacts that is already there. Then you can do an alignment based on the patterns that you found and do some kind of threading. Actually, most of the most successful servers and human predictors in CASP13 were using a combination of folding and threading, mostly folding using the predictions.

So this looks very nice. I am showing you very quickly just a couple of examples; if you go to our CASP13 assessment paper, published last year, you can see all this in detail, and the supporting information is full of explanations. You can see here, for example, one target of about 140 amino acids that got modeled very well over the full sequence, to about 2.3 angstrom. So the prediction quality that you can achieve is really remarkable. Of course, all the assessment that we are doing here is at the backbone level; side chains were typically not evaluated in this track of CASP because, as you saw before, the models tended to be not so good, but I think in the future CASP will move into looking at the side chains as well.

So, how can you benefit from all this? This is what I promised you in the first slide. I will again go very quickly because there is not much time, but if you want details, see these two papers, especially the second one, which got accepted just this Friday. It is a kind of review, but it also includes analyses in which we describe what we think are the best servers, and some data sets, as I will show you briefly now. If you go to that paper, which I think will be out in a couple of weeks, you can go to table one and you will see our assessment, combined with the CASP assessment, of what we think are the best servers that today implement all these technologies.
We discuss several aspects about them, but another thing I wanted to show you is these four data sets, which are in table two of our article, and this is a very interesting thing. What the people behind this other service did was to realize: okay, if these methods are working so well, we can go to Pfam, take all the Pfam families for which there is no structure in the PDB, and try to model them. So they did that and, based on quality estimations, across these four data sets they could provide at least one model, in which they are in principle quite confident according to their evaluation, for over 2,000 structurally uncharacterized Pfam families. And that is not little; this is like 12% of Pfam. If you check the PDB, you can already find structures for about 55% of Pfam, so these models give you a further 12%, which means that for roughly two-thirds of Pfam you have some structural insight.

Now, what is happening? One thing that we looked at in the paper is that since the publication of these data sets, for example the one I showed you there from the David Baker group, which was from 2017, a couple of structures have come out, and having these structures allowed us to do an evaluation independent of the CASP evaluation. So we did that. I am showing you a few examples here, and in the paper there will be more. You can see, for example, just to show you one here: this is a protein of 257 residues, and this model gets about 90% of the residues within 2.7 angstrom accuracy at the backbone level. I think that is also remarkable, and where we could compare these data sets, we found that, roughly, when the models cover more of the domain, they tend to be better. This is our assessment based on the structures that came out later; there are more details in the paper.

But I wanted to show you one last tool, which is what we call model search. Imagine that you have a new sequence that you want to model; you know it is not in the PDB, but you wonder whether it is in one of these data sets, or somewhere else where somebody already took the time to model it, so that maybe you can use that. What you can do is go to our model search link, which is described in this paper here. You take a sequence, and this will do a kind of BLAST over all these data sets. It will also search in PDB-Dev for integrative models, in the database of SAXS-based models, and in all the CASP models, because some CASP targets were modeled but their structures were never released, maybe around 1% of them, so it may be interesting to find what people modeled, and in some other resources. This will give you back models that maybe you can now use for modeling based not on an experimental structure but on one of the models that you find in these databases.

I went very quickly because I wanted to show you many things; we can talk about the details later, and they are also in the papers, especially the one that will come out soon. I wanted to thank here Matteo, my PI, with whom we did all these evaluations and studies. Also Giorgio, who was very important in CASP12 to help implement lots of the analyses, and who was also doing evaluations himself. Being an assessor in CASP was a wonderful experience: I got to know lots of people and learned a lot about structural bioinformatics and also about experimental structural biology.
It is very important to acknowledge the structure contributors, because without them there would be no CASP, as well as the predictors and the previous assessors; as I showed you very quickly in that slide on the website, there are many metrics that we use which they developed, and we then implemented them in the website. I would also like to thank the SIB, especially Francisco, Solange, Rob and Shannon, who put this event together; I think so far it is going very, very well, and I am happy to be presenting here. And as a last thing: if you are involved in some collaboration with experimentalists, or you run an experimental lab yourselves, consider donating your structures to CASP. They will remain confidential; they are not released, and only the organizers and the assessors see them. One thing I did not say here is that there are increasingly more cases where these models are getting so good that you can use them for molecular replacement, and then phase data that you could not phase in any other way, unless you collected some anomalous dispersion data or something like that. These models are also being used a lot to be fitted into mid-resolution cryo-EM maps, which still dominate the cryo-EM database. And of course there are lots of reports, and we summarize some of them in our paper, about the practical uses these models have in driving biology forward, maybe at low resolution, but enough to move the science forward. That is it, thank you, and I will take questions if we have time.

Thank you very much, Luciano. Indeed, we have a couple of questions. I will just pick the first one, from Anna Claudia: what role did domain expertise play in designing the system that won CASP13, from DeepMind, and what are the implications for future editions and participants of the challenge? Can you, sorry, repeat the first part of the question? So, what role did domain expertise play in designing the system that won CASP13, from DeepMind, and what are the implications for future editions and participants of the challenge?

Okay, I am not sure I understand the question, maybe we can talk more later, but yes, the winner, let's say, because in CASP they do not like to say winner, the group that performed best, was AlphaFold from Google DeepMind. They were using this technology I showed you before; they did not invent it, but they pushed it to the limit, I would say, and they made some very important improvements and extensions. And the problem with them is that what they provide is not a server. In principle, you could guess there was a lot of human intervention or something like that, maybe in selecting the best models, this kind of thing; probably one should read their papers in detail to be sure about that. But what is sure is that it is not a server that you can use; that is why it does not appear in the tables. The technology they use is the same I showed you. As for the future, what is missing: well, most of these very nice models are very good at the domain level, but when you start having different domains that come together, until CASP12, I would say, there was very little chance of getting something good. In CASP13 some groups, including them, started to put the domains together into something that kind of makes sense, but a lot of progress is still needed there. The other thing that needs to be improved is the side chains, which are actually so far not even evaluated, but I think they will be evaluated in the future, given that the models are getting so good.
And the other very important thing is developing model quality estimates: as a user, when you get a model you want to have a number that tells you how likely it is to be good, both globally and at the residue level. Some of the groups are starting to work on that; we have some examples in our CASP13 paper. Okay, thank you, Luciano. Thank you.

Then we are going to move to our next speaker, Olivier Bignucolo, from the group of Stephan Kellenberger at the University of Lausanne. Olivier is going to talk about responses of ion channels to pH fluctuations, investigated through molecular dynamics simulations. So, Olivier, if you would like to start. Thank you for the nice introduction; I will start sharing my screen.

In this presentation, the purpose is to show you some illustrations of how the use of restraints in advanced or classical molecular dynamics simulations can help us get a better understanding of the relationships between the structure and the function of proteins. The examples will be taken from a family of channels that I will introduce in a moment, but the strategies and the tools that will be shown are applicable to any protein that you might be interested in. In the group where I am working, we are interested in ASICs, the acid-sensing ion channels. On the left trace, you see that if you perform a voltage-clamp experiment you will record some current, because these are neuronal proton-activated sodium channels: they are activated by a drop of the pH, as you can see on the pH trace shown above. After having opened, they spontaneously close again into a so-called desensitized state, in which they are unable to open again. Here is the only experimental element that we have to understand for the following: if we plot the current, normalized to the maximal current, as a function of the pH, as you see here, we can see that at a given pH we have 50% of the current; that is what we call the pH50 of activation.

In terms of structures, we do have structures of all the states; here the open and the closed states are shown. If we superpose them, we see some conformational changes, some helix reorientation, some parts moving from one side to the other. But these structures do not give us any information about the two questions that I will present now. First, where are the pH sensors, these residues that might take up a proton during the acidification and, because of this change of their protonation state, will affect their surroundings in a way that ultimately leads to the opening of the channel? And in addition, these structures do not inform us at all about the transition, or the different pathways that are possible, between the closed and the open state. I will now show two methods that can be used to investigate these things; if you have any protein that you are interested in, and this protein might be sensitive to pH, these are tools that you could use.

The first one is a very old, classical one, but still very useful: pKa calculations based on the structure. The only thing that we have to know is that the pKa that you learn in the textbook for an amino acid can be very strongly shifted if this residue is sitting within a protein. We use tools, some equations that we solve, and in this case I have calculated the pKa of all the residues of the channel in the closed and the open structures. And now you need some criteria to identify a pH sensor.
There are at least two. First, the pKa in the first state, the closed state, must be more or less close to the pH50 of activation. Second, you should observe a strong shift between the two states: the pKa of this residue should change. Then you can have the idea that you have identified a pH sensor. So I did that for the structures; here I show only a stretch of the results. If we look carefully, we see this lysine 211, which has a pKa close to 8 in the closed state and just above 6 in the open state; the gray zone represents more or less the pH50 of activation. So it is a good candidate. Do we have experimental support? Yes, we have: a group recently performed a deletion of this residue. Remembering the pH50 of activation, here we have the wild type, and upon deletion of this residue the activation curve was right-shifted, which means that, because this residue is lacking, the channel requires more than 100 times more protons to open. This confirms that this residue is a pH sensor.

A second technique that we can use, which has been developed more recently, is the so-called constant-pH molecular dynamics simulation. You might all remember that molecular dynamics deals with classical physics, so you will never form or break a covalent bond. But some tools have been developed, and I am using some of them, to run molecular dynamics simulations at constant pH. This means you constantly reassess the protonation state of all the residues in your channel. Here I have run some replicas at low pH, to mimic the acidification, and at high pH, to mimic the physiological pH. In this case I will just show an example of the breakage of a salt bridge, because it is the easiest to show. The acid-sensing channel that I am studying forms trimers, and here we have two residues, a glutamate and an arginine, that form a salt bridge between two subunits. You can easily count the number of occurrences in which this glutamate is protonated, and, as expected, at the acidic pH you have many occurrences in which this glutamate has been protonated, and almost never at the high pH. If you look at the time series of the distance between the side chains of the acidic and the basic residue, you see that at physiological pH these two residues remain at a given short distance from each other, while at pH 5 they seem to move away, which is logical if one of them is protonated.

But do we have experimental support that the molecular dynamics simulation has indeed reproduced something that we can be confident in? Yes: if we compare with the crystal structures, we see that at pH 7.4, the physiological one, the distance is the same as the one measured in the simulations, while at the acidic pH, in the open structure, it is 11 whereas we find 9.5. This means the simulation was on the way to more or less reproducing the X-ray structures. We have additional experimental support: a group has mutated these two residues into cysteines and investigated, using oxidizing or reducing conditions, how this might affect the channel. We see that upon oxidation, so when the two residues are cross-linked and maintained together, they cannot move apart, and then the channel has problems opening. So this shows us that this technique is now mature enough to reproduce experimental settings and experimental observations.
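As a minimal sketch of how the two criteria just described could be applied in practice, the snippet below filters hypothetical per-residue pKa estimates for the closed and open structures; the pH50, thresholds and pKa values are made up for illustration and are not the actual numbers from this study.

```python
# Minimal sketch (hypothetical numbers, not this study's actual values): flag
# candidate pH sensors from per-residue pKa estimates computed for the closed
# and open structures, using the two criteria described above.
PH50 = 6.6          # assumed pH50 of activation, for illustration only
PKA_WINDOW = 1.5    # criterion 1: closed-state pKa within this window of pH50
MIN_SHIFT = 1.0     # criterion 2: minimum pKa shift between closed and open states

pka_closed = {"LYS211": 8.0, "GLU79": 4.5, "HIS74": 6.8}   # made-up values
pka_open   = {"LYS211": 6.2, "GLU79": 4.4, "HIS74": 6.7}

def candidate_ph_sensors(closed, opened, ph50, window, min_shift):
    hits = []
    for res, pka_c in closed.items():
        shift = abs(pka_c - opened[res])
        if abs(pka_c - ph50) <= window and shift >= min_shift:
            hits.append((res, pka_c, opened[res], shift))
    return hits

for res, c, o, s in candidate_ph_sensors(pka_closed, pka_open, PH50, PKA_WINDOW, MIN_SHIFT):
    print(f"{res}: pKa {c:.1f} (closed) -> {o:.1f} (open), shift {s:.1f}")
```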
But of course, in addition, you will have all the intermediate steps: you can study how the channel changes conformation, and then you can really understand how this protein functions. With this, I would just like to thank all of you for your attention, Nora Ruff from Chicago for support with the constant-pH molecular dynamics simulations, the group where I am working, that of Stephan Kellenberger, and the group which hosts me like its own family. And now I am ready for your questions.

Thank you very much, Olivier. Indeed we have questions, but unfortunately we are running late, so we will keep the questions for the meet-the-speakers session a bit later. Thank you again, Olivier, and I am now going to hand over to my co-chair.

Okay, so let's welcome our next speaker, Professor Andrea Cavalli. He received his PhD in pharmaceutical sciences from the University of Bologna and did postdocs at SISSA and ETH. Now he is a professor in medicinal chemistry at the University of Bologna and a research director at the Italian Institute of Technology, where he is also deputy director for the research domain of computation and data science. Today he is going to share with us his work on machine learning for predicting the toxicity of immunoglobulin light chains. So please.

Okay, let me start by sharing my screen, and maybe just a short note: the Andrea Cavalli you presented is a homonym, it is not me. I am not working in Bologna; I am working at the IRB in Bellinzona, which is an institute in the southern part of Switzerland. Today I will speak about light chains and light chain amyloidosis. Essentially, light chain amyloidosis is a disease related to the misfolding of immunoglobulin light chains, which detach from the heavy chain and form toxic aggregates, or amyloid fibrils, that accumulate mostly in the heart and kidneys, causing fatal organ dysfunction and eventually death. Now, each patient carries a unique sequence, which raises the question: what are the molecular determinants of light chain toxicity? The sequence of a light chain is the result of the maturation of B cells, which starts in the bone marrow with a hematopoietic stem cell and terminates either with an antibody-secreting plasma cell or with a memory B cell. During this maturation, light chains go through essentially two differentiation steps. The first one is the VJ recombination, which generates a naïve B cell by recombining the V and J genes. A second phase comes then, in which somatic hypermutations are added to the sequence to increase the affinity of the antibody towards the antigen. From the structural point of view, light chain homodimers are similar to antibody Fabs; however, light chain homodimers do not have any biological function, whereas Fabs are responsible for binding the antigen. So our hypothesis is that, because of this lack of structural and biological checkpoints, mutations which are added to Fabs to increase the affinity of the antibody might in some cases be detrimental to the stability of the light chain homodimer, leading to the generation of toxic species. To further investigate the role of somatic mutations as determinants of light chain toxicity, we collected a large database of sequences comprising roughly 600 non-toxic sequences and 400 toxic ones.
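The next part describes how these sequences were turned into a toxicity classifier; as a rough, hypothetical sketch of that general idea (not the actual LICTOR implementation, and with invented sequences and labels), one can encode each light chain by its somatic mutations relative to the germline as binary features and train a random forest, for example with scikit-learn:

```python
# Rough sketch (hypothetical data, not the actual LICTOR implementation): encode
# each light chain by its somatic mutations relative to the germline as binary
# features and train a random forest classifier with scikit-learn.
from sklearn.ensemble import RandomForestClassifier

def mutation_features(germline: str, sequence: str):
    """One binary feature per germline position: 1 if this sequence is mutated there."""
    return [int(g != s) for g, s in zip(germline, sequence)]

germline = "DIQMTQSPSSLSASVGDRVT"            # toy germline segment
sequences = [
    "DIQMTQSPSSLSASVGDRVT",                  # unmutated
    "DIQLTQSPSSLSASVGDRVT",                  # one somatic mutation
    "DIQLTQSPSTLSASVGDKVT",                  # three somatic mutations
    "DIQMTQSPSTLSASVGDRVT",                  # one somatic mutation
]
labels = [0, 0, 1, 0]                        # 1 = toxic, 0 = non-toxic (made up)

X = [mutation_features(germline, s) for s in sequences]
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)

# Toy version of the in silico reversion analysis described in the talk:
# revert one mutated position back to germline and re-predict the phenotype.
reverted = list(X[2])
reverted[9] = 0                              # revert the S->T mutation at position 10
print(clf.predict([X[2], reverted]))
```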
All sequences were then aligned to the reconstructed germline sequence in order to identify the somatic mutations, which were then used to train machine-learning-based classifiers. This slide shows the results of our best predictor, which we call LICTOR, for light chain toxicity predictor; it is based on a random forest algorithm. As you can see, the AUC of the predictor is roughly 0.9, and the sensitivity and the specificity are around 0.8, which means that LICTOR is able to correctly classify toxic and non-toxic sequences in 80% of the cases. To validate our approach, we obtained six sequences not present in our training database, three of which were toxic and three of which came from patients with multiple myeloma, which is another plasma cell disease but does not generate toxic light chains. In all six cases, LICTOR was able to predict the right phenotype, assessing the cardiotoxic ones as toxic and the multiple myeloma ones as non-toxic.

Next, we wanted to see whether our approach could also identify exactly which mutation is responsible for the toxicity, among the five, six or seven that are typically present on a light chain. So what we did was to revert, one after the other, all mutations to their germline value and see in which cases we could change the phenotype of the light chain. Starting from a toxic one, we reverted a first mutation and obtained a non-toxic light chain, and then we reverted a second one to decrease the predicted toxicity even further. So we had three sequences, the wild type and two mutants, which were predicted one toxic and two non-toxic. Now we wanted to assess whether this correlates with a phenotype that could be measured, and for this we expressed these light chains recombinantly and tested them on two validated models: one is an in vivo model based on C. elegans, and the other is a test of the viability of human cardiac fibroblasts. As you can see from this image, the red one, the wild type, is the only light chain that has a toxic phenotype; the single mutant, and even more so the double mutant, do not have an effect which is statistically different from the vehicle. In the same way, we see a concentration-dependent toxicity, that is, a decrease of cell viability, if we add the toxic light chain to cardiac fibroblasts. It should be noted that the highest concentration here, 800 nanomolar, is roughly half of the concentration that patients have in their blood.

So, concluding: somatic hypermutations are key determinants of light chain toxicity in light chain amyloidosis, and LICTOR can exploit these somatic mutations to classify light chains as toxic or non-toxic with an accuracy of roughly 80%, making it a very valuable tool for early diagnosis of AL amyloidosis in patients. LICTOR is available as a web server at the IRB. Last, I would like to acknowledge the people who helped me in this project, the SNF for funding, and you for your attention.

Thank you a lot, Andrea, and I am really sorry for my wrong information about you. Since we are a little bit out of time, I think we can leave the questions for the next session, and we will welcome our next speaker, Dr. Marta. Marta is a computational biochemist; she has been developing and applying data analysis, theoretical methods, mathematical models and computer simulations to address critical biological problems.
She got her PhD in chemistry in Portugal and did postdocs at EPFL and UNIL. Now she is working on cancer-related research and on the development and maintenance of the SwissDrugDesign web tools with Vincent and Olivier. Today she is going to show us how she addresses cancer immunotherapy with structural bioinformatics approaches. The stage is yours; please, Marta. Thank you.

Thank you for the introduction. I am going to start by sharing my screen. So hello everybody, it is a pleasure to give a short talk for the SIB days 2020, and I would like to thank you all for coming and listening to my virtual presentation. The title of my presentation is "Addressing challenges in cancer immunotherapy with structural bioinformatics approaches", and I will give you an overview of some research from our lab on HLA-I peptides. HLA-I stands for human leukocyte antigen class I, and recent advances in cancer immunotherapy have led to renewed interest in these peptides, due to their potential use as peptide vaccines. Looking at the figure on the left, there is a simple illustration of HLA-I peptide presentation. Basically, proteins inside the cell are fragmented into smaller peptides by the cell machinery, namely by the proteasome, and it is then the HLA that is responsible for presenting these peptides at the cell surface. On healthy cells, the displayed peptides do not bind to the T cell receptors and the cell is not killed. However, sick cells display peptides from aberrant proteins associated with the disease, which are recognized by the T cell, and the cell is killed. So databases of HLA-I peptides therefore offer information on therapeutic targets essential to understand immunity.

In this work, we use extensive and accurate HLA-I peptidomic data sets and map the three-dimensional structure of HLA-I-binding peptides onto their source proteins in order to analyze their properties. We also capitalize on the increasing number of structurally determined proteins, namely in the Protein Data Bank, and we search for potential differences between the properties of HLA-I peptides and of peptides from the human proteome that are not HLA-I binders, allowing us to understand whether there is a bias between HLA-I peptides and the human proteome.

Starting from the first point: we use a huge database of HLA-I peptides, with more than 150,000 peptides determined by mass spectrometry in Bassani-Sternberg's group, and for each one of the peptides we map the peptide onto the three-dimensional structures. For example, for this peptide we have nine proteins, nine PDB codes, that display this peptide, and the source protein is the anaphase-promoting complex. Here are the three-dimensional representations of the peptide, in purple and green: purple shows the portion of the peptide in helix, and green shows the portion of the peptide in coil. In the end, we average over all the matches for the peptide in order to determine the fraction of the peptide in helix and the fraction in coil, and we got 76.5% for helix and 23.5% for coil. We average over all the matches to avoid over- or under-representation of certain peptides with respect to the others, as some peptides are represented in more or fewer proteins, depending on the peptide in question. So, we analyzed the secondary structure for the PDB itself, the first set here in the graph, and we also analyzed the secondary structure for the HLA-I peptide database and for subsets of the HLA-I peptide database divided by length.
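As a minimal sketch of the mapping-and-averaging step just described (with invented data, not the actual pipeline), one can compute per-match helix and coil fractions from DSSP-style secondary-structure strings and then average them per peptide, so that peptides mapping to many structures are not over-weighted:

```python
# Minimal sketch (hypothetical data): for one HLA-I peptide found in several PDB
# chains, compute the helix and coil fractions of each match from DSSP-style
# secondary-structure strings, then average over the matches.
HELIX = set("HGI")   # DSSP helix codes; everything else is counted as coil here

def helix_coil_fractions(ss: str):
    helix = sum(c in HELIX for c in ss) / len(ss)
    return helix, 1.0 - helix

# Secondary structure of the same 9-mer peptide in different PDB matches (made up).
matches = ["HHHHHHHCC", "HHHHHHHHC", "HHHHHHCCC"]
per_match = [helix_coil_fractions(ss) for ss in matches]
avg_helix = sum(h for h, _ in per_match) / len(per_match)
avg_coil = 1.0 - avg_helix
print(f"helix {avg_helix:.1%}, coil {avg_coil:.1%}")
```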
The frequency of residues in helix is shown in red, and we can see that HLA-I peptides are enriched in helical content when compared to the Protein Data Bank structures, and this enrichment is more pronounced for HLA peptides of smaller length, namely eight or nine residues. Coming to our second point: when we compare the PDB with HLA-I peptides, we are not comparing sets of the same size and the same amino acid composition. To circumvent this problem, we searched for a bias between HLA-I peptides and HLA-I motif-like peptides. HLA-I peptide data sets can be divided per allele and present different amino acid compositions, as we can see here for three alleles on the left of the slide. So, for each allele under study, we made use of an in-house developed tool that searches, across all the human proteome, for a set of peptides that matches exactly the same number of elements and the same amino acid composition as the reference set, allowing an accurate comparison. If we do so, and here is an example for five alleles, we converge to motif-like peptides that have logos similar to the original peptides, and once again we see that HLA-I peptides are enriched in helical residues.

To conclude: HLA-I peptides exhibit a localization bias towards helical fragments in their source proteins. This knowledge refines our understanding of the rules governing antigen presentation. We saw in the beginning that the proteasome has a role in the cell, cleaving proteins into peptides, and the proteasome cleaves preferentially in coil regions, so we assume that the proteasome leaves more helical fragments to be displayed by the HLA. This knowledge could also be added to the parameters of current peptide–MHC class I binding predictors to increase their antigen prediction ability. I would like to thank the Molecular Modelling Group, our collaborators and Vincent for all the support, and thank you all for your attention.

Thanks a lot, Marta. Next we have our speaker from the University of Basel. Hello. So, thank you very much for the kind introduction. I welcome everyone to this short talk on the SWISS-MODEL Repository, an up-to-date interactive protein structure database. If one compares the number of entries in UniProt, and in the curated Swiss-Prot database, to the number of experimentally resolved proteins in the PDB, one can see a discrepancy, which is known as the structural gap. Filling this gap with predictions of protein structures is a main goal in the computational structural biology field. The often overlooked implication of this is that there are potentially hundreds of millions of protein structures, be they experimentally resolved or generated by modeling. How do we deal with this amount of structural data, or, to be more precise, how do we make this data accessible and useful for life scientists in all biological fields? Answering this question is the main motivation for the SWISS-MODEL Repository. Currently, there are 150,000 experimental structures from the PDB and 1.6 million homology models, which are mapped to UniProtKB. Every week, together with the PDB release, the entries in the repository get updated. The repository sets a focus on 13 model organisms, which range from human to E. coli. This is the residue coverage plot, which indicates the type of structure covering a specific residue across the 13 model proteomes. This is the fraction which is covered by experimental structures from the PDB.
This is the fraction which is extended by homology modeling, with various degrees of quality as indicated by the colors, and this is the rest of the gap, which still needs to be filled by structure prediction. Finally, the repository cross-references seven different databases, which range from protein annotation and protein interaction databases to organism-specific databases.

But this is not the only thing the repository has to offer; there are also functionalities for further analysis and visualization of proteins. Annotations from UniProtKB and InterPro can be mapped onto every structure in the repository. The annotations range from functional sites, like active sites or various binding sites, to structural annotations, like transmembrane regions, and domain annotations. Here we see metal-binding annotations mapped onto a protein structure, and we can see very well how they surround the two zinc ions, as you can see here in this structure. Furthermore, it is possible to analyze and visualize the interactions between ligands and proteins. By using the Protein–Ligand Interaction Profiler, or PLIP for short, one can obtain a list of the residues which interact with any ligand in your structure, and you can also see what type of interactions these are. In this example, we see four hydrogen bonds between the RNA polymerase of SARS-CoV-2 and a potential antiviral drug candidate, remdesivir. The homology models in the repository come together with a stringent analysis of their global and local quality. There are four global quality measures, which generally look at the physicochemical plausibility of the homology model, but we also take a look at the local quality of the homology model, as we can see here in this example of a heteromeric homology model of a complex of non-structural proteins, where blue indicates high quality. There are specific regions with lower quality, like this loop here, which can be identified and visualized in this way.

That is actually it; I thank you all for listening. If you want to know more, you can interact with me later at the poster session this afternoon, where I will be presenting poster 256, or I will await your questions in the meet-the-speakers session. If you want to have a look at the SWISS-MODEL Repository itself, you can see below the link to it and the QR code. That is it for my session. Thank you very much.

Thank you very much, and thanks again to all the speakers of this session. Now we can move on to the next session, meet the speakers. You can go back to the program page and click the video symbol on the right, or the plus symbol on the left, and you will see the session card; in the card, use the link. Both ways should work. See you there. Thank you, everyone. See you in the meet-the-speakers room in a few seconds.