Good morning, everyone. My name is Bhuvan Mulpada and I'm from the Indian Institute of Technology in Delhi. My course there is Biochemical Engineering and Biotechnology. Here I'm working in the Mitchell Lab, which works mostly in bioinformatics. The topic of my presentation today is the effect of the RMSD values of structural relatives on protein flexibility prediction. I'll go through everything one by one.

First of all, I'd like to talk about the big picture: where flexibility is involved, where it comes in, and why we need it. Most of us know about molecular docking experiments. These experiments are done to see how a particular ligand would interact with a protein in silico. This is very important in medicine as well as agriculture, where we can look at drug-protein interactions and individual studies. The very first step of any molecular docking experiment is to analyze the protein and how it behaves. The first thing we need to analyze is protein flexibility and its different conformations. As we know, a ligand-protein interaction was initially modeled with the lock-and-key model, where a rigid ligand interacts with a rigid protein. That was later replaced by induced fit: when a particular ligand interacts with a particular protein, there is some kind of conformational change such that the protein best fits the drug. So protein flexibility is required to bring in induced fit. The problem with including protein flexibility is that any protein is a very large molecule with many degrees of freedom, so allowing it to move freely in space requires a lot of computational power. So I'll go on to the different protein flexibility methods that have been developed so far.
It started off with soft docking, an easy and efficient way of including flexibility that allows a certain overlap between the ligand and the protein. After that, people started to include side-chain flexibility, where the side chains are allowed to move somewhat. Both of these methods are computationally efficient and easy to implement. Their main disadvantage is that they can account only for small conformational changes, and backbone flexibility isn't included. So people came up with another method called molecular relaxation, where the ligand and the protein are both considered rigid at first and allowed to dock. After that, the binding pocket is allowed to move a bit so that there is some relaxation. This includes backbone flexibility, but the problem is that it again demands a lot of computational power. After this, people started using multiple protein structures. Here they use a set of protein structures so that many different conformations of the protein are included. Backbone flexibility is definitely included in this, and the calculations are easier and faster.

For my work here, we have a script that predicts protein flexibility. It requires different models of a particular protein, and those models are created using structural relatives. I had to see what RMSD range of those structural relatives would best predict flexibility. RMSD is root mean square deviation, and it tells us how much one structure deviates from another. I was given a list of 28 proteins for which I had to see which RMSD range would best predict flexibility. Of these, 21 are rigid and 8 are more flexible. The method I use is this: first I search for proteins having a similar structure, then I select proteins in a particular RMSD range, then I use a program called MODELLER to make around 20 models.
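To make the RMSD idea concrete, here is a minimal Python sketch. It assumes the two structures have already been superimposed and have the same number of matched atoms, in which case the RMSD is the square root of the mean squared distance between matched atoms. The function names, the coordinates, and the hit list are all made up for illustration; this is not the script used in the talk, and a structural alignment server would report these values itself.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed (no fitting is done here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def select_relatives(hits, low, high):
    """Keep the names of hits whose RMSD lies in the closed range [low, high]."""
    return [name for name, value in hits if low <= value <= high]

# Hypothetical coordinates: every atom of the relative is shifted by 1 unit in z.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
relative = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(target, relative))

# Hypothetical (name, RMSD) pairs standing in for a structural-relative list.
hits = [("relA", 0.6), ("relB", 1.8), ("relC", 3.5)]
print(select_relatives(hits, 0.0, 2.0))
```

With a range of 0 to 2, only the first two made-up relatives pass the filter; widening the range lets more distant structures through.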
After that, these models are used to find the flexible regions, which are then compared against the observed flexibility and scored.

For searching for similar proteins, this is a snapshot of software available online called Dali. Dali performs structural alignment for whichever protein you want: you input the PDB ID and it gives out a list of many different proteins with information such as the RMSD, right there, then the alignment length, the number of residues, and the percentage identity. The list shown here is not complete; there may be around 800 to 2000 proteins in the list. Once I select certain proteins in a particular RMSD range, Dali allows multiple structural alignment, so I can align all those structures together and get this kind of alignment. Once I have this alignment in hand, I download the PDB structures of all the proteins I have chosen, and I feed the alignment as well as the PDB structures into MODELLER.

What MODELLER does is homology modeling: it takes a chain, uses it as a template, threads the protein that we want to model over that chain, and predicts a structure. This modeling is done by satisfaction of different spatial restraints, such as bond lengths, bond angles, dihedral angles, and non-bonded atom contacts. So I get many models. At first these look almost the same, but if you look closely there are certain changes in the loops right here. These can be viewed as flexibility in this particular protein. This becomes clearer if I play all of them frame by frame. You can clearly see that the protein playing on the left is less flexible and the one on the right is more flexible. We get this by using different RMSD ranges.
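The talk does not say how the flexible regions are extracted from the models. One common way to do it, shown here purely as an assumption, is to measure how much each residue's position varies across the superimposed models and flag residues above a cutoff. The function names, the cutoff, and the toy coordinates below are hypothetical.

```python
import math

def per_residue_fluctuation(models):
    """RMS fluctuation of each residue about its mean position.

    `models` is a list of models; each model is a list of (x, y, z) points,
    one per residue (for example C-alpha atoms), already superimposed.
    """
    n_models = len(models)
    n_residues = len(models[0])
    fluctuations = []
    for i in range(n_residues):
        points = [model[i] for model in models]
        mean = [sum(p[k] for p in points) / n_models for k in range(3)]
        msd = sum(
            sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in points
        ) / n_models
        fluctuations.append(math.sqrt(msd))
    return fluctuations

def flexible_residues(models, cutoff):
    """Indices of residues whose fluctuation exceeds the (assumed) cutoff."""
    return [i for i, f in enumerate(per_residue_fluctuation(models)) if f > cutoff]

# Three toy models: residues 0 and 1 never move, residue 2 swings in y.
models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, -2.0, 0.0)],
]
print(flexible_residues(models, cutoff=1.0))
```

In this toy ensemble only the last residue is flagged, which mirrors the loops that visibly move between models while the rest of the structure stays put.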
So it's clear that if we use a smaller RMSD range we get less flexibility, but if we use a larger RMSD range we get more flexibility in the same protein. After that we extract the flexible regions. Once we have the flexible regions from the models, we compare them with the regions found by using the unbound and bound proteins. Every protein in our list has a bound structure as well as an unbound structure. So this is the bound structure, bound to a ligand, and this is the unbound structure. The green parts would be considered flexible. Then this, say, is our predicted structure, and the green parts are the flexible ones. We compare these green parts with the green parts from the bound and unbound structures, and on the basis of how good the predictions are we set the RMSD range for selecting the proteins.

The comparisons are done somewhat like this, where this is the observed flexibility and this is the predicted one. Any overlap between the flexible areas is rewarded. Any missed residues, like this part right here, and any extra residues that have been predicted get a negative score. The penalty curves are somewhat like this: the penalty increases with the number of residues that are predicted extra or missed. There is a higher penalty for missing a flexible region, because we want to predict almost all of it, and we would rather predict more than lose out on the flexible parts.

Once all of this is done, this is a plot where I plot the RMSD range right here, and this is just a number. First, second, and third here are the first best prediction, the second best prediction, and the third best prediction.
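The scoring idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the actual penalty curves are not given in the talk, so the sketch uses simple linear weights, and it makes a missed residue cost more than an extra one to reflect the stated preference for over-predicting rather than missing flexibility. The function name, the weights, and the residue numbers are hypothetical.

```python
def score_prediction(observed, predicted,
                     reward=1.0, miss_penalty=2.0, extra_penalty=1.0):
    """Score predicted flexible residues against observed ones.

    Overlapping residues are rewarded; missed and extra residues are
    penalized. The linear weights are assumptions, not the real curves.
    """
    observed = set(observed)
    predicted = set(predicted)
    overlap = observed & predicted   # correctly predicted flexible residues
    missed = observed - predicted    # flexible residues the prediction lost
    extra = predicted - observed     # residues wrongly predicted as flexible
    return (reward * len(overlap)
            - miss_penalty * len(missed)
            - extra_penalty * len(extra))

# Hypothetical residue numbers: observed 10..19, predicted 12..22.
observed = range(10, 20)
predicted = range(12, 23)
print(score_prediction(observed, predicted))
```

Here 8 residues overlap, 2 are missed, and 3 are extra, so the made-up weights give 8 - 4 - 3 = 1.0. Swapping in non-linear penalty curves would only change the last expression.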
At first sight we can see that for rigid proteins the 0 to 1 range is the best and 0 to 2 is the second best, in that these have predicted the flexibility best. For flexible proteins it's a bit looser, because the RMSD range can go up to 0 to 4, and that is intuitive: if we increase the RMSD range, the protein is predicted to be more flexible. What we did next was normalize the scores, and I plotted the average score a particular RMSD range gives against the RMSD range. The scores here are relative to each other, so the absolute values make little sense. We can see that a very high RMSD range such as 5 to 6 would predict that the whole protein is flexible and very loose. If we take all the proteins together, a lower RMSD range, say 0 to 1 up to 0 to 3, is the best way to go about predicting flexibility. And if we separate rigid and flexible again, we get a similar result, where 0 to 1 and 0 to 2 are best for rigid proteins, and flexible proteins can be predicted up to about 0 to 3 or 0 to 4.

I'd like to make the point that when we are predicting flexibility, we do not know beforehand whether a particular protein is flexible or rigid. So we wanted a range that works for both. Ultimately, as we can see, a lower RMSD range predicts better for both of them, so I would say that a lower RMSD range is the best for predicting flexibility. Apart from that, the observed flexibility that we compare our results with depends directly on the experimental bound and unbound structures that we had. If there is transient flexibility in the protein that has not been captured in those structures, we would miss out on it. That's why, again, we would rather predict extra flexibility than miss something.
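The normalize-then-average step mentioned above might look like the following Python sketch. The talk does not say which normalization was used, so min-max scaling within each protein is an assumption here, and every number below is invented for illustration rather than taken from the results.

```python
def normalize(scores):
    """Min-max scale one protein's scores to [0, 1] (assumed normalization)."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 0.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}

def average_by_range(per_protein):
    """Average the normalized score of each RMSD range over all proteins."""
    totals, counts = {}, {}
    for scores in per_protein.values():
        for rmsd_range, value in normalize(scores).items():
            totals[rmsd_range] = totals.get(rmsd_range, 0.0) + value
            counts[rmsd_range] = counts.get(rmsd_range, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}

# Invented raw scores per protein and RMSD range (not real results).
raw = {
    "protein_A": {"0-1": 10.0, "0-2": 6.0, "5-6": -10.0},
    "protein_B": {"0-1": 4.0, "0-2": 8.0, "5-6": 0.0},
}
averages = average_by_range(raw)
for rmsd_range in sorted(averages):
    print(rmsd_range, round(averages[rmsd_range], 3))
```

Because each protein is rescaled separately, only the relative ordering of the ranges carries meaning, which matches the remark that the absolute values make little sense.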
While doing these experiments there are a number of variable factors involved, such as the number of PDB structures used as input to MODELLER, the sequence alignment, and the sequence identity. It would be interesting to see how the results change if smaller ranges are used next time, and if the other factors are kept constant while the RMSD value is changed.

Finally, I would like to acknowledge Professor Julie Mitchell, and thank her for hosting me in her lab; Dr. Gary Rosenberg, who helped me out with all of my scripts and everything; Amanda Boyard, who's my guide; all my fellow lab mates, who have been helping me out; the Khorana Program; and the funding from the Department of Biotechnology, India, the University of Wisconsin, and IUSSTF this year. Do you have any questions?

I hate to be the one asking questions. Sorry, it's really exciting these days to look at unstructured regions and start predicting function for them. Some people have an un-appreciative one where they argue of unstructured best. Now it's very apparent that this is the sort of thing that people are trying to target with drugs and things like that. For example, they bind partners, choose between partners, etc. Can you begin to predict something like that? This is a region that is flexible. Not only does it allow for students but it might also be a scaffold for these types of interactions, or these types of chelation of metals perhaps, something like that.

Definitely. This can be used in that area as well, because when we talk about chelation or any other interaction, ultimately we are talking about two things interacting with each other. The flexibility that I am working on would ultimately be used in molecular docking experiments, where we have two molecules interacting with each other.
Do you need prior knowledge, or could you now go back and use things like folded, folded, which people have been using quite effectively to predict certain unstructured parts that they can't quite predict, and go in and...

Yeah, that's the whole point. That's why we predict things: if we do not have prior knowledge, this would be the best way to go about it. If we know something has a particular probability of predicting, say a probability of 0.8 of predicting a good structure, then why not use it?