Dear students, before we get into the details of homology modeling, let us familiarize ourselves with a few basic terms and definitions. In this module, we are going to talk about homology, paralogy and orthology.
To begin with some background: proteins have primary structures, that is, their amino acid sequences. The primary sequence folds into secondary structures, and the secondary structures then come together to form the tertiary structure. So the tertiary structure is essentially determined by the primary structure. Given the amino acid sequence of a protein, the idea is to predict its three-dimensional structure by looking at other proteins whose sequences and structures are already known. By comparing the sequences, we can also say something about the structures.
So let's start with homology, paralogy and orthology. Suppose that during evolution a specific gene gets duplicated, and the copies diversify into different proteins over time. In the example here, the alpha chain gene and the beta chain gene both come from the early globin gene, so the early globin gene is the common ancestor of both of these diversified genes and their proteins. The beta chain gene can be found in frog, chick and mouse, and the alpha chain gene can also be found in mouse, chick and frog. Now look carefully: the same species, the mouse, has both the alpha and the beta gene. When both of these genes are translated, the mouse has two different proteins that descend from the same ancestral gene. These two genes are called paralogs: genes present in the same species that arose from a common ancestor by gene duplication and then diverged. On the other side, we have three different species, mouse, chick and frog, all containing the beta chain gene.
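To make the distinction concrete, here is a minimal Python sketch (an illustration, not a standard tool) that labels each internal node of the globin gene tree with the event that split it and classifies a pair of genes by the event at their most recent common ancestor: a duplication gives paralogs, and a speciation gives orthologs. The class and function names and the node labels are hypothetical; the tree shape assumes that, within each lineage, frog branches off before chick and mouse separate.

```python
class GeneNode:
    """A node in a gene tree; leaves are genes in particular species."""

    def __init__(self, name, event=None):
        self.name = name
        self.event = event  # "duplication" or "speciation" for internal nodes, None for leaves
        self.parent = None
        self.children = []

    def add(self, *children):
        """Attach child nodes and return self so trees can be built inline."""
        for child in children:
            child.parent = self
            self.children.append(child)
        return self


def lineage(node):
    """Return the path from a node up to the root, starting with the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def last_common_ancestor(a, b):
    """Find the deepest node shared by the lineages of a and b."""
    ancestors_of_a = {id(n) for n in lineage(a)}
    for node in lineage(b):
        if id(node) in ancestors_of_a:
            return node
    raise ValueError("genes are not in the same tree")


def relationship(a, b):
    """Orthologs if the genes split at a speciation, paralogs if at a duplication."""
    event = last_common_ancestor(a, b).event
    if event == "speciation":
        return "orthologs"
    if event == "duplication":
        return "paralogs"
    raise ValueError("last common ancestor has no event label")


# Globin example (hypothetical node names): leaves are genes in specific species.
alpha_frog, alpha_chick, alpha_mouse = (GeneNode(f"alpha ({s})") for s in ("frog", "chick", "mouse"))
beta_frog, beta_chick, beta_mouse = (GeneNode(f"beta ({s})") for s in ("frog", "chick", "mouse"))

# Within each lineage, frog splits off first, then chick and mouse separate.
alpha_lineage = GeneNode("alpha, tetrapod ancestor", "speciation").add(
    alpha_frog, GeneNode("alpha, amniote ancestor", "speciation").add(alpha_chick, alpha_mouse)
)
beta_lineage = GeneNode("beta, tetrapod ancestor", "speciation").add(
    beta_frog, GeneNode("beta, amniote ancestor", "speciation").add(beta_chick, beta_mouse)
)
# The duplication of the early globin gene created the alpha and beta lineages.
early_globin = GeneNode("early globin gene", "duplication").add(alpha_lineage, beta_lineage)

print(relationship(alpha_mouse, beta_mouse))  # paralogs
print(relationship(beta_mouse, beta_frog))    # orthologs
```

The rule reproduces both cases in the lecture: alpha and beta in mouse are paralogs, and the beta genes of mouse, chick and frog are orthologs. It also labels cross-species pairs such as mouse alpha and chick beta as paralogs, because those two genes diverged at the duplication of the early globin gene.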
These three beta chain genes were separated from one another by speciation events rather than by gene duplication, so the same gene is now carried by three different species. Such genes are called orthologs. So, once again: paralogs are genes in the same species that arose by duplication and code for different proteins, while orthologs are the corresponding versions of one gene in different species, descended from a single gene in their common ancestor. Orthologs are usually very similar, but not necessarily identical. In conclusion, two different genes in the same species are paralogs, and two similar genes in two different species are orthologs. Homology covers both cases: orthologous and paralogous genes (or proteins) are all homologs, because they share a common ancestor. So when we study homology modeling, we look for proteins in other species as well as proteins within the same species that have the same or a similar sequence, and use them to predict the structure of the unknown protein.
Next, how much homology is required, or is better? Two terms are important here: sequence identity and alignment length. Alignment length is the number of residue positions over which the two amino acid sequences are aligned; the longer the alignment, the more of the two proteins the comparison covers. Sequence identity is the percentage of aligned positions at which the two sequences have the same amino acid, so 100% identity means the two sequences have the same amino acid at every aligned position, not merely the same overall amino acid composition. The curve on this slide defines which homology strategy is going to be useful. If both the alignment length and the sequence identity are high, which is the case when two proteins are very similar, you will be in this upper region. But if the alignment length is short and the sequence identity is also low, you may end up in this lower region. This region is called the twilight zone; for long alignments it corresponds roughly to 20 to 35% sequence identity. If your alignment length and sequence identity fall in this region, you cannot reliably use homology modeling to predict the structure of the protein.
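The two quantities can be computed from a pairwise alignment with a short Python sketch. Conventions differ between tools; here we assume, as a simplification, that alignment length counts only columns where neither sequence has a gap, and that identity is the percentage of those columns holding the same amino acid. The function name `alignment_stats` and the toy sequences are hypothetical.

```python
def alignment_stats(aligned_a: str, aligned_b: str, gap: str = "-") -> tuple[int, float]:
    """Return (alignment length, percent identity) for two pre-aligned sequences.

    Assumed convention: only columns where neither sequence has a gap count as
    aligned positions; identity = identical positions / aligned positions * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences contribute a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    length = len(pairs)
    if length == 0:
        return 0, 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return length, 100.0 * identical / length


# Hypothetical toy alignment: 7 aligned positions, 6 of them identical.
length, identity = alignment_stats("MKT-AYIAK", "MKTLAY-AR")
print(length, round(identity, 1))  # 7 85.7

# Same amino acid composition, but no identical positions: identity is 0%.
print(alignment_stats("ACDE", "EDCA"))  # (4, 0.0)
```

The second call shows why identity is about matching positions rather than composition: ACDE and EDCA contain the same four amino acids, yet no aligned position matches.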
So, before you proceed with homology modeling, you look at the alignment length and the sequence identity, and if they are high enough, you go ahead with homology modeling. If they are low, you have to employ other strategies, such as ab initio modeling or fold recognition, which is also called threading. In conclusion, homology modeling needs a good alignment length and a good sequence identity. If we score high on both fronts, we can proceed with homology modeling; otherwise, we need other techniques to predict the protein structure. Next, what is the workflow for homology modeling once we have high identity and a long alignment? We will consider this in the next module.
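The decision rule above can be sketched in Python. We do not know the exact curve on the slide, so as an assumption we use the form commonly quoted for Rost's (1999) curve, threshold(L) = 480 * L^(-0.32 * (1 + e^(-L/1000))) percent for L up to 450 residues and 19.5% beyond that; the constants should be checked against the original paper before relying on them. The function names and the 15-point `margin` that defines the twilight band are hypothetical choices for illustration, not part of the lecture.

```python
import math


def hssp_threshold(length: int) -> float:
    """Identity threshold (%) for an alignment of `length` residues.

    Assumed form of Rost's (1999) curve; verify the constants against the paper.
    For very short alignments the value exceeds 100%, so they never qualify.
    """
    if length <= 0:
        raise ValueError("alignment length must be positive")
    if length > 450:
        return 19.5
    exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
    return 480.0 * length ** exponent


def modeling_zone(length: int, identity: float, margin: float = 15.0) -> str:
    """Place an alignment relative to the curve.

    Assumption: the twilight zone is the band between the curve and the curve
    raised by `margin` points; 15 points makes the band span about 19.5 to 34.5%
    identity for long alignments, close to the 20 to 35% range usually quoted.
    """
    base = hssp_threshold(length)
    if identity >= base + margin:
        return "safe"
    if identity >= base:
        return "twilight"
    return "below"


def choose_strategy(length: int, identity: float) -> str:
    """Map the zone to the decision described in the lecture."""
    if modeling_zone(length, identity) == "safe":
        return "homology modeling"
    return "fold recognition (threading) or ab initio modeling"


print(choose_strategy(300, 60.0))  # homology modeling
print(modeling_zone(300, 25.0))    # twilight
print(choose_strategy(10, 100.0))  # fold recognition (threading) or ab initio modeling
```

The last call illustrates why both numbers matter: even 100% identity over only 10 residues falls below the assumed curve, because a short identical stretch can easily occur by chance.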