Everyone, also if you're watching this on YouTube later or if you're on Twitch here: welcome, and thanks for viewing. Phylogenetic trees are going to be the next topic.

So, the phylogenetic tree of life. We all know that life on Earth started from a single origin and then diverged into different clades, which we now call bacteria, archaea and eukaryota. The eukaryota include all the multicellular organisms; then there are the bacteria, and the archaea, many of which are extremophiles, so organisms living near hydrothermal vents, these warm water sources on the bottom of the oceans. This tree is based on ribosomal RNA data, and when you make a phylogenetic tree based on rRNA, it emphasizes the separation of bacteria, archaea and eukaryota, because ribosomal RNA is very, very stable. It doesn't mutate very much and it stays very similar, because ribosomes are essential, so you can look a long, long way back in history. If you did the same thing for proteins, you would not be able to look back that far: if you look at proteins which have evolved recently, you don't see this big split into bacteria, archaea and eukaryota.

If you look at a phylogenetic tree made of proteins, then the branching diagram shows the inferred evolutionary relationship, or the inferred evolutionary distance, between different proteins. A phylogenetic tree is based on similarities and differences in sequence or in structure. You can use either, because the amino acid sequence of a protein can change while the structure of the protein remains the same: you can swap an amino acid with a long nonpolar side chain for another amino acid with a long nonpolar side chain, and the protein doesn't really change. So the sequence has changed, but the structure did not.

When you look at a tree and you see taxa which are joined together, like here, where the archaea are joined at this point with the eukaryota, that means that at
some point there was a common ancestor for both groups. Here we can also see, for example, that there is a common ancestor between animals, plants and fungi, but there is also a common ancestor between Methanosarcina and the halophiles, which is more or less located here. So that's the idea behind a phylogenetic tree: if you have two taxa coming together, so two of these tree branches coming together, then it implies that there is a common ancestor to these species.

When we talk about these trees, we have to talk about orthologs and homology. Homologous sequences are called orthologous if they are inferred to be descended from the same ancestral sequence, separated by a speciation event. If we have a duplication event within a single species, then the two resulting sequences are not orthologous, but they are homologous. There is homology because they have the same kind of sequence, but orthology is strictly defined in terms of ancestry. Orthologs often, but not always, have the same function.

For example, mice and humans both carry myostatin as a protein, so the myostatin from mouse is an orthologous protein of the myostatin in human. Within human, not with myostatin, but if you look at hemoglobin and the other globins such as myoglobin: hemoglobin is the stuff that binds oxygen in the blood, but you have other proteins which also do oxygen binding within a cell. These are homologous proteins, but they are not orthologous, because they occur in a single species. So remember that when we talk about orthology, we talk about speciation events: when a single species splits into two species, then the same protein which used to be in one
species is now in two species, and these two proteins are called orthologous.

To make it even more complex: homologous sequences, so sequences which are very similar to each other, are called paralogs if they were created by a duplication event within the genome. We even define two types of paralogs. In-paralogs are paralogs which arose after a speciation event, and out-paralogs are paralog pairs that arose before a speciation event. It's just a little bit of terminology. When you build these trees, people often assume that the tree really represents the relationship between species, but it doesn't have to. By defining something as an in-paralog or an out-paralog, you annotate these trees by saying: here we have the speciation event, and a certain gene arose before or after it. So paralogs are sequences that are created by a gene duplication event, where part of the DNA is just duplicated. When this happened before a speciation event, we are talking about out-paralogs, and when it happened after a speciation event, we call them in-paralogs.

Just as a picture to show you guys: here we have the early globin gene, and at a certain point the early globin gene got duplicated in the genome. One of the copies became the alpha chain and the other one became the beta chain. All of these sequences are homologs of each other, because the alpha chain is very similar to the beta chain in globins; it just has some minor amino acid changes. When we talk about the alpha chain, then of course the alpha chain is found in frogs and chickens and
mouse, and the alpha chain exists because of the gene duplication event here. So an alpha chain gene is a homolog of a beta chain gene, but the frog alpha chain is an orthologous gene to the chicken alpha and to the mouse alpha. The same thing holds for the mouse beta, the chicken beta and the frog beta. The mouse alpha and the mouse beta are called paralogs, because they were created by a duplication event. And because the duplication event was here, before the speciation events, we actually call these out-paralogs. So in a single picture we have the difference between a homolog, an ortholog and a paralog: paralogs are homologous sequences separated by a duplication, like alpha and beta in mouse, and orthologs are the same homologous sequence in two different species.

To make the in-paralogs, the out-paralogs and the orthologs a little bit more clear, let's say that we have an ancestral gene A. The gene got duplicated, so we now have A-bar and A-bar-bar. After this we have a lineage divergence: the gene duplication event was here, and then we have the divergence into different species, for example species X with an A1 gene and species Y with an A1 gene. Then these two are called orthologous, and these two are also called orthologous. When we compare Y's A-bar to Y's A-bar-bar, the duplication event is before the lineage divergence, so by the definition above that pair is an out-paralog, and we would call it an in-paralog if the duplication had come after the divergence. I keep messing this up, it's so tricky. I hate that people make the terminology so difficult.

All right, so we start off with an ancestral gene A. Gene A is duplicated into A-bar and A-bar-bar. A-bar is then split over two different species, so there is a speciation event after the gene duplication event. So now we call
A-bar in species X and A-bar in species Y orthologous. When we compare A-bar with A-bar-bar, we are comparing paralogs, and because the duplication came before the speciation they are out-paralogs with respect to that speciation event, whether we compare them within the same species or with a different species. They would be in-paralogs if the duplication had happened after the speciation, within one of the two species. I hope that it's clear. I don't really like many of these definitions, but they are how people in biology talk about proteins and homologous sequences, in-paralogs and orthologs. I have two pictures, so just look at the pictures, and otherwise just ask a question about it next time; then I will read up on it again.

To make it even more confusing, we also have something called a xenolog, and xenologs are really interesting. Homologs resulting from horizontal gene transfer between two organisms are called xenologs, and this occurs a lot in bacteria. Imagine that we have a bacterium, for example E. coli, which has a certain antibiotic resistance gene. What this E. coli can do is give its gene to another bacterial species. So we have the same gene now occurring in both species, but E. coli gave it to the other one by horizontal gene transfer, for example conjugation or some other method. Generally, bacteria talk to each other through these little tubes and exchange DNA with each other, and when this occurs we call the result a xenolog. We now see the exact same gene in two species, so from a phylogenetic tree standpoint this would make the two species look closely related, but they are not closely related at all; it's just that E. coli gave one of its genes to Streptococcus. In a phylogenetic tree we would now see: oh, okay, there has to be a recent common ancestor between Streptococcus and E.
coli, but there actually isn't; it is just because of a xenolog, so a transfer of DNA from one species to the other. This happens more often than you think, and it also happens in more complex organisms, not just bacteria. DNA is exchanged quite frequently by bacteria amongst each other, but multicellular species can also exchange DNA, making them look more similar to each other. Then you infer that there might be a common ancestor something like one million years ago, while actually that is not the case, because the real common ancestor lived hundreds of millions of years ago. That is because of xenologs.

When we talk about horizontal gene transfer from one bacterium to another, this can happen in different ways. One common way is transformation. This means that a bacterium dies, and part of its DNA, like a bacterial plasmid, a little circle of DNA, remains after the bacterium has burst. So we can have an E.
coli bacterium which has a certain antibiotic resistance, and at a certain point the bacterium bursts or dies and releases all of its DNA into the surroundings. Then some Streptococcus happens to be in the neighborhood and just says: oh, nice little piece of DNA. It absorbs it, starts transcribing the DNA, and now has the antibiotic resistance gene, just because it happened to encounter it in the environment.

Another way, which is more common, is bacterial conjugation. This is when two bacteria make a little protein tube and start exchanging DNA with each other, which is strange, because you would think that two bacteria of different species would not talk to each other or be friends with each other, but this is something that happens a lot. So conjugation is when a little protein tube is made and bacteria exchange plasmids, these little circles of DNA which encode certain genes.

Furthermore we have transduction, which happens through bacteriophages. A bacteriophage is produced inside a bacterium, and this phage then transfers genetic material from one bacterium to another. Because it uses this intermediate, it's called transduction.

Of course there's another very common horizontal gene transfer method, and that is genetic engineering. In the lab we often give bacteria new properties, like antibiotic resistance, and we do this by using plasmids and then using electroporation to put the plasmids into the cells, or we use these little Lipofectamine bubbles: we put the gene into a Lipofectamine bubble and then merge this bubble with the cell. So there are three natural ways of horizontal gene transfer, transformation, bacterial conjugation and transduction, as well as genetic engineering, which is a non-natural way of giving bacteria new
properties. All of these methods will make two species look as if there is a common ancestor more recently in the tree than there actually is, because the DNA is now more similar: both of them have the exact same piece of DNA. This is not due to having a common ancestor, but due to one of these horizontal gene transfer methods. So all of these methods cause xenology, so homologs from a horizontal gene transfer between two organisms.

All right. If we want to look at proteins, analyzing them, classifying them into families and predicting domains, then one of the things that you can use is InterPro from the European Bioinformatics Institute. I didn't really prepare an example, but let's just look at the InterPro website so you guys know that it exists and what you can do with it. Let me show you Firefox and get rid of all of the weird pop-ups. Here we have the InterPro database. If we go to search, we can search by sequence, by text or by domain architecture, and we can also just browse. Which protein did I want to do? Let's do an example. We already had the insulin receptor, so let's just reuse the insulin receptor and throw it into the InterPro database. We give it the insulin receptor, we say search, and this should be relatively quick.

Let me also see if the CCTOP prediction... no, the CCTOP prediction did not finish yet, so that takes some time. Bioinformatics is a field where you just have to wait for things, because in this case you're relying on someone else's computer doing the analysis, and that computer might be busy, so you might be in a queue just waiting until you get serviced. I should probably have just browsed for human hemoglobin as well. Oh, the insulin receptor is not going very well either. Like, never do a live demo,
right? That's how the saying goes: if you do a live demo, then all of a sudden everything stops working. Why is this not doing anything? Yeah, I agree with this stupid cookie. Ah, nice, at least this one finished, so we can look at the results. What is 'mismatch version'? I didn't choose any version. Anyway, here we can see that the insulin receptor from humans, the one that we just uploaded, is a protein with 1382 amino acids. It is classified in a certain family, called Tyr kinase, insulin-like receptor, and it is also part of the insulin receptor family. If we look at amino acid 5 to 138, we see that this whole thing is part of a family, which is logical because insulin receptors are very common.

If we look at the protein itself, then we see that it consists more or less of three different kinds of domains. One of them is a receptor L domain, which is found multiple times: you can see that on the domain track, from amino acid 52 to 470, there is a receptor L domain. If you want to learn more about it you can just click on it and see the description. Here we see that there's an FN3 domain, and here we see that there's a protein kinase domain. Protein kinases are there to phosphorylate proteins. So this one is more or less split into three parts. We have also a chemokin receptor, and so we can learn more or less how this protein is structured and organized. We can also see that there are several repeats, like a furin-like repeat. We can see that there's a conserved site here; a conserved site is something which is the same in many, many different species, and this is again part of the tyrosine kinase, receptor class 2. We see that the active site has been determined to be here, so this is where the insulin binds
to the receptor... that happens somewhere between amino acid 1155 and amino acid 1176. Oh no, so this is the active site, the site that actually transmits the signal after binding, and here we see the binding site, so this is where the insulin binds the receptor. Then we have here an integrated part, so these are all predictions, and here we see more predictions for signal peptides and transmembrane parts. Part of the insulin receptor is outside of the cell and part of it is inside, in the cytosol, and here we have the transmembrane region, the region of the protein which goes through the membrane. Furthermore, it gives us which biological process, which molecular function and which cellular component the protein has been classified into, and of course we can click on all of these things and learn more about them.

We can zoom into a region of the protein, and then we can continue searching with InterPro or with HMMER to do a prediction of the secondary structure of the protein. What I actually wanted you guys to see, when we look at for example the binding site... that's not what I wanted. I wanted to just get the overview of the... oh no, it doesn't have that, because I just searched with a new protein. That is annoying. Normally, when you take a protein which is known and you just go to the database, it will also show you which mutations are known within this protein.

But it's an interesting website to see: okay, this is a known domain in my protein, and based on the protein kinase domain this protein has a certain function. Based on the domain we can also assume that this function is shared with other proteins, and the same goes for the furin-like cysteine-rich domain. If we want to learn more about the domains we can click here. So this is a receptor L domain,
the structure of the first three domains, blah blah blah. So it gives you an overview and references for where you want to look further. Anyway, InterPro is a good website if you want to learn more about protein domains or about protein structure. So did it actually finish running? Probably not. No, it's still running in the background.

All right. If you want to learn more about proteins, so if you just want to read something, then go to this Big Picture learning site. It has a special issue on proteins and it's a relatively good link to find out more about proteins. What I like more is the second link, which are the paper models. The paper models come from PDB, the Protein Data Bank, the old protein database which was founded in the 1970s. Because it is all about 3D structures, one of the nice things is, for example, the tRNA paper model: you download the PDF, you print it, and then you can fold it yourself, so you can see which parts of the molecule fold back on themselves. They have a nice YouTube video on how to do this, and it's based on this PDB structure. So PDB has this really nice, funny thing where you can fold your own proteins. They have a whole bunch of them: you can fold an antibody, or you can do some DNA, or you can for example fold the human papillomavirus or the G protein-coupled receptor. Really interesting website.

PDB itself, so the main website, not the learning website, is a very good website as well. It allows you to see protein structures, so where the alpha helices are. Again, you can do your own prediction, or you can just say: I want to know more about the insulin receptor. If you look at the insulin receptor, then they have different crystal structures which were made of the insulin receptor, and here you can see indeed that this is crystallography of an activation loop
mutant of the insulin receptor. You can click on it, and then again it gives you an overview: it gives you the structure, which you can view in 3D as well, and then it gives you the literature. Not just that: if you go down, you can see for each part of the protein whether there are known mutations, and the hydropathy, so whether that part of the protein likes water or does not like water. The more hydrophilic a part is, the more likely it is to be on the outside, and the higher the hydropathy, the more likely it is to be on the inside of the molecule. They also have the Pfam domain, which is again related to the InterPro domain. All the way down you can see, for example, experimental data where you can see how well the structure was determined. So it's a really interesting website to look through, and you can also see that in this case there was a magnesium 2+ ion in there.

If you want to know more about proteins in general, then PDB is the place to go. Just search for your protein, and there are all kinds of links to all kinds of different websites if you want to learn more. And of course they have the proteins that you can fold yourself. That's the nice thing, because then you can just do it at home. I would definitely fold the tRNA; that's one that I did myself, and that's really interesting. If you look at the PDF, here you have the template, in black and white or in color, and the only thing you have to do is cut these out and glue them together in the right position. In the end you have a really nice tRNA with the acceptor stem and the anticodon loop, and here you have the finished thing.

All right, so I think that's it
for today. Let me switch back to the PowerPoint. So yeah, that's it for today. I told you a little bit about the history, not so much the history of proteins, but the history of the methods that we use for protein determination. I talked to you about protein structure. What is the primary structure? Primary structure is made by covalent bonding, the peptide bonds. What is the secondary structure? For secondary structure we take into account hydrogen bonds. For the tertiary structure we take into account ionic bonding and the hydrophobicity or hydrophilicity of the protein, so which parts like to clump together because they are hydrophobic, and which parts are on the outside because they like being in water.

I told you about purification and identification: that there are four different protein purification techniques, and that you can identify proteins using mass spec, NMR and other methodologies. I told you about functional prediction: that proteins consist of protein domains, and that based on the protein domains you can figure out what a protein is doing. I told you that there are 60,000 different protein families, so proteins that are very similar are part of the same family, and again that helps you understand, if you are looking into a protein, what your protein might be doing.

I told you a little bit about phylogenetic trees. Phylogenetic trees will come back, including how to make them yourself, for example using R. We did one in the beginning using hclust, but there are many different methods to make phylogenetic trees. I told you a little bit about homology and all of the different and confusing terms that are there, like ortholog, paralog and xenolog. For the exam, just a little tip: I'm not going to ask in detail about in-paralogs and out-paralogs. I just want you to know that an ortholog is different from a paralog, why it is
different, and why a xenolog is different from a paralog. And of course remember that xenologs are important because they make it seem that the shared ancestor is closer than it actually is, and this happens a lot in bacteria, where bacteria exchange DNA with one another.

All right, so that's it for today. Are there any questions, remarks, suggestions or other things? If not, then here's the beautiful guinea pig I drew today.

Are there already dates for the exam? Yes and no. Yes, I submitted dates for the exam; no, I'm still waiting for an answer from the Prüfungsbüro (the examination office), so they are not confirmed yet. Once they're confirmed I will of course let you guys know when the exam will be. There's still a little bit of a struggle with the Prüfungsbüro, because they want me to do the exam in person and orally, but I just want you guys to do a written exam. It's not that I don't want to see you guys; I just don't think it makes much sense to do an oral exam for something like bioinformatics. As soon as there are dates, I will send around an email, and then you will know exactly when the exam is.

Good, then I will stop the recording for YouTube. Yeah, no problem, Genie. So, people on YouTube, see you at the next lecture, which is going to be lecture six. Lecture six is going to be about, let me see, either metabolomics and pathways or programming in R. That depends a little bit on you guys, and also on you guys on YouTube. I will probably do a vote on Moodle for you guys to determine which one of the two lectures you would like to see more. But because we did DNA, we did RNA and we did proteins, the next logical lecture would be metabolites and pathways and information about those. Good. So, people on YouTube, see you on the flip side.
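
As a recap of the homology terms from this lecture, the ortholog, paralog and xenolog distinctions can be sketched in a few lines of Python. This is only a minimal illustration and was not shown in the lecture: the function name `classify_homologs`, the `Event` labels and the ages used in the example calls are all assumptions made up for the sketch. It takes the event that separated two homologous genes and, for duplications, compares the age of the duplication with the age of a reference speciation event.

```python
from enum import Enum
from typing import Optional


class Event(Enum):
    """Event that separated two homologous genes (labels are hypothetical)."""
    SPECIATION = "speciation"
    DUPLICATION = "duplication"
    TRANSFER = "horizontal gene transfer"


def classify_homologs(event: Event,
                      duplication_mya: Optional[float] = None,
                      speciation_mya: Optional[float] = None) -> str:
    """Name the relationship between two homologous genes.

    Ages are in million years ago (mya), so a larger number is an older event.
    """
    if event is Event.SPECIATION:
        return "ortholog"
    if event is Event.TRANSFER:
        return "xenolog"
    # From here on the genes were separated by a duplication, so they are paralogs.
    if duplication_mya is None or speciation_mya is None:
        return "paralog"  # not enough information to say in- or out-paralog
    if duplication_mya > speciation_mya:
        return "out-paralog"  # duplication is older than the speciation
    return "in-paralog"  # duplication is not older than the speciation


# Made-up ages, only to exercise the function; they are not real dates.
print(classify_homologs(Event.SPECIATION))                 # e.g. frog alpha vs chicken alpha
print(classify_homologs(Event.DUPLICATION, 500.0, 300.0))  # e.g. mouse alpha vs mouse beta
print(classify_homologs(Event.DUPLICATION, 50.0, 300.0))   # duplication after the split
print(classify_homologs(Event.TRANSFER))                   # e.g. a transferred resistance gene
```

The point of the design is that the label depends on the event and its timing, not on how similar the two sequences look, which is why a xenolog can mislead a tree that is built from sequence similarity alone.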