So hello, welcome back, everyone on Moodle. Unfortunately, if you're watching this on Moodle, you're missing what is apparently the best part of the stream: the animated GIF. So let's continue. We still have about 20 slides to do, and I think there's still a little demo at the end. So yeah, I just started recording again. All right, so when you want to detect proteins, right, because after you've separated proteins, you only know their isoelectric point, if you did it by 2D gel, and their size. But then you still have to figure out which protein it is. So detecting proteins in a cell or a tissue can be done nowadays by exploiting things like antibodies. So you can take a purified protein and inject it into rabbits, and then the rabbits will produce antibodies against it. And then after you have your antibodies, you can do two things. The first is that you can couple a peroxidase to your antibodies. So that means that the antibody itself is connected to a peroxidase, and this produces a color reaction. And then you can use this, for example, on a slice of tissue. So here you see the little black dots, and these black dots are made by an antibody and a peroxidase. Nowadays, what you normally do is you make your antibody and then you attach a fluorophore to it, for example, a red or a green or a yellow one. And then you can do immunofluorescence. And then instead of having only a single color, like with a peroxidase, you can have three or four different colors. So you can also see if proteins are located together at a certain point in the cell. And you can do this just using light microscopy, or you can use a better microscope to look inside of a cell even. All right, so another way of knowing what a protein is, is of course X-ray crystallography. And this is a technique for determining the atomic and molecular structure of a crystal.
So to do this, you have your crystal. Then you mount your crystal on a goniometer. You illuminate the crystal with X-rays, so you just shoot X-rays through the crystal, and then you capture the diffraction pattern. And then this goniometer turns one degree, and the process repeats. And then it turns one degree again, and it repeats. So you take your crystal and you slowly rotate it around, and at every position you shoot an X-ray beam through your crystal. And what this then produces, after a Fourier transform, is an electron density map, because electrons are the things that scatter X-rays. So you get a picture, and this picture you can then transform back into what your protein looks like. So where are the electrons? And if you know where the electrons are, then you can take the protein chain and fit it through. Which is, of course, all done more or less in bioinformatics. Of course, the first part is not. Crystallizing your protein is the hard part. Getting your protein to crystallize, especially when you're dealing with proteins which sit in the cell membrane, is very hard because they are very hydrophobic. But if you have proteins which are inside of a cell, then you can crystallize them, and there's a whole protocol that you can follow. And then this part, well, that's kind of standard: you put the crystal on there. One of the issues here, of course, is that since X-rays have a relatively high amount of energy, you often damage, more or less melt, your protein crystal in the process. So you have to make not one crystal but many different crystals, and every time you have to mount them. And then the Fourier transform, which is a mathematical way of going from Fourier space to normal space, which I don't really want to explain right now. But you can use a Fourier transformation, and then you get an electron density map. And since you did that for every position, because you rotated the crystal around, you get a 3D map.
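Just to make the "going from Fourier space to normal space" step a little more concrete, here is a minimal toy sketch in Python. It is one-dimensional and everything in it is an assumption for illustration: the `density` values are made up, and the function names `dft` and `inverse_dft` are just chosen here. It also assumes the full complex coefficients are known, whereas in a real experiment the detector only records intensities, so the phases have to be recovered separately. It only shows that an inverse Fourier transform takes you from coefficients back to a density.

```python
import cmath


def dft(values):
    """Forward discrete Fourier transform: real-space samples -> Fourier coefficients."""
    n = len(values)
    return [
        sum(values[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse transform: Fourier coefficients -> real-space samples."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * x / n) for k in range(n)) / n
        for x in range(n)
    ]


# Hypothetical 1D "electron density" along one axis of a crystal (made-up numbers).
density = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]

# Stand-in for the measured data: amplitudes AND phases (the toy assumption).
coefficients = dft(density)

# Going back from Fourier space to normal space recovers the density.
recovered = [round(value.real, 6) for value in inverse_dft(coefficients)]
print(recovered)
```

The real thing is three-dimensional and uses fast transforms, but the direction of the calculation is the same.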
So you get a map from the front, from the sides, from every angle. And then you can fill in where the protein is and where the different side chains are. Another way to figure out what kind of protein you're dealing with is to use nuclear magnetic resonance. This is also something that I don't want to explain in detail, but it's the physical phenomenon in which nuclei in a magnetic field absorb and re-emit electromagnetic radiation. So what you do is you just put your protein into a solution. And then the solution is put into a very strong magnetic field. And then you hit the solution with radio waves, and you look at when the radio waves come back. And then based on doing that a lot of times, you can kind of figure out what kind of protein you have. And of course, NMR is called MRI when you do it in the hospital. But when you do it for proteins, you need machines which look more or less like this, compared to a normal MRI machine where you can go in, because you need a very, very strong magnetic field. So the magnetic field in an NMR machine is generally several times stronger than that of the standard MRI machine in the hospital. Another way, which is really good at identifying what your protein looks like or what your protein is composed of, is mass spectrometry. And next week we will talk a lot about mass spectrometry, because next week we're talking about metabolites. So that's the thing that proteins work on. So we go DNA, RNA, protein, and then metabolites. And I will explain in great detail what mass spectrometry is, and I will also introduce you to databases which allow you to identify the mass spectra that you get. But just in a few words, it's a technique that ionizes chemical species, well, little molecules more or less, and then sorts the ions based on their mass-to-charge ratio. So what you have is something called your ion source, where you make your ions.
This generally uses electrospray ionization. And then these are pulled along. So you have your ions, in a little droplet, which are pulled towards a big magnet, and using the magnet you bend their paths. So this is not a time-of-flight machine, but a machine in which the ion paths bend. So if the mass-to-charge ratio is very high, right, that means that you have more mass per positive or negative charge, then the turn that it makes is less sharp, because of course heavier objects take more force from the magnet to get bent, while lighter objects don't. And then you have something called Faraday collectors, which collect the ions, right? So if an ion hits the plate, it will give you a little signal, and in the end you get a nice spectrum. And this spectrum you can then use to figure out which molecules make up your protein. All right, so a very quick overview. I told you about four different identification techniques. You can use immunohistochemistry, which is done a lot nowadays because making antibodies is something that we're relatively good at. And you can use X-ray crystallography, which is really good when you're dealing with proteins which you can easily make crystals of, which is very hard for things that are in the cell membrane but relatively easy for things which are in the cytosol. That's also one of the reasons why, when you look in protein databases, they generally have more structures for things which are not in the cell membrane. So proteins which are globular, like one of the first ones that got solved was insulin. Insulin is not in the cell membrane, so that's a relatively easy thing to crystallize. The same goes for hemoglobin. But things like the insulin receptor, these are very difficult to crystallize, so that's why the structure of the insulin receptor took way longer than that of insulin itself. You can use NMR.
NMR has the advantage that it's more of a dynamic method, so you can see proteins move in a way, you can track how the protein moves. And you can use mass spectrometry. Mass spectrometry is a really good way to figure out what the individual constituents of your protein are, but it's really bad at puzzling the protein together, right? So it's different from crystallography, which gives you a 3D map. Mass spectrometry just gives you a 2D overview of what is in your protein. But these four techniques are used a lot when you do protein analysis. All right, so then when we talk about proteins, proteins are always classified by their function. So these are the official definitions, and proteins fall into one of these seven categories. So a protein can be a structural protein, for example collagen or keratin, and the function of these proteins is to strengthen tendons and to make skin, hair and nails. Then there's a class of proteins which is the enzymes. These are things which catalyze a certain chemical reaction. An example here is DNA polymerase. This catalyzes the reaction of DNA to DNA, so the duplication of DNA. But of course there are many, many different enzymes. You also have an enzyme which breaks down insulin. You have an enzyme which produces ATP. So enzyme is a very broad category; most proteins have some kind of enzymatic activity. We have transport proteins. Transport proteins just bind something, bring it somewhere else and then release it, like hemoglobin, which transports oxygen to the cells. We have contractile proteins, which are mostly found in muscles, for example actin and myosin, and these cause contraction of the muscles. So these are generally very small molecular motors which allow things to move. We have protective proteins. Antibodies fall under protective proteins, but not just the antibodies. Also the complement system falls under protective proteins.
So the complement proteins are part of the innate immune system, while antibodies are part of the adaptive immune system. Then we have things like hormones. So hormones are things like insulin or leptin or these kinds of proteins. And their function is to regulate metabolism, not in a short-term manner but more in a long-term manner. So we're talking about half an hour to an hour, and for a cell this is a long-term process, and these things are regulated by hormones. And we also have something called toxins, and toxins are more or less the opposite of the protective proteins. These are things like snake venom. So these are toxic to humans. Toxins are of course a bit of a difficult category, because a toxin might not be toxic to everyone. What's good for someone is bad for someone else. So take an antibody: you can see an antibody as a toxin for bacteria. So there's a little bit of an overlap between those two. All right, so the function of a protein of course comes from the different domains of the protein. So a protein domain is defined as a conserved part of a given protein sequence and its tertiary structure that can evolve, function and exist independently of the rest of the protein chain. So if you take like a hundred proteins and you do a multiple sequence alignment on them, and you find that these hundred proteins have a part which is similar, then this gets annotated as being a domain of that protein. So domains are things which do one thing and do it very well, right? Like DNA binding. A protein can have a DNA-binding domain and then a signaling domain, and then it binds DNA and signals to another protein. But it could also have a DNA-binding domain and then a replication domain to, for example, copy the DNA. But of course it needs this binding domain, and a lot of proteins have a DNA-binding domain, and these DNA-binding domains are very similar.
They're generally alpha helices which fit into the major groove. Then you have a complementary part, so an alpha and a beta part of the protein, one which binds the major groove and one which binds the minor groove. So these things are called protein domains, and knowing which domains are in a protein can help us understand the function of a protein. If you, for example, figure out that the protein you are studying has a heme group, then you know that this protein is somehow involved with oxygen, right? Because heme binds oxygen, because of the iron atom and the structure. So if you know that your protein has a heme domain, then this will almost always lead to you hypothesizing that your protein has something to do with oxygen metabolism in the body. All right, so besides protein domains we also define something called protein families. A protein family is a group of evolutionarily related proteins, and it is often nearly synonymous with a gene family. Because proteins are coded by genes, if you find that something is a gene family, then the genes of this gene family code for proteins of the same protein family. Which does not always have to be true, because protein families can split as well. So things can still be related on the DNA level while on the protein level they have very separate functions. But as long as things are evolutionarily related, these proteins are more or less part of the same protein family. So in total at the moment there are like 60,000 protein families which have been defined, and you can think about things like myostatin, and you have, for example, the heme-carrying proteins. There is a whole bunch of them: some function in muscles, some function in blood, others function when you are very young or in the embryonic stage. So there are different proteins that bind oxygen which are more or less active at different parts of your life. So how do these new protein families come about?
Well, that comes about either through speciation, so when a species exists and it splits into two different species, or it can arise due to gene duplication. If a gene in a genome gets duplicated, then you all of a sudden have two genes which code for the same protein. But of course, due to mutation and recombination, this new duplicated gene will in the course of evolution get a new function. And so these proteins are still evolutionarily related, but they have different functions. And often knowing which protein family your protein belongs to is very important to know what kind of role your protein will play. So things of the same family have more or less the same function. And so if you're in a family of DNA-binding proteins, then of course all of the proteins in this family will bind DNA. All right, so there are a lot of tools to look at proteins, so to look at protein families and also to look at protein domains. So there's something called Pfam, which is the protein family database, and it has a whole bunch of alignments and hidden Markov models to kind of predict if an unknown protein belongs to one of the known families. You have PROSITE, which is a database of protein domains, families and functional sites. You have PIRSF, which is the superfamily classification system, because families are not the end-all of everything. Having 60,000 families means that all of these families also have relationships amongst them, so that's why you have things called superfamilies. And then you have PASS2, which holds protein alignments organized as structural superfamilies. And then you have SUPERFAMILY, which is a library of hidden Markov models to kind of determine if a protein that you're looking at belongs to a certain family or to a certain superfamily. And then there are also the different classification algorithms. So all of these you can look up.
I just want you guys to be aware that Pfam exists. So Pfam is the database for protein families and PROSITE is the database of protein domains. Of course they also have the families, but their main focus is the different domains. So if I have a protein, what does amino acid 12 to 50 code for, and what does amino acid 70 to 80 code for? While Pfam just classifies your protein into a family of proteins. So it says, well, it's a globular or it's a contractile protein, right? So those are more or less families. All right, so when we talk about proteins we also have to talk a little bit about phylogenetic trees. So this is a really nice picture of Mr., or Mrs., Garrison explaining a level-three phylogenetic network. Of course this is not really what the tree of life looks like. The tree of life looks more or less like this. So this is a phylogenetic tree based on rRNA data, with the emphasis on the separation of bacteria, archaea and eukaryotes. And this was proposed in the early 1990s. Before that, the tree of life actually consisted of only two groups, eukaryota and bacteria. But nowadays we also have a group called archaea. So the archaea are more or less the extremophiles. These are the little creatures that are not bacteria. They are not really multicellular organisms, but you have things like Methanococcus. So it looks like a bacterium, but it has some features from the eukaryota. And if you then look at the tree of life, then all life started at some point. We don't know exactly what the common ancestor is. But at a certain point this split into bacteria and archaea, and then the archaea branch more or less split again into the eukaryota. So these archaea, well, they're not really bacteria, they are between bacteria and eukaryotes.
And one of the nice things is that these organisms, in this part of the tree, have properties which occur in bacteria, but some of them also have properties which occur only in eukaryota. So it's very difficult to decide where exactly in the tree you fall, but this is based on rRNA data. So as we learned in the RNA lecture, this is based on ribosomal RNA, because of course ribosomal RNA changes very, very slowly, because the ribosomes are under immense selective pressure. You can't just have many mutations in your ribosomes, because if you change a single base, the entire ribosome might not work anymore. So using ribosomal RNA you can look back more or less until the beginning of life as we know it, like bacteria and eukaryotes. If you would build these trees based on other types of RNA or different types of DNA, then of course this tree would look a little bit different, but based on rRNA you can look back almost billions of years in time. If you would look at mitochondrial DNA, then that will allow you to look back over a time span of like 100,000 to 200,000 years, and if you would look at individual proteins or other types of DNA or RNA, then that generally gives you an overview in the range of like 10,000 years. But rRNA, because ribosomes are so important for the duplication of cells and for making proteins, changes very, very slowly in the course of evolution. So you can see here that plants are all the way here, and then you have animals that come here. So animals and plants are split, and then you have fungi, which split off between animals and plants, but there's a whole bunch of organisms that came before that, like the flagellates or Entamoeba, and that's still like a billion years away from when the bacteria and the archaea split from each other.
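To make "a tree based on ribosomal RNA" a bit more tangible: such a tree starts from how different the aligned sequences are from each other. The sketch below is only an illustration under stated assumptions. The three ten-base sequences are invented toy data, not real rRNA, and the function name `p_distance` is just chosen here. It computes the fraction of aligned positions that differ, which is the simplest kind of sequence distance a tree-building method could start from.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned, non-gap positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a == "-" or base_b == "-":
            continue  # skip alignment gaps
        compared += 1
        if base_a != base_b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


# Invented toy fragments (NOT real rRNA), already aligned.
toy_rrna = {
    "bacterium": "ACGUACGUAC",
    "archaeon":  "ACGAACGAAC",
    "eukaryote": "ACGAACGAAG",
}

# All pairwise distances; the smallest distance marks the closest pair.
distances = {
    (name_a, name_b): p_distance(toy_rrna[name_a], toy_rrna[name_b])
    for name_a, name_b in combinations(toy_rrna, 2)
}
closest_pair = min(distances, key=distances.get)
print(distances)
print(closest_pair)
```

In this made-up example the archaeon and the eukaryote come out as the closest pair, which is the kind of signal that gets turned into a branching order. Real methods correct for multiple substitutions at the same site and use far longer alignments.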
So phylogenetic trees are very important to understand how life is related to each other, and they're a really good way of visualizing how things are related to each other on large time scales or on small time scales. All right, so how to read a phylogenetic tree? Well, it's a branching diagram which shows the inferred evolutionary relationships among proteins or among RNA or among DNA, but you always have to remember that this is inferred. So this is based on, for example, sequence distances, so it's based on similarities and differences in sequence and structure. So taxa are these branches, right? So this is, for example, a taxon, the taxon of animals, plants and fungi. "What if all humans are like dolls? They think they have freedom and choice, but in reality we're programmed clowns. We are programmed to do what we are doing." Well, in a way we are. It's a very, very philosophical way of looking at it. I would say that that's a difficult one. Do I believe that free choice exists? I think that that's way, way too far when we're talking about just proteins and biology, but in a way, as a bioinformatician you do view life as being a machine. So there are inputs, there's signal transduction and there's output. So it's a very, very interesting way of looking at it, but in that sense I believe more in holism, right? Because as a cell you are just input, signal transduction and output, but if you put several cells together then things become more interesting, and you can have behaviors which are not obvious based on the DNA or based on the structure of how things are built. So collections of things can be much more interesting than the individual parts, but as a bioinformatician you generally look at how little parts of this big machine work, and of course something like awareness is a product of all of these underlying mechanisms building up to something. But it's very philosophical.
And I'm not a philosopher, I'm a bioinformatician, so I just study biology and chemistry and use computers, informatics, to kind of tie these things together. But thank you for your remark, Puyah M9. I haven't seen you before in the stream, so welcome. So taxa joined together in the tree imply that there's a common ancestor, right? So here we see that in this tree animals, fungi and plants all have a single common origin. So we assume that at a certain point in time there was some organism living which then split into three different branches of life. And of course we have no idea what this organism was, but the closer you get to here, the more we know about how things are related to each other. But the key word here is that it is inferred, and this inference is based on similarities and differences. And that is tricky, because similarities and differences can arise quite rapidly. And I want to talk about that a little bit. So the first term that I want to introduce is the term ortholog. And now things start becoming complex, and I don't really agree with how this is defined, but the thing is that I just have to teach you how it is defined. So an ortholog is defined as a homologous sequence, and a homologous sequence means a sequence which has similarity to another sequence, where the two are inferred to be descended from the same ancestral sequence, separated by a speciation event, right? So if you look at, for example, hemoglobin in humans and hemoglobin in mice, both have hemoglobin, and these two hemoglobins are orthologous to each other, right? Because humans and mice are joined in this tree of taxa by a common ancestor. So then we call hemoglobin in mice orthologous to hemoglobin in humans. So orthology is strictly defined in terms of ancestry. And orthologs often, but not always, have the same function. Like hemoglobin in mice does the same thing as hemoglobin in humans, but it doesn't have to.
So that's one of these things that is kind of difficult. When we talk about a paralog, then this is a homologous sequence which is created by a duplication event in the genome. So we know that in the genome we have things like jumping genes, so we have genes which can roll out of the DNA and then copy themselves and jump into another part of the genome. But also, sometimes when a DNA strand is replicated, the machinery fails and it copies the same part twice. So instead of having one part of the DNA copied, it actually makes two copies in a row. So this is a duplication event. So we take a sequence of DNA, and then this sequence of DNA is all of a sudden in the genome twice. So then we define two different types of paralogs. One of them is in-paralogs, which are paralog pairs that arose after a speciation event. So we have a speciation event, for example the branching of humans and mice, and then in the mouse there is a duplication event. Then gene A and gene B are in-paralogs of each other in mice. We also have out-paralogs. So we can have the common ancestor of mice and humans in which a gene got duplicated, and then mice and humans split, right? So then it means that mice have two copies of the gene and humans also have two copies of the gene. But then we call it an out-paralog, and that is just because we want to have an idea of when the duplication event occurred. So an in-paralog comes from a duplication event which is relatively recent, right? Because it only occurred in one species, after the split. But an out-paralog comes from a duplication event which is much older, because it already occurred before humans and mice split from each other. So defining these is very important when you're building these phylogenetic trees. All right, so in a picture, and I actually have two pictures because I always find it confusing myself. So all of these things we call homologs, right?
So for example, here we have an early globin gene, and the early globin gene had a gene duplication event, making the alpha chain and the beta chain, right? So we used to have one gene coding for one protein chain, then there was a duplication event, and afterwards mutations made the second chain different from the first one. So we call the first one the alpha chain and the second one the beta chain. And all of these things are homologs. So the alpha chain is a homolog of the beta chain. And now speciation events occur. So now there's a speciation event, so frogs get their own alpha chain, chickens get their own alpha chain and mice get their own alpha chain, right? So this is in the common ancestor, and then due to evolution new species arise. So all of these are called orthologs, right? So the alpha chain in mouse is an ortholog of the alpha chain in chickens, and it is an ortholog of the alpha chain in frogs. The same thing holds for the beta chain, right? Because the speciation event which was observed in the alpha chain was also observed in the beta chain. So mouse beta, chicken beta and frog beta, we call all of those orthologs of each other. And now we call mouse alpha a paralog of mouse beta. The same thing holds for chick alpha, so chick alpha is a paralog of chick beta by definition. All right? So now the question to you guys is: is this an in-paralog or is this an out-paralog? Mouse alpha to mouse beta, are these in-paralogs or out-paralogs? Wait a moment, there's a little bit of delay in the stream and you need some thinking time as well. All right, Curita says out-paralog. So I want to see some more answers. Well, normally I would say raise your hand if you agree, but we can't do that here.
So you really have to get on the keyboard and type. All right, so isilf also says out-paralog. How come you have 19 in front of your name? That's interesting. I thought that there were only people with swords and diamonds and people without anything. Out-paralog, huh, interesting. I'm learning new Twitch things every time that I stream. Could you explain to me why you have the 19 in front of your name? Is that something that I did? Like, Jan has a nice diamond because, oh, TwitchCon 2019. Oh, interesting, super interesting. All right, so it's for going to TwitchCon 2019, perfect. Twitch has so many strange things. All right, so we solved that question, and yes, it is an out-paralog. You are all right, it is an out-paralog. And yes, you can hover over the badges, but I can't do that in OBS, actually, because I'm, oh, I actually can. TwitchCon NA 2019, interesting. Super VIP, huh. Okay, but yeah, you are right. This is an out-paralog. Why is this an out-paralog? Because the duplication event that made the paralogs happened before the speciation event. All right, very good. So here we have the same thing again, and this picture shows both the out-paralogs and the in-paralogs, because there were no in-paralogs in the previous picture. So here we have ancestral gene A, which then got duplicated into A′ and A″. So A with the hovering thingy, and then A with two of them. And then there was a lineage divergence. So here it diverged, right? So the A′ one then diverged into species one and species two. And then you have the other one. It always boggles my mind, but at least you got it right. So there's bound to be at least one question on the exam. I find it a little bit arbitrary. I know why people want to define stuff as in-paralogs and out-paralogs, because there's kind of a time difference between them, but I find it a little bit arbitrary, right? Because, and we will get to that in the next slides.
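The whole ortholog, in-paralog, out-paralog business comes down to two questions: did the two genes diverge through a speciation or through a duplication, and, if it was a duplication, did it happen before or after the speciation event you are looking at? The small Python sketch below writes that down. The function name `classify_homologs` and the ages in million years are assumptions made up for illustration; only the logic follows the definitions given above.

```python
def classify_homologs(divergence_event, duplication_mya=None, speciation_mya=None):
    """Classify a pair of homologous genes.

    divergence_event: "speciation" or "duplication", the event at which the
        two genes last shared an ancestral sequence.
    duplication_mya / speciation_mya: hypothetical ages in million years ago
        (larger number = older), only needed for paralogs.
    """
    if divergence_event == "speciation":
        return "ortholog"
    if divergence_event == "duplication":
        if duplication_mya is None or speciation_mya is None:
            raise ValueError("paralogs need both a duplication and a speciation age")
        # Duplication older than the speciation: both species inherit both copies.
        if duplication_mya > speciation_mya:
            return "out-paralog"
        # Duplication younger than the speciation: it happened in one lineage only.
        return "in-paralog"
    raise ValueError("divergence_event must be 'speciation' or 'duplication'")


# Mouse alpha vs chicken alpha globin: separated by a speciation event.
print(classify_homologs("speciation"))

# Mouse alpha vs mouse beta: duplication (made-up 500 mya) before the
# speciation (made-up 300 mya), so these are out-paralogs.
print(classify_homologs("duplication", duplication_mya=500, speciation_mya=300))

# A duplication that only happened in the mouse lineage after the split.
print(classify_homologs("duplication", duplication_mya=50, speciation_mya=90))
```

The point of the sketch is that "in" and "out" are always relative to one particular speciation event, which is also why the distinction can feel a bit arbitrary.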
And there are things which kind of break this whole nice tree structure, this whole nice in- and out-paralog thing. So that's what I want to talk about, and that's the xenologs. So another -log. Homology between two genes resulting from horizontal gene transfer between two organisms: those are called xenologs. So this is a very common mechanism in antibiotic resistance in bacteria. So you can imagine that we have, for example, E. coli, which has an antibiotic resistance gene on its DNA, and then the bacterium at a certain point thinks like, oh, there's Streptococcus pneumoniae living next to me, so let's make a little tube and give them this antibiotic resistance gene. And that just happens. Genes from one bacterium, if you co-culture them, just all of a sudden appear in the other bacterium. And there are a couple of ways in which that happens. So there are actually four. So how can this happen? How can this gene go from being in one bacterial species to being in a completely different bacterial species? Well, one of them is transformation. Transformation is just a process in which a single bacterium dies. So imagine that the E. coli has the antibiotic resistance, the E. coli bacterium dies, and all of the stuff within the E. coli bacterium just gets dumped into the environment. And then the Streptococcus pneumoniae comes by and it just eats up this little piece of DNA and thinks, ooh, nice, little piece of DNA, let me integrate that into my genome. And it does. And of course now it has gained this antibiotic resistance. So one bacterium dying leaves a little piece of DNA in the environment. So this is of course more or less a random process, right? Bacteria die, another bacterium comes around, goes ooh, nice DNA, and slurps the DNA in. And actually every bacterium has a whole system to slurp up DNA from the environment.
Then inside of the bacterium there's even kind of a complex which tests if this DNA produces a protein which might or might not be useful. And if it codes for a protein, then it actually gets priority for being integrated into the genome. So bacteria are pretty smart, because they know that just slurping up random DNA from the environment is an evolutionary advantage. One of the other ways is that a gene can jump from one to the other by conjugation. And why conjugation happens is not well understood, but it's a very good mechanism. So this E. coli bacterium has some nice antibiotic resistance gene on a plasmid, and some Streptococcus comes by, and they kind of like each other. So they form this little protein tube. So they conjugate and they start exchanging genetic material. Why would you do that? Why would you think that it would be a good idea to just exchange DNA with some random bacterium that you just met? But it happens. Bacteria are in that sense very liberal. They conjugate with other bacteria if they feel like it, and they just transfer some of the genetic material. Like, I have a couple of these plasmids, and you can have one or two, and I want one or two back. So there's conjugation, and they just move genetic material from the one to the other. And of course then there's transduction. So transduction happens when a bacteriophage introduces a new gene into the bacterium, right? So a bacteriophage is more or less a virus which is specifically aimed at bacteria. So if you have a bacteriophage, it can inject its DNA into the bacterium. And sometimes the bacterium that it infects is actually not the species that the phage is specific for. Because normally when it injects its DNA, the bacterium starts producing new phages, and then the bacterium bursts open and spreads the phages all around. But that doesn't always happen.
Sometimes the phage genome is not able to replicate inside the bacterium or to form new phages. So that means that the bacterium just gets the DNA from the bacteriophage, integrates it into its genome and then says, well, that's nice, more properties for me, right? Because that's the thing that it does. It just stores the DNA and thinks, well, I might need this capsule molecule later. And that's called transduction. So a bacteriophage produced in the one bacterium will kind of carry bacterial DNA from one bacterium to the other, because that DNA comes along with it, and this doesn't really kill the receiving bacterium because the phage is made for a different species. So transduction also occurs, well, not a lot, but it occurs, and it is usually phage driven. And of course, one of the things that we do all day in the lab is horizontal gene transfer using genetic engineering. Nowadays we can make competent bacterial cells and put any piece of DNA in there, right? We can just take the green fluorescent protein, which comes from jellyfish, and clone it. We just put it in bacteria, or we put it in human cells or in mouse cells or other cells that we have around the lab. And of course, this is also a horizontal gene transfer mechanism, but it's not a natural mechanism, it's an artificial one. And of course, this makes this tree very complex, especially when we're looking at the bacterial branch, right? Because in the bacterial branch it looks like everything is really nice, but a gene can just jump from one part of the tree to another part of the tree. Not only that, but the same thing happens in some eukaryotes. Of course, the more complex a eukaryote you are, the more difficult it is to integrate new DNA into your genome, but it happens.
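To make the point about a gene jumping across the tree a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sequences are made-up aligned fragments, not real rRNA or resistance genes, and `percent_identity` and `closest_relative` are hypothetical helper names. It only shows that the "closest relative" can differ depending on which gene you look at.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def closest_relative(query: str, sequences: dict[str, str]) -> str:
    """Name of the organism whose sequence is most similar to the query organism's."""
    others = {name: seq for name, seq in sequences.items() if name != query}
    return max(others, key=lambda name: percent_identity(sequences[query], others[name]))


# Made-up toy fragments (assumption): a conserved marker such as rRNA ...
rrna = {
    "E_coli":       "ACGTACGTACGTACGTACGT",
    "Salmonella":   "ACGTACGTACGTACGAACGT",
    "S_pneumoniae": "ACGAACTTACGGACGTTCGA",
}
# ... and a gene that, in this toy scenario, was transferred horizontally
# from E. coli to S. pneumoniae.
resistance_gene = {
    "E_coli":       "ATGGCTAAAGGTCTGCCAGA",
    "Salmonella":   "ATGTCTCGAGATCTTCAAGT",
    "S_pneumoniae": "ATGGCTAAAGGTCTGCCAGT",
}

print(closest_relative("E_coli", rrna))             # based on the conserved marker
print(closest_relative("E_coli", resistance_gene))  # based on the transferred gene
```

With these toy fragments the conserved marker pairs E. coli with Salmonella, while the transferred gene pairs it with S. pneumoniae. Real analyses would of course build proper alignments and trees rather than compare raw identity.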
There are instances of bacterial DNA being integrated into the human genome. If you would look only at this specific piece of DNA, you would say, well, humans are very closely related to a particular group of bacteria, because we might have part of a gene coming from those bacteria integrated into the human genome, which is then not found in mice or even in plants. And so this tree looks very nice, but it only looks very nice because it's based on ribosomal RNA. And of course, ribosomes are very well conserved, because without ribosomes, or with non-functional ribosomes, you are removed from the selection pool directly. But if you look at other genes, then sometimes you see that the tree that is being built based on homology and similarity, or based on the idea of inparalogs and outparalogs, just doesn't work. So you get weird relationships which show that, for example, humans and dolphins are actually really closely related. And that is because of this horizontal gene transfer, because of these xenolog mechanisms.

All right, so I wanted to show you one of the bioinformatic resources we've been talking about, InterPro, for the functional analysis of proteins by classifying them into families and predicting domains. This is one of the most important sites when it comes to proteins. So if you have your own protein of interest, we can do that. I'm going to switch to Firefox again, and I think I already opened up InterPro. Yes, I did, and I already did a run with the insulin receptor for humans. But if you have your favorite protein, just tell me now. And we're not going to do the tau one again, because the tau one was just bad. Well, it wasn't bad, but it doesn't really have a structure, and we already know what kind of domain it has because it has this tau domain. So in this case, it would be nice to have a relatively big protein.
And I did the human insulin receptor just so that you guys that we don't have to wait if it takes a long time. But just let's open up a little window and favorite protein anyone. First come, first serve. And otherwise I'm just going to take one of my own. Then we're going to just look at insulin, which is, I did the human lactase, but it doesn't work. You did the human lactase in Interpro or in the prediction or in the, what's it called? Let me see, open up the preview. Well, octene. Oh, the prediction of the structure. Yeah, but lactase is, again, it doesn't, the prediction worked really well if it has a couple of nice alpha sheets and beta sheets. Diction, yeah. So I'm thinking we can do things like, well, not the tau protein, but if we think about proteins, right then, one of the things that I always am interested in is the current pandemic going on. So we can use like the spike protein and of course the spike protein of well, no, let's do some other one. Let's do, because viruses are generally not having like really nice structure, but ribosomes take too long to predict hemoglobin. Yeah, we can do hemoglobin. So let's do hemoglobin. So we can go to, we can go to here. Now we just search for hemoglobin and of course hemoglobin occurs in, all right. So again, pseudo-terranova desifians, does anyone know? Like do we wanna know which kind of an animal this is? I'm always like guessing animals from their Latin name is really, really difficult. But yeah, just like hemoglobin from this pseudo-terranova desifians, I could actually Google it in another window and look very smart, but I'm not gonna do that. I'm just gonna Google it for you guys here. So is fish health? Is there not a nice Wikipedia or a picture? No, don't go to images. This is a parasitic nematode. It's a fish parasite, of course, Jan. Yeah, yeah, yeah. The fish guys here have a massive advantage for the weird names. Causes unease, suck it. 
What the hell, like, oh my God, like you actually know, like, wow. I think it's a mouse. No, it's a mouse. A mouse is a mouse. Mous musculus or mus musculus domesticus or so. Interesting, interesting. All right, so we're not interested in that, but so we're just gonna take the sequence here. So again, we're just gonna click on FASTA and then we're gonna take this hemoglobin from this fish nematode. All right, just take the, yeah, no, I don't want to tell you about my visit. I've already logged in, you know who I am. Like, you know everything of me. All right, so we can just enter the FASTA sequence in this format, right? And then we can just search. Of course, this should be relatively quick because it's a known sequence. We studied that last semester. Well, see that this course is really, really useful. Did you know that this thing actually had a shame on me? It's a really, really fun that that's just the first thing that comes up when you search for hemoglobin. I think hemoglobin is just, it just has a globular structure. So we have to wait a little bit, right? So we can look at the results from this one. All right, so here you then see the, so this is the one from insulin. So here you see that the family is that it's, this is the insulin receptor that I did. So it's the insulin-like receptor and here it's the insulin receptor. And you see that this is part of, so it consists of one big domain here, then it has a secondary domain and then it has a third domain here. So it does, this is probably the domain which goes into the membrane. This is the domain which recognizes insulin on either the outside or does the signal transduction on the inside. And then here you have another domain which does the signal transduction or the recognition, and then here you see the super families and then it shows you which sites are conserved. So this is a Tierkinase receptor and there's an integrated stuff and there's predictions. 
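The FASTA format used when pasting the hemoglobin sequence into the search box is simple: a header line starting with `>`, followed by one or more lines of sequence. A minimal sketch of reading it in Python is shown here. The `parse_fasta` name and the example record are assumptions for illustration; the example is a short toy fragment, not the Pseudoterranova sequence from the demo.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a mapping from header (without '>') to sequence."""
    parts: dict[str, list[str]] = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            parts[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            parts[header].append(line)
    # Sequence lines are wrapped in the file, so join them back together.
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Hypothetical example record (assumption), wrapped over two lines.
example = """>toy_globin hypothetical example record
MVLSPADKTN
VKAAWGKVGA
"""

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

For real work a library such as Biopython does this more robustly, but the format itself is no more complicated than this.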
But all of these things you can click on, and you can learn more. The thing that you learn directly from the insulin receptor is that it's composed of three very distinct parts. And these parts of course are evolutionarily fixed in a way. So there are many proteins which have this receptor L domain. And so if you would click on this receptor L domain, like, don't move around, can I not go there? Yeah, so it will highlight the receptor L domain, and then it should be giving me more information about that as well. Why don't we go back? All right, so I have to Google that? That's so horrible. Why doesn't it, it used to allow me to click on it. The website looks a lot fancier than it did two years ago, when it was still looking a little bit like the 1990s. But actually here it would then, I don't know why it's not allowing me to click on it, or should I click on these? No. All right, but I can just look for the Pfam thing. So PF013, right? And it says the L domain, and it actually goes to its own website. So the L domains from these receptors make up a bilobal ligand binding site. Each L domain consists of a single-stranded right-handed beta-helix. The Pfam entry is missing the first 50 amino acid residues of the domain for some reason. And so you can see that there are three different structures, and this structure here, if we would search for it, the IPR, because it's the insulin receptor. And this would probably be a domain which allows it to go into the cell membrane, but this is actually a dimeric glycoprotein composed of disulfide-linked subunits, molecular weight, it's involved in cell adhesion, cell morphology, thrombosis, cell migration. So it's something that is in the cell membrane, and probably a couple of them make a pore so that insulin can go out of or into the cell. But let's stop that. This one should be done. Okay, so this is the hemoglobin from Pseudoterranova decipiens, which apparently everyone knows what it is, but I didn't.
So you see that this one is composed of one big domain. So it's a globular, right? So it's a globular protein, meaning that it's a blob. And then there's two parts of it. So there's a globin and then there's another globin. And then this globin has another globin. So this is actually a very uninteresting protein because the only thing that people really know about it is that it's shaped like a ball. And it's not shaped like a single ball, but it's probably shaped like two little balls with a little chain towards it. Of course, you can then use the same amino acid sequence, which I probably still copied. And then we can go to the other side for the prediction of the structure. So we can go here. So, and then we just fill in the sequence and then we just press submit and then we have to wait a little bit. Yeah, but probably it will show that the sequence is two globular structures or two little kind of balls with a site in the middle. I'm actually a little bit, I'm actually a little bit surprised that they kind of changed the website. They made it look much, much more fancy, but the information density and the ability to click on it and investigate it more is actually, it actually is gone entirely, which is really, really a shame because it used to also show you linkouts to which other species have this thing and how which other proteins have a similar one. So, all right, let's use the reload button here. All right, so here it predicts it and then in the 3D structure, it doesn't predict it. So you're not really coming up with really, really good proteins which have really nice structure, fortunately. That's a shame. That's really a shame, but at least the Interpro site, it used to have like a whole bunch of information which you could like click on and then investigate further, see all the other ones which have them. These are then all of the families that you have. They do still have the taxonomy browser, so there should still be a way to get there. 
Anyway, it is up to you during the assignments to figure out how we can get back to that, because the idea is that you can learn a lot about these proteins by looking at the InterPro website and then just drilling down, right? And seeing, oh, this protein occurs in mice, in humans, and in rats, but it doesn't occur in fish. So we can kind of pinpoint the evolutionary origin to so many million years ago, because we know roughly when several species split from each other. Anyway, I think InterPro is one of the things that you guys have to click around on, and I just wanted to show you that it's there. And if you have a more complex protein, a longer protein, if you would fill in the ribosome, it would take something like 15 to 20 minutes to come up with a result, but then it would show you the different parts of the ribosome. So it would say, well, this is probably the P site, this is probably the E site, and this is probably the A site. Is there an alternative to InterPro? Yes, there are literally hundreds of alternatives, because you have the Protein Data Bank. So the RCSB, let me show you Firefox again. The problem with this one is that it doesn't allow you to do the predictions. You can only look up proteins which are known. So this is the spotted hyena, this is the Pseudoterranova. But if we are interested in, for example, hemoglobin, hemoprotein, so hemoglobin, then we can go to hemoglobin, and then this just has different crystal structures. If you click on the crystal structures, it will give you an overview of the structure, and it should also show you the different domains, which in this case is just the hemoglobin domain. So there are two, the B chain and the D chain. This is from Drosophila, but it doesn't have this, InterPro used to have this very nice overview of the domains and where in the domains the differences were.
Let me look at the assignments actually, because I think during the assignments, there's another database which I wanted you guys to look at. So it's Interpro. No, no, I didn't have this. Yeah, so there's the RSCP. Yeah, so this is the website that I want you guys and get the protein sequence from the Ensemble database. What is the 119th, so then we can go to here. Then we would look for this protein. So that's an oxyoreductazoprotein. So we just can go here and then, so here we see the, hey, here you have the protein sequence and here you have the unmodeled, but they used to have a really nice overview, oh, a few more in depth experimental data. Ah, that's so bad that they changed that. These statistics, map genome position, visualize search, basic search, advanced search. Yeah, now I know what I'm searching for. Why did they not, they used to have this really, really nice where you could just click on a position and then it would get a domain, Uniprot. Yeah, that's probably a better one. Yeah. So Uniprot is really nice as well and I want to look for this one. So just take the human one. So this is what it catalyzes and here, yeah. So here, this is the thing that I actually wanted that this used to be an interpro as well where you could just look at the overview. So at position six at 77, hey, there's a substrate binding site and there's more binding site for substrates. There's a NADP binding site for the energy head. Then you have the critical for catalysis site and this is actually like one of the assignments is, one of the assignments is, what happens when we change the amino acid at position 109? And so at 109, there's a binding site for the substrate and it used to be that here, they also had that. But here, like if I go to 109, 100, that's interesting that they, some atom are not reported. Partially modulated residue. I can't even click here on the PDB, right? 
When I go here and I say, oh, I want to click on the PDB link, then it doesn't even allow me to do that. I hate that they change databases every couple of years. Yeah, they might just have outsourced it. That's true. That's true that they just, because they probably focus more on the 3D structures because I use them a lot to get structures of proteins. And then here, because here you have like, because this is what you want, right? Top right. Here, top right. Do a classification. Download files, display files. Structure, global symmetry, find some of these, right, right, right. No down. All right, go down, down, down, down, down. 3D ligands. Anyway, I would just search it in the Uniprod database because here, this is kind of what I wanted. So here, what you see is you see the feature key, right? So what are the different, more or less, binding sites and other sites that are known? And then here, you see the position in the protein, right? So now, if you would start modifying, of course, the amino acid at this position, that this would have a massive effect on substrate binding. And you can then look in the publication which actually which thing is binding here. But structure validation, deposit data, up, up, up, right side. Why don't you have the mutations, literature? I don't see, oh, is this one? No, that's the link to this page. 3D view, annotations, binding. Do we have experiments, sequence? Ah, this is what I was looking for. So they do have it, but it's now under the tab. It used to be just in the main overview. But if you search for things in the PDB, which is like, this is the oldest database, right? This is the one from the 1960s that I told you guys about. So when you know what kind of a protein it is, and here, you can actually see that position 109, right? You see that here, there's the sheet, and then there's a binding site at position 109. 
So during the assignments, the idea was that you look at this IDH1 gene and ask what would happen if you would start changing the amino acid. If you would have a mutation which changed it, what would happen? So in this case, the binding site probably would be lost. And then here, the same thing would happen. So you have different binding sites for different molecules. And it's NADP which is bound here. And then here we have binding of NADP as well, and another substrate. So this is the substrate. The substrate you can get from the summary structure, no, not from the summary structure, but here. So this one here has the substrate, right? Because it takes this D-threo-isocitrate, takes NADP+, and then it transforms this to oxoglutarate. So you can see then that the binding site for the NADP is, I'm getting lost here. So if we go to sequence, then you see here that the binding sites for the different substrates are there. So this is the substrate binding site. And you see that the substrate binding site is in there multiple times. And that is because it's a 3D structure. So actually, this amino acid position here for the substrate is physically very close to this position. And this position, again, is very close to this position. So three-dimensionally, position 177, position 109, and position 212 are physically forming a little pocket which holds the threo-isocitrate. And then there are four positions in this protein which hold the NADP+. So that's position 82, combined with, that's substrate as well, substrate, NADP. So here, and NADP. So there are three positions which hold the NADP. And then there are four positions which hold the substrate, which are physically close. And so what it does is catalyze this reaction. It physically holds this molecule, then takes this molecule, and then transforms it into this molecule by using the NADP+. I think there's also an ATP binding site somewhere.
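The assignment question, what happens if the amino acid at a given position changes, boils down to looking that position up in a feature table like the one on the UniProt page. Here is a minimal sketch in Python. The `Feature` class and the function names are hypothetical, and the positions in the table are simply the ones read out in the lecture (82, 109, 212); they are an assumption and have not been checked against the actual UniProt entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    start: int  # first residue of the feature (1-based, inclusive)
    end: int    # last residue of the feature (1-based, inclusive)
    kind: str   # free-text description of the site


# Toy feature table (assumption): positions as mentioned in the lecture,
# not verified against the database.
FEATURES = [
    Feature(82, 82, "NADP binding"),
    Feature(109, 109, "substrate binding"),
    Feature(212, 212, "substrate pocket"),
]


def features_at(position: int, features: list[Feature]) -> list[str]:
    """Return the kinds of all annotated features that cover the given residue."""
    return [f.kind for f in features if f.start <= position <= f.end]


def describe_mutation(position: int, features: list[Feature]) -> str:
    """Give a rough, first-guess statement about a mutation at this position."""
    hits = features_at(position, features)
    if hits:
        return f"position {position} lies in: {', '.join(hits)}; function may be affected"
    return f"position {position} is not in an annotated site; effect is unknown"


print(describe_mutation(109, FEATURES))
print(describe_mutation(150, FEATURES))
```

A hit in the table is only a first hint. Whether the function is really lost depends on which amino acid replaces the original one, which is why looking up the linked publication is still worthwhile.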
Anyway, you can see that there's a lot of information in these databases. And the idea was for you to kind of experiment through the databases yourself. I'm not really a protein scientist. I'm more like a geneticist. But the idea is that there's so much protein information available. But in this case, I like the Uniprot much more because they just have this nice little table where you can just look at it, which is the thing that I actually was looking for. Because this one used to have a table as well without having this really fancy coloring and stuff which flashes if you hover over it. All right, I think that that was more or less it. There was one thing that I wanted you guys to be aware of. There is a special issue on proteins in the big learning picture, so I made a link to that. And this is the thing that I like the most. And that is the paper models of different proteins. So let's click on it. Go to Firefox. And so you can make your own proteins. So I would like everyone at home to print the tRNA. So you have the assembly PDF right here. So you have the description here. And then here you have to print out in color if you want. And then you can fold it. So you just connect everything together. So you're just building your own amino acid chain. And then you have to fold it to make a real tRNA. Great Christmas present. Yeah, but they have a lot. So in this one, the tRNA is really nice, right? Because you see here the phi. So this is the modified uracil, right? And then here you see the binding site. And then here you see the amino acid. And then you really get the idea of how these things in a 3D structure work. So this is something that I just wanted to show you guys. And do the tRNA. The tRNA is one of the better ones. I did the DNA one as well. So there's the DNA. The antibody is fun, but it's a little bit difficult because it's very stringy and stuff. So I would definitely do the tRNA. That's not too big. 
And then the H, no, the dengue one I think was also really fun. So it's the dengue icosahedron, which is just very easy to do because it's just printing it out and clipping it together. It's not doing the whole chain. But the one that I really liked was the tRNA. So I would advise everyone to print out and make a tRNA and put it somewhere, or give it as a Christmas gift, or hang it in the Christmas tree. I think that was it for today. So I talked to you guys about the history of proteins. I talked about structure, so primary, secondary, tertiary and quaternary structure. So primary structure, physical atom bonds; secondary structure, physical atom bonds plus hydrogen bridging; tertiary structure, you include the other forces as well. Purification, then identification. We talked about that, or I talked about it a little bit. The purification part won't come back. The identification part will come back next week, very extensively, about how mass spec works and all of these things. Then function prediction: what are protein domains, what are protein families, phylogenetic trees, homology, and all of these terms, like ortholog, paralog, xenolog, inparalog, outparalog, and these things. So that's it for today. I will actually stop the recording because I'm