everyone, also if you're watching on YouTube, to the next part of the lecture. I think it will probably be the last part. I still have a couple of slides left, like 30, so a few too many, but we'll see if we can get through them. The prediction takes a while, so we'll just have to wait. At the end of the lecture we'll check whether the prediction is done, and otherwise you'll have to click the link in a day or so, when the server has caught up.

All right, so there are two other methods for protein structure prediction. One of them is threading, also called fold recognition, and this can be used when no homologous protein sequence is available. It's a kind of template-based method, similar to doing local alignments against known proteins, and it's what you fall back on when there is no real homolog. If you really want to know what your protein looks like, it again chops the sequence up into small pieces, finds for each piece the best-matching protein in the database, and then uses those structures to build up a larger structure from the individual parts.

Homology modeling is very similar, but it relies on a homologous protein sequence being available. For example, take myoglobin: we know what myoglobin looks like in humans because we have 3D X-ray data and these kinds of things. Now say you're working on some lizard that lives in the desert, and you have the primary amino acid sequence because you just sequenced the animal and did a DNA-to-protein translation. Then of course you can use the human myoglobin structure to do a prediction for your lizard myoglobin.
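The "use the closest known protein as a template" idea from homology modeling can be sketched in a few lines. This is only a toy illustration, assuming the sequences are already aligned to the same length; the sequences and the names `percent_identity` and `closest_template` are made up for the example and are not taken from any real tool.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def closest_template(target: str, templates: dict[str, str]) -> tuple[str, float]:
    """Return the name and identity of the template most similar to the target."""
    scores = {name: percent_identity(target, seq) for name, seq in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy sequences (not real myoglobin data).
target = "MGLSDGEWQL"
templates = {
    "human_like_template": "MGLSDGEWQQ",
    "unrelated_template": "MKTAYIAKQR",
}

best_name, best_score = closest_template(target, templates)
print(best_name, best_score)
```

A real homology modeling pipeline would first compute the alignment and then adapt the template's 3D coordinates; this sketch only covers the "which one is the closest" step.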
So they are very similar: threading, or fold recognition, is based on local alignments, while homology modeling takes the whole protein, looks at which known protein is the closest, and then modifies that structure based on the amino acid differences relative to the protein that you have.

The latest and greatest in protein folding is something made by Google's DeepMind, and it's called AlphaFold. It's based on machine learning. It's a two-step process, and it was only published about a year, year and a half ago. It first uses a neural network for alpha helix and beta sheet prediction, and afterwards it does gradient descent, which is not a machine learning technique as such but a local optimization technique, to try and fold the protein better. As far as I know, AlphaFold is the best ab initio protein structure prediction that is currently available. It's still not perfect, but it's a lot better than the other tools that we discussed. It's a very interesting project because they use two different techniques. On the one hand you have the neural network for the alpha helices and beta sheets, so the secondary structure prediction. Then they do gradient descent, which is an optimization method: when you have a structure, it simulates all of the forces and tries to find a minimal energy state. That is kind of what the phone app also tries to do, but the phone app uses human brain power for it, while here they are using a mathematical optimization technique. So it's very interesting; read the article if you want to know exactly how it works and how you can use it. You can also get access to their predictions and just have a look.

All right, so production of proteins happens in the ribosome. We know that proteins are assembled from amino acids using the information encoded in genes.
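To make the gradient descent step mentioned for AlphaFold a bit more concrete, here is a minimal sketch of the general technique. The one-dimensional "energy" function is a made-up toy with its minimum at 1.5; it is an assumption for illustration and has nothing to do with AlphaFold's actual potential.

```python
def energy(x: float) -> float:
    """Toy energy landscape with a single minimum at x = 1.5 (hypothetical)."""
    return (x - 1.5) ** 2 + 0.5


def gradient(x: float) -> float:
    """Derivative of the toy energy with respect to x."""
    return 2.0 * (x - 1.5)


def gradient_descent(x0: float, step: float = 0.1, iterations: int = 200) -> float:
    """Repeatedly move a small step downhill, against the gradient."""
    x = x0
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


x_start = 5.0
x_min = gradient_descent(x_start)
print(x_min, energy(x_min))
```

In a real structure prediction the variable is not one number but all the coordinates or angles of the protein, and the method only finds a local minimum near the starting structure.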
The genetic code is a set of three-nucleotide units called codons: three DNA letters together form the codon for an amino acid, so each three-nucleotide combination designates an amino acid. For example, when you have AUG in your messenger RNA, so adenine, uracil and guanine, then the codon codes for the amino acid methionine.

Since DNA has four nucleotides, the question to you guys (and you can think a little bit about it, because I put a slide in between so that I don't have to wait) is: what is the total number of possible codons that you can make to encode amino acids? You have three positions, and at each position you can have one of four nucleotides. So the question to you guys in chat is: how many different amino acids could you in theory encode using three bases, when you have four bases available at each of the positions? Oh crap, damn, I gave it away: it's 64. I thought I had put in one slide so that you guys could think about it and I could talk about the wobble base, but never mind. It's just four to the power of three, so four times four times four. There are 64 possible codons, but there are only around 20 amino acids in humans, so there is a certain amount of redundancy in the genetic code. This redundancy mostly comes in at the third position of the codon, and the third position of the codon is called the wobble base.

So how does translation work? Genes are encoded in DNA and are transcribed into pre-messenger RNA, also called hnRNA. The pre-mRNA is then modified into mature mRNA, for example the introns are spliced out, and then the ribosome synthesizes proteins using the mature mRNA.
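The counting argument above (four nucleotides at each of three positions) can be checked by simply enumerating all codons. A small sketch, using the RNA alphabet:

```python
from itertools import product

NUCLEOTIDES = "ACGU"  # RNA alphabet: adenine, cytosine, guanine, uracil

# Every combination of one nucleotide at each of the three codon positions.
codons = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]

print(len(codons))       # number of possible codons
print(4 ** 3)            # the same number from the formula 4^3
print("AUG" in codons)   # the methionine codon is one of them
```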
This last step, synthesizing a protein from messenger RNA, is called translation. Just as we write them down, proteins are always synthesized from the N-terminus to the C-terminus: the first amino acid that we write down in the primary structure is also the first amino acid that is built in by the ribosome, and the second one is the second one that the ribosome adds. We already saw that the ribosome consists of an A site, a P site and an exit site, and there are always two tRNAs carrying amino acids in it. The newly made protein has its N-terminus here, and this is the currently free C-terminus to which the next amino acid will be bound.

So, as I said, it's four to the power of three, so 64 possible codons. There are 60 sense codons for amino acids, there are three terminator codons instructing the ribosome to stop producing, and there is one start codon (AUG, which also codes for methionine), which is the message to the ribosome saying that it should start making a protein now.
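The split of the 64 codons and the N-terminus to C-terminus reading direction can be sketched as follows. This assumes the standard genetic code written in the usual compact form (bases ordered U, C, A, G, with `*` marking a stop codon); the function name `translate` and the example mRNA are made up for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

START_CODON = "AUG"
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
# Counted as in the lecture: sense codons without the start codon.
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*" and c != START_CODON]


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon, N-terminus first."""
    start = mrna.find(START_CODON)
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


protein = translate("GGAUGCUCUCUUGGUGAAA")  # hypothetical toy mRNA
print(len(sense_codons), stop_codons, protein)
```

The first letter of the output is the N-terminal amino acid, and each further codon adds one residue at the C-terminal end.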
Some of the codons are redundant, or degenerate: two or more different codons code for the same amino acid, and this is due to the wobble base. tRNAs have anticodons that match the mRNA codon, so you would expect 60 different tRNAs with different anticodons, but the total number of tRNAs found in any species is less than 60, and in most cases it's something like 22 or 31. So in your genome you only encode 22 or 31 different tRNAs maximum.

Wobble base pairing, so the fact that the last base in the codon is not as relevant as the first two, was proposed by Francis Crick, the guy who, well, did not invent DNA, but who co-discovered that it forms a double helix. The base at the 5' end of the anticodon in the tRNA has the ability to pair with more than one base at the third position, the 3' end, of the codon in the mRNA. The orientation of codons and anticodons is antiparallel, so the third base in the codon pairs with the first base of the anticodon.

So how does this look? Here, for example, we have a messenger RNA with the codon CUC, and here we have an anticodon on a tRNA that carries leucine, going 5' to 3'. You can see that the first base of the anticodon is binding the third base of the codon. Normally a G couples to a C, which is normal pairing. But we can have two tRNAs that both carry leucine, where this one has a U, so here we have a U that pairs with the G. That is very uncommon, because normally a G only pairs with a C. The way you have to look at it is that the tRNA is not coming in straight: only the first two base pairs are really tightly bound, and the third one, so the third base of the codon and the first one of the tRNA, is kind of wobbly. It doesn't have to match exactly,
so it just has some freedom. This is called the wobble position. Remember: the wobble position is the third base of the codon in the messenger RNA, but it is the first base of the tRNA anticodon. And for some reason they flipped this one around in the figure; it's mirror imaged, so it's wrong. This leucine should actually be here, at this one, so it's not a good representation.

All right, so what can we do now that we have all of these possibilities? We can make something called the amino acid codon wheel. The codon wheel allows you to take any RNA codon and figure out which amino acid it codes for, and here you can see the wobble base in action. You always read it from the inside out, so from 5' to 3'; that's why 5' is in the middle and 3' is on the outside. For example, look at serine: the amino acid serine is coded by a U, then a C, and then it doesn't matter what the third base is, all of them code for serine. So whether you have UCU or UCC, they all code for serine. Sometimes that's not exactly the case. If you look here at cysteine and tryptophan, you see that UGG codes for tryptophan, while UGU and UGC code for cysteine, and UGA is a stop codon. So the third base is not always degenerate, but in many, many cases it is. Whether it is degenerate or not is really hard to figure out, but since the genetic code is more or less universal, we can use these amino acid wheels, when we have a DNA sequence, to predict what kind of protein will come out by just doing the translation on paper. So this wheel is kind of universal, not entirely universal, but you can use it to translate genetic code into amino acids. And of course
the notation here, so leucine being written as L and serine being written as S, is based on the International Union of Pure and Applied Chemistry, which is called IUPAC. The IUPAC notation defines how you go from the name of the amino acid to its three-letter code, and it also defines the one-letter code for amino acids. But they define much more. For example, in DNA you can have A, C, T and G, but you can also have a W. The W stands for weak, which means it's an A or a T, and you also have an S, which is strong and stands for a G or a C. So they have their own system for how to write down DNA, how to write down RNA and how to write down amino acids. If you're interested in exactly how to write down uncertainties in a DNA sequence, you can capture them using the IUPAC notation.

So, the universal genetic code. I told you guys that it is universal, that everyone uses the exact same tRNAs, but actually they discovered in 1981 that that's not entirely true, because for example the mitochondria use a slightly different code. Your autosomal DNA and also the DNA of the sex chromosomes, so your chromosomal DNA, uses a slightly different codon table than your mitochondria. Because mitochondria are originally of bacterial origin, they retained their own tRNAs, and for four codons these tRNAs have a different amino acid coupled to them than the standard DNA in your cell nucleus uses. So you have to be careful. You can use this wheel for mRNA in multicellular organisms; that's not really a big issue, since they generally follow the exact same structure. But if you are translating, for example, mitochondrial proteins, or if you're working on a very exotic prokaryote or eukaryote, for example extremophiles, then you have to be very careful, because then you might need to
use an adapted codon table. Codon usage is also species specific in a way: which codons are favored depends on whether you're a human, a mouse or a rat. For mice, rats and humans the code itself is the same, so a mouse uses UCU for serine just like a human uses UCU for serine. But which base you have at the third place, so which tRNA you use most, is something that is species specific. For example, if serine is encoded by UCU most of the time, then it might be a human, while a mouse might preferentially use UCG. So based on the codon usage you can see whether a sequence is more or less optimized for a bacterium, for a human or for a mouse, because although you have all of the different tRNAs, not all tRNAs are used equally by different species. So be very careful: always check the codon table if it's available for your species, and if you're working on a new species, don't directly assume that the genetic code is universal. Although everyone always talks about the universal genetic code, it is not as universal as you think it is.

All right, so now some more about protein identification and purification. I just want to run through a couple of purification techniques, like ultracentrifugation, precipitation, electrophoresis and chromatography, and then I also want to tell you something about identification techniques. So how do we identify proteins? We can identify proteins immunohistochemically, we can use X-ray crystallography, we can use NMR, and nowadays we can also use mass spectrometry to identify which protein we are looking at. Often these things are combined, because generally when you do mass spectrometry you combine it with one of the purification techniques. But how do they work? Ultracentrifugation is actually kind of simple. Differential centrifugation
is used to separate certain organelles, or other specific parts, from a whole cell for further analysis. It's just based on the size of the particles, here the proteins: the faster you spin something, the smaller the parts that end up at the tip of your Eppendorf tube. Imagine that you have an Eppendorf tube with a protein mixture in it. You first centrifuge it at very low speed, and then the big green proteins start clumping at the bottom of your tube while the other ones remain in suspension. If you then remove the green part by pipetting it out, you can centrifuge again at a higher speed, and then the smaller parts start settling at the bottom of your tube. You can pipette those out as well and repeat the process until you've separated, in this case, the whole mixture into four different protein fractions. So it's just repeated centrifugation, where the faster the rotor turns, the smaller the proteins that come out of solution and start clumping at the bottom.

Precipitation is slightly different. It is the creation of a solid from a solution, and the solid that is formed is called the precipitate. We have a protein in solution, and then we add a molecule that binds to the protein and makes it, for example, hydrophobic, so that the proteins come out of suspension and start clumping together at the bottom. This is often a very slow process, because you add something to your mixture and then very slowly things start coming out of solution and forming a precipitate at the bottom. To speed this up you generally combine precipitation with centrifugation: we're not going to wait until all of these little things have drifted to the bottom, we're just going to put it in a centrifuge and have the precipitate forced out of solution. And
this way, of course, you can separate out a single protein, for example with antibodies or magnetic beads. You have magnetic beads, or for example antibodies, which couple to your protein, so that it cannot stay dissolved in water anymore. The proteins start coming out of solution, then you centrifuge, and you are left with something called the supernate or supernatant, while the precipitate, so the proteins that have come out of solution, is now at the bottom.

Electrophoresis is one of the most used techniques. It was first observed in 1807 by Ferdinand Friedrich Reuss, and it comes in two forms: cataphoresis, the electrophoresis of positively charged particles, and anaphoresis, the electrophoresis of negatively charged particles. It is also used a lot for DNA and RNA. If you've ever run a gel, so if you ever put DNA in a little well and then used electricity to pull the DNA through the gel, then you have done electrophoresis. Electrophoresis is the motion of dispersed particles relative to a fluid under the influence of a spatially uniform electric field. So how does it work? You have an electric field that pulls charged molecules towards it, and then you have a kind of matrix, often an agarose gel, which provides, how do you call it, resistance: something that pushes back against the molecule. The smaller the molecule, the quicker it travels through the agarose towards the positive or the negative pole, and the bigger the molecule, the slower it travels. That's what electrophoresis is based on.

All right, a short intermezzo, because I told you guys that proteins have a charge themselves. Proteins have side chains, and the side chains determine whether a protein has a slightly positive or a slightly negative charge. When we talk about this charge of a protein, so the intrinsic charge
that it has, we talk about the isoelectric point. A charge on a molecule corresponds to a pH in water: a molecule can release a hydrogen ion into the water, making the water more acidic, or a molecule can take up a hydrogen ion, which makes the water more basic, so a pH above seven. For every protein out there, there is a pH at which the molecule has no net electric charge. So if I had a gel with a gradient from a pH of 1 all the way to a pH of 14, and I put a protein on it, this protein would start moving towards the point where it has no charge. If you put a protein with an isoelectric point of 7 at pH 3 on the gel, it will automatically start moving towards a pH of 7, because there it has no charge, so it's no longer pushed or pulled sideways.

So when you dissolve a protein in water it has an intrinsic charge, which is based on the side chains, and a different pH means a different net charge. Proteins like to be at the point where they have no charge, because there they are more or less in balance with their environment. This is called the isoelectric point, and it is characteristic of a protein: a certain protein has a certain isoelectric point, and for each protein this can be different, but proteins which are similar have similar isoelectric points.

In 2D gel electrophoresis we use this to separate protein mixtures. Here you see the result of one of these separations. The first step is that if you have a protein and you denature it with SDS, then you can just do size
separation using the electric field. This is just basic electrophoresis: you pull proteins from the top to the bottom, and this separates proteins based on their size. So the protein here is small, while the protein here is big; the y-axis in a 2D gel is the size of the protein. Furthermore, on the gel you also have a pH gradient: this side is very acidic and this side is very basic. So the protein travels from top to bottom because it's being pulled by the electric field, but it also travels from left to right, or from right to left, along the pH gradient, which is there to separate the proteins by their isoelectric point. So this is a protein which feels very comfortable in an acidic environment, while this is a protein which feels comfortable when the environment is very basic. Is that clear, that you use two different techniques in one go? On one axis you are just using electrophoresis, and on the other axis of the 2D gel you are using the pH to separate by the isoelectric point. I hope so. Good.

Then there's another way to separate, and this is using chromatography. Chromatography is a collective term for laboratory techniques for the separation of mixtures. The mixture that you have is dissolved in a fluid, which is called the mobile phase, and then it flows through another material, which is called the stationary phase. The stationary phase is often just a piece of paper. I think a lot of people will have done this in elementary school: you get a little strip of paper, and at the bottom of the paper you put a dot with a felt-tip pen, or a colored pencil, or one of these board markers. So you put a little dot at the bottom of the paper, and then you put the paper, a long strip of paper, into a glass of water. The water
travels up through the paper because it's being sucked up by the cellulose in the paper, and while the water travels it takes the constituents of the board marker ink with it. Because the ink is not just one substance but a mixture of many different things, it will separate out into many different bands. This is based on the speed at which each component travels: if it dissolves very well in water, it will travel upwards very easily, and if it doesn't dissolve very well, it will more or less stay where it is. That's the way the separation works. I hope you did this in elementary school, and if not, just get a board marker and a piece of paper, put a dot on it, put it in a glass of water, and watch the water slowly creep into the paper. And everyone who has done a corona test knows what chromatography is, because there it's the same thing: the stuff in the little cup that you get is pulled through the little test strip, and there are two places on the test strip with an antibody that reacts with certain proteins. That is chromatography too. The liquid that you dissolve your mixture in is the mobile phase, and the paper that you pull things through is the stationary phase.

All right, so when we want to purify proteins we have four different techniques. Centrifugation is just spinning the sample around, using centrifugal force as a kind of gravity. Precipitation means adding a substance so that the protein drops out of the mixture. Electrophoresis is pulling a protein through an electric field, and it can be combined with the pI to make a 2D gel, so that we don't only separate by the size of the protein but also on the
intrinsic charge that the protein has. And we have chromatography, which is just putting a little dot on a piece of paper and then putting it in water, which then separates out the different constituents.

All right, so what if we want to detect or identify proteins? What if we want to say 'this protein is here' or 'this is this protein'? Generally we do that immunohistochemically: we detect proteins in a cell or in a tissue by exploiting the principle that antibodies bind very specifically to antigens in biological tissue. So we make an antibody against the protein that we are interested in, and this can then be used in two ways. We can use an antibody coupled to a peroxidase. The peroxidase drives a color-producing reaction, and we call this immunoperoxidase staining. That's what you generally use when you have tissues: you have a substance that you put on the tissue, the substance contains an antibody and a peroxidase, and the peroxidase reaction creates a black color in the tissue. If the protein is not there, then the black color is of course not produced. We can also do it using a fluorophore, and then we're talking about immunofluorescence. In this case the antibody has a molecule attached which, when it is hit with infrared or visible light, emits light. So the antibody is coupled to a fluorophore, and when you shoot a laser at it, it gives you a color. Here we see a peroxidase stain: when the protein is there, it reacts and turns the tissue black, and if the protein is not there, there's no reaction, so the tissue just stays nice and white.

We can also use X-ray crystallography if we want to know which protein we're looking at. This is a technique for determining the atomic and molecular structure of a crystal, and it means
that you first crystallize your protein, which is really difficult, because you have to test many different conditions and find exactly the right conditions for the protein to crystallize. Once you've crystallized your protein, you mount it on something called a goniometer. Then you illuminate your crystal, so you just shoot an X-ray beam at it, and you capture the diffraction pattern using a photosensitive plate. Then the goniometer rotates the crystal by something like one degree and you repeat the whole process, so in the end you get 360 pictures. The patterns in these pictures allow you to make a 3D electron density map, and that is done using a Fourier transformation, which is just a mathematical technique to go from something in 2D space back to something in 3D space. It's very interesting, but I don't want to go into too much detail, because the idea is very simple: you crystallize your protein, which is really hard to do, you mount it, you shoot it with an X-ray, you turn it by one degree, you shoot it again, and in the end you get 360 pictures, and based on these pictures you can build up a 3D model of your protein. It's not really a model of your protein as such, but you can see where the electrons are, because X-rays interact with electrons, and an electron kind of bends an X-ray in a certain way.

We can also use nuclear magnetic resonance. Nuclear magnetic resonance is what is called MRI in the hospital. It's the physical phenomenon in which nuclei within a magnetic field absorb and re-emit electromagnetic radiation. So what you do is put your protein of interest in a very strong magnetic field and then bombard it with electromagnetic radiation, so just radio waves. Based on the structure of the protein, radio waves will be absorbed or not, and once they are absorbed it takes a
certain amount of time before they are released again to the detector. The difference between the absorption and the release tells you something about the structure of your protein, and of course every amino acid has a different absorption time and a different wavelength at which it absorbs. These machines are quite big, because the magnetic field that you use in hospitals is relatively weak compared to the ones that you use in protein science.

One of the most used techniques to identify different proteins is mass spectrometry. This is a technique that ionizes chemical species and sorts the ions based on their mass-to-charge ratio. So how does it work? Here we have an ion source, where we put the protein in. The proteins are generally first separated by size, so the smallest protein comes out first. Then you have an ionizing filament; I thought I had a slide for that, but no. So the protein comes out, and then it is quickly accelerated using a little metal plate with a hole in it. While it flies towards the plate, which happens in a vacuum, the protein splits into all kinds of little parts, and each of these parts ends up with a charge of one, two or three electrons' worth as it flies through the metal plate. Then there's a beam focuser, and the beam is shot past a big magnet, which deflects the charged parts of the protein. What happens is that a very small fragment is bent more by the magnet than a very big fragment. So in the end you have a whole bunch of collectors, and those give you a pattern which shows you which fragments were detected. That's kind of how it works. I think we get back to mass spectrometry in another
lecture. I do have in my mind that I made something like 15 slides about mass spectrometry. If we get to lecture number 10 and mass spectrometry hasn't come back and you want to know more about it, then just send me an email, because we still have two open lectures where you can decide the topic. The big trick in mass spectrometry is just having charged particles fly through a vacuum tube while being deflected by a magnet. If you are very, very big, you are not affected that much by the magnet; if you're very small, you're affected more. So the curve that you make, the turn that is introduced, can be used to figure out the charge and the mass of the original fragment.

All right, so I very quickly told you about different identification techniques for proteins. You can do it immunohistochemically, which means that you use an antibody coupled to a fluorophore or to a peroxidase. You can use X-ray crystallography, for which you have to make a crystal of your protein; you then shoot it with X-rays, and based on the diffraction pattern you can figure out what protein you were looking at. You can use nuclear magnetic resonance, where you put your protein into a massive magnetic field and then hit it with radio waves, so just basic electromagnetic waves, and based on the absorption and then the re-emission of these radio waves you can figure out which protein it is. And you have mass spectrometry, which chops up your protein into all kinds of little pieces, charges these pieces, and makes them fly through a vacuum tube past a magnet. The smaller the fragment, the more the magnet bends it around the corner, and you can detect that and figure out how big the fragments were and which molecules made up the
fragment that you're looking at.
All right, so almost the last section. If we talk about whole proteins, then proteins are generally classified by their function, and there are seven different groups of proteins which we define in nature. One of them is called structural proteins. Those are proteins like collagen or keratin, which are there to strengthen tendons, but they also make up your skin, your hair and your nails. So they are proteins which give structure to cells and all the things surrounding the cells.
We have proteins which are classified as enzymes, for example DNA polymerase. An enzyme is something which catalyzes a chemical reaction. A chemical reaction can be the replication of DNA or the repair of DNA, but of course there are other chemical reactions as well, for example the breakdown of alcohol. Alcohol dehydrogenase is one of these proteins which allows your body to break down alcohol into its constituent parts. The definition of an enzyme is something that participates in a chemical reaction but is not used up. It only catalyzes, so it only makes the reaction run easier, or run faster, or run the other way, but it's not being used up, so a single enzyme can be used again and again and again.
We have transport proteins, so those are proteins like hemoglobin, which are there to transport stuff from A to B. We have contractile proteins like actin and myosin. These are proteins which, based on calcium or other molecules, actually contract and expand. In muscles you have actin and myosin, which kind of slide over each other, so the function of the contractile proteins is to contract and to make muscles work. We have protective proteins: antibodies, those are also proteins. We have hormones, so hormones are a very interesting class of proteins, but hormones
in general are considered very slow-acting messenger molecules. Something like insulin is a hormone, or is generally classified as a hormone, and that is because of how it works: you eat something, your blood glucose level goes up, then your pancreas starts producing insulin, and based on the insulin your body knows, oh, this is the signal for my cells to start taking up glucose from the blood, to bring the glucose level in the blood down again. And of course, since we have protective proteins like antibodies, we also have a group of proteins which are called toxins. The best example is snake venom, where the function is to incapacitate prey. So, seven different groups to classify proteins in, and normally proteins are classified based on the function that they have.
All right, so how does a protein get a certain function? If you want to predict for an unknown protein what the function of this protein is, then what we do is we look at the different domains that there are in the protein. A protein domain is a conserved part of a given protein sequence and its tertiary structure that can evolve, function and exist independently of the rest of the protein chain. So it is a section of the protein chain which does a certain thing. It can for example be that part of your protein is DNA binding, it can be that part of your protein binds other proteins, or that it is there to bind ATP. You can determine these protein domains by looking at hundreds of proteins, aligning them, and seeing which region of the protein is conserved. Knowing which domains are in a protein can help you understand the function of a protein, and so we can kind of classify a protein based on its different domains. If you have a protein and it has a globin domain, then it's generally a protein which binds something else and then transports it, because globins like hemoglobin and myoglobin are there for the transport of oxygen. But there are many different globins that transport all kinds of molecules, and all of them have this very similar structure, where there's kind of this hand with a charged molecule in the middle which then attaches to something, and then the globin is there to move it to another position.
So we group proteins which share protein domains into protein families. A protein family is a group of evolutionarily related proteins, and it is often nearly synonymous with a gene family. We have around 60,000 different protein families, and the reason why we use a protein family is to say, well, this protein is involved in this process and all of these other ones are as well. As an example of a protein family you can think about Hox genes. Hox genes and their associated Hox proteins are there to shut down part of the DNA. Once you have a growing organism, at a certain point certain cells need to develop into an arm, and this is done by Hox genes. So in a single cell a certain Hox gene will turn on, it will turn off parts of the DNA, and from now on this whole cell and all of the cells that will be born from this cell are kind of dedicated to go and create an arm, while another Hox gene is there to create a wing, for example, or another Hox gene is there to create a part of your intestine. So these are massively regulated, but they're all very similar, because every one of these Hox proteins has a domain which binds DNA, and then they have another domain which binds the histones, which are recruited to kind of wind up the DNA to make it inaccessible. So all of these proteins look very similar, and that's why we talk about a family. And of course they're all different, because some of them bind specific sequences which shut down parts of the DNA so that you can produce an arm, other
ones do a heart. But these new families arise due to speciation events and due to duplication. So hey, a part of the genome can just be duplicated, and from now on you have the same gene twice. These genes start off being exactly identical, so they are part of the same family, but then during the course of time mutations will occur in one of them or in the other one, and they will kind of drift apart. Since they started off being more or less the same, they still have more or less the same function, but one of them, instead of binding glucose, binds some other substance very similar to glucose. The more time passes, the more different they become. But hey, duplication events are very common events that started off protein families.
So there are many, many different bioinformatics tools out there to predict protein families. You have Pfam, which is the database of protein families, and it contains alignments and hidden Markov models to see if your protein belongs to a certain family. You have PROSITE, which is the database of protein domains, families and functional sites. You have PIRSF, which is the superfamily classification system: because we have 60,000 of these protein families, we also group them together in superfamilies, saying, well, this is a globin, and then underneath the globins you have hemoglobins, myoglobins and other globins. And we have PASS2, which is protein alignments organised as structural superfamilies. So you have a lot of these different tools out there to learn which domains are in my protein and to which other proteins my protein is related, and based on this you can get a very good idea of what the protein that you're interested in is actually doing and how it is doing this. Because hey, if you know the domain, then the domain is generally doing RNA binding, or it's making a hole in the cell membrane to form a pore, or something like this. So, just a list of different tools that are out there.
We won't go through all of them, but of course, hey, I just want you guys to know: all right, in the future I might have to study some protein in more detail, which tools can I use? Here's the slide. These tools you can visit, you can generally fill in the FASTA sequence of your protein, and they will tell you, oh, there's a domain here. So very useful if you're working on a new species which has no annotation, or less annotation than human.
All right, then we're going to take a quick break, and then we will have to do another 15 to 20 minutes to finish off the lecture. We will be talking about phylogenetic trees. This is Mr. Garrison explaining phylogenetic trees, of how a monkey, a frog-fish and a retarded fish are related. I always love this slide. So we will have a short break. For the people on YouTube, I will stop the recording, so see you in part four.
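A small footnote to the domain discussion in this part: the idea of aligning many proteins and looking for the region that stays conserved can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions. The four aligned sequences are made up, the function names `column_conservation` and `conserved_regions` and the threshold values are hypothetical, and the per-column majority score is a deliberately simple stand-in. It is not how Pfam works, since Pfam uses hidden Markov models built from curated alignments.

```python
from collections import Counter


def column_conservation(alignment):
    """Per column: fraction of sequences that share the most common residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # gaps do not count as residues
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(alignment))
    return scores


def conserved_regions(scores, threshold=0.8, min_length=4):
    """Return half-open (start, end) index pairs for runs scoring >= threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores + [0.0]):  # sentinel closes a trailing run
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    return regions


# Made-up toy alignment (assumption: not real protein data).
toy_alignment = [
    "MKTAGHWLKVEA",
    "QRSAGHWLKTDP",
    "LNPAGHWLKSGY",
    "TDEAGHWLRMNC",
]

scores = column_conservation(toy_alignment)
regions = conserved_regions(scores)
print(regions)
```

In this toy input the flanking columns differ in every sequence while the middle stretch is shared, so the middle stretch is what gets reported as a candidate conserved region. With real data you would start from a proper multiple sequence alignment of many more proteins.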