Welcome back, everyone watching on Moodle, now or later; I'm not allowed to stream to two platforms at the same time because of Amazon's rules. Let's continue with the lecture and talk a bit more about bioinformatics. If we deal with proteins in bioinformatics, we are of course interested in protein structure, because the structure of a protein gives it its function, and there are a lot of computational tools available for that.
(You all reported the guy; I think many of us did, and one or two reports is enough. Twitch runs a very aggressive campaign against bots watching, and also against selling or buying followers. You just get your followers by being funny and giving lectures, right? That's how I do it, or at least I try to be funny. You don't have to think I'm funny.)
Anyway, structure prediction tools. There are a lot of them, but they fall into a few categories. Ab initio prediction is the unsolved question in bioinformatics: how do you take the sequence of amino acids and predict the tertiary or quaternary structure from it? That is the thing we want to know. Primary and secondary structure are not too hard for computers, and secondary structure prediction is more or less solved. There are a lot of tools online; I will do a secondary structure prediction live for you, and I think there is also one in the assignments. Transmembrane helix prediction is also more or less solved and works relatively well: if you have an alpha helix with hydrophobic side chains, it will preferentially sit in the cell membrane. So what you want to know about a protein is: is this protein in the cytosol of the cell? Or is it in the cell membrane?
Or is it, for example, in the membrane of the nucleus or of the endoplasmic reticulum? The structures that allow a protein to sit in a membrane are pretty well known, because the membrane has a certain thickness. If an alpha helix has a certain number of helical turns, you can combine that with the knowledge of how thick the cell membrane is, and then predict whether the protein will insert into the membrane and function, for example, as a transporter or as a receptor.
Then there is threading, or fold recognition. It is partially based on homology, but it is its own prediction method. And then there is homology modeling, where you use known structures from other proteins to predict structures in new proteins. Say I have a stretch of 50 amino acids, and that stretch occurs in protein A, which has been solved by X-ray crystallography. If I find 44 of those 50 amino acids in my new protein, so 88% identity, I can use the known structure and say: this will look very similar to the one we already know.
Ab initio prediction, then, is an algorithmic process by which protein tertiary structure is predicted from the primary sequence. It is still an unsolved question, and it is one of the top 125 outstanding questions in modern science: in 2005 the journal Science made a list of the big things that are still outstanding, so not solved yet.
And the idea is that if you came up with a program that you feed the amino acid sequence and that then predicts the tertiary structure, it would be a massive advance in science. And it really would be, because we would be much better able to work out what a protein's function is and how, for example, we can target drugs or other interventions to block the protein or make it work better.
There are some projects out there that try to do this. One of the most interesting, I find, is Folding@home. It works along the same lines as SETI@home, the search for extraterrestrial intelligence, so it uses your home computer. The program runs on your machine and communicates with the project's servers (it was SETI@home that ran out of Berkeley; Folding@home started at Stanford). It receives small data packages, tries to fold the protein locally as well as possible, and sends the result back. So it is a kind of crowdsourced computing project.
Another very interesting project is Foldit. Foldit is a game that you download and play on your computer, and in the game you are the one folding protein chains. If parts clash with each other you lose points, and if things are more or less free of each other you gain points, so you get a score for how well you fold the protein. Data from this project has led to some novel algorithms, because apparently gamers are much better at folding proteins than scientists are. Scientists have all these preconceived notions of what a protein should look like, and gamers don't. So, using strategies that gamers came up with in the game, researchers developed new scientific algorithms.
And there are a couple of nice publications on Foldit, so you can just Google it, download it, and help. It is simply using human intelligence to brute-force the problem. And then there is the Human Proteome Folding Project, which is more or less a scientific project run by different universities to try to solve this open issue in modern science.
(Cheers. Yes, have a nice holiday.)
So, secondary structure prediction is more or less solved. We can predict secondary structure with around 80% accuracy, which is more than good enough. There are programs out there like RaptorX-SS8, PSIPRED, or YASPIN. You feed them the primary amino acid sequence and they predict the local secondary structure, and they do this relatively well: they detect whether there is an alpha helix or a beta sheet, plus some other known secondary structure elements. These are structures held together by hydrogen bonds.
This is then used to predict transmembrane helices, and those programs fall into more or less four categories. There are programs like HMMTOP, which use hidden Markov models. You have MEMSAT, which uses neural networks, so machine learning. Then you have Phobius, which uses homology prediction, so the homology of known proteins to predict novel proteins. And then you have CCTOP, which stands for constrained consensus topology. It is a mixture of hidden Markov models and homology prediction: the constrained consensus part comes from the known proteins, and the topology part uses hidden Markov models to fold the parts that have no homology to known proteins.
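The transmembrane idea from earlier in the lecture (a hydrophobic alpha helix that is long enough to span the membrane) can be sketched in a few lines of Python. This is a toy sliding-window scan, not what HMMTOP, MEMSAT, or CCTOP actually do. It assumes the Kyte-Doolittle hydropathy scale, a window of 19 residues (an alpha helix rises about 1.5 Å per residue, so roughly 20 residues cover a membrane core of about 30 Å), and a threshold of 1.6; the window, the threshold, and the function name `hydrophobic_windows` are illustrative choices, not taken from any of the tools above.

```python
# Kyte-Doolittle hydropathy values (positive means hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return (start, mean hydropathy) for each window above the threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KD[aa] for aa in segment) / window
        if mean > threshold:
            hits.append((start, round(mean, 2)))
    return hits

# Made-up toy sequence: charged flank, 20 hydrophobic residues, charged flank.
toy = "KRDEKRDEKR" + "LIVALLIVFLAVLLIAVLGA" + "KRDEKRDEKR"
for start, mean in hydrophobic_windows(toy):
    print(f"candidate membrane-spanning window at {start}: mean hydropathy {mean}")
```

Windows that overlap the hydrophobic stretch are reported as candidates, while a purely charged sequence gives no hits. Real predictors add much more on top of this, such as topology (which side is inside) and information from known proteins.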
So I wanted to show you an example. I put some links here, and I'm going to switch you over to my Firefox window. The first thing I want from you is a protein: what is your favorite gene or protein in human, mouse, or plants? Then we can search for that. We will, of course, search in the protein database. Let me make this window a bit bigger so that we have the search button as well. If nobody has a favorite, we will just do the insulin receptor, which has some nice structure.
Someone suggests a receptor, and someone the tau protein. Tau is probably one of those unstructured proteins in Alzheimer's, if I remember correctly, but we can do the tau protein. So we search for tau protein and see how CCTOP handles it. This one is partial, this is a fragment, and these are also partial, partial, partial. Let's not sort by the default order... can I not sort by length? Well, let's not pick one that is too big, because then the prediction takes a couple of hours and we won't see it. Let's take one of the first ones. So this one is 124. I have no idea what a Crocuta crocuta is, and I always like to figure out which animal we are dealing with. It's the spotted hyena. Who would have guessed that the spotted hyena has such a nice scientific name? Crocuta crocuta. That's better than Homo sapiens; we really got shafted on the scientific name, in a way. So let's take the tau protein, partial sequence, 860 amino acids, from the spotted hyena. That is quite big, so the prediction might take a while. It's loading the sequence and it gives us all kinds of information, and it tells us what the CDS is.
But here, this is the thing that we want: the protein sequence. We can't use it directly like this; we have to export the sequence, and of course we do that in FASTA format.
(At least we don't have the name Huso huso, like the beluga sturgeon. I don't know, I like the double names, like Pissang Pissang. It's nicer. Some of them are really interesting; I actually have a list of my favorite scientific names of animals.)
Anyway, when you have the FASTA sequence, you can just copy and paste it. I select it, press Control-C, and go to CCTOP, which I already have open. As you can see, the program is very easy: you just give it the primary sequence, so we paste it in. You can fill in your email if you don't want to wait for the result, but we will just submit and wait here a little, and talk in the meantime.
While we wait, we can open the results of a run I did before; I can just click the link. This is the human insulin receptor. Here you see that it predicted that part of this alpha helix is embedded in the cell membrane; this is its way of showing that the helix goes through the membrane. Then you see a disordered part of the chain, and at the end there is another small alpha helix. So that is what it predicted. The legend says that blue is outside, so extracellular, the yellow part is in the membrane, and the red part is inside the cell. So this part is inside of the cell. This part is outside of the cell.
And then we have the re-entrant loop, which is not present here, and we don't have any green, so no interfacial helices. We can also look at the summary: this was the human insulin receptor, and it figured out that there are known structures for it. Then we can go to the 1D structure, which is just the standard linear view, now colored to show which parts are inside and which are in the membrane. Here is a small part which is here in the 3D structure. I think it's this part. No, this part. It is a piece of the helix for which it doesn't really know whether it is inside or outside the membrane. And then here you have the inside again.
It also predicts the 2D structure, so here it shows you the 2D structure prediction. It runs different prediction algorithms: the standard HMMTOP, but also others, and then it combines everything. (I'm zooming out, which I don't want; I just want to scroll down.) Here it lists all kinds of different signals, and you can get more information on them by looking at them. It also reports generally known proteins in which it finds the same sequence. But the nicest thing is that you can just click on the 3D structure and it shows you more or less how the protein looks.
Did CCTOP finish already? We just have to refresh. No, it's still being processed, so we will come back to that later. You can see that it is a very easy program: you throw in the sequence, it does a prediction, and it shows you what is inside and outside. You also get some more information about the 2D structure. Unfortunately you can't really click on it, but here you have the cross-references.
So these are all other proteins that have homology with your sequence, and it shows which part of the sequence the homology is with. Some of them are really odd because the homologous stretch is very small, but some have large homology. So this 2DTE-TGE has the same kind of membrane helix as we see in the human insulin receptor.
Let's refresh again. No, it's still running. We could have taken a smaller one, but let's continue with the presentation; I will click refresh now and then, and when it's finished we can see what it does with the tau protein.
Back to the presentation. I put this link in; it is more or less the permanent link to the results, because it took some time. When I ran it this morning it took around 15 to 20 minutes to go through all of the predictions and combine them into one figure.
There are two other methods, and of course CCTOP itself uses several methods. Threading, or fold recognition, is what you can use when no homologous protein sequence is available. It is template-based and similar to doing a local alignment against known proteins: it takes the whole protein sequence that you have, chops it up into little pieces, and checks whether any of those pieces occurs in another protein. This is, of course, very similar to homology modeling, but homology modeling is more like doing a global alignment. It looks at the whole protein sequence, checks whether there is a protein of similar length with a known structure, and does the modeling based on that. For the positions where the novel protein differs from the known protein, it then tries to predict the effect of those changes. So of course, homology modeling is much quicker.
Because it takes a known structure and just makes modifications to it, while threading chops the protein up into little pieces, looks for a known structure for each piece, and then combines them into one big novel structure. So if there is a homologous protein out there, you can do homology modeling.
All right, the latest and greatest. This is the new slide that I added for you; you can still see that I didn't format it. This is Google. Google thought it would be a good idea to get involved as well. They also bought one of these, well, not a supercomputer, one of these quantum computers from D-Wave: the 512-qubit D-Wave machine, a quantum annealer. They didn't really know what they wanted to do with it, but it is one of the first quantum machines in the world that you can simply buy. D-Wave is a company located in Canada. Last time I was at Oak Ridge National Laboratory in the US, I asked them: do you have a D-Wave? And they said: no, we don't, but the guys in the building next to us, the kind-of-NSA building, well, not NSA but Department of Defense, they might have one, but you can't go in there because you're not a US national. I was really disappointed, because I really wanted to see one of these new computers.
What Google did is combine two things. They have a neural network that does the secondary structure prediction, so it predicts alpha helices.
(Happy holidays to you. Yes, happy holidays to you as well. That's OK. Just wishing colleagues happy holidays.)
(Next week I will already be on holiday. I have to see whether we actually have a lecture next week; if we do, I'm probably legally obliged to show up and give it. But we can talk about that.)
So what Google did: it took this kind of neural network, which is used in a lot of prediction algorithms for predicting alpha helices and beta sheets, as the first step. They take the protein sequence, and they have these databases, which feed the big neural network that they built. The network spits out a distance prediction, a prediction of how far apart individual amino acids are from each other. Besides that, there is an angle prediction: the angles along the backbone of the chain, because of course the chain cannot simply fold back on itself by 180 degrees. So this acts as a kind of constraint matrix. Very abrupt changes in the chain are generally not how it works in the real world; in the real world, things loop around gradually.
The next step is gradient descent. As far as the published method goes, this runs on conventional hardware, not on the D-Wave machine. Gradient descent is an algorithm in which you start wiggling things around: you move one amino acid a little bit and then calculate the overall score of the protein, which here plays the role of the free energy. If the energy goes down, you say: this was a good wiggle. If it goes up, it was a bad wiggle. In the end, everything in the universe wants to be in the lowest energy state. So this is how they wiggle their protein. They published this in January 2020. It actually predicts tertiary structures, and it does it pretty well.
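The wiggle-and-check loop described above can be sketched as a small gradient descent in Python. This is a toy under loud assumptions and not the published method: the "energy" is a made-up score (the squared mismatch between current and target pairwise distances), the chain is five points in 2D, the target distances come from a hypothetical shape instead of a neural network, and the names `energy`, `gradient`, and `descend` are invented for the example.

```python
import math
import random

def pair_distances(coords):
    """Matrix of pairwise Euclidean distances between 2D points."""
    return [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in coords] for a in coords]

def energy(coords, target):
    """Toy score: squared mismatch between current and target distances."""
    dist = pair_distances(coords)
    n = len(coords)
    return sum((dist[i][j] - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def gradient(coords, target):
    """Analytic gradient of the toy score with respect to every coordinate."""
    n = len(coords)
    grad = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            g = 2.0 * (d - target[i][j]) / d
            grad[i][0] += g * dx
            grad[i][1] += g * dy
            grad[j][0] -= g * dx
            grad[j][1] -= g * dy
    return grad

def descend(coords, target, step=0.05, iterations=200):
    """Follow the negative gradient; keep a move only if the score drops."""
    current = energy(coords, target)
    for _ in range(iterations):
        grad = gradient(coords, target)
        trial = [[x - step * gx, y - step * gy]
                 for (x, y), (gx, gy) in zip(coords, grad)]
        trial_energy = energy(trial, target)
        if trial_energy < current:   # a good wiggle: accept it
            coords, current = trial, trial_energy
        else:                        # a bad wiggle: retry with a smaller step
            step /= 2
    return coords, current

# Hypothetical "true" shape of a five-residue chain, used only to build targets.
true_shape = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 2]]
target = pair_distances(true_shape)

rng = random.Random(0)  # fixed seed so the run is repeatable
start = [[rng.uniform(0, 2), rng.uniform(0, 2)] for _ in true_shape]
start_e = energy(start, target)
final, final_e = descend(start, target)
print(f"score before: {start_e:.3f}, after: {final_e:.3f}")
```

The only design choice worth noting is the step halving: a move is kept only when the score drops, which mirrors the good wiggle versus bad wiggle description and keeps the score from ever increasing.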
So on average, they get something like 75% accuracy, which is really good, so they are on track to solving this open question in science. Very good; applause for Google. It's not just a search engine; the company itself is massive. You can read more in the DeepMind article. If you're not into reading articles written by the authors themselves, you can go to their Nature publication. I think it is still freely available until the end of the year, because it is deemed such an important article.
So they combine two methods. The first uses the neural network to do a kind of standard homology-style prediction. They follow this up with gradient descent, in which they wiggle the amino acids and each time calculate the energy of the whole structure, which of course is very computationally intensive, so you need a massive computer farm. Of course, Google has a massive computer farm. They also have the D-Wave machine, which is a quantum annealer built for optimization problems, but as far as the publication goes, the protein work runs on regular hardware. Very interesting work, so read the article if you're interested. It's always good to see science moving forward; ten years ago it was completely unthinkable that you could predict a protein's tertiary structure with this level of accuracy.
All right, production of proteins. We already talked about this a lot, so I cut down the number of slides about the ribosome, but I do want to highlight some features. Proteins are assembled from amino acids using information encoded in genes.
So we already talked about that. The genetic code is read in sets of three nucleotides: three nucleotides in the coding region of the DNA code for a single amino acid. We know that DNA contains four nucleotides. So a question for you: if there are four nucleotides and a combination of three codes for a single amino acid, how many possible codons are there? How many different amino acids could you encode on DNA?
64. Could you write that down in a more scientific way? How did you get to the answer? We learned how to write things down in R, so how would you write this in R? It's something to the power of something. 4 to the power of 3, so 4^3. Very good, you are getting really good at this. I should come up with more difficult questions during the lecture to make sure you're still paying attention.
So in theory there are 64 possible codons, but there are only 20 amino acids. Why is that? There is redundancy, and there has to be: the DNA could code for 64 different amino acids, but only the 20 standard amino acids are used in proteins, and in humans nine of those are essential. So there is some redundancy in the genetic code.
Genes are encoded in DNA, transcribed into pre-mRNA, processed into mature mRNA, and then the ribosome makes the protein; this is known as translation. Proteins are always biosynthesized from the N-terminus to the C-terminus, which is why we write them down that way. The ribosome picture we already saw: here are the A site, P site, and E site. It's all in the RNA lecture. And indeed, four to the power of three gives 64 possible codons. Of those, 61 are sense codons for amino acids, because three are terminator (stop) codons: you need to tell the ribosome to stop at a certain point.
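The codon arithmetic above can be checked by simply enumerating every triplet. A minimal sketch in Python (in R the count would be `4^3`; here it is `4 ** 3`), using the mRNA alphabet and the three standard stop codons UAA, UAG, and UGA; the variable names are just illustrative.

```python
from itertools import product

BASES = "ACGU"                        # the four nucleotides, mRNA alphabet
STOP_CODONS = {"UAA", "UAG", "UGA"}   # the three terminator codons
START_CODON = "AUG"                   # initiator; it also codes for methionine

# Every ordered combination of three bases is one codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons))        # 64, which is 4 ** 3
print(len(sense_codons))  # 61 codons left to code for 20 amino acids
```

With 61 sense codons for 20 amino acids, several codons must share an amino acid, which is the redundancy discussed in the lecture.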
And there is one codon that is used as the initiator, because you also have to tell the ribosome where to start. Some of the codons are redundant, or degenerate, as we call it: two or more different codons code for the same amino acid.
We know that tRNAs are the things that match the mRNA codon. With 61 sense codons you would expect 61 different tRNAs, one anticodon per codon, but the number of distinct tRNAs actually found is less than 61; in most cases it is only 22 to 31. So even if you look at the most complex plant, it has only 31 unique tRNAs. People were already aware of this in the 1960s. What Francis Crick, of Watson and Crick, proposed is that the base at the 5 prime end of the anticodon in the tRNA can pair with more than one base at the third position, the 3 prime end, of the codon in the mRNA. Because codon and anticodon are antiparallel, so they run in opposite directions, the third base of the codon pairs with the first base of the anticodon.
How does this look? Take leucine. Here we have the same leucine tRNA twice. The anticodon on this tRNA is G-A-C; well, actually it's not, it's G-A-G, reading from 5 prime to 3 prime. And here is the mRNA, also written from 5 prime to 3 prime: C-U-C. That is a perfect match, so the tRNA binds perfectly to the mRNA. However, the same tRNA with the same anticodon also binds to the other leucine codon, C-U-U. How can it bind to C-U-U? Normally G would not pair with U. But if you look inside the ribosome, the wobble position is not read very accurately, because the tRNA is not horizontal, and it is not at 180 degrees or at a 90 degree angle.
It's actually slightly at an angle, a bit less than 90 degrees. That means that the third base pair, this G here together with this U, is not held that tightly; the two bases are in a wobble position, so there is a bit of slippage there. The third base of each codon on the mRNA therefore does not matter that much: because the tRNA sits at slightly less than 90 degrees, the third base in the codon does not need to be an exact match.
That was the only thing I wanted to add to the ribosome material. I don't think we mentioned wobble bases before, but they are really important, because they can show you whether something is species-specific, so they make a nice analysis tool too. The third base is also not really under selection. Normally all of the bases in the genome that code for protein are under selection, because if you change the amino acid sequence of the protein, that affects the biochemistry. A change at the third base often leaves the amino acid the same, so you see more mutations at the third position than at the other two.
Let me refresh the tau protein. So the tau protein worked really well, because, let me show you, there are no 3D structures for this protein. Boohoo. We can still learn something, though, because there are some homologous entries. You can see that there is a domain at the end of the tau protein, and this domain is PF00418. We can just Google that: PF00418. That's a tubulin-binding domain. So it teaches us that at the end of the tau protein... this is the wrong one.
So here we learn that from around amino acid 600 up to 720 of this tau protein there is a structure that is very similar to a tubulin-binding domain, and if you want to learn more: it has to do with microtubule assembly, so the microtubules within the cell. That's interesting. Did it find anything else we can learn from? There are other tau protein entries, and there is also something called PS00, which is the same thing. So it doesn't find that much structure here. Apparently people haven't studied the structure of tau proteins much. One reason could be that they are really hard to crystallize, and if you can't crystallize a protein, it is also really hard to get a structure for it. But it is interesting that it finds no 3D structure. So it's not the best example; I'm glad I ran the insulin receptor beforehand, instead of trusting you not to come up with one of the few proteins that have no 3D structure yet. It does give you the 1D structure and the 2D structure.
(Sorry. No, don't be sorry, that's how these things work.) In the end, you want to learn things about proteins, so you look at the protein structure. Here it also gives some predictions of other features, but there is no membrane part. There is, however, a very well-known domain running from amino acid 600 to 720.
(Philip? Yep. Keys. There you go. Very good.)
So, tau protein: it's finished. If anyone has another favorite protein and thinks that's something we should have a quick look at, throw it in the chat, and we can start a new prediction and bump the prediction counter by one. That's OK. All right, more about proteins.
So proteins are, I hope you now understand, complex and in some ways difficult to predict. But if you're lucky, someone has made a 3D structure of your protein, and that helps you understand what your protein might be doing.
When we talk about proteins, we have to talk about how you identify them, because we need to know which protein is which. For that, we first need purification. If you break open a cell, literally 200,000 different proteins come out, and you need a way to get your protein of interest. There are different purification techniques: ultracentrifugation, precipitation, electrophoresis, and chromatography. (Just say it in Dutch when it doesn't work in English. Chromatography.) And there are different identification techniques: immunohistochemistry, X-ray crystallography, NMR, and mass spectrometry. And of course we can use two-dimensional gel electrophoresis. We will go through them quickly.
What is ultracentrifugation? It is a centrifugation step in which you separate certain organelles or components from the cell for further analysis. First you take your tube, put your cells in, and add lysis buffer so that the membrane breaks open. Then everything starts floating around; here you see different proteins in different colors and sizes. Then you do repeated centrifugation: you have a nice centrifuge, you start spinning, and the bigger proteins fall to the bottom because they are heavier.
So if you spin at a certain speed, the heavier ones drop to the bottom more easily than the lighter ones. At a certain point you have a little pellet, and you take the supernatant and transfer it to a new tube. This new tube is spun again, at a higher speed, other proteins drop to the bottom, and so you continue until you have your protein of interest. At each of these steps you can remove the supernatant and study what is in the pellet: the green ones, the red ones, the blue ones, or the really tiny black ones, because you can stop at any step. The nice thing is that the speed you spin at also tells you roughly how big the proteins are that leave the solution and end up in the pellet. So if you want to study really big proteins, you centrifuge at a relatively low speed, pipette away everything on top, and study what remains.
That is why, in the old days, proteins were referred to by their mass in daltons, because this was one of the first techniques. People would identify a protein and say: this protein is 100 kilodalton, so 100 mass units, and a 40 kilodalton protein is four tenths of the bigger one. So classification used to be based on size, or rather weight, which is of course coupled to size in a way. One of the other techniques is precipitation.
So when you do it with a centrifuge, then it's called sedimentation, but with precipitation what you do is you have a solution and then you add something to the solution, for example a salt, so something with sodium (natrium) or chloride. And then you put that in, this binds to the protein of interest, and then that falls down. So then you have something called the precipitate and you have the supernatant, and you can study the precipitate, or you can study the supernatant. But if you have something which binds, and not all things will bind to all proteins, then you can very precisely extract some of them. Electrophoresis is nowadays the most common technique. It's still very much used in the lab, and in electrophoresis you use the motion of dispersed particles relative to a fluid under the influence of a uniform electric field. So for DNA and RNA, you do electrophoresis in one dimension, but for proteins, you do it in two dimensions. Electrophoresis was first observed in 1807 already, and what you do is you make a gel. This gel has kind of holes in there, so things can travel through the holes. And because DNA and RNA and proteins are charged, so DNA and RNA have a negative charge and proteins have a slightly negative or a slightly positive charge, you can pull them using an electrical field. And because you pull them through the gel with this electrical field, they get slowed down by things like friction, because they bump into the little matrix that is in the gel. So the bigger a protein is, the more it gets slowed down. So if you pull proteins through a gel, then the big ones will remain near the top, and the small ones will run more easily, so they will go down to the bottom much more.
So this holds for DNA and RNA, and of course this also holds for proteins, but for proteins you generally want to separate them in two dimensions. So you also want to separate them based on the isoelectric point, because I told you that proteins have amino acids which have positively or negatively charged side chains. And the pH at which a particular molecule carries no net electrical charge, in the statistical mean, is the isoelectric point. So because every protein has a different number of positively or negatively charged side chains, each protein, when you put it in water, will change the pH of the water to be higher or lower than seven, or it will remain seven if the thing is uncharged. So if you put proteins in a solution, then based on the side chains and on the pH, they will have a different net charge, and what someone then figured out is that you can actually separate proteins in this way. So what you see here is a gel, and in this gel there is a pH gradient from being very acidic to being very basic, right? And because of this gradient in the X direction, proteins, when you put them on the gel, will start moving from left to right and start finding the point at which they are not charged, because that is the point where they stop being pulled by the field. So if I have a protein which normally is basic, it will start moving along the gradient until it reaches the pH at which the whole thing is neutral, and there it stops, and that's what you see in this direction. Besides that, you're just using the standard gel electrophoresis where you pull them down based on the size.
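To make the isoelectric point a bit more concrete, here is a minimal Python sketch of how you could estimate it from a sequence. It adds up the charge of every charged group at a given pH and then searches for the pH where the total is zero. The pKa values and the function names `net_charge` and `isoelectric_point` are assumptions for illustration, not something from the slides: published pKa tables differ a little, and a real protein's fold also shifts these values, so treat the result as a rough estimate.

```python
# Approximate pKa values for the termini and charged side chains.
# These are an assumption: published tables differ by a few tenths of a pH unit.
PKA_POSITIVE = {"N_term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C_term": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence, ph):
    """Estimated net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    positive_groups = [PKA_POSITIVE["N_term"]]
    negative_groups = [PKA_NEGATIVE["C_term"]]
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            positive_groups.append(PKA_POSITIVE[residue])
        elif residue in PKA_NEGATIVE:
            negative_groups.append(PKA_NEGATIVE[residue])
    # Fraction of each group that is protonated (positive groups)
    # or deprotonated (negative groups) at this pH.
    positive = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in positive_groups)
    negative = sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in negative_groups)
    return positive - negative


def isoelectric_point(sequence, iterations=60):
    """Find the pH where the net charge is zero by bisection between pH 0 and 14."""
    low, high = 0.0, 14.0
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive, so the neutral point is at a higher pH
        else:
            high = mid  # negative (or zero), so the neutral point is at a lower pH
    return (low + high) / 2.0


if __name__ == "__main__":
    for peptide in ("KKKKRR", "DDDEEE"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

The bisection works because the net charge only goes down as the pH goes up, so there is exactly one crossing point. A peptide full of lysine and arginine ends up with a high isoelectric point, and one full of aspartate and glutamate with a low one, which is what decides where they stop on the pH gradient.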
So here on the bottom, we will have the small proteins, here on the top we will have the large proteins, and then in this direction they are separated by their isoelectric point. So that means that proteins which normally have a positive charge will go to the one side, proteins which have a negative charge will go to the other side, and the more charge they have, the more they will move. So you see very clearly here that there are spots, and you can see that on this gel there are many, many different spots. And you could use a pipette or a needle to extract proteins very carefully from this gel. Every spot here is more or less a single protein, so it might still be a little bit of a mixture, but you can very clearly separate something like 200 to 500 proteins from each other, which is much better than what we could do before using centrifugation. Using centrifugation, it gets really hard to separate 200 different proteins from each other, right? You always end up with a mixture because everything just drops down and you just take the whole thing. So this is a major advantage in protein studies, because now we are able to pick individual proteins and we can also quantify them, because the size of the dot reflects the amount of protein that was there. So if you would do this on, for example, the extract of a normal cell versus the extract of a cancer cell, then you might be able to see that in a normal cell this is a very little dot, but in a cancer cell this is a huge dot, right? That means that the cancer cell is producing much more of this protein than the normal cell is producing, and that might give you a clue of how you would be able to treat this. Then there's another nice way of separating proteins, which is based on chromatography. So chromatography is something that you might have done as a child.
So you take just a cup of water, then you take a little piece of paper and a pen like this one, which is red, but this red color is, of course, not purely red. There are many different chemicals mixed together in this. So you put a little dot on the paper, and you need a little piece of paper like this, which you then put into the cup of water, and then what happens is that, because of suction, right, because paper likes to suck up water, the water runs from the bottom to the top. So the mixture is dissolved in a fluid, which is the mobile phase, and then it flows through another material, which is the stationary phase. So the paper is the stationary phase, and in this case the mobile phase would be water. You could, of course, also use things like whiteboard cleaner or acetone or whatever, right? And then the various constituents of the mixture travel at different speeds, and this causes them to separate. So if you would take a green marker, something like this, and you would do this using a little cup of water and a little piece of paper, then you would see that every constituent, so every element of the green marker, travels at a different speed. So you would get a picture like this, and of course this holds for proteins as well. You can just put your protein mixture on a little piece of paper, put the little piece of paper in, and then the water, together with the proteins, will run up through the paper, and this will separate out the different proteins as well. So it's also a way that you could separate proteins from each other. All right, so that was a quick overview. If you want to purify your proteins, you can use centrifugation, you can use precipitation using things like salts.
You could use electrophoresis, which is the standard way of doing it nowadays, but you could also use chromatography. And of course, there's not a lot of bioinformatics involved in all of this, but that's because we need to be able to separate proteins before we can start using bioinformatics to analyze them. Of course, in 2D gel electrophoresis nowadays there is a lot of bioinformatics involved, because I have had projects where I got pictures like this, so five or six hundred of them, and then the question was, can we use these pictures and have a little program that actually recognizes the spots and aligns the pictures, right? Because every gel will look a little bit different, because the pH is not exactly the same and the charge is not exactly the same. And so this little program that you can write will pick a couple of control spots, and then for each of the pictures it will use them to make the pictures similar, and then it will do an analysis of how much of a certain protein is there. So for 2D gel electrophoresis there are tools available which allow you to load in like 100 of these gels, and then for each of the gels it will give you a list of proteins. So it will use the pH on the one axis and the size on the other, and it will not tell you the name of the protein, but it will say that the protein with a pI of 8.7 and a size of 300, or a size of 100, is expressed at 1 in the first gel, 1.5 in the second gel, and 1.3 in the other gel. So there's a lot of recognition going on. So you have picture recognition, then making sure that all the pictures are lined up, and that works actually very well, and it's actually very fun to see that a computer can actually help you analyze data coming from this. Same thing here for chromatography.
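To make that gel matching step a bit more concrete, here is a minimal Python sketch of the idea, under some loud assumptions: the spots have already been detected and are given as (x, y, intensity) in pixel coordinates, the gels only differ by a simple shift, and the control spots are listed in the same order for both gels. The function names `estimate_shift`, `align_spots`, and `match_spots` are made up for this illustration, and real gel analysis tools use more flexible warping than a plain shift.

```python
def estimate_shift(reference_controls, gel_controls):
    """Average (dx, dy) that moves the gel's control spots onto the reference ones.

    Both lists hold (x, y) pixel positions of the same control spots, in the same order.
    """
    count = len(reference_controls)
    dx = sum(ref[0] - gel[0] for ref, gel in zip(reference_controls, gel_controls)) / count
    dy = sum(ref[1] - gel[1] for ref, gel in zip(reference_controls, gel_controls)) / count
    return dx, dy


def align_spots(spots, shift):
    """Apply the shift to every (x, y, intensity) spot of a gel."""
    dx, dy = shift
    return [(x + dx, y + dy, intensity) for x, y, intensity in spots]


def match_spots(reference_spots, aligned_spots, tolerance=5.0):
    """Pair each reference spot with the nearest aligned spot within `tolerance` pixels.

    Returns {reference spot index: intensity relative to the reference gel}.
    """
    ratios = {}
    for index, (ref_x, ref_y, ref_intensity) in enumerate(reference_spots):
        best_intensity = None
        best_distance = tolerance
        for x, y, intensity in aligned_spots:
            distance = ((x - ref_x) ** 2 + (y - ref_y) ** 2) ** 0.5
            if distance <= best_distance:
                best_intensity, best_distance = intensity, distance
        if best_intensity is not None:
            ratios[index] = best_intensity / ref_intensity
    return ratios


if __name__ == "__main__":
    # Hypothetical numbers: the second gel is shifted by 2 pixels in x and -3 in y.
    reference_controls = [(10.0, 10.0), (90.0, 80.0)]
    gel_controls = [(12.0, 7.0), (92.0, 77.0)]
    reference_spots = [(30.0, 40.0, 100.0), (60.0, 20.0, 50.0)]
    gel_spots = [(32.0, 37.0, 150.0), (62.0, 17.0, 65.0)]

    shift = estimate_shift(reference_controls, gel_controls)
    print(match_spots(reference_spots, align_spots(gel_spots, shift)))
```

The output of something like this is the kind of list described above: not protein names, but per spot how strongly it is expressed in each gel relative to the first one. Spots that have no partner within the tolerance are simply left out, which is a design choice you would want to revisit for real data.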
If you have a professional chromatograph in your lab, then also there, there will be a lot of bioinformatics involved, because the software will do the band detection and band picking, and will also see if the content of the bands, so not the sizes, but the content, is similar or different between the different gas chromatography or water chromatography runs that you do. All right, so I've been talking again for 48 minutes. Let me see. I still have around 20 slides left. So that means we are really, really nicely on time. So I will take a short break again, so you guys can enjoy the second break. I think this one is cats. Do we already have an animated GIF series on cats or not? I think we already had some animated GIFs on cats, right? I don't know, but at least I got it. Yay cats, yeah, always cats, like cats on the internet. Not enough cats on the internet. There's nowhere near enough cats on the internet. All right, so I will take a little toilet break, do a little bit of a cigarette break, and drink some water again. And I will be back in about 10 minutes, something like that. I might be a little bit earlier. And of course, since I'm showing cats, there will probably be these bots that get attracted to the cats and want me to buy more followers, which we're not going to do. We're just going to ban the bots and make sure that they don't spam the chat. All right, there's actually someone for dogs. For next week's lecture, I will make a special dog break for you. So I will find animated GIFs of dogs wearing silly outfits and stuff and make one of them for you. All right, so I will, yeah, no problem, no problem. I will stop the recording.