Also, if you're watching this on YouTube: welcome, moderator, and thanks for being here. So, lecture number six already. That means we're almost halfway through the course. I looked, and I think we'll have 14 lectures; there's one more because of the way the semester started, which is OK. If you have any suggestions for a topic you'd like a lecture on, and I know I'm repeating myself, let me know. Even if you're watching on YouTube, just drop it in the comments section. I read the comments. So if you have a very specific interest, like "I want to know more about quantum algorithms and how they relate to bioinformatics" or "I want to know more about DNA sequencing"... "for your interesting lecture", oh, thank you, thank you. Well, we haven't started yet, so it might be very uninteresting today. But thank you for joining us.
All right, so for today: metabolomics and pathways. This is going to be a short lecture; I think I have 50 slides or something like that. But of course we have to do the assignments first. So let me show you the outline for today. We will start with the answers to the previous assignments. Then we will talk about metabolism and why we care about metabolites, and about the difference between primary and secondary metabolites. After that there will be a whole bunch of slides about the identification of metabolites, because I just like mass spectrometry; since I'm interested in it, I thought I'd make you guys interested in it too. And then, of course, we get to the real bioinformatics part, which means databases and different tools for visualization. I wanted to show you how you can use Cytoscape to visualize metabolic networks, but I didn't set it up. I was really, really busy this morning. We have the new 3G rule in the office, and I have been designated as the 3G police.
So it might actually happen that during the lecture people come walking in to show their vaccination certificate and these kinds of things. That's something we'll just have to deal with. I also didn't set up the music, but we'll see how it goes. I'm excited.
So let's start with the answers to the previous assignments, from lecture five on proteins. The first task was going to Ensembl and downloading human hemoglobin alpha 1. So let me go to Firefox, and of course I need to see Firefox myself. We go to Ensembl, we're interested in human, and we want to find hemoglobin alpha 1. So I'm just going to throw it into the search box. It's going to take a little while. Let's delete this part and first just search for hemoglobin, which here is spelled with only an E. I think this is gamma A... subunit gamma 1... hemoglobin alpha 1... and phenotypes. I'm not interested in human phenotypes or mouse phenotypes, actually. What's going on here? All right, hemoglobin. Why are we looking at phenotypes? I want genes; that's why I'm here. This is hemoglobin... hemoglobin subunit gamma 1. That's not the one I wanted. We want alpha 1, which is 142 amino acids, and this one is 147, so it's a completely different gene. So let's go back to the search results and see what we get when we search for hemoglobin alpha. Make sure we restrict to genes, and to human genes only. Hemoglobin alpha binding, that's not the one I'm looking for. Pseudogene, also not. Hemoglobin alpha binding, binding, binding, binding. Where is the gene I'm interested in? Is it this one? No, this is again the hemoglobin subunit 1. Let me look at this assignment a little more closely. First you need to find the gene; that's always the issue, right? So, hemoglobin alpha 1. Did that get a new name or something? Let me just search on Google. Hemoglobin alpha 1. Hemoglobin subunit alpha 1.
Also known as HBA1. Good, so that's going to be a lot easier to find. And there it is. No idea why the search is so finicky about the name, but this is it. So this is the gene we're looking at, and indeed we want the protein which is 142 amino acids long. The first question was: download the protein sequence for human hemoglobin alpha 1. We can just click here on the protein, and it shows us an overview of the protein. But of course we're interested in the sequence. Because we've already selected the protein we're interested in, we can go to "Protein" here on the side, and there we have a nice "Download sequence" button, which will give us the sequence. We again have to specify that we want FASTA format, because FASTA is the format we use when we exchange sequences. I just want the FASTA; I don't want the marked-up version. So we say download, and I get a popup which downloads it. I'm just going to save it and open it in Notepad++. So that looks like this. Here we see the peptide, and it says that it's HBA1-201, which is the first transcript.
Let me explain that a little better. When we go back to the summary of the whole gene, it shows you where the gene is located: on chromosome 16, from this base pair to this base pair. We can see that the protein is encoded in three different exons. That means it has two introns, two regions of the gene which do not code for protein and which are spliced out. And when we look at this picture, we see that the gene is located on the forward strand, so it's a positive-strand gene.
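For those who would rather read the downloaded FASTA file with a script than in Notepad++, here is a minimal sketch in Python. The function name `parse_fasta` and the example record are assumptions made up for illustration; the header and residues below are placeholders, not the real Ensembl download.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A '>' line starts a new record; drop the '>' itself.
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are wrapped, so join them back together.
            records[header] += line
    return records


# Toy record (hypothetical header and residues, for illustration only).
example = """>TOY-201 peptide (placeholder header)
MKTAYIAKQR
QISFVKSHFS
"""

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

For a real file, one would read its contents with `open(path).read()` and pass the text to `parse_fasta`; the length printed for the hemoglobin alpha 1 peptide should then match the 142 amino acids shown on Ensembl.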
But we can also see that four different transcripts are made from it. The first transcript gives 142 amino acids, the second one 110 amino acids, and both are protein coding, which means that a protein can be made from them. You can also see that this one has a yellow color and this one a red color. The yellow color means that there's more evidence behind it; you can see here, for example, that it has more experimental information. This one is found only once, while this one has two different sequences in UniProt, meaning that people have detected the protein and that there is more or less a crystal structure for it. If we want to know more about a transcript, we can click on it and it zooms in, so now we see the statistics for the first one. We can also click on the second one.
What I wanted to show you here is the flags, because the flags tell you how certain it is that this is really true. A lot of things are just based on predictions: annotation of genomes happens partly through experiments and partly through predictions. For example, someone finds a protein in mouse, and the same kind of sequence occurs in human. Then you might get a TSL 2. A TSL 2 transcript is one where the best supporting mRNA is flagged as suspect, or where the support comes from multiple ESTs. So the TSL, the transcript support level, is the level of certainty people have that what we observe is real, and that holds for the protein as well as for the transcripts, the messenger RNAs. The nice thing is that just by clicking around on Ensembl you find a lot of information, and it also has a bunch of link-outs to other websites.
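The transcript table for this gene can be thought of as a small data structure. A minimal sketch, assuming a hypothetical `Transcript` class and placeholder transcript names; only the counts and the two protein lengths come from the lecture:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transcript:
    name: str                      # placeholder label, not an Ensembl ID
    protein_length: Optional[int]  # None if the transcript is not protein coding
    protein_observed: bool         # True if the protein itself has been detected


# Four transcripts as described in the lecture: two protein coding
# (142 and 110 amino acids), of which only the first has protein evidence.
transcripts: List[Transcript] = [
    Transcript("transcript 1", 142, True),
    Transcript("transcript 2", 110, False),
    Transcript("transcript 3", None, False),
    Transcript("transcript 4", None, False),
]

coding = [t for t in transcripts if t.protein_length is not None]
confirmed = [t for t in coding if t.protein_observed]

print(len(transcripts), "transcripts,", len(coding), "protein coding,",
      len(confirmed), "with the protein observed")
```

Filtering like this is roughly what the colors and flags on the Ensembl page let you do by eye.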
So for 90% of bioinformatics work, Ensembl is more or less the initial entry point: we have a gene we're interested in, and we search for the gene on Ensembl. Ensembl gives us information about the gene, how many transcripts it can produce, and how many possible proteins there are, but it also shows us how much confidence we have. In this case, the gene can produce four different transcripts, four different messenger RNAs. Only two of these messenger RNAs are suspected to code for a protein, and only one of these proteins has actually been found. That's what the yellow color tells you. The red color tells you that yes, we have detected a messenger RNA which codes for this 110 amino acid protein, but we've never found the protein itself. So that's a little more background on how to read these things. All right, we have the sequence, so that was question one.
The next question was to use the InterPro tool to analyze the protein sequence. So let me show you Firefox. This is InterPro; we looked at it last week. Again, we take the protein sequence that we have, let me switch to the protein sequence, and we just paste it into the sequence window. Of course, we could also search by text, because this is a known protein, so it doesn't really make sense to have InterPro search for it. You can see that it finds out relatively quickly which one it is, so it's not really searching; it already knew which protein we're looking at. The assignment said: the results might not be too surprising, but if you have time, click through the links. And the question is: how many globin domains does human hemoglobin alpha 1 have? It's still searching, which is a bit annoying. And then the next question is... okay.
So we can just do the next question. Question number three: find the protein sequence for myostatin in mouse; it has two known variants. Again we go to Ensembl, and instead of human we switch to mouse. Myostatin is the gene which regulates muscle growth; it's the brake on muscle growth. If it is broken, your muscles grow uncontrollably. This gives you the phenotype called double muscling, which is, for example, one of the phenotypes in Texel sheep. These sheep, which come from a certain part of Holland, are very muscular. The same thing occurs in cows and dogs. It just means that you have a mutation in the protein; the protein cannot function anymore, and then the brake is gone.
All right, so let's look at myostatin. Indeed, like the assignment said, there are two known variants. Download both protein sequences; how long are they? We can see that the first protein is 376 amino acids and the second is 189 amino acids. Again, there's less evidence for the second one than for the first. But let's download them to make sure we have them. How do we get to the protein summary? By clicking on it. Then we go to the protein and get the sequence. We say download the sequence, and I tell it that I want the amino acids. So we download the first one, save it, and when we open it, we can check that this is the first one: it says MSTN-201, which is the first peptide, and here on the page it's also called 201. Then we go to 202: click on it, download sequence, and download again. We save it and open it, and the sequence we downloaded now says 202.
So we have 201 and we have 202. And you can see that these sequences are very similar: they both start with MMQKL, and the second one looks like a truncated version of the first. This gene is divided into multiple exons. The first protein includes all of the exons, and the second protein is made from only part of them, so probably one of the later exons is missing. The shared part ends around here... no, here we actually see TRI, so there's also a little bit of a difference at the end. So there is probably some splicing going on: in the full protein we jump from one exon to the next, while in the truncated protein we don't jump, and it just includes some extra base pairs at the end. All right, so we've downloaded both; that was question number three. Now we have to analyze both of them in InterPro.
I'm hoping InterPro will be a little quicker now. Yes, perfect, at least the first search finished. So here we have the InterPro results for the first sequence we searched, the human hemoglobin alpha 1 subunit. The question was how many globin domains human hemoglobin alpha 1 has. So we look at the domains, and we see that there is this globin domain, and it spans the whole protein: the protein is 142 amino acids long, and the globin domain runs from the third amino acid all the way to amino acid 142. So the answer to this question is: it has one globin domain. If we are, for example, interested in how many oxygen molecules can be bound by human hemoglobin, then we have to look at the family prediction. In the family prediction we see that it predicts an alpha heme here. There's an alpha heme there.
And then we see five in total. So that means that hemoglobin alpha 1 has five alpha heme domains. These heme domains are domains where there's an iron atom, and this iron is able to bind oxygen. So from this we would hypothesize that this protein is able to bind five oxygen molecules when transporting. Of course, this is just the prediction that we get; some of these predictions might not be as clear. We see here that there's a pi heme domain as well. If we want to learn more about pi heme domains, we can just click on it, and that's the nice thing: it provides a lot of information. You can click on all of these, and it links out and gives you a description of what a heme binding site is, or what the prediction for this one is. For alpha heme, for example, it gives you some references, the proteins in which it occurs, and some external links if we want to learn more. So, a lot of information at your fingertips that you can easily browse.
All right, so let's do question number four, where we want to analyze both myostatin sequences. We go to the InterPro homepage, because we might be interested in what the difference is between the first myostatin and the second one. Is there a certain domain that is missing, and based on the domain that is missing in the second protein, can we say what its function is? Let's just throw in both of them. Why am I only allowed to do one? All right, we just do one. So let's search for the first one. The nice thing is that it remembers your search results, so you can come back at a later point and continue analyzing the protein you want. It's searching, so we'll just wait for the first one. I will also start the second one, so that when we come back to the myostatin gene we have both of them ready.
And then in the meantime we will just continue with the next one. So again, I enter the second sequence and say search. Good, so question number five. First we want to get the protein sequence of IDH1, transcript one, from the Ensembl database. So let me go to Ensembl and search for IDH1. Of course we're interested in the gene. Do we want the human gene in this case? Probably we do, because the first protein should be 414 amino acids, and indeed the first protein is 414 amino acids. We see here that there are three proteins which have been detected and three proteins which have been hypothesized, and of course they differ in length. If we scroll down a little to the gene, we can see how many exons this gene has, and depending on which exons are used, they can combine into multiple proteins. It is taking a really long time. It might be because I'm streaming that there isn't enough bandwidth to download everything quickly.
Anyway, we want to look at this first protein, because the question was: in this protein, what is the 109th amino acid? Here we see the protein, with all of the sequences and the variants, but we just want to go to the protein sequence and see what the 109th amino acid is. I'm just going to download it, because I like downloading the sequences and analyzing them in Notepad++. I think it downloaded, yes, good. Then we go to Notepad++, and now we want to know what the 109th is. The nice thing about Notepad++, and I don't know if you guys can see that, so let me actually show it, is this additional information at the bottom. It says line 1, column 1, selected 0 | 0. So if we want to know what the 109th is, what we could do is just select them all, right?
And I can see, here at the bottom, that there are 60 amino acids on a single line. So that means that here I'm at 60, here I'm at 120, and we wanted to know the 109th. This is the 120th, so we have to go back: one back is 119, and then ten more. The nice thing is that the bar at the bottom shows how many characters we've selected, so we can count back with it. So this is it, number 109. And if we want to make sure, we can check it. Let me highlight this R: I can just say "Style token", so it highlights all the R's, and now I can move away. To really make sure that the R I selected is the 109th, I select from there all the way to the front, and it says at the bottom that I have selected 109. So Notepad++ is really useful when you want to look at sequences, because it does this kind of counting for you. You don't have to use R. You could use R: read in the file, make sure you put the sequence on a single line, and then select the 109th element. But figuring out which one it is is relatively easy in Notepad++, because it gives you all the statistics at the bottom.
Good, so we now know that the amino acid is R. That sounds a little weird, but it's the amino acid with the letter R. If I want to know what that amino acid is called, because I don't know all the amino acid codes by heart, I just go to the IUPAC website, because IUPAC does the naming of the amino acids. So let me show you this. This is the international chemistry organization, and I want the amino acid codes. So let's search for "AA coding"; that's a little easier. There are 16 more rows, so let's just show all of them.
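The counting done by hand in Notepad++ can also be sketched in a few lines of Python. This is only an illustration: the helper names are made up, and `toy_sequence` is a synthetic string built so that position 109 is an R, not the real IDH1 sequence. The 60 residues per line come from the file shown in the lecture.

```python
from typing import Tuple

LINE_WIDTH = 60  # residues per line in the downloaded FASTA file


def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based position (biologists count from 1)."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside 1..{len(sequence)}")
    return sequence[position - 1]  # Python indexes from 0


def line_and_column(position: int, width: int = LINE_WIDTH) -> Tuple[int, int]:
    """1-based (line, column) of a residue in the wrapped sequence lines.

    The FASTA header line is not counted.
    """
    line_index, column_index = divmod(position - 1, width)
    return line_index + 1, column_index + 1


# One-letter codes and names of the 20 standard amino acids.
AMINO_ACIDS = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartic acid",
    "C": "cysteine", "Q": "glutamine", "E": "glutamic acid", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}

# Synthetic placeholder sequence: 108 alanines, one arginine, 11 alanines.
toy_sequence = "A" * 108 + "R" + "A" * 11

letter = residue_at(toy_sequence, 109)
print(letter, AMINO_ACIDS[letter], line_and_column(109))
```

Position 109 falls on the second sequence line at column 49, which is 11 characters before the end of that line at position 120: one step back to 119, then ten more, just as counted in the lecture.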
So we are interested in R, which means it is arginine. So the answer to the question is: the 109th amino acid of IDH1 is arginine. Good. Next: look up IDH1 in the RCSB database. What could happen to the function of IDH1 if we change the amino acid at position 109? Okay, so we go to the RCSB database. Can I do it like this? No, I have to go like this. I wanted to click, but it doesn't really let me. All right, we want to know about IDH1, and it has the structure and everything, so we can click on the search result here. These are the crystal structures. We should check: is this really IDH1? Yes, isocitrate dehydrogenase, cytoplasmic. So we click on the protein structure, and here we see the structure of the protein. Can I zoom in a little? Yes. Here you see that there's a mismatch, but we want position 109. You used to be able to zoom in here. Position 102... position 109 is over here. This is one of the parts that is hydrophilic, so it likes water, according to its hydropathy value. So it's probably on the outside of the protein. And if we change amino acid 109... is this really the database I wanted to look at? Let me see... oh wait, it still zooms. So I'm interested in amino acid 109, the arginine at 109. Here I see 102, 116, that's weird... here, 109. So this is the arginine. And right now it doesn't overlap any domains. Let me look at my own answers, because I thought it actually did. Okay, let me show you my answers. For IDH1, the answer is: nothing would change; in a previous version of the database, amino acid 109 was part of the binding site. All right, so that makes it a bit of a silly question. It used to be that the binding site was annotated in this sequence.
It's a little strange that this changed. I think we had the same issue last year, when I also tried to show this. Okay, so now we see that here we have the annotation saying that this is the binding site. With that annotation, changing amino acid 109 would probably mean that the binding site can't work anymore. But for some reason the new version of the database doesn't put anything here. So the answer is that it probably won't change anything if we change the 109th position. Just as an example, oh, what's happening here? As an example, we can also look at where the binding site is currently located. I'm a little confused about why these things keep scrolling. These are partially modeled residues; that's not what we want. It used to say "binding site" in one of the models. This one is an engineered mutation, a mutation that people have introduced on purpose: instead of the H amino acid that used to be there, people put an arginine there as well. But, huh, interesting, it doesn't show you the binding sites anymore. Anyway, the results differ between versions, and the same goes for InterPro: we could also use InterPro to check whether something is really going on at position 109. But of course knowledge about proteins progresses too, so it might be that in the next version of the database we get a new annotation saying that 109 is very important for binding ATP.
All right, so let's switch back to the previous ones. When we looked at mouse myostatin, it says that it's growth differentiation factor 8. We wanted to look at the difference between 201 and 202, so let me put them in the right order. Now the first tab is 201 and the second tab is 202. What we see is that the first one is 376 amino acids long, and the second one is only 189.
And like we already saw when we looked at it in Notepad++, the last part of the protein, from about amino acid 186 onwards, is missing in the second protein. The first one is much longer; it has almost 190 more amino acids. So the question is: what is located in that missing last part? When we look, we see a TGFB_C domain, which has been predicted at this position in the long protein. Since the second protein is a lot shorter, it only has the prediction for the TGF-beta propeptide, the same as what is here; they have the exact same sequence. In the long one it runs from amino acid 43 to 237, and the same thing is here, only a little shorter, 43 to 180, so the last part of that domain is not there in the short protein. But the major difference between the first transcript and the second is this additional TGFB_C domain: the long version of the protein has it, and the short version only has the propeptide.
So of course we now want to know what this thing does, and we use the link-outs to figure that out. I can go and look at this IPR entry. It says that transforming growth factor is a multifunctional peptide that controls proliferation, differentiation, and other functions in many cell types. This part is missing in the short protein but present in the long one. It regulates cytokines, it exerts a tumor suppressor effect, and it also modulates cell invasion and immune regulation. So what we learn is that the full myostatin protein has a domain that seems to be involved in signaling. The full protein has this additional signaling part, while the short protein is just missing it.
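The comparison of the two isoforms can be sketched in Python as well. This is a rough illustration under stated assumptions: the two sequences are short toy strings (only the MMQKL start is taken from the lecture), the function name is made up, and the domain names are written the way they were read out above.

```python
def common_prefix_length(a: str, b: str) -> int:
    """Number of leading residues that two sequences share."""
    count = 0
    for residue_a, residue_b in zip(a, b):
        if residue_a != residue_b:
            break
        count += 1
    return count


# Toy sequences, NOT the real myostatin isoforms: the short one follows the
# long one for a while and then ends with a few different residues.
long_isoform = "MMQKLAAAAGGGGTTTTCCCC"
short_isoform = "MMQKLAAAAGGGGTRI"

shared = common_prefix_length(long_isoform, short_isoform)
print("shared prefix:", shared, "residues")
print("short isoform's own tail:", short_isoform[shared:])

# Domain predictions per isoform, by name as read out in the lecture.
domains_long = {"TGF-beta propeptide", "TGFB_C"}
domains_short = {"TGF-beta propeptide"}

missing_in_short = domains_long - domains_short
print("missing in the short isoform:", missing_in_short)
```

With the real FASTA files one would load both sequences first and then call `common_prefix_length` on them; the set difference is the "which domain is missing" question asked of the InterPro results.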
So probably it can do the regulation, but it can't signal further down the chain that it does this, and that's what we learn. Of course, there are all kinds of other references if we want to look into it more, because it's science, so there's no single answer. Well, there generally is one answer, but the idea is that this only gives you a hint, and you have to follow up and see what exactly is missing. But it is some growth factor activity, so probably the long protein is able to block the proliferation of the muscle, while the shorter protein is not. That means that having only the second version of the protein probably causes myostatin to not limit your muscle growth, and in that case the muscles would just continue growing, even though you have part of the myostatin.
Good, so that was a little bit of working with proteins. I'm not too familiar with protein databases. My training is as a geneticist, so I'm much more familiar with looking at DNA sequences and annotating them than with looking at proteins and figuring out what they do. But of course proteins are the effector molecules of the cell, so they are what really matters for the effect that a certain gene has. A gene codes for messenger RNA, messenger RNA codes for proteins, and proteins are the workhorses; they are the things that do things in the cell.
Good, so I will put all of my answers online on Moodle, and probably also on my website. Again, when I write a file like this, I always answer with a little header, and then here we see the sequence that we had to download. I also note the link, so how I got there. And then: it has one globin domain and five alpha heme domains. So that's just the way I answered them.
And you see the myostatins, and the difference is the lack of the growth factor C-terminal domain in the smaller protein. For the IDH1 question it says: nothing would change, but in a previous version there would have been a change. That's just the way it is; knowledge progresses. We used to think that the binding site of IDH1 was at position 109. Nowadays we don't think that anymore, or new research came out that says it's not there. For me, it's important that you guys know that if you want to know something about proteins, you can go to the protein databases. You can go to the RCSB database, which has the structures and predictions, and there are several other databases about proteins. But because I'm a geneticist, I always come from the gene, which is what I did here: go to Ensembl, get the sequence, then go to one of these protein prediction databases, predict which domains are there, and then read the literature and see if I can figure out what is going on, and which domains are missing or important.
Good, so let me go back to the PowerPoint. The overview for today, which we already saw, but just as a reminder: metabolism, and why metabolites? We want to look at the metabolome, and at primary and secondary metabolites. Then identification using mass spectrometry, with the focus on mass spec. There are more ways to identify metabolites, but mass spectrometry is the most common one. Then several databases, which we need to identify the results from the mass spectrometer. And of course some visualization tools, because when we talk about metabolism and metabolites we also want to visualize a certain substance, a metabolite, being transformed by an enzyme into another metabolite. So we draw networks of how genes regulate the levels of metabolites. All right, so first some definitions.
So what is metabolism? Metabolism is the process through which living systems acquire and use free energy to carry out their functions. There are two very distinct terms in here. The first is living systems. A virus, by definition, does not have a metabolism, because viruses are not considered alive. A bacterium is considered alive, so a bacterium can have a metabolism. The second term I highlighted is free energy. Free energy is not so much energy that is free and open source; it is the energy that you get by transforming one state into another state. Chemical reactions run because of free energy: there's a substance in a high-energy state, and because of that it is not stable, so it more or less automatically converts into its subcomponents, which are in a lower-energy state. If you want to know the exact definition of free energy, you can just look that up.
Metabolism requires highly coordinated cellular activities, and it has four main functions in more or less any cell. And this is something I want to stress: when people think about cells, they think of little globules of fat with water, proteins, and other things inside, but they are not that. They are not just randomly put together. Cells are highly, highly organized machines. You should compare a cell to a city with a nuclear reactor, instead of thinking of it as something dumb that is just floating around and not really doing anything. Cells are highly coordinated, and metabolism is the reason: the cell needs energy to work and to reproduce itself. That is why a cell is such a highly organized system. So there are four main functions.
The first one is of course to obtain energy for the cell, because every cell needs energy to survive. The second function of metabolism is to convert the nutrients that the cell takes up into bigger molecules like DNA, RNA, and proteins. The third is to assemble these bigger molecules into cellular structures, which also costs energy. If you think about a single muscle cell, then of course the cell first needs energy to make the different components it needs to do its function, to twitch. But the assembly of these components into a highly complex system also needs to be controlled, and because of this control it also requires energy. Furthermore, we need to degrade these macromolecules as well, because proteins that are produced are not needed indefinitely. At a certain point a protein is not needed anymore and has to be broken down, and breaking down these big proteins, or DNA or RNA, when they are no longer needed also requires energy.
So these are more or less the four processes that a cell needs to coordinate: obtaining energy; converting nutrients into big molecules like proteins, RNA, DNA, or lipids for that matter; assembling these into cellular structures, for example building a nucleus, which is composed of several hundreds of thousands of different macromolecules; and, when we don't need some of these macromolecules anymore, expending energy on breaking them down, so degradation.
Metabolism is divided into two parts: the catabolic pathways and the anabolic pathways. Catabolism is defined as the degradation pathways that salvage components and energy from biomolecules such as nucleotides, proteins, lipids, and polysaccharides.
So catabolism is the part that generates energy, and anabolism is the part that requires energy, so the biosynthesis of biomolecules. When we talk about energy production within a cell, we are talking about the catabolic processes; when we talk about the construction of DNA, RNA and proteins, we are talking about anabolic processes. This means that all living things require a source of energy, carbon, oxygen and nitrogen, because carbon, oxygen and nitrogen are needed to build things like DNA and proteins. Of course, there are also a lot of other things that are required, such as phosphates and trace elements. But in basic terms, if you are a living organism, then you require energy, and you require carbon, oxygen and nitrogen. These are more or less the four main components that every cell needs to deal with every day. All right, so how does this look? This is one of these nice schemas. Oh, I thought I forgot the citation, but the citation is here: Saunders College Publishing. I didn't color it white. I put the citations here so that you know where I get my pictures from and to credit the people who made the picture. So metabolism is broken down into catabolism and anabolism. On the catabolic side we have energy-yielding nutrients like carbohydrates, fats and proteins, and these are broken down into energy-poor end products. This is where the free energy comes from: you have complex molecules which are broken down into very basic molecules. And in the process, we generally consider ATP and NADPH to be the carriers of the chemical energy within the cell.
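As a side note for anyone who does want the definition written down: the relations below are the standard textbook ones and are not taken from the slides. They assume constant temperature and pressure, and the numerical values are the usual biochemical standard-state approximations, so read them as an illustration of why breaking down molecules and spending ATP can drive other reactions, not as exact values inside a cell.

```latex
% Gibbs free energy change: a reaction can run spontaneously when \Delta G < 0
\[
  \Delta G = \Delta H - T\,\Delta S
\]

% ATP hydrolysis releases free energy (approximate standard-state value)
\[
  \mathrm{ATP} + \mathrm{H_2O} \rightarrow \mathrm{ADP} + \mathrm{P_i},
  \qquad \Delta G^{\circ\prime} \approx -30.5\ \mathrm{kJ/mol}
\]

% Coupling: an unfavorable step becomes favorable when paired with ATP hydrolysis
\[
  \underbrace{+13.8}_{\text{glucose} + \mathrm{P_i} \rightarrow \text{glucose-6-phosphate}}
  \; + \;
  \underbrace{(-30.5)}_{\text{ATP hydrolysis}}
  \; = \; -16.7\ \mathrm{kJ/mol}
\]
```

The last line is the sense in which ATP acts as a currency: a reaction that would not run by itself is paid for by coupling it to ATP hydrolysis, so that the sum of the two is negative.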
Right, so we take these big molecules, they are broken down into little pieces, and what a cell generally produces from that is ATP. ATP is generally considered the cellular money, the cellular currency. If a cell wants to expend energy, it uses an ATP molecule, and this ATP molecule is used up in, for example, a chemical transformation of substance A into substance B. ATP is one of these tiny molecules which is part of the metabolism: it's created, it's expended, and then it's regenerated. So we have the catabolic processes, which produce ATP, and besides that we have the anabolic processes, where we take precursor molecules like amino acids, sugars, fatty acids and nitrogenous bases, so the bases for DNA, and ATP and NADPH are expended to build big molecules. So catabolism is the breaking down of energy-yielding nutrients into energy-poor end products, while anabolism goes from precursor molecules like amino acids and sugars to building things like proteins, lipids and nucleic acids. Metabolism in a way divides the world, because there are two types of organisms when it comes to metabolism. There are organisms which we call autotrophs, and that means that they are self-feeding: these organisms can produce all of their cellular components from very basic, simple molecules. For example, we have chemolithotrophs, which are organisms that generally live near these hot springs in the ocean. These organisms never see sunlight, but they live near one of these chemical vents, and these chemical vents produce all of these basic little components.
So these vents produce things like very short-chain sugars, a small amount of fatty acids and amino acids, and by using the chemical oxidation of inorganic compounds these organisms extract energy from the vents, and this energy is then expended to build all of the molecules that they use. The more commonly known type of autotroph is the photoautotroph, and those are plants. Plants take up nutrients from the environment, but when it comes to energy they are self-feeding: they use the energy in sunlight to build up bigger molecules and to replicate their DNA. This is of course not entirely true, because plants also can't live without other plants or other things surrounding them. But in theory we say that plants, because they do photosynthesis, produce their own energy, and they use this energy which they get from the sunlight to build up their own molecules. That is why they are called autotrophs. All animals, on the other hand, are heterotrophs, so organisms which feed on others. We humans eat all kinds of other animals and plants, so we are not autotrophs, because we need to get our bigger molecules from our environment. We hunt and eat other animals or plants to get our energy. We don't do photosynthesis, and we also don't have a chemical process which oxidizes inorganic compounds to produce energy for us. So metabolism is really a big divide. Based on what type of organism you are, you're either an autotroph like a plant, or you're a heterotroph like a human, who eats the plants and gets the energy stored by them. So why do we want to look at metabolites? So heterotrophs need carbon-based foods, yes, yes.
Well, they need autotrophs to feed on, because the energy needs to come from somewhere. If you have, for example, a closed ecosystem in a glass jar which is standing in the sunlight, then the sunlight is adding energy to the system, but this energy can only be used by autotrophs. The autotrophs do photosynthesis and turn the energy in the sunlight into bigger chemical molecules. The heterotrophs are hunting the autotrophs: bugs, for example, eat plants, and they get their energy from the energy that the plants collected. So yes, heterotrophs need a carbon-based food, and not only carbon. Is there a mixed type of living things? Yes and no. Like I already said, a plant, although it is considered an autotroph, is not entirely one, because plants also extract some of the nutrients that they need from the soil. They cannot produce all of their nutrients. They also need things like iron, and iron is not created by the plant, it is just there. But are there mixed types of living things? Probably, because it is biology. In biology the answer that you can give to any question is "probably", because we haven't been to different planets. Yeah, some plants eat meat, right. If you think about meat-eating plants, then the plant itself is a photoautotroph, because it uses sunlight to feed itself. But that's a good example: eating meat as a plant means that you can sustain yourself by just being in the sun, but you sometimes snack on a fly that comes by, just because it's cheap energy and you don't have to do anything for it.
But yeah, the problem with biology is that it's not an exact science in the way physics is. In physics you can say with close to 100% certainty that gravity exists. In biology we only know the things that we have seen so far. If we go to a different planet, we might figure out that there are organisms which have a completely different way of generating their energy. Who knows, there might even be organisms which use nuclear fusion to generate energy. We have never seen those, but we can't really exclude the possibility that they are there. Currently, on planet Earth, chemolithotrophs and photoautotrophs are the main sources of energy, and all of the other organisms feed on those. Things like plankton and plants are the organisms that produce the energy. All right, so why are we studying metabolites? Metabolites are very important because biological fluids reflect the health of an individual, and we are humans, so we're generally only interested in human health and human longevity. We already know that in ancient China they figured out that if you use ants and a urine sample of a patient, you can find out whether this patient has diabetes, because people who have diabetes generally excrete a high amount of glucose in their urine. So if the urine of a patient attracts ants, you can be relatively sure that they are diabetic. Horning et al. in 1971 introduced something which is called the metabolic profile. The idea is that any disease or any change in the health of an individual is reflected in the metabolites, so in the composition of all of the metabolites that you have in your body. The Human Metabolome Project was started in 2001, I think, and the first draft of it was published six years later, in 2007. They started cataloging all of the different metabolites that are found in humans.
In total they identified around 2,500 metabolites which occur internally, so endogenous metabolites. There were 1,200 different drugs that they cataloged which were found in humans, and of course this ranges from aspirin to insulin to cocaine, and there were 3,500 different food components that they identified. Based on all of these, so how many of these 2,500 metabolites do you have, how many of these 1,200 drug components can we find, and how many of these 3,500 food components can we find, you can get an idea of how healthy an individual is by looking at the ratios, so at the profile that they make. If you are high in cocaine and low in some other substance, then you're probably not that healthy, because you're a major cocaine user. But of course, having a high amount of cocaine also influences all of the 2,500 metabolites which you can find in a human blood sample. A couple of key fields for metabolomics are things like toxicity assessment by metabolic profiling. Here we look at the toxicity of substances by looking at differences in the metabolic profile. We have a metabolic profile of which compounds we find in a standard human, or more or less an average human. When we do toxicity assessment, we add a substance and ask: how does the metabolic profile change, and does it change in a way that lets us say this person is sick or on the verge of getting sick? One of the other key fields is nutrigenomics, where we look at food quality and food improvement. Of course, we want to have food which has very specific metabolites in there, so that it becomes better at feeding humans, but we also have food quality assurance, because we don't want certain metabolites to be present in food. We don't want toxins in our food.
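To make the idea of comparing a metabolic profile before and after adding a substance a bit more concrete, here is a minimal Python sketch. It is not from the lecture slides: the function names, the twofold threshold and the concentration values are all made up for illustration, and a real toxicity assessment would use replicated measurements and proper statistics instead of a single fold-change cutoff.

```python
import math


def log2_fold_changes(baseline, treated, pseudocount=1e-9):
    """Return log2(treated / baseline) for every metabolite in either profile.

    A small pseudocount avoids division by zero when a metabolite
    (for example a drug) is absent from one of the two profiles.
    """
    names = sorted(set(baseline) | set(treated))
    return {
        name: math.log2(
            (treated.get(name, 0.0) + pseudocount)
            / (baseline.get(name, 0.0) + pseudocount)
        )
        for name in names
    }


def flag_shifts(baseline, treated, threshold=1.0):
    """List metabolites whose abundance changed at least 2**threshold-fold."""
    changes = log2_fold_changes(baseline, treated)
    return sorted(name for name, lfc in changes.items() if abs(lfc) >= threshold)


# Hypothetical concentrations in arbitrary units; these are not real measurements.
baseline = {"glucose": 5.0, "lactate": 1.0, "citrate": 0.2}
treated = {"glucose": 5.5, "lactate": 4.0, "citrate": 0.05}

print(flag_shifts(baseline, treated))
```

With these made-up numbers, lactate goes up fourfold and citrate goes down fourfold, so both are flagged, while glucose barely moves. A compound that only shows up after treatment, such as the administered substance itself, would also be flagged because of the pseudocount.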
Metabolomics, and especially mass spectrometry, is also one of the tools that people use to do food quality assurance, so to make sure that the food is safe to eat and that there are no toxic chemicals in there. Another one of the fields of metabolomics is functional genomics, where we use it to predict the function of unknown genes. Predicting the function of unknown genes is a big field, and a lot of the knowledge from metabolism and metabolomics comes in there. So we talk about the metabolome, or the metabolic profile, as it used to be called. Nowadays we call it the metabolome, like the proteome and the genome. The metabolome refers to the complete set of small-molecule chemicals found within a biological sample. Metabolites are considered the intermediates and products of metabolism, and the term metabolite is usually restricted to small molecules. Big molecules like a protein are generally not considered metabolites, although they are a product of the metabolism. So when we talk about metabolites, we mean the small molecules which are more or less the building blocks to build up things like proteins or lipids. The end products themselves are generally not considered part of the metabolome. This is changing a little bit, though, because when we start looking at KEGG you can see that the KEGG database has a large number of bigger molecules which it actually classifies as metabolites. When we talk about the metabolome, we divide it into two separate sections. One section is the endogenous metabolites, which are naturally produced by an organism. So based on the food that I eat and on my genetic constitution, I can create certain metabolites. I can, for example, take a sugar, break it down, and make certain metabolites or little sugar components from it.
Exogenous metabolites are chemicals that are not naturally produced by an organism. Something like cocaine is not endogenous to humans, because to have cocaine in your metabolic sample you need to take it in from the outside. Your body doesn't produce large amounts of cocaine. However, if we talk about the metabolism of a coca plant, then cocaine, or the precursor for cocaine, is of course an endogenous metabolite for the coca plant. So what is endogenous and what is exogenous depends on which organism you look at. Something which is endogenous to us might be exogenous for something that eats humans, and the other way around, something that we consider exogenous to us might be endogenous for another species. That makes it a little bit difficult, but just remember: if an organism is able to produce a certain metabolite, then we call it endogenous. When it cannot produce it, for example because it is missing the genes or the proteins to produce that metabolite, then we call it exogenous, because the organism can only have it in its metabolic profile when it takes it in from the outside. All right, so there is actually a lot of confusion between different terms. We have the term metabolomics. Metabolomics is the field which focuses on metabolic profiling, so generating these metabolic profiles and determining which metabolites are produced by a certain organism. It looks at these things at a cellular level, at an organ level or at an organism level, so not only at the whole-organism level. For example, when you look at a liver and think about what a liver produces and which metabolites can be found in the liver, then you are doing metabolomics. Then there is a field which is called metabonomics.
So metabonomics is metabolomics, but now we also look at perturbations of the metabolism caused by environmental factors, for example diet effects or the effects of toxins, drugs or chemicals. So of course it is a subfield of metabolomics. Metabonomics is part of metabolomics, but metabonomics is more or less the experimental field. Metabolomics is more of a descriptive field, where you look at an animal and don't do any experiments. You just take an animal from its natural environment and see which metabolites are there. Metabonomics would be when you take, for example, a mouse, you shoot it up with methamphetamine, and then you see what happens to the metabolic profile of this animal. Then you are not doing metabolomics but metabonomics, because you have this experimental component in there. You're doing a perturbation, so you're kicking the mouse or you're injecting it or you're feeding it something. Then we also have a field which is called exometabolomics. This is about the extracellular metabolites, and it is more or less the field of biofuel. How do we get single-celled organisms to produce things like large hydrocarbons which we can burn in a combustion engine? Again, it's the same as metabolomics, because we're interested in stuff that is being produced by an organism, but which is not inside of the organism; it is excreted into the environment. So there are three different terms, and all of them are related to metabolites. Metabolomics is more or less observing: you look at an animal and you describe the metabolic profile, so these metabolites are found, these metabolites are exogenous. When you start kicking the organism or perturbing it, for example by giving it toxins, changing its diet or injecting it with something, this is called metabonomics. And when we look at exometabolomics, we talk about things like biofuel generation.
So there's a lot of confusion, because you ask people, well, what field are you in? And they say, well, I'm in metabonomics, and then you think, oh, okay, so they're injecting mice or other animals with things. But then they're actually doing metabolomics, so they're just looking at the animal as is. All right, so I've been talking for around an hour. I will stop the recording. So people on YouTube, see you in two days with the next part of the lecture.