So hello everyone, welcome to the lecture. Today we'll start talking about phenotypes. Phenotypes are the observable characteristics of animals and plants. Hey, Oleksandr, welcome, welcome. Happy New Year. Yeah, I just started recording, that's why I'm doing the intro twice. So phenotypes are the observable part. We started with DNA, then we went to RNA, then we went to metabolites, and the next level is of course the observable level, the phenotype, and phenotypes can be a lot of things. But besides phenotypes, I want to couple this phenotypic layer, so these observable features of animals and plants, back to the genome, and that is called QTL mapping. QTL mapping is a method which was developed more or less in the 1980s, and it's one of the main methods in quantitative genetics, and not only in genetics, to assign function to parts of the DNA. So I want to explain to you guys how that works. Not only that, but I did my PhD on QTL mapping, so I got my doctoral degree for contributing to QTL mapping methodology. Besides that, I also developed a new method called CTL mapping. QTL stands for Quantitative Trait Locus, and I called the new method that I developed during my PhD thesis CTL mapping, for Correlated Trait Locus mapping, and I want to explain that method to you guys as well.
I've been rummaging through the presentation. I had a lot of things to do this morning, and I might have moved some slides, so it might be that I'm expecting one slide to come up and then a different slide comes up. That's just because I was working on it, so you have to forgive me for being surprised by my own slides. But I hope everything's in there, and I'm hoping that we will have a good stream and a good lecture. So as always, ask questions if you want to. This is one of the subjects that I'm good at.
Much better than, for example, metabolites. That's something that doesn't come up that much in my normal work, but phenotypes, QTL mapping, CTL mapping, that's kind of my forte. So ask a lot of questions, and I can probably answer all of them. And if I can't, then I will write it down and you will get the answer next week.
So the overview for today: we're going to talk a little bit about phenotypes again, just a very short rehash of the phenotype lecture. We will be talking about QTL mapping, so I will be explaining different crosses to you guys, because you need a breeding plan before you can start doing QTL analysis. That is different from GWAS, which you do on an outbred population like humans or cows. For QTL mapping, you specifically mate individuals together, and that gives you more statistical power. So I'll be talking about the different crosses that are in use, and I will also be talking about effect sizes and likelihoods and how we deal with those in QTL mapping. I also want to talk about genome-wide association. Genome-wide association is very closely related to QTL mapping. In my mind there's actually no real difference, although formally there is one, but it's more or less the same thing: you want to find regions of the genome responsible for differences in the phenotypes that you observe. Then I want to talk a little bit about fine mapping of QTLs, there are a couple of questions that I've woven in between, and then we will talk about CTL mapping. CTL mapping is the method that I developed during my PhD, and I don't talk about it enough. I think it's a really, really good method, I haven't been using it enough, and I should use it much more. But that's for today.
So of course, first we will look at the previous assignments, the assignments from before the holidays for lecture six, on metabolites.
I actually still had a problem here, because I wanted to show you guys how to use the METLIN database, and I already sent a couple of emails to the guys there, but for some reason my account is still blocked. So I can't show you the METLIN database. I hope that you guys did the assignment, so made your own account on METLIN, and then were able to follow through. So let me open up the Notepad++ window for you guys. These are the answers that I came up with. And these might of course have changed a little bit, because I wasn't able to check the answers to the METLIN database questions. All right, so first off, I need to open up the assignments. Let me quickly go to Moodle and open up the assignment so I can read it and then talk you through it.
All right, so use the METLIN database to identify the following fragments found by mass spec. If you do mass spectrometry, you get this peak spectrum, right? And then you can identify each of these peaks as being a very specific compound. The first compound had a mass-over-charge, so an m/z ratio, of 565.0518. And here you can see the real power of mass spectrometry, because it's very, very accurate. You get numbers which are accurate to several digits after the decimal point. And this allows you to identify these individual peaks very accurately, which is nice, because that allows for identification of a lot of compounds. If it weren't that accurate, then of course you would have more candidates in there. And that's also what the first two questions are about. So the first question is: a compound was measured with a mass-over-charge ratio of 565.0518, and it was measured with 15 parts per million accuracy. Which compound was measured? If you go to the METLIN database and you fill it in, then the only compound that comes up with this exact peak, at this accuracy, is UDP-glucose. So it's just the glucose molecule which is there.
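To make the parts-per-million tolerance concrete, here is a minimal Python sketch of how a ppm accuracy turns into an m/z search window around the measured peak. The function names `ppm_window` and `ppm_error` are made up for this illustration; they are not part of METLIN, and the database may apply its tolerance differently.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds for a measured m/z and a ppm tolerance."""
    delta = mz * ppm / 1_000_000  # 1 ppm is one millionth of the measured value
    return mz - delta, mz + delta


def ppm_error(measured_mz, candidate_mz):
    """Signed deviation of a measurement from a candidate m/z, in ppm."""
    return (measured_mz - candidate_mz) / candidate_mz * 1_000_000


measured = 565.0518  # the peak from the assignment
for tolerance in (15, 120):
    low, high = ppm_window(measured, tolerance)
    print(f"{tolerance:>3} ppm window: {low:.4f} to {high:.4f}")
```

The wider the tolerance, the wider the window, and the more database entries can fall inside it.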
And then in the next question, you had to change the accuracy of the machine from 15 parts per million to 120 parts per million. And then you see that there are now two possible candidates, just because the accuracy is not as high as before. With an accuracy of 120 ppm it can be UDP-glucose, or it can be isoginkgetin. So the accuracy of the machine determines how well you are able to characterize the individual peaks of your spectrum. And then the next question was: a compound measured in negative mode, mass-over-charge of 323.03, measured with 15 ppm, which is the most likely compound measured? The most likely compound in this case is uridine monophosphate. These questions are just looking up the individual peaks from the spectrum. I'm not going to ask you on an exam which compound has a mass-over-charge ratio of something. I just want you to know that if you are ever going to work with metabolite data in the future, you can use the METLIN database to go from a spectrum to identified compounds.
All right, so then we go to KEGG. And I actually can go to KEGG, because you don't need an account for that, or you didn't need one when I last checked. So let's go to KEGG, and I'm going to show you the Firefox window. This is how it looks. It's not the most beautiful website in the world, but it's a very, very useful database when you're dealing with metabolites. KEGG is more or less structured like books in a library: every pathway has its own number. And the question was, what is the KEGG identifier for the photosynthesis pathway? Of course, you can just go to KEGG Pathway, and here you have the whole list of all the different pathways. So you just do Ctrl+F, photosynthesis, and there is photosynthesis. And if you click on it, then you see the image that belongs to the photosynthesis pathway.
So this is what happens in a plant cell. And photosynthesis is pathway number... I just clicked on it, where is it? Photosystem I, photosystem II. Doesn't it show it here? Oh, so it's called map00195. And if you look at photosynthesis in the list, then it's under 1.2, number 00195. So that is the photosynthesis pathway. This is an idealized pathway. These map pathways are pathways that do not exist as such in real-world plants or other organisms. It is just a combination of all the different pathways together in one picture. Let me make it a little bit better. Yeah, now it fits on screen totally.
If you are interested in a single plant like Arabidopsis thaliana, then you can actually see that not every plant encodes the whole pathway. To switch to Arabidopsis thaliana we go to change pathway type. No, I want to have the organism. That's the user data, that's the module. Change pathway type. So we go to pathway type and we just say Arabidopsis thaliana. Now we can also see that the pathway identifier changed to ath00195, ath for Arabidopsis thaliana. And highlighted in green are all the enzymes, so all the proteins, that are available in Arabidopsis thaliana. And the question is, which proteins are missing in the photosynthesis pathway of Arabidopsis thaliana? Well, you see that these two proteins here, PetL and PetM, are not available in Arabidopsis thaliana. So they are not necessary for the whole pathway to function. These are kind of modifiers: if you have them, you're probably able to produce a little bit more energy. But you see that there are other enzymes which are also missing. So Arabidopsis thaliana does not encode the full pathway, only part of it, which is nice. And like I showed you in the last lecture, you can use this to reason about whether a certain organism is able to make a certain metabolite.
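The identifiers seen here, map00195 for the reference pathway and ath00195 for Arabidopsis thaliana, share the same five-digit number and differ only in the prefix. A small Python sketch of that convention follows. The helper names are invented for illustration, and the regular expression is an assumption about the identifier shape (a short lowercase prefix followed by five digits), not an official KEGG definition.

```python
import re

# Assumed shape of a KEGG pathway identifier: lowercase prefix + five digits.
PATHWAY_ID = re.compile(r"([a-z]{2,4})(\d{5})")


def split_pathway_id(pathway_id):
    """Split an identifier like 'map00195' into ('map', '00195')."""
    match = PATHWAY_ID.fullmatch(pathway_id)
    if match is None:
        raise ValueError(f"not a pathway identifier: {pathway_id!r}")
    return match.group(1), match.group(2)


def for_organism(pathway_id, organism_code):
    """Swap the prefix, e.g. reference pathway -> organism-specific pathway."""
    _, number = split_pathway_id(pathway_id)
    return organism_code + number


print(for_organism("map00195", "ath"))
```

This mirrors what the "change pathway type" menu does to the identifier: the pathway number stays, the organism prefix changes.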
And of course the photosynthesis pathway has this really nice picture here, which not all pathways have. All right, then the next question is: go back to the reference pathway for photosynthesis. If we click on the cytochrome complex, we can see which reaction this thing catalyzes. So let's go back. This is the standard one. And we want to click on the cytochrome complex, which is PetB. Let me see, antenna proteins, carbon fixation, where is it? Photosystem II, cytochrome complex, so here. No, did they change the number? Because in the question it was EC 1.10.9.1. I think they changed the number. Let me see. Yeah, that's really weird. All right, go back to photosynthesis. Those are the related pathways, I don't want to look at that. Photosystem II, photosystem I, huh, interesting. So the cytochrome complex is here. It's made out of cytochrome b6. Okay, it's annoying that they changed the numbering here. They normally shouldn't change the numbering, but let me just click on this one. This is the cytochrome complex. Yeah, so the entry number has changed for this thing, and it now has its own pathway. It's this thing highlighted in the pathway, and if we click on it, we should get the information. But it still does the same thing.
So this thing works on plastoquinol. And in theory, the entry should also give you the substrate and the product. So here we see what it does. The substrate is the thing that it uses, and the product is what it makes. You can see here that it takes plastoquinol and makes plastoquinone. And of course it also has this plastocyanin, which goes from oxidized to reduced, and it needs hydrogen ions, so protons. That's what the enzyme does.
So this enzyme takes plastoquinol and makes plastoquinone. And here you then have a comment: it contains two cytochromes and so on, the enzyme plays a key role, and so you get more information. And I think that if you click here on reaction, it should also give you the chemical reaction. So you see here plastoquinol to plastoquinone. This is plastoquinol and this is plastoquinone. So it just catalyzes the conversion of one metabolite into another metabolite. And then it also has the other reaction here, so it catalyzes two reactions. If you click on the reaction itself, it will bring up the chemical formulas. All right, so that was the photosynthesis pathway, just a quick look, right? If you ever have to deal with pathways, or with enzymes and chemical reactions, then use KEGG to see what reactions there are.
All right, so we then go back, not to the pathway menu, we just go back to KEGG. Why? Don't do that. So we go back to pathways, and now we want to look at tryptophan, so we search for tryptophan metabolism. And now we want to go to tryptophan biosynthesis, which is merged with other pathways. That's a little bit annoying, but it doesn't matter. They merged three different pathways together, which is okay, right? The data changes as well. So here was the question. In this whole big diagram, we have to look for where the tryptophan is made. And then we wanted to see whether mice are able to make tryptophan or not. So we can just look for it. This is photosynthesis and glycolysis coming in. This is quinate, shikimate, aspartate, tryptophan. So here we have L-tryptophan, right? This is the thing that we want to make. L-tryptophan can be made from indole via this enzyme. And it can be made from indole-3-glycerol phosphate by this enzyme. And this makes L-tryptophan.
So now the question is, can mice actually make tryptophan? Then of course we have to change the pathway type. In this case we want to look at mouse, which is Mus musculus. All right, that's over here. So we just go to the mmu pathway. And when you look at the mmu pathway, you see that of this whole pathway, mice don't really have any of these enzymes. They have some of the enzymes in this part of the pathway, but mice do not have the enzymes to make L-tryptophan at all. So the conclusion from this is that mice are probably unable to synthesize tryptophan themselves. They probably need to get it from their food if they need it. The next question was, well, how about soybeans? So we can just say change pathway type, and then we search again for soybean, which is called Glycine max, so gmx. If we look at soybeans, then we see that soybeans can indeed make tryptophan. They have both of the enzymes, so they are able to make L-tryptophan.
And then the question was, what about Enterococcus faecalis V583? If I am a company and I want to sell tryptophan at a high rate, then I might think about using a bioreactor with bacteria to make tryptophan in large amounts. Or I now know that I can just ask a farmer who grows soybeans, buy a whole bunch of soybeans, and then extract the tryptophan from the soybeans, because that would be possible. But if we change the pathway type to Enterococcus, let's just search for V583. So this is here, you can see that there's all kinds of... oh, you can't see that. That's because the pop-up window is not captured in OBS. But if you click on change pathway type, you just get a window with all the possible organisms. They have around 50 different Enterococcus entries in there. If we switch to Enterococcus faecalis, then you can see that this very specific type of bacteria, the V583 strain, cannot make tryptophan either.
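The reasoning used here, "does this organism have the enzymes needed to reach L-tryptophan?", can be sketched as a small reachability check. This is a toy Python illustration under stated assumptions: the enzyme labels (`enzyme_up`, `enzyme_a`, `enzyme_b`), the `precursor` compound and the two example enzyme sets are all hypothetical placeholders, not KEGG entries. Only the two final steps (indole or indole-3-glycerol phosphate to L-tryptophan) follow what the lecture describes.

```python
# Each step: (substrate, enzyme needed, product). Labels are hypothetical.
STEPS = [
    ("precursor", "enzyme_up", "indole-3-glycerol phosphate"),
    ("indole-3-glycerol phosphate", "enzyme_a", "L-tryptophan"),
    ("indole", "enzyme_b", "L-tryptophan"),
]


def reachable_compounds(steps, enzymes, start):
    """Compounds an organism can reach from `start` using only its own enzymes."""
    reachable = set(start)
    changed = True
    while changed:  # keep applying steps until nothing new is produced
        changed = False
        for substrate, enzyme, product in steps:
            if enzyme in enzymes and substrate in reachable and product not in reachable:
                reachable.add(product)
                changed = True
    return reachable


# Hypothetical enzyme sets standing in for "all boxes green" and "no boxes green".
full_set = {"enzyme_up", "enzyme_a", "enzyme_b"}
empty_set = set()

for name, enzymes in (("full set", full_set), ("empty set", empty_set)):
    can_make = "L-tryptophan" in reachable_compounds(STEPS, enzymes, {"precursor"})
    print(name, "can make L-tryptophan:", can_make)
```

In KEGG itself the same check is done by eye: follow the arrows to the target compound and see whether the enzyme boxes along the way are highlighted for the selected organism.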
So if I want to start a company and I want to sell tryptophan, then a quick look in KEGG will teach me that probably one of the best ways to make tryptophan is to use plants. I probably can't use animals, because mice don't produce it, so logically gorillas and humans probably can't synthesize tryptophan either. Soybeans can. The bacteria, or at least this specific strain of bacteria, cannot.
All right, the next question was to navigate to the biosynthesis of secondary metabolites. So we go to pathways. Why am I ending up here? I just want to go back to the main pathway list. So I'm just going to search for plant secondary metabolites. And then we see the whole big pathway, right, so we can just click on the map entry. And this is a huge, huge pathway. This was more or less for you guys to look at the whole pathway, because this is a very, very important pathway in plants. It makes the flavonoids, so things like flavonol, but also a lot of substances which are used in medicine. The question was, from which substrate is flavonol produced? So this is a bit of looking through the entire pathway and finding the flavonol, unless you know what the chemical structure looks like. But it should be in this part, I think. Here, flavonol. So flavonol is made from flavanonol, which makes sense, right? I thought that was funny, flavonol being produced from flavanonol. So you can use KEGG to reason about whether certain plants can make this flavonol, and you can see how different chemical substances are transformed from one into the other. And that's what you can use KEGG for.
KEGG is really good for reasoning about whether a certain organism can produce something, but also: if I have a certain bacterium which cannot produce tryptophan, which enzymes should I bring into the bacterium to allow it to produce tryptophan? So that's how you can use KEGG.
All right, and then the next question was: download and save the citrate cycle in humans, so that we can use it later in Cytoscape. Oh, this is really small, so let's just go back. We're just going to search for citrate cycle, that should be a normal keyword. Citrate cycle, no, that's not the one that I want. So let's just go to KEGG itself, go through the pathways, and then search for citrate. Here we have the citrate cycle. This is the pathway that I wanted. And then of course we can change the pathway type again, and we can change to humans. Again you can see that humans do not encode all of the different enzymes here, but of course the citrate cycle is very essential for life, so a lot of these things are encoded. So let's just download it. You just have to download the KGML file. You click the file and then you say save. And that's it. So that was the question.
All right, so the next question was about Reactome. Move to Reactome. Reactome is very, very similar to KEGG. It's just a little bit more fancy, and it is focused on a slightly higher level than KEGG. KEGG is really: we have a chemical substance, we have an enzyme, and that produces another chemical substance. Reactome is slightly higher. A pathway for them could be something like DNA replication. DNA replication is a very broad process, and it's less well-defined than the standard KEGG pathways. So get familiar with the structure, click on a couple of pathways. We just go to the pathway browser and click on a couple. So you can see things like cell cycle.
And of course, cell cycle, DNA repair, or metabolism, those are much, much higher level than KEGG. So let's click on the cell cycle, and it will very fancily zoom to where the cell cycle is. Then we can zoom in much more and click on cell cycle. And I think when we double click we can go in there. And then it has a diagram very similar to KEGG. Although here it's still not that substrate, enzyme, product structure, it's different. It's a higher level. All right, so click on a couple. You can drill down by opening up different pathways, which show subnetworks in more detail. If we look at the cell cycle, then we see that we have different checkpoints. If you have a cell, it goes from the M phase into the G1 phase, the S phase, and then the G2 phase. And here you see that there's a G2 checkpoint and a G1 checkpoint at which there are checks, so the cell cannot continue until everything is done there. And you see here that there's chromosome maintenance being done. And you see here an overview of how meiosis works, because that's intrinsic to cell cycles too.
Could my moderator perhaps block and delete the message? That would be nice. Otherwise, I have to go to... All right, so let me see. Is my moderator there? Or are you already partying at your Christmas party? I have to do this myself then. So let's just delete the message. Oh, okay, the moderator did this as well.
All right, so it's just a different level at which they look. We can look at something else, like DNA replication, right? It then zooms back out, and here we see how DNA replication works. And then we can even drill down, I think, on these diagrams. You can go deeper and deeper and deeper, and at a certain point you hit a level which is very similar to KEGG, where you are more or less looking at individual enzymes. And then you can go further. One of the nice things is that, just like KEGG, they have references to the papers where they got the data from.
So if we go to ORC5, then you can see it has all kinds of external identifiers where you can learn more about it. And here you can see where it's located. And then you can actually look at different species as well, just like in KEGG, by clicking the select species button.
All right, click the search icon. Here you can input your own gene of interest. Search for heme A and project it on different networks: in which pathway is COX15 active? So we go to the search icon. I don't think that I'm seeing the whole page actually. That's a bit of a scaling issue. So let's go back to Reactome, we can just search there. So we will just search for heme A in humans. And the question is in which pathway COX15 is active. So we just go to COX15, which is a protein. It says here that it's in the mitochondrial inner membrane, and its location in the pathway browser is in metabolism, in heme biosynthesis. COX15 transforms heme O to heme A in Homo sapiens, and the location is the inner membrane. And then I think we should be able to go back to the pathway itself. Can we go here? Yeah. So if we then go to the pathway browser, we can see where in the whole pathway it is, and it will zoom in. If we zoom out a little bit, we can see that it's in the pathway called heme biosynthesis. So that's the answer to the question.
All right. So again, there's just one question for Reactome, but be aware that if you are studying, for example, a biochemical process which is higher level than what you can find in KEGG, you can go to Reactome, and in Reactome you should be able to find more or less the genes and proteins that are involved in your pathway.
All right. So click and download Cytoscape. I have Cytoscape installed. I just didn't add it to OBS yet. I hope I actually can.
So let me make the window a little bit smaller. I want to capture a window. Yeah. And the window that I want to capture is Cytoscape. And I want to make it a little bit bigger for you guys, a little bit more like this. A little bit smaller, so that it doesn't overlap with me. So when you open up Cytoscape, and it really depends on which version you have, I think I have a relatively old version, I didn't download it today, this is how Cytoscape looks. Cytoscape is one of these tools that is really nice for visualizing networks and building your own networks.
So there was a little example: find the app that will allow you to load KEGG files and install it, then load the citric acid cycle. You can just go to plugins and say manage plugins. And then when you look at the plugins, I'm getting an error console, which is okay. And then I want to search for... why can't I install it? All right. So it doesn't allow me to install the pathway viewer because of some error about being out of date. Let me see, 2.7.0. That's actually very old. The current version of Cytoscape is 3.8.2, so I'm quite a few versions behind, which is a little bit bad. I'm not going to install the new one now. The thing that I just wanted to show you in Cytoscape is that it's relatively easy to make pathways in Cytoscape.
So if we look at Notepad++, let me get the Notepad++ window. What you can do for Cytoscape is write A to B, B from C, and C from A. And we can then do D from A as well. So we just define three symbols, a triplet: node, edge, node. Then I'm going to save this on the D drive as example.txt. And then I'm going to import this into Cytoscape. So let's open up Cytoscape again. And we are just going to say file, import, network from table. Can you guys see that? Can I capture this window as well? Add window capture, yes.
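The node, edge, node file described here can be sketched in a few lines of Python. This is an illustration only: the lecture types the file by hand in Notepad++, the space separator is an assumption (the lecture does not say which separator was used), and the file is written to a temporary directory rather than the D drive.

```python
import os
import tempfile

# The four triplets from the lecture: source node, interaction, target node.
EDGES = [
    ("A", "to", "B"),
    ("B", "from", "C"),
    ("C", "from", "A"),
    ("D", "from", "A"),
]


def write_edges(path, edges, sep=" "):
    """Write one 'node interaction node' triplet per line."""
    with open(path, "w") as handle:
        for source, interaction, target in edges:
            handle.write(sep.join((source, interaction, target)) + "\n")


def read_edges(path):
    """Read the triplets back, splitting each non-empty line on whitespace."""
    with open(path) as handle:
        return [tuple(line.split()) for line in handle if line.strip()]


example_path = os.path.join(tempfile.mkdtemp(), "example.txt")
write_edges(example_path, EDGES)
print(read_edges(example_path))
```

Each line becomes one edge in the imported network, with the first column as source, the second as interaction type and the third as target.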
All right, yeah. So when you do this, you get this little window, which looks like this. This is the window that pops up when you import the network. I'm just going to select the little file that I just created on my D drive, example.txt. Here you see that it detects that there are certain columns. The only thing that we have to say is that we want to import all three columns: column one is the source, column two is the interaction, and column three is the target. All right. And then we just say import, and it will import it. We close the window, and then it looks like this, right? So it has our network here. We can move the different nodes and edges around, and we can see the network that we defined.
One other thing: when we go to the edge attribute browser after importing this, you can see that this is a "from" edge, and this is a "from" edge, but this is a "to" edge, right? Because we said A to B, and then we said B from C. What we can then do is go to the VizMapper. This is the current style, right? We could also select one of the other styles which are available, and then it looks a little bit more fancy. But let's just stay with the default style. And for example, based on whether an edge is a "from" edge or a "to" edge, we want to give it a little arrow. So we can go here to the edge target arrow color, target arrow shape. We just double click to create a mapping, and then we select the value. So we want to do this based on the ID, and we want to use a discrete mapper. No, not based on the ID, on the interaction, right? In the file that we created, we have two types of interaction: a "to" interaction and a "from" interaction. So here we're setting the edge target arrow shape, so the shape of the arrow at the target.
In this case, when we have a "from" interaction, we don't want to have an arrow. But when we have a "to", we want to have an arrow. So you can see that there's one "to" edge, which now points from A to B. Then we can do the same thing for the edge source arrow shape. We double click, then we say this is based on the interaction, and it is a discrete mapper. And in this case we want to do it the other way around, right? If you have a "from" interaction, then it should actually point the other way. So if we have a "from" interaction, I want to have an arrow. And if I have a "to" interaction, I don't want that, I probably want to add a circle or something like that. So you can see that we can have all kinds of different descriptors. And this is just looking at the edges. We can do the background color, we can do the foreground color, so we can make our own networks and style our own networks.
We can actually add attributes to the nodes as well. If we look at the node attribute browser, we see that this node here just has a single attribute, which is its name, B. But we can create a new attribute, and the new attribute is going to be called color, right? Now we can say that B should have a color which is red. And we can go to A, and the color of A is blue. And now we are again doing a visual mapping, this time for the nodes. I think I have to reload the visual mapper. Oh, here. We want to have the node color, so we create a node color mapping. And then what do we do? Well, based on the color attribute, we are just going to pass that through. So B is red, A is blue, which I didn't save yet. And it doesn't recognize these colors, so we're going to use a discrete mapper. We say blue is blue, and red is red. And it will color the nodes as well. Right?
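A discrete mapper is essentially a lookup table from an attribute value to a visual property. The Python sketch below mimics the three mappings set up here; it is not Cytoscape's API, and the dictionary names, shape labels and hex colour codes are assumptions chosen for illustration.

```python
# Interaction type -> arrow shape at the target end of the edge.
TARGET_ARROW = {"to": "arrow", "from": "none"}
# Interaction type -> arrow shape at the source end (the other way around).
SOURCE_ARROW = {"from": "arrow", "to": "circle"}
# Value of the node attribute "color" -> actual colour to draw.
NODE_COLOR = {"red": "#FF0000", "blue": "#0000FF"}


def discrete_map(mapping, value, default):
    """Look up a visual property; unmapped values fall back to the default."""
    return mapping.get(value, default)


def style_edge(interaction):
    """Visual properties of one edge, derived only from its interaction type."""
    return {
        "target_arrow": discrete_map(TARGET_ARROW, interaction, "none"),
        "source_arrow": discrete_map(SOURCE_ARROW, interaction, "none"),
    }


print(style_edge("to"))
print(style_edge("from"))
print(discrete_map(NODE_COLOR, "red", "#CCCCCC"))
```

Because the style only looks at attribute values, the same set of mappings can be reused on any network that carries those attributes.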
So you can do a lot of... this looks horrible for you guys. I don't know why it does this. Let me see, if I move it around, it doesn't help. If I go back to network, no, it just keeps the weird pop-up window open. All right, let's just delete it like that. But you can add different things to your nodes, you can add different things to your edges, and then based on that you can color them and make your network look really, really pretty. And of course you can do this in a more or less automated fashion, because these visual mapping styles are disconnected from the type of network that you're looking at. So you can make your visual style, and then generate a network, for example download a network from KEGG and import it here, and then use your own visual style on it.
All right, so that was a very quick introduction to Cytoscape. It takes a while to really get to know all of the different features. But you have nodes, which are the origin. Then you have edges, which connect two nodes together. The nodes can have properties, the edges can have properties. And then there's also a network attribute browser, so you can add attributes to the network as well. And of course there's a bunch of different visual styles which you can choose from, and you can even download styles from other people. This style, for example, looks a little bit different: it puts the interaction on the edge here, and nodes look like this. You can use this to visualize your Facebook network, so all of your friends and friends of friends. But you can also use it for biological networks, and you can build your own really beautiful pictures using Cytoscape.
All right, so let me check the answers, to see if I forgot one of them. COX15 was in heme biosynthesis. Phenotypes, correlation. Oh yeah, yeah. And then we wanted to do this with the correlation in R. So let me see where my R window is.
So let's close Cytoscape for now. That's my R window. So the question was, when we go here: this assignment will focus on generating a network in Cytoscape using R. We will use the data file phenotypes.txt, which we already used. Make sure you set your working directory correctly. So let me see if the working directory is set correctly. Yeah, that exists. And is the phenotypes file there as well? It is, very good. Usually the first thing that I do when I load data into R is just look at the first 10 rows and first five columns, right? So this is how our phenotype data looks. We have certain individuals, and we've measured different metabolites. We have measured hydroxypropyl, hydroxybutyl, and butenyl. These are metabolites in different plants.
All right, so the next question is: you can use the correlation function to calculate the correlation. So let's do that. And now if we look at the correlation, you can see that the values are all missing. Oh, let me actually show you the R window. So when we calculate the correlation using cor(phenotypes), you see that they are all missing. And that is because when we look at the original data, we see that, for example, for individual one, some of the metabolites are not measured. And since in R NAs are passed through, we have to tell the correlation function that we don't want to pass through NAs. If you have an NA, just ignore this individual, so throw the individual out of the dataset for that comparison. We can use the command like this: calculate the correlation on the phenotypes with use = "pair". This is the pairwise option. So if an individual is missing one or both of the two measurements being compared, kick it out. Then we have the correlation calculated. And if we look at the first five elements of the correlation matrix, we see that everything is correlated to itself with a correlation of one.
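In the lecture this is done in R with cor(phenotypes, use = "pair"). As a rough illustration of what the pairwise option does, here is a plain Python sketch of a Pearson correlation that drops an individual only for the pair of traits where one of its values is missing. The function name and the toy numbers are made up; missing values are represented by `None`.

```python
import math


def pearson_pairwise(x, y):
    """Pearson correlation using only individuals measured for both traits.

    Returns None when fewer than two complete pairs remain or when one of
    the traits has no variation among the complete pairs.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


# Toy data: individual 3 has no measurement for the first trait.
trait_1 = [1.0, 2.0, None, 4.0]
trait_2 = [2.1, 3.9, 6.0, 8.2]
print(pearson_pairwise(trait_1, trait_2))
```

Without this handling, a single missing value makes the whole correlation missing, which is why the first cor call in the lecture returns only NAs.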
And then we see that there are different correlations between the different metabolites, measured across the different individuals. All right. So then the next question was: well, now go through all of the rows, right? And go through all of the columns of this correlation matrix. And then if the correlation is high, right? So if you go through the rows and the columns, if the correlation is high, write that in a little file. So the way that I did that is like this. So I first make a variable that will hold all the high correlations, right? So the connection here just says from r to c. So I go through all the row names of the correlation matrix, I go through all the column names. So I take the first row name and then go through all the columns. And then if the correlation between the row and the column is higher than 0.7, then I just remember it. And I remember it in the way that we built this small network before, like this example network. So I'm just going to say name one, "to", name two. And here I'm just going to combine r, the word "to", and then c. And then I'm just going to rbind this to highcor. So highcor initially is empty. After I find a high correlation, I make a vector with a length of three, and then I'm going to row-bind this vector of length three to highcor. So let's run the code, and then I can show you what happened. So there's a typo in here. So I wrote it with a capital C. Let me go to R. So here I wrote it with a capital C, while here I wrote it with a lowercase c. So that doesn't work, because R is case-sensitive. So here you see that if we go through them, the first entry in highcor is hydroxypropyl to hydroxypropyl, which is logical because everything is correlated by one to itself. So that's not too surprising. But we see that, for example, hydroxybutanil is correlated to itself, but it's also highly correlated to benzonyl butyl and to benzonyl oxy pentyl.
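The loop described here can be sketched roughly as follows. This is not the exact script from the lecture: the correlation matrix is a made-up 3 x 3 example with invented names A, B and C, and the column loop variable is called `cn` (an assumption) to avoid the c / C mix-up.

```r
# Toy correlation matrix (values and names invented for illustration)
correlation <- matrix(
  c(1.0, 0.9, 0.1,
    0.9, 1.0, 0.2,
    0.1, 0.2, 1.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C"))
)

highcor <- c()  # initially empty
for (r in rownames(correlation)) {      # go through all rows
  for (cn in colnames(correlation)) {   # go through all columns
    if (correlation[r, cn] > 0.7) {
      # remember the pair as a vector of length three: from, "to", target
      highcor <- rbind(highcor, c(r, "to", cn))
    }
  }
}
highcor
```

Note that this keeps self-correlations (A to A) and both directions of each pair (A to B and B to A), just like in the lecture. If the correlation matrix still contained NAs, the `if` would fail, so in this sketch the matrix is assumed to be complete.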
So there are multiple connections between these. All right. So now the only thing that we still have to do is write this out to a directory. So let's write it out to the D drive. So let me show you guys how to do that. So I'm just going to write out this table that I just made. So this is a table which has three columns and a whole bunch of rows. So I'm just going to write this one to cytoscape.sif. I'm going to use tab as the separator. I don't want to have the row names, no column names, and don't quote anything. This is just so that the matrix looks a little bit better. So I just did the command in R to write it out, and now I'm just going to open up that file to show you guys how it looks. So again, very basic, very simple file structure. So the file structure again is: from node, interaction, other node. So now we have this list of all the things that are highly correlated. So let's go to Cytoscape and load it in. So in Cytoscape, I can go here and I can go to Network, and I can just right-click and say, well, destroy the whole network, and then it's gone. And then I can say File, Import, Network from Table. And I have to show you, it doesn't really want to capture the pop-up window. Let me see if I can, that's that one, and then this one. Okay. And then show it. All right, so I get the same thing again. So I'm selecting the file that we just made. So the file that I just made is cytoscape.sif, again a three-column layout. So I'm going to select column one as the source, column two as the interaction, and column three as the target. And then I'm just going to say Import, which is here. And now you see what the network looks like. Hey, you see all of the different metabolites and the high correlations between them. And you see, of course, that everything is highly correlated to itself. So I'm going to go to the VizMapper and try a style. Solid looks a little bit too much, Universe as well, so I'm just going to use the default style.
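Writing the three-column table out with the options mentioned here (tab separator, no row names, no column names, no quotes) could look like the sketch below. The edge list is a made-up example, and the file goes to a temporary directory rather than the D drive used in the lecture; both are assumptions for illustration.

```r
# Toy edge list in "from, interaction, other node" layout (names invented)
highcor <- rbind(
  c("A", "to", "A"),
  c("A", "to", "B"),
  c("B", "to", "A")
)

# Written to a temporary directory here; the lecture used the D: drive
sif_file <- file.path(tempdir(), "cytoscape.sif")

# Tab-separated, no row names, no column names, nothing quoted
write.table(highcor, file = sif_file, sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Read the file back to check the three-column structure
sif_lines <- readLines(sif_file)
sif_lines
```

Each line of the resulting file holds one edge, which is what the three-column import in Cytoscape expects.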
And now one of the strengths of Cytoscape is, of course, that it remembered the coloring that I just did, but I can go to Layout, and I can say, well, make it, for example, an organic layout. And now you can see that there are actually different groups in here. So you can see that there's a group here of one, two, three, four metabolites, which together have correlations to each other, but not to the other groups. You see here that there's a big group as well. And you see that we have several little subnetworks of about three metabolites, which are more or less regulated together. And so just from basically looking at these things, we can now conclude that within our data we have a relatively large network here of metabolites, which are probably co-regulated because they are correlated. Then we have another network here, which is also co-regulated, but which is regulated differently from the other one. And so from this really basic analysis, we can kind of get the idea that within our data set there are probably two or three regulators. So probably one regulator which regulates this part of the metabolite production, and then another regulator, or a couple of regulators, which do this regulation. So that is visualizing these results. And of course, this is much better than in R. When we go to R and we say, just show me the whole highcor thing, or just the first five, or show me the whole thing, of course this is not as insightful as making a little network out of it. So you can do this every time that you have something interacting with something else. And here we just have the correlations, but we don't have to look only at the correlations. We could say that this thing is "cor" to that one here, and then we could also do something like covariance or another analysis, and we could add this all into a single network picture. And then we go to Cytoscape.
And then we use Cytoscape to give an edge a different color based on whether it is produced by correlation or by something else. So we take the phenotype data that we have, we load it into R, and we do a very basic analysis, just saying that everything which is highly correlated is interesting because it might be regulated together. Hey, you get a big list of things. But from this list, it's not directly clear what's happening. But then if you go to Cytoscape, you can see that this is much clearer. Here we can see that there's one group of metabolites which seem to belong together, another group of metabolites which seem to belong together, and then some smaller groups which are independent from the other ones. I hope that's clear. I hope that this is something that you can use in the future, and that you can deal with it when you have to make your own network pictures of how things are interacting in your study. Okay, so that was everything for the analysis so far. So I think that's the solutions to the previous assignments. All right, so I'm just going to talk for like four or five more minutes and then give you guys the first break. So I wanted to talk about phenotypes, right? And phenotypes today are things that we can observe, usually with the naked eye. So things like plants which are growing, they of course have very obvious phenotypes, like the amount of substance that they produce, the amount of green that they produce, the number of leaves that they have. So those are all phenotypes from the same plant. Phenotypes can also be, for example, whether a plant is affected by something like mildew, but phenotypes can also be qualitative, as in whether something tastes good or doesn't taste good, right? So we have quantitative phenotypes, like how much weed do I get off a certain plant. We have quantitative phenotypes, like how resistant a certain plant is to mildew infection.
And we have qualitative phenotypes, like how good something tastes, which is of course personal and is not really measurable. So like we said before, the realm of phenotypes is divided into quantitative traits and qualitative traits. So qualitative traits are more or less everything that you can see and that you can describe using words like it's good, it's bad, it's beautiful, it's ugly. But if we are dealing with quantitative traits, then we are dealing with traits that we can measure. How do we measure them? Well, like we discussed in the first or the second lecture, we measure things by using the International System of Units. So quantitative traits are measurable. They are without a doubt: a plant is producing five kilograms, and there's no doubt about it. But qualitative traits have this kind of subjective side to them. So you can say, well, five kilograms is not a lot, or it is a lot. And that of course depends on the plant and depends on you. All right, then we talked about phenotypes being Mendelian. Mendelian means that there's one genetic locus involved. Or they can be complex, and that means that two or more genetic loci are involved in producing this phenotype or displaying this phenotype. Okay, and then I'm going to leave you guys here for a short break. So we will be talking a little bit about heritability, of course, because if we want to talk about how to find genetic regions that are responsible for differences in phenotypes, then of course we can only do this for things which are heritable. If something is not heritable, then of course there's no gene in the genome which is controlling this phenotype. Some phenotypes have a kind of zero heritability and are not interesting to geneticists, while they might be interesting from an economic perspective. But here we are only interested in phenotypes which are heritable.
So heritability is a measure of the fraction of phenotypic variation that can be attributed to genetic variation. So we are talking about the variance in a phenotype within a population, and how much of this variance is explained by environmental factors and how much of it is explained by genetic variation. All right, then I will leave you guys with the first slideshow and I will be back in five to ten minutes. So I will stop.
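Written as a formula, the definition of heritability given just before the break looks like this. It is the broad-sense version, under the simplifying assumption (not stated in the lecture) that genetic and environmental effects are independent and do not interact, with \sigma^2_P the phenotypic variance in the population, \sigma^2_G the genetic variance, and \sigma^2_E the environmental variance:

```latex
H^2 = \frac{\sigma^2_G}{\sigma^2_P} = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_E}
```

So a phenotype with zero heritability has \sigma^2_G = 0, and all of its variation in the population comes from the environment.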