Alright, so those are cool-looking dendrograms. You can do a lot more, but I just wanted to show you that you have to write a function, and this function is then applied to every node of the dendrogram, so you can color and change every node if you want to. You have a question, Shurik's Kurita? That's why I put in the queue, so that when people have a question, I see a hand raised.

Alright, so dendrograms and phylogenetic trees are closely related to each other, but there is a special package called ape, which you have to remember to install, and that can help you make really cool-looking (no, don't be sorry, that's okay) phylogenetic trees. Phylogenetic trees are ubiquitous in biology, so you can use them for more or less everything: to show the inheritance of alleles, to show distances, or to make a tree of life. They are used a lot in biology, and they're really understandable pictures.

So you load the library (you have to install it first), and after you've loaded it you can just use an hclust object. Of course hclust is not the only clustering method that there is, but you need some clustered object, so for example an hclust object. Then you just call as.phylo on it, and that makes a phylogenetic tree out of it, which you can then plot. You can, for example, set things like the magnification and the label offset, but you now also have the option to make cladograms, unrooted trees, and fan plots of your clustering. So how do these look?
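As a rough sketch of the calls just described: this assumes the ape package is installed, and it uses the built-in USArrests data purely as a stand-in for your own data. The data set, the 15-row subset, and the plot settings are illustrative choices, not the ones from the slides.

```r
# Cluster a small built-in data set (stand-in for your own data)
hc <- hclust(dist(USArrests[1:15, ]))

# ape is not part of base R; install it once with install.packages("ape")
if (requireNamespace("ape", quietly = TRUE)) {
  tree <- ape::as.phylo(hc)   # convert the hclust object into a phylogenetic tree

  op <- par(mfrow = c(2, 2), mar = c(1, 1, 2, 1))
  # cex is the magnification of the labels, label.offset moves them away from the tips
  plot(tree, cex = 0.8, label.offset = 1, main = "Default tree")
  plot(tree, type = "cladogram", cex = 0.8, main = "Cladogram")
  plot(tree, type = "unrooted", cex = 0.8, main = "Unrooted")
  plot(tree, type = "fan", cex = 0.8, main = "Fan")
  par(op)
}
```

The four panels use the same clustering, so they are a quick way to compare the layouts side by side.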
So this is more or less what the basic tree looks like: it's a standard dendrogram, but rotated on its side. The cladogram is the same thing, but it uses the triangle format. You can make unrooted trees, which look like this on the same data, and you can also make fans, which start in the middle and then fan out in a circle. This just gives you a little bit of an overview of how they look. So try it out on your own data, or during the assignments, and make a couple of these. And of course here too you can take the phylogenetic tree and add colors to it using the same kind of system, where you define a function which is then applied to every node in the graph.

So, a bit of a real-life example: I wanted to show you how I plot chromosomal data, so data which has a chromosome position and some kind of statistic for each of the markers that you have. Imagine that you ran a SNP chip and genotyped something like a hundred individuals, and now you want to show some statistical data for each of the hundred or thousand positions in the genome that you measured. There are two things we have to do: we have to plot the chromosomes first, and then we want to add some information to the chromosomes.

The data that we have is something called markers. Markers are single nucleotide polymorphisms, or duplications in the genome, or anything else we can use to distinguish one animal from another animal, or one human from another human. A marker has a location, for example the marker is located on chromosome 1 at 10 megabases, and of course these markers have some kind of statistic.
I could have counted, for each marker, the number of animals in one group versus the number of animals in the other group, and then I can do a statistical test to see if animals are over-represented in one group versus the other. We might also have genotypes at these markers, so of these hundred individuals some might have an A at that position in the genome and others might have a G. And of course chromosomes have a length, and I need to know the length of each chromosome to be able to plot them.

I'm using a mouse example, so the first step is to make our chromosomes: I do as.character(1:19), because a mouse has 19 autosomes, plus an X chromosome, a Y chromosome, and a mitochondrial genome, which is called MT. Then I read in the data. The lengths of the chromosomes are listed here; this is just a very basic table with two columns. The first column is the name of the chromosome, which is a character, and the second is a numerical value with the length of the chromosome. So it looks like this: chromosome 1 has a length of 195 million base pairs, chromosome 2 is 182 million base pairs.

And I have some kind of statistic, right? The statistic that I have is called ratios; these are just some random ratios that I came up with. What we see here is a gene, for example a certain gene located on chromosome 1 at this position, and the ratio at this gene was 0.78. And of course we have perhaps 100 or 1000 of these measurements across the whole genome.

So the first thing I'm going to do is create a plot, and I always start with an empty plot, right? I find the maximum length, so the length of the longest chromosome, and put that in a variable called mlength, for max length.
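To make the data layout concrete, here is a minimal sketch of the three objects described above. In the lecture the tables are read from files; here they are typed in directly. The column names (Chr, Length, Start, Ratio) are assumptions, and apart from the two chromosome lengths and the ratio of 0.78 mentioned above, every number is a made-up placeholder.

```r
# Chromosome names: 19 autosomes plus X, Y and the mitochondrial genome (MT)
chromosomes <- c(as.character(1:19), "X", "Y", "MT")

# Chromosome info: name plus length in base pairs.
# Only the first two lengths (195 and 182 million) are from the lecture;
# all other lengths are made-up placeholders.
chrinfo <- data.frame(
  Chr    = chromosomes,
  Length = c(195e6, 182e6, seq(160e6, 60e6, length.out = 17), 170e6, 90e6, 16e3),
  stringsAsFactors = FALSE
)

# Marker statistics: only the ratio 0.78 on chromosome 1 is from the lecture,
# the positions and the other two markers are invented for illustration.
alldata <- data.frame(
  Chr   = c("1", "3", "X"),
  Start = c(45e6, 88e6, 20e6),
  Ratio = c(0.78, 1.35, 1.10),
  stringsAsFactors = FALSE
)

# Length of the longest chromosome, used for the y-axis range
mlength <- max(chrinfo$Length)
print(mlength)
```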
So then I do my plot. The y-axis ranges from 0 to the maximum length, and the x-axis ranges from 1 to the number of rows in the chromosome info, so the number of chromosomes that I have. The type is "n", for none, because I don't want the points I pass in to actually be drawn; otherwise it would put a circle here and a circle here. I also say: don't plot a y-axis, don't put anything in the x label, don't put any y label here, and don't plot an x-axis either, because I am going to draw the axes myself.

The first thing I add is some lines in the background, so that I can see where, say, every 10 megabases is, right? So here I'm saying: use a sequence from 0 to the maximum chromosome length, in steps of 10 million. The color of these lines is light gray and they are dotted, so they look a little bit more fancy.

Then I add the chromosomes. I use the lines function and just draw lines from bottom to top. How this works is that I go through the chromosome info row by row, and I call the row number n. For each row I plot a line where the x position is c(n, n): the start is at n and the end is also at n, because for the first chromosome the line starts at x = 1 and also ends at x = 1. So I just say x = c(n, n). Then the y position: every chromosome of course starts at zero base pairs, and I take the length of the current chromosome, so the line's y position runs from zero here to something like 190 million base pairs here. The type is line, the color is black, lty is 1, and lwd is 2 to make them a little bit more bold, so to speak.
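The static part of the plot described above can be sketched like this. The chromosome table is a stand-in with mostly made-up lengths and assumed column names (Chr, Length), and the for loop is one possible way of going through the rows.

```r
# Stand-in chromosome table; only the first two lengths are from the lecture
chromosomes <- c(as.character(1:19), "X", "Y", "MT")
chrinfo <- data.frame(
  Chr    = chromosomes,
  Length = c(195e6, 182e6, seq(160e6, 60e6, length.out = 17), 170e6, 90e6, 16e3),
  stringsAsFactors = FALSE
)
mlength <- max(chrinfo$Length)

# Empty plot: type = "n" sets up the coordinate system without drawing points,
# and both axes and both axis labels are suppressed
plot(x = c(1, nrow(chrinfo)), y = c(0, mlength), type = "n",
     xaxt = "n", yaxt = "n", xlab = "", ylab = "")

# Background grid: one dotted light gray line every 10 million base pairs
abline(h = seq(0, mlength, 10e6), col = "lightgray", lty = "dotted")

# One vertical line per chromosome, from 0 up to its length
for (n in seq_len(nrow(chrinfo))) {
  lines(x = c(n, n), y = c(0, chrinfo$Length[n]),
        type = "l", col = "black", lty = 1, lwd = 2)
}
```

Because the loop runs over the rows of chrinfo, adding or removing rows changes the number of chromosomes drawn without touching the plotting code.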
And then I add the ratio data to the plot. I use the apply function on alldata, which contains all of my genes with their ratios and positions. So I go through the rows of alldata, and I have a function which every time gets one row of the matrix, which I call x in my case. First I match the chromosome name to the chromosomes vector, and that is of course because three of the chromosomes, X, Y, and MT, are not numerical values. I have to match them to get the correct position, because chromosome X will be at position 20, Y at 21, and the mitochondrion at 22. The yloc is the start position, so the position where the gene starts, and this is the y position.

Then I define my color. If the ratio is above one I want it colored red, and if the ratio is below one I want it colored black. So I just ask: is the ratio above one? That gives true or false, where true is one and false is zero, and then I have to add one, because color 0 in R is the background color, so you would not see it. In R, black is 1, red is 2, and I think green is 3, so every color has a number as well.

Then I use the points function. The x location is the chromosomal position and the y location is the start position of the gene. For pch I use a straight line, so just the minus symbol, the color is the color that I just computed, and I make the symbols a little bit bigger. And then my plot looks like this, which is a kind of representation. So now you can see that we have a marker here for which the ratio was below one, and we have a
marker here on chromosome 3 for which the ratio was above one.

Of course I want to add the axes and a legend. So I say: put an axis on side 1, take the names from the chromosome info, which are 1, 2, 3, 4, and so on, but also X, Y, and MT, and put these at 1 to the number of chromosomes. I set las to 1 so that the labels are oriented the right way, and cex.axis to 1.5 so that they show up a little bit bigger. Then on the second axis I add the positions from 0 to the maximum length as a sequence, and I divide the labels by a million, because I don't want the number to have 6 or 7 zeros behind it. So at these positions, instead of writing one million, I just write 1. And then of course I want a legend in the top right of the plot. It says up and down, because the ratio is either above or below one, and I use the fill argument with colors 1 and 2 and make the cex 1.2.

So in the end it looks like this: I have my legend here, I have my chromosomes here, and here you see the megabase-pair positions along the chromosomes.

Of course, is it really finished? No, this is not finished by far. There are still a lot of things we need to do in R to get a good plot: we need to add a title, we need to add a description to the y-axis, we need to add a description to the x-axis, and here you see that the dotted lines go behind the legend, so we want to give the legend a background color so that it sits on top of them. So it takes a lot of time to create a good plot in R.
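Putting the steps together, the whole plot, including the finishing touches listed above, could look roughly like this. It is a sketch under assumptions: the column names, the marker rows, most chromosome lengths, the 10-megabase tick spacing, the title text, and the symbol size are all placeholders of mine, not the values from the lecture's script.

```r
# Stand-in data; only chr 1 and 2 lengths and the ratio 0.78 are from the lecture
chromosomes <- c(as.character(1:19), "X", "Y", "MT")
chrinfo <- data.frame(
  Chr    = chromosomes,
  Length = c(195e6, 182e6, seq(160e6, 60e6, length.out = 17), 170e6, 90e6, 16e3),
  stringsAsFactors = FALSE
)
alldata <- data.frame(Chr = c("1", "3", "X"), Start = c(45e6, 88e6, 20e6),
                      Ratio = c(0.78, 1.35, 1.10), stringsAsFactors = FALSE)
mlength <- max(chrinfo$Length)

# Static layer: empty plot, grid lines, chromosomes
plot(x = c(1, nrow(chrinfo)), y = c(0, mlength), type = "n",
     xaxt = "n", yaxt = "n", xlab = "", ylab = "")
abline(h = seq(0, mlength, 10e6), col = "lightgray", lty = "dotted")
for (n in seq_len(nrow(chrinfo))) {
  lines(x = c(n, n), y = c(0, chrinfo$Length[n]), lwd = 2)
}

# Data layer: one dash per marker.
# apply() passes each row as character strings, hence the as.numeric() calls.
cols <- apply(alldata, 1, function(x) {
  xloc <- match(x["Chr"], chromosomes)                  # "X" -> 20, "Y" -> 21, "MT" -> 22
  yloc <- as.numeric(x["Start"])
  col  <- as.numeric(as.numeric(x["Ratio"]) > 1) + 1    # 1 = black (down), 2 = red (up)
  points(x = xloc, y = yloc, pch = "-", col = col, cex = 2)
  col
})

# Axes with positions in megabases, then title, axis descriptions and legend
axis(1, at = seq_len(nrow(chrinfo)), labels = chrinfo$Chr, las = 1, cex.axis = 1.5)
ticks <- seq(0, mlength, 10e6)
axis(2, at = ticks, labels = ticks / 1e6, las = 1)
title(main = "Marker ratios along the genome", xlab = "Chromosome", ylab = "Position (Mb)")
legend("topright", legend = c("Down", "Up"), fill = c(1, 2), cex = 1.2, bg = "white")
```

Setting bg = "white" in the legend call is one way to keep the grid lines from showing through the legend box.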
My procedure for plotting: plots are often a multi-step procedure. I usually draw them very quickly by hand, very coarsely, and then I start thinking about how to make the plot I want with the data that I have. The way I do it is that I first do the static things, so the layout, like chromosomes or grid lines, and afterwards I take my data and plot it row by row onto the plot that I have. That's my standard procedure for plotting. So remember: it takes time to make things beautiful.

Also make sure that the plotting code you are writing is flexible. You can see that in the chromosome plot I take a lot of effort to not specify 23 or 22 or 21. I never write down the number of chromosomes explicitly; I always use the input file. So if I now want to plot not for mice but for cows, which have a completely different number of chromosomes, the plot code will still work, because I can add more chromosomes to the chromosome info, and that automatically gets picked up by my script, which will then draw 31 chromosomes instead of 19 or 20. That is what I mean by flexible plotting code: code which doesn't have hard-coded numbers in it but takes its numbers from a file, so I can just update the file, or use a different file for a different species, and I don't have to change the code.

What I also always do is make two versions of each plot.
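Both habits, flexible code and two output versions, can be sketched together. The helper name plot_chromosomes, its arguments, the file names, and all chromosome lengths below are hypothetical; in practice the tables would be read from one file per species.

```r
# Hypothetical helper: draws one vertical line per row of chrinfo, whatever the species
plot_chromosomes <- function(chrinfo, cex.axis = 1, col = "black") {
  mlength <- max(chrinfo$Length)
  nchr <- nrow(chrinfo)                       # taken from the input, never hard-coded
  plot(x = c(1, nchr), y = c(0, mlength), type = "n",
       xaxt = "n", yaxt = "n", xlab = "Chromosome", ylab = "Position (Mb)")
  for (n in seq_len(nchr)) {
    lines(x = c(n, n), y = c(0, chrinfo$Length[n]), col = col, lwd = 2)
  }
  axis(1, at = seq_len(nchr), labels = chrinfo$Chr, cex.axis = cex.axis)
  ticks <- seq(0, mlength, 10e6)
  axis(2, at = ticks, labels = ticks / 1e6, las = 1, cex.axis = cex.axis)
  invisible(nchr)                             # number of chromosomes drawn
}

# Made-up chromosome tables (placeholder lengths) for two species
mouse <- data.frame(Chr = c(1:19, "X", "Y", "MT"),
                    Length = c(seq(195e6, 60e6, length.out = 19), 170e6, 90e6, 16e3),
                    stringsAsFactors = FALSE)
cow <- data.frame(Chr = c(1:29, "X", "Y"),
                  Length = c(seq(158e6, 42e6, length.out = 29), 139e6, 43e6),
                  stringsAsFactors = FALSE)

# Version 1: bitmap with color and larger text, for slides
png(file.path(tempdir(), "chromosomes_slides.png"), width = 1024, height = 768)
plot_chromosomes(mouse, cex.axis = 1.5, col = "steelblue")
dev.off()

# Version 2: black and white PDF, for a paper
pdf(file.path(tempdir(), "chromosomes_paper.pdf"), width = 7, height = 5)
n_mouse <- plot_chromosomes(mouse)
dev.off()

# Same code, different species: only the input table changes
pdf(file.path(tempdir(), "chromosomes_cow.pdf"), width = 7, height = 5)
n_cow <- plot_chromosomes(cow)
dev.off()
```

Keeping the drawing code in one function means the slide and paper versions differ only in the device and a few arguments, so they cannot drift apart.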
I make one version which is suitable for presentations, and one version in PDF for an eventual publication or paper that we're going to write. There are different requirements for plots that go into a presentation compared to plots that go into a paper. In a presentation you generally want bigger dots and good colors, while journals sometimes want black and white plots: instead of a very colorful plot, the journal requires that the plot is black and white and uses a certain font, and that you can of course build in. So I always make two versions of each plot, one for PowerPoint and one for PDFs.

Okay, some requirements for a good plot. It has to have clear axes and clear labels, it has to have a good representation of quantities, and I always add the units to the axes, right? I want to know if it's wind in miles per hour, or something in centimeters or in meters, so you have to have units in your plots. A plot always needs a legend, and the legend needs to mention everything in the plot, so if there's a line in the plot, that line needs to be in the legend as well. And if they are available, add things like error bars and standard deviations, because if you work in science you have to show that there was some variation in your measurements. So if you have, not a box plot, but a bar plot, then the bars should have error bars showing the standard deviation that you saw in your data, because you don't have a single measurement, you have a group of measurements.

Alright, and then I'm through. I took a little bit more time than I had expected, so if there are any questions, ask them now. I will mute my desktop audio and then I will call Misha to see if that works. Alright, so I actually get bothered by a loud sound. Alright, so there we have Misha.
Hey, let me put the Skype thing open, that should be like this, and then see if that works. Properties, and I am talking to Misha, and then okay, and that works. Alright, so there you are Misha. Can you say something? Yeah, I can. Alright, very good. No glitches. No glitches. Can people hear Misha? That's the first question. I'm going to