The joys of phylogeny: a how-to.
In the last video I showed how inherited DNA markers are the basis of paternity tests. The same process of testing for inherited genetic markers is used by scientists to group organisms into phylogenetic trees, based on the similarity of their sequences. I showed a single gene product, catalase, and how the homology of markers correlated well with the taxonomy of shared characteristics.
Warning! What follows is actual science. Real biologists use these tools. There are no funny animations or lolcats beyond this point. It's just me and some cutting-edge bioinformatics tools, up late at night.
In this video I'd like to show you how to construct a phylogenetic tree on your own. Maybe I'm a bio nerd, but I find the process enjoyable, and the results are intriguing. It's also great to do with a friend or family member who is skeptical of common descent. If you have the responsibility of teaching science, this might be an excellent exercise for students of many ages. All you need is a computer and a very basic understanding of proteins and taxonomy. There are many ways to do this, but I'm just going to show you the one that I use.
Step one: open up a text file. We're using Notepad today. The data is going to be in a format called FASTA (say it "fast-A"). Each entry starts with a greater-than symbol, followed by the name of the organism, with no spaces inside it. The first space ends the name, and anything after it will not appear on the phylogram. Below that is the protein sequence. We're going to grab this out of a database, but I wanted to show you the basic formatting first. It's about as simple as it gets. I'm going to go ahead and clear this out.
Now we're going to go to the National Center for Biotechnology Information. Here's the link. It's publicly available. This is a database of all sorts of wonderful things. I swear I could spend weeks here just browsing through the information.
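If you'd like to see the FASTA layout from the Notepad step as something concrete, here is a minimal sketch in Python. The function name `parse_fasta` and the short sequences are placeholders invented for illustration; they are not real GAPDH data and not part of any tool used in the video. The sketch only shows the rule described above: a greater-than symbol, a name that ends at the first space, then the sequence on the following lines.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {name: sequence}.

    The name is whatever follows '>' up to the first whitespace;
    anything after that on the header line is ignored.
    """
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            name = parts[0] if parts else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)  # sequence may span several lines
    return {n: "".join(pieces) for n, pieces in chunks.items()}


# Placeholder sequences, made up for this example only.
example = """>human text after the first space is dropped
MKVLAAGIVG
ALLAQHETSR
>chimp
MKVLAAGIVGALLAQHETSR
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), seq)
```

Both entries end up with the same 20-letter sequence, because the line breaks inside a sequence don't matter; only the header line is special.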
You can look at, say, all the H1N1 influenza sequences that have been isolated. Lots of databases can be searched here. Today, I'm going to focus on HomoloGene, which is one of my favorite tools. What it's done is it's already gone through and found homologous genes across a variety of different organisms. The gene we're going to look at now is GAPDH. This is a really common gene to design assays for in a research setting, as a housekeeping gene, because its expression is fairly constant. It's also highly conserved. So let's hit go, and we'll search HomoloGene for GAPDH. The first hit we get here is GAPDH, so that's great. This is HomoloGene record 107053.
You see here a long list of conserved GAPDH genes. We've got everything from Homo sapiens, that's us. Pan troglodytes, that's a chimp. Canis lupus familiaris, that's a dog. Bos is a cow. Mus is a mouse. Rattus, the rat. Gallus is a chicken. Danio is a zebrafish. Drosophila is a fruit fly. Anopheles is a mosquito. C. elegans is a roundworm. Schizosaccharomyces is a yeast. Saccharomyces is a yeast. Kluyveromyces is a fungus. This one is, I believe, a cotton fungus. This one is a fungus of rice; it's rice blast. This is a slime mold. This is a common research plant; the common name is cress. This is Oryza sativa, which is rice. And this is Plasmodium falciparum, which is the parasite that causes malaria. Pretty sure. So this is conserved in Eukaryota. That means it crosses family lines.
There are lots of ways we can look at this data. For example, I'd like to show you the alignment score information. So here is everything scored against Homo sapiens. Here's the GAPDH homology, or rather identity, for the protein sequence and for the DNA sequence. You can see that chimp DNA is very slightly different from the human DNA, but the proteins are identical. That means the mutation that occurred here in the DNA didn't change the protein here.
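To make that last point concrete, here is a small sketch in Python of how two DNA sequences can differ while coding for the same protein. The two 12-letter DNA strings are invented for illustration and are not the real human or chimp GAPDH sequences. The codon table is deliberately tiny and only covers the codons used here (those assignments are from the standard genetic code), and the function names are my own.

```python
# A deliberately tiny slice of the standard genetic code.
CODONS = {
    "ATG": "M",
    "GGT": "G", "GGC": "G",   # two different codons, same amino acid
    "AAA": "K", "AAG": "K",
    "GTG": "V", "GTT": "V",
}


def translate(dna):
    """Translate DNA (length divisible by 3) using the tiny table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(a, b):
    """Percent of positions that match between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Made-up sequences that differ at one position (GGT vs GGC).
dna_a = "ATGGGTAAAGTG"
dna_b = "ATGGGCAAAGTG"

protein_a = translate(dna_a)
protein_b = translate(dna_b)

print("DNA identity:     %.1f%%" % percent_identity(dna_a, dna_b))
print("Protein identity: %.1f%%" % percent_identity(protein_a, protein_b))
```

One letter out of twelve differs in the DNA, but both versions translate to the same four amino acids, so the protein identity stays at 100%. That is the same pattern as in the human and chimp rows of the score table.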
When we look at dog and cow and mouse and rat and chicken and zebrafish, you can see that as we get further and further away taxonomically from Homo sapiens, we get increasingly large differences in the protein, but much more rapidly increasing changes in the DNA. In fact, these substitution rates are a quantitative estimate of those changes: how many of the changes in the DNA actually change amino acids in the protein. As we go down, we notice that even rice has a 71.8% similarity to the human sequence. You'll also notice that hardly any of these numbers are the same, except in a few cases. What we're looking at here is a great deal of diversity among the sequences.
So let's go to the FASTA format; that's the one we're going to use for our gene. Here's what we have, all the way from here to here. I'm sure there are better ways to do this, but I'm just going to copy with Control-C and paste with Control-V. Now, that's a lot of information here. Unless you want your phylogram to show the GI number, the best thing you can do is highlight these headers and change them to something a little bit simpler, like dog. Pan troglodytes we can change to chimp. I've already done this for this gene. I've created a file called GAPDH and gone through and changed the names to human, chimp, dog. Some of these, you'll see, are cow1 and cow2. What's happened here is that the gene has been duplicated, and we'll be able to see how recently it was duplicated, or how much it has diverged between the two, what are called isoforms.
So here's our GAPDH data in FASTA format. I'm going to hit Control-A and copy this information. Now we're going to go to a new website. This one is at the EMBL, which I forget what that stands for, but these folks host Clustal, which is the name of the program. Clustal. I've never actually pronounced it. Here you can paste the sequences in, if you like it that way. Or, as I've done before, I can browse directly to my GAPDH file, select it, and then hit run.
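I did the renaming by hand, but if you have a lot of sequences, that step could be sketched as a short Python script. Everything specific here is an assumption for illustration: the header layout (a pipe-separated identifier followed by the species in square brackets), the made-up identifiers and sequences, the `SHORT_NAMES` table, and the function names. Check what your downloaded headers actually look like before relying on it.

```python
import re
from collections import Counter

# Hypothetical mapping from species name to the short label we want.
SHORT_NAMES = {
    "Homo sapiens": "human",
    "Pan troglodytes": "chimp",
    "Canis lupus familiaris": "dog",
    "Bos taurus": "cow",
}


def species_of(header):
    """Return the text inside the trailing [brackets], or None."""
    match = re.search(r"\[([^\]]+)\]\s*$", header)
    return match.group(1) if match else None


def rename_headers(fasta_text):
    """Replace long headers with short names; number duplicated species."""
    lines = fasta_text.splitlines()
    # First pass: count how many entries each species has.
    totals = Counter(species_of(l) for l in lines if l.startswith(">"))
    seen = Counter()
    out = []
    for line in lines:
        if not line.startswith(">"):
            out.append(line)  # sequence lines pass through untouched
            continue
        species = species_of(line)
        short = SHORT_NAMES.get(species)
        if short is None:
            out.append(line)  # unknown species: leave the header alone
            continue
        seen[species] += 1
        # Only add a number when the species occurs more than once.
        label = f"{short}{seen[species]}" if totals[species] > 1 else short
        out.append(">" + label)
    return "\n".join(out) + "\n"


# Made-up headers and sequences, for illustration only.
raw = """>gi|0000001|ref|XX_000001.1| example protein [Homo sapiens]
MKV
>gi|0000002|ref|XX_000002.1| example protein [Bos taurus]
MKI
>gi|0000003|ref|XX_000003.1| example protein [Bos taurus]
MKL
"""

print(rename_headers(raw))
```

Species that appear twice get numbered, which is how you end up with labels like cow1 and cow2, while single entries just get the plain name.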
I don't change any of the settings here. It will take about 30 seconds to finish processing the job. I'm going to speed time up like they do on the cooking shows and cut to an already finished product. So I've already run this job. What it's giving me here is a very plain vanilla alignment of all these sequences. Let's put the names next to them so it's a little bit more interesting to look at. It has allowed for gaps, so when you see a dash, that's where it inserted space to allow for the alignment of the sequence. We'll see this in a much more colorful fashion.
What I'm interested in seeing here is called the guide tree. So I'm going to click on that; it's going to take a second. And here is how the software sees the genetic sequences. We're looking at the number of changes that have occurred, or percent changes. But it's going to take that and turn it into this graph. Now, this is a cladogram. The thing about a cladogram is that the branches all terminate at the right side of the graph. The distances here are less important. There's no supposition about the genetic distance. If we right click, we get a few more editing options. I'm going to change the format from cladogram to phylogram. And now the distances here reflect the genetic similarities or differences.
So here's human and chimp. They're right on top of each other because, of course, we saw the sequences were the same. Here's cow one and cow two, the duplicated genes. Those are obviously very recently duplicated genes. And they're related to dog. Down here, we have four of the isoforms from the rat. They're all very closely related to each other, and they're very closely related to the mouse. Then as we go along, all the chordates are close together, from chicken and zebrafish to human and chimp. The invertebrates are, well, these insects, fruit fly and mosquito, are close together. The plants are close together, but there was one unusual fit, and that was the slime mold here.
I would have expected the slime mold to land somewhere a little bit different. Then we have the two fungi here, and then the three genes from the yeast. But it's interesting how far back these diverge. The two yeasts, S. pombe and S. cerevisiae, are very distantly related, which still just amazes me. And then, of course, the malaria parasite is sort of our outgroup here, because it's obviously not very closely related to any of these.
That's it. You now know enough to make your own phylogram. If you do choose to run out and make one, do a screen capture and post it as a reply to this video. I hope you are inspired to try a little phylogenetics on your own. Thanks for watching.
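For anyone curious how percent changes between aligned sequences can be turned into a tree at all, here is a toy sketch in Python. To be clear about the assumptions: this uses a simple average-linkage grouping (UPGMA-style), which I've swapped in because it is easy to follow; it is not necessarily what the alignment server runs to build its guide tree. The four aligned sequences are made up, the function names are my own, and the output is only the branching order in Newick text, with no branch lengths.

```python
from itertools import combinations


def percent_difference(a, b):
    """Percent of aligned columns that differ, skipping columns with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    diffs = sum(x != y for x, y in pairs)
    return 100.0 * diffs / len(pairs)


def upgma(aligned):
    """Group sequences by repeatedly merging the two closest clusters.

    Returns the branching order as a Newick string (topology only).
    """
    sizes = {name: 1 for name in aligned}  # cluster label -> leaf count
    dist = {
        frozenset((a, b)): percent_difference(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair of clusters
        merged = f"({a},{b})"
        new_size = sizes[a] + sizes[b]
        # Distance to the merged cluster is the size-weighted average.
        for other in sizes:
            if other in (a, b):
                continue
            d_a = dist[frozenset((a, other))]
            d_b = dist[frozenset((b, other))]
            dist[frozenset((merged, other))] = (
                d_a * sizes[a] + d_b * sizes[b]
            ) / new_size
        # Drop every distance that involved the two old clusters.
        for key in [k for k in dist if a in k or b in k]:
            del dist[key]
        del sizes[a], sizes[b]
        sizes[merged] = new_size
    return next(iter(sizes)) + ";"


# Made-up aligned sequences; '-' marks a gap inserted by the aligner.
aligned = {
    "human": "MKVLAAGIVG",
    "chimp": "MKVLAAGIVG",
    "dog":   "MKVLSAGIVG",
    "yeast": "MRILS-GLIG",
}

print(upgma(aligned))  # prints (((chimp,human),dog),yeast);
```

Human and chimp merge first because their toy sequences are identical, dog joins next, and yeast comes in last, which mirrors the general shape of the real GAPDH tree even though the data here is invented.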