Welcome to this short introduction to the core concepts of phylogeny. In this presentation I'll cover how we classify organisms and how molecular sequence data has revolutionized the way we infer the evolutionary histories of both species and individual genes. I'll start by introducing the basic terminology and then go on to talk about how trees are made and how comparing trees allows us to subdivide homologs into so-called orthologs and paralogs.

Starting with the terminology. Before phylogeny we had taxonomy, which is the field of classifying organisms. Organisms were classified into groups called taxa at many different levels, starting from the top-level domains such as eukaryotes, going through kingdoms, phyla, classes, orders, families and genera, to finally arrive at individual species. In phylogeny we aim to do much the same by capturing the evolutionary history in the form of a phylogenetic tree: a species tree, which looks like this, with individual species as leaf nodes and a tree representing their evolutionary relationships. The tree has many different branches, and each branch corresponds to a monophyletic group, that is, a group made up of all the organisms descended from one common ancestor. These groups are also commonly referred to as clades, and clades are closely related to taxa in taxonomy, since in a well-defined taxonomy the different taxa will correspond to monophyletic groups of organisms.

Gene trees and species trees are really the core of phylogeny. If you have a set of homologous genes from multiple different species, that is, genes that share common ancestry, you can construct a multiple sequence alignment of their sequences. Having an alignment like that, you can infer a gene tree in a couple of different ways. One is maximum parsimony, which aims to minimize the number of mutations needed to explain the observed gene sequences.
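
To make the parsimony idea concrete, here is a minimal Python sketch that counts the minimum number of changes a fixed tree needs to explain an alignment, using the Fitch small-parsimony algorithm. It only scores candidate trees that you hand it and does not search tree space. The toy alignment, the leaf names A to D, and the function names are hypothetical and chosen purely for illustration.

```python
def fitch_column_score(tree, column):
    """Minimum number of changes needed on `tree` for one alignment column.

    tree   : nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    column : dict mapping leaf name -> character at this alignment position
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):          # leaf: its observed character
            return {column[node]}
        left = state_set(node[0])
        right = state_set(node[1])
        common = left & right
        if common:                         # children agree: no change needed
            return common
        changes += 1                       # children disagree: one change
        return left | right

    state_set(tree)
    return changes


def parsimony_score(tree, alignment):
    """Sum the Fitch score over all columns of an ungapped toy alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_column_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(length)
    )


# Hypothetical toy alignment of four homologous sequences.
alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCGT"}

# Two candidate topologies (illustrative only).
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

scores = {
    "tree_1": parsimony_score(tree_1, alignment),
    "tree_2": parsimony_score(tree_2, alignment),
}
print(scores)
```

Under maximum parsimony, the candidate tree with the lower score is preferred. A real analysis would also have to search over many possible topologies, which this sketch leaves out.
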
Nowadays most people instead use so-called maximum likelihood methods, which use an explicit evolutionary model and try to find the gene tree that gives the highest likelihood of the observed sequences. If you want to make a species tree, you need to find so-called marker genes, which are genes that are ideally universally present in single copy. That means every genome has this gene, and has it in only one copy. With multiple marker genes you can either construct a concatenated alignment and infer a species tree directly from that, or you can make a tree from each individual marker gene and then build a consensus tree to get your species tree.

This finally gets us to the topic of orthologs and paralogs. If you have a gene tree and a species tree, you can compare the two and trace gene evolution through species. In this slightly complicated figure you see six different genes from three organisms, the organisms being A, B and C and the genes being A1, B1, B2, C1, C2 and C3. Several events have happened during evolution in this example: two speciation events and two gene duplication events. If you look at a pair of homologous genes, you can now assign it to one of two types, namely orthologs and paralogs. These are defined by the kind of evolutionary event that separated the two genes, which can be either a speciation event or a gene duplication event.

Let's look at the figure again. If you look at A1 and pair it with any of the other five genes, you will see that the two genes trace back to speciation event number one. This means that A1 is an ortholog of all the other genes. If you look at B1 and C1 and trace them back, you will see that they meet at speciation event number two. This means that B1 and C1 are orthologs as well. But if you look at B1 and C2 and trace them back, they meet at gene duplication event number one and are thus paralogs. And similarly, C2 and C3 meet at gene duplication event number two and are also paralogs.

So why should you care about all this?
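
As a brief aside before answering that, the tracing-back exercise on the figure can be sketched in a few lines of Python. Each internal node of the gene tree is labeled with the event it represents, and every pair of genes is classified by the event at which the two genes meet. The tree below is an assumption reconstructed from the four relationships described above; in particular, the placement of B2 is not stated in the talk and is a guess made only to complete the example. The function names are hypothetical.

```python
# Internal nodes are (event_label, left_child, right_child); leaves are gene names.
# ASSUMPTION: topology reconstructed from the described pairs; B2's position is a guess.
gene_tree = (
    "speciation 1",
    "A1",
    (
        "duplication 1",
        ("speciation 2", "B1", "C1"),
        ("speciation 2", "B2", ("duplication 2", "C2", "C3")),
    ),
)


def leaves(node):
    """Return the gene names below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, result=None):
    """Map each gene pair to (relationship, event) based on where the pair meets."""
    if result is None:
        result = {}
    if isinstance(node, str):
        return result
    event, left, right = node
    kind = "ortholog" if event.startswith("speciation") else "paralog"
    # Genes on opposite sides of this node meet exactly here.
    for a in leaves(left):
        for b in leaves(right):
            result[frozenset((a, b))] = (kind, event)
    classify_pairs(left, result)
    classify_pairs(right, result)
    return result


pairs = classify_pairs(gene_tree)
for a, b in [("A1", "C3"), ("B1", "C1"), ("B1", "C2"), ("C2", "C3")]:
    print(a, b, pairs[frozenset((a, b))])
```

The four pairs printed at the end are the ones discussed for the figure. Any relationship involving B2 depends on the assumed placement and should not be read as coming from the original slide.
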
Well, first of all, phylogeny allows us to answer very fundamental questions about life and how it came about. But it also has very direct applications. These include, in particular, function prediction, where the ortholog conjecture states that it's better to infer function from orthologs than from paralogs. And orthology is also used a lot for interaction transfer, that is, for transferring protein interactions from proteins in one organism to proteins in another. If you want to learn more about how this latter type of network is constructed, I suggest you go watch this presentation next. Thanks for your attention.