In this part of the OMA tutorial series, we will discuss hierarchical orthologous groups. In the last tutorial, we talked about the OMA algorithm and the different types of homologs it infers. The final part of the OMA algorithm infers hierarchical orthologous groups, or HOGs for short. HOGs are defined as groups of genes that have descended from a common ancestral gene in a given clade of species.

Let's start by looking at the insulin gene in human and rodents. In human there is one copy, but in each of the two rodents, mouse and rat, there are two copies. What's going on here? To answer this question, we need to look at the history of these five genes, which is depicted in this phylogenetic tree. Here, the node with a star indicates a duplication event, and the nodes labeled with S indicate speciation events. S1 is the mammalian speciation, while S2 is the rodent speciation. There was only one copy of the insulin gene in the ancestor of all mammals, so all insulin genes in mammals are derived from it and should be in one HOG. This includes the human insulin gene.

In terms of orthology and paralogy relationships, a HOG contains orthologs and in-paralogs. Orthologs, as you may remember, are genes related by speciation. This could be the basal speciation, the one that's used to define the HOG, S1 in this example, or it could be a subsequent speciation, S2. As for the in-paralogs, these are genes related by duplication, but importantly, these duplications must have happened within the clade in question. For instance, insulin 1 in mouse and insulin 2 in rat are in-paralogs relative to all mammals, and are therefore in the same HOG at this level.

Mice and rats both have two insulin genes, which suggests that the duplication happened before the rodent speciation and that their common ancestor already had these two copies. So each insulin gene in present-day mouse and rat can be traced back to one or the other copy. This defines two HOGs at the level of rodents.
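To make the definition concrete, here is a minimal sketch of how HOGs could be read off a gene tree whose internal nodes are already labeled as speciations or duplications. This is not the OMA algorithm itself, only an illustration of the definition. The `Node` class, the `hogs_at` function, the clade names and the gene labels are all made up for this example, and the sketch assumes every gene in the tree belongs to a species inside the clade of interest.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str            # gene name for leaves, event label for internal nodes
    event: str = "leaf"   # "leaf", "speciation", or "duplication"
    clade: str = ""       # clade that starts diversifying here (speciation nodes only)
    children: list = field(default_factory=list)


def leaves(node):
    """Collect the gene names found below a node."""
    if node.event == "leaf":
        return [node.label]
    return [gene for child in node.children for gene in leaves(child)]


def hogs_at(node, clade):
    """Return one set of genes per HOG defined at the given clade.

    Each speciation node labeled with the clade stands for one ancestral
    gene, so all genes below it form one HOG.
    """
    if node.event == "speciation" and node.clade == clade:
        return [set(leaves(node))]
    return [hog for child in node.children for hog in hogs_at(child, clade)]


# The insulin tree described above (hypothetical labels): S1 is the
# mammalian speciation, the duplication follows, and S2 is the rodent
# speciation, which appears once per duplicated copy.
insulin_tree = Node("S1", "speciation", "Mammalia", [
    Node("human_insulin"),
    Node("duplication", "duplication", children=[
        Node("S2", "speciation", "Rodentia",
             [Node("mouse_insulin1"), Node("rat_insulin1")]),
        Node("S2", "speciation", "Rodentia",
             [Node("mouse_insulin2"), Node("rat_insulin2")]),
    ]),
])

mammal_hogs = hogs_at(insulin_tree, "Mammalia")
rodent_hogs = hogs_at(insulin_tree, "Rodentia")
print(len(mammal_hogs), "HOG at the mammal level")
print(len(rodent_hogs), "HOGs at the rodent level")
```

With this toy tree, all five genes end up in a single group at the mammal level, while the rodent level gives two groups, one per duplicated copy.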
You can see that it's really important to specify the clade, that is, the taxonomic level, for which the HOGs are defined. At the rodent taxonomic level, by contrast, insulin 1 in mouse and insulin 2 in rat are out-paralogs. This is because they started diverging at a duplication that happened before the rodent speciation. Therefore, they are in different HOGs relative to this level.

Now, if we compare HOGs defined at different levels, we see that the more basal HOGs encompass multiple smaller HOGs. This is where the hierarchical part of the name comes from. Although these formal definitions are a bit complicated, HOGs correspond to the intuitive framework used by most biologists to study gene families across different levels of resolution. When we casually say the insulin gene in mammals, we refer to the collective members of the one and only insulin HOG defined at the level of all mammals. In particular, this includes the two rodent copies, but there is no attempt to differentiate between them. At the level of mammals, it's all just lumped into one concept. By contrast, when we refer to the two rodent copies, we mean that we should consider two types of genes, which might have differentiated in subtle ways. We distinguish insulin 1 from insulin 2. It's therefore quite natural that there should be two HOGs at that level.

Now that we know what HOGs are, in the next part of this series, we will discuss how to explore HOGs with the OMA browser.
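As a small illustration of why the taxonomic level matters, the sketch below classifies a pair of genes as orthologs, in-paralogs or out-paralogs relative to a chosen clade. It is only a toy under stated assumptions: the tree is written as nested tuples of the form `(event, clade, children)`, the names `path_to` and `relation`, the clade names and the gene labels are invented for this example, and both genes are assumed to be present in the tree and to belong to species inside the clade.

```python
# Toy insulin tree: (event, clade, children); leaves are plain gene names.
INSULIN_TREE = ("speciation", "Mammalia", [
    "human_insulin",
    ("duplication", None, [
        ("speciation", "Rodentia", ["mouse_insulin1", "rat_insulin1"]),
        ("speciation", "Rodentia", ["mouse_insulin2", "rat_insulin2"]),
    ]),
])


def path_to(node, gene, path=()):
    """Return the internal nodes from the root down to the gene, or None."""
    if isinstance(node, str):
        return path if node == gene else None
    for child in node[2]:
        found = path_to(child, gene, path + (node,))
        if found is not None:
            return found
    return None


def relation(tree, gene_a, gene_b, clade):
    """Classify two genes relative to the given clade."""
    path_a, path_b = path_to(tree, gene_a), path_to(tree, gene_b)
    # The shared start of both paths ends at the last common ancestor.
    shared = []
    for node_a, node_b in zip(path_a, path_b):
        if node_a is not node_b:
            break
        shared.append(node_a)
    last_common = shared[-1]
    if last_common[0] == "speciation":
        return "orthologs"
    # A duplication counts as "within the clade" only if the clade's own
    # speciation node lies above it in the tree.
    within_clade = any(
        node[0] == "speciation" and node[1] == clade for node in shared[:-1]
    )
    return "in-paralogs" if within_clade else "out-paralogs"


for level in ("Mammalia", "Rodentia"):
    print(level, relation(INSULIN_TREE, "mouse_insulin1", "rat_insulin2", level))
```

The same pair of genes gets a different label depending on the level, which mirrors the point made above: the genes sit in one HOG at the mammal level and in two different HOGs at the rodent level.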