Once we have those scores, I can calculate how similar any pair of sequences is. We'll come back to that a little bit, but that's the topic for an entire bioinformatics class. Now say I have a number of genes, or maybe different copies of a virus: A, B, C, D, E, and let's call the last one F. In principle, all of these are dots, and between every pair of dots I can calculate how similar they are. If two things are very similar, that corresponds to a short distance between them, so this is what we call a metric.
What we then try to do is find an algorithm. Let's assume that in this case A and F were the ones that were closest. Maybe I design it so that I say, you know, A and F were the closest, I'll join those, and then I'll create a new AF dot there. Then I repeat this, and I say, aha, after this, B and that new dot were the closest ones, so I merge those. Then C and D might have been closest, and then those were closest to E, and eventually everything has merged.
So this gives me a way to pretty much backtrack evolution, the idea being that things that are very far apart from each other must have separated a long time ago, while things that are very close to each other should have separated more recently. It is very difficult to assign a time scale to this, but we can do that partly by using those silent mutations and calculating the expected average mutation frequency in these genomes. And if we do that, we can do some really amazing things with sequences, including things that are important today.
This concept is called phylogenetic trees, and it literally traces out evolution. Kenneth Kidd at Yale University published extensively on this for human evolution. Using phylogenetic trees, we can trace human evolution back, in particular to Africa.
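The joining procedure described earlier (merge the two closest dots, replace them with a new dot, repeat) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the distances are made up for illustration, the function name `join_closest` is hypothetical, and the distance from a merged dot to everything else is taken as the size-weighted average of its members' distances (average linkage), which is one common choice and not necessarily the one used in practice.

```python
from itertools import combinations


def join_closest(names, dist):
    """Repeatedly merge the two closest clusters and return the merge order.

    names: list of leaf labels, e.g. ["A", "B", ...]
    dist:  dict mapping frozenset({x, y}) -> distance between x and y
    A merged cluster is represented as a nested tuple, e.g. ("A", "F").
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves in it
    d = dict(dist)                       # work on a copy of the distances
    merges = []
    while len(sizes) > 1:
        # Find the pair of current clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        merges.append(merged)
        na, nb = sizes.pop(a), sizes.pop(b)
        # Assumption: distance to the new dot is the size-weighted average
        # of the distances to its two members (average linkage).
        for other in sizes:
            d[frozenset((merged, other))] = (
                na * d[frozenset((a, other))] + nb * d[frozenset((b, other))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return merges


# Made-up distances, chosen so the merge order follows the example above.
names = ["A", "B", "C", "D", "E", "F"]
dist = {frozenset(p): 10.0 for p in combinations(names, 2)}
dist.update({
    frozenset("AF"): 1.0,
    frozenset("AB"): 3.0, frozenset("FB"): 3.0,
    frozenset("CD"): 4.0,
    frozenset("CE"): 6.0, frozenset("DE"): 6.0,
})

merges = join_closest(names, dist)
for step, pair in enumerate(merges, start=1):
    print(step, pair)
```

The last entry in `merges` is the full tree as nested tuples. Turning the merge distances into a time scale would need a mutation-rate estimate on top of this, which the sketch does not attempt.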
So 150,000 to 100,000 years before present, we had by far the largest evolutionary diversity in Africa, and we know that because we can still see that evolutionary diversity in Africa. What then happened, maybe some 100,000 years ago: we can see that the genes closest to those in Africa are the genes in the Middle East. So somewhere around 100,000 years or so before now, we had an exodus from Africa of just a small part of the population. You see here that the blue and yellow genes stem from the northeastern part of Africa. This gene population then spread, some 40,000 to 50,000 years ago, gradually to the entire rest of the world. And that's where all of us originate from. Although, well, some of us originate from here. But we can literally use phylogenetic trees and sequences of humans to see how evolution has happened and where we moved. And this is pretty much anthropology, right, rather than just biochemistry.
There are some more current examples of this. These are COVID samples. COVID is a much easier genome to sequence than ours. Our genome is roughly 3 billion base pairs. The COVID genome is just 30,000 RNA bases, so it's not even pairs. For fun, you can calculate the cost of sequencing a COVID genome. The raw cost of this sequencing is going to be close to nothing.
This is not just a plot, but a movie. The different colors and the different markings here correspond to different strains that have been identified. Because remember, we're sequencing a lot of these in every country in the world now. Those sequences are collected in a large database called GISAID, which is updated daily. Actually, I think this plot is not from GISAID but from Nextstrain. They update this daily and recreate the phylogenetic trees. There's so much information here that the part out here is not going to change so much.
But in theory, as new sequences come in, the latest part, say, could change a bit. Because again, it's not an exact science. We're talking about probabilistic data here. And you might in particular see a few recent colors here that have appeared in the last few months.
I'll start this movie so you can see what happens. So this starts roughly a year ago. Initially, there were only a few strains. Then gradually, as things spread all over the world, we started to see more and more copies of the virus in total. And as there were more copies, there were more subjects, humans, in whom the virus had a chance to mutate. This will, of course, make it look as if the mutation rate is speeding up, but it isn't really. It's just that there are more places where it can happen.
Things were still fairly boring in the first part of 2020. And then around the middle of 2020, a little while from here, we started to see some new strains showing up. It was the second half, I think. So the top red one here, I think, is the V1 strain. That is the so-called UK version of COVID. V2 is the South African one down here. And V3, which is the most recent here, is the Brazilian strain. There is a lot of media coverage about all of these now. And I'm not going to say that they're necessarily dangerous strains. They might very well be slightly more contagious, or possibly very much more contagious, but we simply don't know yet. This is how science develops, and we don't have certain data. The point, though, is that with modern sequencing and everything, we are literally tracking things in real time, pretty much as mutations happen. Within days of a new mutant being detected in Sweden, it's going to show up in this database.
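The earlier point that the mutation rate only appears to speed up can be made concrete with a toy calculation. This is just a sketch with invented numbers: the per-infection rate below is hypothetical and is not an estimate for COVID. The only thing it illustrates is that a constant rate per infection still gives more new mutations in total when there are more infections.

```python
# Hypothetical value for illustration only, NOT a measured rate:
# expected number of new mutations arising per infection.
MUTATIONS_PER_INFECTION = 0.1


def expected_new_mutations(infections, rate=MUTATIONS_PER_INFECTION):
    """Expected total number of new mutations for a given number of infections."""
    return infections * rate


# The per-infection rate never changes, only the number of infections does.
for infections in (1_000, 100_000, 10_000_000):
    total = expected_new_mutations(infections)
    print(f"{infections:>10} infections -> about {total:.0f} new mutations")
```

With 100 times more infections we expect 100 times more new mutations, even though the rate per infection is exactly the same.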