Okay, thank you, Marisa, for this nice introduction. I'd like to thank all the organizers for inviting me and for giving me the opportunity to give this talk to this amazing group of scientists. I'm a PhD student at the University of Campinas, and I'd also like to thank Marcus de Aguiar, who is my PhD advisor, and all the friends, colleagues, and collaborators in Marcus's lab. I will talk about the Derrida-Higgs model, a neutral evolutionary process that shows sympatric speciation.
So let's talk a little bit about speciation. Species formation processes are at the center of biology. They are important not only for explaining the patterns of diversity that we observe in nature now, but also for understanding the origins of biodiversity, and maybe its possible futures, which we can forecast under different scenarios and conditions. It's useful to classify speciation according to its geographical mode. On one side of the arrow there is allopatric speciation, which is the emergence of species with the interruption of gene flow due to geographical barriers, such as mountains, rivers, or a piece of land separating two lakes. On the other side of the arrow there are the sympatric speciation processes, which are those that occur without the complete interruption of gene flow. A good definition of sympatric speciation is due to Gavrilets: it is the emergence of new species from a population where mating is random with respect to the place of birth of the mating partners. It was proposed by Darwin as an important process of diversification. And although it is a very contentious process, there are some examples in the literature claiming that some species have emerged through sympatric speciation.
How can we model this kind of process mathematically? Here I will talk about the Derrida-Higgs model, which is a neutral model.
So there are no selection mechanisms forcing the diversification, and it shows sympatric speciation. It was based on spin glass theory and was proposed in 1991 by these two guys.
How does it work? We start with a population of N identical individuals. Each individual is defined by a binary chain of +1 and -1 of size B, which is the genome of the individual. Actually, the entries of the binary chain can be any pair of numbers you want, so you can choose +1 and 0, or whichever pair of numbers you like the most. For every pair of individuals, alpha and beta, we can measure how close they are with a quantity called the similarity between alpha and beta, which simply counts the number of alleles they have in common, up to a normalization. It is defined in such a way that the similarity between two identical individuals is one.
How does the evolution occur? It's an individual-based (IBM) process. We have our population of N individuals, and we choose a pair of them, an individual alpha and an individual beta. Each one has its own genome sequence, this binary chain. They reproduce and generate an offspring, whose genome is given by the combination of the genomes of its parents: each allele can come from parent alpha or parent beta with the same probability. That is how we build the new genome. Additionally, there is a mutation rate, so every allele can flip to the other value with a probability related to the mutation rate. We repeat this process of choosing a pair N times, and so from one population we build an entire new population. This process has non-overlapping generations: generation T generates the new generation T plus one, which will be a slightly different population from the previous one. So what do we analyze in this model?
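The reproduction step described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used for the results in the talk: the names `similarity_matrix`, `next_generation`, and `flip_prob` are hypothetical, the two parents are assumed to be distinct, and the per-allele flip probability is passed in directly because the talk only says it is "related to the mutation rate".

```python
import numpy as np

def similarity_matrix(genomes):
    """q[a, b] = (1/B) * sum_i S_i^a * S_i^b for genomes with entries +1/-1."""
    B = genomes.shape[1]
    return genomes @ genomes.T / B

def next_generation(genomes, flip_prob, rng):
    """Build generation t+1 from generation t (non-overlapping generations)."""
    N, B = genomes.shape
    children = np.empty_like(genomes)
    for k in range(N):
        # Assumption: the two parents are distinct individuals chosen at random.
        a, b = rng.choice(N, size=2, replace=False)
        # Each allele comes from parent a or parent b with equal probability.
        from_a = rng.random(B) < 0.5
        child = np.where(from_a, genomes[a], genomes[b])
        # Each allele flips independently with probability flip_prob.
        flips = rng.random(B) < flip_prob
        children[k] = np.where(flips, -child, child)
    return children

# Illustrative parameters (not taken from the talk).
rng = np.random.default_rng(0)
N, B, flip_prob = 50, 200, 0.005
pop = np.ones((N, B), dtype=int)      # identical initial population
q0 = similarity_matrix(pop)           # every entry equals 1
for _ in range(20):
    pop = next_generation(pop, flip_prob, rng)
q = similarity_matrix(pop)            # off-diagonal entries have drifted below 1
```

With entries +1/-1 the similarity is a normalized dot product, so an individual always has similarity one with itself and the matrix is symmetric.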
The thing that we analyze here is the evolution of the similarity distribution. We start with an identical population, so there's a peak at one. Then the distribution starts to evolve, moving towards smaller values, and at some time we can see that it finds an equilibrium. How can I explain that? For the infinite genome case, we can prove these equations: the similarity between two individuals of the next generation is given in terms of the similarity of their parents. From this equation we can define an algorithm, and we can run this algorithm to generate the whole evolution of the similarity matrix of the population. Here is something that you can find: after some time, the distribution reaches an equilibrium. We can calculate this equilibrium analytically. We simply take the ensemble average of this equation, and we find a recurrence equation for the average value of the similarity, which has an equilibrium solution. This is the equilibrium that you can see here.
However, purely random mating is unrealistic, and it is also not interesting, because nothing happens. So what we're going to do now is include a minimal similarity. Now there's this threshold: if a given pair has its similarity above the threshold, they can reproduce; otherwise they can't. What we have now is that the similarity starts to evolve and moves towards the equilibrium, but at some time it crosses the threshold. This is an important point, because a large number of pairs cannot reproduce anymore. Something weird happens, and from the simulations we find that this is what occurs: these peaks appear. We can relate each peak to a species. We can see this peak as species A and this other peak as species B, and this one will hold the inter-specific similarity between both species.
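The equations on the slides are not part of this transcript. As a hedged sketch, the block below iterates the ensemble-averaged recurrence in the form it is usually written for this model, assuming two offspring of the same parent keep a correlation factor exp(-4*mu) per generation and that two randomly chosen parents are the same individual with probability 1/N. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def mean_similarity_step(q_mean, N, mu):
    """One generation of the averaged recurrence (assumed standard form):
    <q>_{t+1} = exp(-4*mu) * (1/N + (1 - 1/N) * <q>_t)."""
    return math.exp(-4.0 * mu) * (1.0 / N + (1.0 - 1.0 / N) * q_mean)

def equilibrium_similarity(N, mu):
    """Fixed point of the recurrence above."""
    decay = math.exp(-4.0 * mu)
    return (decay / N) / (1.0 - decay * (1.0 - 1.0 / N))

def crosses_threshold(q_min, N, mu):
    """True if the mean similarity, relaxing from 1, has to pass below q_min."""
    return q_min > equilibrium_similarity(N, mu)

# Illustrative parameters with 4 * mu * N = 5.
N, mu = 1000, 0.00125
q_mean = 1.0                           # identical initial population
for _ in range(20000):
    q_mean = mean_similarity_step(q_mean, N, mu)

q_eq = equilibrium_similarity(N, mu)
q_approx = 1.0 / (1.0 + 4.0 * mu * N)  # small-mu, large-N approximation
```

For small mutation rates and large populations the fixed point is close to 1 / (1 + 4 * mu * N), so under these assumptions the equilibrium depends only on the population size and the mutation rate.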
But why can we do that? Let me try to explain. The best way to explain this is through network theory. We can think of the individuals as nodes of a network, and two nodes are connected if the individuals can reproduce, that is, if their similarity is above the threshold. What the Derrida-Higgs dynamics is doing is simply erasing some connections of this network, and at some point components form in this network. And then, I'm sorry, I'm going to finish this and then there is a question. Okay, okay. So we can make a correspondence between this network and the histogram that I showed you before. The peaks above the threshold are the connections that still exist in the network, and the peak below the threshold are the connections that no longer exist in the network.
So what's the question? I cannot see anything here.
It's okay, Armon, go ahead.
I have two questions; if you want, maybe you can answer them after. One is, I guess I missed what determines the relative fitness between the genomes. Is it a projection or is it...?
No, no, there is no fitness. It's a neutral process.
Oh, right, right. Okay, my bad.
So the evolution is due only to mutations and mating. There's a random rate.
So then the time scale of mutations, is that, I mean, relative to the...
Yes, it determines the velocity. It determines when the formation of the peaks will happen, but it's not something that we are studying right now.
Okay, because...
At this moment, this is not important. That's what I want to say.
Okay, okay, because the only thing I'm alluding to is that, I mean, in adaptive dynamics, I guess there's this controversy with sympatric speciation.
Yes, yeah.
Yeah, I just wanted to see if you could comment on that later.
Maybe we can have a discussion later.
There was a question about what these models have to say about speciation in asexual populations, but at the moment you have mating, so it does not apply. But I think we can discuss that later, if you'd like to answer it after we see more about them.
Okay, later we can come back to these questions. And thank you. So, moving on. From this point of the evolution, we can simply recognize each component as a new species. Why can we do that? Because there's a kind of reproductive isolation between these components: no individual from this component can reproduce with an individual from that component. However, there's a new feature in the definition of a species here, and that's the gene flow. This individual cannot reproduce with this other individual, they are not connected, but they are within the same species, within the same component, so they have an indirect gene flow.
Well, so species appear. Because we're dealing with an infinite genome, the distribution is narrow and the transition is really fast, so there's only one condition to define speciation in this process. When you have infinite genome size, the condition for species formation is simply that your threshold needs to be greater than the equilibrium value, which is defined only in terms of the population size and the mutation rate. So this is quite a simple result. However, if you want to find infinite genomes in nature, I think you cannot do that, though you may think that you can try to find large enough genomes. And when you investigate what happens when the genome is finite, strange things start to appear. In 2016, Marcus de Aguiar published this work where he investigated the problem of finite genomes, and he discovered that there is a minimum genome size in order to have speciation, which is different from what we had with infinite genomes. He also discovered that this genome size is large.
So not small values of the genome size; it needs to be large. When your genome is not large enough, the system finds a new equilibrium. Here, on this GIF, we can see that for this set of parameters the equilibrium should have been one over six, so it's really small. However, the distribution finds a new equilibrium around the threshold. You then need to increase the genome size in order to have species formation in this model. Here we can see the formation of non-trivial structures in the distribution, which are characteristic of the formation of components in the network, so they are characteristic of speciation. There remains the question: what is the minimum size in terms of the parameters of your problem? In other words, how large is large? What's the analytical answer to this question? Unfortunately, this is still an open problem, and it is the focus of my PhD research.
And why is this important? This is such a simple model, so why should we care about it? That's because the Derrida-Higgs model is a toy model from which we can build a lot of different models. We can study parapatric speciation, the geographical mode of speciation that lies in the middle of that arrow. We can also study phylogenetic patterns, because we can save the whole evolutionary history in this process, so we can build phylogenetic trees and then compare them with real phylogenetic trees. We can also study coevolutionary dynamics: for example, in this really wonderful work by Débora and Marcus, they studied the coevolution of mitochondrial DNA and nuclear DNA and its role in the barcode hypothesis of species identification. And we can also study migration dynamics. We can simply run this model on different islands, for example, so each population evolves according to the Derrida-Higgs dynamics, but there is also a migration rate between them. So this is a nice model for building a long list of different things.
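The network picture used in this talk, where individuals are nodes, pairs with similarity above the threshold are connected, and each connected component is read as a species, can be sketched as follows. This is a minimal illustration only: the function name `species_labels` and the small example matrix are hypothetical, and a pair is treated as compatible when its similarity is at least `q_min`.

```python
import numpy as np

def species_labels(q, q_min):
    """Label the connected components of the graph whose edges join
    pairs of individuals with similarity q[a, b] >= q_min."""
    N = q.shape[0]
    labels = -np.ones(N, dtype=int)    # -1 means "not visited yet"
    current = 0
    for start in range(N):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                   # depth-first search over compatible pairs
            node = stack.pop()
            neighbours = np.flatnonzero((q[node] >= q_min) & (labels == -1))
            labels[neighbours] = current
            stack.extend(neighbours.tolist())
        current += 1
    return labels

# Hypothetical example: individuals 0 and 2 are not directly compatible,
# but both are compatible with individual 1 (indirect gene flow).
q = np.full((5, 5), 0.2)
np.fill_diagonal(q, 1.0)
for a, b in [(0, 1), (1, 2), (3, 4)]:
    q[a, b] = q[b, a] = 0.9

labels = species_labels(q, q_min=0.8)
n_species = labels.max() + 1
```

In this example individuals 0, 1, and 2 end up in one component even though 0 and 2 cannot mate directly, which is the indirect gene flow within a species, while individuals 3 and 4 form a second, reproductively isolated component.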
What are we doing now? What do we have to understand this process with finite genome size? We can take a mean field approach to this problem. We can prove that this product, the product of the same allele from two different individuals, is given by a Bernoulli random variable that depends on this delta, which is a non-trivial function of the previous population, of the previous similarity matrix. Here you can recognize the term that appeared when we had infinite genome size. Remember that the similarity between the individuals alpha and beta is given by a sum of these random variables, which are independent and identically distributed. So this is a sum of IID random variables, and therefore the probability distribution of the similarity is binomial. So we have the binomial, we have its mean, and we have its variance. This equation also defines an algorithm that you can run, which I call the mean field algorithm.
Here is a result that you can get from this algorithm. We start with an identical population, so the average similarity of the population is equal to one. Then the dynamics is turned on and it starts to evolve, and for small values of the genome size it finds an equilibrium. When we increase the genome size, there's this jump, this transition, which is something really cool if you connect it to the real dynamics: when we have small genomes, there's an equilibrium, and if you increase the genome size, there's the transition of species formation. How can we study this analytically? From this algorithm, we take the ensemble average of this term. It's quite hard to do, but it's not impossible. We then find this new recurrence equation for the average similarity of the next generation in terms of the similarity of the previous generation.
However, it is also a function of this value p, and p is the amount of the similarity distribution that remains above the threshold. So it's the probability that a given pair in your population is mating compatible, that they can mate. Because of the structure that appears in this recurrence equation, there are two different possible equilibria of this process. Let me try to explain what that means. You have your similarity distribution, and it starts to evolve. When it finds the first equilibrium, you need to compute the amount of the distribution that remains above the threshold. If this amount is greater than a critical value, then this is an equilibrium of the system and the evolution stops here. However, if this amount is below the critical value, this equilibrium is no longer an equilibrium of the system, and the distribution starts to look for the second equilibrium. That's the transition that we saw in that picture. Because the variance in this mean field approach is proportional to one over the size of the genome, the larger the genome size, the narrower the distribution, and the easier it is for this jump to occur: the distribution simply needs to cross the threshold for p to become really small, and then p will be smaller than the critical value and there will be this transition.
To see that these things work, here we analyze the model in the absence of the threshold. We can see that we recover the results that we had before for infinite genome size, and that the mean field approach describes really well what's going on in the Derrida-Higgs dynamics. Even for the variance, it's a good description. When you include the threshold, there is a phase transition, and we can build this phase diagram. I am not sure if I have enough time to explain it, so I will skip it, but I can come back to it later. But there is this transition.
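The role of p and of the 1/B variance can be illustrated numerically. The sketch below assumes, as in the mean field description, that the similarity is the average of B independent +1/-1 variables with mean delta, so that the number of +1 terms is binomial. Here delta is treated as a given number rather than computed from the previous similarity matrix, and the function names and parameter values are hypothetical.

```python
import math

def similarity_variance(delta, B):
    """Variance of the average of B iid +1/-1 variables with mean delta."""
    return (1.0 - delta ** 2) / B

def fraction_above_threshold(delta, B, q_min):
    """p = P(q >= q_min), where q = (2k - B) / B and k ~ Binomial(B, (1 + delta) / 2)."""
    prob_plus = (1.0 + delta) / 2.0
    k_min = max(0, math.ceil(B * (1.0 + q_min) / 2.0 - 1e-9))
    total = 0.0
    for k in range(k_min, B + 1):
        # Binomial pmf evaluated in log space to avoid overflow for large B.
        log_pmf = (math.lgamma(B + 1) - math.lgamma(k + 1) - math.lgamma(B - k + 1)
                   + k * math.log(prob_plus) + (B - k) * math.log(1.0 - prob_plus))
        total += math.exp(log_pmf)
    return total

# Illustrative values: the mean similarity sits slightly below the threshold.
delta, q_min = 0.75, 0.8
p_by_genome_size = {B: fraction_above_threshold(delta, B, q_min)
                    for B in (50, 200, 1000)}
```

With the mean held fixed just below the threshold, p shrinks as B grows because the distribution narrows, which is the mechanism described above for why larger genomes make the jump easier.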
We can compute this transition. However, this is not the transition described by the Derrida-Higgs model. Here, the genome size at the transition is really small compared with what you get for the same set of parameters when you run the dynamics of the Derrida-Higgs model. To show what's going on, here we can see that while the Derrida-Higgs dynamics finds its equilibrium, the mean field keeps evolving. So the mean field would have found its equilibrium, if that were the case, really far from the Derrida-Higgs equilibrium. We can see that the same thing happens for the variance: while the mean field variance stops evolving and finds an equilibrium value, the Derrida-Higgs dynamics keeps evolving. So this mean field approach does not explain the exact Derrida-Higgs dynamics.
Just to finish my presentation and summarize what I told you about this model: the Derrida-Higgs model is a neutral evolutionary model, so there are no fitness functions and no selection mechanisms. This model presents sympatric speciation for large genome sizes. You can see that we are not talking about places, geography, or things like that, and there is no gene flow interruption while the evolution is occurring. While the species are appearing in the process, there is no gene flow interruption, so this is a sympatric mode of speciation. And although we know that there is speciation for large genome sizes, how large is large is not known yet. There's a mean field approach, which I told you about. However, this mean field approach does not describe the Derrida-Higgs transition. It deserves the name mean field because it's really good far from the transition, but not at the transition.
From the mean field approach, we learn that the amount of the distribution that remains above the threshold is really important. This is different from the infinite genome size case, where the only thing we need to know is the average of the distribution. Now the mean field teaches us that the variance is also really important. And we're starting to understand why everything goes wrong when we take average calculations. These average calculations don't work here, and we're searching for ways to escape this kind of calculation. Although this is not an easy problem, we are very optimistic that we can solve it. Thank you one more time for your attention. If you have any questions, comments, or suggestions, here I am.