Welcome, everyone, to this parallel session on evolution, phylogeny and population genetics. My name is Gwennel Mondono. I'm a postdoctoral fellow in Roman Arguello's team in the Department of Ecology and Evolution, and I'm working on the birth of olfactory receptors in several Drosophila species.
Oh, hello. I didn't realize that Gwennel was waiting for me. Sorry about this. My name is Timothy Vaughan. I'm co-chairing the session and I'm a postdoc as well, with Professor Tanja Stadler at ETH Zurich in her Computational Evolution group. We develop and apply methods for phylodynamic inference from genetic sequences, wherever they come from.
Okay, before starting, just a few reminders about how this is going to work. I will introduce the speakers, they will give their talks, and you will be able to ask them questions. In your Zoom window you have a Q&A icon, so you can ask your questions there. Please include the name of the speaker your question is addressed to, so that we can keep track of the different questions. We will probably be short of time to put questions to all the speakers, but you can vote for your favorite questions and we will try to find time to ask them. We will keep all the other questions for the meet-the-speakers part. So, let's start this session with Natalia Zajac from the group of Christophe Dessimoz, who is going to explain to us how gene duplication and gain in Atriophallophorus winterbourni and other major parasitic trematodes contributes to adaptation.
Thank you for this introduction. I'm just going to share my screen. Can you see everything? Is it fine? So welcome, everybody, and thank you for this opportunity to be here. I want to talk to you about gene duplication and gain in Atriophallophorus winterbourni and other major parasitic trematodes, and how it contributes to adaptation to parasitism.
I'm Natalia, a PhD student at ETH Zurich with Jukka Jokela, but this work was done in collaboration with, and under the leadership of, Natasha Glover from the University of Lausanne, in Christophe Dessimoz's group. Our focal organism is Atriophallophorus winterbourni. It's a trematode worm that is native to the South Island of New Zealand. It alternates between two hosts in its life cycle, one of them an aquatic snail, Potamopyrgus antipodarum. After sequencing the reference genome, we carried out a comparative genomic analysis of our focal trematode against other trematodes using OMA standalone. OMA is software for inferring hierarchical orthologous groups (HOGs) of genes shared between species and for establishing phylogenetic relationships between species. That's what we did for the trematodes.
So maybe first an introduction to what hierarchical orthologous groups are. These are sets of genes that have descended from a single common ancestral gene within a taxonomic range of interest. To illustrate this with the insulin gene: if you consider the insulin genes among rodents, you have two HOGs, because a duplication occurred before the speciation of rat and mouse. But if you consider the insulin gene among all mammals, including humans and rodents, you can see that the genes we see today have descended from a single common ancestral gene, and that's why they are encompassed in one HOG.
That is what we did for our species. In order to accurately reconstruct the HOGs, we chose three outgroups, two free-living nematodes, and four species of parasitic platyhelminths that are most closely related to trematodes, from the groups Monogenea and Cestoda. And we chose 14 species of trematodes for analysis, from the suborders Diplostomida and Plagiorchiida. These species were chosen because their genomes are well studied: they are trematodes that infect humans and cause human diseases.
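
The insulin example can be made concrete with a small sketch. The toy gene tree, the node labels and the helper functions below are illustrative assumptions and do not reflect OMA's data model or inference algorithm; the sketch only shows how the number of HOGs changes with the taxonomic level at which you look.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A gene-tree node: a leaf (gene) or an internal speciation/duplication event."""
    name: str
    event: str = "leaf"   # "leaf", "speciation" or "duplication"
    taxon: str = ""       # taxonomic level, set for speciation nodes only
    children: list = field(default_factory=list)


def leaves(node):
    """Collect the gene names below a node."""
    if node.event == "leaf":
        return [node.name]
    return [gene for child in node.children for gene in leaves(child)]


def hogs_at(node, level):
    """Return one gene list per HOG defined at the given taxonomic level."""
    if node.event == "speciation" and node.taxon == level:
        return [leaves(node)]
    return [hog for child in node.children for hog in hogs_at(child, level)]


# Toy insulin tree: the duplication precedes the mouse/rat speciation.
copy_1 = Node("sp1", "speciation", "Rodentia", [Node("mouse_Ins1"), Node("rat_Ins1")])
copy_2 = Node("sp2", "speciation", "Rodentia", [Node("mouse_Ins2"), Node("rat_Ins2")])
duplication = Node("dup", "duplication", "", [copy_1, copy_2])
insulin_tree = Node("root", "speciation", "Mammalia", [Node("human_INS"), duplication])

rodent_hogs = hogs_at(insulin_tree, "Rodentia")
mammal_hogs = hogs_at(insulin_tree, "Mammalia")
print(len(rodent_hogs), len(mammal_hogs))  # prints "2 1"
```

At the rodent level the two duplicated copies form separate groups, while at the mammal level all five genes fall into a single group.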
So Schistosoma cause schistosomiasis, for example, and the Plagiorchiida include liver flukes that infect humans. Among those is our Atriophallophorus winterbourni. After reconstructing the hierarchical orthologous groups, we used pyHam to reconstruct ancestral genomes at each node of the phylogeny. We compared each ancestral genome with its descendant ancestral or extant genomes to infer the duplicated, gained and retained genes along each branch of the phylogeny. The retained genes are one-to-one orthologous genes, and the duplicated and gained genes are those that have increased in number or have appeared at different stages of evolution.
Here on this tree, you can see the gained genes indicated only for the parts of the phylogeny that we were most interested in, because these stages lead to the extant genome of Atriophallophorus that we see today. Our analysis mostly focused on three ancestral genomes, the ancestral trematode genome, the ancestral Plagiorchiida genome and the ancestral Xiphidiata/Opisthorchiata genome, and on the extant genome of Atriophallophorus. We focused on changes in gained and duplicated genes, because these have been indicated in the literature as most important in adaptation to parasitism for other trematodes. You may associate parasites with gene loss; however, for complex parasites such as trematodes, which have seven stages in their life cycle and multiple hosts, novelty is the key resource in adaptation. So we characterized the retained, duplicated and gained genes, and we performed an enrichment analysis using GOATOOLS to functionally characterize these genes and to understand what changes occurred along the phylogeny, in particular the most recent changes that made Atriophallophorus different from the other trematodes.
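
The retained, duplicated and gained categories described above can be sketched as a comparison of copy numbers at the two ends of a branch. This is a minimal illustration under assumed definitions: the function, the HOG identifiers and the copy numbers are hypothetical, and this is not pyHam's API or its actual inference procedure.

```python
def classify_hog(ancestral_copies, descendant_copies):
    """Label one HOG on one branch from its copy numbers at both ends."""
    if ancestral_copies == 0 and descendant_copies == 0:
        return "absent"
    if ancestral_copies == 0:
        return "gained"        # no ancestral gene, present in the descendant
    if descendant_copies == 0:
        return "lost"
    if descendant_copies > ancestral_copies:
        return "duplicated"    # copy number increased along the branch
    if descendant_copies == ancestral_copies == 1:
        return "retained"      # one-to-one ortholog
    return "other"


# Hypothetical copy numbers: (in the ancestral genome, in the descendant genome).
branch_counts = {
    "HOG_A": (1, 1),
    "HOG_B": (1, 31),
    "HOG_C": (0, 2),
    "HOG_D": (1, 0),
}
summary = {hog: classify_hog(*counts) for hog, counts in branch_counts.items()}
print(summary)
```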
I don't have time to talk about all of these genes today, so I want to focus on the genes that arose in Atriophallophorus winterbourni through duplication, specifically through the most recent duplications since the Xiphidiata/Opisthorchiata ancestor. 24% of the genome originated through these most recent duplication events, and among those genes there are 13 HOGs with more than 10 duplications. Two gene families, or two HOGs, attracted our attention most because they had more than 30 genes from Atriophallophorus winterbourni: the glutamine synthetase HOG and the metallohydrolase/glycoprotease family HOG. We used codeml to detect signatures of positive selection along the branches of the gene trees of these gene families. We detected that 13 genes of Atriophallophorus winterbourni are under positive selection in the first family, and all of the genes in the second HOG are evolving under positive selection. Because of that result we focused our analysis on the second family: we did a structural analysis of the protein that the gene family encodes, and we colored the sites of the protein by their probability of being under positive selection. We found four sites with a probability greater than 95% of being under positive selection, and two of these are DNA binding sites.
But you might still be wondering how these families might be contributing to adaptation to parasitism. Is there any literature background showing that these genes might be involved in relevant functions? Yes, there is. The first family is glutamine synthetase. Glutamine synthetase is an enzyme that is involved in proline production. Production of proline based on arginine derived from host tissues has already been researched; the enzyme is also implicated in modifying host antioxidant defenses, and it is a marker of glial cells, which are immune cells of the nervous system.
And these have been found to be enriched or duplicated in parasites that affect host behavior, such as Microphallus papillorobustus or Trichobilharzia. The second family was a little more enigmatic: not much is known about this family, and not much can be inferred about its function without experimental evidence. But we know that the enzyme is implicated in DNA repair, and that the metallohydrolase is a metallopeptidase that has been implicated in host tissue digestion in other parasites, such as the nematode Strongyloides papillosus.
So, I just want to thank you for listening and thank the organizers for this great session. I especially want to thank Natasha Glover for this work that I was able to do with her, as well as Christophe Dessimoz and David Moi from the University of Lausanne, and Hanna Hartikainen, Stefan Zoller and Jukka Jokela from ETH Zurich. Thank you for your attention.
Thank you very much, Natalia, for a lovely talk. We have two questions from the audience; maybe we'll be able to address at least one of them. Firstly: 31 out of 31 genes under positive selection in a HOG seems like a lot. Which model did you use in PAML, and what percentage of the other HOGs showed positive selection with the same model?
What we did was first reconstruct the gene tree for this family as a maximum likelihood tree with IQ-TREE, and we then used the branch-site model in codeml to detect branches under positive selection. We detected positive selection on all the branches that we tested, but we mostly focused on the branches that led to the expansion of the genes in Atriophallophorus.
Thanks. I think we might have time for one more question, from Julian: immune-related genes and peptidases are usual hits in positive selection screens. Do you think that something special happens in your system, more than in other sister groups?
You mean for the metallopeptidase? Is the question about this peptidase or about peptidases in general? I guess it's referring to that. I feel that it is a little bit different, because in this gene family there were about one to five genes from the other trematodes, whereas our trematode had a massive expansion of this gene family, with 35 or 31 genes in it. So I guess it is a little bit different. But nothing can be said without experimental evidence and further research into that.
Thank you. We are now going to move on to Alejandro V. Cano from the group of Joshua Payne, who is going to talk about modeling the effects of mutation bias on adaptive evolution.
Hello, everyone. Thanks for the introduction and for the opportunity. I would like to start by posing a question that has recently been regaining some attention: what is the role of mutation bias in adaptation? The answer depends, on the one hand, on population dynamic conditions, more specifically on the mutation supply. In the low mutation supply scenario, mutations are so rare that the beneficial mutations that are more likely to occur due to mutation bias are the ones more likely to go to fixation. In contrast, in the high mutation supply scenario, mutations arise so frequently that selection is more likely to pick the beneficial mutations with the highest selection coefficients, rather than the ones favored by mutation bias. On the other hand, the shape of the distribution of fitness effects can also limit the role of mutation bias, for instance in cases in which the number of beneficial mutations in a given environment is very low. To quantify the effect of mutation bias, the main idea is that we need two ingredients. First, the mutation spectrum of a given species, as the output of some study such as a mutation accumulation experiment, for instance.
And second, a very large set of bona fide adaptive point mutations of the same species, coming from a totally different experiment. That way we can count the number of times a point mutation from each codon to a given amino acid happens. The idea is then to use a generalized linear model to predict the expected number of adaptive codon-to-amino-acid exchanges, while being completely agnostic about the selection coefficients of such mutations. That means that our model depends only on the number of times the codon appears in the genome and on its probability of mutating to a given amino acid. We can then use regression to determine the value of beta, which gives us information about the contribution of this mutation term to the expected number of adaptive events we see in the data set.
Okay, so we did this for three different species, and you can also see the number of adaptive mutations, or adaptive events, that we used for the regressions. And this is what we got. Surprisingly, the mutation term parameter seems to be relevant and significant for all species. However, when we quantify the correlation between the observed adaptive events and the ones predicted by the model, we see quite a difference across species. To make more sense of these results, we decided to generate synthetic data using SLiM, which is a very efficient evolutionary simulation framework. I do not have time to go into the details of all the simulations, but basically we are able to construct synthetic data sets under different population genetic conditions. Here are the results of the regressions for these synthetic data sets for different population sizes. Since the mutation rate is fixed, every time we increase the population size we increase the number of mutants in the population, so we are basically increasing the mutation supply.
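
The regression described above can be sketched as a Poisson generalized linear model. The talk does not give the exact model form, so the log link, the use of the codon count as an offset, the Newton solver and the toy numbers below are all assumptions made for illustration, not the authors' implementation.

```python
import math


def fit_poisson_glm(counts, codon_counts, mutation_probs, iterations=50, tol=1e-10):
    """Fit log E[n] = log(codon_count) + beta0 + beta * log(mutation_prob).

    Assumed model form; solved by Newton's method on the Poisson log-likelihood.
    """
    x = [math.log(p) for p in mutation_probs]
    offset = [math.log(c) for c in codon_counts]
    beta0 = math.log(sum(counts) / sum(codon_counts))  # intercept-only starting point
    beta = 0.0
    for _ in range(iterations):
        lam = [math.exp(o + beta0 + beta * xi) for o, xi in zip(offset, x)]
        # Gradient of the log-likelihood with respect to (beta0, beta).
        g0 = sum(y - l for y, l in zip(counts, lam))
        g1 = sum((y - l) * xi for y, l, xi in zip(counts, lam, x))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(lam)
        h01 = sum(l * xi for l, xi in zip(lam, x))
        h11 = sum(l * xi * xi for l, xi in zip(lam, x))
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        beta0 += step0
        beta += step1
        if abs(step0) + abs(step1) < tol:
            break
    return beta0, beta


# Hypothetical, noise-free toy data in which events are exactly proportional
# to codon count times mutation probability, so the true beta is 1.
codon_counts = [100, 200, 150, 50]
mutation_probs = [0.1, 0.2, 0.4, 0.8]
adaptive_events = [c * p for c, p in zip(codon_counts, mutation_probs)]

beta0_hat, beta_hat = fit_poisson_glm(adaptive_events, codon_counts, mutation_probs)
print(round(beta0_hat, 6), round(beta_hat, 6))
```

In this sketch a fitted beta close to 1 means the adaptive events track the mutation term proportionally, while a beta close to 0 means the mutation term carries no information about which events are observed.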
You can see here how the influence of the mutation term in the regression decreases as the mutation supply gets higher. Similarly, the correlation between the observed and predicted adaptive events vanishes as mutation supply increases. So we can conclude that low mutation supply favors the fixation of the mutations that are more likely to occur, showing a kind of first-come, first-served dynamics. When this is the case, we can make better predictions of the adaptive mutations if we know the mutation spectrum of the species. The take-home messages, quickly summarized, are that, first, we can quantify the influence of mutation bias on adaptive evolution using generalized linear models. In addition, we see that mutation bias influences the genetic changes that drive adaptation under a range of population genetic conditions, particularly when mutations are rare, that is, in the low mutation supply regime. In the future, we hope to get a better understanding of the role of the distribution of fitness effects in this modeling framework. With that, I would like to thank you again for the opportunity and for your attention.
Thank you very much. We are now going to move on and hear Jūlija Pečerska from the group of Maria Anisimova. She is going to talk about the impact of MDR-TB strain background on transmission fitness loss.
Hello everyone, I hope you can hear me okay. Let me start my presentation. All right, so I'm going to talk about the work I was doing for my PhD, actually in the group of Tanja Stadler. Just to demystify the title a little bit, MDR-TB stands for multidrug-resistant tuberculosis. First things first, let me tell you why we want to work on tuberculosis in the first place. One thing that people ask me a lot is: isn't TB eradicated? Well, unfortunately not. It is one of the top three infectious causes of death worldwide.
I guess unfortunately this may change in the near future because of COVID, but so far this is the case. It is a very slowly evolving bacterium, which has certain implications that I will talk about in a second. It also has tree-like evolution, which is very helpful for any kind of phylogenetic and phylodynamic methods that we may want to apply. What we would like to do is estimate drug resistance transmission fitness, that is, how much fitter or how much less fit drug-resistant strains are in comparison to more drug-sensitive ones. If we think about the types of drug resistance that are present, one of them is transmitted drug resistance, where a single patient acquires drug resistance after treatment and then goes on to spread that drug resistance to other patients. Another option is acquired drug resistance, where different patients independently acquire drug resistance on their own. However, what we mostly get isn't this full, beautiful picture with histories of how drug resistance appears. In fact, what we get is just the dots, not even the trees themselves. This is also the traditional epidemiological approach, where you look at your samples at different points in time: you look at the prevalence, for example, so you know how many drug-resistant and how many drug-sensitive strains there are, but you don't know the evolutionary relationships between them. This is what we are trying to address. We would like to take drug-resistant and drug-sensitive strains of tuberculosis, take their genetic data and their drug resistance status together with sample dates, and then use phylodynamics to estimate the relative transmission fitness. We want to focus on relative transmission fitness because that would allow us to compare different strains between locations.
So if we know that a certain drug resistance incurs a fitness cost in comparison to a more drug-sensitive strain, we will be able to look at the same drug resistance in different locations, for example. However, one big problem that comes from the fact that TB is a very slowly evolving bacterium is that there is very little collection of drug-sensitive strains. For the longest time we thought that it is mainly clonal, so that there isn't much variation in there. So we don't even have the samples that we need for the analysis. On the other hand, there are many, many samples of multidrug-resistant strains, strains that are resistant to two of the first-line drugs. So instead we opted for estimating the transmission fitness of an additional drug resistance on top of that; that particular drug is called pyrazinamide, or PZA for short.
When we look at the data set that we had from Georgia and analyze the sequence data to get our transmission fitness costs, we see that there is about a 35% fitness loss for the drug resistance, so the more drug-resistant strains are less fit. However, we decided to also see what happens if we remove the genetic data from the picture. This would be a kind of approximation of more traditional epidemiological approaches. What happens is that we greatly overestimate the transmission fitness cost: we see more than a 50% reduction in spread. This tells us that it is very dangerous not to look at the genetic data: we risk overestimating the fitness cost and underestimating the danger of drug-resistant strains of tuberculosis. Moreover, we had other data sets available. This is the data set that we had available first, from Kinshasa, lineage 4, one of the main lineages of tuberculosis.
And we got a result that is very much in line with the general expectation in the field that pyrazinamide resistance knocks out a gene, so it causes a fitness loss. However, when we look at a different data set from Georgia, which is of course a different lineage, we don't see that much of a fitness loss, and this could be attributed to the fact that it's a different lineage. However, when we look at the same lineage as in Kinshasa, lineage 4, in Georgia, not only do we not really see a fitness cost, we cannot really rule out a fitness advantage either. So these are the main conclusions from that.
In my future work, what I would like to address most is the data collection. We did communicate with some of the people who were collecting the data. There is still not enough data on the sensitive strains that we would need to perform the analysis that we want. So I would really like to foster better communication between different fields and different scientists to make sure that the data that is collected is what is most useful. The other thing, which has always been a bit of a pet peeve of mine, is that most of the time when we do tree inference on certain data, it is done on a multiple sequence alignment that has been inferred based on some sort of guide tree. So this is what I will be working on now, or rather have been for the past three months in Maria Anisimova's group: simultaneous multiple sequence alignment and tree inference under an indel model. So this is my future work, and thank you very much for listening.
Thank you very much. We're running a little bit out of time. So now we are going to listen to Victor Rossier from the group of Christophe Dessimoz. He is going to explain to us why phylogeny-driven and alignment-free protein family assignment is an accurate and scalable alternative to methods relying merely on the closest sequence.
So, hi everyone.
Okay, thanks for the introduction. Let's start. This is a gene tree that depicts the evolutionary history of a gene, and together all these genes form a gene family. You might have noticed that some of the genes belong to the same species. This is because, unlike species, genes undergo duplication events, which leads to multiple copies of the same gene per species. To have a finer-grained classification of these genes, you can define gene subfamilies. Gene subfamilies carry important information, such as function. Here, myoglobin is specialized for the storage of oxygen in the muscles, whereas hemoglobin is there to transport oxygen in the blood. So you can understand that if you are interested in functional annotation, for example, you must consider gene subfamilies. This brings us to the challenge that we are trying to tackle in this study, which is to assign unknown protein sequences to the hierarchy of gene families and subfamilies.
One could ask: can't we simply assign the query to the same subfamily as its closest sequence, with tools such as BLAST, for example? In this example, the dashed line depicts the true location of the query, whereas the star is the location of the closest sequence. This example looks fine, because both belong to the hemoglobin subfamily, and so it would be fine to use the closest sequence here. However, this is not always the case. In this second example, the query diverged before the duplication, and its closest sequence is therefore located in a different subfamily. We then quantified the scenarios where the closest sequence would mislead the placement of the query into an overly specific subfamily. On the y-axis we have the fraction of queries that belong either to the same or to a different subfamily than their closest sequence.
The key information here is that up to two thirds of closest sequences belong to a different subfamily than their queries. To address that problem, we have developed a new method, OMAmer, which uses phylogenetic information together with an alignment-free measure and tries to make more precise assignments. Here I will show you the key algorithmic idea behind this method. The goal, given a query, is to find out where in a reference hierarchy of families and subfamilies it has diverged from, that is, whether it has diverged from the green, the orange or the purple branch here. For that, we extract subsequences of the query and compare them against the reference sequences. In this first example, the subsequence, or k-mer, is informative for the orange subfamily, simply because it is shared with one of its reference sequences. The trick comes when a subsequence is shared between more than one subfamily. OMAmer will consider these two subsequences to be homologous, and because this subsequence therefore originated before the duplication, it is informative for the green family. You can imagine that by combining multiple such k-mers, or subsequences, you can make precise assignments, and this is what we have shown with a benchmark against a closest-sequence approach.
This is a precision-recall curve. The first thing to observe is that OMAmer makes very precise assignments. This is because when you vary the score threshold in OMAmer, you make assignments to more or less specific families. By contrast, the closest sequence is always bound to the same subfamily. We then also compared the accuracy, that is, the trade-off between precision and recall, and this is usually equivalent between both approaches. Finally, the real added value of the method is that it is very fast, because it is an alignment-free algorithm, and also very scalable.
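
The k-mer idea described above can be sketched in a few lines. This is a deliberately simplified illustration, not OMAmer's actual index or scoring: the toy hierarchy, the made-up peptide sequences, the `min_hits` threshold and all function names are assumptions. A k-mer found in a single subfamily is attributed to that subfamily, and a k-mer shared by several subfamilies is attributed to their lowest common ancestor in the hierarchy.

```python
from collections import Counter

# Toy hierarchy (child -> parent); None marks the root family.
PARENT = {"globin": None, "hemoglobin": "globin", "myoglobin": "globin"}
ROOT = "globin"

# Made-up reference peptides, not real globin sequences.
REFERENCES = {
    "hemoglobin": ["MVLSPADKTN", "MVLSAADKTN"],
    "myoglobin": ["MGLSDADKTW"],
}


def kmers(seq, k):
    """Set of all overlapping subsequences of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_common_ancestor(nodes):
    nodes = list(nodes)
    shared = set(path_to_root(nodes[0]))
    for other in nodes[1:]:
        shared &= set(path_to_root(other))
    return next(n for n in path_to_root(nodes[0]) if n in shared)


def build_index(references, k):
    """Map each reference k-mer to the most specific node containing all its occurrences."""
    seen = {}
    for subfamily, sequences in references.items():
        for seq in sequences:
            for kmer in kmers(seq, k):
                seen.setdefault(kmer, set()).add(subfamily)
    return {kmer: lowest_common_ancestor(subs) for kmer, subs in seen.items()}


def place(query, index, k, min_hits=2):
    """Place a query in a subfamily if enough k-mers support it, else in the family."""
    hits = Counter(index[kmer] for kmer in kmers(query, k) if kmer in index)
    if not hits:
        return None
    specific = {node: n for node, n in hits.items() if node != ROOT}
    if specific:
        best = max(specific, key=specific.get)
        if specific[best] >= min_hits:
            return best
    return ROOT


index = build_index(REFERENCES, k=3)
print(place("MVLSPQDKTN", index, k=3))  # prints "hemoglobin"
print(place("QQQADKTQQ", index, k=3))   # prints "globin"
```

The second query only matches k-mers shared by both subfamilies, so the sketch stops at the family level instead of forcing a subfamily, which is the behavior a closest-sequence approach cannot offer.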
We have compared OMAmer against a very fast alternative to BLAST, DIAMOND, and we have shown that we can classify more than 200 sequences per second, regardless of the number of reference genomes. This is because when you add a new genome, you don't necessarily add new families or subfamilies, but with DIAMOND you still have to make many more comparisons. So thank you for your attention.
Thank you very much, Victor. And a reminder to please keep all your questions for the meet-the-speakers session. So now we are going to move on to Mathilde from the group of Luciano, and she is going to talk about AncesTree, an interactive immunoglobulin lineage tree visualizer.
Hello. So, is it okay? Yeah. I'm Mathilde, I'm a bioinformatician at the IRB in Ticino, and today I will present AncesTree, a software I developed to display immunoglobulin lineage trees. First of all, just a brief introduction about antibodies. Basically, a B cell, before secreting the antibody, will display the antibody on its surface. The antibody is composed of two heavy chains and two light chains, and the antigen binds to the antibody at the top. This region is called the variable region, and the antigen binds more specifically to the CDR regions, which are composed of loops. If we have a closer look at this variable region, you have the heavy chain on the top and the light chain on the bottom, with the CDR regions along the sequences. In the germinal center, in order to increase the affinity of the antibody for the antigen, a set of mutations will occur preferentially in the CDR regions, and that allows the antibody to increase its affinity for the antigen. This is called affinity maturation, and these mutations are called somatic hypermutations.
This is why we developed AncesTree: to have a closer look at these somatic hypermutation events in a clonal family of immunoglobulins of interest. AncesTree is developed in Java. It can work with two kinds of input. The first one is when you have a clonal family of interest against an antigen, so all the sequences are related. Basically, you infer a common ancestor using bioinformatics tools designed specifically for this. Then, after sequence alignment, you use dnaml, which infers a phylogenetic tree, and the dnaml output text file can be used as input for AncesTree. The second way to use AncesTree is through the Immcantation workflow. This is for when you have done a repertoire analysis, that is, repertoire sequencing of antibody sequences. There is a very nice software called Change-O that does the clonal clustering into clonal families and also infers the common ancestor of each clonal family. They also developed an algorithm called IgPhyML, which is specific to immunoglobulins, to infer a phylogenetic tree. At the end you have a set of output files that can be used as input for AncesTree. In both cases AncesTree displays a graphical user interface, which mainly shows the phylogenetic tree of your family of interest; the tree is interactive. You also have some specific features, such as a sequence alignment, in which the common ancestor is at the top of the alignment. AncesTree also creates an XML file that can be loaded directly into AncesTree later on.
Now let's take a case study. We took these antibody sequences from the paper: six antibody sequences that are directed against the fusion protein of RSV. As you can see, there is one antibody that has a lower affinity compared to the others. So we built the clonal family and we inferred the tree of this clonal family.
This is the graphical interface of AncesTree. As you can see straight away, the one I was talking about, with the lower affinity, is already set apart in the tree. You can interact with the tree: if you click on a node, you have access to the sequences and to some features specific to antibodies. As you can also see, between nodes we write two numbers. The first is the number of nucleotide mutations, and the second is the number of amino acid mutations. Here you can see that all these antibody sequences share a common mutation. If you click on it, you see the nucleotide mutation and the amino acid mutation, and you see that this one occurs in the CDR1 region. From this you can hypothesize that this mutation is probably responsible for the better affinity for the fusion protein of RSV. Another way to see this shared mutation is through the sequence alignment, which is very handy: you can see the shared mutation straight away. So please go to my GitHub page, where you can download the program. It's a JAR file, it's very easy to install, and there is also documentation. I thank you all for your attention.
Thank you very much, Mathilde. So now we are going to move on to Diana Ivette Cruz-Dávalos from the group of Anna-Sapfo Malaspinas, and she is going to talk about ancient population genomics of Brazilian Botocudo groups.
Thank you. So today I'm presenting some results of my PhD project, and the question I'm introducing is: who are the Botocudos? At the beginning of my PhD, I learned that Botocudo was a label given by European colonizers to some groups in Brazil who shared some characteristics. Some of these characteristics are that they had a hunter-gatherer lifestyle and that they wore wooden discs in their ears and lips. Also, some anthropologists have called attention to the fact that these individuals have a special type of cranial morphology that is common among ancient people in the Americas.
It's called Paleoamerican cranial morphology. An example of a population with this type of morphology is the Lagoa Santa people, whom I will refer to as ancient Brazilians in this talk. Their remains have been found in Brazil and are about 10,000 years old. If you are interested in more, I recommend you look up the Luzia woman. Because of this characteristic, anthropologists have wondered whether the Botocudo people have a special link to the Lagoa Santa population. Beyond this point, there is little that we know about the Botocudo groups. Most of our knowledge comes from their contact with Europeans, and in fact, due to several conflicts with the colonizers, they were exterminated by the beginning of the 19th century.
In our project, we established a collaboration with the National Museum in Rio de Janeiro. They have 35 skulls that were labeled as Botocudo, and we are analyzing 22 individuals from this collection. They also provided samples from two other individuals that don't have the Botocudo label. We started by radiocarbon dating these samples. It turns out that the individuals of unknown affiliation are a bit older, so they probably died before contact between Brazil and Europe, which is set at 1500, whereas the individuals of the Botocudo collection probably died in more recent times, all of them after contact with Europe in 1500. The next thing we did was to sequence the DNA of the 24 individuals, map it to a reference genome and compute the depth of coverage. Here I'm showing you the depth of coverage of each of these samples, so that you know the type of data we are working with. Most of them are at 1x or below, but there is one sample here that is at 24x, which is quite good for an ancient DNA project.
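
As a reminder of what a figure such as 1x or 24x means, mean depth of coverage is commonly computed as the total number of aligned bases divided by the length of the reference. The talk does not say how depth was computed for these samples, so the function and the toy numbers below are assumptions used only to illustrate the arithmetic.

```python
def mean_depth(read_lengths, genome_length):
    """Mean depth of coverage: total aligned bases divided by reference length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(read_lengths) / genome_length


# Hypothetical toy sample: 2,000 short reads of 50 bp on a 100,000 bp reference.
toy_reads = [50] * 2000
depth = mean_depth(toy_reads, 100_000)
print(f"{depth:.1f}x")  # prints "1.0x"
```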
And before I jump into the results of comparing the DNA of these individuals to other groups, I want to let you know that in 2014 my supervisor already analyzed two individuals from the collection that I'm not including here. We also expected them to be Native Americans. What she and her colleagues found is that these individuals were actually Polynesians. So what I did first was to screen this dataset for outliers, such as Polynesians. Remember, we have 24 individuals, and this is the coverage. I'm comparing them here to groups from other places in the world: Africa, Europe, etc. What's important is that we have Polynesian groups and we have Native American groups, here in purple. The Polynesians are grouping here, and in black I have these dots that are the individuals from 2014. So I'm recovering the signal, which is good. And in black I have my samples. So let me tell you what happens. When I compare them to this dataset, what I observe is that all the individuals group with Native Americans. So there is no clear outlier in this dataset, and I still recover the Polynesians from 2014. Our individuals seem to be Native Americans. But now we wonder whether they could still be admixed with Polynesians or with other people who came into contact with Native Americans in recent times. And the answer is that, in an admixture analysis, I don't find any signal of admixture with Polynesians or other people from outside the Americas. So once we knew that our new data was from Native Americans, we decided to model the ancestry of these individuals, but now within the Americas, trying to compare them to what we know. So we took whole genomes from modern and ancient populations. Some of them are quite old, 24,000 years old, and some of them are a few hundred years old, here in red. So here I'm showing the relationship between these genomes as far as we know.
And this dot here shows the ancestry that is shared by Native Americans. I will now complete this graph and show you that there are some splits going on here, here, here, and then some here. So these are the ancient Brazilians I was mentioning, the Lagoa Santa. Now I have a Mexican genome here, the Mixe, which I also refer to as the Mesoamerican genome. And we need another branch coming in somewhere here, because there is no good genome that explains this stream of ancestry in this population. It comes from something that we call a ghost population. Something else that we need in order to explain the ancestry of the Lagoa Santa is that they received some input from a population related to Australasians, here. This is something that so far has been found only in this population and a few other modern populations around Brazil. It is still puzzling, and we don't have a really good explanation of how this could have happened. So now the question I have is: where can I add our population, the Botocudos, in this graph? We ran some models and compared them, and I'm presenting here the three best, equally likely models to explain the ancestry of this group. In all three models I need an input of Mesoamerican ancestry, so something related to this Mexican genome that I have here, and this stream of ancestry branches out somewhere here. This is common to the three models. In the first and second models I have listed here, we also need an input from an ancient Brazilian population, so in this regard these two models are similar. But there is a slight difference. I will put the first model in blue here, and then the second model. I hope you can see that the difference between these two models is where the branch diverges in this topology. In one case the branch splits off before the Lagoa Santa population receives its input from the Australasian group here.
And in the second case it splits off after this branch has received the Australasian component. So that's the main difference between these two models. In the third model, to explain the ancestry of the Botocudos we need something similar to what happened with the Mexican genome: an input of ancestry from a ghost population, for which we don't have a good genome to explain this ancestry. So these are three good models, and they are not mutually exclusive. I want to close the talk by reminding you that we have a new dataset of samples, and they are Native Americans. They do not seem to be direct descendants of the ancient population in the area, the Lagoa Santa, because in addition to this component we need some Mesoamerican ancestry and probably a ghost population. This needs to be further assessed. I want to thank all our collaborators and the people in my lab, and thank you for your attention.
Thank you very much for an amazing talk. One of the first non-questions that I got was just a commendation on the innovative presentation. We have a question here from Julian, who asks: what about these two individuals from the 2014 study? Why were they assigned as 100% Polynesian?
Yes, so this was an admixture analysis, and they were compared to people from all over the world. This was a simplified view. We have East Asians, Iberians, Native Americans, and Polynesians. With this program, you compare the genomes to those of these populations, and you observe that each genome is most similar to that of the Polynesians in this group. So they are of Polynesian ancestry and not Native American.
Thank you. Thank you very much. And now we are going to move to the last talk of this session, which will be presented by Tristan Cumer from the Jérôme Goudet group. He's going to tell us about genomic tales and the story of the European barn owl.
So, hello everybody. Thank you for being here.
So, as Gwen said, my name is Tristan Cumer, and I'm a postdoc in Jérôme Goudet's group at the University of Lausanne. Today I'm going to present part of a large project on barn owl genomics, and I will focus on the history of this bird in Europe. What you may know, and what you can see on this map, is that nowadays the barn owl is a cosmopolitan bird with a wide distribution all over Europe and around the Mediterranean Sea. What you probably also know is that during the last glacial period, which occurred between 20,000 and 50,000 years ago, most of the European continent was under ice, or the ground was frozen almost all year, meaning that most species had to take refuge somewhere in the south. This is illustrated on this map, where you can see that trees are around the Mediterranean Sea while the northern part of Europe is classified as tundra or ice. Then, with the climatic warming, these populations recolonized Europe via different routes, and these different groups may have met and admixed. While we know the history of this recolonization for many species, it is still not clear for the barn owl. To try to answer this question, we sequenced whole genomes of about 100 individuals from 11 populations, which represent the current distribution of the bird in Europe. A common kind of analysis done with this kind of data is to study the genetic structure of populations. This may be done using many methods, and the one I will present today is genetic clustering, which allows us to visualize the genetic ancestry of individuals. To do so, the genetic diversity observed across all the individuals is used to infer the genetic properties of K ancestral populations, and then individuals are assigned to these clusters.
This means that individuals may be direct descendants of only one of these ancestral populations, or they may be the result of admixture between different clusters. Here on this map you have the results for K = 3 in the European barn owl. Each pie is an individual, and the different colors indicate the contributions from the different ancestral populations. What we can observe is that we have three main clusters, the three main colors, that are geographically distributed. The yellow cluster spreads from the Iberian Peninsula to the northern part of Europe. You can see a green cluster, which is in present-day Italy and Greece, and the purple cluster, which is the only one present in the Near East region. But what we can also observe is that in the northern populations of Europe, and especially here in Serbia, the individuals have contributions from both the green and the yellow clusters. One last point that we can observe on this map is that our Portuguese samples do have some contribution from this purple cluster. From these results, we can start to build a scenario for the barn owl in Europe. One hypothesis we may formulate is that during the Last Glacial Maximum, barn owls were isolated in three main places: the first one, let's say, in the Iberian Peninsula or somewhere in this region, a second one between Greece and Italy, and a third one in the Near East. Our results suggest that barn owls mostly recolonized Europe via the yellow cluster, because this is the cluster most represented in all the populations to the north. But we may also hypothesize that the Greek or Italian lineage followed the Danube to colonize the Hungarian plains, where the two lineages seem to meet and hybridize.
And the presence of the purple cluster in the Portuguese samples may reflect some gene flow from North Africa, maybe during the last glaciation, but we still need to do some analyses to confirm that. Obviously, all these results are preliminary, and, well, this is just one of our results, but I hope you enjoyed this short presentation. If you have any questions, feel free to ask me. I will just take two seconds to thank everybody involved in this project, the organizers of the session, and you for your attention. Thank you.
Thank you very much, Tristan. I want to thank all the speakers for these really great talks. And thank you for everything. Now we are going to move to the meet-the-speaker session, where you will be able to ask all the questions I'm sure you have.