So today, Tanya will share with us some of the phylogenetic tools her group is developing in order to understand epidemiological processes, macroevolution, and things like single-cell analysis. So Tanya, thank you again for accepting our invitation, and the floor is yours.
Thank you very much for this introduction. Thanks all for coming. In the next 45 minutes or so, I would like to show you the work of our group. The main focus is that we develop phylogenetic tools in order to understand biological processes at all different scales. Today I'll show you examples from Ebola over tuberculosis and stem cells to penguins. If anything is unclear during the talk, please feel free to ask. I'll try to remember to repeat the question so that our friends online can also follow.
So in biology, one basis is that we have reproduction and genetic change. And this reproduction happens on very different scales. We can have DNA replicating, virions, then prokaryotes (bacteria replicate to form new bacteria), eukaryotic cells, say cells in your body, and whole eukaryotes, we reproduce. But then also whole species reproduce, meaning we have speciation events. So we have reproduction in macroevolution as well as in infected hosts, where one host infects another host, which can be seen as a reproduction event: the pathogen travels from one host to the other host. One can illustrate this, I guess, with those bubbles. One bubble is one unit at any of those scales, say a eukaryote. This eukaryote reproduces, and during reproduction there may be slight changes in the genome, some mutations, so we may get slightly different offspring, indicated here with slightly different colors. Now when we follow this reproduction process through time, we obtain a lineage tree, meaning we display this reproduction process by bifurcations. So in this tree, every bifurcation would be a reproduction event, say a speciation event.
And then the genetic information characterizing the unit which undergoes reproduction may change along all the branches in this tree, which we call a phylogenetic tree. A key aim of our research is to use this genetic information, which we obtain from different organisms, essentially at the tips of this tree, and aim at reconstructing this phylogenetic tree, and with that understanding this reproduction process, be it macroevolution for species, epidemiology for infected hosts, or any of those other scales.
Now I want to walk you through the main applications we work on, on different levels of evolution. The first one, where I started actually already doing my master's thesis back in New Zealand, is macroevolution, where a phylogenetic tree represents species relatedness, tips being extant species. Here you see the human being most closely related to chimpanzees. And then based on a time scale, meaning the length of the branches in this tree, you can read off when the speciation event happened. So between humans and chimpanzees, we can read off that roughly six million years ago there was the speciation event. People have reconstructed those phylogenetic trees for decades, but what we are really interested in is understanding the dynamics giving rise to those trees: the speciation and extinction rates, and in particular the differences in speciation and extinction rates across different species. These may become important, say, for conservation biologists who want to know which species are more prone to extinction than other species, say in a changing environment.
The second key application of phylogenetics is by now epidemiology. Here again we have those trees, but now each tip represents an infected host, and branching events are transmission events between infected hosts.
And here I should point out that the genetic information, the data we use for reconstructing those trees, is now the pathogen data from each infected host. So not the host genetic data, but the pathogen. Meaning when we reconstruct such phylogenies, really we reconstruct the ancestry of the pathogen, meaning the transmission history. And then, similarly as for macroevolution, we want to understand the dynamics giving rise to those trees, which allows us to understand how pathogens spread, how quickly they spread, and what we can potentially do to control or even prevent further epidemics.
The third application we work on is single cells, so an even smaller unit. Here I show you an application to blood cells. The phylogenetic trees now really represent single cells dividing. And the cool thing in this application is that you can watch those cells dividing. So compared to epidemiology and macroevolution, you don't have to infer the phylogenetic trees, but you can actually observe them. And then you can ask questions like how genes are regulated along those branches, or, what I'll show you later towards the end of the talk, whether we can actually identify stem cells by looking at such single-cell trees.
So in summary, we want to understand the evolution, so how the genetic information changes, as well as the population dynamics, be it speciation, transmission, or cell replication, by looking at those phylogenetic trees. I want to illustrate the procedure and some of the mathematical background for those tools with the example of epidemiology, and then later in the talk mention how we can use such tools also in macroevolution and single-cell biology. The field of epidemiology is concerned with understanding how pathogens spread in different host populations, one particular concern being spread in humans.
And all the examples I'll show you will be pathogens affecting us humans. Classically in epidemiology, people used incidence and prevalence data, meaning how many people were sick per week over the course of the epidemic, and then tried to understand the dynamics, so how fast the epidemic is or was spreading. However, there are some limitations to that, which I'll point out in a minute. And phylogenetics can ask similar questions. Here we use genetic sequencing data from the pathogen of different hosts as input, infer the phylogenetic tree by clustering together similar pathogen sequences, and interpret this as the transmission chain. The idea is that if two people infected each other very recently, the pathogen sequences are most likely very similar, because the newly infected individual started with a pathogen from the donor very recently, so there was not a lot of time for mutation. While if there was a transmission event months or years back, there was a lot of time for mutations, so the two pathogens would be very different and would be very far apart in the phylogenetic tree. So we take those phylogenetic trees as proxies for transmission chains.
The first time they became really important in the field of epidemiology, I guess, was to assess the question of the emergence of HIV. HIV was identified in the early 1980s, with people dying of AIDS. And then finally in 1984 the virus causing AIDS was identified. We call it HIV. But the big question obviously was where this virus came from and when it started spreading in the human population. Now with this classic incidence and prevalence data, you can start looking at your epidemic once you know about it and then count cases through time. But you can't go back and retrospectively count. However, with sequences you can reconstruct your phylogenetic tree and actually observe transmission events from way before you sampled your sequences.
And with HIV, for example, in the 90s people sequenced human HIV as well as another virus called simian immunodeficiency virus, SIV, which is found in about a hundred different simian species, so different monkeys, with the aim of identifying where HIV came from. Those people then built a phylogenetic tree and saw that the main HIV strains circulating in the human population are all nested within a chimpanzee clade of SIV strains, meaning that using genetic data people could conclude that HIV was a zoonosis from chimpanzees, actually happening several times. It happened in Africa, which is where the chimpanzees live and where the first cases of HIV were later confirmed. And with modern phylogenetic tools, one can also date those zoonoses, and people now date them to the early 20th century.
Here maybe I should add: I said HIV comes from chimpanzees, but there is also the second type of HIV, HIV-2, which is not very prevalent in Western countries but is prevalent in Sub-Saharan Africa, and that actually comes from sooty mangabeys. So it was a different zoonosis, which also happened a couple of times, so there were a couple of host jumps from those sooty mangabeys into humans.
So for years now, phylogenetic tools have been used on pathogen data in order to understand epidemics. People reconstructed the trees and then read off from the trees, say, when and where a zoonosis happened. More recently people then started to also ask about the dynamics of a pathogen spreading. So how fast does the pathogen spread? Can we make predictions of how many people will be sick in a week or in two weeks if we don't do interventions, or if we do interventions?
And those dynamics were, for example, estimated in the swine flu epidemic in 2009, where people estimated the so-called basic reproductive number, R naught, and this will come up again in our talk. R naught is a key quantity for epidemiologists, and it tells you how many people, in expectation, one single infected individual makes sick. So say I have a new influenza strain and I now infect two of you; then this R naught would be two, if two was also the expected number of people an infected person infects. For swine flu it was estimated to be roughly 1.3 to 1.4.
This approach of quantifying transmission rates, or this R naught, using sequencing data has some limitations, though. What you would probably think one should do is this: the field of epidemiology has existed for 100 years if not more, people have very sophisticated models in that field, and one should use those models together with genetic sequencing data to quantify the parameters of those models. What is mostly done, though, is that in phylogenetics, due to mathematical convenience, people don't use those epidemiological models but instead use the so-called coalescent, a population genetic model. The coalescent assumes some population size through time, and essentially people infer this population size through time and equate that to the number of infected through time. But there you make some approximations and assumptions so that you can actually project your epidemiological model onto the coalescent, which in certain circumstances can be rather problematic. So our real goal is to use those explicit epidemiological models and quantify their parameters. And this is what I'm now going to show you.
The overall idea is that we first assume a population model. So how does the epidemic spread? We have parameters for transmission and recovery. In the simplest case it will be, say, a constant rate birth-death process: you have a constant rate of transmission and a constant rate of recovery.
And with those parameters, here summarized as eta, you can simulate a tree, with branching being birth and termination being death. And you get a probability distribution for trees given those parameters. The second component you then need is how your genetic sequences, the pathogen sequences, change through time. So you assume some evolutionary model, some mutation model, be it a Jukes-Cantor model or more sophisticated, more general models, specifying how the nucleotides change through time for your genetic sequences. Then, given that you start with a pathogen sequence at the root, the sequence accumulates mutations and we end up with sequences at the tips of this tree, and we get a distribution of sequences given our evolutionary parameters theta and the tree.
Now obviously we don't know theta, we don't know eta, and we don't know the tree. What we know, though, is this alignment, the sequences; that's what we get as empirical data. So what we aim to do is infer the probability distribution of our parameters eta and theta together with the phylogenetic trees, given the genetic sequences. This is also called the posterior distribution. So we aim to infer our unknown quantities given the data, and this posterior typically can't be calculated explicitly. So one rewrites this quantity using Bayes' theorem, which is a basic probability theorem. When you rewrite it, you get the probability of the sequences given theta and the tree, which is exactly what our evolutionary model describes. We get the probability of our trees, the transmission trees, given our transmission and recovery rates, which the first model describes. Then we have priors on our parameters, which is typical for Bayesian methods, so the user can assume certain prior knowledge on those parameters. And we have a normalizing constant.
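Written out, the rewriting with Bayes' theorem described here looks roughly as follows. This is a sketch in notation chosen for illustration: the tree is written as T, the alignment as D (a label not used in the talk), and the priors on eta and theta are assumed to be independent.

```latex
P(\mathcal{T}, \eta, \theta \mid D)
  = \frac{P(D \mid \mathcal{T}, \theta)\; P(\mathcal{T} \mid \eta)\; P(\eta)\; P(\theta)}{P(D)}
```

The first factor in the numerator is the evolutionary model, the second is the epidemiological (birth-death) model, the remaining two are the priors, and the denominator is the normalizing constant.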
And so once we assume some models and have those quantities, those probability distributions, we can implement all of that in an MCMC approach, a Markov chain Monte Carlo approach, feed in the sequences, and get back the posterior distribution, so estimates for our parameters and the phylogenetic trees. I am mainly interested in improving one quantity, the probability of the tree given those transmission and recovery rates, because there I had the feeling that what was being used, those coalescent approximations, can fail in different cases and lead to severe biases.
Okay, so what kind of models do we assume for the epidemic spread? Typically people in epidemiology assume so-called compartmental models, the simplest one being that you have a compartment of susceptible, of infected, and of recovered people. The dynamics between them are that the infected people have a rate of infecting the susceptibles, which we call here lambda, and a rate of recovering, which we call here delta. So far this is the most plain epidemiological model. Now we assume a probability p of sampling an individual, meaning sequencing their pathogen and including them in the phylogenetic tree analysis. Such a model gives rise to a whole transmission chain, with branching events being transmission events, tips in the tree being those recovery events, tips with a cross being non-sampled individuals, and tips with an orange ball being sampled pathogens. Now obviously we cannot know anything about our unsampled individuals, so we prune the non-sampled individuals away from this complete transmission chain, and we end up with a phylogenetic tree connecting our samples. This is precisely what we would aim at estimating from genetic sequencing data. And now the question is: what is the probability of such a reconstructed phylogeny under our epidemic model?
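To make the model concrete, here is a minimal simulation sketch of the constant-rate process just described, tracking only counts rather than the full tree. It assumes that each infected individual transmits at rate lambda, is removed at rate delta, and is sequenced on removal with probability p. The function names, the return format, and the `max_infected` safety cap are illustrative choices, not part of the talk or of any published tool.

```python
import random


def basic_reproductive_number(lam, delta):
    """R naught for the constant-rate model: expected transmissions per infection."""
    return lam / delta


def simulate_outbreak(lam, delta, p, t_max, seed=None, max_infected=100_000):
    """Gillespie-style simulation of a constant-rate birth-death-sampling process.

    lam   : transmission (birth) rate per infected individual
    delta : removal/recovery (death) rate per infected individual
    p     : probability that a removed individual is sampled (sequenced)
    Starts from one infected individual and stops at t_max, at extinction,
    or when max_infected is reached (a hypothetical safety cap).
    """
    if lam < 0 or delta < 0 or lam + delta <= 0:
        raise ValueError("rates must be non-negative and not both zero")
    rng = random.Random(seed)
    t = 0.0
    infected, sampled, unsampled = 1, 0, 0
    while 0 < infected < max_infected:
        # Waiting time to the next event among all infected individuals.
        t += rng.expovariate(infected * (lam + delta))
        if t > t_max:
            break
        if rng.random() < lam / (lam + delta):
            infected += 1  # transmission: a branching event in the tree
        else:
            infected -= 1  # removal: a tip in the tree
            if rng.random() < p:
                sampled += 1  # tip that appears in the reconstructed phylogeny
            else:
                unsampled += 1  # tip that is pruned away
    return {"infected": infected, "sampled": sampled, "unsampled": unsampled}


if __name__ == "__main__":
    print(basic_reproductive_number(2.0, 1.0))
    print(simulate_outbreak(lam=2.0, delta=1.0, p=0.5, t_max=5.0, seed=3))
```

Only the individuals counted as `sampled` would end up as tips of the reconstructed phylogeny; everything counted as `unsampled` corresponds to the pruned parts of the complete transmission chain.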
If we know that, we have the component we need: essentially we can determine those parameters best explaining the phylogenetic tree, and then the overall thing will be done in the Bayesian framework, where we infer all the trees together with the parameters. I showed you the simplest model, but now you can vary those parameters through time, and you can assume different rates for different types of individuals. Say, if you live in a different geographic location, you might have a different transmission rate, or a different recovery rate because you have different treatment accessible. So you can generalize this model in different ways. For the very simple case where we actually only have infected individuals and they infect, so we have a pure constant rate birth-death process without any saturation due to a limited number of susceptible individuals, we luckily can write down the full probability distribution of trees analytically. So we just use those equations, put them into the MCMC framework, and then let this MCMC infer the transmission and recovery rates.
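The idea of plugging an analytic tree probability into MCMC can be sketched with a toy Metropolis sampler. To keep it short and checkable, the sketch below does not use the birth-death-sampling tree density from the talk. It swaps in the much simpler pure-birth (Yule) likelihood, which for a tree with n tips and total branch length L is proportional to lambda^(n-2) * exp(-lambda * L), together with an assumed exponential prior on lambda. The tree itself is held fixed, whereas the real analysis also samples trees. All names and default values are hypothetical.

```python
import math
import random


def log_posterior(lam, n_tips, total_branch_length, prior_rate):
    """Unnormalized log posterior of the birth rate under a Yule (pure-birth) tree.

    Stand-in likelihood: lam**(n_tips - 2) * exp(-lam * total_branch_length),
    with an Exponential(prior_rate) prior on lam (an assumption of this sketch).
    """
    if lam <= 0.0:
        return -math.inf
    log_likelihood = (n_tips - 2) * math.log(lam) - lam * total_branch_length
    log_prior = -prior_rate * lam
    return log_likelihood + log_prior


def metropolis(n_tips, total_branch_length, prior_rate=1.0,
               n_steps=20_000, step_size=0.3, start=1.0, seed=0):
    """Random-walk Metropolis sampler for the birth rate of a fixed tree."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, n_tips, total_branch_length, prior_rate)
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal; non-positive values get zero posterior mass.
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_posterior(proposal, n_tips, total_branch_length, prior_rate)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


if __name__ == "__main__":
    draws = metropolis(n_tips=50, total_branch_length=25.0)
    kept = draws[2_000:]  # discard burn-in
    print(sum(kept) / len(kept))
```

Under these stand-in assumptions the posterior is a Gamma distribution with mean (n - 1) / (L + prior_rate), which gives an analytic check on the sampler. For the epidemiological models in the talk, the log-likelihood function would be replaced by the birth-death-sampling tree density.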
And so over time, the equation I showed is essentially a model where we can analyze epidemic outbreaks through time. Then we added the fact that we have a limited number of susceptible individuals, so we shouldn't assume uncontrolled growth, because this is unrealistic for an epidemic: there's only a finite number of people who can get infected. Epidemiological parameters may also change through time, because there might be public health interventions, we may change behavior, etc. And then, as briefly mentioned, the population may be structured, so different people in different parts may actually undergo different dynamics. All this is implemented as a Bayesian tool in BEAST. This is a very popular phylogenetic software tool, not implemented by us, but we essentially write the add-ons for different models. So we write add-ons for assuming different or new epidemiological models, but use its machinery for doing the MCMC on trees and searching through tree space. We also have some of our own implementations, but once we know something works well, we typically put it into BEAST because then it is more widely available.
Yeah, so far not. That's actually something we're currently working on, if you have reticulation, say if you look at bacteria and plasmid exchange or so. But so far there are no tools where you can really investigate the dynamical behavior of systems with reticulations. There are some tools giving you the reticulation network, but then not the dynamics as well. So I should repeat the question: it was all about reticulation, and the question was, since there are no tools for epidemiological models with reticulation, whether that is an advantage of the coalescent. I'd say, well, yes and no. There are some frameworks where you can allow for recombination in the coalescent, but they also, well, the methods break down if your data
size gets to the sizes which we typically analyze. Already with only a few hundred sequences, which you need to make really good epidemiological estimates, you can have very many recombination events, and the whole tools get very, very slow. We are right now actually developing something as an analog, with birth, death, and recombination rates, and trying to infer those rates. But I would say all those tools so far, including ours, are very slow; they mix in certain circumstances but then may break down for other data sets.
Good. So now that I showed you all this theory behind how we define our models and what we do computationally, I want to show you an example, a data analysis we did on Ebola in West Africa, which was mainly spreading all of 2014. As you probably all know, it was mainly affecting Liberia, Sierra Leone, and Guinea. Sierra Leone and Guinea still have occasional cases. Now they have been free of Ebola for, I think, two or three weeks, so people hope that it's actually also over there, but it's not totally confirmed yet. There were more than 28,000 suspected cases and more than 11,000 deaths, so it was by far the biggest Ebola epidemic ever. Pretty much a year ago, well, at the end of August 2014, there was a paper published in Science. They analyzed genetic sequences from mainly Sierra Leone, and they aimed to determine when Ebola started spreading and whether it was one single zoonosis, meaning one jump of this virus from some animal into the human population. We thought we wanted to use exactly the same data and quantify this basic reproductive number, which I explained before, the number which characterizes how many secondary infections we have. So if it's two, one person infects on average two other people. We had 72 genomes from Sierra Leone, all sampled in May and June of 2014, so when it was really taking off in Sierra Leone, and what you see here is, in blue
shaded, our posterior distribution of this R naught for those 72 genomes in Sierra Leone, and it peaks roughly around two. I also show you here, just because we discussed this before, the prior for R naught in green. So you see there is quite some signal in the data pushing the green prior assumption, which we set with a median around one, towards something significantly higher, namely this blue shaded curve. If one would use a population genetic approximation, so the coalescent, one would get this orange curve, which is in the same ballpark, but it is definitely a differently shaped curve. Back then we tested whether we already see signs of saturation, meaning a slowdown of the epidemic, and back then we didn't see anything.
We finished those analyses pretty much last October, and from then on we were thinking: now we have characterized the very early spread of Ebola in Sierra Leone, but what happened since the last samples were taken in June? The main problem was that no new sequencing data was becoming available, the reason being that in all those countries they had major problems just taking care of the patients. Of the medical people and staff who were working in the health sector and treating patients, a lot contracted Ebola themselves, and among those were also people who collected samples while they were treating patients and sent them out for sequencing. For this paper, which was published at the end of August 2014, by the time it was published, so after going through peer review, already five of the co-authors working in those countries had died of Ebola. So you can imagine that from then onwards there was a huge problem: there was not much new data while the epidemic was spreading a lot. But then over time, luckily, or finally, the WHO declared it a public health emergency, some countries helped to some extent, and now also new sequencing data became available. In particular, we got new sequencing
data from two papers which were both published this year. One paper had many Sierra Leone sequences in it, and another paper mainly Guinea sequences. So first I show you here the Sierra Leone data set. We were reconstructing the phylogenetic trees; here I show you one tree from the posterior distribution. With this tree you will for sure infer the start of the epidemic, so the time it all took off, because that is also one parameter of your model, and we estimated that to be March 2014 in Sierra Leone. Then we estimated the reproductive number through time. If it's not at the beginning of the outbreak but through time, people call it the effective reproductive number rather than the basic reproductive number. Here you see again that early on it can be anything in a range of almost up to 2.5, and then it's dropping, and towards the end of 2014 it's significantly below one. One is this crucial threshold: if the reproductive number is below one, one person infects on average less than one other person during the infection, so the epidemic will decline and eventually die out. So this looked good, the drop below one.
With this method, because, we have to remember, we have this parameter p, the sampling proportion, we can also estimate what proportion of individuals we have in our data set. At the start it was quite a lot of them, one thinks roughly half of them. But then obviously, once this epidemic was spreading really fast in Sierra Leone, and overall we had less than 300 sequences here, we only had a small fraction of the total number of cases, and we see the sampling proportion actually drop. To compare this now to what is known about the epidemic: the first suspected index case is thought to be from December 2013, but this is in Guinea. The first confirmed case was in March 2014, also in Guinea. So there's quite a lag time between when we think there was Ebola, because people now
interviewing them think these people likely had Ebola, and the confirmed case. The first confirmed case in Sierra Leone was at the end of May, so you can imagine the first case was quite some time before; we here estimated it to be March. Then in August this epidemic was declared a public health emergency by the WHO, and then we see it dropped, but it's hard to say if it dropped after they declared it and did something, or if it was already dropping before. When the WHO declared that, they also estimated the basic reproductive number, and interestingly, or reassuringly, at pretty much the same time we did the initial analysis, they also got roughly two. People now think, and these are totally different analyses, all or mostly based on incidence and prevalence data, that the basic reproductive number really was around two.
The second data set is now Guinea, where the whole epidemic actually started. We did exactly the same analysis, and here you see we estimate the origin of the whole epidemic to be early December 2013, that is the peak of the posterior distribution, which very nicely agrees with the first suspected index case. An important thing to see here is that the effective reproductive number in Guinea seems to have been lower than in Sierra Leone throughout the epidemic. And actually, when you look at the incidence curve, so this is WHO data where they just counted the number of reported cases through time, here in blue you have Guinea, and it looks significantly lower than Sierra Leone and Liberia. From the phylogenetic analysis, with the same priors and everything else being the same, you also get a lower reproductive number.
We then were wondering how well we actually inferred the sampling proportion. Remember, I said we infer this p through time, so how many cases we sampled, which allows us to calculate back how many total cases there were through time. In purple you see our estimated sampling proportion, and green is simply the number of samples divided by the number of
confirmed cases, which should be an overestimate of the true sampling proportion because there are probably some unconfirmed cases: the true sampling proportion is samples divided by overall cases, and we divide by confirmed cases. You see the green line is at the upper end of this purple interval. So this was really reassuring, in that hopefully for epidemics where we don't have such good count data we can estimate the sampling proportion well. To summarize, we estimated the origin in Guinea to be December 2013 and in Sierra Leone March 2014, which is in agreement with or earlier than the first confirmed cases, so it makes sense.
With this I'm going to leave Ebola and just give you one idea of what we are working on right now and what we want to work on in the upcoming future with respect to epidemiology. In all those analyses we were assuming that all individuals behave dynamically in the same way; there is no structure in the population. But actually there may be a lot of structure. As one example, pathogens may be drug resistant or drug sensitive, and for that reason they may transmit more or less. One theory is that drug resistant strains should have a disadvantage, otherwise they would be the wild type, and potentially they are less good at being transmitted from one host to the next. If we knew that certain drug resistances can never be transmitted from one host to the other, it would of course still be bad for the person having a drug resistance, because he or she can't be treated, but we wouldn't have the danger of a secondary resistant epidemic. If on the other hand we knew that certain drug resistances are transmitted a lot, we should really avoid them with respect to the whole population, because then soon we couldn't treat many people anymore. This is actually a very important topic, in particular these days, when there are resistant strains against most antibiotics. And so we feel that with phylogenetics we can actually answer,
or one in principle can answer, such questions, because compared to incidence and prevalence data, in phylogenetic trees you actually see the past transmission structure. So for example, if drug resistance is transmitted, you would see full clades being red, meaning drug resistant, and if drug resistance only evolves de novo, you should see all those drug resistances emerging in different parts of the tree. Obviously in the tree you don't have those red branches, but you can reconstruct everything up to the red branches, because you would know if your sequenced strains are drug resistant or drug sensitive. So there should be signal, from how those tips are colored red or black and how they mix among each other, about how much drug resistance is transmitted or evolving de novo. This is one question we are asking right now, to give you some idea of what we are working on, in particular for tuberculosis, a bacterium which has a lot of different multi-drug resistant strains, and there we get data from Georgia, as well as looking into HIV, in particular the Swiss HIV Cohort.
With this little outlook on epidemiology, I want to now just briefly show you what we also do in those two other areas, macroevolution and single cells. In macroevolution, as I said, people typically infer species phylogenies by having some genetic sequencing data of extant species, reconstructing the phylogeny, and trying to understand what the speciation dynamics and extinction dynamics are. On the other hand there's the whole field of paleontology, where people look at fossils and pretty much ask the same question: how do species emerge, and when do they go extinct? Often the conclusions being made by phylogenetic studies contradict the conclusions being made by paleontological studies. So the obvious question would be, can we actually look at the different data sources together and somehow reconcile certain
contradictions, because in the end both data sources, the extant species genetic data and the fossil data, are part of the same process of speciation and extinction. So what we want to do is to bridge from phylogenetics towards paleontology, and we wanted to do that by developing a new tool where we essentially model the extant species data together with the fossil data simultaneously. We do that with a model which we call the fossilized birth-death process, and this is very related to those birth-death processes I showed you before in epidemiology. So even though it's a very different application, at the core of the methods it's actually very related. What we now have at the core is that, instead of infected individuals, we have species. They speciate with some rate lambda and go extinct with a rate mu, and while those species live through time they might produce fossils, which we then actually sample later, with some rate psi. When you do a simulation under this model, you get those complete trees where again the crosses are dead parts of the tree, so extinct species, but now the dark red dots are fossil samples we obtained, and then at present we have the light orange dots, which are the extant species samples, as well as the non-sampled tips. Again we prune all the unseen lineages, so we obtain our reconstructed phylogeny, which we aim to infer using genetic sequencing data together with the fossils, which are indicated in dark red. Now in order to infer such trees and estimate rates, we have to do exactly the same as before: we have to determine the probability of this full tree. While typically in phylogenetics of species one would only look at a tree with extant species, we include the fossils and ask what the probability is given the different parameters. What we hope now is to get better estimates of our phylogenies, in particular with respect to
dating speciation events, because in order to get an exact dating of when chimpanzees and humans, or birds and mammals, etc. diverged, we always somehow need to superimpose some fossils where we know to which clade in the tree of life they belong. In our case we implemented this model together with, well, we had available the molecular evolution models, and then for fossils we also use morphology and have morphological models. This is all available in BEAST, so the nice thing is that we only had to add this fossilized birth-death model to that.
So then we went and analyzed data, and the first data set we analyzed was a bear phylogeny. Here the aim was to simply infer the phylogeny using the fossilized birth-death process without making any prior assumptions on where we think certain fossils fall or how the tree should look. And encouragingly, without us knowing anything about bears, we got pretty much the tree people think the bear phylogeny should look like. So this was pretty much demonstrating that, without being experts, or in the absence of expert data, we could get reliable phylogenies.
So then, as a next step, we went forward and analyzed a data set where there is controversy, namely penguins. You see on the right-hand side here the extant penguin species, and then in the circles all those penguin fossils. The discussion in the literature and among penguin experts is how old the radiation of modern penguins is, so how old the most recent common ancestor of all those extant penguins is. A lot of people suggested that the radiation is actually not too young, so it might be 20 to 25 million years old. And when we include only fossils which we know descended from this most recent common ancestor of modern-day penguins, we actually get the same age. But once you include older
fossils, which was actually, for some methodological reasons, not possible with other methods, all of a sudden your age gets younger and younger. So there seems to be signal in those older fossils pushing this most recent common ancestor to a younger age. We now estimate roughly 12 million years when we include all the fossils, but we are now talking more with some penguin experts about their different views on those estimates.
And here it's pretty much the same results. So we also let that vary, or you can have it constant through time. That's actually a good question, I didn't mention that before. In the bear data set we didn't yet have the implementation that lets us vary those rates, which is obviously a model misspecification, but here we can vary the rates through time, speciation, extinction, and fossilization, or leave them constant. What you observe with any of those methods, if you're just interested in the phylogeny: if the data is good, this prior on how the rates change doesn't influence the phylogenies too much. But obviously, once you want to ask something about your speciation or extinction rate and you don't allow it to vary, then [inaudible]. Yeah, yeah. So we did it pretty much both ways. One way was like, say, the Ebola analysis, where we had those different layers in which things could change. There it was the reproductive number; here it was the different speciation, extinction, and fossilization rates. Or they were constant, and the phylogeny as a whole and the most recent common ancestor were the same.
So now I want to end by giving you a very brief idea of something we just recently started working on, for single cells: how we can use phylogenetic tools for single-cell analysis. There, Tim Schroeder, who is a professor in the department where I now work, looks at blood cells and wonders how differentiation in blood cells works. So in blood cells you have kind
of stem cells, which can replicate forever and can diversify into any other of the blood cells. With their tools they can pretty much sort their cells: they can get sets of cells where they're sure there's no stem cell in the set, or they can get a set where they have up to, say, 50% accuracy. So they know roughly 50% of the cells should be stem cells, but we don't know which 50%; we can't divide the rest of the cells accordingly. The non-stem cells, or the next stage, would be multipotent progenitor cells, so MPP cells.
So we know that here on the right-hand side those are not stem cells. What we display is the cells dividing one after another on a plate under a microscope, and we plot here the division pattern of the cells, with crosses being death of the cell and branch lengths being the time it takes until they divide. On the left-hand side you see this mixed set, where they can pull out most of the cells they're sure are not stem cells, but they are left with a 50/50 mix of stem cells and MPP, multipotent progenitor, cells. Their question was essentially: can we determine in this mixed set which trees have a stem cell origin?
And what we suggest is: yes, you can, to some extent. What we're essentially doing is this. You have here a set where you know it's not a stem cell, and you can assume some model. Here we wouldn't assume rates until birth or death; it looks like a normal distribution might be a good model for the lifetime of those cells. We estimate those parameters for the MPP set, and now for each tree in the mixed set we ask: is there significant support to reject the parameters of the MPP set? For 18 percent of those trees in the mixed set there is significant support to reject them being part of this MPP set. So now we are proposing that those 18 percent actually might be stem cells. The method is not super accurate yet, because we missed 32 percent, but now
it's back to our collaborators, and they now have to verify whether those 18 percent are actually stem cells. And that's actually very annoying work, because, I mean, the group is mainly interested in human blood differentiation, but a lot of the experiments are done on mouse cell lines. So to confirm whether a cell is actually a stem cell, you need to put it back into the mouse and see if it's replicating and not differentiating for months and months, and then you're really sure this was a stem cell and can confirm, or say we were totally wrong about this. Because it's so tedious to actually say what is a stem cell, and they need that for doing experiments, it would be great to have a tool where you can actually characterize those things. I should also say that other people in the department work on these things, and you can also now add expression data and other markers to essentially improve your characterization.
So with just this little stem cell excursion, I am at the end. I hope I gave you an overview of what we are doing. If you have any questions or comments, I'm there now or at another point in time. I want to thank in particular my group, and then all the different funding agencies supporting us. Thank you for coming, and I'm happy to take any questions.
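[Editor's sketch, not part of the talk.] The stem cell screening step described above (fit a normal distribution to cell lifetimes in the known MPP set, then ask for each tree in the mixed set whether those parameters can be rejected) could look roughly like the following. This is only an illustration under stated assumptions: the talk does not say which test statistic the group uses, so a simple two-sided z-test on each tree's mean cell lifetime stands in for it here, and all function names and lifetime values are hypothetical.

```python
import math
from statistics import mean, stdev

def fit_normal(lifetimes):
    """Estimate the mean and standard deviation of cell lifetimes (MPP reference set)."""
    return mean(lifetimes), stdev(lifetimes)

def tree_p_value(tree_lifetimes, mu, sigma):
    """Two-sided z-test p-value: is this tree's mean lifetime compatible with N(mu, sigma)?

    Assumption: a z-test on the mean is a stand-in for whatever test the group applies.
    """
    n = len(tree_lifetimes)
    z = (mean(tree_lifetimes) - mu) / (sigma / math.sqrt(n))
    # For a standard normal, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

def flag_candidate_stem_trees(mpp_lifetimes, mixed_trees, alpha=0.05):
    """Return indices of mixed-set trees for which the MPP parameters are rejected."""
    mu, sigma = fit_normal(mpp_lifetimes)
    return [i for i, tree in enumerate(mixed_trees)
            if tree_p_value(tree, mu, sigma) < alpha]

# Hypothetical lifetimes in hours; each inner list is one lineage tree's cell lifetimes.
mpp_lifetimes = [10.0, 11.0, 12.0, 13.0, 14.0]
mixed_trees = [[11.5, 12.5, 12.0], [18.0, 19.0, 20.0]]
candidates = flag_candidate_stem_trees(mpp_lifetimes, mixed_trees)
```

With these made-up numbers, only the second tree, whose cells live much longer than the MPP reference, is flagged as a candidate. A real analysis would also need to handle multiple testing across trees and cells that die or are censored before dividing, which this sketch ignores.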