Sorry, I need to share my screen, wait a moment. Okay, this one. So now, do you see my screen? Hello? Yeah. Okay, fantastic. Fernando, can you stop me? No, I mean, when I go long, tell me something after 15 minutes, otherwise I can talk forever. Okay, so you have 30 minutes, right? No, it's shorter because we put other talks in the middle. Actually, Jesse is not talking. We invited Esperance to talk, but I'm not sure what she answered, so I probably want to leave space for her at the end of the day.
So anyway, the first question is: what's the point of an introduction to something on the last day of a workshop? And yes, it made total sense. So basically I'm doing a talk that is disguised as an introduction to genetic epidemiology, but I will end up presenting my own work, with a very big introduction.
So, epidemiology. Do you see my screen? Yes, you do. Epidemiology is about putting together the personal information of the patient (age, sex, ethnicity, where they live, any travel history, risk factors, whatever) with the medical information: the medical history, but also what's happening in that moment, so the symptoms, which hospital and which ward the patient is in, what they get prescribed, and everything. You try to put this information together to make more sense of what's happening to the patient and of the disease. But when you go to infectious disease epidemiology, there is another layer, at least in recent years, and this is the pathogen sample. So you can learn from the patient and from the medical information, but also from the pathogen. And why is this so important? Does having the pathogen sample really change a lot?
Well, this is a recent real-life example; it was released on October 26, 2021. In the US they found four cases of melioidosis. When they analyzed the bacterial samples, the patients turned out to be infected by Burkholderia pseudomallei, which is a bacterium that you wouldn't expect in the US. It's particularly endemic in subtropical regions, and this one was from southern Asia. But these four patients had no link whatsoever among them: they were in different states, with different social profiles, ages, jobs, some retired, lots of different things. And none of them had a recent history of travel. So what? It really looked like a CSI episode. In the end, after they had compared everything from standard old-school epidemiology, they had to resort to the pathogen sample. They sequenced it, and when they compared the genomes, they saw that these four samples were related. So whatever the link was, there was a link between these four patients. They weren't four random events with no connection. And the only thing telling them there was a connection was the genome of the bacteria. Then they had to dig into the lives of these four people to find out what the link was, because the bacteria don't lie: if the genome of the bacteria is the same, then there is a link, and you have to find it. And finally they found the link in an aromatherapy spray that the four patients had bought, which was produced in a place where Burkholderia is endemic. The aromatherapy spray was contaminated. It was sold in Walmart, I think, the four patients bought it, and they got infected. I think one of them even died before the link was found.
But you do understand that without the genomes they wouldn't have found the link between these people. It was only when they saw that the genomes were similar that they sampled everything in the houses of these people until they found the culprit. And while I was preparing the slides, I found something that is probably interesting from Cornelius's point of view. Once the culprit was found, being an aromatherapy spray, it could have been in lots of houses in the US, and in the CDC press release there was this sentence that you see at the end, because of course they didn't want the contaminated product to end up in the environment and the bacterium to become endemic in the US. So a lot of things can happen, but I think this example shows exactly why genomic epidemiology is important and very useful.
So what happened? These are the developments that made possible the genomic epidemiology we have now. I chose this image from the internet because it goes back to the microscope that Jamie mentioned on Wednesday. Do you see my pointer, my mouse? Yes, we see your pointer, don't worry. If anything goes wrong, I will tell you. The two things that made the genomic epidemiology we know now possible were, first, PCR in the 80s (although not in the form we know now, as I will explain in a couple of slides) and, second, the sequencing machines from the mid-2000s. So, skip.
So this is what I meant. Once Sanger sequencing and PCR were available, things called multi-locus sequence typing (MLST) schemes were developed. So basically, an array of smallish fragments from a handful of genes.
But if you patch them all together and make a tree with these pieces of the seven genes, you get a pretty reliable tree that mirrors the tree you get when you use whole-genome sequencing. I put a reference and an image from a Clermont paper, because I worked in the group of Erick Denamur and Olivier Tenaillon, so I know this one, but there are different MLST schemes. For E. coli there is the Pasteur one, the Warwick one, and another one. There are schemes for other bacteria too. Still, this made it possible to establish a phylogenetic relationship between samples that you might find in the hospital. Before the late 70s and 80s, you were just able to describe the species, so E. coli or not E. coli, and not much more than this. With MLST you have a fine-grained picture of how related the E. coli strains are. You even have the detail of the strain, so you can say whether the samples you are collecting in the hospital are from the same strain or from different strains, and this is really helpful to narrow things down. But of course the great revolution arrived in the mid-2000s, when sequencing became much cheaper and much more feasible, and we could have whole-genome sequences of the samples, which contain much more information.
So what does the genetic analysis add to the outbreak investigation? As you saw in the previous example, a lot. We can have a zero-one classification: the pathogen infecting this patient is the same or is different, which can mean E. coli or not E. coli, or E. coli of the same strain or not, or the same resistance or not. But it can also add much more nuanced information, which you get when you build a tree.
With a tree you can tell whether the pathogens in the different patients are related, and how closely, and whether two samples found in two different patients could be part of a common infection chain. And how does this work? This is a very brief and sketchy description of how phylogenomics works. From the patients you have the samples, you sequence the samples, and you align them; we are not going into how you do this. In the end you get an alignment, and in the alignment you see mutations. The mutations are these dots. All the yellow parts are the same in all the genomes. Each sample comes from a different patient, and all the colored dots that you see here are point mutations.
Let's start from patient 1 and patient 2, who share everything except this, I don't know, orangish mutation. So you connect them and you put the mutation on P1. No, sorry. Then you also connect patient 3, who differs from P1 and P2 because they have the green mutation here but don't have the purple mutation here. So the purple mutation is on the common line between P1 and P2, while the green mutation is just on the line to P3. And then you go on: for P4 and P5 the only difference is the purple one. And here, P1, P2, P3, P4 and P5 all share the orange one, so you have to put it on a branch that leads to the five of them. Only P1, P2 and P3 share the red one, so you have to put it on this branch, but exactly where on that branch, you don't really know.
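
The placement rule described here (a mutation goes on the branch leading to exactly the samples that share it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample names follow the talk, but the mutation names and which sample carries what are made up, and real analyses use dedicated phylogenetic software rather than this set logic.

```python
from itertools import combinations

# Hypothetical toy data: for each patient sample, the set of mutated sites
# relative to a reference. Mutation names are invented for illustration.
samples = {
    "P1": {"m_all", "m_123", "m_12", "m_1"},
    "P2": {"m_all", "m_123", "m_12"},
    "P3": {"m_all", "m_123", "m_3"},
    "P4": {"m_all", "m_45"},
    "P5": {"m_all", "m_45", "m_5"},
}

def carriers_by_mutation(samples):
    """Map each mutation to the set of samples that carry it."""
    carriers = {}
    for name, mutations in samples.items():
        for mutation in mutations:
            carriers.setdefault(mutation, set()).add(name)
    return carriers

def is_tree_like(carriers):
    """True if every pair of carrier sets is either nested or disjoint.

    Overlapping but non-nested sets cannot be explained by a single tree
    with each mutation arising once.
    """
    for a, b in combinations(carriers.values(), 2):
        if a & b and not (a <= b or b <= a):
            return False
    return True

carriers = carriers_by_mutation(samples)
# Mutations shared by more samples sit on deeper (older) branches.
for mutation, group in sorted(carriers.items(), key=lambda kv: (-len(kv[1]), kv[0])):
    print(mutation, "-> branch leading to", sorted(group))
print("consistent with a single tree:", is_tree_like(carriers))
```

The `is_tree_like` check is an added assumption, not something from the talk: it flags mutation patterns that a single tree cannot explain, which is one simple symptom of recombination or repeated mutation.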
And then you can do the same for the other three, and then you connect them. Now you have made a tree from the samples you brought from the hospital, so you have the relationships between all these samples. There is a note to add here: there is an assumption that mutations appear at an approximately constant rate in time, so the lengths of the tree branches are approximately proportional to the time elapsed between the connections. Now, about that "approximately": this is from SARS-CoV-2 evolution, and as you can see the approximation is quite reliable. On the x axis there is time, and on the y axis there is the number of mutations compared to one reference sample. The colors denote the different strains, or variants as they are called in SARS-CoV-2, that have been named. As you can see, the fit to all these points is quite reliably a straight line, so you can say that the mutations appear at a quite constant rate in time. This is from viruses and not from bacteria, but still.
And what does it change? This one is from MRSA, and it's to show you what it means that you can use time. If you focus here on the tree, each branch ends at the moment in which the sample was collected, so those ones were collected before the 2000s, while these ones were collected around 2010. Using the molecular clock you can not only draw the tree but, by inserting the sampling times into the tree, you can get a reliable, or reliable enough, date for the most recent common ancestor. You see this node here, which is basically the most recent common ancestor of all of these, the ancestor of the outbreak: it was around the early to mid 90s.
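
As a rough illustration of the molecular clock idea, one can fit a straight line to mutation counts against sampling dates and read off where the line crosses zero mutations. The sketch below is a plain least-squares fit in Python; the dates and counts are invented (and deliberately noise-free), and it is not the method used for the SARS-CoV-2 or MRSA figures, which rely on dedicated phylogenetic dating tools.

```python
def fit_clock(dates, mutation_counts):
    """Ordinary least-squares line through (date, mutation count) points.

    Returns (rate, intercept): rate is mutations per unit of time.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_m = sum(mutation_counts) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (m - mean_m) for t, m in zip(dates, mutation_counts))
    rate = sxy / sxx
    intercept = mean_m - rate * mean_t
    return rate, intercept

# Hypothetical data: sampling dates in decimal years and the number of
# mutations separating each sample from the common ancestor.
dates = [1998.0, 1999.5, 2004.0, 2009.0, 2010.5]
counts = [8, 11, 20, 30, 33]

rate, intercept = fit_clock(dates, counts)
# The date at which the fitted line reaches zero mutations is a crude
# estimate of when the common ancestor existed.
ancestor_date = -intercept / rate
print(f"rate: {rate:.2f} mutations per year")
print(f"estimated ancestor date: {ancestor_date:.1f}")
```

With real data the points scatter around the line, so the ancestor date comes with an uncertainty interval that this sketch does not compute.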
So you can have a time for the most recent common ancestor, and this is quite valuable when you are dealing with outbreaks. But apart from these details of phylogenomics, on top of the mutations and the tree you can add things like when the samples were taken. So, for example, P1, P2 and P3 might seem related, but P1 and P2 were sampled much earlier and very close together, while P3 was sampled much later. So is this part of the same outbreak? We don't know. We need to understand whether there was enough time for the purple mutation and the green mutation to appear. It could be; we don't know. On the other hand, P6 was sampled at the same time as P1 and P2 and, from the tree, is clearly not part of the same outbreak. You can go on like this and put them all together, and then you see that P7 and P8 were sampled at the same time and are on the same branch. So maybe those two are also part of an outbreak, a different one, but together with each other. On top of this you can add another piece of information, in this case the ward in which they were sampled, but it can be a different country, whatever geography. And you see that P7 and P8 were sampled at the same time in the same ward, while P1 and P2 were sampled at the same time in different wards. So you need to understand whether there is a common source that caused P1 and P2 to be in different wards, or whether there is transmission between wards. So there are a lot of different layers in genetic epidemiology that you can add to get complete information about the outbreak you are investigating. And this is something much more amazing that has been done in viruses. Sorry, I keep taking viruses as examples. They were really able to trace the outbreak going around different countries.
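
The layering of tree position, sampling time and ward can be sketched as a simple filter over sample records. Everything in this Python snippet is an assumption made for illustration: the clade labels stand in for what one would read off the tree, the dates and ward names are invented, and the 30-day window is an arbitrary threshold, not a value from the talk.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations

@dataclass
class Sample:
    patient: str
    clade: str      # hypothetical cluster label read off the tree
    sampled: date   # sampling date (invented)
    ward: str       # ward where the sample was taken (invented)

samples = [
    Sample("P1", "A", date(2020, 1, 10), "ward 1"),
    Sample("P2", "A", date(2020, 1, 14), "ward 2"),
    Sample("P3", "A", date(2020, 9, 2), "ward 1"),
    Sample("P6", "B", date(2020, 1, 12), "ward 1"),
    Sample("P7", "C", date(2020, 5, 3), "ward 3"),
    Sample("P8", "C", date(2020, 5, 6), "ward 3"),
]

def candidate_links(samples, max_days=30):
    """List pairs in the same clade sampled within max_days of each other."""
    links = []
    for a, b in combinations(samples, 2):
        if a.clade != b.clade:
            continue  # not closely related on the tree
        gap = abs((a.sampled - b.sampled).days)
        if gap > max_days:
            continue  # related, but too far apart in time to flag
        where = "same ward" if a.ward == b.ward else "different wards"
        links.append((a.patient, b.patient, gap, where))
    return links

for link in candidate_links(samples):
    print(link)
```

Pairs that are filtered out, such as P1 and P3 here, are not ruled out as part of one outbreak; they are just not flagged by this crude rule, which matches the "we don't know" in the talk.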
The thing is that these data sets span more than 20 years, while in a hospital outbreak you rarely have this kind of data. Not never, though, because sometimes you do. How am I doing with time? You have spoken for 18 and a half minutes. Okay, there is this video that is amazing. I want to show you the video, because you really see things in real time: you see the tree on the left, starting now, with the years on the bottom. Then you see the different alleles appearing on the bottom right, and you see how they map onto the geography. I mean, this work was amazing. You see how everything maps onto the tree and onto the geography. And this is what we try to do when we have enough information. So yeah, I've shown you the video, and I've talked too much.
So what about plasmids? What do you do in genetic epidemiology if it is a plasmid outbreak? I wanted to show this very fast, because this is my work and it's a plasmid outbreak. It was 55 samples from 48 patients in one year, and it helps explain the problems in trying to trace this kind of outbreak. First of all, we were lucky, because OXA-48 is carried on a very conserved plasmid, so we could run all these analyses using short-read sequences, which is not possible most of the time. The samples were characterized first in the lab and then sequenced, so they were chosen for their resistance, but after sequencing we saw that there was only one allele, which was OXA-48. So it was most probably a plasmid outbreak, because OXA-48 is often carried on a plasmid, and a very conserved one. And when we built the tree, the samples, as you can see, were really, really similar within the outbreak, but the color of the leaves is the species they were found in.
So we had at least four species, and within each species each color is a different strain, except for the dark blue ones, which are all E. coli ST399. So a lot of different bacterial hosts. And when we mapped the mutations, the mutations in the upper line here were the ones that distinguish the reference plasmid, which was sequenced in 2001, from the rest of the plasmids in the outbreak. These were 129 mutations. It would be better if you put your screen in presentation mode. Yes, no. I don't want to go through all of them. Yeah. These red dots are the ones that characterize the samples within the outbreak, and there are about 25 of these mutations; some of them were repeated. But the other ones are 129. Of course, the question was: is this a result of co-adaptation between OXA-48 and ST399? So one question was, how many mutations did we expect? Now, of course, I won't fool much of the audience, because when you see this distribution of mutations you think about recombination, as I do. So how many point mutations you expect is a kind of naive question to ask in this context, but it helps us understand some of the problems of doing genetic epidemiology with plasmids. So, yeah, I put in this slide to show you that it is probably recombination.
So it's not simple. The number of mutations you expect is the mutation rate times the length of the plasmid times the number of generations elapsed between the reference plasmid and the outbreak. Okay, this is an easy formula. The length of the plasmid is easy too, because it was well defined and there were no major structural variations: it was 61,881 base pairs.
But when we go to the number of generations between the reference plasmid and the outbreak, well, do I take the number of generations that E. coli would have? Because every time there is a conjugation the plasmid is replicated, and each replication brings with it a probability of mutation. So what do we do? In the end I disregarded those extra replications and just used the number of generations I would have found in E. coli, so seven generations a day. And then the mutation rate: to mutate, the plasmid uses the machinery of the host it is in, but in this outbreak there are four different bacterial hosts, so which one do we prefer? In the end I used the E. coli mutation rate, because the main host is probably E. coli anyway. You get this number and you don't really know how to treat it, but the thing you notice is that you expect 0.2 mutations and you find 129. So there is probably co-adaptation or something; whatever it is, this plasmid is a different variant of the plasmid and it's well suited to that background. Anyway, this is just a lower bound, and as I explained there are lots of assumptions hidden in this number.
Given that this number was supporting the idea that there was a kind of co-evolution or adaptation to ST399, we assumed that the outbreak dynamic was a main outbreak sustained by the clonal expansion of E. coli ST399 with the plasmid, and that all the other bacterial hosts we found resulted from intra-host plasmid conjugation. And then we proceeded to model this in order to be able to infer a conjugation rate, whatever that means in this context.
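
The expectation described above (mutation rate times plasmid length times number of generations) is easy to sketch in Python. The plasmid length and the seven generations per day come from the talk; the mutation rate and the number of years below are placeholder values chosen only to show the arithmetic, not the values used in the paper. The Poisson comparison at the end is an extra assumption added here to show how far 129 observed mutations sits from an expectation of about 0.2.

```python
import math

def expected_mutations(rate_per_bp_per_generation, length_bp, generations):
    """Expected number of point mutations: rate x length x generations."""
    return rate_per_bp_per_generation * length_bp * generations

def log10_poisson_pmf(k, mean):
    """log10 of the Poisson probability of seeing exactly k events."""
    return (k * math.log(mean) - mean - math.lgamma(k + 1)) / math.log(10)

PLASMID_LENGTH_BP = 61881    # from the talk
GENERATIONS_PER_DAY = 7      # from the talk (E. coli assumption)
ASSUMED_RATE = 1e-10         # hypothetical per-bp, per-generation rate
ASSUMED_YEARS = 15           # hypothetical time between reference and outbreak

generations = GENERATIONS_PER_DAY * 365 * ASSUMED_YEARS
expected = expected_mutations(ASSUMED_RATE, PLASMID_LENGTH_BP, generations)
print(f"generations: {generations}")
print(f"expected point mutations: {expected:.2f}")

# How surprising would 129 mutations be if the expectation were 0.2 and
# point mutations arrived independently (a Poisson assumption)?
print(f"log10 P(exactly 129 | mean 0.2): {log10_poisson_pmf(129, 0.2):.0f}")
```

Under these assumptions the probability is vanishingly small, which is consistent with the speaker's reading that the 129 differences are not a slow accumulation of point mutations.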
Now, I'll skip this because we are late. Basically the model is in the paper, and I will not go into the details of the model, but we were able to determine an effective conjugation rate, whatever that means. It was per lineage per year, and here we go back to what Fernando and other speakers said, that you don't measure a conjugation rate. So how do we compare this with what you see in the lab? This is the end, and this is why I put in all these slides, because this is probably what Diana and Olivia will be telling us, and I'm so happy to hear it. Unfortunately this was just a bioinformatic and modeling analysis. What I would have liked was to give the plasmid and the host to Tatiana, or Jamie, or whoever else is doing experiments, there are lots of folks, and find out whether there really was an adaptation or a co-evolution or whatever, and basically whether this allele of the plasmid, put in another bacterial background that was not ST399, was faring better or worse. But of course we didn't have this kind of collaborators.
And then of course the advantages of doing this kind of study are that we get more fine-grained information on the outbreak dynamics and a greater understanding of the AMR spreading dynamics. But there are a lot of things that we need to fix, like the sample size; and sequencing and assembling, as we said, are very difficult. And the most difficult point for me is that point mutations are not the dominant evolutionary mechanism of plasmids, but phylogenetic theory relies on point mutations and not on gene loss. Using gene loss and acquisition you can build a tree, but there is no clock. So yes, this is my main problem, and I've said it a lot of times already, so you know. And a more general problem is that sampling is seldom agnostic.
Most of the time the sampling relies on this resistance in this species, while we know that plasmids can jump between very different species. And we don't have controls, and this has been raised a lot in this workshop, because we seldom have plasmids with no AMR genes, I mean plasmids sampled for the sake of seeing the plasmid. Another thing that I really would like to have is non-clinical sampling, although I know it's difficult; it's not easy to go around to people in the market and say, "Can I swab you?" But anyway, it is a problem: what plasmids do we find in healthy individuals? This is my opinion, and I put all of this here just to discuss it with you, because I really value your opinion: we are looking at this problem from very different points of view, and I can only learn from your opinions. And despite these problems, there is a lot of great research using hospital-based data sets on plasmid genetic epidemiology. Two of these talks we've already seen, three of them are coming up after this, and there were many more talks through the workshop that touched on various points I raised here. And okay, the acknowledgments for this talk are John, who (okay, I put him twice, sorry) gave me the data set, and [names unclear], because without their help in simplifying and writing the model, nobody would have enjoyed the paper. The paper is out on archive and is soon to appear in Microbial Genomics. That's it, sorry for running long. I'm happy to take any questions, and thank you for listening.
Thank you, Alice. Very nice talk. I have one question, if nobody else is raising hands. The question is: we all know that SNPs, I mean point mutations, are not the only thing, and horizontal gene transfer is probably much more important in the short run. But what's the role of recombination? I mean, I don't know.
Has this really been measured? How does recombination affect the trees that are made in the short term, you know, in outbreak analysis? I don't know. But when you saw where the mutations were, those in this outbreak were clearly recombination, like homologous recombination. And I didn't measure it, because there wasn't anything in the middle, so I didn't know whether they were acquired in different steps; I have no idea. So, well, I don't know. Yeah, I mean, it sounds easy, but it's very important, you know, because, for instance, when you produce these nice trees of the SARS coronavirus, apparently there's no recombination, so everything is accumulation of mutations. But in bacterial phylogenetic trees of any species, how much of it is recombination and how much is real phylogeny? You know, I don't know, because one of the problems is that there are so many different fields, so maybe there is somebody who has measured it and we don't know because it's not a field we are involved with. Does anybody in the audience know? Does anybody have any ideas, or any paper? Okay. Nobody. Okay, thank you. We can go on with the session, I think, or is there a break?
Yeah, I just wanted to ask a question. I didn't measure recombination either, but what I would find interesting is whether you decided to focus on the backbone genes of the plasmid or on the acquired genes, because I think this might be quite interesting: I believe that the backbone genes do not evolve as fast as, for instance, resistance genes or transposons. But maybe there are other opinions about this. That's why I typically differentiate between the acquired genes on the plasmid and the backbone. Yeah, makes sense. So we are running incredibly late. Okay. Sorry, I'm the worst chair ever. Yeah, I think we need to have Alvaro starting his talk, also because, I mean