Cool. So hopefully you can see my slides now and hear me. Thanks for giving me the opportunity to speak here. I've really been enjoying these talks so far. There's been a lot to think about and to ponder, and I think it's left me with more questions than answers, and maybe even made me reconsider my own talk before I've given it.
What I'm going to talk to you about today is my approach to typing and characterizing plasmids: how I start with plasmid sequences and attempt to type plasmid lineages. For the past three years or so I've been working on the DETECTIVE project, which is a UK-China collaboration involving the universities of Birmingham and Cambridge here and three institutes in three different cities in China. We're focusing on Gram-negative pathogens in ICUs in these hospitals. Each project has been a deep sampling project, sequencing lots of isolates from ICU patients, staff, hospital environments and also clinical samples. My role in the project has primarily been to look at plasmids and other mobile genetic elements that are contributing to carbapenem resistance, but also to antibiotic resistance more generally in these strains, particularly where we see the accumulation of drug resistance.
So what I've been really interested in is being able to trace individual plasmid lineages, and I think it's clear now, from what we've been hearing over the last couple of days, why we want to do this. The first reason is that by observing a discrete plasmid lineage over evolutionary time frames we can learn a lot about how plasmids evolve, whether at the SNP level or at these larger levels where we see deletion events, insertion events and recombination events, and what effects those things have on plasmids generally.
But also, the more we learn about how plasmids evolve and understand their evolutionary space, the more accurately we can start to employ genomic surveillance for tracing plasmids as well, and that's of course important in terms of antibiotic resistance. We've just been hearing about how we should be defining plasmid lineages, and we know that SNPs can happen, but I think one really important way to define lineages is to trace them by their evolutionary events. We've seen examples already, but as a very crude example here, you can see that a bunch of plasmids coming from a defined, known original backbone can be placed into lineages based on acquiring different insertions, and they can go onward and accumulate many different changes. Each will be different from the others, and we need to be able to tell them apart, because that's important for transmission if we're talking about genomic surveillance. But we also need to know whether their evolution is taking them on different trajectories or similar ones, and whether there are different ways for them to reach the same place.
So again, very crude, and we've heard examples already, but if we're talking about a plasmid lineage (here it is as the circle, here it is as the line), the kinds of things I'm talking about are insertions, which can be quite simple; deletion events, which can make insertions more complicated or more difficult to track; recombination, which can change parts of the backbone or insertions; and the development of complex resistance regions, which is where it can get really tricky to annotate and determine how these things have happened. And finally plasmid cointegration, which we've heard a little bit about already in previous talks in this series, but which I'm seeing much more of in recent times.
So this is a slide I just added, because I think we've been seeing a couple of different things in these sessions. We either have an approach to typing plasmids where we start with all the sequence data that's available, which for me is certainly starting to become overwhelming, or we can start slowly with a single plasmid in a single study and work our way up. I think it's very difficult to get from the top to the bottom or from the bottom to the top, and hopefully with more discussion we can find a way to reconcile those points. But what I'm going to be talking about is essentially the bottom-up strategy: starting with high-resolution annotations and trying to bring that to the larger data sets available. And so essentially what a lot of my work comes down to is annotate, compare, repeat.
To give you an idea of what I would do if I had a plasmid that I was beginning with and wanted to annotate: the first step is to work out what is in your plasmid, in terms of first the backbone and second the mobile genetic elements, like IS, transposons, etc. Unfortunately there's no one answer; I've yet to see a single simple solution to this. I do a lot of comparisons to reference sequences, so if I know generally the type of plasmid that I'm working with, I can use a reference backbone, often from a historic plasmid that's been sequenced and well characterised, especially experimentally. I have databases of mobile elements as well, but of course no database is perfect. So first of all, annotate your backbone and annotate your mobile elements, and then look at the sequences around the mobile elements. I'm very specific when I'm typing: rather than just looking for the gene of an insertion sequence or a transposon that might be picked up by Prokka or something like that, I look for the actual sequence of the element itself.
And I really try to define the context of that insertion. We know that insertions can generate target site duplications, and what I've marked on here is where target site duplications are present. The converse is that if you know a mobile element can generate a target site duplication but it's not flanked by one, the probable explanation is that you've had a deletion. So, for example, here we might suspect a deletion, and you could compare to a backbone sequence to determine whether you're missing something that you think you should have. If you do that you can then work out what's in your backbone and whether certain parts of the backbone are interrupted, and you might even be able to predict phenotypes from that, in terms of conjugative transfer, etc. So that's how we annotate these. And of course we just heard in Will's talk that we can have really rapidly evolving elements, or combinations of mobile elements, in these plasmids as well, and they're important to track as well as the backbone.
It sometimes feels like annotations like this never end. You might find weird sequences in your plasmid that don't seem to match a backbone and don't seem to match a mobile element, and that's of course how you can discover new elements; there's always more to discover the more we sequence. So in the end you have to try to be as accurate as possible in your annotations, and the reason is that it allows me to do what I'm going to mention next, which is to take the really precise sequences at the junctions of elements and backbones, or of elements and elements if they insert into one another, and use those to further characterize other plasmids and then to look in larger databases.
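The target site duplication check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `flanking_tsd`, the 0-based coordinates and the toy sequences are hypothetical, the sequence is treated as linear, and the duplication length has to be supplied from what is known about the element.

```python
from typing import Optional

def flanking_tsd(seq: str, start: int, end: int, tsd_len: int) -> Optional[str]:
    """Return the direct repeat flanking the element at seq[start:end], or None.

    start and end are 0-based, end-exclusive coordinates of the mobile element.
    tsd_len is the duplication length expected for this element (an input,
    not something this sketch infers).
    """
    if start < tsd_len or end + tsd_len > len(seq):
        return None  # not enough flanking sequence to compare
    left = seq[start - tsd_len:start]
    right = seq[end:end + tsd_len]
    return left if left == right else None

# Hypothetical toy data: a 5 bp duplication either side of an element.
element = "GGCTTTGTTGAATAAATCGAACTTTTGCTGAG"
intact = "ACGTTGCA" + "GATTC" + element + "GATTC" + "TTAACCGG"
# Same insertion, but the right-hand copy of the duplication has been lost,
# as would happen after an adjacent deletion.
deleted = "ACGTTGCA" + "GATTC" + element + "TTAACCGG"

start = len("ACGTTGCA" + "GATTC")
end = start + len(element)
print(flanking_tsd(intact, start, end, 5))
print(flanking_tsd(deleted, start, end, 5))
```

A missing duplication is only a hint that a deletion may have occurred; as in the talk, the next step would be a comparison against a reference backbone.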
So for example, here we have an insertion sequence that has jumped into a plasmid backbone. This sequence here, which transitions from blue to black, can only have been generated by this precise insertion event. You'll have sequences that have the black sequence and you'll have sequences that have the blue sequence, but you will never have them together in that exact order and configuration unless this molecular event, this precise insertion, has occurred. And conversely, you have, or can generate, the sequence of what the site would look like without the insertion, so you can also search for the uninterrupted variant. So if you imagine a lineage of plasmids with all these different events, you could have signature sequences for every molecular event that has occurred in that lineage over time. And you can take those very short sequences (I make them 100 base pairs, so 50 base pairs of element and 50 base pairs of context) and query any set of sequences you like: you can query complete genomes and plasmids, and you can query draft genomes. We may even be able to query metagenomes; I have limited experience in that space, but I've seen a little bit that suggests it's promising.
We've also heard about the difficulties of short-read sequence data. The reason I say these signatures can be useful is that in my experience of short-read sequence data, and I've looked at a fair bit, when you have a contig break you do still get a little piece of the genetic element that causes the break, or of the repeat that causes the break. And that repeat length is usually 60 to 180 base pairs.
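As a rough illustration of the signature idea, the Python sketch below builds 100 bp junction signatures (50 bp of element plus 50 bp of context), reconstructs the uninterrupted site, and searches a set of contigs on both strands. The function names and the exact-match search are assumptions made for this sketch; the talk does not say which search tool or identity threshold was used, and a real search would probably use an alignment tool rather than substring matching.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def junction_signatures(seq: str, start: int, end: int, flank: int = 50):
    """Left and right junction signatures for the element at seq[start:end].

    Each signature spans `flank` bases of context and `flank` bases of element.
    """
    left = seq[start - flank:start + flank]
    right = seq[end - flank:end + flank]
    return left, right

def uninterrupted_signature(seq: str, start: int, end: int,
                            tsd_len: int, flank: int = 50) -> str:
    """Signature of the empty site: drop the element and one duplication copy."""
    restored = seq[:start] + seq[end + tsd_len:]
    return restored[start - flank:start + flank]

def find_signature(signature: str, contigs: dict) -> list:
    """Names of contigs containing the signature on either strand.

    Exact matching is a simplification chosen for this sketch.
    """
    rc = revcomp(signature)
    return [name for name, s in contigs.items() if signature in s or rc in s]
```

Because a signature needs only 50 bp on the element side, it can survive on a contig that ends inside the element or repeat, which is the property the short-read argument relies on.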
So if you use a 100 base pair signature sequence, that is 50 base pairs of element and 50 base pairs of context, that sequence should be preserved in your short-read assemblies. That means that even if you have a pile of samples and you only have short-read genomes, you should be able to detect the molecular events that have occurred in your plasmid lineage, or at least the derivative sequences of those events.
So I'll really quickly try to go through a couple of examples where we've used some of this: one starting from annotating from scratch and learning about a plasmid lineage, and one where we applied this to a smaller data set and looked for plasmids in an ICU. The first of these is the F 233 plasmids, which we've recently published, so I'll go through this relatively quickly, but you can get more detail in the manuscript itself. The reason we were interested in F 233 plasmids is that Zong, who's the head of infectious diseases in his hospital in Chengdu, saw a lot of Klebsiella outbreaks that were carrying this type of plasmid with a KPC-2 gene, so a highly clinically relevant resistance gene that appeared to be in a plasmid. They were seeing it coming back constantly, so they were very interested in this particular type of plasmid, and that prompted our investigation. We collected F 233 plasmids from GenBank and from Zong's study (they're in Sichuan), so most of these plasmids here were actually generated by Zong's lab. We detected F 233 plasmids using the repA1 gene, and we had a very strict identity threshold, namely identical, so we made sure we were getting exactly this plasmid lineage coming through for our collection of 185 plasmids. We saw that they were from multiple provinces in China and from multiple niches: human clinical isolates, agricultural isolates and various other non-hospital environmental isolates as well.
And they covered quite a lot of China, but what we actually found is that they're mostly endemic to China: amongst 185 plasmids I think there were only five found outside of China at all. This appears to be an endemic plasmid lineage in China that's really well disseminated but has not yet really taken the leap internationally, or at least we didn't think so from this data set.
Without taking you through the detail, basically we annotated all of these 185 plasmids. We could start by annotating some in more detail than others, then determine junction sequences, search our database, determine what the plasmids look like, and look at examples where they weren't matching what we expected. In the end, what we came up with is this plasmid backbone. This is the F 233 plasmid backbone as a black line here, and you can see various elements of it annotated. There were three insertions that were found in greater than 50% of the data set that we could use to characterize the evolution of this lineage. When you look at it overall, there are two massive insertions that account for all of the drug resistance genes that we see in these plasmids, and there are a lot. But what was also surprising is that insertions in these regions also accounted for a massive number of additional plasmid replicons in these plasmids.
First of all, I'll mention that this primary resistance region is derived from one called Tn2670, which is a very famous old transposon that actually formed in the 1950s and is found in R100, a plasmid isolated in Japan. This was acquired by the F 233 plasmid lineage not in an insertion event but through homologous recombination in its backbone. So a recombination event between the blue plasmid and the black one brought in a little bit of blue sequence and the resistance region that it included. And that made what we call sub-lineage one of F 233 plasmids.
And if you look at all of the sub-lineage one plasmids you can see further evolution as well: acquisition of plasmids and other elements into the primary resistance region, and insertions outside of it into the backbone that can also be used to tell things apart. The second major step was acquisition of a group II intron, which is generally pretty cryptic but was very useful for us in terms of determining a difference between these plasmids. Again, if you look at isolates that contain just the PRR (the primary resistance region) and the intron, you can see various evolutionary events as well, like the acquisition of more plasmids and deletion events in the backbone.
And finally, the really big event, which is what brought this plasmid to our attention, I suppose, was the formation of a cointegrate between an F 233 plasmid and an R plasmid, and the R plasmid is actually what introduced KPC-2 to this plasmid lineage. What was really interesting to us is that all but one example of the plasmids that had acquired this insertion were found in Klebsiella pneumoniae ST11. We were also able to work out that, because it had inserted in the traI gene, which encodes the relaxase of F-type plasmids, this insertion actually rendered them non-conjugative. This was tested on a number of plasmids, and we were able to find some examples in the literature of non-conjugative cointegrates as well. This led us to believe that this plasmid lineage actually formed in ST11 and is essentially trapped there. And once again, if you look at all of the variants there's massive variation in terms of deletions: it seems that after the cointegration event has rendered the plasmids non-conjugative they discard their transfer region, and you see big deletion events removing the now-redundant transfer section of the plasmid.
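The three steps just described suggest a simple way to turn signature hits into a sub-lineage call. The sketch below is an assumption about how such a rule could look, not the published scheme: the marker names are hypothetical labels, and it assumes the events are nested, so the latest event present decides the call.

```python
def assign_sublineage(hits: set) -> str:
    """Assign a sub-lineage from the set of junction signatures detected.

    Marker names are hypothetical labels for this sketch:
      prr_junction          primary resistance region acquired
      intron_junction       group II intron acquired
      cointegrate_junction  cointegrate with the R plasmid formed
    The latest event present decides the call (assumed nesting).
    """
    if "cointegrate_junction" in hits:
        return "sub-lineage 3"
    if "intron_junction" in hits:
        return "sub-lineage 2"
    if "prr_junction" in hits:
        return "sub-lineage 1"
    return "untyped"

# Hypothetical genomes and the signatures found in each.
genomes = {
    "isolate_A": {"prr_junction"},
    "isolate_B": {"prr_junction", "intron_junction"},
    "isolate_C": {"prr_junction", "intron_junction", "cointegrate_junction"},
    "isolate_D": set(),
}
calls = {name: assign_sublineage(hits) for name, hits in genomes.items()}
```

In practice a genome with a later marker but none of the earlier ones would deserve a closer look, since it could reflect a deletion or an assembly gap.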
And we even see one weird example outside of Klebsiella pneumoniae, where, after deletion of most of the R plasmid, it has acquired an N-type plasmid and a small rolling-circle plasmid, which gives it a couple of extra replicons as well, and that was found in a Proteus mirabilis. So hopefully that rushed tour of these plasmids has given you some impression of just how much a single plasmid lineage can evolve over time if you trace it like this.
What I'll also say is that at the end of this we wanted to look for this plasmid lineage in a much larger data set, since we had only looked at 185 plasmids that were complete. And I was sort of set the challenge: how can you prove that taking these short junction sequences and querying short-read genomes can actually work? Fortuitously, Grace Blackwell had at the time developed a massive and really well curated database of bacterial genomes, and she queried them for us using the sequences that we had decided were most useful for identifying plasmid lineages. It worked fantastically. We could see, even more than in our original set, that plasmids from the F 233 lineage have been moving to countries outside of China, although they are still mostly from China. We see that all of them have the primary resistance region that we expect, which seems to have catapulted these plasmids to success. And when we sub-lineage typed them as one, two or three, we could see that, as expected, the sub-lineage three plasmids are all in a really tight bunch of Klebsiella pneumoniae, consistent with them arising in ST11, while the rest of them, we know, hop around a lot. This tree is based on the host chromosome phylogeny.
So I'm not sure how I'm going for time. Alice, do I have much time left?
You're basically at time, but the next thing is a break, so if you could stop in a couple of minutes that would be perfect.
Okay, so I'll rush through super quickly. One more example, with a smaller data set, was to look for cryptic transfer in an ICU population. If you've heard me speak before you may have heard me tell this story, so I can go through it extremely quickly, just to demonstrate the principle again. In Hangzhou we had a collection of Acinetobacter baumannii which was all global clone 2, but an extremely diverse set of global clone 2 isolates, and we could look at the plasmids in those. We actually had to build a database called Pacey, which, if you bring it up in question time, I can tell you how to find, and which is really useful for finding plasmids in Acinetobacter. The distribution of plasmid replicons is actually quite blocky and consistent with the phylogeny, except for AC six, which is quite scattered, and that gave me the hope of finding horizontal transfer in our hospital. We had a small collection of long-read data, so we could look at complete plasmid sequences for AC six plasmids and at the variation that was popping up in our data set. There were a few different backbones, and then, within one backbone, the one in black, various different insertions and a few SNPs as well, and we could make signature sequences of those. We used those sequences to query all of our genomes and, without having to long-read sequence them all, work out what the plasmids were. So we split the messy-looking set of AC six replicons into their definite plasmid types, so actual plasmids within those lineages, and even with a more defined set we could still see plasmids in multiple clusters.
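Once each genome has a plasmid type from its signature hits, flagging candidates for horizontal transfer amounts to finding plasmid types that appear in more than one chromosomal cluster. A minimal Python sketch follows, with hypothetical isolate names, cluster labels and plasmid type names; it is not the pipeline used in the study.

```python
from collections import defaultdict

def types_spanning_clusters(isolates):
    """Plasmid types seen in more than one chromosomal cluster.

    isolates: iterable of (isolate_id, chromosomal_cluster, plasmid_type)
    tuples, where plasmid_type is None if no typed plasmid was detected.
    Returns {plasmid_type: sorted list of clusters}.
    """
    clusters_by_type = defaultdict(set)
    for _isolate_id, cluster, plasmid_type in isolates:
        if plasmid_type is not None:
            clusters_by_type[plasmid_type].add(cluster)
    return {ptype: sorted(clusters)
            for ptype, clusters in clusters_by_type.items()
            if len(clusters) > 1}

# Hypothetical typing results for five isolates.
typed = [
    ("iso1", "cluster_1", "variant_a"),
    ("iso2", "cluster_1", "variant_a"),
    ("iso3", "cluster_4", "variant_a"),
    ("iso4", "cluster_2", "variant_b"),
    ("iso5", "cluster_3", None),
]
candidates = types_spanning_clusters(typed)
```

A type shared across clusters is only a candidate; as the talk goes on to say, it was the room and time metadata that made specific transfer events identifiable.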
So we went back to our metadata and thankfully found some cases where we could actually pinpoint the room and time at which these plasmids transferred in our hospital from one cluster to another. So this kind of signature approach really helped us pin down some plasmids, find them in a really complex data set and, with metadata, find some really interesting things going on in a hospital environment.
So, in conclusion: maybe we can trace plasmid lineages, or we can begin to trace plasmid lineages, now, but I think what we really need is discussion to reconcile our approaches and come up with a means by which we can make all of this much better and much more powerful. And just to conclude, thanks to everyone involved with DETECTIVE, and thanks to everyone here, particularly the organizers, for giving me the chance to speak today, and apologies for going over time. That's it.
Thanks, that was super interesting. I don't think we have questions. Oh well, okay, let's take Olivia's question: Robert, does the sampling of your plasmid genomes provide any evidence for the evolutionary timescale for the F 233 sub-lineages?
Yeah, so I guess that's the question we've had across today and yesterday. In terms of timescale I think it's so difficult. We're thwarted in many ways by not having the sampling depth that we need. I can't tell when an event occurred; essentially you can only tell that an event occurred prior to when the first example of it was sequenced. The first F 233 plasmid that was sampled actually came from a dog in 2008. So we know that the F 233 plasmids had acquired, for example, that primary resistance region in or prior to 2008, and we know that the progenitor of that resistance region had emerged by the 1950s. So we can't even roughly tell you when even the first event in that plasmid lineage came about.
So because we can't even put a start date on it, for most of these events we can't put any sort of timescale. If you were to really deep sample, for example, a hospital environment, you might be able to look at small events, but I think it's just so difficult. And some of these molecular events can happen in an overnight culture in the lab, so they can happen really at any time. It's so hard to say; unfortunately we can't have a molecular clock for them like we can with SNPs.
I think you might be muted.
I am muted, sorry about that. I'm going to close direct questions for the moment and start the discussion session.