Okay, please confirm that you are seeing the slides. Yes? Okay, great.

So, first of all, I'd like to thank the organizers for giving me the opportunity to present my work in the first contributed talk of the workshop. These last months I've been working on my master's thesis in the laboratory of Modesto Redrejo, and in this talk I want to present the results of this project, in which we saw how piPolBs, a new group of primer-independent polymerases from family B, are associated with plasmids and other types of conjugative elements.

These polymerases were first discovered in 2017 by our group, and they have been found in diverse bacterial phyla and even in mitochondria. They make up one group of PolBs, together with the RNA-primed PolBs and the protein-primed PolBs. The biochemical characterization of these polymerases showed that piPolBs can perform efficient DNA replication over both undamaged and damaged templates. Interestingly, survival experiments in which we overexpressed the piPolB and an inactive variant showed that this polymerase may be involved in increasing bacterial DNA damage tolerance.

Later, when we analyzed the genetic context of these polymerases, we found that piPolBs are mainly encoded in mobile genetic elements that are integrated into the chromosome, together with integrases and other types of recombinases. However, a few of the piPolBs were found to be encoded in circular plasmids that lack the integrase and therefore probably don't have an integrative form. In any case, we decided to name this type of element pipolins, after the piPolB encoded in their sequence. The first comprehensive study of pipolins was conducted in E. coli, not long ago, by our group.
That time we found that piPolBs were encoded in only a few E. coli genomes, but these genomes nevertheless represent a wide diversity of sequence types, serotypes and so forth. In order to characterize the pipolins harbored in these genomes, our group developed a bioinformatic pipeline called ExplorePipolin. Basically, this pipeline takes genomes as input, first searches for the piPolB gene, then looks for the delimiting direct repeats that flank the piPolB gene, and finally returns the annotated pipolin to the user. The pipeline is able to reconstruct pipolins when each of the features is found in a different contig, and when ExplorePipolin doesn't find direct repeats, it returns the genetic context of the piPolB.

We tested this bioinformatic tool with the E. coli genomes, and we found that pipolins in this species are very flexible elements. In fact, the only features shared by all pipolins are the piPolB, the integrase and the delimiting direct repeats, one of which always overlaps a tRNA gene that acts as the integration site, which is a usual integration site for mobile genetic elements like prophages.

Our next goal in the lab was to perform a pipolin study that involved all bacterial genomes, but before taking such a big step, we decided to perform a pipolin screening in Firmicutes, which has recently been renamed Bacillota. There are several reasons behind this decision. The first one is that Firmicutes is a much wider taxon than E. coli, and also a relatively distant one, so we expected to find more pipolin diversity than in the previous study. It's also known that in this phylum we find many well-known pathogens, like Streptococcus pneumoniae or Staphylococcus aureus, where the transfer of mobile genetic elements plays an essential role in pathogenicity.
And the last reason is that before this study we had only described a few pipolins in Firmicutes, and all of them were plasmids. The first one is the erythromycin and dalfopristin resistance plasmid pLME300 from Limosilactobacillus fermentum. We also knew of the plasmid pTnSha2 from Staphylococcus, which includes a copy of the gene fabI, which confers resistance to triclosan. And besides these two, we also knew of a plasmid similar to pTnSha2, pSE-12228-03 from Staphylococcus epidermidis, which varies in just a few genes.

So with this motivation, the main goal of our project was to perform a deep characterization of pipolins in Firmicutes, with the objective of describing the pipolin diversity in this phylum and also finding out whether pipolins in Firmicutes are plasmids that may be related to the transfer of AMR genes.

The first thing that we did was a massive screening of all the GenBank assemblies from Firmicutes. We ran ExplorePipolin on each of the genomes and we detected pipolins in 225 genomes, which is a really small percentage. However, ExplorePipolin detected 243 pipolins, which means that some of these genomes could be hosting more than one element. Interestingly, half of these genomes belong to the genus Staphylococcus and another third to the genus Limosilactobacillus, while none of the remaining genera showed more than six occurrences.

Then we checked the isolation source of these genomes, and we found that staphylococcal genomes mainly come from medical and human sources, while genomes from Limosilactobacillus come from diverse animals. We think that these differences are mainly due to database bias, since samples from medical sources are overrepresented.
Then we performed an in silico multilocus sequence typing, using the typing schemes in the PubMLST database, in order to characterize the diversity of the strains that we had. We found a fairly wide diversity of staphylococcal sequence types, but the two most abundant were sequence type 167 and sequence type 2. The first one is absent from the literature, or at least we have not found any publication that mentions it, but sequence type 2 is a known pathogenic MDR lineage. Sequence types like ST5 and ST8 are also present here, from Staphylococcus aureus and Staphylococcus epidermidis.

Next, since we knew that we had multidrug-resistant strains, we decided to predict the AMR genes encoded in the genomes with AMRFinderPlus. Here we found that 93% of the staphylococcal genomes were predicted to be resistant to three or more AMR classes, whereas the genomes from Limosilactobacillus were very rarely resistant to three or more of these classes. However, again, we think that these results are related to the difference in the isolation source of the strains.

We checked whether any of these predicted genes were in our pipolins, but we only found qac genes, which encode efflux pumps that export quaternary ammonium compounds, in four of our pipolins, and in two other pipolins we found bla genes that confer resistance to penicillin. So these results indicate that pipolins are apparently not a main vehicle for AMR genes. They are very similar to the results in E. coli, where AMR genes were also really infrequent. However, the biological role of the pipolins still remains to be discovered.

After performing a basic characterization of the genomes, we went on to study the pipolins, and we found that ExplorePipolin could detect the delimiting direct repeats in only 17 of the 243.
This means that in the remaining 90% or so, ExplorePipolin couldn't detect these flanking sequences. Furthermore, when we analyzed the contigs where the piPolB was found, 75% of them were shorter than 30 kilobase pairs and had sizes similar to pLME300 and pSE. So these data suggest that our pipolins are likely, in fact, plasmids similar to the plasmids that we already saw before. However, when we explored the annotation of these contigs, we found that only nine of them were annotated as plasmids. So we still don't know what is happening with the remaining pipolins, more than 200 of them.

In order to ascertain whether these pipolins without direct repeats could be plasmids, we decided to create a sequence similarity network that included our pipolins and the plasmids from the PLSDB database. In this network, we used FastANI to connect the sequences that share an average nucleotide identity (ANI) of at least 90% and a coverage of at least 75%. This way, we connect only sequences with considerable similarity.

This is the full network, which contains all the plasmids and pipolins, but we can remove the plasmids that are not connected to any pipolin, because they are not very interesting for now. This is the resulting subgraph, and we can see that only five connected components contain a plasmid, which suggests that the pipolins in them could be plasmids, or at least sequences similar to them. And then we have the rest of the pipolins, which make up around 50%, that are not connected to any plasmid.

So, focusing on these connected components, first I want to highlight this big cluster, in which we have 88 pipolins and six plasmids.
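As an aside to the transcript, the network step described above (link two sequences when ANI is at least 90% and coverage at least 75%, then keep only the connected components that contain a pipolin) can be sketched roughly as follows. This is a minimal illustration, not the speaker's code: the input rows are a hypothetical pre-parsed form of pairwise comparison results, the sequence names are made up, and the function names `build_network`, `connected_components` and `split_pipolin_components` are assumptions introduced here.

```python
MIN_ANI = 90.0       # percent identity threshold quoted in the talk
MIN_COVERAGE = 0.75  # coverage threshold quoted in the talk, as a fraction


def build_network(rows, min_ani=MIN_ANI, min_cov=MIN_COVERAGE):
    """Build an undirected graph from (seq_a, seq_b, ani, coverage) rows."""
    graph = {}
    for seq_a, seq_b, ani, coverage in rows:
        # Register both sequences so unconnected ones stay as singleton nodes.
        graph.setdefault(seq_a, set())
        graph.setdefault(seq_b, set())
        if seq_a != seq_b and ani >= min_ani and coverage >= min_cov:
            graph[seq_a].add(seq_b)
            graph[seq_b].add(seq_a)
    return graph


def connected_components(graph):
    """Return the connected components of the graph as a list of sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def split_pipolin_components(components, is_pipolin):
    """Drop plasmid-only components; split the rest by whether they include a plasmid."""
    with_plasmid, pipolin_only = [], []
    for component in components:
        if not any(is_pipolin(name) for name in component):
            continue  # plasmids not connected to any pipolin are removed
        if any(not is_pipolin(name) for name in component):
            with_plasmid.append(component)
        else:
            pipolin_only.append(component)
    return with_plasmid, pipolin_only


# Hypothetical toy rows: (sequence A, sequence B, ANI in percent, coverage fraction).
toy_rows = [
    ("pipolin_A", "plasmid_1", 98.2, 0.91),
    ("pipolin_B", "pipolin_A", 95.0, 0.80),
    ("pipolin_C", "plasmid_2", 92.0, 0.40),  # coverage too low, no edge
    ("plasmid_3", "plasmid_2", 99.0, 0.95),  # plasmid-only component
]
toy_graph = build_network(toy_rows)
with_plasmid, pipolin_only = split_pipolin_components(
    connected_components(toy_graph), lambda name: name.startswith("pipolin")
)
```

In this toy input, `pipolin_A` and `pipolin_B` end up in a component together with `plasmid_1`, while `pipolin_C` stays unconnected, which mirrors the two situations the talk distinguishes.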
The interesting thing here is that when we plotted the structure of these pipolins, we found that their sequences are different versions of the plasmid pTnSha2 that I mentioned before, which was described by Furi and collaborators a few years ago. With our study, we have found new versions, for instance this plasmid, which has lost the insertion sequence, or this one, where the same plasmid has incorporated a composite transposon that carries new genes, like the quaternary ammonium pump that we mentioned before. So with this analysis, to the best of our knowledge, we are expanding the collection of pTnSha2-like plasmids that we already have.

Then we have a few components that contain pipolins similar to the plasmids pLME300 and pSE. We also have a few plasmids that are just short sequences with one or two genes shared with some pipolin, so we can discard them.

Concluding this section: with this sequence similarity network, we have found that basically almost all of the staphylococcal pipolins are plasmids very similar to pTnSha2 and pSE. However, we still don't know anything about the topology of the elements in Limosilactobacillus and the rest of the genera.

So, with these results, we followed a second approach in order to find more evidence of plasmids. This consisted of looking for plasmid markers such as replication proteins, relaxases and this kind of protein, in a similar way to what some plasmid predictors do. The first problem that we found when we followed this approach is that more than half of the predicted functions in the pipolins were not annotated: around 65% of the proteins were labeled with the generic name "hypothetical protein".
So we decided to perform a functional characterization of the pipolins in order to improve the pipolin annotation and, at the same time, find these plasmid markers. To perform this characterization, we created a protein sequence similarity network, then we clustered it and predicted the function of the clusters. We obtained 470 clusters, but we will only focus on the most relevant ones.

This is a presence/absence matrix in which we represent the presence or absence of each cluster in each pipolin. Just one cluster was present in all the pipolins, which is the one that contains the piPolBs. The next two most frequent clusters contain resolvases and relaxases. So for now this is telling us that more than half of the pipolins are at least mechanistically associated with plasmids. Then we have five more clusters that correspond to the rest of the functions encoded in the pSE and pTnSha2 plasmids.

We also have two clusters that contain integrases and that are frequently present in pipolins from Limosilactobacillus, and we think that this is really interesting. This heat map suggests that pipolins from Limosilactobacillus frequently encode integrases but also relaxases and resolvases, which means that in this genus we could have both types of element. We have even found cases with delimiting direct repeats that have both the integrase and the relaxase, so we think that these pipolins could behave similarly to an integrative and conjugative element.
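As an aside to the transcript, the presence/absence matrix described above can be illustrated with a short sketch. It assumes the clustering has already been done and that each pipolin is represented by the set of protein clusters found in it; the pipolin names, cluster labels and function names below are hypothetical and chosen only for illustration.

```python
from collections import Counter


def presence_absence(pipolin_clusters):
    """Return (ordered cluster list, {pipolin: 0/1 row}) from {pipolin: set of clusters}."""
    clusters = sorted({c for members in pipolin_clusters.values() for c in members})
    matrix = {
        pipolin: [int(c in members) for c in clusters]
        for pipolin, members in pipolin_clusters.items()
    }
    return clusters, matrix


def cluster_counts(pipolin_clusters):
    """Count in how many pipolins each cluster is present."""
    return Counter(c for members in pipolin_clusters.values() for c in members)


def core_clusters(pipolin_clusters):
    """Clusters present in every pipolin (in the talk, only the piPolB cluster)."""
    counts = cluster_counts(pipolin_clusters)
    total = len(pipolin_clusters)
    return sorted(c for c, n in counts.items() if n == total)


# Hypothetical toy data: cluster labels stand in for predicted functions.
toy = {
    "pipolin_1": {"piPolB", "relaxase", "resolvase"},
    "pipolin_2": {"piPolB", "relaxase"},
    "pipolin_3": {"piPolB", "integrase"},
}
clusters, matrix = presence_absence(toy)
print(clusters)
for pipolin, row in matrix.items():
    print(pipolin, row)
print("core:", core_clusters(toy))
```

Sorting the clusters by `cluster_counts` instead of alphabetically would give the "most frequent clusters first" ordering that the talk walks through.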
So this is the full presence/absence heat map, and with these analyses we can see again that pipolins from Staphylococcus are functionally very similar to pSE and pTnSha2, whereas pipolins from Limosilactobacillus show a really high diversity in comparison to the other genera.

The last thing we did follows from seeing that most of the pipolins encode functions related to DNA mobilization, like integrases or relaxases, or proteins related to plasmid maintenance. So we performed an analysis to look for evidence of horizontal gene transfer of the piPolB. To this end, we inferred a species phylogeny based on the 16S rRNA gene, in which we can see that the species are very well separated, and we compared this tree with the piPolB phylogeny. We can see that there are clear incongruences between species, and even between genera in the Lactobacillaceae family, which clearly suggest that there has been horizontal gene transfer of the piPolB.

Summarizing: with this study we have found more than 200 pipolins in Firmicutes, mainly in the genera Staphylococcus and Limosilactobacillus. Thanks to the protein clustering and the sequence similarity networks, we have found that staphylococcal pipolins are basically variations of the pSE and pTnSha2 plasmids. When we looked at the pipolins from Limosilactobacillus, we found that they are more diverse and encode functions that can be associated with both integrative elements and plasmids, so we could have both types of element in this genus. Next, we saw that we had evidence of horizontal gene transfer of the piPolB gene, but there are still some questions to be answered.
The first question that we want to answer is whether this relationship between plasmids and the piPolB is specific to Firmicutes, so we plan to perform the same analysis with all the bacterial genome assemblies in GenBank. We also have another topic that we are currently working on in the lab, the discovery of the biological role of the pipolins, which still remains unknown. And we also want to biochemically characterize new piPolBs of different types, like the piPolBs encoded in these plasmids or the piPolBs encoded in mitochondrial plasmids.

So, this is everything. Thank you so much for your attention. I think that now we can move on to the questions and the discussion section.

Okay. Hello. Thank you for your talk, it was really interesting.

Thank you.

And all the talks in this session were really interesting. So, is there any question in the chat? I don't see any, so I do have a question. It starts with Fernando, but it's almost for everybody. Fernando, you said that some PTUs have a narrow host range and some have a wider host range. Have you checked whether there is any codon usage signature that characterizes either one or the other? And on the other side, James, because you mentioned codon usage signatures before, would you expect to see any codon usage signature? My idea is that plasmids are kind of too short to find a significant codon usage signature, but I might be wrong. It's just my idea.

Yeah. No, we didn't check anything, you know, because we have a lot of work to do on many other things, but you're very welcome to. A list of all plasmids with their PTUs and their hosts, everything, is in the published paper.
One of the supplementary materials is a list of all 10,000 plasmids with their PTUs, their size, everything, and also all the connections and the host. So yes, you can. You know, that would be a very interesting exercise, but I have no time. You would have to select a PTU which is broad host range, like PTU-C for instance. Start with that one, because it's very large, several hundred plasmids, and you can check whether there is any signature.

Plasmids move very quickly, though. I remember when Eva Top did this. She was trying to guess the origin of a plasmid by analyzing the codon usage compared to different hosts, and she didn't find that codon usage was very good. But I remember she used short k-mers, which are almost the same thing, well, almost the same. It's another signature. And she had a program to guess the origin of a plasmid by the proportion of different k-mers or so. So yeah, it would be interesting to do. I'm not going to do it.

Yeah, I guess. Maybe we should ask James whether he thinks it's worthwhile, whether he thinks that we would find enough signal.

I'm agreeing with Fernando there that it might be very, very weak, if there is any at all. The major trend in synonymous codon usage within a genome is that the highly expressed genes tend to have a codon usage that corresponds to the tRNA pool. Those highly expressed genes have been there for a long time. They're very unlike a plasmid in that they have been there for a long time, four and a half thousand million years probably, in the sense that they're not moving around.
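As an aside to the transcript, the k-mer signature idea mentioned in this discussion (guessing a plasmid's likely host from its proportions of short k-mers) can be illustrated with a toy sketch. This is not the program referred to by the speaker; the function names, the choice of a Manhattan distance between k-mer frequency profiles, and the short made-up sequences are all assumptions made only for illustration, and real sequences would be far longer.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3):
    """Relative frequency of every DNA k-mer in seq (k-mers with non-ACGT letters are skipped)."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


def profile_distance(profile_a, profile_b):
    """Manhattan distance between two profiles built with the same k."""
    return sum(abs(profile_a[kmer] - profile_b[kmer]) for kmer in profile_a)


def closest_host(plasmid_seq, host_seqs, k=3):
    """Name of the host whose k-mer profile is nearest to the plasmid's."""
    plasmid_profile = kmer_profile(plasmid_seq, k)
    return min(
        host_seqs,
        key=lambda name: profile_distance(plasmid_profile, kmer_profile(host_seqs[name], k)),
    )


# Hypothetical toy sequences, far too short for a real analysis.
hosts = {
    "at_rich_host": "ATTTAAATATATTTAATA",
    "gc_rich_host": "GCGGCCGCGCCGGCGC",
}
guess = closest_host("ATATTAAATTTATA", hosts, k=2)
print(guess)
```

Whether such a signature carries enough signal for fast-moving plasmids is exactly what the speakers are debating, so the sketch only shows the mechanics, not that the approach works.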
When you look at the recently acquired genes and so on, their codon usage is just really not well adapted to the tRNA abundances. So I think it would be quite random, really. If you were to look at them, they wouldn't be nearly as focused as, say, the ribosomal proteins, which all tend to use the same kinds of codons.

I am almost tempted to ask whether Jamie Hall is here and has any idea to share on this. I'm not seeing him. No, but there is a question in the chat, and this one is for James: have you seen whether the same set of genes in a certain species has different types of interactions, or interacts with other genes, depending on the ecological niche?

That is exactly the kind of question we're trying to answer now with the random forest approach. It is about trying to put things into context. I had a slide where I said that the presence or absence of gene X could be very context dependent. You know, it's likely to be present if this gene and this gene are present, or it's likely to be present if this other gene is absent. That sort of thing. And into your random forest, of course, you can introduce things about ecological niche or host or whatever you like. All you're really trying to do is divide up the data in sensible ways in different steps. But Jose, that's a very, very good question. That is exactly the reasoning behind why we're trying to get onto this with random forests, and we hope to produce some software so that people can try it with their own data sets as well. We're not very far away, but it's just not ready yet.

More questions in the chat? No, nothing. Then I had another question for Fernando. Have you looked, because you said your program also shows resistances.
So, have you looked at which resistances are carried in the more promiscuous PTUs, the ones that span different classes and phyla? And which ones are they, because those would be the most dangerous, clinically or otherwise?

There are PTUs which are apparently specialized in the dissemination of antibiotic resistance. We are trying to develop a network of interactions to show this, you know, which PTUs are apparently central in distributing the resistances. So yeah, that's the answer. There definitely are some PTUs which have a much higher antibiotic resistance gene density than others, like 10 times more, or more.

Oh, well, so it's surely helpful to flag them to people doing outbreak investigation.

Yes, of course. Yeah, that's the idea.

So, any more questions? There are no more questions, so we are kind of on time. First of all, I would like to thank all the speakers. It was really an amazing session. Fantastic. I'm really happy you talked in this session, and we can have 15 minutes.