Yeah, can you hear me? Yes. All right, sorry about that. Let me go back and share my screen. Okay, you should be able to see a presentation now. Yeah, it looks good. All right, great. So, thanks for the introduction. I'm Julian Paganini, a PhD candidate in the Department of Medical Microbiology at UMC Utrecht, and I'll be presenting an optimized short-read approach to predict and reconstruct AMR plasmids from E. coli. Probably everybody here knows that antibiotic resistance is a big deal. How big? In the European Union alone, there are about 700,000 infections caused by resistant bacteria each year, which lead to approximately 4,000 deaths. What I didn't know when I started my PhD is that 50% of these infections are caused by E. coli, and that 30% of the deaths are also caused by this bacterium. The emergence and spread of antibiotic resistance is obviously a major problem, and we can say with a high degree of confidence that plasmids are one of the main drivers behind this spread. That's because these genetic elements frequently carry genes that confer resistance to antibiotics, and because they can be horizontally transferred among bacteria by diverse mechanisms. Plasmids can be transferred among bacteria of the same species, but sometimes also among bacteria of different species, and even of different genera. So we need to identify and track plasmids, and the question is: how do we do that, and what challenges do we face? If you and your lab have money, the challenge is smaller, because you can sequence your bacterial genome with both Illumina short reads and Nanopore long reads. You can then do a hybrid assembly, and most of the time you will obtain a complete genome in which the plasmid sequences are precisely identified.
The problem is that long-read sequencing is currently very expensive, even in Europe, so you can imagine that in other parts of the world it is even less accessible. The cheaper alternative is to do only short-read sequencing with Illumina technology. Here, we extract the DNA, fragment it into small pieces, sequence them, and then use an assembler to try to stitch these short fragments back together. We don't obtain a complete genome; instead, we obtain longer stretches of DNA sequence called contigs. The problem is that we lose information: for each contig, we don't know whether it derives from the chromosome, from one plasmid, or from multiple plasmids. Luckily, there are many bioinformatic tools that help predict plasmids from short reads, and we can roughly divide them into two main groups. The first group is binary classification tools, a few examples of which are shown here. These tools sort the contigs into either plasmid-derived or chromosome-derived contigs, so you obtain the plasmidome of a bacterial genome. The second group is plasmid reconstruction tools, again with a few examples shown here. These tools take the process a step further and try to group the contigs into individual plasmid predictions. Last year, we compared how these plasmid reconstruction tools perform, specifically for E. coli. To show you a bit of the method: we went to the NCBI public database and downloaded 240 complete E. coli genomes. We know these were complete because they were assembled with a hybrid assembly using both short and long reads, and we also downloaded the corresponding short reads for these isolates.
We took the short reads, assembled them into contigs, and then provided these contigs, and in some cases the reads, as input for six different plasmid reconstruction tools. For each tool, we obtained a set of plasmid predictions, which I'll sometimes call bins. We then compared these bins back to the original plasmids to see how well the tools perform at this task. I need to show you a few of the metrics we used to evaluate how good the predictions were, and what they mean. The first metric is recall, which answers the following question: what fraction of the true plasmid is represented by the plasmid prediction? For instance, if after aligning the prediction against the true plasmid we find that half of the true plasmid is covered, then recall is 0.5. The second metric is precision, which answers the following question: is the prediction composed of contigs derived from a single plasmid, or of contigs derived from multiple sources (other plasmids or the chromosome)? For instance, if we have a prediction of 10 kb, with 8 kb derived from plasmid 1 and 2 kb derived from the chromosome, we assign it a precision of 0.8. Finally, we used the F1 score, which is the harmonic mean of precision and recall. A high F1 score means the prediction is good, that is, very close to the true plasmid; a low F1 score means it is not that good. So how did these six tools perform at reconstructing E. coli plasmids? First, I'm showing the F1 score on the y-axis, with the different tools in different colors. What you see here are plasmids that don't contain antibiotic resistance genes, and a few tools perform very well for this type of plasmid.
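The three metrics can be sketched in a few lines of Python. This assumes, as the examples in the talk suggest, that precision and recall are both computed from aligned base pairs between one predicted bin and one reference plasmid. The `BinVsPlasmid` record and its field names are hypothetical, and the 16 kb plasmid length in the example is made up so that the bin covers half of the plasmid.

```python
from dataclasses import dataclass


@dataclass
class BinVsPlasmid:
    """Hypothetical summary of one predicted bin aligned to one reference plasmid (lengths in bp)."""
    bin_length: int             # total length of the predicted bin
    bin_bases_on_plasmid: int   # bin bases that align to this reference plasmid
    plasmid_length: int         # length of the true (reference) plasmid
    plasmid_bases_covered: int  # reference-plasmid bases covered by the bin


def precision(a: BinVsPlasmid) -> float:
    """Fraction of the bin that derives from this plasmid."""
    return a.bin_bases_on_plasmid / a.bin_length


def recall(a: BinVsPlasmid) -> float:
    """Fraction of the true plasmid that is represented in the bin."""
    return a.plasmid_bases_covered / a.plasmid_length


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


if __name__ == "__main__":
    # The talk's precision example: a 10 kb bin with 8 kb from plasmid 1 and
    # 2 kb from the chromosome; the 16 kb plasmid length is illustrative only.
    a = BinVsPlasmid(bin_length=10_000, bin_bases_on_plasmid=8_000,
                     plasmid_length=16_000, plasmid_bases_covered=8_000)
    p, r = precision(a), recall(a)
    print(p, r, round(f1_score(p, r), 3))  # 0.8 0.5 0.615
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a bin only scores well if it is both clean (high precision) and complete (high recall).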
In particular, MOB-suite, plasmidSPAdes and FishingForPlasmids are the best performers here. However, the landscape changes when we look at plasmids that do contain antibiotic resistance genes. On the right, we have the same tools, but for plasmids carrying antibiotic resistance genes, and we see a big drop in F1 score for all the tools. That means all tools have difficulties reconstructing AMR plasmids. Apart from recall, precision and F1 score, we also analyzed the capacity of the tools to detect antibiotic resistance genes. Here you see the number of antibiotic resistance genes for each tool. On the left of each box are the true plasmid-derived antibiotic resistance genes that were detected; in the middle are the genes the tools missed; and on the right is chromosomal contamination, that is, the number of chromosomal antibiotic resistance genes that were included in the plasmid predictions. What we see is that most tools failed to identify an important fraction of the plasmid-derived antibiotic resistance genes. So, as an overall conclusion from this study, MOB-suite was probably the best-performing tool for reconstructing E. coli plasmids, because it missed very few antibiotic resistance genes (only about 10%) and it had a decent F1 score. However, we thought we could improve on this performance. By the way, if you want a much more detailed comparison of these tools, you can find it in this article. I'm obviously not going to go into all the details here, but it's a good article to check if you're interested in reconstructing E. coli plasmids. So, we thought we could improve the reconstruction of these plasmids in E. coli.
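The three categories in this comparison (detected, missed, chromosomal contamination) can be illustrated with a small sketch that counts them for one tool. It assumes that every resistance gene copy has already been located on the reference plasmids, on the reference chromosome, and in the predicted bins, and that each copy has a unique identifier; the `gene@contig` identifiers below are hypothetical, not data from the study.

```python
def summarise_arg_detection(plasmid_args: set[str], chromosomal_args: set[str],
                            predicted_args: set[str]) -> dict[str, int]:
    """Split antibiotic resistance genes into detected, missed and contamination.

    Each string is assumed to identify one gene *copy* (e.g. a hypothetical
    'sul1@contig_9'), so duplicated genes are not collapsed into one.
    """
    return {
        # plasmid-derived ARGs that ended up in some predicted bin
        "detected": len(plasmid_args & predicted_args),
        # plasmid-derived ARGs absent from all predicted bins
        "missed": len(plasmid_args - predicted_args),
        # chromosomal ARGs wrongly included in the plasmid predictions
        "chromosomal_contamination": len(chromosomal_args & predicted_args),
    }


if __name__ == "__main__":
    plasmid = {"blaCTX-M-15@contig_7", "sul1@contig_9", "tetA@contig_9", "aadA5@contig_12"}
    chromosome = {"mdfA@contig_1"}
    predicted = {"blaCTX-M-15@contig_7", "sul1@contig_9", "tetA@contig_9", "mdfA@contig_1"}
    print(summarise_arg_detection(plasmid, chromosome, predicted))
    # {'detected': 3, 'missed': 1, 'chromosomal_contamination': 1}
```

Dividing `missed` by the total number of plasmid-derived ARGs gives the kind of missed fraction quoted for MOB-suite.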
We did that by combining two tools developed at UMC Utrecht. One of them is plasmidEC, which was developed by Lisa Vader, a master's student who was working with us last year. The other is gplas, which was originally developed by Sergio Arredondo-Alonso. To understand exactly what we did, we first need to understand how gplas works. One important thing to note about gplas is that it takes an assembly graph as input. As I explained before, after assembly you get a set of contigs, which are longer fragments of DNA sequence, but you also get something else: information on how these contigs are potentially connected to each other. This is called the assembly graph. What I'm showing is a simplified representation; in reality, assembly graphs are usually more complicated. Suppose we give this assembly graph as input to gplas. The first thing gplas does is identify the plasmid-derived contigs in the graph, shown here in green, using one of two tools, either mlplasmids or PlasFlow, depending on the species you select. It then generates a series of plasmid walks, in which it tries to connect the plasmid-derived contigs to each other based on the similarity of their read coverage. The tool does this many times, starting from each of the predicted plasmid contigs. Based on how frequently two contigs appear in the same walk, it builds a plasmidome network, and it then partitions that network using different partitioning algorithms to generate a series of bins, or plasmid predictions, as I explained before. So how did we improve this process? We had the impression that the first step, in which the plasmid-derived contigs are identified, was not working very well for E. coli.
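The walk, network and partition steps described above can be sketched in plain Python. This is an illustration of the general idea, not gplas's implementation: the exponential coverage weighting, the parameter values (`sd`, `n_walks`, `max_steps`, `threshold`), the undirected graph that ignores contig orientation, and the use of simple connected components in place of gplas's partitioning algorithms are all assumptions.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def step_weight(seed_cov: float, next_cov: float, sd: float) -> float:
    """Hypothetical weighting: prefer neighbours whose coverage matches the seed's."""
    return math.exp(-abs(seed_cov - next_cov) / sd)


def plasmid_walk(graph, coverage, seed, sd, max_steps, rng):
    """One random walk through a simplified, undirected assembly graph."""
    walk, current = [seed], seed
    for _ in range(max_steps):
        neighbours = graph.get(current, [])
        weights = [step_weight(coverage[seed], coverage[n], sd) for n in neighbours]
        if not neighbours or sum(weights) == 0:
            break  # dead end, or every step is (numerically) impossible
        current = rng.choices(neighbours, weights=weights, k=1)[0]
        walk.append(current)
    return walk


def plasmidome_network(graph, coverage, plasmid_contigs, n_walks=100, sd=5.0,
                       max_steps=10, seed=0):
    """Edge weight = fraction of all walks in which both plasmid contigs occur."""
    rng = random.Random(seed)
    counts = defaultdict(int)
    for start in sorted(plasmid_contigs):
        for _ in range(n_walks):
            visited = set(plasmid_walk(graph, coverage, start, sd, max_steps, rng))
            for a, b in combinations(sorted(visited & plasmid_contigs), 2):
                counts[(a, b)] += 1
    total = n_walks * len(plasmid_contigs)
    return {pair: c / total for pair, c in counts.items()}


def partition(network, plasmid_contigs, threshold=0.1):
    """Bins = connected components over edges at or above `threshold`."""
    adj = {c: set() for c in plasmid_contigs}
    for (a, b), w in network.items():
        if w >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    bins, seen = [], set()
    for node in sorted(plasmid_contigs):
        if node in seen:
            continue
        component, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n not in component:
                component.add(n)
                stack.extend(adj[n] - component)
        seen |= component
        bins.append(component)
    return bins


if __name__ == "__main__":
    # Toy graph: chromosome (c*, ~30x), plasmid A (p*, ~60x), plasmid B (q*, ~200x).
    graph = {"p1": ["p2"], "p2": ["p1", "c1"], "c1": ["p2", "c2", "q1"],
             "c2": ["c1"], "q1": ["q2", "c1"], "q2": ["q1"]}
    coverage = {"c1": 30, "c2": 30, "p1": 60, "p2": 60, "q1": 200, "q2": 200}
    plasmid_contigs = {"p1", "p2", "q1", "q2"}  # e.g. output of a binary classifier
    bins = partition(plasmidome_network(graph, coverage, plasmid_contigs), plasmid_contigs)
    print([sorted(b) for b in bins])  # [['p1', 'p2'], ['q1', 'q2']]
```

Even though both plasmids touch the same chromosomal contig, the large coverage differences make walks almost never cross between them, so the two plasmids end up in separate bins. Note that `plasmid_contigs` is simply an input set here, which is the step the talk goes on to replace.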
So we replaced this step, and now you can provide gplas with the output of any classification tool; you're no longer limited to mlplasmids or PlasFlow. For E. coli in particular, we replaced it with the plasmid Ensemble Classifier, or plasmidEC, and I'll show some of the results we obtained with it. I haven't explained yet what plasmidEC is. It classifies contigs as plasmid-derived or chromosome-derived by combining the results of three different classification tools through a majority voting system. At the moment, four tools are available, which you can combine in any way you want. The way it works is pretty simple: it takes each contig and makes a prediction with each of the tools. If, for instance, two of the tools say a contig is plasmid-derived, plasmidEC calls it plasmid, and it also reports a probability of that contig being plasmid-derived. If two of the tools say it's chromosomal, then of course it calls it chromosome. Let me show you some results. First, I want to show how plasmidEC performs at the binary classification of contigs derived from AMR plasmids. For this, we used a dataset that includes 148 complete AMR plasmids. On the y-axis are the individual tools that we combined, as well as the different tool combinations, and on the x-axis is the fraction of plasmid contigs that were correctly identified. We found that one particular combination of tools, PlaScope, Platon and RFPlasmid, outperforms all other combinations and all the individual classifiers, which means that plasmidEC correctly identified the highest number of contigs derived from AMR plasmids.
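The majority-voting idea can be sketched as follows. This is not plasmidEC's code: the input format (a per-tool dictionary of contig labels), the tool names `tool_a`/`tool_b`/`tool_c`, and the choice to report the fraction of tools voting "plasmid" as the probability are assumptions made for illustration.

```python
from collections import Counter

VALID_LABELS = {"plasmid", "chromosome"}


def majority_vote(calls: dict[str, str]) -> tuple[str, float]:
    """Combine one contig's calls from three binary classifiers.

    `calls` maps a tool name to 'plasmid' or 'chromosome'. Returns the majority
    label and, as an assumed stand-in for a probability, the fraction of tools
    that voted 'plasmid'.
    """
    if len(calls) != 3:
        raise ValueError("expected exactly three classifiers")
    if not set(calls.values()) <= VALID_LABELS:
        raise ValueError(f"labels must be one of {sorted(VALID_LABELS)}")
    votes = Counter(calls.values())
    label = "plasmid" if votes["plasmid"] >= 2 else "chromosome"
    return label, votes["plasmid"] / len(calls)


def ensemble_classify(per_tool: dict[str, dict[str, str]]) -> dict[str, tuple[str, float]]:
    """Apply majority_vote to every contig classified by all tools.

    `per_tool` maps a (hypothetical) tool name to {contig_id: label}.
    """
    contigs = set.intersection(*(set(calls) for calls in per_tool.values()))
    return {c: majority_vote({tool: calls[c] for tool, calls in per_tool.items()})
            for c in sorted(contigs)}


if __name__ == "__main__":
    per_tool = {
        "tool_a": {"contig_1": "chromosome", "contig_2": "plasmid", "contig_3": "plasmid"},
        "tool_b": {"contig_1": "chromosome", "contig_2": "chromosome", "contig_3": "plasmid"},
        "tool_c": {"contig_1": "plasmid", "contig_2": "plasmid", "contig_3": "plasmid"},
    }
    for contig, (label, prob) in ensemble_classify(per_tool).items():
        print(contig, label, round(prob, 2))
    # contig_1 chromosome 0.33
    # contig_2 plasmid 0.67
    # contig_3 plasmid 1.0
```

Using an odd number of classifiers guarantees that a binary vote can never tie. The contigs labelled "plasmid" are what would be handed to the walk-and-partition step in place of the original classifier output.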
We then compared the combination of plasmidEC and gplas against MOB-suite, the best-performing tool in our previous study, using the same dataset so that the two studies remain comparable. As an additional test, we also combined gplas with the output of PlaScope, which, as you saw on the previous slide, was the best individual classifier; that's why we chose it. When reconstructing E. coli plasmids, we found that for plasmids without antibiotic resistance genes, shown at the bottom, there is not a big difference between the methods. But for plasmids that do contain antibiotic resistance genes, the F1 score was much higher for both methods that included gplas. This difference is very clear. If you recall, the F1 score is the harmonic mean of precision and recall, so we wanted to know whether precision or recall was driving this difference. We found that precision values were very similar between MOB-suite and the two versions of gplas, and that the main difference was in recall. Remember that recall is the fraction of the true plasmid that is represented by the prediction. This allows us to conclude two things: both versions of gplas outperform MOB-suite at reconstructing plasmids that contain antibiotic resistance genes, and, in general, the gplas methods are better at binning contigs together into the same prediction, which is why they achieve a higher recall. We also evaluated how well the tools detect antibiotic resistance genes. Again, the number of antibiotic resistance genes is on the y-axis, with the detected genes in orange, the undetected genes in light gray, and the chromosomal genes in dark gray.
What we see here is that the gplas methods detected the same number of antibiotic resistance genes as MOB-suite; the only difference is that the gplas methods include a bit more chromosomal contamination in their predictions. So, as general conclusions from this study: AMR plasmids are difficult to reconstruct from short-read data; integrating the ensemble classifier with gplas gives the best results for reconstructing AMR plasmids in E. coli; and combining long and short reads is still the best option for reconstructing plasmids, but it is expensive. So if you only have short reads and you're working with E. coli, I would definitely choose plasmidEC and gplas to reconstruct the plasmids. Finally, I would like to thank my supervisors, Anita Schürch, Nienke Plantinga and Rob Willems, as well as Lisa Vader and Sergio Arredondo-Alonso for this work. I would also like to thank the entire bioinformatics department; they are really very nice people, and they have helped me a lot with this work. I don't know if there are any questions. Apologies, I was muted. Great, thanks very much, that was super interesting. This is a chance for anyone to ask questions. If not, or if you haven't had a chance to type your question in fast enough, we can save the questions for the discussion session at the end. Nothing? Going, going, gone. Okay. Thank you, Julian, for that talk; that was super, and I hope we'll hear from you again in the discussion.