Hello everyone, and welcome to this week's bytesize talk. I'm very happy to welcome today Julia from QBiC in Tübingen and Marta from UPF in Barcelona. They're going to talk about another new pipeline, released just a week ago, called crisprseq. Over to you. Thank you. Thanks for the introduction. We'll present nf-core/crisprseq, which is a pipeline for the analysis of CRISPR editing experiments. I'd like to start with an introduction to what CRISPR is, because I'm sure you've heard that word before but maybe you don't remember exactly what it is. Basically, the system comes from bacteria and has been repurposed to do gene editing. It consists of a protein that we call Cas, and this protein can cut DNA, creating double-strand breaks. It is coupled to a single guide RNA, which is basically a short RNA sequence complementary to the DNA region that you want to cut. This way we can have directed cuts. When we have a double-strand break in a cell, there are usually two types of repair. The most common one is called non-homologous end joining (NHEJ): the cell goes and tries to repair the double-strand break, and this can produce insertions or deletions, which can result in the disruption of the gene and thus cause a gene knockout. Then there's a different mechanism called homology-directed repair (HDR), which consists of having a template; we can provide this template, and the repair is made based on it, so we can introduce new fragments of DNA and produce gene knock-ins.
Then, apart from these two mechanisms, and to give the complete picture, there's also microhomology-mediated end joining (MMEJ), which is very similar to NHEJ but happens when there are two small regions of homology surrounding the cut; these can be combined, so we get a bigger deletion. More recently there are two other technologies, base editing and prime editing, which work not through a double-strand break but only with a nick. These are more precise because they can produce substitutions of only one base. So that's the overview of the CRISPR-Cas experiments we can have. Apart from that, we can also have CRISPR screens, which consist of a library of guides targeting lots of different genes, so we can perform a screening. And finally, if we couple the system with a Cas protein that is inactive and doesn't cut the DNA, it only affects the expression of the gene, and we call this CRISPR activation or CRISPR interference. Our pipeline, crisprseq, can analyze gene knockouts, knock-ins, and also base editing or prime editing experiments. This pipeline is based on an earlier tool called CRISPR-Analytics, which Marta developed, so she will explain more about it. As Julia has already said, this first release of the nf-core/crisprseq pipeline is based on CRISPR-Analytics, and currently we just have the core of CRISPR-Analytics in crisprseq, which I will show you here. These are the core steps of that pipeline. The first steps are the quality processing and preprocessing of the sequencing reads, where different steps are done to remove low-quality reads and also, in the case that we have paired-end sequencing reads, the reads are merged. Then the alignment against the amplicon reference is done, and after that there is a process where each indel and substitution that could have been left by these genome editing tools is quantified.
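The quantification step described above works from the alignment record of each read. As a minimal sketch of that idea (not the pipeline's actual implementation), the hypothetical function below counts inserted and deleted bases from a simplified CIGAR string, the alignment summary that the pipeline also parses:

```python
import re

def count_indels(cigar):
    """Count inserted and deleted bases from a CIGAR string.

    A CIGAR string (e.g. "20M2I5M3D10M") summarises an alignment as
    runs of matched (M), inserted (I) and deleted (D) bases.
    """
    inserted = deleted = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op == "I":
            inserted += int(length)
        elif op == "D":
            deleted += int(length)
    return inserted, deleted

# A read with a 2-bp insertion and a 3-bp deletion
print(count_indels("20M2I5M3D10M"))  # -> (2, 3)
```

In the real pipeline this parsing is done per read against the amplicon reference, and the positions of the operations are also recorded so edits can be placed relative to the cut site.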
And finally, some plots and tables are produced to allow us to visualize the results. On the next slide, what I want to show you are other optional steps that CRISPR-Analytics has that are not currently in crisprseq, but which we hope to add in the following versions. Briefly, the first optional step is the ability to use unique molecular identifiers (UMIs) to cluster the sequences; through that clustering process we can remove sequencing and amplification biases as well as correct sequencing errors. We have also implemented a step that allows us to identify the amplicon reference by looking for it in a reference genome. In the bottom part, you have two other steps that have allowed us to increase the precision of our pipeline. The first one is size bias correction, in which we have implemented a simple model where we use spike-in controls of different sizes and known abundance to correct biases related to sequence size during amplification, since longer deletions lead to shorter sequences, which will be amplified more times than longer ones. Then, if we also sequence a mock sample or a negative control, we can use this sample to subtract errors that can also be present in our treated samples. You can choose the aligner that you want to use in the alignment step, but we have been exploring, with simulated data sets, the performance of different aligners together with the following step of quantifying the different edits. What we have done is optimize the parameters of minimap2 to achieve better results in identifying the indels produced by the double-strand break repair mechanisms. On the following slide, we have some examples of CRISPR-Analytics being used to analyze a batch of samples. We have analyzed samples from three different cell lines that were edited with CRISPR-Cas9.
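The spike-in idea behind the size bias correction can be illustrated with a toy example. This is only a sketch of the principle, not CRISPR-A's actual statistical model: spike-ins of known input abundance give a per-size correction factor (expected over observed), which is then applied to the read counts of sample sequences of the same size:

```python
def correction_factors(spikein_observed, spikein_expected):
    """Per-size correction factor: known input fraction / observed
    read fraction, estimated from spike-in controls."""
    return {size: spikein_expected[size] / spikein_observed[size]
            for size in spikein_expected}

def correct_counts(counts_by_size, factors):
    """Rescale the read count of each sequence size by its factor."""
    return {size: n * factors[size] for size, n in counts_by_size.items()}

# Two spike-ins added at equal molar input; the shorter one (100 bp)
# was over-amplified and ended up with 60 % of the reads.
observed = {100: 0.6, 200: 0.4}   # read fractions after sequencing
expected = {100: 0.5, 200: 0.5}   # known input fractions
factors = correction_factors(observed, expected)
print(correct_counts({100: 600, 200: 400}, factors))  # -> {100: 500.0, 200: 500.0}
```

After correction, both sizes are restored to the abundance implied by the known input, which is the behaviour we want for sample sequences whose length was changed by a deletion.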
In the first plot, what we see is that the main pattern observed among all the insertions that have been found is homologous insertions, meaning that the inserted base duplicates the nucleotide at the cleavage site during this repair process. This happens with higher frequency when the free nucleotide is an adenine or a thymine. In the other two plots, what we have been exploring are the precise outcomes, which are those outcomes that appear at a higher frequency. In that case, we also observe these homologous insertions among the precise outcomes; we also see single-nucleotide deletions when the cleavage site is surrounded by the same nucleotide, and we can see some microhomology patterns that have led to longer deletions, which are also highly represented in these samples. CRISPR-Analytics has been benchmarked using several data sets: we have used real data as well as simulated data, and we also created a ground-truth data set to have this kind of data available for the benchmarking. This ground-truth data set was generated by several collaborators, who each took different subsets of reads and classified the indels found in the reads as indels produced by errors or indels produced by genome editing tools. These subsets have then been used to calculate the editing percentage of those samples, and we have extrapolated this percentage to be able to compare, that is, to calculate the distance between the editing percentage reported by different tools and the established editing percentage of this ground-truth data set. From this, we just want to highlight that our tool has good precision without relying on editing windows. Most tools use a window in which the edited indels have to take place, in order to avoid reporting false-positive events.
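The benchmarking metric described above boils down to simple arithmetic. As a sketch (tool names and numbers here are hypothetical, not results from the actual benchmark), the editing percentage is the fraction of reads carrying a true edit, and each tool is scored by its absolute distance from the ground-truth value:

```python
def editing_percentage(n_edited, n_total):
    """Percentage of reads classified as carrying a true edit."""
    return 100.0 * n_edited / n_total

def distance_to_truth(reported, truth):
    """Absolute difference between a tool's reported editing
    percentage and the established ground-truth percentage."""
    return abs(reported - truth)

truth = editing_percentage(420, 1000)        # ground truth: 42.0 %
reports = {"tool_a": 41.0, "tool_b": 47.5}   # hypothetical tool outputs
print({tool: distance_to_truth(p, truth) for tool, p in reports.items()})
# -> {'tool_a': 1.0, 'tool_b': 5.5}
```

A smaller distance means the tool's reported editing percentage is closer to the manually curated truth, which is how precision was compared across tools.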
Okay, so how can you use nf-core/crisprseq? Basically, you can use the typical Nextflow command, where you provide an input sample sheet, the output directory, and the profile that you want to run the pipeline with. Then we also have a single extra parameter to choose the aligner: by default we're using minimap2, but you can also choose between BWA or Bowtie2. The reason why we don't have more parameters is that most of them are provided with the sample sheet, because they depend on the sample. This is how a sample sheet looks: you have the sample name, fastq_1 and fastq_2; if you only have single-end sequencing data, you can provide only fastq_1. Then you provide the reference sequence (it has been shortened here for space reasons); this is the reference the reads will be aligned to, i.e. the region where you directed your cut. You also provide the protospacer, which is the guide sequence that you used in your experiment to direct the cut. And finally, in case you performed a homology-directed repair experiment, you can also provide the template. And that's the structure of the output folder. I won't go through all the directories specifically, but basically you will find the outputs of all the tools used for preprocessing, such as joining paired-end reads, plus all the quality filtering steps, because we remove sequencing adapters, remove low-quality reads, and mask low-quality bases. Then you also have the output of the alignment. And finally, the most important directory is the one called cigar, and it's called like that because we parse the edits using the CIGAR field from the mapping. In this directory you will find tables and summary tables of the edits, and also plots. And this is an example of the output plots. We report data quality, meaning that you will have the percentage of reads that have good quality, and also the ones that were aligned against the reference.
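Putting the command and sample sheet described above together, a minimal invocation looks roughly like this (file names, the sample name, and the sequences are placeholders; check the pipeline's usage documentation for the exact sample sheet columns of the release you run):

```
nextflow run nf-core/crisprseq \
    --input samplesheet.csv \
    --outdir results \
    --aligner minimap2 \
    -profile docker
```

with a sample sheet along these lines, where the template column is left empty for an NHEJ knockout experiment:

```
sample,fastq_1,fastq_2,reference,protospacer,template
sample1,sample1_R1.fastq.gz,sample1_R2.fastq.gz,<AMPLICON_REFERENCE_SEQ>,<PROTOSPACER_SEQ>,
```

For single-end data, fastq_2 is simply left empty, and for an HDR experiment the template sequence goes in the last column.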
Then we also report the number of reads that were wild type or that contained indels, and from these indels we also classify them by quality filters: whether they are located at the expected peak around the cut site, and whether they are above the sequencing error rate or not. And finally, there's also a classification into insertions, deletions, and insertions produced by a template, and whether these indels are in frame or out of frame, because the out-of-frame ones are the most likely to disrupt gene function and produce a gene knockout. Finally, these further steps, as Marta already commented, are already implemented in CRISPR-Analytics, and we will add them to crisprseq: the UMI-based clustering step to reduce PCR duplicates and sequencing biases (because with the usual sequencing methodology, shorter reads are sequenced more often, but this doesn't mean that a particular long deletion is more represented in your sample, so we can correct for that with UMIs), then also the automatic identification of the reference, and some noise handling. And finally, thinking already of version 2 of crisprseq, the idea is that we will be able to analyze all other kinds of CRISPR experiments, such as CRISPR screening. Laurence is currently implementing this part of the analysis, so if you have any doubt or want to talk with us, you can join the Slack channel and talk to us. And that's it. As I said, feel free to join the Slack channel and test out the pipeline, and see if there's something that you would like us to also include. Thank you very much. That was a very nice talk. Are there any questions in the audience for either Julia or Marta? You can unmute yourself now if you want to, or you can write a question in the chat and I will read it out. There currently seem to be no questions, but may I ask one? One of the biggest issues that I know of with CRISPR is off-target effects.
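The in-frame versus out-of-frame classification mentioned above follows directly from the net length of the indel. As a minimal sketch of that logic (not the pipeline's implementation), a hypothetical classifier only needs to check divisibility by three:

```python
def classify_frame(indel_length):
    """Classify an indel by its effect on the reading frame.

    A net length change divisible by 3 preserves the reading frame
    ("in-frame"); anything else shifts it ("out-of-frame"), which is
    the outcome most likely to knock out the gene. Negative lengths
    denote deletions, positive ones insertions.
    """
    if indel_length == 0:
        return "wild-type"
    return "in-frame" if indel_length % 3 == 0 else "out-of-frame"

for length in (0, 3, -6, 1, -2):
    print(length, classify_frame(length))
# -> 0 wild-type / 3 in-frame / -6 in-frame / 1 out-of-frame / -2 out-of-frame
```

This is why a 1- or 2-bp indel at the cut site is usually the desired outcome in a knockout experiment, while a 3-bp deletion may leave a nearly functional protein.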
But as I understand, you're mapping to fairly short references, basically just the target. So is there any way we could figure out if there are off-target effects with this pipeline, or is anything planned for the future? This pipeline is not really designed to detect off-target effects, but even so, there is something you can do. The experimental protocol is based on amplification of your expected target, which is then sequenced with Illumina or other next-generation sequencing platforms. What you can do is, for example, use some kind of prediction of which targets are more susceptible to off-target editing, amplify those off-target sites as well, run the same analysis on them, and see if there are edits in those regions. But you would need to know where to look for them, obviously. Yeah. Yeah, we would have to add GUIDE-seq or other kinds of analysis pipelines that use a different experimental protocol and also a different computational analysis, so it's something that can be implemented in further steps. Thank you. Any other questions in the audience? If not, I would like to thank you both for this great talk. I also would like to thank the Chan Zuckerberg Initiative, which is funding our bytesize talks. If anyone has more questions for the two of you, you can always go to Slack and check the channel for crisprseq, or you can also ask in the bytesize channel, and I'm pretty sure the two of them will have a look at your question. So thank you very much. Thank you.