So now that we have created our VCF: we have created a genomics DB, and out of that genomics DB we have extracted the VCF containing all of the variants in our trio, so basically all variants in our cohort. Now it's time to separate the real, existing variants from the false positives, and that is not a trivial task. Just to show you where we are in the workflow: we have quality-controlled the mapping, we have created the GVCFs, we have added those GVCFs to a database and consolidated them, and then we did a joint call, so we extracted all of the information out of the database. And now we have one VCF containing all of the variants in our cohort. So now we basically have the raw SNPs and indels, and we need to filter those. But how are we going to filter them? There is quite a bit of quality control information we can use. For example, you can imagine that we make more mistakes in regions with low mapping quality, where the aligner wasn't sure whether a read belongs at that position or not. Also, if you have very little support in terms of read depth for a certain region, we might consider removing the variants there, because we do not have enough read support, or consider that the genotypes may be wrong. There might also be situations where a certain region is only sequenced from, let's say, one strand and not the other; that might also point to mistakes in the variant call. The overall variant quality can also tell you something about the quality of the variants, obviously that's the main idea, and so on. So we have many different quality measures that we can use for filtering. What you could do is say: okay, for each of those quality measures I'm going to set a minimum and/or a maximum threshold, and we're just going to filter on those, for example on depth.
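The hard-filtering idea described above can be sketched in a few lines of Python. The annotation names (MQ, DP, FS, QUAL) are standard VCF annotations, but the threshold values here are purely illustrative; in practice you would tune them to your own data.

```python
# Sketch of hard filtering: for each quality measure we set a minimum
# and/or maximum threshold and flag variants that fall outside it.
# Threshold values are illustrative only, not a recommendation.
THRESHOLDS = {
    "MQ": ("min", 40.0),   # mapping quality: drop poorly mapped regions
    "DP": ("min", 10),     # read depth: too little support -> genotype unreliable
    "FS": ("max", 60.0),   # FisherStrand: strong strand bias suggests an artifact
    "QUAL": ("min", 30.0), # overall variant quality
}

def hard_filter(variant):
    """Return a list of failed filter names ([] means PASS)."""
    failed = []
    for measure, (kind, cutoff) in THRESHOLDS.items():
        value = variant.get(measure)
        if value is None:
            continue  # annotation missing: leave the variant unfiltered
        if kind == "min" and value < cutoff:
            failed.append(f"{measure}<{cutoff}")
        elif kind == "max" and value > cutoff:
            failed.append(f"{measure}>{cutoff}")
    return failed

# Example: a variant with low depth and strand bias fails two filters.
v = {"MQ": 55.0, "DP": 4, "FS": 75.0, "QUAL": 50.0}
print(hard_filter(v))  # ['DP<10', 'FS>60.0']
```

Note that each measure is checked independently here, which is exactly the limitation discussed next.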
However, if we have two quality measures, we might have two sweet spots, meaning that if quality measure A is high, quality measure B can be a little bit lower; but if you have a low value for quality measure A and a high value for quality measure B, you also have a lot of correct variant calls. That is very difficult to capture if you do hard filtering, where you set pretty strict thresholds on individual quality measures. Because humans cannot really do that well, we can ask a computer to do it: take all of those quality measures into account in a statistical model, or you could also call it machine learning if you want, and then let this model try to estimate which variants are true positives and which are false positives. This model needs to be trained and applied. With GATK, you can use the Variant Quality Score Recalibration (VQSR) tool to do that. However, you need a pretty big data set to be able to do that. So we could of course set fixed thresholds, but then we miss the sweet spot where we have a low value for quality measure A but a relatively high value for quality measure B. VQSR has a better performance in terms of finding true positives compared to hard filtering with hard thresholds, but you will need a truth set, meaning a set of variants that are known to be there in your cohort, and a relatively large data set to work with. The rough cutoff would be at least one whole genome, so one whole-genome resequencing, or 30 whole exomes. So that's a relatively large set. Now, if we are going to evaluate how well we did with filtering our variants, we want to have some evaluation measures. This is just a recap of what we mean by precision and recall. With precision, we mean: how many of the selected variants were true variants?
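The "two sweet spots" argument can be made concrete with a toy example. The numbers below are made up for illustration, and the hand-made linear score only stands in for a trained model such as VQSR; the point is that a combined score can keep variants that per-measure thresholds would throw away.

```python
# Toy illustration: true variants can sit at (high A, lower B) or
# (lower A, high B). Independent hard thresholds miss one of the two
# sweet spots; a combined score can keep both. All values are made up.
variants = [
    {"A": 0.9, "B": 0.3, "real": True},   # sweet spot 1: high A, lower B
    {"A": 0.3, "B": 0.9, "real": True},   # sweet spot 2: lower A, high B
    {"A": 0.2, "B": 0.2, "real": False},  # low on both: likely artifact
]

def hard_pass(v, min_a=0.5, min_b=0.5):
    # Hard filtering: both measures must independently clear their cutoff.
    return v["A"] >= min_a and v["B"] >= min_b

def model_pass(v, cutoff=1.0):
    # Combined score: a high value on one measure can compensate for the
    # other (a stand-in for what a trained model learns from a truth set).
    return v["A"] + v["B"] >= cutoff

print([hard_pass(v) for v in variants])   # [False, False, False]: both true variants lost
print([model_pass(v) for v in variants])  # [True, True, False]: both sweet spots kept
```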
So: do we have a lot of false positives in there, yes or no? And recall would be: how many of the true variants were selected? So: are we missing variants? To check this with you, I have a quiz question regarding exactly that, about filtering. Let's say we are going to apply very strong filtering, so we are going to remove relatively many variants. What will we end up with? Where will we gain mostly: do we get high precision and low recall, low precision and high recall, and so on? As a reminder, I can share my screen with the PowerPoint slide. There we go. Precision is how many of the selected variants were true variants, and recall is how many of the true variants were selected. Okay, I think most of you have answered. The people who answered, which were eight of you, not all of you participated, mostly said high precision, low recall, and that is the correct answer. Because if you do strong filtering, you expect that relatively more of your remaining variants will be true positives. So you get rid of false positives, but you might also remove true positives with the strong filtering, so your recall will be lower.
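The quiz answer can be checked with a small made-up example. The variant names and call sets below are invented for illustration; the precision/recall definitions are the standard ones used above.

```python
# Precision = of the variants we kept, how many are real?
# Recall    = of the real variants, how many did we keep?
def precision_recall(kept, truth):
    tp = len(kept & truth)            # true positives: kept AND real
    return tp / len(kept), tp / len(truth)

truth = {"v1", "v2", "v3", "v4"}      # variants that are really there (made up)

# Lenient filtering keeps everything, including two false positives.
lenient = {"v1", "v2", "v3", "v4", "v5", "v6"}
# Strong filtering keeps no false positives, but also drops v3 and v4.
strict = {"v1", "v2"}

print(precision_recall(lenient, truth))  # lower precision (4/6), perfect recall (1.0)
print(precision_recall(strict, truth))   # perfect precision (1.0), lower recall (0.5)
```

Stricter filtering moves you toward the first corner of the trade-off: precision goes up, recall goes down, exactly as in the quiz.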