Thanks, Aaron. All right. So, continuing on to visualization of structural variants. I guess this is review from the previous module, right? There are many different complementary approaches to identifying structural variants, as Aaron covered in great detail. So, we have depth of coverage, split-read mapping, paired-end mapping. The one we're going to focus on visualizing is this paired-end mapping approach.

I'd like to keep this a little bit interactive. So, here we have a donor which has a small insertion with respect to the reference. Who could tell me what the reference genome would look like in a schematic when we map the reads to it? Sorry? Straight blue line. Straight blue line. What happens to that paired-end mapping in the middle? When we map that pair to the reference genome, what are the characteristics of that? They're a bit closer together, right? This is how they would look if those pairs were mapped to the reference.

In the case where we have a large insertion, where the pairs don't span the insertion, what happens now? Sorry? You'll have one unpaired. Right. So, the ones that overhang this, you won't be able to map them together. And we essentially get a breakpoint where we have no pairs which map over the breakpoint.

And you guys have these schematics in your notes for reference. In the case of a deletion, you've seen this. So, I think you are getting at what that would look like. Here, no matter what size the deletion is, we actually have a paired-end mapping which spans it completely.

Are these okay with everyone? It's really important to get these in your mind, and then when you see them in a genome browser, we try to make the representation kind of a one-to-one mapping with these pictures. What's going to happen in the case of an inversion?
What happens to the reads? Right? Yeah. So, you're going to see a criss-cross.

Tandem duplications. So, there was only one red portion in the original reference genome, but now we have two copies. When we map those pairs, we're going to see pairs that span that breakpoint. But we're also going to get the regular pairs, the ones not in the middle, which are going to map appropriately.

So, the issue that we had: we formerly did a lot of work in our lab on structural variants, and in visualizing these in IGV, we came across a number of issues. It gets really complicated when you're looking at heterozygous events. Yeah. Question? This one? Yeah. So, the one in the middle. Yes, but the relative orientation in which the reads map, I think, should be okay. Yeah. So, here we're coloring the reads with respect to how they are oriented in relation to each other. Yeah. Great point. Great.

Okay. So, when you look at a heterozygous structural variant in IGV, the problem is that you're going to get a mixing of normal and not normal pairs. So, we came up with this representation in Savant that's a little bit different. It's called arc mode, and it's specialized for trying to identify structural variants with this paired-end mapping technology. The key feature is that in this mode Savant will draw an arc where the height of the arc represents how far apart the reads map from each other. Okay? So, not only is it going to be scaled horizontally, but we're going to draw the arc bigger if the reads map further apart. And we're also going to color these arcs based on the relative orientation, as we were just talking about. So, the representation looks something like this, where here we have paired-end mappings which are very far apart, and so they're scaled vertically, and we also color them because they're discordant.
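As a rough sketch of the signatures and the arc representation described above (this is not Savant's actual code), the following Python classifies a mapped read pair by how far apart and in which relative orientation its two reads map, and then derives an arc whose height grows with the mapped distance. The `ReadPair` type, the expected insert size of 400 ± 40 bp, the three-standard-deviation cutoff, and the color names are all assumptions made for illustration. The orientation rules follow the common convention for forward-reverse paired-end libraries: same-strand pairs suggest an inversion, and everted (reverse-forward) pairs suggest a tandem duplication.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record for one mapped read pair."""
    left_start: int    # leftmost mapped position of the pair on the reference
    right_end: int     # rightmost mapped position of the pair on the reference
    left_strand: str   # '+' or '-' for the leftmost read
    right_strand: str  # '+' or '-' for the rightmost read


def classify_pair(pair, mean_insert=400, sd_insert=40, n_sd=3):
    """Guess which event a pair supports (thresholds are assumptions)."""
    span = pair.right_end - pair.left_start
    orientation = pair.left_strand + pair.right_strand
    if orientation in ("++", "--"):
        return "inversion"            # both reads on the same strand
    if orientation == "-+":
        return "tandem_duplication"   # everted pair
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"             # reads map farther apart than expected
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"            # reads map closer together than expected
    return "concordant"


# Hypothetical palette; the real browser's colors may differ.
ARC_COLORS = {
    "concordant": "grey",
    "deletion": "red",
    "insertion": "blue",
    "inversion": "green",
    "tandem_duplication": "orange",
}


def arc_for_pair(pair, **thresholds):
    """Describe the arc to draw: width and height both scale with distance."""
    span = pair.right_end - pair.left_start
    return {
        "x_start": pair.left_start,
        "x_end": pair.right_end,
        "height": span,
        "color": ARC_COLORS[classify_pair(pair, **thresholds)],
    }
```

Checking orientation before distance means an oddly oriented pair is never reported as a plain deletion or insertion, which matches the idea of coloring arcs by relative orientation first and scaling them by distance second.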
So, you guys are going to do much the same thing in the lab that we've prepared for you, but now in Savant. We've prepared a list of structural variants that we just downloaded, which had been called, using a technology similar to LUMPY, for the patient that you're looking at, or rather the individual that you're looking at, in 1000 Genomes. And I guess your task is just to go through these examples and learn the signatures: what's a deletion, what's an insertion, what's a tandem duplication. Just learn what these pictures look like so that the next time you see one, you immediately say, oh, that's a deletion. And then we'll have a small quiz.

Okay, so thanks for bearing with some slowness. Hopefully you guys had a chance to look at paired-end mappings in a way that is a little bit more intuitive, we think. So, just to review, what does this mean for you? I think I heard it, but it was a whisper. You guys have the cheat notes, too. This is a case of an insertion. And these breakpoints are a little bit fuzzy, but okay, you can cheat for this one if you want.

What is this event? Deletion? Okay, what kind? Like, what's the zygosity? Oh, zygosity, right. I think I have a question about that here. What does a heterozygous deletion look like? This is where it gets tricky, right? Okay, so it looks kind of like this image, but what are the differences, I guess, between this image and what a heterozygous deletion looks like? Yes, yeah. So you will see a combination of big arcs and little arcs, and the density of little arcs will be approximately half of what's shown outside, right? So we have an example here. It's a little bit hard to see the density, but hopefully you can kind of see through it that in the middle there is much less density than outside. And then you have more interesting events which require interpretation, obviously.

Okay, so I wanted to talk really quickly. How much time do we have left? 20 minutes.
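The zygosity reasoning above (big arcs spanning the event, with little arcs inside at roughly half the outside density for a heterozygous deletion) can be sketched as a small rule of thumb. This is only an illustration under stated assumptions: the function name, the minimum of three spanning discordant pairs, and the 0.25 to 0.75 band for "about half" are hypothetical choices, not values from the lecture or from any tool.

```python
def guess_deletion_zygosity(inside_density, flank_density, spanning_discordant,
                            min_support=3, het_band=(0.25, 0.75)):
    """Rough zygosity guess for a candidate deletion.

    inside_density      -- density of concordant (little) arcs inside the event
    flank_density       -- density of concordant arcs in the flanking region
    spanning_discordant -- number of big arcs spanning the event
    All cutoffs are assumptions chosen for illustration.
    """
    if spanning_discordant < min_support or flank_density <= 0:
        return "no_call"        # not enough evidence for a deletion at all
    ratio = inside_density / flank_density
    if ratio < het_band[0]:
        return "homozygous"     # almost no normal pairs inside the event
    if ratio <= het_band[1]:
        return "heterozygous"   # roughly half the normal density inside
    return "unclear"            # big arcs present but no visible density drop
```

For example, `guess_deletion_zygosity(20, 40, 10)` falls in the "about half" band and returns `"heterozygous"`, while an event with almost no little arcs inside returns `"homozygous"`.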
Okay, yeah, let's talk really quickly about how we could take this a step further now and do interactive variant analysis. So going back to your question, what's the difference between Savant and MedSavant? In our collaboration with people at TCAG, at SickKids, we see the problem that finding disease-causing genetic mutations is very difficult. You have a lot of variants that might look interesting, but they're caused by errors in your sequencing or variant prediction pipeline. And some of them are actually not related to the phenotype or disease that you're studying. So some people throw up their hands and say, this is like trying to find a needle in a needle stack.

So we tried to create a solution. And I think more of these solutions are going to be coming day by day that try to make this process a lot more interactive and fast. We wanted to create a platform that allows you to filter your variants based on these metrics: quality, what's the functional effect of the variant, what's the relevance to my disease. And to kind of capture all these processes that we've been talking about over the last two days: variant calling, annotation with things like ANNOVAR, filtration based on those metrics, and then visualization.

So we've talked about all these processes in various modules, and we've used a number of different tools to do it. I hope you guys recognize that the command line is super, super powerful and flexible. But unfortunately, it's not as interactive as things like Microsoft Excel or a genome browser, which is what some geneticists prefer to operate in. It's nice to have these interactive tools like genome browsers, but I think we're back in the era of the mid-1990s. An analogy is surfing the web back when you had to type in www.yahoo.com to go to some place that's interesting for you.
So I think that's the state of genome browsers, where you're copying and pasting genomic coordinates in order to get some place interesting. It'd be nice to have a tool that kind of combines the power of the command line with something like Excel: you have a spreadsheet of variants that you're interested in, and you're able to go really quickly between the spreadsheet and a genome browser visualization. So that's what we tried to do with MedSavant.

There are a number of other tools within this space that are, again, coming up all the time. And I should mention GEMINI, which is Aaron Quinlan's tool, and he's an expert in this. It lies in the same space, giving you a language to specify filters on your variants. So I recommend you try out these tools as well: GEMINI, VarSifter, and Golden Helix. And there's a slew of commercial tools available if you're willing to pay for them. So now there's also a lab on MedSavant, which is again a derivative of Savant. And let me know if you have any questions.

Okay, so once you get loaded up in MedSavant, this is kind of the variant navigation interface. And actually let me step out. It's basically built on the idea of having apps, and we're trying to work with collaborators to tell us which apps people prefer. But essentially there are various apps for uploading your VCF files, performing annotations, doing patient analysis. Here we have an app which does Mendelian inheritance analysis. We're doing a collaboration with Google Genomics, which will allow you to access read alignments that are stored on the cloud through Google. And there are more clinical workflow apps in Discovery. So we have a little app store which allows you to publish your latest algorithm and download others for use as well. For VCF upload, you just basically drag and drop a VCF file, and a pre-configured set of annotations will be applied, like ANNOVAR.
So you could actually apply any annotation that you could apply through ANNOVAR. It will happen in the background upon uploading this VCF file. And then those variants will be accessible through this variant navigator. If I pull up the list of columns, you can see these are all the columns that have been attached to the variants. So we have allele frequencies, everything that's in the VCF file, but also annotations. These actually come from PolyPhen, dbSNP, ClinVar, 1000 Genomes, et cetera, et cetera. And we can now construct search criteria based on all of these facets of the data.

So if I wanted... let's create a filter based on allele frequency for 1000 Genomes. I think it's annotation frequency. And we only want low-frequency variants. So just by configuring those widgets, which are graphical, you can basically filter down your variants. This is kind of like shopping on eBay: you specify what you're looking for on the left, and you're shown the list of results on the right. And through an interface that's shown on the right side, we can now navigate to the position of the variant in the genome browser as well. I mean, our servers are on fire right now. That's it, yeah.
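The kind of filter built here with graphical widgets (keep only variants that are rare in 1000 Genomes, optionally with a quality cutoff) can be sketched in a few lines of Python. This is not MedSavant's implementation or query language. The field names `af_1000g` and `qual`, the default cutoffs, and the choice to keep variants with no recorded frequency (treating them as unseen in the population) are all assumptions made for illustration.

```python
def filter_variants(variants, max_af=0.01, min_qual=30.0,
                    af_field="af_1000g", qual_field="qual"):
    """Keep variants that pass a quality cutoff and are rare in the population.

    `variants` is a list of dicts, one per annotated variant. The field
    names and defaults are hypothetical, not taken from any real tool.
    """
    kept = []
    for variant in variants:
        # Assumption: a variant with no quality value is treated as failing.
        if variant.get(qual_field, 0.0) < min_qual:
            continue
        allele_freq = variant.get(af_field)
        # Assumption: a missing frequency means the variant was not observed
        # in the reference population, so it is kept as potentially rare.
        if allele_freq is None or allele_freq <= max_af:
            kept.append(variant)
    return kept
```

Each keyword argument plays the role of one widget on the left of the interface, and the returned list is what would be shown on the right.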