So good morning everybody. In module two, we're going to be going over genome visualization, or data visualization. The overall goal of this module is to gain an understanding of why we actually visualize our data despite having many tools to process it. You're going to learn how and when to use particular tools. You're going to gain experience with genome browsers, most specifically the Integrative Genomics Viewer, IGV. And then the bulk of the lecture is going to be on how we do variant inspections, whether it's single nucleotide variants, or SNVs, or structural variants, plus a little bit on how you can configure your window to view long-read data, since it's quite different from traditional Illumina sequencing. The general organization is: in part one, we go over visualization tools, the different kinds of genome browsers, their advantages and disadvantages, and then IGV itself. In part two, we'll actually go through variant inspections of single nucleotide variants and structural variants, and then how you adjust your window for PacBio sequencing specifically, although it's similar for Oxford Nanopore. So for part one, visualization tools, the first question we have to ask is: why do we even visualize our data? Throughout any work you do, you're going to be running multiple programs on your data, doing multiple tests, and gathering different statistics. But at the end of the day, you have to ask yourself whether the statistics actually explain the data itself. To address this question, a statistician named Anscombe developed four data sets. You can see them over here with X and Y. All of these data sets have about the same summary statistics: their average values, the variance, the correlation between X and Y, and they have the exact same linear regression. 
Now, if you were doing purely statistics, you would assume that all these data sets are about the same. However, if we simply plot them, we see they're actually quite different. The top-left data set does fit our linear regression, so the statistics do work for it. At the top right, we can see that a linear regression probably isn't the best approach for this kind of data; we need a more general regression, such as a quadratic model. The bottom-left data set fits an almost perfectly linear distribution, except for that one outlier, so you'd need a more robust regression. And the bottom right is a good example of how a single outlier can produce a linear correlation even though there shouldn't be one to begin with. None of this information is visible if you're just looking at the statistics, so it's always a good idea to look at the data and figure out what you're dealing with. A Toronto group also developed the Datasaurus Dozen. They set a series of points in the shape of a dinosaur and calculated the X mean, the Y mean, the standard deviations, and the correlation. Then, while conserving those statistics, they generated a dozen other distributions of points, all of which have exactly the same statistics. There's no way you could say that all of these data sets are the same, which gives you a sense of why we visualize our data: to spot outliers or anything else we'd immediately notice. In terms of visual processing, there are actually two main categories in your visual processing system: the pre-attentive and the attentive. Pre-attentive processing handles anything that jumps out at you the moment you look at it. So the left side is probably the easiest version of Where's Waldo, because you can find Waldo right there without actually having to search. 
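The Anscombe point is easy to check yourself. The sketch below, using the published Anscombe (1973) values for data sets I and II, computes the means and Pearson correlations by hand and shows they match to two decimals even though the plots look nothing alike.

```python
# Anscombe's quartet, data sets I and II: identical summary statistics,
# very different shapes when plotted. Values are the published 1973 data.
from math import sqrt

x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def mean(v):
    return sum(v) / len(v)

def pearson(a, b):
    # Pearson correlation computed from first principles.
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    sa = sqrt(sum((ai - ma) ** 2 for ai in a))
    sb = sqrt(sum((bi - mb) ** 2 for bi in b))
    return cov / (sa * sb)

print(f"means: {mean(y1):.2f} vs {mean(y2):.2f}")          # both ~7.50
print(f"corr:  {pearson(x, y1):.2f} vs {pearson(x, y2):.2f}")  # both ~0.82
```

Only a plot, not these numbers, reveals that set II is actually a parabola.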
And this is your pre-attentive visual processing at work, because it's the portion of your brain that immediately finds an outlier. Your attentive visual processing is what you use when you're combing through massive amounts of data trying to find what's different in that data set. The reason this matters is that we want to arrange our data in such a way that we're leveraging our pre-attentive abilities as well. In terms of the outliers in this third row over here, a different color immediately jumps out at us, whereas small differences in shape don't. So keep that in mind whenever you're picking the best strategy for visualizing your data. What this also tells you is that the human visual system is probably one of the most efficient, low-cost approaches to examining data and finding any outliers that might be present. It's also very simple compared to writing lines and lines of code when you don't even know what you're looking for. So regardless of the kind of data you're working with, whether it's SNVs, structural variants, new genes, or proteins, it's always a good idea to just look at the data itself. To address this need, over 40 different genome browsers have been developed. Each of these genome browsers takes a different approach, or focuses on a different strength, to give itself an edge over its competitors. The kind of genome browser you use depends on the task at hand: what kind of data you have, how much data you have, what the attributes of the data are, and whether the data is stored locally or securely on a server, because that changes how you gain access to it and how you actually use the tool. A few of these genome browsers are IGV, which we'll talk about in depth in a second. 
The UCSC Genome Browser, which I'm guessing most of you have used already, is an online genome browser where you can upload small sections of your data and view them. Galaxy, an online platform for bioinformatics analysis, also has its own built-in genome browser for small-scale visual analytics. And then the Savant Genome Browser, whose usage is just shy of IGV's, has the advantage that you can actually annotate any structural variants you find, something that's not possible in IGV; that's the approach Savant has taken to give itself an edge. But for the purposes of this module and this course, we're going to be talking about the Integrative Genomics Viewer. Any slides you see with the little Broad symbol on top are taken from the Broad Institute website; they developed IGV, and we're using their slides for educational purposes. So what is IGV? IGV, put simply, is a desktop application for interactive visual exploration of integrated genomics data sets. IGV's advantage is that it can handle many different types of data at the same time, whether it's epigenomics, microarrays, alignments, RNA sequencing, copy number variations, or a multitude of other file formats, and it's been preconfigured to view this data in an optimal manner without you having to adjust any parameters. You're able to explore large genomic data sets intuitively, taking advantage of that pre-attentive visual processing. You're able to integrate multiple data types with clinical information and sort them, to look for patterns, say, a mutation being more prevalent in males than in females. You're able to load data in multiple ways, whether locally, remotely, or cloud-based, which is a huge advantage, since most data storage is moving from local and remote servers to the cloud. 
And you're also able to automate any tasks you would do in IGV from a command-line interface, which means you can script whatever you want, leave your computer on overnight, and it'll do all the tasks you would otherwise have to do manually. Like I just mentioned, IGV lets you pull data from multiple different sources, whether locally, from TCGA, GenomeSpace, or other servers. This gives you the advantage of collaborating with people across the world: as long as they have access to your shared server, they can view any of the alignments or data files you currently have. The basic IGV workflow is launching IGV, selecting the right reference genome, which if wrong will throw off any visualization you do, loading the data, and then navigating through it; in the case of whole-genome data, that means viewing single nucleotide variants and structural variants. Now, all of you should have seen this page when you went to download the right version of IGV for this tutorial. There are two main versions of IGV: one is the Java application that you can download, and the other is a binary, a Windows installer or a Mac app, that you can install on your computer. They both have advantages and disadvantages; the Java application has the advantage that it runs with a single double-click. The different downloads of IGV account for different memory requirements, which depend on your computer. Once you launch IGV, you'll see this initial window. First and foremost, always remember to select the right genome. Because if you're doing, say, work on GRCh38 but you have the wrong reference genome build lined up, the second you load any data, all of your reads will be colored like a rainbow. 
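The command-line automation mentioned above works through IGV's plain-text batch scripts, which are lists of commands like `new`, `genome`, `load`, `goto`, `snapshot`, and `exit`. As a sketch, assuming a hypothetical BAM file and loci of interest, you could generate such a script with a few lines of Python and run it overnight with IGV's batch mode:

```python
# Sketch: generate an IGV batch script that screenshots a list of candidate
# variant loci. The commands (new, genome, load, goto, snapshotDirectory,
# snapshot, exit) are standard IGV batch vocabulary; the BAM path and loci
# here are hypothetical placeholders.
loci = ["chr1:155,000,000-155,030,000", "chr6:32,100,000-32,130,000"]

lines = ["new", "genome hg38", "load sample.bam", "snapshotDirectory snapshots"]
for locus in loci:
    lines.append(f"goto {locus}")
    # name each screenshot after its locus, with filesystem-safe characters
    lines.append("snapshot " + locus.replace(":", "_").replace(",", "") + ".png")
lines.append("exit")

script = "\n".join(lines)
print(script)
```

Saving that output to a file and pointing IGV at it (for example via the desktop app's batch-run option) walks through every locus and saves an image of each, with no clicking required.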
And the reason for this is that the positions don't exactly line up, and you get SNVs called all over the place. It's really nice from a visually-appealing perspective, but it tells you absolutely nothing useful for any analysis. Once you have the right genome specified, you then load your data. You're given different options: loading from a file locally on your computer, from a URL you have access to, or from a server. Loading from a server also lets you load different annotation tracks that the Broad has already indexed, whether it's Ensembl genes, UCSC genes, GC percentage, or the Database of Genomic Variants; it's all readily available. The genome version you can use there is a bit restricted, so keep that in mind whenever you load from a server, but we'll get more into that in the actual lab component. If you load up the tutorial basics, you end up with the following tracks. This is how your view looks, and this is the breakdown of the actual IGV workspace. You can see the menu above, where you have files, genomes, and any options you want to adjust in IGV or for your data. The toolbar up there is where you loaded your genome. A search bar lets you jump to specific coordinates or genes just by typing them in. The genome ruler above tells you where along the genome you're currently viewing your data. The track names for the files you've loaded are displayed on the left side, along with an attributes panel. You can customize the attributes panel to include any phenotypic information you want; the more information you want to be able to sort and filter by, the more columns will be shown there. The actual tracks themselves are displayed in the center, in the data panel. 
And then any additional tracks you've loaded, say RefSeq genes or GC percentage, are shown along the bottom over here, or if loaded on top, they're usually separate tracks themselves. Do the tracks have to be sequences? No, they don't have to be sequences; they can be annotations, they can be genomic variants, but each is displayed as one single row. So if we jump back over here, you can see different annotations. You can load data from the breast cancer cell line, and you can have things like GC content, common SNPs, or COSMIC SNPs, and all of them will be displayed along the bottom, each as a separate row: RefSeq genes will be its own row, DGV its own, COSMIC SNPs its own, and so on. So like I mentioned, IGV's advantage is that it's preconfigured to handle different data types. You can simply load files with any of these extensions into IGV, and it already knows how to process your data and display it in the best possible form. Each of these data types does have to follow IGV's expected input, so you have to put it in the right format, but once it is, IGV takes care of everything else. All of this information is available on the Broad website, so I'd suggest checking the file format before loading anything, because if your extension doesn't match the file contents, IGV will throw an error, and you'll just be sitting there trying to figure out why nothing works. Once you have your data loaded up, for example BAM files, what you'll initially see is the coverage track above, which tells you your overall read depth at different positions, but you won't be able to view any reads themselves. The reason is that when you load BAM files, you're loading gigabytes worth of information, which will slow down, if not crash, your computer. 
To actually view the reads, IGV tells you to zoom in. How far you need to zoom in before you can see your alignments depends on your computer's memory. By default, IGV is preconfigured to about 30 kilobases, so within a 30-kilobase window you'll start seeing all of your reads mapped to each location. If you want a different threshold, you can change it in the preferences. You would change this based on a couple of things: how much memory your computer has, as well as the kind of files you have. If you have very deep coverage files, you have massive amounts of data in a very small window of the genome, so you'd want to reduce this visibility threshold. The more data IGV loads, the slower your computer gets, which means the longer any task takes. If you run out of memory, IGV will throw an error and just stop loading everything; you need to free up memory before you can continue. But once you have data loaded, this is what you see. Each of your reads is displayed mapped to the reference. You can see the reference sequence below in the color-coded bar along the bottom; each color corresponds to a reference base, A, T, C, or G. We can also see that our reads themselves have little notches of color on them, and these notches correspond to SNVs between our read and the reference. The colors are based on the alternate base itself, so A, T, C, and G each have their own color, giving you the different mismatches. Now, I don't know if it's easy to see, but the base quality information, how certain the sequencer is about that base, is also displayed through the depth of the shade: the lower the base quality of your SNV, the lighter the shade, because you have less confidence in it. 
So what IGV is basically doing is visually de-emphasizing low-quality SNVs so that the high-quality SNVs are the ones your visual system picks up. Now we're moving into part two of the lecture, where we look at the metrics we care about when dealing with SNVs and structural variants. For SNVs, the metrics to keep in mind are: coverage, so how many reads cover that specific position; the amount of support for the SNV, so how many reads carry the SNV versus how many carry the wild-type base; strand bias and PCR artifacts, because if all of your SNV-supporting reads are on the forward strand or all on the reverse strand, you usually have less confidence in the call, since that might be a sequencing error, and if it's the exact same read repeated all the way down carrying the SNV, that would be a PCR artifact; and mapping quality. If we go back a slide, you can see the reads over here are colored gray. The lower the mapping quality of a read, the lighter its shade becomes, until it's pure white. Pure white reads have no confidence in where they're mapped; they're usually multi-mapped to different locations, so the information they provide is less trustworthy. And base quality is similar: the deeper the shade, the more confidence you have in that base being called correctly. For structural variants, we'll go over these metrics further on, but coverage is again important. In that case, though, we also leverage insert size, which Jared talked about, and read-pair orientation, whether your reads are oriented the way we expect or have been shifted. So let's take an example of an SNV, or a SNP, that we would trust: at this specific position, we have an SNV where a C becomes a T. 
Looking across this, we can see that the first T viewed in this window is a bit lighter than the rest, which means it's a low-base-quality call. But we see that the SNV is present in about 50% of the reads covering that location, and beyond that, it's present on both forward and reverse strands, which gives us higher confidence that this SNV is real. Like I was saying about annotation tracks, in this case an SNP annotation track has been loaded along the bottom, and we see a dbSNP entry at this location. So it's a common SNP that people have already annotated, which gives us even more confidence that this variant really exists. An SNV we would have less confidence in is this one over here. The reads are colored by orientation: forward reads are red, reverse reads are blue. And we see that our SNV, where an A became a C, is only present on these reverse-strand reads. Because the support is strand-specific, we have less confidence in it, but it is an SNV present in the dbSNP annotations, so it may or may not be true. Yes? In the previous slide, you said the SNP was present in about 50% of your reads; how would you differentiate between a SNP and an allelic variation? So what you can do is look at other SNPs that might be present along the read itself and see whether they're mutually inclusive or exclusive. Otherwise, you would have to sequence it using allele-specific sequencing: you separate your two alleles and sequence each one individually, which would let you determine that. But that's difficult to figure out from a visual inspection alone. So those are just quick metrics for evaluating SNPs or SNVs that might be present. 
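The coverage, support, and strand-bias checks described above amount to a small amount of arithmetic per site. As a sketch, with hypothetical per-strand read counts, the logic looks something like this (real callers use proper statistical tests, e.g. Fisher's exact test, for strand bias):

```python
# Sketch: evaluate an SNV call from per-strand read counts.
# All counts here are hypothetical illustration values.
def inspect_snv(ref_fwd, ref_rev, alt_fwd, alt_rev, min_vaf=0.2):
    depth = ref_fwd + ref_rev + alt_fwd + alt_rev
    alt = alt_fwd + alt_rev
    vaf = alt / depth                       # variant allele fraction
    # crude strand-bias flag: all ALT support sits on a single strand
    strand_biased = alt > 0 and (alt_fwd == 0 or alt_rev == 0)
    return {"depth": depth, "vaf": round(vaf, 2),
            "strand_biased": strand_biased,
            "pass": vaf >= min_vaf and not strand_biased}

# Balanced support on both strands at ~50% VAF: a plausible heterozygote.
good = inspect_snv(ref_fwd=12, ref_rev=11, alt_fwd=13, alt_rev=12)
# All ALT reads on the reverse strand: treat with suspicion.
bad = inspect_snv(ref_fwd=20, ref_rev=18, alt_fwd=0, alt_rev=9)
print(good)
print(bad)
```

The second call mirrors the strand-specific A-to-C example from the slide: even with decent coverage, one-sided strand support drops it below the bar.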
For structural variants, IGV leverages paired-end information, mainly because it's the most common type of sequencing we do, and it provides an extra layer of information we can actually sort through. The alignment coloring we use to view these structural variants is based on the inferred insert size as well as the pair orientation. What I mean by that is as follows. Whenever we're sequencing a long stretch of DNA, like Jared mentioned, you can't just sequence the entire thing from start to end in one contiguous run. So we shear the DNA and target specific fragment lengths, so our sheared DNA follows roughly a Gaussian distribution centered on a specific fragment length. We then take these fragments and sequence them from either end, leaving a gap of DNA in the middle whose sequence we don't know, and that becomes our insert. The insert size is basically the distance between the forward read and the reverse read that we know came from the same fragment. When we align these back to the reference, the average distance between the two reads becomes our inferred insert size. It could be 300 base pairs or 500 base pairs, depending on how you made your library. Because this insert size should be about the same across your genome, when you have structural variants like deletions, insertions, or interchromosomal rearrangements, the distance between the two reads gets shifted: it either gets larger or smaller, or it becomes enormously different, and that gives us evidence of one of these structural variants. We're going to use deletions as the example to explain how this actually works. When our subject has a deletion, a section of their DNA has been removed from their genome. 
So what we end up with is this smaller stretch of DNA. We shear this DNA and sequence it using paired-end sequencing, so we get the same insert sizes we'd expect across our whole library. But when we map these reads back to the reference genome, some pairs actually map across the deletion. So the insert size we expected is very different from the insert size we infer when we map back to the reference. Because the inferred insert size is larger than the expected value by a considerable margin, we know some kind of structural variant happened there, in this case a deletion. What IGV lets you do is color your alignments by insert size, and it'll visually flag any reads whose insert sizes are larger than expected, shown over here by those red reads. What you can also see on the coverage track above is that your coverage dips across the deleted section and returns to normal once you've crossed the deletion. So those are two different ways of visually finding a deletion just by looking at the data you've loaded. Insertions, on the other hand, give you a smaller insert size than expected, and IGV colors those blue. And if you have interchromosomal rearrangements, one read of a pair maps to one chromosome, while its mate maps to a completely different one. Now, since IGV only lets you view a small window of the genome at any point, the way it conveys these interchromosomal rearrangements is to color each read according to the chromosome its mate maps to. For example, in this tumor-normal window over here, you can see chromosome 1 and chromosome 6 being viewed at the same time, and you can see that on chromosome 1, some of your reads are colored orange. 
These reads are colored orange to indicate that the mate of each of these reads actually maps to chromosome 6. If we jump to that location on chromosome 6, we can see that its mate is blue, which, going back to the previous slide, tells us it's mapped to chromosome 1. This color scheme is preconfigured in IGV, and you can always look it up on the website. Yes? Regarding the previous point: the insert isn't just one size, right? Even if I size-select fragments of human DNA, I get a distribution around that size, say plus or minus 50 bases. So how does the software distinguish the normal size variation of my insert from a true deletion or insertion? So the reason it works is that it can look across all of your read pairs and estimate a standard deviation of what your expected insert sizes should be, and then if you have insert sizes that are, say, two or three standard deviations outside the norm, it'll flag those. There are parameters you can adjust in IGV for how strict you want that coloring scheme to be, but generally it can build good statistics because you have so much coverage in your library. Can you still detect events near that threshold? If an event is only a couple of base pairs, or say 10 or 20 base pairs, it's a lot more difficult to detect, especially if you have large variability in your insert size itself. So yes, that is a caveat to take into account when determining which insertions or deletions, and of what size, you're looking for. Yeah? So if a lot of your paired-end reads overlap each other, this doesn't help much either, right? 
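The "two or three standard deviations outside the norm" idea can be sketched in a few lines. One wrinkle: if you estimate the mean and spread from data that already contains the discordant pairs, those outliers inflate the estimate, so robust statistics (median and MAD) are a safer choice for a toy example; IGV's actual computation may differ. All insert sizes below are hypothetical.

```python
# Sketch: flag read pairs whose inferred insert size is far outside the
# library's distribution, using median/MAD so the outliers themselves
# don't inflate the spread estimate. Sizes are hypothetical.
from statistics import median

def flag_discordant(insert_sizes, n_sd=3):
    med = median(insert_sizes)
    # median absolute deviation: a robust spread estimate
    mad = median(abs(s - med) for s in insert_sizes)
    sd = 1.4826 * mad   # MAD -> standard deviation for Gaussian data
    return [s for s in insert_sizes if abs(s - med) > n_sd * sd]

# A ~300 bp library with two pairs spanning a deletion (~1500 bp apart
# on the reference): only those two are flagged.
sizes = [290, 305, 298, 310, 295, 302, 288, 300, 1500, 1520]
print(flag_discordant(sizes))
```

This also makes the caveat from the question concrete: a 10 to 20 base pair event shifts the insert size by far less than three times the library's spread, so it would never be flagged this way.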
No. If you do paired-end sequencing where your two reads overlap and cover most of the fragment, it's basically like having one very long read, so you don't really have an insert size in that case, because there's no distance between your two reads. You do get more certainty in the bases called at those locations, so you get better support for SNVs, but structural variants become a bit trickier. Because most of the pairs will be overlapping, if you get insertions or deletions that shift things completely, you'll still be able to detect them, but then coverage becomes your workhorse more than anything else. The other signal, aside from insert size, is read-pair orientation. Because you're sequencing from either end of the fragment, you expect your two reads to point toward each other. If you have inversions, duplications, translocations, or complex rearrangements where multiple things have happened, this pair orientation isn't necessarily conserved. When we talk about orientation, read strand is read from left to right; that's how we read it. And read order is which of your first and second reads maps where. To explain what this all means, we're going to use inversions as our example. Say we have this reference genome, and the section of the genome from A to B is inverted in our subject. We shear the subject's DNA and use paired-end sequencing to read it. When we map this back to the reference, the first read maps normally, but the second read now maps in a flipped orientation, pointing toward B. Similarly, if we sequence the end in our subject close to A, one of the reads maps perfectly back to the reference genome while the other maps all the way back near A, again flipped. 
When we view these reads linked up as pairs, we can see that their orientation is different from what we expect: they're not pointing toward each other, they're pointing in the same direction. IGV will color these, the left-left pairs as light blue and the right-right pairs as red, because it's those orientations that reveal the inversion. And what this gives you evidence for are duplications, inversions, translocations, those kinds of structural variants. It's simply available under the coloring options: you color alignments by pair orientation. You can see in this example that at the breakpoints over here, where your coverage dips, you see both forms of that left-left and right-right pair orientation, which in this case indicates that an inversion has happened. What are the breakpoints? The breakpoints are where the inverted section of DNA actually joins the rest. Because of the way mapping works, it's difficult to map reads across those breakpoints, so your coverage falls there. The coloring of the reads then gives you evidence for why the coverage has fallen, because there are different reasons that can happen. On the IGV website, you can see the full coloring scheme, accounting for the different kinds of read-pair orientations you might see. When you see RL, the green one on the bottom over here, that's usually the case when a tandem duplication has happened, so the reads get flipped in terms of the order in which they map. That covers structural variants. I'm just going to quickly go over what happens when you try to view long-read data. This is actual data that we use in our lab, suitably masked. IGV will color indels of one or two base pairs as purple, and looking at this data, you can't see anything. 
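The orientation classes above can be summarized as a tiny decision rule. This is a sketch using the conventional paired-end categories (FR normal, FF/RR inversion-like, RL tandem-duplication-like); a real implementation would derive the strands and order from BAM flags rather than take them as arguments.

```python
# Sketch: classify a read pair's orientation from the mapped strands of
# its leftmost and rightmost reads on the reference ('+' forward,
# '-' reverse). Category labels follow common paired-end conventions.
def pair_orientation(left_strand, right_strand):
    if left_strand == "+" and right_strand == "-":
        return "FR: normal, reads point toward each other"
    if left_strand == "-" and right_strand == "+":
        return "RL: reads point away from each other; tandem duplication signal"
    return "FF/RR: reads point the same way; inversion signal"

print(pair_orientation("+", "-"))   # the expected library orientation
print(pair_orientation("+", "+"))   # same-direction pair, as at an inversion
print(pair_orientation("-", "+"))   # everted pair, as at a tandem duplication
```

In a real BAM, the strand of each read comes from flag bits 0x10 (read reverse) and 0x20 (mate reverse), and "leftmost" is decided by comparing the mapped positions of the two mates.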
You can't see SNVs, you can't see the bases being called, because PacBio itself introduces errors at a rate of about 15%. So it's very difficult to view these third-generation sequences without doing some extra cleanup. Because the nature of third-generation sequencing data is that you have very many single-base errors and small insertions or deletions, you can go into IGV and tell it to hide indels smaller than a certain number of bases, since they're outside the scope of what you're focusing on. Simply hiding small indels lets you view your data in a much cleaner, much quicker manner. Now, beyond that, you still have, yes? So, third-generation sequencing is the term used for long-read sequencers, so PacBio and Oxford Nanopore; they've been given the term third-generation sequencing, whereas second generation is paired-end Illumina. Once you have all those small indels hidden, your reads still have some errors at certain positions, so what you can also do is tell IGV to take a consensus at each of these locations; otherwise, you're dealing with multiple apparent SNVs at a particular site. Yes? About this indel threshold value: you can take that from the manufacturer, because it depends on how frequently errors are introduced into the data, and it differs between technologies. People have also done studies and published how often these occur. Beyond that, the depth of your sequencing and the kind of reads you get influence whether you'll still see those indels. In the case of PacBio, if you extract their raw subreads, you will have this kind of error rate, but if you sequence deeply enough, you get consensus sequences that have already cleaned up that noise, and in that case you won't actually see it. So it very much depends on what kind of data you have. 
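The "hide indels smaller than N bases" idea is just a length filter over indel events. As a sketch, with hypothetical indel records as (chromosome, position, type, length) tuples, the same cleanup IGV's preference performs looks like:

```python
# Sketch: drop indels shorter than a cutoff, mimicking IGV's option to
# hide indels below a size threshold. The records are hypothetical
# (chrom, pos, kind, length) tuples, not a real IGV data structure.
indels = [("chr1", 101, "ins", 1),    # 1 bp sequencing noise
          ("chr1", 250, "del", 2),    # 2 bp sequencing noise
          ("chr1", 400, "del", 35),   # a real-looking small deletion
          ("chr2", 90,  "ins", 120)]  # a real-looking insertion

def hide_small_indels(records, min_len=10):
    return [r for r in records if r[3] >= min_len]

print(hide_small_indels(indels))   # only the 35 bp and 120 bp events remain
```

With long-read error rates around 15%, almost all of the 1 to 2 base pair events are noise, so a threshold like 10 bases removes the clutter while keeping events of biological interest.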
But regardless, if you're using these kinds of platforms without enough coverage, you do have random errors that pop up at a location and differ between read one, read two, read three, and read four. So what you want to do is call a consensus at that base, so that all of your data supports a specific call. All you do is go to the alignment preferences and enable the quick consensus mode, and your data is finally clean enough that you can visually inspect any structural variants that might be present. Now, say you have very complex data and you just want to look at the structural variants inside your sequencing; what you can use then are some online structural variant viewers. A couple of these are called SplitThreader and Ribbon. The advantage is that you just upload your results file to visually inspect what kinds of structural variants are happening, or whether any patterns immediately jump out at you. One example shown at the Hospital for Sick Children was a chromoplexy event, where two translocations connect two genes with no direct relationship to one another, causing a fusion gene that drives a specific cancer type. The disadvantage of these online structural variant viewers, though, is that you have to follow the format they want the data in and how they process it. But it is an extra step where you can visually inspect your structural variants without having to load massive amounts of data, and it's preconfigured to be intuitive and to make patterns stand out for you. Now, next we're actually going to get some hands-on experience in the IGV tutorial. Does anybody have any questions so far? We're going to practice all of this; you're going to practice everything from SNVs to structural variants. Unfortunately, not PacBio, because the data is still... 
There's not enough commonly available data to practice on. But we're going to learn how to work with IGV and actually view our SNVs and structural variants, how to color them, how to expand different views, and how to load tracks as well. Yes?

It looks like you're using whole-genome data? Yes, in this case it's whole-genome data. How about if you only have exome data? If you have only whole-exome data, you'll see coverage only across the exonic regions, and the coverage disappears in the intronic or intergenic regions. How well does that work? It still works fairly well. If you have gene fusions between two genes, you'll see that the reads do map across the two, but you're then only focusing on genic variants. You're not focusing on, say, intergenic rearrangements, where there might be an unannotated section of the genome that's driving something. So it still works. For RNA sequencing data, you would only see reads at the exon level, so your reads would be split across each of the exons, but they would still be connected. Yes?

I may have missed it, but is IGV doing the mapping? No. When you load up your reads, you're loading a BAM file, and a BAM file is an aligned file. Alignment is a much trickier process; it's a lot more computationally intensive, and you use dedicated aligners for that, which I believe you cover in module three. IGV is a way of viewing your data after it has been aligned to a reference genome, which is why your reference is important. If you're studying, say, a bacterial genome or a species that isn't already available in IGV, you can use IGV to build those reference genomes as well, so you can view data from any organism you want. Yes?

Is IGV strictly a visualization tool, or can you get metrics like mutation frequency, for example? It is strictly a visualization tool.
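The point that the BAM file already carries the alignment can be seen directly in the format: BAM is just the binary form of SAM, and every record stores the reference name and position the aligner assigned. A minimal parse of one SAM-style line (made-up read and coordinates for illustration) shows the fields IGV reads when it places a read on screen:

```python
# BAM is the binary form of SAM: each record already carries its alignment
# (reference name, position, CIGAR), which is why IGV can draw it without
# doing any mapping itself. Made-up read and coordinates for illustration.
sam_line = "read1\t0\tchr7\t55181378\t60\t100M\t*\t0\t0\tACGT\tFFFF"
fields = sam_line.split("\t")

qname = fields[0]        # read name
rname = fields[2]        # reference the aligner placed the read on
pos = int(fields[3])     # 1-based leftmost mapping position
cigar = fields[5]        # how the read aligns base-by-base

print(rname, pos, cigar)  # the aligner already did the placement work
```

IGV simply renders these precomputed coordinates against the chosen reference, which is why loading data aligned to the wrong reference build gives nonsense.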
The only mutation frequency information you can find is, say, up here, that little red notch. If you click on it, it tells you the breakdown of your reference versus your alternate allele, so you get the mutation frequency at that location, but IGV won't report any of that automatically. IGV is for visualizing sections of the data that a separate caller has flagged as an SNV or a structural variant, because you want to weed out false positives before doing wet-lab validations. Yes?

How do you deal with the fact that the average mammalian genome is too large to view all at once? In other words, if I want to look at something, I have to know where to look. Exactly. You use IGV to validate specific variants that have been called. What you would normally do is align your reads to a reference genome and then run different callers on it to find, say, structural variants or SNVs, because that's also a very tricky mathematical problem. Once you get a list of specific sites where these have happened, you load your data in IGV and jump to those locations to see whether the calls look true or false before you validate them. Because the mammalian genome is large, you might get, say, 1,000 SNVs, and from those 1,000 SNVs you need to know which ones are true, so you can validate them, and which ones are false. That's where IGV comes in. If you have a very small genome, yes, you can visualize your data completely, but you're not going to use IGV to scan 3 billion base pairs; that's not going to happen. Yes?

When you loaded that annotation track, is it something that's already there, or something you need to download? You can do one of two things.
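That review loop, jumping to each candidate site from a caller's output, is often automated with an IGV batch script. The commands used below (`new`, `genome`, `load`, `snapshotDirectory`, `goto`, `snapshot`, `exit`) are real IGV batch commands, but the file names and the variant list are made up for this sketch:

```python
# Generate an IGV batch script that loads a BAM, jumps to each candidate
# variant from a caller, and saves a screenshot for manual review.
# Assumptions: sample.bam, the site list, and the output names are made up.
candidates = [("chr1", 1234567), ("chr17", 7577120), ("chrX", 153760000)]
window = 100  # bp of context to show on either side of each call

lines = ["new", "genome hg38", "load sample.bam", "snapshotDirectory snapshots"]
for chrom, pos in candidates:
    lines.append(f"goto {chrom}:{pos - window}-{pos + window}")
    lines.append(f"snapshot {chrom}_{pos}.png")
lines.append("exit")

script = "\n".join(lines)
print(script)
```

Running IGV with the resulting script (via its batch-script option) produces one snapshot per candidate site, so a reviewer can triage hundreds of calls without navigating to each one by hand.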
You can either download the file yourself and drag it into IGV, and it loads at the bottom. Or, if you go to File, Load from Server, and the file is available there, it will show up among the annotation tracks; if you load it from there, IGV connects to the hosted COSMIC file on the Broad's server and pulls that information, saving you the trouble of downloading it and drag-and-dropping it. That's it. But, like I mentioned, because there are different versions of COSMIC and dbSNP and so on, the version you're using may or may not have specific annotations, so some might be missing. You need to keep that in mind when you're using any of these annotation tracks.