Okay, so we really want you to understand how to use genome browsers, and specifically to become experts at variant inspection, in particular of single nucleotide and structural variants. And then hopefully, if there's time, we want to take it a step further and show you some of the next generation tools that people are using to do variant analysis.

Okay, so this is the organization of the modules that I'm going to be doing with you guys. For the first 20 minutes, I'm going to tell you about visualization tools and genome browsers, and I hope that for the better part of this hour you guys will actually get your hands dirty using genome browsers. So you're going to look at visualizing single nucleotide polymorphisms, then structural variants, and then, in part three tomorrow if there's time, interactive variant analysis.

Okay, so part one, visualization tools. I always like to start off with the history of genome visualization, just to give us some context of where we are. So this is an image taken from the 1800s. Any guesses of what this is? Sorry, phylogenetic tree. The scientist behind this? No, sorry. Who is the scientist behind this, and what does this represent? Darwin. Darwin. Okay, right. So this is a phylogenetic tree in Darwin's scrapbook. 1900s: X-ray crystallographic structure. Who are the people behind this? Was it in fact? Yes. Good answer. Good answer. Okay, so really now we're in the era where we can actually look at base pairs. So we've gone from the population level, to a high-level structure, to now looking at single bases. And you guys are the next generation of data visualization experts and Nobel Prize winners in that area.

So why is it important to visualize data? Why do we do this and not just leave it up to computers? The textbook example of this in the visualization field is Anscombe's quartet. This is a set of four different data sets. They're listed on the left here, with their x, y coordinates.
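For reference, since the slide is not reproduced in this transcript, here is a minimal sketch that lists the four data sets and computes their basic summary statistics. The values are the commonly published Anscombe numbers, assumed here rather than taken from the slide, and `summarize` is a hypothetical helper name.

```python
from statistics import mean, variance

# Commonly published Anscombe's quartet values (assumed; not copied from the slide).
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the mean and sample variance of the x and y values."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),
        "mean_y": mean(ys),
        "var_y": variance(ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, s in summaries.items():
    print(
        f"{name:>3}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.2f} "
        f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.2f}"
    )
```

With these values the x statistics are identical across the four sets, and the y statistics agree to roughly two decimal places rather than exactly. The differences only become obvious when the points are plotted.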
And each of these data sets has the same mean and the same variance. And so if you had a statistical program that ran through this and just did these basic statistics, you would say that these are the same. But obviously when you plot them, the kinds of patterns that emerge from them are very clear to you by eye.

So I want to take this just one step further and teach you a little bit about your pre-attentive processing skills. Okay, so there was a fraction of a second there where it showed a bunch of data points. Can you guys identify what the outlier was? Red dot. Okay, so you've identified that it was red, you've identified that it was a dot, and you could probably point to its position on the screen, all within a fraction of a second. The bioinformaticians in the audience can probably tell you that programming that recognition might take a couple of days at least. So the point, coming back to this question of why we visualize data, is that your human visual system is really a low-cost, high-performance sense maker that enables you to find patterns in data and, certainly in the case of bioinformatics, to debug and identify issues with your data. We've done this a number of different times, and that's essentially what you guys are going to be looking at over the next couple of days: being able to understand whether or not the computational tools you're using are actually making valid predictions.

So now we're going to talk about some visualization tools in genomics. There are a large number of genome browsers available. I count 40, and there are probably upwards of 50 now. So which do we use? That really depends on the task you have at hand, the kind and size of the data, and whether or not you care about the privacy of the data that you're looking at. The key genome browsers that we're going to be looking at are IGV, the Integrative Genomics Viewer, which is out of the Broad Institute, and Savant.
And for full disclosure, I'm a developer on the Savant Genome Browser, and we're going to be using that as well. These genome browsers are really great for looking at high-throughput sequencing reads, and especially good for looking at genetic variants. They're great for looking at even very large alignment files. I think Michael's going to be talking about BAM files, so those are read alignments. And those can be stored locally or remotely. So if you're concerned about data privacy, these are great tools to use. You might also want to try these other tools. These are web-based tools, but web technologies have come a long way, and they're being retrofitted to be able to handle very large data sets and private ones. So these are the UCSC Genome Browser and Trackster, which I think you guys may get some experience working with, probably tomorrow.

So IGV is a desktop genome browser. It's designed, again, for high-throughput sequencing data. And to be honest, I think it's probably the most fully featured genome browser available. So if you're looking for the default tool to use, I would start with IGV. Savant, again, is a tool that we've been developing here at U of T. Sorry that this is kind of flaky. It's also a desktop genome browser, but we put a lot of emphasis on helping researchers manually inspect single-nucleotide and structural variants. So we spent a lot of time thinking about pre-attentive processing and about depicting variants in a way that lets you very easily recognize what's a variant. What's depicted on the left is a SNP, and what's shown on the right is a structural variant. And you guys are going to become experts in disentangling this information.

Genome browsers are not the only way to look at genomic data. There are a number of different tools available. You might have seen Circos. This is a circular representation that's great for looking at long-distance relationships.
So around the axis of the circle, you could have chromosomes and the connections between them, if you'd like. This hive plot is a newer representation that actually comes out of the same lab that Circos did. These are just different ways to visualize genomic data. And again, it comes back to what exactly you'd like to represent. Sorry that it's a little bit flaky.

OK, so now we're going to talk about how we visualize single-nucleotide variants. This is just a reference for you for the different kinds of variants that we might find in a genome. We have single-nucleotide polymorphisms, insertions and deletions that might be small or large, and copy number variants. So you can use this as a reference as we go through them. You guys are going to be covering the techniques in a lot more detail, but this is essentially how single-nucleotide variants are identified. You have reads that come from high-throughput sequencing technologies, and they're aligned through some computational technique to a reference genome. And then you basically assess, for each position, whether or not there's support for an alternative base. You can encode that in an algorithm, but you can also look at the data and make that decision for yourself.

There are a number of different metrics that are important when evaluating the validity of a single-nucleotide variant. These include coverage, so how many reads are aligned to that particular position on the reference genome; how much support there is for the alternate allele versus the reference base; whether or not there are artifacts, so PCR artifacts and strand bias are important; and then various quality metrics. And again, you guys are going to learn about these particular metrics a little bit later, but these are important to know. Given a good genome browser, you're able to inspect these things and make these judgment calls for yourselves. OK, yeah. Sorry if you're going cross-eyed from that.
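The per-position assessment described above can be sketched in a few lines. This is a toy illustration under strong assumptions, not how IGV, Savant, or any particular variant caller works: reads are assumed to align without gaps, base and mapping qualities and PCR duplicates are ignored, and the names `build_pileup`, `summarize_site`, and `looks_like_variant`, along with the thresholds, are hypothetical.

```python
from collections import Counter, defaultdict


def build_pileup(reads):
    """Collect (base, strand) observations for every reference position.

    Each read is (start, sequence, strand) and is assumed to align
    without gaps, which real aligners do not guarantee.
    """
    columns = defaultdict(list)
    for start, sequence, strand in reads:
        for offset, base in enumerate(sequence):
            columns[start + offset].append((base, strand))
    return columns


def summarize_site(reference, columns, pos):
    """Report coverage, alternate-allele support, and strand counts at pos."""
    observations = columns.get(pos, [])
    coverage = len(observations)
    ref_base = reference[pos]
    alt_counts = Counter(b for b, _ in observations if b != ref_base)
    # Keep only the most frequent non-reference base, if there is one.
    alt_base, alt_reads = alt_counts.most_common(1)[0] if alt_counts else (None, 0)
    alt_forward = sum(1 for b, s in observations if b == alt_base and s == "+")
    return {
        "pos": pos,
        "ref": ref_base,
        "alt": alt_base,
        "coverage": coverage,
        "alt_fraction": alt_reads / coverage if coverage else 0.0,
        "alt_forward": alt_forward,
        "alt_reverse": alt_reads - alt_forward,
    }


def looks_like_variant(site, min_coverage=4, min_alt_fraction=0.25):
    """Illustrative filter: enough reads, enough alt support, both strands."""
    return (
        site["coverage"] >= min_coverage
        and site["alt_fraction"] >= min_alt_fraction
        and site["alt_forward"] > 0
        and site["alt_reverse"] > 0
    )


# Made-up reference and reads; three of the five reads carry a T at position 5.
reference = "ACGTACGTAC"
reads = [
    (0, "ACGTAC", "+"),
    (2, "GTATGT", "-"),
    (3, "TATGTA", "+"),
    (4, "ACGTAC", "-"),
    (1, "CGTATG", "+"),
]

columns = build_pileup(reads)
site = summarize_site(reference, columns, 5)
print(site, looks_like_variant(site))
```

Requiring alternate-allele reads on both strands is a crude stand-in for a strand bias check. In a genome browser you would judge the same things by eye from the coverage track and the coloring of mismatched bases.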
OK, so with that, the next slide is just talking about the lab. We're going to start by looking at IGV. If you're already familiar with IGV, there is a complementary lab on Savant, where you're essentially doing the same sort of workflows. Within the lab, which you'll find online, at least electronically, I've listed different mirrors for the data sets. The mirror number just corresponds to your row number, so there's mirror 1, 2, and 3. For the people in the fourth row, just use mirror 1. That's fine. And just for tomorrow, I'd just like to kind of get the screens with it all. For tomorrow.