All right. So again, these slides are freely available for you to use and distribute, with the caveat that you please credit where credit is due. All right, so: introduction to cancer genomics. When I was told that this was the session I'd be giving, I was a little confused, because that's a pretty broad category. And I know that we all come from diverse backgrounds: many of us have been in cancer for many years, many of us come more from the software side of things. So the way I've broken it down, and this is how I see cancer genomics, is that it falls into three main pillars. There's the aspect of technology: the development and advances that have enabled a lot of the research going on now. Then there's discovery, applying this technology to tumors and so on. And the third pillar is trying to use this information and this research to actually make some sort of clinical progress.

We have an hour and a half, and I've picked what I think are interesting topics from these three categories. But if someone has particular questions or things you want to talk more about, feel free to raise them; we don't have to get through all of the slides. Or if you just want me to get through all the slides really quickly, we can finish in 45 minutes. We can do that too.

As I mentioned earlier, my background is primarily in the first and third pillars. I come from a technology background: I've worked with microarrays and next-gen sequencing for a number of years, I've used them in cardiovascular medicine, and in the last few years I've been applying them here at OICR, working in the field of cancer. So I'll talk a little about applying these technologies, and then I'll finish with some thoughts on the state of clinical cancer medicine and the use of genomics there.

Here's an overview of what we'll cover. With technologies, I guess most of us are here because of next-gen sequencing, so I'll highlight the key players, some of their capabilities and applications, and also some of the downsides of these technologies. I'll go into microarrays briefly, only because later in the week you're going to deal with some microarray data, particularly expression data. Then we'll talk about some challenges that cancer genomics research faces, particularly things like heterogeneity, tumor cellularity, purity, and ploidy. I'll cover some highlights from the past couple of years, some major advances in the field; really, I've picked some projects that I find interesting, and hopefully you will as well. And then I'll talk a little about where I see the field going: things like single-cell sequencing, liquid biopsies, circulating tumor cells, circulating free DNA, and clinical integration.

So if we start by looking at DNA sequencing technology: for the better part of the last 20 or 25 years, nothing really changed. We went from radioactively labelled nucleotides to fluorescently labelled nucleotides, from running big gels to automating the process and running capillaries. But really, nothing fundamental changed; the chemistry was the same. And it required giant factories of these automated capillary-based sequencers to sequence things like the human genome.
Those machines could do 200 million bases a day, and that's why those projects took a billion dollars and many, many years just to sequence a single human genome. But then, about nine years ago, there was a giant revolution in sequencing technology: people started thinking about new ways to interrogate DNA and generating crazy amounts of data. I'm sure all of you are familiar with this plot. It shows the cost to sequence a megabase of DNA over time, and you can see that around the time the field switched from Sanger capillary-based methods to these next-gen technologies, the cost just plummeted. It's sort of plateaued now, and if anything it's gone up a little in the past year or so. But you can see that sequencing an entire genome was simply unimaginable unless you had a giant consortium, as was assembled for the Human Genome Project; now we routinely sequence hundreds of genomes every month.

So there are many advantages to these next-gen platforms, obviously. We're not making BAC libraries or subcloning things just to sequence a single stretch of DNA. The data are inherently quantifiable, simply by counting the reads being generated. I think the biggest advantage is that the technology has gotten so good and so efficient at reading DNA that it's adaptable to many different applications: if you have a particular biological interest and you can translate your signal into a DNA signal, you can interrogate it with sequencing technology and generate vast amounts of high-quality data. We'll touch on that a little later. And then, obviously, we're generating tons of data at a reduced cost.

The initial phase of this sequencing revolution was, quote-unquote, an arms race: the technology providers were trying to produce more data, cheaper, faster, with longer reads. That's what gave us things like 454 sequencing, GA or Solexa sequencing, which eventually became Illumina, and AB SOLiD. But the bottom line is that all the sequencing technologies in this category rely on the same basic principle: genomic DNA is fragmented or size-selected into smaller pieces, and some sort of adapter is ligated onto the ends. That adapter can then bind to a surface on the instrument, and whatever chemistry is used incorporates a polymerase to extend and read those strands. It's all roughly the same.

The biggest player in the market right now is obviously Illumina, and I think most of the data you'll be using this week was generated on Illumina HiSeq machines. This table summarizes the current state of the instruments; actually, there's been a recent chemistry upgrade, so these numbers are probably a month or two old. We can generate 2 x 100 bases per fragment in a single run. These machines rely on glass flow cells loaded onto the instrument; there are lanes, and it takes 10 or 11 days to run one flow cell in standard mode, generating somewhere on the order of 600 gigabases, now probably closer to a terabase, of sequence data in that time. The company has released other instruments that rely on the same flow cell-based chemistry but produce varying amounts of data.
So you have things like the MiSeq, which is probably equivalent to an eighth of a HiSeq; the NextSeq, which is roughly similar to running a HiSeq in rapid-run mode, where you generate a fraction of the data but can do it quickly; and then the HiSeq X machines, which produce probably two or three times the data of a normal HiSeq, with the caveat that they can only be used to sequence human DNA.

So quickly, I thought I'd put up this slide from Illumina so everyone can visualize the process; I'm sure a lot of you know it. We have these glass flow cells — I don't have a pointer, but I'll try to describe it — and hybridized to these flow cells are tiny fragments of DNA complementary to the adapters that get ligated onto your fragments of interest. Then, through a process of clonal amplification called cluster generation, what you produce are millions of clusters of identical molecules scattered across the surface of the flow cell. The chemistry adds all the fluorescently labeled nucleotides to the reaction simultaneously; a polymerase incorporates them, copying the strand that's hybridized to the flow cell. When an incorporation occurs, fluorescence is emitted, and a camera literally photographs the color released at each individual cluster across the flow cell. This happens millions of times at once and, as I mentioned, 200 cycles over for our typical sequencing runs, and it takes about a week to do. So at the end of the day, you have your DNA fragment with adapters at the ends, and you sequence 100 bases from one end of it and 100 bases from the other end. The nice part, which you'll learn more about later in the week, is that we keep the information of this pair together as we do the analysis: we know where these 200 bases of sequence came from, and that they're physically connected on the same molecule of DNA.

Now, there are many applications, as I mentioned. The simplest, and probably the easiest to do, is whole genome sequencing, where we literally just take DNA, put on some adapters, and run it on the machines. There's minimal processing required, and it actually requires the least amount of DNA; it's kind of counterintuitive that to sequence more, you actually need less DNA. Then there are various forms of targeted sequencing: people look at individual genes, or at entire exomes, the coding sequence. We can call different types of variation from these data, and we can also look at other types of molecules: RNA, proteomic readouts, epigenetic marks such as DNA methylation, transcription factor binding sites. There's a really great poster — I wish this weren't so small, that we had a bigger copy of it — freely available from the Illumina website, where they've outlined all of the applications, as of I think February when they released it, that are possible with current DNA sequencing technology. There are things on there that probably most of us have never heard of; people are doing really creative things, modifying flow cells, binding different things to them. So anyway, I encourage you to go and get a copy of this poster.
There's one printed on the wall up on the sixth floor, if you're doing a walk around and want to take a closer look.

So at the most basic level of cancer genomics, at the DNA level, these are the types of variants people are generally looking for. This cartoon illustrates DNA sequencing reads, the orange and blue bars, aligned to a reference genome. We look for differences between the reference and our reads to identify single nucleotide variants. We can look for insertions or deletions in the reads compared to the reference. We can look at the abundance of reads across regions — this is the quantifiability of sequencing — where no reads at a particular region could indicate a homozygous deletion, half the genome-average read depth can indicate a heterozygous deletion, and double the average can indicate a copy number gain. Because of the paired-end information, we can see one read aligned to one chromosome and its mate aligned to another, which could indicate a translocation or some other structural rearrangement. People also look at reads that don't align to the human reference: possible contamination, or non-human sequence in your sample — pathogens, viruses, bacteria, et cetera. These are just the simple categories; by the end of this week you'll have gone into great detail on the algorithms, tools, and pipelines used to work with all of these types of variation.

Now, the simplest form of sequencing, as I said, is doing the whole genome, but it's still fairly cost-prohibitive to approach every project that way, so there are many methods people use to target specific regions of the genome. Really, it comes down to the questions you're asking in your study. The choice of target enrichment method depends on how much material you have, how much expertise you have in manipulating nucleic acids, what coverage you're looking for, what biases you want to avoid, the size of the target, and the number of samples. Do you want something fast, as in a clinical setting, where you can turn data around in a timely manner, or are you more interested in a complete understanding of something?

Probably one of the most common applications is looking at the entire coding sequence of the genome. The methods are fairly straightforward: we still produce DNA libraries, as we would for whole genome shotgun sequencing, but then specific pieces of the library are selected for somehow. People used arrays to capture them, although not so much anymore; now people mostly use probes in solution carrying biotin tags, which bind specific parts of the library so you can pull those out. And this is the reason it requires more DNA: you're capturing only about 1% of the genome. You can imagine that if you start with only a couple of nanograms of DNA and capture 1% of it, you're now down in the picograms. And so that's why targeted sequencing generally requires more DNA, not less.
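To make the read-depth logic from that variant cartoon concrete, here's a minimal sketch in Python. The ratio cutoffs are illustrative assumptions, not values from any particular pipeline:

```python
# Minimal sketch: classifying copy number from read depth in a window,
# per the cartoon above. Thresholds are illustrative, not from any
# particular caller.

def classify_window(read_count, genome_avg):
    """Assign a rough copy-number call to one window of read counts."""
    if genome_avg <= 0:
        raise ValueError("genome-wide average depth must be positive")
    ratio = read_count / genome_avg
    if ratio < 0.1:
        return "homozygous deletion"    # essentially no reads
    elif ratio < 0.75:
        return "heterozygous deletion"  # roughly half the average depth
    elif ratio > 1.5:
        return "copy number gain"       # roughly double or more
    return "neutral"

# Toy usage: per-window read counts against a genome average of 100
windows = [0, 48, 102, 230]
print([classify_window(c, 100) for c in windows])
# ['homozygous deletion', 'heterozygous deletion', 'neutral', 'copy number gain']
```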
Nonetheless, people still do a lot of the same analyses with exome data as with whole genome data. This slide shows copy number analysis done by literally counting the number of reads in windows of probably 1,000 to 10,000 base pairs. You can see that major rearrangements, major gains and losses, like on chromosome 10 here, can still be identified even though you're only selecting about 1% of the genome; you're just losing the resolution you would have had interrogating the entire genome.

So that was the story for about a five-year stretch. Then, about four years ago, people realized they'd hit a plateau in the capabilities of the chemistry in terms of how much data it can generate. There are still incremental advances, but things have slowed down quite a bit. And now the mentality in DNA sequencing is that everyone can do it, so vendors started producing smaller machines that are much more cost-effective and geared towards other applications, like the clinical market. They don't produce the same amount of data — you don't need a giant compute cluster or a cloud to analyze what comes off them — but the turnaround times are much faster.

I won't go into too much detail, but I thought I'd highlight the other players, because I think it's really important to understand where your data come from: depending on the technology used to generate your data, there will be different biases and different error modes associated with it. The first is Ion Torrent, from Life Technologies. The basis behind this is essentially a miniaturized pH meter on a wafer. By a natural process, when DNA polymerase incorporates a base into an extending molecule, it releases a hydrogen ion. So in these tiny wells on the wafer they have a growing DNA strand, and they measure the changes in pH that occur as hydrogen ions are released. No modified nucleotides are required; it's literally natural DNA polymerase with natural nucleotides being incorporated. The chemistry works like this: they flow only a single nucleotide at a time — T, then A, then C, then G, in a known pattern — and for every base on the template strand, the polymerase incorporates as many bases as match. So if you have a run of the same base, it incorporates several bases simultaneously. The slide illustrates this with a short template: as you flow a T, you see a blip in the T signal; when two T's are incorporated, the signal intensity is twice as high; three T's, three times as high — at least, that's how it's supposed to work. The biggest caveat with Ion Torrent data is a weakness over homopolymer stretches: once you get beyond two or three identical bases, measuring the difference in the hydrogen-ion signal between, say, three and four incorporations, or five and six, becomes difficult.
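Here's a toy Python simulation of that flow chemistry, just to make the homopolymer point concrete. The flow order and the example sequence are made up for illustration:

```python
# Toy Ion Torrent flowgram: nucleotides are flowed one at a time in a fixed
# order, and the signal is proportional to how many consecutive template
# bases are incorporated in that flow. Flow order is illustrative.

from itertools import cycle

def flowgram(sequence, flow_order="TACG", n_flows=16):
    """Return (base, signal) per flow while synthesizing `sequence`.
    Homopolymers give proportionally higher signal in a single flow,
    which is exactly where the chemistry's accuracy degrades."""
    signals = []
    i = 0
    flows = cycle(flow_order)
    for _ in range(n_flows):
        base = next(flows)
        run = 0
        while i < len(sequence) and sequence[i] == base:
            run += 1
            i += 1
        signals.append((base, run))
        if i == len(sequence):
            break
    return signals

# A TT homopolymer yields a single T flow with twice the signal:
print(flowgram("TTACG"))
# [('T', 2), ('A', 1), ('C', 1), ('G', 1)]
```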
A lot of work has gone into algorithm development to clean up that data, but that's the caveat going into a project involving Ion Torrent data: you may think you've found all these insertions and deletions, but really you're just seeing sequencing errors — as opposed to the characteristic Illumina sequencing error, a C-to-T substitution that probably arises from oxidative damage during library prep.

Another sequencer, in what people call third-generation DNA sequencing, is the Pacific Biosciences RS system. The idea behind this one is that you have a single polymerase replicating a single molecule of DNA in one of the ZMW wells on their chip. The nice thing is that there's no PCR amplification, and so no possible amplification biases: you're actually watching a single molecule in real time. They use labeled nucleotides that produce a signal as they're incorporated, measured in real time. Now, obviously, if what is essentially a camera or microscope is watching a single polymerase molecule in a well, it's a little more error-prone; the raw accuracy is probably only about 85%. But because nothing has to be amplified, they can sequence really long stretches of DNA — the average read length on a PacBio machine is probably somewhere around 10 kb right now. The things you can discover with a 10 kb read open a lot of doors that short 100 base pair reads can't. The throughput is much lower, though: not gigabases of data, more like tens of megabases.

And then finally, I brought this one up because it's the hottest thing going right now: nanopore technology. Similar to PacBio, they're looking at a single molecule of DNA. They have a membrane with biological pores in it, and they measure the changes in electric potential across the membrane as molecules of DNA are threaded through the pores. What they found is that different sequence contexts produce different changes in potential, and they're able to translate these blips into small words of DNA sequence that can then be stitched back together into the sequence that went through. The downside is that the number of pores you can fit on a membrane and read simultaneously isn't very high; I think the version they've released has about 500 pores. But the nice thing is that there's really no limit to the read length: as long as a piece of DNA is being threaded through a pore, it just continues to go. There's still work to do — people don't fully understand the error modes, like making sure the DNA only goes through the pore in one direction and doesn't stall or back up — but these are things that I think will be worked out in the coming months as more groups get these machines and start using them. Does anybody here have one? Yeah, so we received our first one about a month ago. We haven't actually sequenced anything of our own with it yet; we've run their test kits, and now we've got some lambda DNA they want us to run through and upload to their server to do the base calling.
But hopefully we're going to throw some cool stuff at it and just see what it looks like.

So I guess the take-home from all of this is that we're really good at generating data, and I think the reason all of you are here is: what can we do with this data? How do we look at it? Just to give you a sense of the quantity, for those not familiar with it: when we look at just 1% of the genome, we're talking about thousands of positions in an individual genome that differ from the reference sequence, with probably somewhere around 100 to 200 of these considered deleterious to gene function. This grows as we look at more of the genome: in a typical whole genome we'll see tens of thousands of variants that differ from the reference. And even restricting to cancer and somatic variants — not germline variation — in a typical whole genome we see 1,000 to 5,000 somatic mutations, of which probably around 50 are coding and predicted to alter protein function. So you can imagine that if you're looking at hundreds or thousands of patients, you're talking about hundreds of thousands to millions of variants. Really, a huge bottleneck is the annotation and interpretation of what we uncover, and you're spending a whole day this week just on ways to annotate what's found. Probably the simplest is coding variation, because we understand transcription and translation and how a change can affect protein structure, and there are many tools using comparative genomics or structural analysis to predict how coding variation alters function. But then we have all these other variants that don't occur within coding sequence, which projects like ENCODE and TRANSFAC have shown can be just as functional; there have been predictions that as much as 90% of the genome has some function. So that's where the state of the art is right now: we're very good at generating data, but not so good at understanding all the data we generate.

So quickly, I mentioned I'd talk about microarrays. Most of you are probably not using microarrays, or never have and never will, but because there's a session later in the week, I'll cover this quickly. Essentially, a microarray is a glass slide or film that, just like a flow cell in sequencing, has molecules of nucleic acid bound to the surface, sticking up like little fingers. You process your DNA, put adapters on it, and it binds to the complementary little fingers on the microarray; with the incorporation of a fluorophore, you can then measure the intensity of your molecules' binding to the array. The first type is used to detect single nucleotide polymorphisms, or SNPs: the goal is to distinguish samples that are homozygous for one allele, heterozygous, or homozygous for the other allele. But it was quickly shown that you could take the whole complement of probes on the microarray and, similar to what we do with whole genome sequencing, look at the signal intensity in windows across the genome. It turns out you can identify things like copy number variation pretty readily using arrays designed to interrogate SNPs.
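Here's a minimal sketch of that intensity-in-windows idea: the log2 ratio of sample to reference probe intensity, averaged over windows of probes. The window size, intensities, and interpretation thresholds are all illustrative assumptions:

```python
# Sketch: copy number from SNP-array probe intensities. A heterozygous
# deletion shows up as roughly half intensity, i.e. a log2 ratio near -1.
# Window size and toy intensities are invented for illustration.

import math

def window_log2_ratios(sample_intensity, reference_intensity, window=3):
    """Average log2(sample/reference) over consecutive probe windows."""
    ratios = [math.log2(s / r)
              for s, r in zip(sample_intensity, reference_intensity)]
    return [sum(ratios[i:i + window]) / window
            for i in range(0, len(ratios) - window + 1, window)]

# Three normal probes followed by three probes over a heterozygous deletion
sample = [980, 1010, 995, 510, 490, 505]
reference = [1000, 1000, 1000, 1000, 1000, 1000]
print([round(x, 2) for x in window_log2_ratios(sample, reference)])
# roughly [0.0, -1.0]
```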
I think what you're going to focus on mainly this week, though, is gene expression arrays, where the same concept — DNA probes bound to a film or glass slide — is used to pull down molecules of cRNA. Total RNA is taken and cDNA is produced from it by reverse transcription; that's labeled, and then an in vitro transcription step is used to produce what's called cRNA. The abundance of each transcript can then be measured by how much of that cRNA binds to the probes for the corresponding genes. I think Paul's going to go into much more detail on this; I just wanted to mention it quickly.

So given this available technology, people saw early on that cancer was a field that could benefit from it, and the idea of cancer genome projects quickly took off: take a tumor sample and a normal sample from the same individual, identify all of the genomic variants in each, and determine what is present only in the tumor versus common to the tumor and the host — the idea being that something specific to the tumor must be contributing to tumorigenesis. The first example of this came out of WashU, in 2009 I believe, where they took a single patient with AML and sequenced the entire genome of the tumor cells and of corresponding normal blood cells from the same patient. And lo and behold, there was a mutation in IDH1 — later shown to be recurrent and to contribute to tumorigenesis through effects on the HIF1 pathway. So this was a great success for cancer genomics: take a tumor, sequence the entire genome, and find a variant that's driving the disease.

You can imagine that back then this was a huge task, just to look at a single genome. They had almost 4 million variants from that first generation of DNA variant callers, and they had to go through a process that is really still the process we use today: start with a list of variants and slowly filter it down through whatever filters seem most appropriate, until you have a handful of variants you're able to deal with. That's kind of where the field is right now. So taking those 4 million variants, they filtered some out based on frequency, others based on presence in publicly available databases. Then they looked only at coding sequence because, frankly, that's what they understood how to deal with: we can make some sort of story about a variant in coding sequence, whereas at the time the ENCODE data hadn't been released and non-coding variants just appeared to be noise. They looked at the types of variants, synonymous versus non-synonymous, and tried to validate as many as possible using other methods, PCR and Sanger-based sequencing — some they couldn't, for technical reasons — and it turned out about 150 of the candidates were not real. In the end they had about eight validated coding, non-synonymous somatic SNVs, and one of them happened to be in a gene that was interesting. So that was the state of things a few years ago: sequencing one tumor got you a New England Journal paper. And that kind of continued for the next couple of years.
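Conceptually, that filtering cascade looks something like this sketch. The field names, the database, and the cutoffs are hypothetical stand-ins, not the actual WashU pipeline:

```python
# A minimal sketch of a somatic variant filtering cascade: start with all
# candidates and whittle them down, as described above. All fields and
# thresholds here are invented for illustration.

def filter_variants(variants, known_db, max_pop_freq=0.01):
    """Keep coding, non-synonymous variants absent from public databases."""
    kept = []
    for v in variants:
        if v["id"] in known_db:              # seen in public databases
            continue
        if v["pop_freq"] > max_pop_freq:     # too common to be somatic
            continue
        if not v["coding"]:                  # restrict to coding sequence
            continue
        if v["effect"] == "synonymous":      # unlikely to alter the protein
            continue
        kept.append(v)
    return kept

candidates = [
    {"id": "rs123", "pop_freq": 0.30, "coding": True,  "effect": "missense"},
    {"id": "var_a", "pop_freq": 0.00, "coding": False, "effect": "intronic"},
    {"id": "var_b", "pop_freq": 0.00, "coding": True,  "effect": "synonymous"},
    {"id": "var_c", "pop_freq": 0.00, "coding": True,  "effect": "missense"},
]
print(filter_variants(candidates, known_db={"rs123"}))  # only var_c survives
```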
Here are some of the early studies, where people looked at breast cancers, other blood cancers, melanoma, and sequenced either entire genomes or just exomes, in mostly modest-sized cohorts, and identified recurrent mutations — largely finding things we already knew. That was the way things went until probably two years ago or so. We did learn quite a bit, and groups have catalogued all this information. Sorry if this is difficult to see, but the figure lists different tumor types along the bottom, color-coded by the number of coding somatic variants identified in whole genomes. On the left you have colorectal cancers that are microsatellite-instable: tumors with a mutator phenotype, which carry hundreds to thousands of somatic variants, almost off the chart. Moving right, in light pink, you have the carcinogen-induced cancers — smoking-induced lung cancers, UV-induced melanomas — where some external force is driving mutation. Then you have the range common to most tumor types, an average of around 50 to 100 mutations; these are probably associated with just getting older, the age-associated mutations, and so on. And finally, on the far right, the pediatric tumor types, which tend to have fewer somatic mutations, probably because many carry some sort of germline predisposition to developing a tumor and don't require the same mutational load to develop the cancer.

The other thing people started examining was oncogenes versus tumor suppressors, and what emerged is that the pattern of somatic mutations within a gene can really help you determine whether, and how, that gene plays a role in the tumor. For an oncogene, mutations generally fall in specific hotspots: if you look at the pileup of missense mutations in known oncogenes, they tend to fall only at certain positions. Tumor suppressors, by contrast, tend to have mutations spanning the entire coding sequence, and those are more likely to be truncating mutations than missense. This makes sense from the biology: for a gene to activate or drive a tumor, the mutation is probably a missense change at a particular location, altering DNA binding or receptor binding in a way that constitutively turns the gene on; for a tumor suppressor, it's complete loss of the gene's function that promotes the tumor. Famously, Vogelstein's group came up with the "20/20 rule": to call something an oncogene, we want to see that more than 20% of the mutations in the gene are missense changes recurring at particular positions; to call something a tumor suppressor, we want to see that more than 20% of the mutations, anywhere in the gene, are inactivating. You can apply this rule to all the sequencing that's been done — over 3,000 tumors with publicly available data, some 300,000 coding variants — and only 125 genes are predicted to actually be driving cancer: 71 tumor suppressors and 54 oncogenes.
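Here's a sketch of that 20/20 classification in code. The mutation records and the recurrence definition are simplified stand-ins for the real curation:

```python
# Sketch of Vogelstein's 20/20 rule: oncogene if >20% of a gene's mutations
# are missense at recurrent positions; tumor suppressor if >20% are
# inactivating (truncating). Records below are invented for illustration.

from collections import Counter

def classify_gene(mutations):
    """mutations: list of (position, kind) with kind in
    {'missense', 'nonsense', 'frameshift', 'splice', 'synonymous'}."""
    n = len(mutations)
    inactivating = sum(kind in {"nonsense", "frameshift", "splice"}
                       for _, kind in mutations)
    pos_counts = Counter(pos for pos, kind in mutations if kind == "missense")
    recurrent_missense = sum(c for c in pos_counts.values() if c > 1)
    if recurrent_missense / n > 0.20:
        return "oncogene"
    if inactivating / n > 0.20:
        return "tumor suppressor"
    return "unclassified"

# A hotspot pileup (IDH1-like) vs. truncations scattered everywhere:
hotspot = [(132, "missense")] * 8 + [(90, "synonymous")] * 2
scattered = [(i, "nonsense") for i in (25, 110, 240, 301)] + [(50, "missense")]
print(classify_gene(hotspot), classify_gene(scattered))
# oncogene tumor suppressor
```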
Now, the downside, or the less exciting part, is that only about 30% of these were novel; most had already been identified through traditional molecular biology, epidemiology, and so on. So for all the hype given to next-gen sequencing and its promise of uncovering everything, we really only got a third more information than what was already known from older techniques. But nonetheless, we've prevailed, and we're continuing on. I think the search for new driver genes has now largely been exhausted; I'm sure there'll be a rare tumor type here or there where a new gene is uncovered, but for the most part, for common tumor types, I think we know the main players. So the field is starting to investigate other questions, such as tumor heterogeneity and tumor evolution.

This is another paper from the same group that sequenced the first AML tumor. What they did was identify cells that were pre-neoplastic, as well as normal germline cells and, finally, tumor cells from the same individual. They could look at the patterns of somatic variation, plot these over time, and show that as the disease progresses from pre-neoplastic to tumor, it acquires more mutations. Each color indicates a particular clone, and you can see that at the myelodysplastic syndrome stage, 50% of the cells were in one clone, and there were two other populations: about a quarter of the cells appeared to be normal, and another quarter had a different mutational profile. As the disease progressed, it was one clone that acquired additional mutations and led to the full-blown cancer. And even at the tumor stage, they identified probably five molecularly distinct clones based on the somatic variants present.

What was really interesting is that they carried on and started tracking the clonal evolution of these tumors right through treatment and into relapse. You see the same kind of graph, with initial mutations characteristic of AML — DNMT3A, NPM1, FLT3, et cetera — and different clones acquiring more mutations at different frequencies in the full-blown disease. During therapy, most of these clones were reduced or eradicated; the drugs work, they do kill what they're intended to kill. Unfortunately, in this case, one particular clone survived that wasn't fully eradicated by the treatment, acquired additional mutations over time, and relapse occurred as it continued to expand and grow out. That, I think, is one of the powers of being able to interrogate these tumors really deeply and generate so much data: we can ask questions like this that weren't possible previously without some complicated biology.
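In code, tracking clones this way amounts to following clusters of variant allele frequencies (VAFs) across timepoints. This sketch uses invented numbers to show the three trajectories described above:

```python
# Sketch of clone tracking by variant allele frequency (VAF) across
# timepoints, in the spirit of the AML relapse study. A cluster of variants
# moving together suggests one clone; all values below are invented.

samples = ["diagnosis", "post-therapy", "relapse"]
vaf = {
    # founding-clone mutations present throughout the disease course
    "founder": [0.48, 0.05, 0.47],
    # a subclone the therapy failed to eradicate, expanding at relapse
    "resistant_subclone": [0.12, 0.04, 0.45],
    # a subclone that the therapy wiped out entirely
    "eradicated_subclone": [0.20, 0.00, 0.00],
}

for clone, freqs in vaf.items():
    trajectory = " -> ".join(f"{s}:{f:.2f}" for s, f in zip(samples, freqs))
    survived = freqs[1] > 0.01  # still detectable after therapy?
    print(f"{clone:20s} {trajectory}  survived therapy: {survived}")
```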
So the take-home is that cancer is a disease of the genome. The model is that normal cells progressively acquire somatic mutations. Some of those are passenger mutations that have no real consequence for the cell's fate; others provide a selective advantage, and as those cells acquire additional mutations, their growth is driven until you're in a full-blown tumor state. The real challenge is separating, as the expression goes, the wheat from the chaff: identifying which variants are providing the advantage, when you have a heterogeneous mixture of cells, some carrying many more variants than others that may not actually be conferring any advantage at all.

Heterogeneity, I've heard it said, really just means chaos, and that's kind of where we are in cancer genomics right now: studying chaos. It comes in four different forms. Within the tumor itself, you can have multiple subclones, each with different patterns of mutations, some making the tumor more aggressive than others, but probably sharing a common ancestor at some point. You can have variation between a primary tumor and its metastases: as pieces of tumor break off, enter circulation, and end up implanting elsewhere in the body, some cells may preferentially end up in the liver, others in the lung, and some may never be able to metastasize. Between the metastases themselves, the cells continue to evolve and acquire mutations, and may even branch off and seed back to the primary tumor; almost anything is possible in terms of which mutations are acquired, the growth advantage those mutations confer, and where the cells end up in the body. And finally, even between patients with the same tumor type, there can be a tremendous amount of variability in the landscape of mutations present. Those are the challenges that studying chaos gives us.

Another major challenge is sensitivity. As most of you are probably aware, tumors are not 100% tumor cells. When pathologists identify tumor cells under the microscope, there are also other cell types present — normal cells, lymphocytes, fibroblasts, et cetera — and when we take a piece of tissue, extract DNA, and sequence it, we're extracting DNA from all of those cells. So you can imagine that for a variant in a cell population that is only 50% tumor, the variant frequency we observe will be half what we'd see sequencing 100% tumor. On top of that, there's quite a bit of evidence of ploidy issues: the number of genome copies in tumor cells can be dramatically different from normal cells. Between combinations like four genome copies in 20% of cells, versus four copies in 80% of cells with 20% normal contamination, the observed spectrum of somatic mutation frequencies can look quite different depending on where in the tumor you sample. Another difficulty is the accessibility of material. We often deal with biopsies or fine needle aspirates of tumor tissue, and we're really taking a shot in the dark as to whether we'll collect tumor tissue, and enough of it to interrogate.
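The purity and ploidy effects combine in a simple formula for the expected VAF of a heterozygous somatic variant. Here's a worked version; the parameterization is a standard simplification, not any specific tool's model:

```python
# Expected variant allele frequency for a heterozygous somatic SNV as a
# function of tumor purity and local copy number: a worked version of the
# "50% tumor means half the variant frequency" point above.

def expected_vaf(purity, tumor_copies=2, mutated_copies=1, normal_copies=2):
    """Fraction of reads expected to carry the variant at this locus."""
    mutant_alleles = purity * mutated_copies
    total_alleles = purity * tumor_copies + (1 - purity) * normal_copies
    return mutant_alleles / total_alleles

print(expected_vaf(1.0))                  # 0.50: pure diploid tumor
print(expected_vaf(0.5))                  # 0.25: 50% tumor halves the VAF
print(expected_vaf(0.8, tumor_copies=4))  # ~0.22: tetraploid tumor cells
```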
These are actual slides from a study we did where we biopsied a number of people, and you can see that the amount of tumor, highlighted in red, varies quite dramatically from biopsy to biopsy. That impacts your ability to do different types of analyses, both in how much material you get and in what proportion of the tumor you're actually interrogating in the sample.

Another huge resource is formalin-fixed, paraffin-embedded (FFPE) material. It's pretty much standard clinical practice now that when tumors are resected or biopsied for diagnostics, they're fixed and embedded. The nice thing is that these blocks are storable at room temperature; there are warehouses around the world holding millions of blocks dating back decades, and the nucleic acids and the morphology are preserved in them. That's the reason it's done: so that a pathologist can look under a microscope and say, yes, this is tumor material. To take advantage of this vast resource without having to sample everything prospectively, a lot of work has gone into retrieving material from these blocks — although obviously the fixation process and so on affects it.

Yeah — the problem with those is usually that they're not consented for research. And I think that depends on your jurisdiction. In the US, for example, once a patient is deceased, their samples are open and you can do whatever you want with them; in Canada the rules are a little different. So yes, that's a huge caveat. We've had to go back to the families of deceased patients and ask permission to use their blocks before they were destroyed. I think moving forward, the large academic sites at least are consenting all patients so that their material can be used for research purposes down the line. It may not be for a specific project — they may not say exactly what will be done — but patients are consented and samples banked. That's why we have these giant biobank initiatives right now all over North America, Europe, and the world.

So, we're still trying to understand what happens to nucleic acid when it's fixed: what kind of damage occurs and what kinds of artifactual variants are introduced. I wanted to illustrate this with this cartoon. The red and blue lines indicate paired sequencing reads, forward and reverse, and the little colored dots are variants in those reads relative to the reference genome they're aligned to. What we have to do is look at the behavior of these variants — is a variant only occurring on forward reads, or only on reverse reads? — and try to tease apart what's real from what's artifact. Often people use things like the quality of the read alignment or the variant frequency to filter. Sometimes reads misalign, and you get patterns where a couple of variants sit really close together in one read; that's more likely an alignment error, with those reads not actually belonging to that part of the reference. And we see lots of biases where a variant is only called on reads from one strand: the strand bias error.
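A common way to screen for that strand bias is a Fisher's exact test on the 2x2 table of allele by strand. Here's a minimal sketch; the p-value cutoff is an illustrative choice, not a recommendation:

```python
# Sketch of a strand-bias check: compare variant-supporting reads on the
# forward vs reverse strand against reference-supporting reads using
# Fisher's exact test. The cutoff below is illustrative.

from scipy.stats import fisher_exact

def strand_bias(ref_fwd, ref_rev, alt_fwd, alt_rev, p_cutoff=0.01):
    """Return (p_value, flagged) for the 2x2 allele-by-strand table."""
    _, p = fisher_exact([[ref_fwd, ref_rev], [alt_fwd, alt_rev]])
    return p, p < p_cutoff

# Variant seen on both strands -- looks fine:
print(strand_bias(ref_fwd=40, ref_rev=38, alt_fwd=21, alt_rev=19))
# Variant seen only on the forward strand -- flagged as likely artifact:
print(strand_bias(ref_fwd=40, ref_rev=38, alt_fwd=25, alt_rev=0))
```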
We also get artifacts from the amplification process. And when dealing with tumors that have low tumor content, we'll often see some variants at 50 or 60% and others at 10 or 20% — is that just a consequence of having sequenced only 20% tumor material, or is it some sort of artifact? Those are the kinds of issues, and I'm sure quality control of the data and variant calling will be touched on throughout the week. I threw this next slide in because it's a personal interest of mine, something I work on: methods applied up front, before sequencing takes place, to remove some of these errors and artifacts. This method, Safe-SeqS, was first described in a PNAS paper a couple of years ago. Random barcodes are incorporated into the sequencing adapters before they're ligated on and the fragments are clonally amplified, so every read carries a barcode identifying its original template molecule. If a variant is only ever seen associated with one barcode, you can be pretty sure it's just an amplification error; you really want to see a variant across a spectrum of barcodes to feel confident that it's real.
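A minimal sketch of that barcoding logic: group reads by barcode (UMI) and require the variant to appear in several distinct barcode families. The family-consensus and family-count thresholds are illustrative, not the paper's exact parameters:

```python
# Sketch of molecular-barcode error suppression: a barcode 'family' is all
# reads sharing one template barcode. A variant supported by many reads but
# only one family is likely a PCR error. Thresholds are illustrative.

from collections import defaultdict

def barcode_support(reads, min_families=3, consensus=0.9):
    """reads: list of (barcode, has_variant). A family supports the variant
    if most of its reads carry it (i.e. it was on the original template)."""
    families = defaultdict(list)
    for barcode, has_variant in reads:
        families[barcode].append(has_variant)
    supporting = sum(1 for obs in families.values()
                     if sum(obs) / len(obs) >= consensus)
    return supporting, supporting >= min_families

# Variant in many reads but a single barcode family: amplification error
pcr_error = [("AAT", True)] * 50 + [("CGA", False)] * 20 + [("TTG", False)] * 20
print(barcode_support(pcr_error))   # (1, False)

# Variant seen consistently across several families: likely real
real = [(bc, True) for bc in ("AAT", "CGA", "TTG") for _ in range(10)]
print(barcode_support(real))        # (3, True)
```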
Another thing a lot of groups are working on is ways to enrich for tumor material, reducing the purity problem so that we can look for variants at 50%. From FFPE or fresh-frozen material, you can look at sections under a microscope and use a laser to cut out regions of tumor cells, removing some of that background normal tissue. As you can imagine, it's quite time-intensive to sit at a microscope, identify regions of tumor, circle them on the computer screen, and have the laser cut them out; depending on the cellularity of the tumor, it may take several sections to get enough material to work with. Other groups — and this is something I'm a little familiar with — take fresh material, dissociate the tumors, and use flow cytometry to sort out different cells. This relies on things like cell surface markers that can be tagged with fluorescently labeled antibodies, so you can separate your tumor into different components, different cell types, and process them separately.

But how often do you get fresh samples? It depends. We have a project ongoing right now with a surgery group where the samples come right out of the OR: half goes to pathology for diagnosis and the other half comes to us that day, shipped in media, and we dissociate it the same day. As new research projects realize the benefits of fresh versus archival material, protocols like that will get incorporated. But obviously we're not talking about hundreds of samples; it's more like one a week.

Yeah — no, that's a really good question. The question was about specificity when you're using surrogate markers for tumor. Generally, for epithelial tumors, we use epithelial markers like EpCAM to differentiate tumor from normal. But there's nothing to guarantee that you're not pulling out normal epithelium, or that your tumor hasn't undergone EMT and no longer carries those cell surface receptors. That's why, in our experience, we keep everything: we sequence all of the fractions separately and look for variants in the other fractions that are similar to what we're seeing in the tumor fraction. I know other groups sort based on ploidy: you can take just nuclei and sort them on DNA content. If you assume that most of your tumor has undergone major chromosomal loss or gain, tumor nuclei will have more or less DNA than normal cells, which have two genome copies. But again, you can't guarantee that you don't have tumor cells that are near-diploid as well. So these methods work, and there's a lot of literature on them, but they're by no means 100%.

Yeah — so the freezing process destroys a lot of the cell surface receptors, so for frozen samples you have to come up with other methods. There are newer approaches to sorting on intracellular markers, where you essentially bathe the cells in fluorophores that get taken up; those are a little more novel. More common is to extract nuclei from the cells that are still intact and sort on ploidy. You'll get batches of nuclei at 2N, at 4N, and in between; it's a way of sampling different populations of cells, with no guarantee that your 4N fraction is all tumor or that your 2N fraction is all normal.

On pathologist estimates: I think we've found it varies by pathologist. We actually did a blinded study comparing pathologists' estimates of tumor content against what we see from variant frequencies in the sequencing data, and some perform really well and some not so well. In the work we did in pancreatic cancer, we won't sequence anything a pathologist calls lower than 20% tumor without some sort of enrichment method. Sometimes they say it's lower than 20%, we sequence it, and it turns out to be higher; but for the most part, when they say lower than 20%, it's usually very low. The laser capture method is one way around that, but again, you need an expert with a histology background who can sit at a microscope, take a slide marked up by a pathologist, then go to other slides from the same sample and identify the same regions. And tumors are not uniform; every slide is not the same — ductal structures may differ and so on — so as you move through a tumor, the regions you're capturing are not going to be identical.

Sorry — yeah. Fresh blood? Yeah, definitely. Do you need to take the tissue fresh, or can you take the OCT? It depends on what you want to do. For our fresh samples, yeah, they go straight into OCT, they get shipped here right away, and they get dissociated the same day. Then we viably freeze the dissociated cells, which preserves their cell surface structure so that we can use a sorting method later.
For fresh-frozen samples — I mean, they can be frozen right away, or take paraffin-embedded samples, for example: sections can be cut, slides generated, and the laser capture done on them. And for the microdissected or OCT tissue, is there any use for discovery with these samples? Well, I think it depends on how you think things will be used clinically. Obviously, as standard of care, all samples are going to go into FFPE. So if you want to come up with a diagnostic or prognostic marker, it'd be simpler to incorporate it into the current standard of care. That might not be possible — your marker may be altered by fixation or something — and so you might require a fresh biopsy from the patient that doesn't go into formalin. But I think for most of the biomarker work done now, they do the discovery and then try to validate in FFPE samples, so that it's easier to incorporate into standard of care.

The quality is very high — at least we've found so far. It's the abundance, the number of cells you get from each fraction, that can be quite low. So some sort of amplification is needed, and the biases in RNA amplification can be quite dramatic, so you tend not to trust those results as much. I think — it's too early to say, really — but we get really high quality RNA; we just don't get a lot of it. Does laser capture give you the same? For laser capture, we haven't done as much RNA work, so I can't quite comment, but the abundance issue would be the same: we're not getting millions of cells, it's more like thousands. Yeah, that's a good point; I don't know, I'm sure — I mean, the cells aren't undergoing active transcription at that point, so it would probably be degradation more than anything else.

This is slightly off topic, but I wanted to ask: with a mixed population, tumor cells versus normal cells, will population data help in identifying the SNPs? Because there's a lot of variation across populations, and I don't know how much data is out there to say that a given SNP is normal in a population. Yeah — I mean, most tumor projects try to sequence a corresponding normal sample from the individual to distinguish tumor from normal. And you can imagine that if you have different populations even within a tumor, you can start doing pairwise or multivariate comparisons — this fraction versus normal, this fraction versus another tumor fraction — to tease apart what's going on within both the tumor and the individual. Yeah, sometimes it's not possible, because matched normal samples aren't available. That's true, and I believe Sohrab is going to talk about calling somatic variation when you don't have a normal, and how you can try to filter based on publicly available databases or other algorithms; there's a lot of work going on there right now. Anything else? How are we doing on time? Oh, okay.
So, moving forward, given the challenges of cancer genomics and the development of the technology, large consortium efforts — internationally the ICGC, and the TCGA, the largely American-based effort — have gone on to characterize the somatic landscape across many different tumor types. The rationale is pretty straightforward: this is a huge problem, so it's better to coordinate resources than to have everybody reproduce the same work. If we produce data under standardized, uniform metrics, it makes things like merging datasets to increase power much easier. And having many groups around the world, from different countries and regions, participating simultaneously spreads the knowledge and accelerates the dissemination of information. But you can imagine this is a massive effort: just looking at 50 tumor types, with 500 cases and 500 controls each, that's 50,000 human genome projects. The scale is enormous.

This map is a little dated, because a couple of projects are missing from it, but it's the ICGC map of currently ongoing projects. The consortium meeting actually just took place last week in Beijing, where all of these groups met to share their data and progress. You can see there are many different tumor types, at least 39, across around 13 countries, with over 18,000 tumors part of this effort. You can go to icgc.org for more information on specific projects and specific tumor types. The somatic data is publicly available; germline data requires authorized access, but it's all accessible through the ICGC data portal. Just a week ago the 16th data release was made, with data available for 49 tumor types, including DNA sequencing data and RNA sequencing and expression data. Okay, so Francis is going to go over that, and there are other resources I'm sure he'll talk about too. This one's out of Sloan Kettering, where they've compiled all of this data and made it freely available, which is great for people who want to look at it.

What this has really led to is groups looking at all of this data in two ways. You can combine multiple types of data for a given tumor type, which I'll touch on a little, but you can also look at an individual data type across many different tumor types — these are the pan-cancer analysis projects. The first was led by the TCGA in 2013, mostly exome data on 12 tumor types, along with copy number, clinical, and expression data; and there's one ongoing now, led primarily by the ICGC, looking at whole genome data on thousands of tumors.

One of the interesting things that came out of looking across multiple tumor types was work from the Sanger, late last year I guess, examining the trinucleotide sequence context of somatic variants across all of these tumors. For every variant — the six substitution classes are listed across the top — they looked at one base upstream and one base downstream, giving 96 possible trinucleotide contexts for every variant type. They counted those and normalized them across tumor types, and once you have that matrix of 96 contexts by thousands of tumors, you can use non-negative matrix factorization to identify patterns within it.
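Schematically, the analysis looks like this sketch. The counts are random stand-ins, and a real analysis uses thousands of tumors plus model selection to choose the number of signatures:

```python
# Sketch of mutational signature extraction: build a 96-context count matrix
# (contexts x tumors) and factorize it with NMF. Counts here are random
# placeholders; the number of components is an illustrative choice.

import numpy as np
from sklearn.decomposition import NMF

BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
# 6 substitution classes x 4 upstream x 4 downstream bases = 96 contexts
CONTEXTS = [f"{up}[{sub}]{down}"
            for sub in SUBSTITUTIONS for up in BASES for down in BASES]
assert len(CONTEXTS) == 96

rng = np.random.default_rng(0)
counts = rng.poisson(5, size=(96, 30))       # 96 contexts x 30 tumors

model = NMF(n_components=4, init="nndsvda", max_iter=500)
signatures = model.fit_transform(counts)     # 96 x 4: the signatures
exposures = model.components_                # 4 x 30: per-tumor weights
print(signatures.shape, exposures.shape)
```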
What they quickly identified — they came up with 21 patterns, or signatures, of somatic variation in the 3,000 or so tumors they looked at, and they could relate these patterns to different exposures, as they call them. If you look at their signatures, the first ones, signatures 1A and 1B, are associated with age at onset of the disease; these mutations probably represent your typical cancer, progressively acquiring mutations over time until a driver mutation eventually gives rise to the disease. But you can see there are other patterns corresponding to other exposures: smoking in lung cancer, UV in melanomas, DNA mismatch repair defects with their mutator phenotypes; people with BRCA1 or BRCA2 germline mutations tend to have this flat pattern with a lot of indels, and so on. But for about half of these signatures, they have no idea what's causing them. Maybe it's purely mathematical — you can identify a pattern that isn't real — but given that about half of them do correspond to actual exposures, it begs the question: what's causing the other types? I think that was the first big thing to come out of simply combining somatic SNV data from many tumor types.

Other groups — this is actually another group, also at the Sanger — started comparing copy number and structural variant data within a single tumor type. What they noticed is that the copy number, shown as these black dots between one and three here (this is actually SNP array data), when combined with the breakpoints from the structural variant data, shows evidence of rearrangements — head-to-tail and tail-to-tail inversions — at the same breakpoints as the copy number changes. They then went on to show, using simulation studies, that for this pattern of structural rearrangement to arise sequentially, it is statistically, significantly unlikely to see this pattern of copy number, where you really only see oscillation between two states as you move along the chromosome. If this had happened in a sequential manner — a genome doubling event followed by an inversion followed by some sort of deletion, and so on — the variation in copy number would be much more extensive. So what they proposed is a catastrophic model of chromosome shattering: at a single point in time, some event causes the chromosome to break apart, and then the cell's natural repair machinery, through non-homologous end joining, stitches the fragments back together more or less willy-nilly. Some regions become lost, others may become amplified, and provided the necessary components of the chromosome are still there — there probably still has to be some sort of centromeric region — it will continue to propagate, and could actually provide a selective advantage to the cell.
Other groups, and this is actually another group, also out of the Sanger, started comparing copy number and structural variant data from a single tumor type, for example. What they noticed is that the copy number, shown as these black dots between one and three here (this is actually from SNP array data), when combined with the breakpoints of the copy number segments, shows evidence for structural rearrangements, head-to-tail and tail-to-tail inversions, at those same breakpoints. They then go on to show, using simulation studies, that it is statistically very unlikely to get this pattern of copy number, where you really only see this two-state jumping as you move along the chromosome, if the rearrangements had happened in a sequential manner: a genome doubling event followed by an inversion followed by some sort of deletion and so on would produce much more extensive variation in copy number. And so what they've proposed is this catastrophic model of chromosome shattering, where at a single point in time some event causes the chromosome to break apart, and then, through the cell's natural repair machinery, you get non-homologous end joining of these fragments, kind of willy-nilly across the genome. Some regions are lost, others may be amplified, and provided that the necessary components of the chromosome are still there (there probably still has to be some sort of centromeric region), the chromosome will continue to propagate and could actually provide a selective advantage to the cell.

So it kind of begs the question of whether the usual model, where the mutation rate is fixed in a given cell type and you progressively acquire mutations over time until you finally hit the one that gives rise to full-blown cancer, always holds. Perhaps you can have that progressive accumulation of mutations, but in some cases a catastrophic event like this is the final blow that drives the cell over the edge. Punctuated evolution is the term now used to describe this: progressive evolution of the tumor until, at one point in time, it all of a sudden explodes. Yeah?

So the question was, how do you determine whether this is the major driving force behind a tumor? I think that's the work that's still needed, to functionally validate that this is what's causing it. You definitely don't see this in all tumor types, and even within a tumor type you don't see it in every sample. You can validate that it has happened: you can use FISH and SKY to look at the chromosome structure and see that, yes, these chromosomes have been put back together. But to functionally demonstrate that this is the driving event, I think, remains to be seen. When people used to do radiation hybrid maps of genomes, pre-sequencing, one of the things they did was take a chromosome of interest and apply radiation to shatter it, to see what components have to be present for the cell to continue to live. So you can imagine developing models where you artificially introduce this kind of shattering to see whether it drives tumorigenesis in some sort of animal or in vitro model. But right now it's somewhat speculative. I mean, we can identify chromothripsis from high-throughput sequencing data, but to actually say that it's what's causing a given tumor requires further work.

Oh, I mean, how we look for it is quite straightforward: you just combine the copy number data and the structural data. You're gonna talk later in the week about calling copy number variation, calling copy number segments, calling structural rearrangements and so on. In the paper they published, they actually come up with a metric of number of breakpoints versus number of copy number states. They compute that across the different chromosomes, and if it's above a certain threshold, I forget what the number is, they call that chromosome a candidate for chromothripsis, or chromoplexy, or some major catastrophic event. And then it involves almost a manual curation step; there's still a lot of that involved, checking that, yes, we do see this distinctive two-state copy number pattern with these massive rearrangements associated with it. That's the state of how you do it. And of course, if you have material left, you can go and validate it using FISH or other cytogenetic methods to confirm that it is the case.
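As a rough illustration of that breakpoints-versus-states screen, here's a toy version. The thresholds are placeholders (the speaker doesn't recall the published cutoff), and a real pipeline would also cross-check candidates against the structural variant calls before the manual review step.

```python
# Toy screen for chromothripsis candidates: flag chromosomes with many
# breakpoints but few distinct copy-number states. Thresholds are
# illustrative, not the published cutoffs.
from collections import defaultdict

def chromothripsis_candidates(segments, min_breakpoints=10, max_states=3):
    """segments: (chrom, start, end, copy_number) tuples from a
    segmentation caller. Returns chromosomes worth manual curation."""
    by_chrom = defaultdict(list)
    for chrom, start, end, cn in segments:
        by_chrom[chrom].append((start, end, cn))

    candidates = []
    for chrom, segs in by_chrom.items():
        segs.sort()
        breakpoints = len(segs) - 1          # junctions between segments
        states = len({cn for _, _, cn in segs})
        # hallmark: copy number oscillating between a small number of
        # states despite a large number of rearrangement junctions
        if breakpoints >= min_breakpoints and states <= max_states:
            candidates.append((chrom, breakpoints, states))
    return candidates
```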
All right, how are we doing on time? So another interesting aspect is the heterogeneity of tumors, and what groups have done now is take multiple samples from the same tumor. In this case, this is a renal carcinoma that the group cut into six different pieces; they were able to take cells from each of those six pieces and profile them with next-gen sequencing. And what they found were distinct copy number profiles, even within individual cells, from the different geographical locations of the same tumor. From these profiles you can then do hierarchical clustering to show which cells are related to each other and which are more divergent, and come up with an evolution of tumor progression within a single sample. In this case, as I mentioned earlier, they flow-sorted these cells based on ploidy, and you can see that in some populations you have a small fraction of cells at different ploidies, in others you have fractions at a much higher ploidy, and so on. And with the advances in the technology, we can now take a single cell, extract the DNA, and sequence that single cell to identify copy number. Single-base resolution from a single cell is probably still a little ways off, but looking at bins of read counts across large regions of a genome is quite possible from even a single cell.
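To give a feel for what binned read counts look like in practice, here's a minimal sketch using pysam and scipy. The bin size, the file names, and the simple median normalization are assumptions for illustration; real single-cell pipelines also correct for GC content and mappability, which is omitted here.

```python
# Sketch of bin-count copy-number profiling plus clustering of the
# resulting profiles. Assumes coordinate-sorted, indexed BAM files.
import numpy as np
import pysam
from scipy.cluster.hierarchy import dendrogram, linkage

BIN_SIZE = 500_000  # large bins compensate for sparse per-cell coverage

def bin_counts(bam_path):
    """Read counts per fixed-size genomic bin, normalized to the sample
    median so profiles are comparable across cells or sectors."""
    counts = []
    with pysam.AlignmentFile(bam_path) as bam:
        for chrom, length in zip(bam.references, bam.lengths):
            for start in range(0, length, BIN_SIZE):
                counts.append(bam.count(chrom, start, min(start + BIN_SIZE, length)))
    counts = np.asarray(counts, dtype=float)
    return counts / np.median(counts[counts > 0])

# One profile per sector (or per cell), then hierarchical clustering to
# reconstruct how the sampled regions relate to one another.
samples = {"sector1": "s1.bam", "sector2": "s2.bam", "sector3": "s3.bam"}
profiles = np.vstack([bin_counts(path) for path in samples.values()])
tree = linkage(profiles, method="average", metric="euclidean")
dendrogram(tree, labels=list(samples))  # rendering requires matplotlib
```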
And if we can do this with individual tumor cells, the idea is that, well, tumors are constantly shedding material into the bloodstream, so we could probably capture that material and interrogate it as well. A lot of work is being done on methods for capturing circulating tumor cells, which can then be sequenced or analyzed in some way. But tumors, sometimes as they rupture, also shed free tumor DNA. So just by isolating DNA from a patient's serum, you can perform analyses on that bulk DNA sample and identify somatic variations. And there are obviously a lot of applications people have thought about for this: early detection, since even before a tumor is visible on MRI, for example, it might be shedding material that could be screened for in circulation; or tracking resistance, progression, and minimal residual disease. Leukemias, for example, could possibly be tracked by isolating this material from the circulation.

So that brings us to the clinical applications of cancer genomics. You have to start by thinking about the current treatment paradigm, which is, for the most part, based on the physical location of the tumor, followed by histopathology. There are several molecular tests that are part of standard of care right now and are conducted in clinical molecular labs, but for the most part these are single-variant analyses: a BRAF mutation in melanoma, KRAS in colorectal, or known driver fusion genes in leukemias, for example. So that's the current state. And the goal, I think, with all of this work, and the reason we're here, is to eventually incorporate the data we're generating to change some of the standard of care and bring genomics into routine practice. And so we get these N of 1 studies published from time to time that keep fueling the justification for why we're doing this. This example is an extreme outlier on an everolimus clinical trial who had a profound response, with over two years of progression-free survival at follow-up, while the majority of patients on that trial did not respond. And so they sequenced the genome of that patient, and they identified a variant in TSC1. They applied their filtering magic to get down from the tens of thousands of variants they identified to one in a gene that was interesting. And if you then go back and look at that gene in other patients on the trial, sure enough, other patients that had milder responses also had variants in the TSC pathway, while those that were wild type for the gene did the worst. So drug response here is definitively influenced by the tumor genome.

The other thing that comes up, as we look at the prevalence of somatic mutations across many different tumor types, is that while certain mutations are more common in particular cancers (BRAF V600E in melanoma, for example, is very common), we also see those same variants across other tumor types. And so the idea could be, well, why are we diagnosing tumors based on their histology or where they're located when we could come up with a molecular characterization of the tumor instead? Now, the caveat is that BRAF-mutant colorectal cancers do not respond to BRAF inhibitors the way BRAF-mutant melanomas do. But the idea is that other drugs or other targeted agents might still have an effect. And in the current drug development cycle there are a number of new and developing agents targeted at specific somatic alterations. At the mutation level, like I mentioned, a lot of the tyrosine kinase inhibitors are targeted specifically at particular driver mutations, and there are also antibodies for patients whose tumors contain particular translocations, amplifications, et cetera. So the idea is that if we can profile the genomes of these tumors, then as more and more drugs become available, there will be more and more opportunity to treat based on the molecular profile rather than simply on the histology.

And so this is an example of a clinical workflow. This was a project we undertook here. We are not a clinical diagnostic lab; we are a research sequencing center. But we wanted to pair up with the clinical lab and perform DNA sequencing in such a way that a patient could come into a hospital, undergo some sort of biopsy, have it reviewed by a pathologist to confirm the diagnosis, and then have that sample transferred to the diagnostic lab as part of standard of care. Blood and tumor DNA were extracted, and some genotyping was done in their lab, but we received a sample in our lab and were able to sequence it and identify somatic variants. These variants, coming from a research lab, obviously cannot be reported back directly to the patient, so there was a validation process within the clinical lab, on the sample that never left the clinical lab. And this is the process most groups are undertaking: tumor boards or panels meet to discuss findings on a case-by-case basis, and a report is generated and transferred back to the clinician. So that's the state of things: in a lot of the clinical sequencing efforts, even where the research and diagnostic labs are becoming fused into one, the outline of the process remains about the same.
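The "filtering magic" step isn't spelled out in the talk, but a generic triage pass over annotated somatic calls looks something like this sketch. The gene list, the field names, and the cutoffs are hypothetical placeholders, not this group's actual clinical pipeline.

```python
# Generic triage of annotated somatic SNVs down to a clinically
# interesting shortlist. Gene set and thresholds are illustrative only.
ACTIONABLE_GENES = {"BRAF", "KRAS", "EGFR", "TSC1", "PIK3CA"}  # example set

def triage(variants, min_tumor_vaf=0.10, max_population_af=0.001):
    """variants: dicts with gene, tumor_vaf, normal_vaf, population_af,
    and effect fields, as an annotator would provide them."""
    shortlist = []
    for v in variants:
        if v["normal_vaf"] > 0.02:                  # likely germline, not somatic
            continue
        if v["tumor_vaf"] < min_tumor_vaf:          # likely noise or very subclonal
            continue
        if v["population_af"] > max_population_af:  # common polymorphism
            continue
        if v["effect"] not in {"missense", "nonsense", "frameshift", "splice"}:
            continue                                # keep protein-altering calls
        if v["gene"] in ACTIONABLE_GENES:
            shortlist.append(v)
    return shortlist
```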
And the sad part of this was that we only identified variants that we determined to be in interesting genes in about 30% of the patients we looked at. Now, we were only looking at single nucleotide somatic variants, so obviously if we had also looked at copy number and structural variants, that number would increase. But of the variants we found in interesting genes, only half were actually considered clinically interesting, so we're talking about 15% of patients having something we felt was worthwhile to report back. And whether that information was actually used is another question; probably only 50% of those results were used clinically, so on the order of 7 or 8% of all patients. But nonetheless, this effort moves forward. Other groups, such as Arul Chinnaiyan's group at Michigan, are making similar efforts. They're looking at a broader spectrum of mutations and a bigger fraction of the genome, at both DNA and RNA, to try to create a global landscape of somatic alterations for each patient. They meet with a board that then tries to report the results back within about a one-month window. And that's the current paradigm in clinical cancer sequencing.

But it's pretty simple to imagine a scenario where, for every sample that comes into a hospital, a blood sample and a piece of the tumor are taken, DNA and RNA are extracted, and the sample undergoes some broad characterization. These profiles are fed into analysis algorithms built on top of clinical decision support that's based on clinical knowledge, and the system keeps feeding itself as we get response data for these patients. As it grows with more and more samples, more and more data, and more and more clinical outcomes, we would eventually have a system that could, in theory, be used to change the standard of care in cancer medicine. This is probably daydreaming right now, but I think the tools all exist; it's just a matter of getting the patients and making a coordinated effort to do it.

Then you have to think about things like privacy and security. People that don't want their genetic information freely available wanna make sure it stays within their medical records, that it's secure, and that it's not going to hurt them. Obviously, when you're looking at the genome of a patient, things are gonna come up that are unrelated to the question you were asking in the first place; germline predisposition to other diseases could be identified, and some patients wanna know about that information while others don't. Do we have an obligation to report this incidental information back to the individual? And analyzing this vast amount of data is not something that can be done on a laptop within a clinical lab; it's gonna require bioinformatics support and compute resources to handle all of this, if it ever becomes something that's done for every patient.

So I guess I'll close. I think I've covered quite a bit. I know there are some things I've probably missed or didn't cover as well, but feel free to ask anything that you want. Basically, the technology's there and we're generating tons of data, but as you're gonna hear throughout the week, improvement is needed across the board. Everything from the human reference genome that we currently use to align to and make calls from needs to be improved, and a recent iteration was just released that addresses a lot of these things. The algorithms being developed to differentiate somatic versus germline variants are also constantly being improved.
We went from the days when we used the same algorithms to call variants in cancer as in diabetes or heart disease to now having cancer-specific algorithms. But obviously the hindrances I mentioned that are specific to tumor genomes, heterogeneity, cellularity, and so on, are gonna affect how you call variants. And finally, as we call more and more of these things, the functional annotation of these variants is gonna become key: being able to distinguish driver versus passenger mutations, or mutations that arise earlier versus later in the tumor, the timing of mutations, is gonna become important. And if this is ever to go into a clinical setting, we'll need to translate all of that information in some sort of automated way, so that we don't have to constantly meet with groups of people and talk through thousands of variants on a case-by-case basis. And so with that, I believe we're on break now. So I think the next session starts at 11 o'clock.