Good morning, everyone. My name is Trevor Pugh. I'm from the Princess Margaret Cancer Centre in Toronto. I've been tasked with the flyover of cancer genomics in general, so I'm going to strip this right back to absolute basics. What is the cell? What's DNA? And then move pretty graphically to next-generation sequencing and how we use these techniques to understand the cancer genome. I am a PI at Princess Margaret. I run a cancer genomics lab really focused on translation and application of next-gen sequencing: not just reading out the genome, but also making sense of it, interpreting it, and linking a lot of what we see in the cancer genome across cancers to treatment. The way I've structured this talk is: what is cancer? What is DNA? Then moving into next-gen sequencing. And the last third of the talk, probably, is a published case report. You can read the paper, but we'll go through this N-of-1 example of how a group in British Columbia at the time used whole genome sequencing and RNA sequencing to make a treatment decision, and what happened to that patient over time. A little bit about myself: I'm from Vancouver originally. I did my undergrad and PhD at the BC Cancer Agency, and my postdoc at the Broad Institute and Dana-Farber, much more in the computational biology space. I did a clinical laboratory fellowship while I was in Boston as well. Coming to Princess Margaret, I sit between those two worlds. Part of it is in the traditional research space; the other part is thinking about how to apply genomics in the pathology department and in the translational genomics lab. So the whole point of this module is to understand how cancer genomes are different from normal genomes, and to understand how we can use bioinformatics to measure those differences.
Thinking about different types of genomic assays and how they can read out and inform different types of genome variation. And, as I alluded to, how we can take what is essentially a list of differences and assign those clinical relevance and actionability. So I said that I was going to start really simple. We are all made of cells. Cancer arises from a cell. If you have a cell, you can have a cancer of that cell. And really, this is the guiding principle of where cancer starts. Regardless of the cancer type, regardless of the original cell type, the common theme is that cancer reuses more or less the same molecular pathways regardless of what cell it is derived from. All cells have DNA, or most cells have DNA. I just want to show my most personal slide, which is my human genome. This is what one of my blood cells looks like if you drop it onto a slide. So these are my chromosomes. If you drop a cell onto a slide, you get this smattering of chromosomes. And in the old-fashioned days, you would take a picture like this and literally, with scissors and tape, make a paste-up that looks like this, where you put the chromosomes side by side by eye. You read these banding patterns and look for differences, again by eye down the microscope, looking for banding patterns that happen to be different. I wanted to show this as a benchmark. This is where normal cells are coming from. This is where cancer cells are coming from as well. A minor change, or sometimes a major change, in these genomes can result in a shift from a normal healthy genome here to a cancer genome. So here's a karyotype from a brain cancer. You can tell just by eye that it's completely different. There are way more chromosomes there than there should be. Some chromosomes are missing parts. Pieces of one chromosome are being attached to another chromosome.
This is really where next-generation sequencing shines, because now we can start to read out all these different types of cancer genome variation, not just at the microscopic level, but really at the molecular level. This is actually a quote from my postdoc supervisor, a mathematician. He liked this idea that all happy cells are alike, but each unhappy cell is unhappy in its own way. This is a plot from Circos.ca, which is Martin Krzywinski's site from the BC Cancer Agency. This is actually not showing a cancer genome. It is showing all the ways that a genome can be modified. These arcs are showing the regions of the genome that are related to each other: they have a similar sequence. If you zoom in on just one little segment, each of these tracks refers to one of the different ways that a cancer genome, or any genome, can be different from another. Point mutations are the most famous: single base pair changes. Copy number alterations are wholesale changes of thousands, hundreds of thousands, or millions of bases of the genome being gained or lost. Structural rearrangements take a piece from one chromosome and attach it to another. Of course, within all of these are the genes and the elements that regulate them, the regulatory elements, and the pathways those genes encode, and there's a lot of redundancy in the human genome. Then there are these homologous sequences and these epigenetic modifications: really layer upon layer of regulatory control on top of the DNA sequence itself. The other lesson with cancer is that you don't just get this boom of massive aneuploidy or copy number or mutational changes all at once; rather, cancer is thought to accumulate mutations over time.
And really it's not until the acquisition of a driver mutation that things really start to take off, where you start to get this highly unstable genome, this highly unstable cell, that is then prone to acquiring additional mutations such as copy number alterations, with some of those actually being driven by therapy itself. This model, which is from a great paper by Mike Stratton, is really that the number of mutations accumulates over time until you acquire what have now been termed drivers. And still one of the central challenges in cancer genome research is to differentiate a driver mutation from the inevitable set of passenger mutations that accumulate over time but are not thought to have biological function. I mentioned that external forces like drug treatment can select for specific clones. I have a slide later on about this idea of clonal selection within a complex cell population. But before I get there, I wanted to talk a little bit about mutation burden, this idea of the number of mutations in each cell. This analysis here from the Broad Institute looks at the number of mutations across all cancer types, and the general trend is that pediatric cancers and hematological malignancies have a relatively low mutation rate. The cancers associated with environmental exposure, such as smoking and sun exposure, have the higher mutation rates. I'm not so sure about hamburgers, but in general higher mutation rates are on the right-hand side of the slide. But really it's not a hard and fast rule. At the population level, yes, these cancer types have low mutation rates. But I've highlighted two anecdotal exceptions to the rule. The neuroblastoma one is actually from work I did as a postdoc: this mysterious case with an amazing number of mutations, actually just as many mutations as a late-stage lung cancer.
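The idea of mutation burden, and of a single tumor sitting far outside the range of its own cancer type, can be sketched in a few lines. This is only an illustration under stated assumptions: the sample names, the mutation counts, the 30 Mb callable territory and the ten-fold cutoff are all hypothetical and are not taken from the study on the slide.

```python
from statistics import median


def mutations_per_mb(mutation_count, callable_bases):
    """Somatic mutations per megabase of sequence that could be assessed."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return mutation_count / (callable_bases / 1_000_000)


def flag_outliers(burdens, fold=10.0):
    """Return sample IDs whose burden is at least `fold` times the cohort median.

    The fold cutoff is an arbitrary illustrative choice, not a published threshold.
    """
    cohort_median = median(burdens.values())
    if cohort_median <= 0:
        return []
    return sorted(s for s, b in burdens.items() if b >= fold * cohort_median)


# Hypothetical cohort: somatic mutation counts over a 30 Mb callable exome.
CALLABLE_BASES = 30_000_000
counts = {"NB-01": 12, "NB-02": 18, "NB-03": 15, "NB-04": 900}

burdens = {s: mutations_per_mb(n, CALLABLE_BASES) for s, n in counts.items()}
outliers = flag_outliers(burdens)

for sample, burden in burdens.items():
    print(f"{sample}: {burden:.2f} mutations/Mb")
print("Outliers:", outliers)
```

A tumor flagged this way is only a prompt to go looking for a molecular explanation, such as a DNA repair defect; the count by itself says nothing about the cause.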
This was actually a child with neuroblastoma, and in this case they were the only patient with two hits in a DNA repair gene. They had a mutation of one copy of MLH1 and a deletion of the other copy. This cancer cell had lost its ability to repair its DNA. So in this case we had a nice molecular reason for what would be expected to be a low-mutation-rate cancer to have a very, very high mutation rate. The exact opposite story is here in melanoma: this is actually the only melanoma in that study that did not have sun exposure. It's a tumor on the bottom of a patient's foot, just extraordinarily bad luck. They had a cancer of the same cell of origin as other traditional melanomas, but again, very different at the molecular level. Even though down the microscope the cells looked like melanoma, it was really different at a fundamental level. This very famous slide has been updated several times: this idea of the hallmarks of cancer. The idea is to take the biological themes or biological processes that are active in cancer and assign them to relatively few distinct characteristics. This circle has grown quite a bit since the original Hanahan and Weinberg hallmarks of cancer paper. But really this is the idea of tumor cells acquiring abnormal activities using pathways that are already active in normal cells. Cancer is dysregulation of normal cell processes, often turning on pathways in a cell that they otherwise should not be active in: driving sustained proliferative signaling, evading growth suppressors, et cetera, et cetera. Really this idea of just reusing pathways that already exist, and that oncogenic somatic mutations target a core set of these biological functions. You'll hear me use a couple of terms. One is driver mutations, or oncogenes. These are really thought to be the activating mutations.
Really, the mutations, copy number alterations and rearrangements that drive or activate pathways that shouldn't be expressed. And the other category is tumor suppressor genes. These are the cell's brakes. These are trying to turn off pathways, and these are the genes that are commonly deleted or have loss-of-function mutations. The whole point of next-generation sequencing is that you will read out all of these classes of cancer genome variation that then result in a cancer phenotype. And this is still a challenge, especially in tumors like melanoma and lung cancer. These tumors have thousands of mutations, and the real challenge is to home in on specifically which mutations drive these tumors. And, to treat patients, what can we do about those? How do we actually act upon knowledge of activated drivers and loss of tumor suppressors? That brings me to the area I like to work on on a daily basis, which is how we bring targeted therapies and immunotherapies to bear on specific tumor types. I've shown two of the most famous examples. In this case, our example is lung adenocarcinoma with an activating mutation of the epidermal growth factor receptor: a mutation in a receptor very high up on a signaling pathway that activates the entire pathway. So the canonical personalized medicine principle has really been this idea that inhibiting this pathway shrinks tumors that have these precise mutations. It's the same story here: metastatic melanoma with activating mutations of BRAF. You can see just by eye that this poor patient is covered in metastatic melanoma, and there are these very, very dramatic responses specifically with a BRAF inhibitor that hits all of those tumors in a targeted way. The real challenge here is that resistance to these inhibitors inevitably arises.
In almost every case, especially with these targeted therapies, you'll often see these dramatic responses, but then some sort of resistance mechanism will arise because these tumors have such plastic genomes. There's either a subclone that already has that resistance mechanism, or you've selected, using drugs, for a specific cell that has a resistance phenotype. This idea of linking drugs to genomes is still a very, very active area of research. This is a paper where Chris Sander's group took basically all the tumor genomes in The Cancer Genome Atlas at the time and picked out relatively few genes that have been linked to pathways. What they found is that you really don't want to look at these genes in a single way. 12 of those genes were actually being hit not just by point mutation, and not just by copy number alterations. Really, this is the idea of genes being hit in multiple different ways, using multiple different types of cancer genome variation, but having the same phenotypic activation of a cancer driver or loss of a tumor suppressor. The way to read this: each column is a cancer type, each row is a gene, and they draw a little pathway diagram down here to group these genes into themes. So, the lessons from the study. Lesson number one was that druggable alterations cut across cancer types. In this case, FGFR activation happens in bladder and two other cancers as well. So the study of one drug in one cancer type is really starting to fall away as you get this high-resolution view of the cancer genome. Combination therapies may be effective, and actually this is no longer "may": they are effective in tumors with compound pathway alterations. In this case, these are cancer types that have activation of both the PIK3CA/AKT pathway and the CDK pathway: coincident mutations hitting two pathways at the same time. So there's potential for combination clinical trials.
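The two ideas above, one gene being hit by several types of alteration and one tumor carrying hits in two pathways at once, can be sketched with a small table of calls. This is a toy illustration, not the method used in the paper: the tumor IDs, the alteration calls and the gene-to-pathway map are all hypothetical and were chosen only to make the example readable.

```python
from collections import defaultdict

# Illustrative gene-to-pathway map (an assumption, not taken from the paper).
GENE_TO_PATHWAY = {
    "PIK3CA": "PI3K/AKT",
    "AKT1": "PI3K/AKT",
    "CDK4": "CDK",
    "CCND1": "CDK",
    "FGFR3": "FGFR",
}

# Hypothetical calls: (tumor ID, gene, alteration type).
ALTERATIONS = [
    ("T1", "PIK3CA", "point_mutation"),
    ("T1", "CDK4", "amplification"),
    ("T2", "FGFR3", "point_mutation"),
    ("T2", "FGFR3", "rearrangement"),
    ("T3", "CCND1", "amplification"),
    ("T3", "AKT1", "point_mutation"),
    ("T4", "FGFR3", "amplification"),
]


def alteration_types_by_gene(alterations):
    """Collect the distinct ways each gene is altered across all tumors."""
    types = defaultdict(set)
    for _tumor, gene, kind in alterations:
        types[gene].add(kind)
    return dict(types)


def altered_pathways_by_tumor(alterations, gene_to_pathway):
    """Collect the set of pathways with at least one altered gene per tumor."""
    pathways = defaultdict(set)
    for tumor, gene, _kind in alterations:
        if gene in gene_to_pathway:
            pathways[tumor].add(gene_to_pathway[gene])
    return dict(pathways)


by_gene = alteration_types_by_gene(ALTERATIONS)
by_tumor = altered_pathways_by_tumor(ALTERATIONS, GENE_TO_PATHWAY)

# Tumors with two or more altered pathways are the ones where a
# combination of agents might be worth considering.
combo_candidates = sorted(t for t, p in by_tumor.items() if len(p) >= 2)

print("FGFR3 altered by:", sorted(by_gene["FGFR3"]))
print("Tumors with two or more altered pathways:", combo_candidates)
```

Collapsing alteration types to the gene and then to the pathway is the point of the exercise: a tumor with an amplified gene and a tumor with a point mutation in the same gene end up in the same group.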
And the other lesson is that at least half of the tumors had at least two disrupted druggable pathways. So the call in this paper was really: let's think about combination therapies rather than trying to hit these pathways one by one, or we will inevitably see resistance to single agents. The other compounding factor is that not only are these tumor cells hit in multiple ways, they then spin off progeny, and these progeny then go on to evolve in their own way. That can result in tumor masses like this, where a blue cancer cell has perhaps given rise to a red subclone and a yellow subclone. You can actually see these by microscopy, maybe not so great here with the light, but what's happened is that you now have cells that inherit the cancer genome alterations from their parental cells and then go on to acquire their own resistance mutations or their own oncogenic drivers. So even if you have a therapy that's very effective against the blue clone, there's potential that the red and yellow clones may still persist, and there's a need to monitor and overcome the drivers that are active in those subclones as well. A nice diagram of this, published six years ago now, shows this idea of starting with your tumor clone and acquiring multiple clones over time, where each of these clones is defined by different cancer genome alterations. You might have a yellow mutation here, you might have an orange copy number alteration here, you might have a rearrangement here in purple. The idea is that you then have a chemotherapy acting on all of these clones at once, and the risk is that a resistant clone may be somewhere in that larger cellular population, that it may then persist, and that it then has a space into which it can grow and acquire additional cancer genome alterations.
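The selection of a resistant subclone under therapy can be illustrated with a deliberately simple toy model. Everything here is an assumption made for the sake of the sketch: the clone sizes, the 99% kill fraction, and the simplification that a resistant clone is completely untouched while sensitive clones all respond equally. Real tumors are far messier.

```python
def apply_therapy(clones, kill_fraction=0.99):
    """Shrink every drug-sensitive clone; resistant clones are left untouched.

    `clones` maps a clone name to a (cell_count, is_resistant) pair.
    """
    treated = {}
    for name, (cells, resistant) in clones.items():
        surviving = cells if resistant else cells * (1.0 - kill_fraction)
        treated[name] = (surviving, resistant)
    return treated


def clone_fractions(clones):
    """Fraction of the total tumor cell population contributed by each clone."""
    total = sum(cells for cells, _ in clones.values())
    return {name: cells / total for name, (cells, _) in clones.items()}


# Hypothetical tumor: a dominant blue clone, a red subclone, and a rare
# yellow subclone that happens to carry a resistance mechanism.
tumor = {
    "blue": (9_000_000, False),
    "red": (900_000, False),
    "yellow": (100_000, True),
}

before = clone_fractions(tumor)
after = clone_fractions(apply_therapy(tumor))

for name in tumor:
    print(f"{name}: {before[name]:.1%} before therapy, {after[name]:.1%} after")
```

In this toy example the yellow clone goes from about 1% of the tumor to roughly half of what remains, even though the total tumor mass has shrunk dramatically, which is the pattern of a dramatic response followed by resistance.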
So the real challenge here is not just to look at cancer genomes at one point in time, but to have the ability to sample multiple tumor samples of the actual tumor tissue or, as I'll allude to later, to use cell-free DNA sequencing, really trying to read these subclonal populations and resistance mechanisms out more routinely rather than relying on diagnostic material from four or five years prior. The other challenge is that these tumors will also change and evolve as they move throughout the body, as they metastasize. This is some work by Peter Campbell in the UK, starting with a primary tumor and then seeing metastatic masses that, just like in clonal evolution, start to acquire additional cancer genome alterations. The primary tumor spins out a secondary metastasis, which spins out a tertiary metastasis, and this is really where clinical interpretation is extraordinarily challenging. First of all, how do we sample all of these metastases? Cell-free DNA may be one way to get there. But also, what do we do about them? Do we have to treat this met, or is there some clonal, truncal mutation that you could hit that could blow away all of these mets? Certainly in the BRAF mutation example I showed you, that was very effective: you do see the dramatic responses, but then you'll also see these dramatic resistances as well. So why do next-generation sequencing, or why do molecular testing for cancer in general? Obviously, treatment: you want to see what the genome alterations are and what targeted therapies may work. Are you already resistant to a drug? Are there specific drugs that you cannot metabolize? So, drug resistance and metabolism. I'll talk a little bit later on about inherited, hereditary cancer syndromes, where you're born already with a loss-of-function mutation of a tumor suppressor, with a very, very high predisposition to cancer.
These are very early-onset cancers in these patients specifically. And prognosis: what type of cancer do you have? Even if there's not a goal to assign a patient to a targeted therapy, what we've seen with the increasing number of tumors being sequenced or analyzed is that there are multiple molecular subtypes, even within breast cancer, lung cancer, brain cancer. In virtually all cancers there are distinct molecular subtypes, some of which do better clinically than others. So I'll just pause here quickly. I don't know if anyone has any questions about the overview of cancer genomics in general before I dive into a little more of the N-of-1, single-patient analysis. Okay. So the first part of the talk is that overview: here's what the landscape of cancer genomics looks like. That's great, but for a single patient the question is not so much what other cancers are like, but what are the targets in my cancer? How is it different from other cancers in sequence: mutations, polymorphisms, all the things I just talked about, structural changes, copy number alterations, translocations? How is it functionally different? Even if the genes are altered, what genes are expressed, and what genes are disrupted? And are there external forces: is this cancer associated with a viral infection? How is that cancer different from other cancers of this type? And not only what are the targets, but what can actually be done about them?
So this is the next part of the talk, probably a little more technical, getting a little closer to the bioinformatics theme: how do we apply next-generation DNA sequencing to cancer specifically? I've grossly simplified this, but these are the four critical steps for genomics in general. You extract genomic DNA and RNA from a cell, a biopsy, blood, cell-free DNA: some sort of extraction. You make a library. A library is essentially DNA that's been made compatible with a machine that can read DNA. In the case of Illumina sequencing, which I'm going to be showing almost exclusively, you put little adapters on either end that are unique to that machine. You then load that DNA onto the machine, like the Illumina sequencing device, which generates huge text files of As, Cs, Ts and Gs. And the last step, the analysis, is really where things are much less standardized: the more custom you get, the less standardized it becomes. The alignment is very standardized, and as you get down to how to interpret a genome, that's really more lab specific. But essentially those are the four steps: extract DNA, make libraries, put them on a machine, and analyze the data. I mentioned a big text file. This is literally a run from a project that I was running. Each of these is a DNA sequencing read of 75 base pairs, and this is 25 reads. On a single Illumina HiSeq, a conventional machine, one lane of an eight-lane flow cell generates 600 million of these. So you get millions and millions of As, Cs, Ts and Gs, and the first step is to compare them to something. What do any of these mean? No one wants to look through 600 million of these all at once. That's where this idea of a bioinformatics pipeline comes in. It's really exactly like an oil pipeline. You want to have your text file at the beginning, and you want to run it through a series of software modules, akin to sections of pipe. Essentially you want to run all these software
programs in a systematic way, end to end, to go from raw DNA sequence to a list of variants at the end. Building one of these is also very much like building an oil pipeline: you want to run through a series of software that are all compatible with each other, you want these pieces of software to link up end to end without leaks, and you want to put them together in a systematic way. I'll have a slide on each of these steps. First, take that text file and align it to something, some sort of reference. In this case we use a human genome reference sequence. Anyone can download this off the internet. I of course have a copy on my laptop, and you probably will by the end of this workshop as well. Pre-processing: cleaning up that alignment in areas that are very difficult to align. I'll show you an example of that. QC metrics: did all your reads align to human? Did you do a human experiment? Do you have a bunch of other sequences that are not necessarily human? Did the experiment work? Does it look like the cancer that you are studying? Variant calling: calling out mutations, translocations, copy number alterations. And I've just put in, in gray down at the bottom: how do we interpret these variants, and how do we write a clinical report to communicate this analysis to clinicians who may act on it? The go-to place for almost every pipeline, certainly those I've worked on, is some modification of the Genome Analysis Toolkit (GATK) Best Practices. This is basically a more technical version of the slide I just showed you: going from raw reads, through an alignment, through some cleanup of that alignment, some variant calling, and then tidying up those variant calls. This is described in great detail on the Broad Institute's website. Another great tool to bring to your attention is the Picard suite of tools, which is really a
collection of QC tools that look at the output of this pipeline and allow you to compare it across runs. This best practices workflow came out of a lot of the non-cancer work being done across the world, and for cancer genomes we've inevitably had to modify it and bring in our own cancer-specific callers. It's very challenging to take tools that were written specifically for germline analysis and apply those to cancer. I've just listed some of the tools here for doing variant calling specifically for cancer, and also the types of cancer genome variation that we're looking at in cancer that may not be as frequent in germline or normal cell genomes. So this is really step one. If you took that big text file I showed earlier and aligned all those reads to the human genome reference, this is what you get. This is a software package called the Integrative Genomics Viewer (IGV). I think there's a little tutorial session specifically on how to use this tool. I'll show you a lot of screenshots of how this works and how to read these screenshots, but just to orient you: the chromosome is here along the top. Just like the chromosomes I showed you earlier, they've taken that little banded karyogram and put it horizontally along the top, so we're looking at this portion of this chromosome. In this case we're only looking at 92 base pairs at a time. In white is the human genome reference sequence, and in gray is a histogram of the number of reads aligned to each base. These are all the sequencing reads. This is the data I showed you on that slide earlier, and you can already see that data represented in this way is a lot more human readable than it was in the flat text format. First of all, these reads have more or less the same sequence. I've intentionally chosen a region that doesn't have any variation, so in this case all these reads basically match the human genome reference perfectly. Now we want to run callers that look
for differences between our data and the reference. There's actually a Wikipedia article called short read sequence alignment; there are many, many ways to do alignments. BWA is the one I've used here, but there are many others, so you really want to choose a specific aligner appropriate to your biological question. This is a cartoon of the exact type of data I just showed you. The human genome reference is here along the top: here's chromosome one, here's chromosome five. Here are all the sequencing reads, and here are the types of genome variation we want to find using bioinformatics. Point mutations: a single-base difference from the reference. An indel: an insertion or a deletion of a base or multiple bases in a small region. Missing DNA: tumor suppressors in regions that don't have any reads whatsoever. If you don't put DNA into your sequencing device, you're not going to get reads out, so that region is not represented, and that looks like a deletion. A hemizygous deletion: one of the two chromosomes is deleted, which means you get half as much data as you expect. Copy number gains: some of these oncogenes get amplified tens or hundreds of times, and as a result you get tens or hundreds of times as many DNA sequencing reads mapping to that region. A translocation breakpoint: you take one piece of one gene and attach it to a piece of another gene, and this has now brought one gene on chromosome five under the control of regulatory machinery that is meant for a gene on chromosome one. These are very powerful oncogenic drivers in cancer. I'm not going to talk too much about pathogen sequences, but not everything aligns to the human genome sequence, especially in cancers like cervical cancer, for example, where one of the original drivers of those cancers is exposure to a virus: HPV, EBV, etc. In this case those reads of course come through in your DNA sequencing reads, and you can align those to other
references beyond the human genome reference. And here's a cheat sheet of the ways that we can think about sequencing DNA and RNA, and start to infer the presence of proteins that DNA is wrapped around. Whole genome sequencing: put all the DNA into the sequencing device and sequence it. Exome: this is focusing specifically on exons, so only the coding portions of these genes. It's much, much cheaper; this is probably looking at one percent of the genome relative to the whole genome. Targeted gene sequencing: this is what we often will use in the clinical labs. Much smaller targeted panels, much more cost effective, much easier to interpret. Targeted variant genotyping: this is getting even more focused, looking at single bases, single mutations. Very high throughput, very inexpensive; the least comprehensive test, but certainly the most clinically actionable. And epigenome modifications: looking at methylated promoters or other regulatory regions. RNA sequencing: looking at the functional output of DNA. That is transcriptome sequencing, either whole RNA sequencing or targeted RNA sequencing, which is also a protocol that's being increasingly used in clinical labs. And microRNA sequencing: looking specifically at small microRNAs that are essentially regulatory control on gene expression. I'm not really going to talk at all about protein mapping or DNA fingerprint mapping; there may actually be an entire bioinformatics workshop on this topic specifically. I want to show you real data as often as possible, so here's the exact same sample sequenced using genome sequencing, exome sequencing and RNA sequencing, focused on a very important cancer gene, KRAS. The way we read this: here's the region, a 49 kb region, so we're zoomed out quite a bit further than we were on the previous slide. Here's KRAS down here on the bottom, so exon
one is down at this end, so this is on the minus strand, going from right to left. With whole genome sequencing you just get a sea of reads. All those gray sequencing reads I showed you earlier each show up as just a little tick mark on this display because we're zoomed so far out. With exome sequencing, we've essentially used a reagent that allows us to focus our sequencing reads only on the exons of interest. This is where the coding mutations live. This is much more cost effective because you don't have to look at the reads in intervening regions that you may not be interested in if your question is specifically around coding sequences. With RNA sequencing, you let the cell do the exonic selection for you. In this case the cell has spliced out the introns, so all of your reads are mapped only to exons, as they should be. You also get the added information of which exons are attached to which exons, and this is also quantitative: genes that have very, very high coverage are expressed at a higher level, and genes that have low coverage are expressed at a low level. I use the phrases whole exome and whole genome sequencing, but that's really a bit of a misnomer. They don't actually measure everything or, probably more specifically, we don't have great bioinformatic methods to assign absolutely every single sequencing read uniquely to a region of the human genome. This is a slide from my own work in neuroblastoma. In this case each tiny little slice here is a neuroblastoma tumor, and on the y-axis each little slice going this way is a single exon. A value of one means that we are completely confident in where those reads were aligned and in our ability to find mutations in those exons. So these are the regions where we had no doubt we can find mutations: we have good
coverage, we're well powered. As you get towards the bottom of this plot, you can see that our confidence in calling mutations essentially drops to zero, not just in the exome sequences, which are in purple, but in the whole genome sequences as well. This is really a blind spot in the genome: these regions are highly repetitive, our reads are short, and we just cannot be confident in our ability to find mutations in this region. The other interesting technical aspect is that the blind spots in genome sequencing, well, down here they're essentially the same as in exome sequencing, but often they're not exactly the same. This is due to differences in read length, mappability and just sheer depth of sequencing. With exome sequencing, since you are able to focus the capacity of your DNA sequencer on a relatively smaller genomic footprint, you can sequence those regions to much deeper coverage and have much greater confidence in the mutation calls at each of those bases. I still use the terms whole genome and whole exome sequencing because the concept is that you are really trying to be as comprehensive as possible, but it does not necessarily mean 100 percent of the genome or exome sequences. I think at the time it was about 80 percent of the whole exome that was covered; I think in this case only 14X was needed to be called covered. I think that number has probably gone up to 90-plus now, but there's still a lot of room for custom panels to look especially at these hard-to-read regions, and to complement Illumina sequencing with other sequencing technologies: longer-read sequencing technologies that generate much longer reads that are much easier to map across these hard-to-capture-and-sequence regions. And one aside on PacBio: I'm not going to cover those technologies in great depth
actually in this talk. Yeah, rarely: I don't really know of any clinical labs that are currently using those technologies, due to problems specifically with background error rate. In clinical labs especially, we're quite focused on point mutation detection, and those technologies have a very high background point mutation error rate. I think they are going to find a place at some point in resolving large structural variation, but they need to compete not just on sequence quality but also on cost. Certainly on dollars per base, it's very hard today to beat Illumina sequencing, and so that's the data that's by far the most ubiquitous. I wanted to give some context for what RNA sequencing data looks like. This is a plot I like to make for all of our RNA sequencing data sets. For a single patient, I like to plot all of the exons and ask what the expression level is for all those exons, just to get a feel for the dynamic range of our data. What I actually do for almost every data set I get, before I really do anything, is make a distribution and look at the shape of the data. I show it in this case because this is data from a breast cancer patient, and we want to know what type of breast cancer they have. Are they ER positive? Are they PR positive? What specific markers are they expressing, and at what level? What we want to do is take the single patient and compare it to tumor reference sets: a tumor from an ER-positive breast cancer and one from an ER-negative breast cancer. The colors are really not showing up great here. In this case, here's the ER transcript, right here in red, and you can see the corresponding transcripts down here in the ER-negative breast cancer are really quite low, well below the threshold here, and in the patient as well. Basically what we want to do is look at... all right, you can't even
see it. But the story here is that we essentially want to take these reference genes, look at their level in the patient, and compare those to the reference sets as well: are these dots higher up in the patient, and do they correspond to a specific threshold in a training set for ER-positive or ER-negative breast cancer? The third type of cancer genome variation, being read out more and more by next-gen sequencing, is epigenetic modification. In this case it's another IGV screenshot. I've shown MLH1 here along the bottom, and here's the promoter that controls this gene. In this region, methylation of the promoter exerts very strong control over expression of that gene. So in this case, here's normal tissue: this promoter is unmethylated, and this results in normal expression of MLH1 in normal tissue. And here's the exact same type of data, the exact same assay, run on an endometrial carcinoma. You can see the promoter has essentially turned completely red. What's been done in this experiment is they've taken the DNA and treated it with bisulfite, and that converts all of the unprotected Cs into a T. This is really indicative of a promoter with a whole series of Cs in a CpG island that are unmethylated, and therefore result in normal expression, in normal tissue, but silencing of this gene in cancer. Just like mutation calling, this assay is completely quantitative as well. You can see this promoter is methylated in this tumor, but you can also see a shadow of blue signal, and this is persisting from normal cells that are also present in the same tumor tissue. As a sort of cheat sheet, great for exams or for coming up with an experimental design: really none of these assays stand completely alone. Genome sequencing is great for sequence mutations, copy number alterations, et cetera. I don't have to go through this
in great detail, but really think about not just reading out these sources of cancer genome variation in one shot, but starting to integrate these types of mutations. An experimental design I use quite frequently is a combination of both exome and RNA sequencing: exome to get the parts list. What are the mutations? What are the copy number alterations? I'm going to talk a little bit more about purity and ploidy, and the challenges we have in working with clinical samples. But not only what are the mutations, but which ones are actually expressed? Which ones are actually being used by the cell? Which ones result in loss of function or loss of expression of a certain gene? Which mutations are expressed at a very high level? Methyl-seq as well is very useful for looking at genome variation that can result in loss of gene expression, which you're completely blind to using exome sequencing. So I'm going to go through this list one by one. The next section of the talk is really just marching through these types of cancer genome variation: what do they look like at the read level, and how do we detect them and think about them? I'm going to start with somatic mutations, the most famous type of mutation: a single base pair change. I want to start again with what genome variation looks like in normal cells. In this case, here's a read stack. I've put the reference down here on the bottom, and I've shown here a difference: half the reads at this exact position have a C and half the reads have a T, T being the one that doesn't match the reference. This is a heterozygous variant. I showed you those chromosomes at the beginning: you have two copies of every gene. In this case one copy has a C, one copy has a T, and next-gen sequencing is highly quantitative. You can literally just
count the number of Cs, count the number of Ts, and get more or less a 50-50 balance. In cancer, specifically in applied or clinical genomics, this is compounded by a few features of the tissue that we're actually given to work with. The first is that tumor tissue often yields variable DNA quality and quantity from the tissues that are used in routine diagnosis. In cancer diagnosis, biopsies or surgical samples are excised, fixed with formalin, and then embedded in a paraffin wax cube or cassette. This is because that type of preparation is very easy for sectioning and looking at cells down the microscope. Unfortunately, it's absolutely terrible for DNA, as illustrated by this slide here. This is a DNA ladder, and these are 12 samples all taken from the same cancer type from the same hospital. One sample gave nice high molecular weight DNA, much higher than the ladder. As you move from left to right, you get increasingly worse DNA; this is actually highly fragmented. Luckily, Illumina sequencing uses relatively short sequencing reads, about 50 base pairs at the low end to 300 at the high end, so even these small fragments are still usable for DNA sequencing. But it's an additional challenge just making this DNA sequenceable, or compatible with the Illumina sequencers. The other challenge is that tumors are not perfectly homogeneous masses of cancer cells. They're actually a mixture of tumor cells and invading or surrounding normal cells. This is a screenshot from a laser capture microdissection experiment. This is a lung cancer that's metastasized to a lymph node, so it's already very rich in lymph node cells, immune cells. What we've done here is circle all the tumor deposits, in this case to enrich the number of cancer cells for analysis.
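The read-counting idea described above (count the Cs, count the Ts) can be sketched in a few lines of Python. This is only an illustration, not a tool used in the talk: it assumes the bases covering one position have already been pulled out of the read stack into a plain string, the function names and the toy read stack are made up for the example, and the dilution formula assumes a heterozygous mutation at a locus with two copies in both the tumor and the normal cells.

```python
from collections import Counter


def allele_fractions(pileup_bases):
    """Fraction of reads supporting each base at a single position."""
    counts = Counter(pileup_bases.upper())
    depth = sum(counts.values())
    if depth == 0:
        return {}
    return {base: n / depth for base, n in counts.items()}


def expected_het_fraction(tumor_fraction):
    """Expected mutant-read fraction for a heterozygous somatic mutation.

    Assumption (hypothetical simplification): two copies of the locus in
    every cell, so tumor cells contribute one mutant copy out of two and
    normal cells contribute none.
    """
    return tumor_fraction / 2


# Toy germline example: 16 reads, 8 with the reference C and 8 with T.
germline_stack = "CTCTCCTTCTTCCTCT"
print(allele_fractions(germline_stack))

# A sample in which only 40% of the cells are cancer cells.
print(expected_het_fraction(0.4))
```

With a mixed sample the mutant reads are diluted by reads from normal cells, which is why the 50-50 expectation for a germline heterozygous variant does not carry over directly to somatic mutations in tumor tissue.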
We use the laser to cut out these cells, capture those, and then do a DNA extraction from the purified cells. As sequencing has gotten cheaper and cheaper, a secondary approach is one where we'll just sequence all of this DNA together, and sequence it to great depth instead. So instead of generating 15 reads like I showed in that germline example, we'll generate 150, or 1,500, or 15,000, and just sequence past the normal in the search for somatic mutations in highly impure tumors. You may hear this concept of contaminating stromal cells referred to as tumor purity, or also tumor content. Those are the two buzzwords you'll see to describe this idea of tumors being a mixture of cancer cells and non-cancer cells. The other challenge I touched on earlier is that tumors can have multiple copies of chromosomes, so that assumption of two copies of every gene is broken in cancer. Even more confounding, not every chromosome is necessarily at the same copy state. Chromosome 1 here is at four copies; chromosome 2 is at three copies. It's likely to be easier to find mutations in the three-copy chromosomes than in the four-copy chromosome. You may hear this concept referred to as aneuploidy, or ploidy for short. One way around this is to sequence the DNA from these admixed tumors to depth. This is really where exome sequencing, and certainly targeted sequencing, still has a home. This is the same sample sequenced two ways: genome sequencing and exome sequencing. In this case, this region of the genome has only 15 reads; the exome has nearly 140, and that really gave the algorithm, or the alignment, the power to pick up the DNA fragments that correspond to the reads supporting this actionable mutation. So really try to think about what the goal of the bioinformatic analysis is. Is it to find absolutely every mutation in a comprehensive way,
sequencing very, very deeply? Or is it to survey cancer genome variation, with the acknowledgement that in low-coverage regions, especially in highly aneuploid or low-purity tumors, mutations may be missed? One benefit of the deep sequencing approach is the ability to infer tumor purity, ploidy, and potentially subclonal structure as well, if you have the depth to detect these very rare variants. This is one algorithm we use in my lab, called Sequenza. The way to read this plot: the chromosomes are ordered from left to right, and each dot is a variant, a germline variant or some sort of difference in the genome. You can see just by eye these copy number changes, as you get these shifts both in allele-specific coverage and in overall coverage across the whole genome. The goal is to generate these genome-wide copy number alteration plots, where you should have one copy of a red allele and one copy of a blue allele. A normal cell here would just have the blue and red lines lined up nicely along this baseline. In this case, this is a highly aneuploid tumor with lots of copy number gains and lots of deletions. This chromosome here is quite interesting: one allele was just deleted, so that is basically at copy zero, and to stay alive the cell duplicated the other chromosome. So it still has two chromosomes, but they're both the same. These algorithms can infer both tumor content and average ploidy. These are inferred from the knowledge that normal cells contribute DNA as well, so you're not necessarily expecting these perfect 50-50 balances between mutant and non-mutant alleles. You actually get this dilution effect from normal cell contamination. And then tools like PyClone (PhyloWGS is another) will take this type of data, take your somatic mutations, and give you a readout of what percentage of the
cancer cells have these mutations. In this case, all of these mutations here are in 100% of the cells, these mutations here are in roughly 80% of the cells, and here are subclonal mutations, often invisible to shallow genome sequencing, that in this case are present in only 20% of cells. That is something you can see with deeper profiling and then assign to a specific subclone. In this case there is a very important driver mutation present in the clonal cancer population. As Francis alluded to, Illumina sequencing isn't the be-all and end-all of DNA sequencing in the world. I've shown an example here where we used long-read PacBio sequencing to complement what we have from Illumina sequencing. In this case we discovered mutations in these two genes, and we wanted to confirm that those mutations were real. This one is a point mutation, so the base changed colors; you can see it. This one was an out-of-frame two-base deletion. We wanted to ask the question: can we see those in PacBio data as well? The real challenge with this technology, at least at the time (this is a little old now), is that you can see the real variant shining through, but you can also see this background error noise. What's useful is that the background errors are not necessarily always in the same place, but they're certainly there, and this background effect can make variant discovery more challenging than it would be in clean Illumina data. It's not really a problem for point mutations, because the reads that have the variants are very clear. We thought this might be more of a challenge in the indel space, because the types of errors that the PacBio sequencer makes were specifically insertions and deletions. But you can see this variant was very highly recurrent, so it's something we could call out on both platforms. This is where it's very useful, especially when we're looking for challenging types of cancer
genome variation, to have an orthogonal technology to validate what was discovered on one platform with something that's more complementary. I want to talk a little bit about depth. Why sequence deeper? In addition to overcoming the challenges of purity and ploidy, there's the ability to detect very, very rare sequence variants. So I just wanted a plot of how low you can go in terms of allele fraction with depth. We actually just published this last year in the ctDNA study. Exome sequencing is in this realm of 100 to 200x coverage. The clinical lab will sequence our panels to 500x-plus. If you really want to get serious about subclonality, we'll use highly targeted custom assays to look for these very rare subclones. And for circulating tumor DNA, we're currently sequencing to 25,000x coverage. In this case, this is DNA from tumors that has been shed into the bloodstream, so we have to sequence past all the contaminating normal DNA fragments and find the 5, 10, 15 fragments of DNA that are derived from tumors. You can't necessarily. Certainly in that study we had the tumor; we had a bone marrow aspirate in the case of multiple myeloma (actually, I'll just jump here), so we knew what the answer key was. We knew where the somatic mutations were. But we did have several examples where we found mutations, definitely drivers we'd seen before, that weren't in the primary tumor. With multiple myeloma, "multiple" is right in the name: you have many of these masses, and I'm convinced those mutations must have been in some other mass in the body. But you couldn't necessarily say, "We found this in the blood; it's that one versus that one." It's present, but you can't necessarily assign it to a certain tumor mass. So here's a now-famous cartoon of where circulating tumor DNA can come from. I do want to differentiate cell-free DNA from circulating tumor cells. These are two flavors of a similar idea. I'm going to
talk specifically about cell-free DNA. This is DNA that comes from tumor cells that are turning over, dying, and shedding their DNA into the bloodstream, and it is now dissolved in the plasma portion of the blood. There are also circulating tumor cells, which are much harder to find and often at lower concentrations. These are in the cellular portion of blood. I'm going to be talking about the liquid portion, which is a combination of DNA from 99-plus percent normal cells and one percent or less cancer cells. Here's what next-gen sequencing data looks like from that type of sample. It's from two papers, two technologies that were published. One's called TAm-Seq. This is a PCR-based approach: you put your primers down and amplify only your region of interest. You can see it's highly focused here: two exons down here, and they really wanted to find mutations only in these exons. This is a little more challenging in that it requires highly multiplexed PCR. The other challenge here is that it's very hard to scale this to larger, especially contiguous, regions of the genome. So another approach uses hybrid capture, the same technology used for exome sequencing. You put your baits down on two exons here, and this will pull down regions of DNA. This is the method we use in my lab to tile across regions that are very difficult to walk across with PCR reactions; you can just put in probes not just on exons of interest but really on any arbitrary region of interest. It's conceptually exactly the same as exome sequencing; you're just focusing the panel design much more and sequencing to much, much greater depth. The other benefit of circulating tumor DNA is that several reports have found that the quantity of cell-free DNA in blood plasma reflects tumor burden. So if you take all the tumors in the body and add them all together, there's a correlation
between how much circulating tumor DNA you find and the number or size of tumors you have. The real potential here is not just looking at ctDNA at one point in time, but starting to monitor changes in tumor burden over time. In this case it was a fusion: they actually baited a specific fusion, monitored the frequency of that tumor-derived fragment in blood over time, and compared that to tumor volume. You can see there was a very nice correlation between presence in blood and that tumor in the body as well. Conceptually it's exactly the same workflow I showed you earlier: extract DNA, make a library, sequence it, just sequence it much, much deeper, and then there's the computational analysis, tailored specifically for finding these very low allele fraction mutations. So that was pretty much it for point mutations, single base pair changes. This is now stepping back and looking at much larger changes: gains of larger regions of the genome, in some cases whole chromosomes. This is a spectral karyotype (I think it's come out really well), where basically they have probes for every single chromosome. What you should see if it was a normal cell is that they should all look like this: you should have two copies of every chromosome, each with its unique color. You can see that's absolutely not the case in this cancer genome. All these chromosomes are being constructed from parts of other chromosomes; parts of chromosome 1 are being shuffled and reshuffled. This is really the major challenge in cancer genomics: these tumors are completely scrambled. How do we measure that scrambling, and then how do we focus in specifically on the alterations that have clinical meaning? Gains and losses are evident from sequencing data. As I showed you earlier, gains result in more coverage and losses result in less coverage. Here's a microarray being compared to sequencing data, reds being gains
at a very high level, making amplification very obvious from sequencing data, not something you'd necessarily see in normal cells. This can be done at the genome-wide level using exome and whole genome sequencing, but it can also be informative in small targeted panels as well. This is an output from a clinical test we run at Princess Margaret. It's very focused; we only sequence a few exons from a few genes. But you can see even by eye here an EGFR amplification: every single exon has much higher coverage than the baseline, than you'd expect. It's the same sort of story in the tumor suppressors: single-copy, or sorry, two-copy deletion of PTEN. We're basically getting no data whatsoever from the PTEN locus, indicative of two-copy loss of that gene. The challenge with this type of data is that it's relatively noisy. This baseline isn't perfectly at zero; this exon's a little wonky, a little higher. This is really a challenge with interpreting a lot of these targeted assays, and this is where large (not exome-wide, but 100-gene) panels still have a place, certainly in clinical sequencing. It's the same idea of putting baits specifically on individual genes, now with much greater coverage, for single-gene amplifications and whole chromosome arm-level losses, read out just by loss or gain of coverage. It's really reading out the same types of cancer genome variation (gains, losses, rearrangements) using smaller, cheaper targeted panels. I mentioned translocations: translocations and large intergenic deletions. The beauty of next-gen sequencing is the ability to resolve these deletions right to the breakpoint. You have a read that comes in, it hits this breakpoint, and the rest of the read actually continues over here. So we're really trying to infer deletions, even at the exon level, of two exons that are not necessarily lost
in normal cells or in wild-type cells. This is what the real-world data looks like. This is a famous rearrangement, EWS-FLI1, where these reads are mapped very nicely to this region and you get this pileup of what looks like hundreds of mutations. Of course, you'd never have hundreds of mutations like this all in a row. This is actually a translocation breakpoint, and this side of the read should actually align to the other partner gene. It's the same story down here with all these soft-clipped reads, which are viewable right in the IGV viewer. All these reads are correctly mapped here, then there's a break, and these reads actually continue over in the partner gene. There are bioinformatics tools to call these out now. These are probably the noisiest tools; the false positive rate is extraordinarily high, because sometimes these reads have highly repetitive sequences or other sequences that make them very difficult to map. This is further challenged by this concept of chained translocations. This is a very complex example in prostate cancer, where you actually have this chain of four genes all being stitched together in different ways. This is not a nice bioinformatic tool; this is a postdoc, Mike Berger, who actually went through and manually linked up all these reads one by one. There's still not a great tool that just magically assembles these highly complex rearrangements, so manual intervention is still a big part of the downstream enterprise of bioinformatics. In this case you have multiple chromosomes, inter- and intra-chromosomal rearrangements, all stitching these different genes together. Just like mutations, some tumors are highly rearranged and some tumor types are rearranged at a very low level. In this case, here are three lung adenocarcinomas with a small, medium, and large number of rearrangements. Here's neuroblastoma, very rarely rearranged
apart from this one tumor here. In fact, Molenaar and colleagues reported that 18 percent of neuroblastomas have highly focal chromosomal rearrangements, a phenomenon called chromothripsis, where essentially the chromosome has more or less exploded and then stitched itself back together, resulting in many, many breakpoints in a highly focused chromosomal region. It's a feature of pancreatic cancer as well. With transcriptome sequencing (sequencing RNA), in addition to gene expression, just counting the number of reads assigned to each gene, you can also look at the coverage of individual exons. In this case, here is the UMPS gene, and here's the coding sequence. In this paper, Malachi Griffith and colleagues have encoded each exon as exon one, exon two, exon three, exon four. What you'd expect for this gene is that one is attached to two, two is attached to three, and everything's stitched together in a methodical way. You should very rarely see an exon one to exon three junction, and that's exactly what you see in this MIP101 cell line when you just sequence it without drugs. Here's exon one; it has a fair amount of coverage. There's good coverage for an exon one-to-two junction, and very poor coverage of an exon one-to-three junction, which we really would not expect. Then the rest of the transcript continues on: exon two attaches to three, three attaches to four, and so on. What happens with this cell line when you treat it with this drug, 5-fluorouracil, is that it starts to use this unusual junction. You can see the coverage of the exon one-to-three junction actually jumps up. This is a resistance mechanism of this cell line specifically to 5-fluorouracil: it starts to skip exon two, so the transcript reads from exon one, jumps to exon three, and keeps going. This is something you'd be completely blind to with any sort of DNA sequencing. This is really happening at the regulatory level, and it really speaks to the need to
read out the functional effects of the genome in addition to, essentially, the parts list in DNA. I'm sure you've all seen these types of heat maps before. Essentially they're just huge colored tables of genes by samples, and I think there's an entire module on this: taking individual samples, clustering them, and then looking for patterns that differentiate two different types of samples based on the presence of specific genes. So just think of these big matrices as tables of samples and genes; all that's really being changed here is the order of the genes, and zooming in on the specific genes that inform these types. But always know in the back of your mind that the entire matrix is available somewhere. All of those genes have been sequenced, so those data are available, and these heat maps are really just subsets of an overall super matrix. The other big benefit of highly quantitative data like RNA sequencing is the ability to start comparing your data with other groups' data. This is something we did in four lung tumors of unknown tissue origin. We just wanted to know where those tumors came from; we knew they weren't necessarily lung primaries. In this case we took all these tumors and clustered them with a large set of data from healthy tissues. This is the GTEx project, the Genotype-Tissue Expression project, and they're sequencing tens or hundreds of representatives of healthy tissues, or tissues from healthy individuals. I was actually surprised: these lung tumors clustered specifically with smooth muscle. These were lung tumors in women with a hereditary cancer disorder, almost exclusively in women, and we actually think these are derived from smooth muscle of the endometrial lining that is then seeding these sarcomas in the lung. You can see how very nicely all
the other tissue types (this is completely unsupervised) segregated, not just into the correct tissue of origin but, in the case of brain, into specific components of the brain itself. I talked a little bit earlier about aligning reads to something other than the human genome reference. This is an example from a lung cancer project we were involved in, in which we were actually able to completely reassemble the Epstein-Barr virus in what was diagnosed as a lung adenocarcinoma. This turned out not to be a lung adenocarcinoma; it was a lymphoepithelioma-like lung cancer. We stained for this in the tissue, and only the cancer cells lit up for EBV. But really this was just data that came along for the ride when we did RNA sequencing, and it resulted in a change of diagnosis. Really, if the DNA or RNA is in the tube, it's in there to be found, assuming you have the right bioinformatic tool to run on it. Just a couple of new slides this year: all these concepts we talked about for RNA sequencing are now applicable and doable at the single-cell level. There are now several commercialized technologies that will deliver unique cell-specific barcodes in the tube. Essentially you start with this tube of gel beads that contain unique barcodes, and in this device (this is the 10x Genomics technology) these beads come through a microfluidic channel and get merged with a single-cell suspension of your sample of interest, and these get merged into an individual oil droplet. You first burst the reagent droplet, and this will deliver barcodes specifically to the RNA from your cell, so you know which RNA comes from a single cell. Then you burst the entire oil droplet and sequence them en masse. This has the benefit that now you have this big mixture of RNA that can be deconvoluted and mapped back to single cells. And the real benefit of this specifically in
cancer is the ability to reassemble not just tumor subclones but also the supporting microenvironment around those cells as well. This is data from three people. If you focus specifically on this plot, patient A2 is a patient with multiple myeloma, and each dot is a single RNA sequencing experiment for a single cell. The other two are healthy bone marrow. Myeloma is a cancer of the bone marrow, and you can see that all the healthy bone marrow cell types are clustering together. The patient's healthy cell types are also all seated in here, and we also had two very large populations that are patient-specific. You can play this game where you circle out specific cell populations and look up the genes that are being expressed in each single cell. In this case these are T cells: these are all genes associated with the T cell receptor. In this case these are macrophages: most of the genes these cells express are expressed in macrophages. Same story over here: these are all neutrophils. And these actually turn out to be seen only in the myeloma patient, and one of these is the clonal myeloma cancer cell population. So this is a real opportunity, especially in cancer genomics, to take data from these admixed tumor populations and start to compare those with huge normal reference sets that are emerging, very similar to GTEx. One group that's really leading the way here is the Human Cell Atlas. Their goal is to assemble a huge compendium of single-cell data from all tissues, with a big focus on normal tissues. This is a really incredible reference set specifically for cancer, because now we can take our lung cancers and compare them to thousands of healthy lung tissues, for example. It's also a great funding opportunity: the founders of Facebook, so Mark Zuckerberg's group, have now founded a group
specifically for funding health initiatives, including the Human Cell Atlas and trainees who want to work specifically in the single-cell genomics space. Okay, so I'm coming up to my last half hour. I'll just run through some of the hereditary genomics material, then we'll take maybe a two-minute break, and then talk about the case report at the end. Conceptually, this idea of the two-hit tumor suppressor still holds. I talked about these genes that encode regulators of pathways that are the brakes of the cell. In the case of someone with a hereditary cancer predisposition, they're actually born with one of these loss-of-function variants. Here's that same canonical example of two chromosomes, but in this case they're born with a loss-of-function mutation in the retinoblastoma gene, and essentially now they're only one hit away from developing cancer. This idea of multiple types of cancer genome variation acting on the same gene really holds true in the hereditary cancer syndrome space, where there are many different ways that loss of the remaining functional allele can happen: a local event, a gene conversion, mitotic recombination, just loss of that gene, or losing the chromosome completely. There are many different ways you can get a second hit, and these patients are very highly predisposed to cancer because any one of these can lead to cancer in their case. Of course, you can read this out in next-gen sequencing as well. Here's a sequencing read stack from the matched normal from that patient. This is SUFU, a regulator of the Wnt pathway. In this case they have a loss-of-function mutation, a deletion of a single base, and that's in their normal DNA. You can see half the reads here have the deletion and half the reads don't, so they still have a remaining functional allele. They then
have this huge deletion of that chromosome; they lose that second chromosome. As a result, in the tumor every single read has that deletion except for this one. This one is a straggler from a normal cell, but in this case there's only one copy of chromosome 10 left, and that's because the tumor has essentially lost the remaining functional allele. So that's finding lists of mutations. Those are all the ways that you can look for variation in the cancer genome. But the challenge is not just finding these things, it's really linking them to clinical action, and this is by far the least automated part of bioinformatics and cancer genomics in general. How do you go from finding to function? So, the same sort of diagram I showed earlier, but really the next step after you've run some of these bioinformatics tools: how do you annotate, interpret, and report these variants? There are clinical guidelines for how to interpret and report a clinical variant. Here's a list of some of the tools that we would run to interpret a variant. All of these are currently reported in sort of a text paragraph back to patients. For research, especially if you're looking at whole exome or whole genome sequencing, you have thousands of these variants, and it's very challenging to go through every single variant one by one and do a bespoke, by-hand interpretation of each. So instead we use variant interpretation tools. Oncotator is one that I've used quite a bit; it was certainly used for The Cancer Genome Atlas. There is interactive desktop software that will also let you review these variants, and online tools are becoming increasingly powerful for annotating and interpreting variants en masse. I think Serena has a whole talk on mutations and interpretation, so I'll leave it to her to go through how to find a mutation and how to interpret it. These are two of the values. Here's sort of a
screenshot of the American College of Medical Genetics and Genomics standard. If you want to do clinical reporting, these are the guidelines that we follow, and they use a lot of the tools that I just outlined on the previous slide. The challenge for clinical reporting is that this is really not scalable. So this is data from a clinical report at Princess Margaret. The tumor had three mutations, each with a certain mRNA accession number; it was heterozygous in all three cases, with the affected cDNA and the affected protein, and we wrote a paragraph for every single one of those mutations. There's a lot of detail here, there's a lot of support for 'yes, this mutation is important.' The challenge is, if you have a thousand of these, no one really wants to read through a thousand of these variants. And this is really where clinical cancer genomics is starting to move away from the single-gene, single-variant model and much more into a comprehensive reporting approach. One of the initiatives that's really driven this is AACR Project GENIE. This was an initiative by the American Association for Cancer Research to get all the data out of clinical labs and put it all into one place, so we could essentially search across multiple hospitals for what variants are being seen on these very large panels across large centers, think as a group about how to interpret or report these types of variants, and start to annotate these samples at higher clinical detail. This group has been active for a couple of years now. There are seven founding groups; that's probably going to grow to 17 or 18 groups in the next year. Today, if you go to this website, there is data from over 38,000 targeted panel tests just available for browsing. This has been very, very powerful for looking not just at the famous
mutations, EGFR, BRAF, but also at this long tail of infrequently mutated genes. This has been a very important data set for us in research to start to resolve passenger mutations from driver mutations, and to think about which targets new clinical trials could go after, not just in single-agent trials but also in combination therapies. So, the last data slide: what is the molecular report of the future going to look like? This is cBioPortal. It's a data viewer that was developed for The Cancer Genome Atlas, and now, especially within GENIE, it's growing into more of a clinical and genomic viewing tool. The way to read this: you basically have a patient who has multiple tumors over time. There's a clinical timeline at the top, which I find increasingly helpful, especially as we see multiple tumors being collected over time. So here's the timeline, here are all the treatments they've seen, and it's been very helpful to see, oh, these tumors are hypermutated versus diagnosis; what treatments did they have in the intervening time, and is there some reason for the hypermutation from a treatment perspective? Then the genomic landscape: what are the copy number variants for these four tumors over time? You can see just by eye there are differences. Where do the mutations lie in these tumors, and are there unique, private mutations in a single tumor that aren't in the others? And then, going back to this idea of annotation, there's a handy table at the bottom that gives you annotations at a click: is there a bioinformatics tool that predicts that this is pathogenic, are there drugs or trials available? We're really starting to grow this table out quite a bit more around what trials are available, or what model organisms are available to test specific drugs in these patients. So that's it for
the first part. The next half hour is really going through this case study. Are there any content or concept questions at this point before we go into the case study? No? Okay, I will move on then. So this is one of the very first examples of personalized medicine using whole genome and RNA sequencing. It's published, Jones et al., really looking at a very unusual adenocarcinoma of a salivary gland. The initial presentation is a single patient, a 78-year-old man. No real reason that he should have cancer: fit and active. He presented with throat discomfort, and a medical exam found a pretty large tumor at the base of the tongue, a two-centimeter mass. That's a very large tumor. Non-smoker, non-drinker, no reason to suspect a head and neck cancer, for example, but he has cancer. PET-CT scan and subsequent biopsy. These aren't actually his: this is his H&E, this isn't actually his scan, but the idea is that he got a PET-CT scan, with this tumor really lighting up as well as the draining lymph nodes. So not only is the tumor itself positive, but there's potential that this tumor may have spread. I don't know if there are any pathologists who want to take a guess at what this tumor is. It's an unusual adenocarcinoma of a salivary gland. So this is basically the first step: get a tissue sample, review it by H&E, have conventional scanning and conventional pathology. He had laser resection of both the tumor and the draining lymph nodes. The primary: poorly differentiated adenocarcinoma, some pathology features. The nodes: three of 21 nodes were positive, so not rampantly metastatic disease, but positive for sure. He got radiation to the primary, completed in February, and it actually worked for some time. Good quality of life, returned to work, the primary now removed. But most concerning, and where he came to the attention of oncologists at the Cancer Agency, was that he developed numerous small lung metastases in both lungs. So this
is really quite serious. Clearly the tumor has left the salivary gland; now it's invaded his lungs. This is a much more serious medical finding. So the question is, what do we do now? What treatments are really going to work for this individual? At the time they had an EGFR inhibitor trial online. EGFR mutation testing wasn't necessarily routine, so they ran EGFR immunohistochemistry. They wanted to know: does this tumor express EGFR? Before you give an EGFR inhibitor, you of course want to make sure the tumor is expressing EGFR. Not overwhelmingly positive (cells will sort of turn brown when they're expressing EGFR), but the answer is yes. So the question was, what treatment could this patient receive? In this case there were EGFR inhibitor trials available, and one of the drugs available is called erlotinib. It's an EGFR inhibitor. Unfortunately, all the pulmonary nodules grew well on erlotinib. So while this tumor expressed EGFR, this drug was just not effective in this cancer type. In fact, the largest lesion grew from one and a half centimeters to over two centimeters, and they discontinued pretty quickly. And there were really no other options at this point. They'd exhausted standard of care: they had radiation, they tried a clinical trial with sort of an early molecular result, but they're really talking palliative care at this point. And this is where researchers at the Cancer Agency came into play. The question here was, are there other targets that we could go after specifically in this patient? I'm going to keep this little timeline going as we go through this. So: presentation, had surgery, had radiation, lung mets were diagnosed, they began to grow, erlotinib failed. What next? What are the targets in our case? They'd exhausted standard of care. They had an REB which approved a protocol just for him, so an N-of-one case. The patient consented to full genome sequencing
with the understanding that they could suggest treatments. This of course is not a clinical test; it was a research study for a single patient, with the goal of nominating treatments that could work. I talked a little bit about the challenges of formalin-fixed, paraffin-embedded tissues for cancer genomics in general. In this case they took a fresh biopsy for RNA sequencing. So here's a nodule, and here are the tumor cells. Not the greatest stain, but basically the purple cells are cancer cells. They took a fine-needle aspirate, and on pathology review the aspirate had nice high tumor content for analysis, so overcoming that problem of tumor purity: 80 percent of the cells in the aspirate were cancer. And then for the DNA, they used the diagnostic block, formalin-fixed, paraffin-embedded DNA, and it came back with a pretty short list of mutations. It came back with a TP53 mutation of sort of unknown significance, not really an obvious loss of function. They had an RB loss-of-function mutation, and loss of RB specifically had been shown, even at that time, to result in gefitinib resistance. So this may be one reason why this patient was resistant to the EGFR inhibitor right from the get-go, and having a comprehensive analysis right up front maybe would have avoided that treatment. And then there were mutations in two non-cancer genes that really are variants of unknown significance. These are variants you see all the time when we do whole genome sequencing: some mutation of unknown result in a gene you've never heard of before. This is why we're going to have jobs for years and years, because we don't know whether these mutations have function or clinical meaning. I suspect a lot of them do. In this case we sort of binned these two as passengers, and we confirmed the clinically informative ones by a secondary method, in that case Sanger sequencing, just like I showed in that Illumina and PacBio comparison. Here's
one way to show copy number alterations. The way to read this: all the chromosomes are running around the outside, and the gains and losses are shown here. Reds are gains, I believe, yeah, reds are gains, greens are losses. What you see here is the EGFR amplification, so this again really fits with the EGFR expression seen in this tumor. Loss of PTEN, like I showed earlier: you have this deletion of PTEN, also another indicator of resistance to EGFR inhibition. A highly focal RET amplification: you see how small this is, right next to this deletion. RET is at a very, very high level in this case. And then other cancer genes as well: we have loss, deletion of one copy of p53, deletions of other tumor suppressors, amplification of the MAP kinase pathway. A lot going on. When you do whole genome sequencing you can see all of these types of genome variation, and some of them will inevitably hit cancer genes. Again, we really wanted to validate what we're seeing by whole genome sequencing using FISH. FISH is essentially a probe that will sit down on top of a gene, and you literally just count how many dots you see in each cell. Each cell, for normal genes, should have two dots. In this case PTEN had a single-copy deletion, so for every copy of chromosome 10 you only had one copy of PTEN. And a focal RET amplification: hard to see here, but many, many copies of RET in every single cell. Again, something that you can see at the cellular level but also read out directly by next-gen sequencing. They did RNA sequencing as well, and this is really where having multiple ways to read the cancer genome is very informative. In this case SMAD4 had extraordinarily low expression compared to a compendium of matched tumors and normals. So SMAD4 was deleted and downregulated; if you just jump back here, there's this deletion of SMAD4, really recapitulated in the RNA sequencing data as well. And on the other end, RET: this is one
of the highest expressions of RET we've ever seen, 34 times higher than the rest of the compendium reference set. So a very attractive druggable target, and nothing you'd even consider in this cancer type at the time without some sort of genomic analysis. And PTEN, same sort of thing: it's deleted and expressed at low levels. We have these two pieces of data that really tie the story into a nice bow. This list is helpful because it gives us some feel for what genes are active in the cell. But at the time, to really make sense of what the druggable targets were, they basically took all the copy number data and the gene expression data and assembled it into a pathway. And this, to this day, is the way that the Personalized OncoGenomics program is still thinking about and displaying these data, in that they take an annotated canonical pathway and then layer on the data for the patient. They layer on the copy number data, they layer on the expression data, and they look at what pathways seem to have clusters of overexpression, deletion, amplification, et cetera. So in this case the RET oncogene was overexpressed, and they saw deletion of this negative regulator of that pathway. This parallel pathway was relatively untouched. So it really looked at the time that, since RET was overexpressed, the downstream targets on that pathway were also overexpressed, and this seemed to be the most important pathway in this tumor, not just because of the RET result but because the constellation of genes and regulators, both activators and repressors, all seemed to focus on the RET pathway. So they came up with a list. Taking all this genome data, it really boiled down into four or five bullet points: the MAPK pathway is important, RET is important, what drugs could act on this axis? They came up with a relatively short list, including sunitinib, sorafenib, and sulindac, and
this is really where you present it to the oncologist and say: what could this patient bear, which of these drugs is actually available, what trials are ongoing? She chose sunitinib, and it really worked. After four weeks on sunitinib, one of the four drugs on the chart, there was a 22% decrease in tumor size. So here's that original biopsy I showed you earlier, two nodes here. It took a month to do whole genome sequencing and all the analysis, and you can see that during that month of analysis time the tumors grew like this. He started sunitinib right here, and then almost immediately, within four weeks, you see the same tumor masses starting to shrink. So really this idea of knowledge of cancer genome variation and linking it to a drug, the proof of principle, appears to be quite promising. He stabilized for seven months. Clinically they reduced the dose to reduce side effects; otherwise great quality of life, the sort of thing where he was thinking about coming back to work. But I started this entire talk talking about the inevitable resistance to targeted therapies, and it's the exact same story here. After seven months, or after four months rather, the lung mets began to grow. But we had all that genome sequencing data, and we had three other drugs that could potentially be used. So they switched him to a combination of sorafenib and sulindac, and again stabilized the disease. So really hitting the same pathway in a different way, or hitting some of the other alterations in a different way. That continued for another three months: again recurrent disease after seven months. So this idea of hitting cancers on a single pathway, in a single way, with a single agent did not work, certainly in his case. In this case the recurrent disease was not necessarily in the lungs but actually back in the primary site. So even though he had the tumor removed, there were some residual
tumor cells that were still growing, or potentially metastatic cells from the lung going back and repopulating the primary. This is something that could be answered with genome analysis; it's not something that was done here. The other concern: a new nodule in the neck. The lung mets are really starting to progress, there are new metastases in the lungs, quality of life is of course deteriorating, and the question here genomically is, what has changed? So they went back in and got another aspirate, so more cancer cells. In this case they had the neck mass, which is very accessible with a fine-needle aspirate. They did the exact same analysis and came up with all the mutations and alterations that they saw earlier, in addition to these additional mutations. None of these are really known cancer genes. They're all missense mutations, which are very difficult to interpret, and there's no functional data around what these mutations do. This is really the challenge, especially with metastatic or recurrent disease: it's very hard to link most of these, or even their function, back to cancer drivers. There was no evidence whatsoever of them in the pre-treatment biopsy, even at low frequency; even if you're looking for one read in 50 or 60, there's no hint that these were there originally. Same idea: they took all the copy number alterations and mutations and mapped them back to the exact same pathway, and now you can see this pathway is lit up, it's just red hot. There are copy number alterations, there's overexpression all over this pathway, as well as this parallel pathway. In this case the tumor's response to inhibition of the RET pathway was just to ramp up RET expression. So you have insanely high overexpression of this RET pathway, as well as a parallel pathway, to achieve the same biological goal. And this is really this idea of subclonal selection, or this tumor dynamic that I alluded to right at the start, really in action. You could
see it in the real data, even just with a paired biopsy. And this is really where there just weren't great treatment options, then or even now. Do you consider a cocktail of targeted drugs that has never been considered? That's completely not on the table; you can't just take drugs out of the medicine cabinet and give them to people. What do you do about these multiple pathways? Do you hit RET? Do you hit EGFR? Are there other pathways that are also active? It's completely untested, and there's a real risk of adverse side effects. He was the only one at the time to have this level of genome analysis. We didn't really have nice reference sets, like GENIE has today, for what other treatments have worked in patients like him. Could we have detected these resistance mechanisms pre-treatment or during treatment? Could cell-free DNA have been a part of his routine care? It's not currently, but certainly these are technologies that are really coming online. None of these mutations were evident pre-treatment, so this really speaks against using small targeted panels to monitor these tumors, and to the need to sequence even cell-free DNA fairly comprehensively to look for the acquisition of new drivers and mutations. Do you think about serial biopsies or a blood test? These are sort of the state of the art of clinical genomics today. So here's his timeline. Unfortunately, in the end he did enter palliative care until he died, but you can really see the promise of clinical genomics to extend life, certainly quality and extent of life, with knowledge of cancer genome variation over time. So I'll leave it there, and we can take any final questions if there are any. References. So, in practice, I think the way forward is reference sets and bioinformatic algorithms. Exactly. This was a PowerPoint slide that they put together, and they still put it together more or less manually. This is sort of a ripe bioinformatics area for taking
comprehensive genomic data and linking that to drug databases, not just from other patients but also model organisms, cell lines, mice. There are huge high-throughput projects that do pharmacogenomic screening, and all those cell lines have comprehensive genomic profiles. I think this is going to be the next step. Now we are good at generating high-quality, high-throughput genomic data from patients, and, you know, the 50 oncologists have all met in the room, and I think there's a good process to do it now. The big opportunity is to think algorithmically: how can we now take patient data and share it across all the centers, because we can't all sequence everybody and we need these huge numbers, but also essentially build pathways like these? I actually think it's a very attractive way to think about and nominate treatments for individual patients, but I think we need algorithms, and it basically has to scale. It starts with big numbers and algorithms. And I think it's the sort of thing, with the skill set you learn this week, you can do. You can call variants, you can look at DNA and data, so the technical aspect is doable. It's that last mile, okay, now you have a list, how do you interpret it? That's the gap that needs to be bridged. Yes? So the current clinical state of the art is this table I showed a minute ago. It's not a hard and fast rule. Some of it is, you know, reading papers: is there functional evidence, has a clinical trial been done? It's really not necessarily black and white. In the very early days of The Cancer Genome Atlas it was purely frequency: you'd sequence 100 tumors and ask which genes have more mutations than you'd expect by chance given the mutation rate. That was the early definition of what a driver was. That's now become a little more nuanced, because we don't always have tens of thousands of tumors to find a very, very rare but important driver. And the functional data is actually still very, very important: has
there been a kinase assay, has there been a mouse, has there been a cell line? So even this paper at the time was a little bit controversial, I guess, as 'this is the way to interpret a variant,' but it was something, and this is really what we in the clinical labs will follow to interpret a variant. So I guess this is as close to a rule set, a set of criteria, as you're going to get, but that definition is going to continue to evolve over time. I think the genomics data in isolation is very challenging; I think you need that second experiment. You sort of saw in that case report that every time there was a genomics result, there was validation by a FISH result, by Sanger sequencing, by some other secondary piece of data. Even if that secondary piece of data is more genomics, it's DNA plus RNA, those two together make a very strong argument: it's amplified and overexpressed, okay, it's likely to be functionally important in some way. But it's very hard to just say, 'I saw it, therefore it's a driver.' It really needs something more. Yes? Yeah, sure. It's an unanswerable question; I think every lab will really do that in a different way. Often it's what understanding we already have from colleagues, either locally or internationally. The number one thing I find interesting for genes when they're novel is: have they been linked to another disease, and is there a body of literature, even non-cancer, around it? This is all cancer talk, but all these genes have been studied in another context; that's why they're in reference databases. So that really goes to the top of the list: among the genes of unknown significance, the ones that have been worked on in another context are usually where I would start. I guess the other area is, is it being expressed at an abnormally high level given the cancer of origin? So the expression data I find is also the most informative for trying to work out what's important in the
mutation landscape. Yes? Do you think the identity of the driver gene may change along the way? Certainly, in that new drivers arise. It's unusual to see a driver go away; it certainly does happen, especially if there's a large chromosomal deletion. But acquisition of subsequent drivers, especially subsequent subclonal drivers, I think that's going to be the main enemy as we get good at killing off clonal populations. So I think once a driver, you're likely always a driver, but we need the ability to find and monitor additional drivers over time, and not just drivers but also modifiers of existing drivers. I didn't mention this in the talk, but EGFR is the most famous. There are sort of three canonical activating mutations in EGFR, and they all get treated with a kinase inhibitor, but there's a very distinct secondary point mutation, T790M, that increases the affinity of EGFR for ATP instead of the drug. That secondary mutation is not really a driver, but it evades the drug itself. So it's sort of hard to bin everything into passenger or driver, even though I presented it a bit that way. Yes? Not necessarily matched. Matched tissue, especially for cancer patients, is almost impossible to get unless it happens to be normal adjacent. So especially for this type of analysis, tissue-of-origin or cell-of-origin work, you have to rely on public data sets. Where a matched normal is, I would say, basically mandatory is exome and DNA sequencing, because there is just so much germline polymorphism that isn't captured even by the largest polymorphism databases, especially regionally around the world. Certainly the Middle East is not very well represented in dbSNP or in the public genotype databases. It's getting better over time, but whenever we think we've found a variant of unknown significance and don't have a normal, my mind always goes first to: we need the matched
normal to look at it. It almost always turns out to be a polymorphism. So certainly for the DNA, big panel, exome, or genome work, I think you really want a matched DNA control. It doesn't have to be matched to the tissue; we'll always use blood as the matched normal. For RNA, using the public data is really the only realistic way to go. Even with normal adjacent there's always a concern: is it really normal, is there a field effect, are there tumor cells in that normal? It's certainly valuable to have, but for this type of work, especially when you need brain, lung, thyroid, you're not going to get that from one patient, ever. So that's really where you have to use the public data. So matched normal for DNA, public data for RNA. Yeah. So, where's that compendium slide? Yeah, so this is a real Frankenstein. At the time we only had 50 tumors and matched blood. This is really an imperfect reference: some of those tumors easily could have had very high EGFR expression, some of them were lung, and matched blood is very strange as a reference, but you needed something as a reference at the time. So the idea was basically to find some sort of baseline across this scrambled egg of tissue types and then look for wild outliers. And that actually made this outlier really quite striking, because despite 50, you know, cobbled-together tumors and normals, it was still a wild outlier, and that really makes that RET amplification look very important. If we were to do this again, there's such a rich set of data from GTEx and from TCGA that you wouldn't have to take whatever happens to be handy. You could really use a lung reference set, a lung tumor reference set. Actually, this is what they do in the POG project; they will do that cancer by cancer. Yeah? Sorry, I have sort of a nitpicky question about the clinical case. That patient, and the choice of erlotinib: the patient was still EGFR wild type, although overexpressed? Yeah,
that's right, they were wild type. I'm curious about why that was done. Was that before? It wasn't before. The EGFR correlation was known, but it wasn't clinically available, so it's sort of in this gray transition period after the discovery. At the time there was a debate, actually, with a great paper on it, mutations or a FISH story: is it supposed to be amplification or mutation? That was still an open question at the time. So yes, if we were to do this all again, EGFR mutation negative, maybe they wouldn't have considered erlotinib. Okay, great. Well, I'll be around at the break, and enjoy the rest of your week. I'm actually a past student of this program, so I highly endorse it. Thanks, everyone.
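
The compendium idea from the case study and the Q&A (find a baseline across a mixed reference set, then look for wild outliers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the published case report: the expression values, the gene list, the pseudocount, and the cutoffs in `classify` are all hypothetical, chosen so that RET comes out 34 times above the reference median, as described in the talk.

```python
from statistics import median

# Hypothetical expression values in arbitrary units (NOT data from the case report).
# Each list stands in for one gene's expression across a small reference compendium.
COMPENDIUM = {
    "RET":   [5.0, 9.0, 12.0, 7.0, 20.0],
    "SMAD4": [40.0, 50.0, 60.0, 55.0, 45.0],
    "PTEN":  [30.0, 28.0, 35.0, 32.0, 31.0],
    "ACTB":  [100.0, 98.0, 102.0, 99.0, 101.0],
}

# Hypothetical expression values for the patient's tumor.
PATIENT = {"RET": 339.0, "SMAD4": 1.0, "PTEN": 7.0, "ACTB": 100.0}


def fold_changes(patient, compendium, pseudocount=1.0):
    """Patient expression divided by the compendium median, per gene.

    The pseudocount (an assumption) avoids division by zero for unexpressed genes.
    """
    return {
        gene: (value + pseudocount) / (median(compendium[gene]) + pseudocount)
        for gene, value in patient.items()
    }


def classify(fold, high=10.0, low=0.5):
    """Label a fold change using arbitrary, illustrative cutoffs."""
    if fold >= high:
        return "high outlier"
    if fold <= low:
        return "low outlier"
    return "within range"


folds = fold_changes(PATIENT, COMPENDIUM)
calls = {gene: classify(fold) for gene, fold in folds.items()}

for gene in PATIENT:
    print(f"{gene}\t{folds[gene]:.2f}x\t{calls[gene]}")
```

A real analysis would start from normalized expression values and, as suggested in the Q&A, would probably use a tissue-matched reference set rather than whatever samples happen to be on hand; the median baseline here is just one simple choice that is robust to a few extreme samples in a mixed compendium.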