Yes, okay. So I'll be approaching this differently, looking at cancer exomes, specifically in sporadic cancers. I'm a PI who works on melanoma genetics, but this could be applied to other cancer types.

It's well accepted that cancer is a genetic disease. One of the best examples of this is colorectal cancer, where you have a progression from a normal epithelial cell all the way to a metastasis, and these various histological stages are linked to the various molecular changes seen right here. This has become a paradigm for other solid cancers. Even though cancer has been looked at genetically in depth, we are still at the tip of the iceberg, and many more alterations are yet to be found.

So I'll start by talking about the platform setup for finding somatic mutations in cancer genomes. There are three main hurdles: the first is establishing a very high-quality tissue bank, the next is the sequencing of very large quantities of samples, and finally there is the analysis of the large number of base pairs for these mutations. These are just two examples of discoveries made using unbiased sequencing approaches: two kinases that have been found to be highly mutated in different cancer types and that have FDA-approved drugs. BRAF has a recently FDA-approved drug. So even though there are these hurdles, it's obviously worth moving forward and looking for these alterations.

So I'll start talking about the tumor bank establishment.
We have here an example from breast cancer, where you could start off by getting the primary breast tumor, or you could get the metastatic brain tumor. You could get these as fresh tumors or embedded. Some of these tumors could be turned into cell lines, which could be useful for a tumor bank as well, and some of these samples could be turned into a xenograft in an immunodeficient mouse. Once you have this genomic DNA, you can start looking at genome-wide DNA sequencing.

So what are the advantages and challenges of these various sources? The advantage of the fresh frozen tumor, or an OCT block, is obviously that you have highly reliable data. This is closest to the actual tumor that was resected. However, you usually have very limited amounts of DNA from this source. It is heterogeneous, because you'll have different clones of cells populating that tumor, and it is a labor-intensive extraction. It requires a pathologist to find out where the tumor lies, and then you do either macrodissection or laser capture microdissection to get out the tumor cells.

Paraffin-embedded tissue: again, highly reliable data. The challenge, again: you'll have limited amounts of DNA.
It's going to be heterogeneous, and again you'll need a pathologist to determine where the tumor cells are. With paraffin-embedded tissue you will have DNA quality issues as well, because of the fixation and embedding procedures.

A cell line, of course, will give you a lot of DNA. It's going to be mostly homogeneous, because you have a clone of cells being expanded in tissue culture. The extraction is usually very simple. There are cases where it's more of a challenge, for example in melanoma, because melanin is being produced and that can affect PCR reactions, so you have to work on making sure that you don't have melanin in that DNA. And of course the cell line can be useful for downstream functional studies. But we have to make sure that the alterations identified in the cell line do represent the original tumor, so it is worth going back to that original fresh tumor to make sure that the alterations are found there as well.

Xenograft: again, we have a lot of DNA. It's homogeneous.
The extraction is simple. However, again, we need to make sure that the alterations are similar to those in the original tumor. Xenografts are more expensive to make than the other sources. And finally, you will get mouse DNA contamination, which will affect your analysis down the line, so you need to keep that in mind.

Okay, the other part of your tumor bank is your normal tissue. Remember, we're looking for somatic mutations. Usually blood is an excellent source for this DNA. However, it's not always available, and if you're looking at a leukemia, for example, it's of course not going to be relevant. So you could go to neighboring tissue, but there you have to be careful, because you might have some contaminating tumor cells, and now that we're doing in-depth sequencing you will be able to pick up these contaminating tumor cells in the neighboring, supposedly normal, cells.

You want to have as much clinical information as possible about your patients: the date of birth, date of death, date of diagnosis, malignancy stage, the location of the primary and metastatic tumors, and the various therapies that the patients underwent.

This is an example of the tumor bank that we established, but obviously this is applicable to any cancer type. In this case we have the metastatic tumor DNA; the matched normal, which in our case is blood; the OCT block, so the original fresh tumor; as well as the matched cell line, so alterations found in the cell line can always be taken back and tested again in the fresh tumor. The cell line also allows RNA to be made, as well as protein lysates. And finally there is the clinical information.

I'd like to point out that we have three different cohorts here. It's really worthwhile getting several cohorts in order to be able to validate your genetic data. Each of your cohorts has its particular bottlenecks, so you want to make sure that the filters being applied to the various cohorts are not affecting your results.
So it really is worth spending the time on getting these additional cohorts.

Once you have your tumor bank, there are several quality controls worth considering. The first is SNP detection, to make sure that the tumor and the normal tissue are matched. The second is to implement an assay to determine that the tumor fraction is 75 percent or above. That is because if it's below that, there's a chance that you will not see loss of heterozygosity or homozygous alterations. The assay that we use in our case looks for melanoma antigens using immunohistochemistry, but obviously for different cancers different assays would be applied. And a third quality control is the mutation analysis of known mutated genes, to see whether the percentage that you identify in your tumor bank is similar to what's known in the literature.

This is just a schematic of a somatic mutation analysis. You have your patient's tumor resected, you have your normal tissue, and in some cases you can get a cell line. You extract your DNA, you sequence the gene of interest, you do the same for your normal tissue, and then you compare those sequences. A somatic mutation is any difference between the normal and the tumor. In this case we're looking at the coding regions as well as the 15 base pairs flanking the exons, to look for splice-site variations as well.

In the past we used candidate approaches; now, with whole exome and whole genome, we can start being much less biased. As mentioned previously, TCGA, The Cancer Genome Atlas, is applying whole-exome and whole-genome sequencing to cancers. This is probably the largest effort to provide all this data. They said that there will be at least 3,000 new cancer cases by the end of September 2011, and they are on target.

So I'll move on to looking at exomes and touch on this: how do we choose the genomic DNA source when looking at whole exomes?
We have to consider DNA quantity and quality. From the fresh tumor, as I said, we have limited DNA; from the cell line it is unlimited. The fresh tissue is heterogeneous; the cell line is homogeneous. Now, does the cell line recapitulate the tumor biology? There are different ways to find out whether a cell line recapitulates tumor biology. So how do you choose? How do you assess it?

In our case we were fortunate to be able to do a whole-genome analysis, and this led us to decide which DNA to use for whole exomes. In the whole-genome study we compared the fresh tumor to a cell line derived from that fresh tumor, as well as blood. We ran them through the Illumina sequencer and looked for the melanoma somatic variations.

These are the build statistics, and just as Jim and Les mentioned, please note the passing-filter depth of coverage. You see that it's between 34x and 67x. That is much lower than what's required for exomes, because whole-genome coverage is more homogeneous, and you'll see the coverage that we get for whole exomes. This allows for 92% callable genotypes across the entire genome.

The bottom line of this study was that 97% of the alterations identified in the coding region overlapped between the fresh tumor and the cell line. However, the copy number variations were less concordant than the SNVs. Based on this, we decided to use the low-passage cell line DNA and not the fresh tumor, because we were mainly interested in finding the single nucleotide variations. We also knew that we needed at least six micrograms of DNA for the whole exome, and we could derive that from the cell line, whereas getting it from the fresh tumor was more of a challenge. And we knew that we wouldn't have stromal contamination when looking at the exomes. So we decided to do the cell lines.

So this is the study design.
We used exome capture. In this case 14 tumors were done, and in parallel the normals were done as well, so 28 exomes. The SureSelect 37-megabase kit was used, so we captured about 20,000 genes. On the analysis side, ELAND and then cross_match were applied. This was the discovery screen. Every one of these studies is followed by a validation, and the validation was done in this case using Sanger sequencing.

We've seen a couple of these this morning already. This is an example from this study, where you can see the BRAF gene and its exons. You can see the SureSelect oligos covering these various exons, and you see the coverage itself. As I said, the coverage is variable across these exons, and that's why you need to go very much in depth in your coverage.

So what quality tests do you need to do once you get your whole-exome data? These are divided into five, and we'll start off by looking at the coverage. This is an example of some of the samples that we did. As I said, the coverage was extremely high. It was a minimum of 180x, and this allowed a target-region genotype coverage of 90 percent on average. This is the performance summary, where you get 12 gigabases, a depth of 180x, and 90 percent coverage with high-quality genotypes.

That brings us to the next item to apply once the exomes are derived.
It's worth looking at the specificity of your exomes. This is some of the data that we got and how it was transferred to an Excel file. What can be seen here is the gene name, the reference allele, the variant allele, and the same for the amino acids. In addition to this we have the MPG score that Jim and Jamie were talking about earlier, as well as the coverage.

In this case we're using an MPG-to-coverage ratio. What we're seeing is that if you have a ratio of 0.5 and above, and this ratio occurs in both the tumor and the normal sample, then when you look at the Sanger validation, most of the time this is a real somatic mutation. You can rely on this alteration. However, if that ratio is below 0.5 in either one of the samples, so either in the tumor or in the normal sample, Sanger will not show you a real mutation there.

A different way of looking at it is this. If, for example, you take about 90 regions and assess them by Sanger sequencing, they divide like this: the ones that are above the 0.5 ratio will validate, and the ones that are below won't. This allows you to calculate your specificity. So you get a 97.9 percent coverage rate and a 2.4 percent false-negative rate. Once you apply this, you remove 18 percent of the alterations.

Okay. The next quality test worth doing once you get your whole-exome data is to look at the sensitivity. In this case we had already done candidate approaches before doing the whole exome, so we already had 47 somatic substitutions that we expected to see in the whole exome. Out of those 47, only 38 were present in our whole-exome study, which means that we had 81 percent sensitivity. Note that the missed alterations were not missed because they were not captured. They were captured and they were sequenced, but they were simply missed, and we're not sure why at this point.

Okay, so the next item that's worth considering when looking at the exome data is the
number of somatic mutations that are identified per tumor. Shown graphically, these are the various samples that were sequenced, and you can see the total number of mutations in each. There's variation, there's a certain range, but clearly there's one tumor here that has an extremely high number of somatic mutations compared to the others. One needs to identify this. There's obviously a biological reason behind it, but it's worth noting it and deciding whether this is a relevant sample for this particular study.

Now, of course, there could be all kinds of artifacts. These are hard to predict, but it's worth looking out for them, so I'll give you an example. When we looked at the data, we sorted it not only by sample; it's worth sorting it also by chromosome. In this case it was done for patient number nine, and there seemed to be an unusually large number of somatic mutations on chromosome X. So we looked more closely at this, and we found that the genotype on chromosome X in 9N, the normal, was one allele. We knew the patient was a male. But his tumor had two alleles at precisely the same location, as seen in this table.

What's happening here is that there's a copy number variation, and this is an important point, especially when looking at cancer genomes: you do have copy number variations. In this case it could be a Y chromosome deletion and an X chromosome duplication. You could check this, for example, by FISH. We did not, but we do know that there's something going on here. So it is worth investigating the underlying reason before including these chromosome X alterations in patient number nine.

Okay, so we went through these various quality tests.
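The quality checks just described come down to simple arithmetic, so a minimal sketch may help. This is not the pipeline used in the study; the function names, the `(mpg_score, coverage)` tuple layout, and the `chrom` key are hypothetical choices made for illustration. Only the 0.5 ratio threshold and the 38-of-47 sensitivity figure come from the talk.

```python
from collections import Counter


def mpg_coverage_ratio(mpg_score, coverage):
    """Ratio of genotype score to read depth; 0.0 when there is no coverage."""
    return mpg_score / coverage if coverage else 0.0


def passes_ratio_filter(tumor, normal, threshold=0.5):
    """Keep a call only if BOTH tumor and normal reach the ratio threshold.

    `tumor` and `normal` are (mpg_score, coverage) tuples (hypothetical layout).
    """
    return (mpg_coverage_ratio(*tumor) >= threshold
            and mpg_coverage_ratio(*normal) >= threshold)


def sensitivity(found, expected):
    """Fraction of previously known somatic substitutions recovered by the exome."""
    return found / expected


def mutations_per_chromosome(calls):
    """Tally calls by chromosome so an outlier (e.g. chromosome X) stands out."""
    return Counter(call["chrom"] for call in calls)


if __name__ == "__main__":
    # 38 of the 47 known substitutions were recovered in the exome data.
    print(f"sensitivity: {sensitivity(38, 47):.0%}")
    # Made-up calls for one sample, only to show the tally.
    calls = [{"chrom": "X"}, {"chrom": "X"}, {"chrom": "7"}]
    print(mutations_per_chromosome(calls).most_common())
```

Sorting the tally by count, as `most_common()` does, is one way to spot a chromosome that carries far more calls than the rest before deciding whether those calls belong in the analysis.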
Let's get to the actual study. As I said, we have a discovery screen and then we have the validation screen. We select the tumors and then we do the exome capture of both the tumor and the normal samples. It is worthwhile doing the normal in parallel rather than relying, for example, on dbSNP, for the various reasons that Jim, Jamie, and Les were talking about. Here's an example: we first did the 14 samples, and we found over 300,000 variants versus the reference sequence. But once we applied the normal information, it went down to about 5,000. So yes, it is worth doing the normals as well.

Here's the validation screen. You go on to validate interesting genes in a larger number of samples. I put an X here; it obviously depends on the number of samples that you have, but the more the better. Here you need to compare the gene mutation frequency to the expected background, and we'll get back to this later in my talk. It's very important to do this, because it will allow you to find out whether the mutated genes carry passenger mutations or are candidate cancer genes.

Again, I'm emphasizing the discovery and the validation screens. Given budgetary constraints, it is worth doing it exactly like this: going for the discovery, and then applying validation by scaling up just a subset of the interesting genes.

Okay, so let's go into the filtering process in more depth. The number of potential somatic mutations is seen here. We filtered it through dbSNP and went down to 141,000. Then we applied the somatic filter.
This brought it to 58,000. Now, because of the limitations of capture and sequencing, some of the alterations that are actually polymorphisms could come up in another normal sample that's not necessarily the matched normal. It's worth taking that into account, because such an alteration is going to be a polymorphism. So when you find that alteration in another normal, again, you filter it out. Then you remove all the non-coding alterations, and you get down to about 5,000 alterations.

This is recapitulated in this part of the slide. We have about 5,000 alterations. When you apply the MPG-to-coverage ratio of 0.5, as I said, we remove about 18 percent, so we have about 4,000 alterations. And this is the way the alterations divide into missense, nonsense, splice-site, insertions, deletions, and synonymous.

It is important to keep track of the synonymous mutations as well, because then you can find the ratio of non-synonymous to synonymous mutations. This is important because this ratio is expected to be two to one if there's no selection. However, if there's a significant difference from the two-to-one ratio, it suggests that there has been some sort of selection for these mutations, and so this particular gene probably has a role to play in the cancer. So obviously, keep track of these synonymous mutations.

Okay. What kind of data do we need to evaluate the drivers and the passengers? Which genes really have a role to play, and which are just there and have a neutral effect? This is a challenging question. It's being dealt with a lot by the field, and this is just an attempt to answer the question.
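Before turning to that question, the bookkeeping behind the last filtering step and the non-synonymous to synonymous ratio can be written down in a few lines. This is only a sketch: the function names are hypothetical, the per-category counts in the usage example are invented for illustration and are not the study's numbers, and counting only missense and nonsense substitutions as non-synonymous is an assumption of this sketch.

```python
NEUTRAL_NS_RATIO = 2.0  # the two-to-one ratio quoted in the talk for no selection


def remaining_after_ratio_filter(n_calls, removed_fraction=0.18):
    """Number of calls left after the 0.5 ratio filter removes a given fraction."""
    return round(n_calls * (1 - removed_fraction))


def ns_ratio(counts):
    """Non-synonymous to synonymous ratio for coding substitutions.

    Assumption: missense and nonsense count as non-synonymous; splice-site
    changes and indels are left out of the ratio.
    """
    nonsynonymous = counts.get("missense", 0) + counts.get("nonsense", 0)
    return nonsynonymous / counts["synonymous"]


if __name__ == "__main__":
    # About 5,000 calls with 18 percent removed leaves roughly 4,000.
    print(remaining_after_ratio_filter(5000))

    # Hypothetical counts, chosen only to demonstrate the calculation.
    example = {"missense": 2600, "nonsense": 150, "synonymous": 1375}
    ratio = ns_ratio(example)
    print(f"N:S = {ratio:.2f} (neutral expectation {NEUTRAL_NS_RATIO:.1f})")
```

A ratio near 2.0 for the whole exome set is what one would expect if most calls are passengers; the same function can be applied per gene, where a marked departure from 2.0 is the signal worth following up.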
The question can be answered by statistics, by informatics, and by functional studies.

If we look at statistics first: as I said, there is the non-synonymous to synonymous ratio. If you look at the full exome study it was two to one, suggesting that most of the alterations that were identified are indeed passengers.

Next, mutations above the background mutation rate. When you're doing your exomes, keep track of this, because the background mutation rate is the number of mutations per megabase of DNA derived from all your exomes. Of course, if this is in the literature, it is worth comparing it to your numbers as well. For melanoma the number was 11.4 mutations per megabase.

Okay, the next step is to see whether you have any hotspot mutations, meaning the exact same alteration in the exact same amino acid in different samples. The likelihood of this happening is extremely low, so again, it suggests this would be a driver. And then you look for the highly mutated genes.

Searching for the hotspots, in our case we found nine novel genes with recurring mutations. Obviously this is only 14 samples, so we scaled it up in additional sets, and you can see it went up to a very large number of samples. This is a summary of the hotspots and how they scaled up. This one is a non-synonymous alteration. It's new, it occurs in our sample set six times, and in one of the cases it's found in a commercial cell line. It is worth including commercial cell lines in your exomes, or at least in the validation step, because that allows you to do functional studies later on, and it also allows the rest of the community to actually have a sample that has the alteration you identified and validated, and to do some functional studies as well.

Another interesting point here is that we found some synonymous mutations that scaled up: a synonymous mutation that occurred in exactly the same position in three different samples. This is interesting because it is not going to affect the protein per se. But what is it doing?
It is being selected. So it is worth capturing these as well and maybe following up on them.

This is a schematic of one of these hotspots, and this is a statistical test to see what the probability is of this happening by chance. In order to calculate this you need the background mutation rate that we talked about before, and you also obviously take into account the number of samples that were sequenced. In this case the likelihood is extremely low, so this is being selected for.

And what is it doing to the function of this protein? We know that this protein is a histone acetyltransferase, and we know that when it's disrupted in mice it causes lethality and defects in cell cycle progression. Bioinformatics obviously is worth applying in this case. You can see that the particular residue is highly conserved in various orthologs, again suggesting that it is very important.

It's hard to predict which gene is going to be found in the exomes, but if at all possible it is worth doing some functional analysis to completely validate it and see whether the mutation has an effect on the actual function. This is an example of a functional study that we did, where we knocked down this particular protein in cell lines that were either wild type or mutant for the protein. We used shRNA and then checked the effect of this knockdown on cell growth and apoptosis. When we knocked it down in cell lines that had the mutation, we could see an increase in apoptosis when the cells were growing in low serum. But this did not happen when we knocked down the protein in cells that were wild type for this protein. This is what's called oncogene addiction. The cells are dependent on this particular mutation.
And so when you knock it down, they're more sensitive and they will die. So it is really worth trying to do some functional studies on the alterations identified.

Okay, the other side: looking for highly mutated genes. In this case there were 16 highly mutated genes. Now, what do I mean by highly mutated? We looked to see which genes were mutated in at least two samples out of the 14 that were exome captured. And it wasn't enough to just look at the percentage of mutations; we also needed to do a binomial test, where you take into account the background mutation rate and the size of the transcript of the particular gene.

Again, when we found these 16 genes, we validated them, so we scaled them up in additional samples, seen here. So here you can see number four. When you do this p-value calculation, always take the longest transcript of the particular gene, in order to really push this formula to its extreme, and then you take into account the background mutation rate as well.

Here is what you get when you consider only the percentage of tumors that are affected and don't consider that p-value. If you sort by percentage, you see these various genes that come up, and they're highly mutated. But you immediately see that, for example, titin is in this list. Titin is the largest gene in the human genome. It's a false positive.
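One way to see why a very large gene ranks high by percentage yet fails the binomial test is to sketch the calculation. The exact formula used in the study is not given in the talk, so the model below is an assumption: the background rate (11.4 mutations per megabase, from the talk) is treated as a per-base probability per tumor, the chance of at least one hit in a transcript is derived from its length, and the p-value is the upper binomial tail over the sequenced samples. The two transcript lengths in the usage example are hypothetical round numbers, not measured values.

```python
from math import comb


def prob_gene_hit(transcript_len, rate_per_mb=11.4):
    """Chance that one tumor carries at least one background mutation in a
    transcript of `transcript_len` bases (assumed model, see lead-in)."""
    per_base = rate_per_mb / 1e6
    return 1.0 - (1.0 - per_base) ** transcript_len


def binomial_upper_tail(k, n, q):
    """P(X >= k) for X ~ Binomial(n, q)."""
    return sum(comb(n, i) * q**i * (1.0 - q) ** (n - i) for i in range(k, n + 1))


def gene_p_value(mutated_samples, n_samples, transcript_len, rate_per_mb=11.4):
    """Probability of seeing this many mutated samples from background alone."""
    q = prob_gene_hit(transcript_len, rate_per_mb)
    return binomial_upper_tail(mutated_samples, n_samples, q)


if __name__ == "__main__":
    # Hypothetical lengths: a very long transcript versus an ordinary one,
    # each mutated in 2 of 14 samples.
    for label, length in [("very long gene", 100_000), ("ordinary gene", 2_300)]:
        print(f"{label}: p = {gene_p_value(2, 14, length):.4f}")
```

Under these assumptions, two hits in 14 samples are essentially guaranteed by chance for the very long transcript, while the same count in the shorter one is much less likely, which is the effect that using the longest transcript is meant to guard against.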
Titin shows up simply because it's such a large gene and there's such a large background mutation rate; of course it's going to be highly mutated. However, when you start applying this p-value and you sort your data, you get a completely different table. In this case you get BRAF at the top of the list. It's known to be highly mutated and has the lowest p-value, and then you get these additional genes.

You do have a positive control here, because when you scale up in additional samples you can see that there's a concordance between the percentage identified in the larger number of samples and the percentage in the original 14 samples. So we knew that we were probably going in the right direction.

Then, when you look at the actual genes, we went for the second best. BRAF, of course, is very well characterized. GRIN2A was never shown to be mutated previously in melanoma, at least not highly mutated. So we focused on this, and we did what I talked about earlier: we looked at additional cohorts to see if we could see a similar percentage in additional samples. You see that indeed, when you look at these other cohorts, it is mutated there as well, and we have the commercial cell lines here, as I suggested earlier. So it is important to acquire these additional patient cohorts.

This is a schematic of the alterations in this particular protein. I'm putting it up here because, once you have a gene of interest, it's worth looking up the particular alterations in COSMIC. COSMIC is a database developed by the Sanger Institute. It summarizes published and unpublished alterations identified in different cancer types, so if you find a particular alteration in your gene of interest in COSMIC, it is likely to be important. It's always worth going back to COSMIC. In this case we found two alterations in exactly the same position, and then we found a third one in COSMIC. So it's likely that this particular alteration is important.

Okay, what do we do with the
complex exome? We've been talking about using cell lines, but what happens once you start using the fresh tumor? We don't have much experience with this; we have only started looking at it. We do know that when you apply the MPG-to-coverage ratio, we can find particular mutations in the normal tissue, not only in the tumor. For example, we found the BRAF V600E mutation in the normal. So clearly there's contamination of our normal sample with tumor cells, which is what I said could be a problem. Also, there's probably going to be heterogeneity. We haven't determined this, but that's something to keep in mind. So when looking at fresh tissues, different algorithms and different filters will probably have to be applied.

Okay, I'd like to remind you that we've been looking at a large number of genes in the genome and a lot of their exons. You can look at genes, but you can also start looking at pathways. This was done, for example, for pancreatic cancer, where 12 core pathways were identified as altered in this particular cancer type. That means that if you look at two different patients, even though they have different mutations in different genes, these genes still affect the exact same 12 core pathways. So it is really worthwhile applying your exome data to pathway analysis and seeing what kinds of pathways come up as significant.

Now, is it worth delving deeper? We talked about 14 exomes. Once you complete your first set of exomes, how do you know whether it's worth doing more or whether you are done?
This is just a graph showing the mutations per megabase identified for different cancer types. Since we focus on melanoma, I have these numbers right here, derived from different studies. What seems to be the case is that, at least for melanoma, there's a very large number of mutations per megabase, in some cases an order of magnitude higher. So it does seem that it's worth doing more exomes, just because there are going to be many more passengers. When you're doing your exomes, it's worth finding out what the mutation rate per megabase is, comparing it to other cancer types, and seeing whether it's worth delving even deeper.

So what are our future challenges? Obviously, finding out which of all these alterations are the drivers and which are the passengers. How do we analyze and interpret all the data? The functional studies are pretty important, but how do we do them in a high-throughput fashion? And ultimately, how do we apply all this data back in the clinic?

Once we have the genes we get the pathways, but we could also integrate all of this further and see how the various pathways cross-talk. This is the kind of database we're hoping to derive from many, many exomes.

At this point I'd like to acknowledge all of the people on this slide. Thank you.