Okay, first I would like to thank the organizers for giving me the opportunity to present our work here. Today I'm going to be talking about our single-molecule epistate method for analyzing multiple-tumor methylome sequencing data. The USC/Johns Hopkins team led by Peter Laird and Stephen Baylin has, over the last years, done DNA methylation profiling for about 10,000 samples, mainly on the microarray platform. Today I'm going to be talking about a completely different platform, namely whole-genome bisulfite sequencing, which gives us a profile of 28 million individual CpGs along with sequence variation information. We completed the sequencing of 47 TCGA patient samples, with on average 15x sequencing coverage per sample, and as you can see in the table here, they cover a set of nine cancer types.
Our group is developing a method for identifying recurrent differentially methylated regions, which I will refer to as DMRs from now on, across multiple tumors, based on a single-molecule approach to whole-genome bisulfite sequencing. Whole-genome bisulfite sequencing offers a great but also challenging opportunity to study tumor heterogeneity, which is reflected in the bisulfite-converted short reads. You can see here an example of three loci. Each locus consists of eight CpGs in the columns, and each row represents a short read, where a methylated CpG stays the same and is denoted by a red circle, whereas an unmethylated CpG is converted and denoted by a blue one. If you just compute the methylation rate, you will see that all three loci have about 40% methylation. However, the nature of the data is very different, as you can see. Amos Tanay's group proposed the concept of epipolymorphism, which measures the differences among the within-read patterns in a set of bisulfite sequencing reads. And you can see here that the first locus has no epipolymorphism, because the reads are all the same.
The second one is highly polymorphic, because the reads are very different from each other, whereas the third one is somewhere in between. We propose to use epistates to summarize the differences between bisulfite sequencing read patterns. You can see here that in the first two loci we infer a single epistate, with different methylation levels, and in the third one we have two distinct epistates. We use expectation maximization to learn the epistate mixtures, a method that was previously used successfully to infer allelic methylation.
Once we have the epistates, we estimate the epistate frequencies for each individual sample. Again using this example, let's say we have a pair of colon samples, one normal and one tumor. If we pool the reads, we learn two epistates, namely epistate alpha and epistate beta here, and you can see that epistate alpha is less methylated whereas epistate beta is more methylated. The method also gives us the posterior probability that a read originated from either one of the two epistates, and you can see here that in the normal sample all the reads are likely to have originated from the unmethylated epistate, whereas most of the tumor reads are from the methylated epistate. Then we compute the average of these probabilities, which we call the epistate frequency, and you can see here that for the methylated epistate beta the normal sample has almost zero percent whereas the tumor sample has 83 percent.
Then we use the epistate frequency to identify differentially methylated regions. Here is another example of two loci in six samples from two tissues, colon and lung; each cancer type has one normal and two tumors. Using the epistate frequency, we can identify the loci that have two epistates as DMR loci.
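The pooling, EM, and epistate frequency steps just described can be illustrated with a small sketch. This is not the speaker's implementation. It assumes each epistate is a vector of independent per-CpG methylation probabilities (a two-component Bernoulli mixture), uses a deterministic starting point for reproducibility, and runs on toy reads invented for illustration. The names `fit_two_epistates` and `epistate_frequency` are hypothetical.

```python
import math

def fit_two_epistates(reads, n_iter=50, eps=1e-6):
    """EM for a two-component mixture of independent per-CpG Bernoullis.

    reads: list of equal-length 0/1 lists (1 = methylated CpG).
    Returns (weights, profiles, posteriors); component 0 plays the role of
    the less methylated epistate (alpha), component 1 of beta.
    """
    n, n_cpg = len(reads), len(reads[0])
    # Assumption for this sketch: seed the two profiles from the least and
    # the most methylated read instead of using random starts.
    by_level = sorted(reads, key=sum)
    profiles = [[0.25 + 0.5 * c for c in by_level[0]],
                [0.25 + 0.5 * c for c in by_level[-1]]]
    weights = [0.5, 0.5]
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior probability that each read came from each epistate.
        posteriors = []
        for read in reads:
            logp = []
            for k in range(2):
                lp = math.log(weights[k])
                for c, p in zip(read, profiles[k]):
                    lp += math.log(p if c else 1.0 - p)
                logp.append(lp)
            top = max(logp)
            unnorm = [math.exp(x - top) for x in logp]
            total = sum(unnorm)
            posteriors.append([u / total for u in unnorm])
        # M-step: re-estimate weights and profiles (eps keeps values off 0/1).
        for k in range(2):
            nk = sum(post[k] for post in posteriors)
            weights[k] = (nk + eps) / (n + 2 * eps)
            for j in range(n_cpg):
                mj = sum(post[k] * read[j] for post, read in zip(posteriors, reads))
                profiles[k][j] = (mj + eps) / (nk + 2 * eps)
    return weights, profiles, posteriors

def epistate_frequency(posteriors, labels, sample, k=1):
    """Average posterior of epistate k over the reads of one sample."""
    vals = [post[k] for post, lab in zip(posteriors, labels) if lab == sample]
    return sum(vals) / len(vals)

# Toy reads over 8 CpGs (invented data): a normal and a tumor sample, pooled.
normal = [[0] * 8] * 4 + [[0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0]]
tumor = [[1] * 8] * 3 + [[1, 1, 1, 0, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 0, 1], [0] * 8]
reads = normal + tumor
labels = ["normal"] * len(normal) + ["tumor"] * len(tumor)

weights, profiles, posteriors = fit_two_epistates(reads)
freq_normal = epistate_frequency(posteriors, labels, "normal")
freq_tumor = epistate_frequency(posteriors, labels, "tumor")
print(round(freq_normal, 3), round(freq_tumor, 3))
```

In this toy data five of the six tumor reads are methylated, so the frequency of the methylated epistate comes out near zero in the normal sample and near 5/6 in the tumor sample, mirroring the 0 versus 83 percent contrast in the talk's example.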
A DMR can be a tumor-specific epistate, as you can see in the first locus, because the methylated epistate can only be found in the tumor samples, whereas the second locus is a tissue-specific epistate, because the methylated epistate is only found in the colon samples here. So in overview, we developed a method called epistate that pools the reads from both normal and tumor samples. Then we use a window about the length of a read, around 200 bases, to scan the whole genome. We ask whether each locus has two epistates or not, and if so we determine whether it is tumor-specific or tissue-specific for further downstream analysis. Today I will present some of our results from the tumor-specific analysis.
First I want to present how we use the epistate frequency to estimate tumor purity. Here is one of the cancer-specific loci we have: a locus of 170 base pairs with about 70 CpGs. We infer two epistates, the unmethylated epistate and the methylated epistate you see here. We found no methylated epistate in either leukocytes or adjacent normals. Then we look at two colon tumors, and we see the methylated epistate appear in the first colon tumor at 76 percent, whereas the second one has 30 percent. In the whole genome there are about 16,000 loci like that. If we draw a scatter plot of the epistate frequencies, we see three distinct clusters, indicating that a locus is either methylated in one of the two tumors or methylated in both tumors. And if we draw the density plot, it allows us to compute the mode of the frequency distribution, and from that we can estimate the tumor purity. For example, the first colon tumor has about 80 percent purity and the second has about 40 percent. To see whether we do a fair job, we compare our results with the ABSOLUTE estimates. As you can see here, the x-axis is the estimate from our method and the y-axis is from the ABSOLUTE method.
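The purity estimate described above, taking the mode of the epistate frequency distribution over tumor-specific loci, can be sketched as follows. The talk does not specify how the density or the mode is computed, so the Gaussian kernel, the bandwidth, the grid, and the `min_freq` cutoff for dropping loci not methylated in this tumor are all assumptions, as is reading the mode directly as purity. The function names and the frequencies are made up for illustration.

```python
import math

def density_mode(values, bandwidth=0.05, grid_step=0.01):
    """Location of the highest point of a Gaussian kernel density on [0, 1]."""
    grid = [i * grid_step for i in range(int(round(1.0 / grid_step)) + 1)]

    def density(x):
        # Unnormalized kernel density; normalization does not move the mode.
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return max(grid, key=density)

def estimate_purity(frequencies, min_freq=0.1):
    """Mode of the methylated-epistate frequency over tumor-specific loci.

    Loci below min_freq are treated as not methylated in this tumor and are
    excluded (an assumption of this sketch, not a step stated in the talk).
    """
    methylated = [f for f in frequencies if f >= min_freq]
    return density_mode(methylated)

# Invented per-locus frequencies for one tumor: most cluster near 0.8.
frequencies = [0.02, 0.0, 0.05, 0.78, 0.80, 0.82, 0.79, 0.81, 0.45, 0.83, 0.77, 0.80]
purity = estimate_purity(frequencies)
print(round(purity, 2))
```

The intuition is that at a tumor-specific locus only reads from tumor cells can carry the methylated epistate, so the frequency where most such loci pile up tracks the fraction of tumor cells in the sample.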
Our results are highly correlated with the ABSOLUTE estimates, except for this sample, which appears to have very few copy number changes, and that may affect the accuracy of the ABSOLUTE estimate. One of the advantages of our method is that we can detect epistates at very low frequencies. Here is one example I want to show you: we inferred two epistates at a locus that is only methylated in the colon samples. We have three colon samples here; this one has 40 percent of the methylated epistate, but the last two samples have very low frequencies, namely only 30 and 15 percent methylated epistate. And there is almost no methylated epistate found in the adjacent normals or in the other tumors.
The next and last example I want to show you is an advantage of the sequencing approach over the array approach: we found a large number of DMR loci located at distal elements. Here is one example: a 300 base pair locus where we inferred two epistates. We found no methylated epistate in leukocytes or adjacent normals, nor in some of the tumors, like colon here, but we found the methylated epistate at different frequencies in other tumors, for example the breast and endometrial samples. We wondered where this locus is in the genome, so we looked at the genome browser and saw that this locus lies 5,000 base pairs upstream of the gene called SMAD3 and overlaps an enrichment of the histone mark H3K27 acetylation, which indicates that this region is an active regulatory region. That led us to check the transcriptional activity of this gene, where we found a nice inverse correlation between the epistate frequency, on the x-axis here, and the gene expression level from RNA-seq, on the y-axis here, across samples from nine cancer types. This indicates that this potential enhancer region plays a role in regulating the function of this Wnt signaling and cell cycle gene in a certain number of cancers.
So that brings me to our summary. We developed a single-molecule epistate method for analyzing multiple whole-genome bisulfite sequencing data sets, and our method can be applied to estimate tumor purity as well as to find interesting differentially methylated regions, such as regulatory regions, at very low frequencies. I would like to end my presentation by acknowledging the people who contributed to the work, especially Ben, Peter and Huay. Our work was sponsored by the TCGA grant. Thank you for your attention.
Thanks. We have time for a question or two.
I had a question. When you're learning the epistates, you mentioned this EM, expectation maximization, step. Do you often find you'll converge to a strong answer? Do you have to rerun this thing over and over again? Can you say a little bit about the method there?
When we run the EM, we use the Bayesian information criterion to assess whether a locus has two epistates or not. We first ask whether it has one epistate, then whether it has two epistates, we evaluate each model using the BIC, and from that we determine whether it has two epistates or not.
Do you find you have to rerun it, though, to get any different answers?
Yeah, I mean, we have multiple starting solutions, and then we run it until it converges.
Questions? Thanks, Huay, again. Thanks.
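The answer above (fit a one-epistate and a two-epistate model, compare them with the BIC, and run EM from multiple starting points until convergence) can be sketched like this. It is an illustration under assumptions, not the speaker's code: each epistate is modeled as independent per-CpG Bernoulli probabilities, the parameter count used in the BIC penalty is this sketch's choice, and the function names, the smoothing constant `eps`, and the toy reads are hypothetical.

```python
import math
import random

def read_log_probs(read, weights, profiles):
    """Log of (weight * likelihood) of one read under each epistate."""
    return [math.log(w) + sum(math.log(p if c else 1.0 - p)
                              for c, p in zip(read, prof))
            for w, prof in zip(weights, profiles)]

def log_likelihood(reads, weights, profiles):
    total = 0.0
    for read in reads:
        lps = read_log_probs(read, weights, profiles)
        top = max(lps)
        total += top + math.log(sum(math.exp(x - top) for x in lps))
    return total

def run_em(reads, k, rng, max_iter=200, tol=1e-8, eps=1e-3):
    """One EM run from a random start; returns the final log-likelihood."""
    n, n_cpg = len(reads), len(reads[0])
    weights = [1.0 / k] * k
    profiles = [[rng.uniform(0.2, 0.8) for _ in range(n_cpg)] for _ in range(k)]
    prev = -math.inf
    ll = prev
    for _ in range(max_iter):
        # E-step: posterior of each epistate for each read.
        post = []
        for read in reads:
            lps = read_log_probs(read, weights, profiles)
            top = max(lps)
            unnorm = [math.exp(x - top) for x in lps]
            s = sum(unnorm)
            post.append([u / s for u in unnorm])
        # M-step with light smoothing so probabilities stay off 0 and 1.
        for j in range(k):
            nj = sum(p[j] for p in post)
            weights[j] = (nj + eps) / (n + k * eps)
            profiles[j] = [(sum(p[j] * r[i] for p, r in zip(post, reads)) + eps)
                           / (nj + 2 * eps) for i in range(n_cpg)]
        ll = log_likelihood(reads, weights, profiles)
        if abs(ll - prev) < tol:  # stop once the likelihood has converged
            break
        prev = ll
    return ll

def best_bic(reads, k, n_starts=10, seed=0):
    """BIC of the best of several random restarts (lower is better)."""
    rng = random.Random(seed)
    ll = max(run_em(reads, k, rng) for _ in range(n_starts))
    n_params = k * len(reads[0]) + (k - 1)  # assumed parameter count
    return -2.0 * ll + n_params * math.log(len(reads))

def has_two_epistates(reads):
    return best_bic(reads, 2) < best_bic(reads, 1)

# Invented loci: one with two clearly distinct read populations, one uniform.
mixed_locus = [[0] * 8] * 10 + [[1] * 8] * 10
uniform_locus = [[0] * 8] * 20
mixed_call = has_two_epistates(mixed_locus)
uniform_call = has_two_epistates(uniform_locus)
print(mixed_call, uniform_call)
```

Keeping the best log-likelihood across restarts guards against a single run that stalls in a poor local optimum, which is the concern raised in the question.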