Thank you for the opportunity to present part of our work. I'm Evert van den Broek, and I work at the Department of Pathology of the VU University Medical Center in Amsterdam, the Netherlands. My supervisors are Gerrit Meijer and Remond Fijneman. This presentation is about structural variant detection in colorectal cancer. The project is supported by CTMM, the Center for Translational Molecular Medicine, and by the Cancer Center Amsterdam (VUmc-CCA).

Colorectal cancer is a major health care problem, with a worldwide incidence of 1.2 million patients each year; in the US alone the incidence is almost 150,000 patients. There is an inverse relation between the stage of the disease and the survival rate, and in total approximately 40% of all colorectal cancer patients will die of metastatic colorectal cancer. In our group we focus on biomarker discovery. This is a picture of the adenoma-to-carcinoma progression. We need diagnostic biomarkers for early detection of colorectal cancer, and we need prognostic and predictive biomarkers; I will focus on the prognostic biomarkers. About 85% of all colorectal cancers exhibit chromosomal instability, resulting in gains and losses of chromosomal segments. This is a spectral karyotyping (SKY) plot of a highly aberrant colorectal cancer cell line, and you can clearly see both numerical and structural variants. For example, on chromosome 2 more copies of a piece of DNA are present in this tumor than you would expect in a normal situation. There is also a translocation between chromosome 10 and chromosome 23; the red part belongs to chromosome 10.

The CAIRO and CAIRO2 studies were phase III clinical trials performed in the Netherlands, and in total 1,575 patients were included.
These studies focused on chemotherapy in metastatic colorectal cancer. The CAIRO study was published in The Lancet, and the CAIRO2 study in the New England Journal of Medicine. From this patient cohort we have DNA from 356 patients, which is a representative group. The DNA was isolated from FFPE material of the primary tumors and of metastases. All 356 DNA samples were analyzed on an Agilent 180K array for comparative genomic hybridization (array CGH). After segmentation and calling, we can find the copy number changes, the numerical aberrations, and I will explain this in the next few slides. This is the profile of a tumor. On the y-axis you see the log2 ratio of tumor DNA compared with normal DNA, and on the x-axis all the chromosomes. Each black dot represents a probe, and the colored lines are the segment values. We use the CGHcall package to define gains and losses: green means a gain in the tumor compared with normal, and red means a loss. So the aim of my project is to identify the recurrent somatic structural genomic variants that cause colorectal cancer. We used the CGH profiles from this large cohort of 356 CAIRO samples. After segmentation, we identified the breakpoints and merged them per gene, which gave us a list of candidate genes potentially involved in structural variants. Here is the CGH plot again, with the definition of a breakpoint. A breakpoint is defined by the start position of the first probe of each segment; it suggests an underlying chromosomal break that could disrupt the normal architecture and the normal function of a gene. In total we found 5,737 genes with one or more breakpoints, and 482 genes, our candidate genes, had a recurrent breakpoint at a false discovery rate below 0.1.
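The breakpoint definition described in the talk (start of the first probe of each segment, then merged per gene) can be illustrated with a minimal Python sketch. It assumes segments are given as chromosome plus first- and last-probe positions and genes as simple intervals; `Segment`, `Gene`, `segment_breakpoints`, `samples_per_gene`, and the gene `GENE_A` with its coordinates are all hypothetical, and the recurrence test at FDR < 0.1 is not reproduced because the talk does not specify it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    chrom: str
    start: int  # position of the first probe in the segment
    end: int    # position of the last probe in the segment


@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def segment_breakpoints(segments):
    """Breakpoints = start position of the first probe of each segment.

    Assumption: the first segment on each chromosome is skipped, because its
    start is just the first probe of that chromosome, not a break.
    """
    by_chrom = defaultdict(list)
    for seg in segments:
        by_chrom[seg.chrom].append(seg)
    breaks = []
    for chrom, segs in by_chrom.items():
        segs.sort(key=lambda s: s.start)
        breaks.extend((chrom, s.start) for s in segs[1:])
    return breaks


def samples_per_gene(samples, genes):
    """Merge breakpoints per gene: gene name -> set of samples hit at least once."""
    hits = defaultdict(set)
    for sample_id, segments in samples.items():
        for chrom, pos in segment_breakpoints(segments):
            for gene in genes:
                if gene.chrom == chrom and gene.start <= pos <= gene.end:
                    hits[gene.name].add(sample_id)
    return dict(hits)


# Hypothetical gene and coordinates, for illustration only.
genes = [Gene("GENE_A", "20", 14_000_000, 16_000_000)]
samples = {
    "s1": [Segment("20", 100_000, 14_999_999), Segment("20", 15_000_000, 60_000_000)],
    "s2": [Segment("20", 100_000, 60_000_000)],
}
print(samples_per_gene(samples, genes))  # {'GENE_A': {'s1'}}
```

With roughly 70 kb between probes, the reported position is only an estimate: the true break lies somewhere between the last probe of the previous segment and the first probe of the next.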
This bar graph shows the 50 most affected candidate genes, with the number of affected samples in the array CGH data on the y-axis. MACROD2 is the most prominent gene: it is affected in about 40% of all samples. We have clinical data available, so we asked whether this has an effect on survival. This is the Kaplan-Meier plot: the red line represents the patients lacking a breakpoint in MACROD2, and the blue line the patients with a breakpoint in MACROD2. So we have our candidate list, but there are some important limitations of array CGH data. The first is probe density: the average distance between probes is 70 kb, so the breakpoint location is only an estimate. Second, we have no insight into the structure of the DNA, so we don't know which parts are stitched to what, and all copy-number-neutral events, the balanced events, will be missed. So we need to validate the candidates, and next-generation sequencing can help with these problems. We used The Cancer Genome Atlas colorectal cancer samples that were whole-genome sequenced with paired-end sequencing, and we only used the tumor-normal pairs. We developed our own in-house, candidate-driven algorithm for structural variant detection, and we selected genes lacking a breakpoint as negative controls. Our algorithm is mainly based on a read-pair approach. The criteria for discordance, listed here, are based on the location of the reads, the bridge length, and the orientation of the reads. These are the discordant pair types: a translocation when the mate reads are aligned to different chromosomes; an insertion or a deletion based on the bridge length; and an inversion or an eversion based on the orientation of the reads. A singly mapped read could also indicate a breakpoint.
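The discordance criteria (location, bridge length, orientation) can be sketched as a small classifier. This is not the authors' algorithm: it assumes a standard forward-reverse (FR) paired-end library, treats the bridge length as the distance between the mates' leftmost aligned positions, and uses an illustrative cutoff of the mean insert size plus or minus `k` standard deviations. `ReadPair` and `classify_pair` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadPair:
    chrom1: str
    pos1: int                  # leftmost aligned position of read 1
    reverse1: bool             # True if read 1 aligned to the reverse strand
    chrom2: Optional[str]      # None if the mate is unmapped
    pos2: Optional[int]
    reverse2: Optional[bool]


def classify_pair(pair, mean_insert, sd_insert, k=3.0):
    """Return a discordant-pair type, or 'concordant'.

    Assumes an FR library; the mean +/- k*sd thresholds are illustrative.
    """
    if pair.chrom2 is None:
        return "single_mapped"   # one mate unmapped: possible breakpoint
    if pair.chrom1 != pair.chrom2:
        return "translocation"   # mates on different chromosomes
    # Order the mates by position so orientation reads left to right.
    (lpos, lrev), (rpos, rrev) = sorted(
        [(pair.pos1, pair.reverse1), (pair.pos2, pair.reverse2)]
    )
    if lrev == rrev:
        return "inversion"       # both mates on the same strand
    if lrev and not rrev:
        return "eversion"        # reverse-forward: mates point away from each other
    distance = rpos - lpos       # approximate bridge length
    if distance > mean_insert + k * sd_insert:
        return "deletion"
    if distance < mean_insert - k * sd_insert:
        return "insertion"
    return "concordant"


print(classify_pair(ReadPair("20", 1_000, False, "8", 5_000, True), 350, 40))
```

In practice the expected insert-size distribution would be estimated per library from the concordant pairs; the fixed values here only keep the example self-contained.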
We combined this read-pair approach with a read-depth analysis, and we defined the breakpoint location using the soft-clipped and matched parts of the reads. Finally, we determined the tumor-specific events. This is an example of a translocation in MACROD2. The colored reads are the translocation discordant pairs (DPs), and here you can see the fusion partner. Additional evidence is that there is a clear breakpoint on both sides of the event, in the gene itself and in the fusion partner gene. Overlapping reads with the same DP type were grouped together, and the distribution of these DP groups is shown in this pie chart. The largest share of all DP groups are deletions; then come the eversions, inversions, and insertions, and 8% of all DP groups are translocations. These data cover all samples and all candidates, and they are preliminary. In this presentation I will focus on translocations. In our candidate genes we found a 5-fold higher number of translocation DP groups than in our control genes. This is the distribution of the translocation DP groups (y-axis) over all candidate genes (x-axis); zooming in on the first 20, we see that MACROD2 is again the most prominent one. We combined these data in this plot: on the y-axis the frequency of translocation DP groups, and on the x-axis the frequency of affected samples in the array CGH breakpoint analysis. MACROD2 is in the upper right part of the graph. Some genes correlate very nicely, and there are other clouds; that is work in progress, and we want to know what these candidates are. These are all the candidate genes. To conclude, we identified 482 candidate genes with recurrent breakpoints in a cohort of 356 colorectal cancer samples based on array CGH breakpoint analysis.
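Grouping overlapping discordant pairs of the same type into DP groups, as described in the talk, is essentially interval merging per type. The sketch below is a simplification under stated assumptions: each pair is a hypothetical `(dp_type, chrom, start, end)` tuple for the region it covers on one side of the event, and translocation pairs are clustered on that side only, whereas a fuller approach would also require their mates to cluster on the partner chromosome. The function name and data are hypothetical.

```python
from collections import Counter, defaultdict


def group_discordant_pairs(pairs):
    """Group discordant pairs (DPs) of the same type whose regions overlap.

    Each pair is a hypothetical (dp_type, chrom, start, end) tuple.
    Returns a list of DP groups, each a list of pairs.
    """
    by_key = defaultdict(list)
    for pair in pairs:
        by_key[(pair[0], pair[1])].append(pair)
    groups = []
    for key in sorted(by_key):
        current, current_end = [], -1
        for pair in sorted(by_key[key], key=lambda p: p[2]):
            if current and pair[2] <= current_end:
                # Overlaps the running group: extend it.
                current.append(pair)
                current_end = max(current_end, pair[3])
            else:
                if current:
                    groups.append(current)
                current, current_end = [pair], pair[3]
        if current:
            groups.append(current)
    return groups


pairs = [
    ("deletion", "20", 100, 400),
    ("deletion", "20", 300, 700),
    ("deletion", "20", 2000, 2300),
    ("translocation", "20", 350, 650),
]
groups = group_discordant_pairs(pairs)
print(Counter(g[0][0] for g in groups))  # Counter({'deletion': 2, 'translocation': 1})
```

Counting groups per type in this way gives the kind of distribution shown in the pie chart; comparing such counts between candidate and control genes would give an enrichment ratio like the 5-fold difference reported for translocations.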
TCGA provides an essential colorectal cancer reference data set to validate our candidate genes, that is, to validate the structural variants at the candidate breakpoints. Breakpoint identification based on array CGH correlates with structural variant detection in the TCGA data. Further studies will be performed to investigate the clinical and functional significance of the validated candidate genes. Thank you, that was my presentation.

And on time. Questions? Yes.

This is Angelo from Harvard. My first question is: did you use the low-pass whole-genome samples for your validation? And second, can you comment on how many TCF7L2 fusions you found with your method? We found them in the marker paper, and the Meyerson group previously published TCF7L2 fusions.

We are digging into the data at the moment. We did indeed use the low-coverage samples, and the most challenging part of using low-coverage data is the statistical analysis; that is one of the problems. So we tried to find recurrent events over the whole sample set. Because our candidate list is based on the breakpoint analysis, we can be very sensitive in our computational methods. We didn't find the fusion gene you mentioned. We only have three high-coverage samples, so I don't know whether one of those samples would harbor this fusion gene.

Last quick question, and a quick answer.

Hi, I thought it was a great talk. I wondered whether you had looked at the RNA-seq data from these TCGA colorectal samples to see the effect on the transcripts. Is that the next step?

We haven't done that yet, but we will.

Thanks. That's an excellent example of how we hope, increasingly, that TCGA data will be used in the context of other sorts of studies, including ones designed to answer clinical questions.