Good morning. Everyone had a good rest yesterday after the long lab? Was it good, the open lab, up until nine o'clock? Everyone enjoyed it? Cool. Okay, so you spent a lot of time yesterday hearing about and exercising in the field of biomarker discovery. Sohrab did a great job giving you an overview of what a biomarker is, what it's for, and introducing you in full detail to the process of discovering biomarkers and therapeutic targets. So let's suppose we have identified our biomarker, it has an excellent profile, and we invest a number of years into its development, and then spend some 200 million dollars going through all the clinical trials and the whole ordeal of FDA approval. We put it into clinical practice, and then it turns out that it's not effective for all patients with the disease, and moreover, patients show the full spectrum of reactions to this particular drug: adverse side effects on one hand, or lack of therapeutic effect on the other. That's why I think it is really important to take into account all aspects of a future personalized medicine, which includes the discovery of novel biomarkers, the development of new drugs, and the application of those drugs to actual patients. So in the first part of this module, I'm going to focus on pharmacogenetics and pharmacogenomics: what they are and how they evolved, because they have basically transformed our understanding of clinical intervention from the silver bullet to personalized medicine. Then I will touch a little on genome-wide association studies and how they have proved useful for pharmacogenomics and pharmacogenetics. Then I will move on to cancer genomics as a special case, cancer being the most complex human disease. And in the second part of this module, I will talk about analytical techniques for clinical data.
So you heard a lot about whole-genome and whole-transcriptome data yesterday and the ways of analyzing it. Today we're going to talk about clinical data, and specifically about one type of clinical data, survival data, and about survival analysis, because it's a special type of data that requires special analytical techniques, and these widely used techniques are what I'm going to cover. So pharmacogenetics and pharmacogenomics is the study of inherited variation in drug response. This drug response can range from life-threatening adverse effects on one end of the spectrum to an equally serious lack of therapeutic effect on the other end. Over the last half-century, pharmacogenetics has evolved from a study focused on monogenic traits (pharmacogenetics proper) to pharmacogenomics, with a whole-genome perspective. The earliest experimentally validated examples of pharmacogenetics were reported back in the 1950s and 1960s. They grew out of the observation that there are large differences in response to standard drug doses within a population of patients. One example was a short-acting muscle relaxant given at a standard dose to a number of patients. In some of them, it caused a serious, potentially lethal side effect: prolonged muscle paralysis. Investigation of the metabolism of this drug led to the discovery of a genetic variation in the enzyme metabolizing it, which produced a dysfunctional form of the protein in those patients. Another example, reported at about the same time, was an anti-tuberculosis drug showing a bimodal distribution of plasma concentration in patients, and those different concentrations were related to the risk of adverse effects.
Thorough investigation of the metabolism of this drug led to the discovery, again, of genetic variation in the activity of an enzyme, NAT2, which was responsible for catalyzing the metabolism of this drug. These observations served as a stimulus for additional studies in the area. But again, the focus of pharmacogenetics was a particular factor, in most cases pharmacokinetic, that is, drug metabolism, and it would usually be the typical phenotype-to-genotype approach that is widely used in human genetics. The following examples illustrate the evolution of pharmacogenetics from biochemical to molecular pharmacogenetics; these are the icons of that era. First, the CYP2D6 story. This gene belongs to the cytochrome P450 family of microsomal drug-metabolizing enzymes. It catalyzes the biotransformation of a wide spectrum of drugs, antidepressants and antiarrhythmic drugs among them, and it also activates the prodrug analgesic codeine. Thorough investigation of this gene in a population of patients showed a great frequency of genetic variants within it. Some patients had non-synonymous coding SNPs associated with decreased activity of the protein, some had a gene deletion, and others had gene duplications, up to 13 copies. This figure shows the frequency distribution of the ratio of the drug to its metabolites within a population of patients, and you see a few groups here. The major group was called extensive metabolizers, but there were two extreme groups, called poor metabolizers and ultra-rapid metabolizers. Now you can imagine that the poor metabolizers can suffer from an excessive drug effect when given something like an antidepressant. But in the case of codeine, they may lack the therapeutic effect, because this protein is supposed to metabolize codeine into morphine.
Conversely, the ultra-rapid metabolizers may suffer from a lack of therapeutic effect of an antidepressant, but at the same time they can be overdosed with codeine, because they metabolize codeine into morphine too rapidly. There were cases of respiratory arrest in such patients who were given standard doses of a cough suppressant containing codeine. Another example is the TPMT story. This protein catalyzes the S-methylation of thiopurine drugs, which are cytotoxic and immunosuppressive. They are used to treat acute lymphoblastic leukemia of childhood and some other conditions, including inflammatory bowel disease, and are given to organ transplant recipients. Unfortunately, these drugs have a very narrow therapeutic index, meaning that the difference between the concentration of the drug necessary to achieve a therapeutic effect and the concentration that causes toxicity is very small. The most serious toxicity induced by thiopurine drugs is life-threatening myelosuppression, that is, bone marrow suppression. It turned out that there is a genetic polymorphism inactivating TPMT through protein misfolding, and that polymorphism was associated with an increased risk of myelosuppression. Patients who carried that mutation had to be given one-tenth to one-fifteenth of the standard dose. This was the first example of pharmacogenetic data being included in a drug label, and this is very important, because you can imagine putting a new drug into clinical practice and, as I mentioned before, finding that it has very serious adverse effects in a large fraction of the population, or that it is ineffective, and you cannot really predict what dose to give a patient.
Then, at the end of the day, having gone through the ordeal of developing and testing this drug and putting it into the clinic, you have to withdraw it. This is a very painful process, and it was a real breakthrough that the FDA started hearings on this topic; TPMT was the first example. This figure shows the frequency distribution of TPMT activity levels in red blood cells. The inactivated isoform is designated as low activity, and the wild type as high activity. These patients were homozygous for the low-activity allele, these were heterozygous, and these were homozygous for the wild-type allele. So pharmacogenetics has been focusing on monogenic Mendelian traits, and mostly on the pharmacokinetic aspects of a drug. But it is becoming increasingly evident that most human diseases are polygenic traits, and the response to a given therapy is also governed by many factors. The brutal fact is that even when you know a factor that plays some role in an adverse reaction to a drug, it does not explain 100% of the variability in drug response. With the following example, I'm going to show you that not only pharmacokinetic but also pharmacodynamic factors influence the efficacy of a drug and the side effects of therapy. This is the EGFR story. The epidermal growth factor receptor is overexpressed in non-small cell lung cancer, and a number of inhibitors were developed against this receptor. One of them was gefitinib, a small-molecule tyrosine kinase inhibitor. It was noted that there was a wide range of response to this drug, and the best responders were women, never-smokers, and patients of East Asian origin. When researchers started to probe different populations, it turned out that there was genetic variation in the target of this drug, EGFR, in the ATP-binding site, and this mutation was activating, so patients with this mutation responded better.
Within the North American population there was a very low frequency of this mutation, and a very high frequency in the East Asian population; and again, a very high frequency in responders versus a very low frequency in non-responders. That is how a direct link was discovered between drug response and this mutation in the drug target, which is a pharmacodynamic factor. Here is another story, the warfarin story, which is probably the ultimate example. Warfarin is the most widely prescribed oral anticoagulant, and unfortunately it has serious adverse effects: hemorrhage on one side and undesired coagulation on the other. This drug is predominantly metabolized by a cytochrome P450 family member, CYP2C9 (a different one from CYP2D6, but from the same family). It was found that two common polymorphisms were present within this gene, and they were associated with decreased activity of the protein: one decreased activity down to 12% of the wild type, and the other down to 5%. The frequency of these polymorphisms was about 10%. Unfortunately, this pharmacokinetic genetic variation did not explain most of the variance in response among patients given warfarin. For quite a long time the target of this drug was not known, until around 2004, when the gene was identified and cloned. This is VKORC1. No non-synonymous SNPs were found within this gene, but several haplotypes were discovered that were associated with the final dose of warfarin. The warfarin story represents, probably in a simplified manner, the polygenic model that we must expect to see in the future. This model includes both pharmacokinetic factors, which alter the metabolism and the final concentration of a drug, and pharmacodynamic factors, which affect the drug target and the pathways downstream of the target.
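To make the polygenic idea concrete, here is a minimal sketch of what a combined pharmacokinetic-plus-pharmacodynamic dose model could look like. The function name and every coefficient below are invented purely for illustration; real warfarin dosing algorithms are regression models fit to large patient cohorts, not these numbers.

```python
# Hedged sketch of a polygenic dose model in the spirit of warfarin dosing:
# the predicted maintenance dose combines a pharmacokinetic factor (number of
# reduced-activity CYP2C9 variant alleles) and a pharmacodynamic factor
# (number of low-dose VKORC1 haplotype alleles). All numbers are invented.

def predicted_weekly_dose(base_dose_mg, cyp2c9_variant_alleles, vkorc1_low_dose_alleles):
    """Each variant allele multiplicatively lowers the predicted dose."""
    dose = base_dose_mg
    dose *= 0.8 ** cyp2c9_variant_alleles   # slower metabolism -> less drug needed
    dose *= 0.7 ** vkorc1_low_dose_alleles  # more sensitive target -> less drug needed
    return dose

# A wild-type patient keeps the base dose; a patient carrying one CYP2C9
# variant and two VKORC1 low-dose alleles needs a much smaller dose.
wild_type = predicted_weekly_dose(35.0, 0, 0)
sensitive = predicted_weekly_dose(35.0, 1, 2)
```

The point of the sketch is only the structure: two independent genetic axes, one acting on drug concentration and one on the target, multiply together to determine the dose.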
Yesterday you heard a lot about different approaches for the detection of new biomarkers using different platforms and techniques. Here I'm just going to touch a little on genome-wide association studies. There have been numerous publications using this approach, but I'm going to give you an example of how this type of study can be used directly in the clinic. What is a genome-wide association study, just to remind you? It is an examination of genetic variation across a given genome, designed to identify genetic associations with observable traits. This type of analysis requires two groups of participants: subjects with the disease (cases) and subjects without the disease (controls). First you genotype each individual, and then you test the association of a set of markers, SNPs, with the disease or trait. This is just a list of some of the recent publications where GWAS has been used to detect polymorphisms that could contribute to the risk of a number of diseases, from diabetes to breast cancer and others. Unfortunately, the problem with GWAS studies is that the odds ratios are usually pretty low. But there is one notable exception, and this is the statins story. Statins are a class of drugs used to control cholesterol levels; they are HMG-CoA reductase inhibitors. The serious side effect of these drugs is myopathy. A clinical trial called SEARCH was commenced to investigate the effect of drug dose on patient response, and as part of this trial the researchers investigated patients who had serious myopathy. They had 85 patients with this condition versus 90 controls, and the GWAS genotyping gave one single polymorphism in a transporter gene, SLCO1B1, which had an incredibly high odds ratio, and this is the p-value. For a genome-wide association study, this is a relatively small sample size.
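The case-control test at the heart of a GWAS can be sketched in a few lines. The 2x2 table below uses hypothetical carrier counts, not the actual SEARCH data, and `odds_ratio` and `chi_square_2x2` are illustrative helper names; in practice each of the million SNPs on the chip gets a test like this, which is why genome-wide significance thresholds are so stringent.

```python
# Minimal sketch of the case-control association test behind a GWAS hit.
# The 2x2 table counts carriers of the risk allele among cases (myopathy)
# and controls; all counts here are hypothetical.
def odds_ratio(case_carrier, case_noncarrier, ctrl_carrier, ctrl_noncarrier):
    """Odds of carrying the risk allele in cases relative to controls."""
    return (case_carrier * ctrl_noncarrier) / (case_noncarrier * ctrl_carrier)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = 0.0
    for obs, exp in [(a, row1 * col1 / n), (b, row1 * col2 / n),
                     (c, row2 * col1 / n), (d, row2 * col2 / n)]:
        stat += (obs - exp) ** 2 / exp   # sum of (O - E)^2 / E over cells
    return stat

# Hypothetical counts: 60 of 85 cases carry the allele vs 20 of 90 controls.
or_est = odds_ratio(60, 25, 20, 70)
chi2 = chi_square_2x2(60, 25, 20, 70)
```

A large chi-square statistic on a table this size corresponds to a very small p-value, which is what made the statin hit stand out despite the small cohort.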
Anyway, they proceeded with this finding and validated it in a much larger cohort of about 10,000 patients. They got a lower odds ratio, but it was still 2.6, and the authors stated that they could explain more than 60% of the variability in drug response using this polymorphism. Now I would like to change topic a little and talk about cancer, which is the most devastating and most complex human disease. Here I'm going to talk about cancer genomics, the study of the human cancer genome. It is a search, within cancer families and patients, for the full collection of genes and mutations, both inherited and sporadic somatic, that contribute to the development of cancer cells and to progression from a localized cancer to one that grows uncontrolled and metastasizes. So what is cancer? I think everyone is now convinced that cancer is not driven by a single oncogene or tumor suppressor gene. Although there are still adepts of that theory, it is widely appreciated that cancer is a polygenic disease with many factors involved. This is a typical karyotype of a cancer cell, from the breast cancer cell line MCF-7; the different colors here mark portions of different chromosomes. This is a SKY (spectral karyotyping) image, and you see a great deal of aberration within the genome. Aberrations in cancer happen on practically all levels of cellular function, starting at the genome level: genomic aberrations such as inversions, copy number changes, deletions, and translocations, and then point mutations within important genes. The next level is changes at the transcription level; those are the differentially expressed genes that we talked about yesterday in great detail. Then we have changes in splicing: aberrant splicing due to mutations, or altered regulation of splicing. There are also changes on the epigenetic level.
And of course there are changes on the protein level. In addition to the wide diversity of events taking place on different levels in cancer, there is another complexity: the reshuffling of the genome within the cancer cell. This example shows how complex an amplicon can be within a particular cancer. This is a breast cancer cell line, and this is an amplicon on chromosome 20q. Sequencing of this amplicon gave this structure: you see a lot of different segments, in this case mostly from the same chromosome but also from another chromosome, concatenated with each other. This finding was validated using PCR across the junctions and breakpoints. Cancer is also a very heterogeneous disease. This is a typical copy number frequency plot, and you see copy number increases and deletions. These aberrations do not occur in 100% of cases of the same disease. This is also seen in the expression profiles, where you often see heterogeneity, and sometimes you see subtypes of a particular tumor. The HER2 and trastuzumab story is the first story of the application of cancer genomics to the discovery of a drug and carrying it all the way through to the clinic. The HER2 receptor is a cell surface receptor tyrosine kinase, a member of the ERBB family; there are four members within this family, and this is one of them. Overexpression of this receptor results in activation of intracellular signaling through the RAS-RAF-ERK and PI3K-AKT pathways, promoting cell division and cell growth and inhibiting apoptosis. Back in 1987, Dr. Slamon, an oncologist, made a striking observation: HER2 overexpression was highly correlated with patient survival and time to relapse. He started to investigate it, and he found overexpression of HER2 in 25 to 30 percent of breast cancer patients.
The association with the shortest survival and relapse times was significant. A little later, in 1990, Genentech developed antibodies against this receptor; it was breakthrough work in which they humanized a murine antibody within some 10 months and produced a humanized monoclonal antibody against the HER2 receptor. But they were testing this antibody on a broad spectrum of breast cancer samples, and they got effectiveness in only a few percent of patients. Then they were approached by people who knew about the methods developed to detect HER2 amplification with FISH and immunohistochemistry. Once they coupled these methods of detecting the amplification in breast cancer with the antibody treatment, they instantly got an excellent response in those patients who had an amplification of HER2. So trastuzumab, together with the screening test for HER2 amplification, went into clinical trials already in 1992, and now the use of tests for HER2 expression status, and of Herceptin in combination with other drugs, is the standard of care. This slide shows clinical trials with Herceptin in patients who had HER2 amplification; these were highly selected patients. Here you see monotherapy with Herceptin, with a response rate of about 10 to 30 percent. Then there were a number of trials using combination therapy of Herceptin with these different drugs. Within these two clinical trials they had a control cohort given the drug without Herceptin, and this again is the response rate. You see that combination therapy with Herceptin in HER2-positive patients gave a response rate of up to 70 percent, and that was an exceptional breakthrough.
But it also points out the necessity of approaching a disease from multiple aspects, using different drugs against different targets, to achieve a desirable response rate. This was the first example, and it set the road in this direction. Right now a great number of studies are being published devoted to predicting patient outcomes based on some set of biomarkers. Yesterday you heard about this paper; I'm just going to remind you about it for a few minutes. They used an integrative analysis of copy number and expression to predict the outcome of breast cancer patients. The expression signatures of breast tumors showed distinct subtypes, including basal, luminal, and ERBB2. As Sohrab mentioned yesterday, that is a little bit of an overstatement, because there is no striking clustering in this expression data. That is slightly a separate question, but I have to mention that these subclasses of breast cancer had been reported before, and the same subgroups are observed within model systems such as breast cancer cell lines. These patients had also been treated with aggressive therapy, which might contribute to clustering like this. But anyway, you can see some clusters here, including basal and ERBB2. What is the survival experience of these groups? It's shown here. This is a Kaplan-Meier curve; I'm going to cover this method later, and we will be practicing on this particular data set, producing the same Kaplan-Meier curves. What it tells you is that ERBB2 has the poorest prognosis, and the basal subtype too. This is based on the expression profiles. This figure shows a clustering of copy number changes within the same tumors, and what you can see is that there are a number of high-level amplifications within the luminal subgroup.
When they stratified patients into those who had these high-level amplifications and those who didn't, they got these survival curves. So even within a particular expression-based subtype, they could further stratify patients into poorer and better outcome groups based on the copy number changes. This is just one example of the sort of research taking place right now. The question now is: are we really managing cancer, and where are we in this fight? If you look at the statistics (this comes from the Canadian Cancer Society) on incidence and mortality rates for a few selected cancer types, there is actually a very interesting trend. Let's look at the males. These are different cancers; on the left you have incidence, on the right you have mortality. This is lung cancer: the incidence rate goes down because of increasing awareness of smoking as a risk factor. Here is prostate cancer, which has a peak around 1992, when the PSA screening test went into the clinic; all of those patients who were underdiagnosed before were suddenly diagnosed here. Then you see a decline, because over time it became evident that the test they were using was not adequate, so a lot of patients were actually overdiagnosed at that point; that's why it came down. Then it went up again, meaning they came up with an improved screen that gave this incidence rate. The mortality from lung cancer goes down, which is good. And the mortality from prostate cancer (it's not on this slide) is also going down in males, which is great. What do we see in females? Again, very interesting data. Here we have breast cancer, the incidence rate and mortality. The mortality goes down a little because of improving diagnostic procedures.
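The Kaplan-Meier curves used to compare these patient groups can be computed with a short function. This is a bare-bones sketch on toy data, not the breast-cancer set used in the lab; times are in months, and `event = 1` means death while `event = 0` means the patient was censored (still alive at last follow-up).

```python
# Minimal sketch of the Kaplan-Meier product-limit estimator for
# right-censored survival data.
def kaplan_meier(times, events):
    """Return [(time, S(time))] at each observed death time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data if tt == t]  # everyone with this time
        deaths = sum(tied)
        if deaths:
            surv *= 1 - deaths / at_risk         # product-limit step
            curve.append((t, surv))
        at_risk -= len(tied)                     # deaths and censored leave
        i += len(tied)
    return curve

# Six toy patients: deaths at 6, 10, 15, 25 months; censored at 7 and 19.
curve = kaplan_meier([6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1])
```

Note the key design point: censored patients do not drop the curve, they only shrink the at-risk denominator for later deaths, which is exactly why survival data needs its own estimator rather than a simple fraction-alive calculation.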
Sometime around 1985, modern mammography went into the clinic, increasing the rate of detection of early-stage breast cancers. The interesting thing is that the incidence rate of lung cancer in females actually goes up, and mortality goes up. If you look at all of the cancers together (these are males, these are females; this is the overall curve, and this is the starting point, the baseline rate as of 1980), you see an increase in the incidence rate and in mortality. But if you correct for the aging population and for population growth, you see that incidence goes up just a little in males and the mortality rate goes down a little, primarily due to prostate cancer and lung cancer. In females, the incidence rate is pretty much flat over time, and mortality also stays about the same. This illustrates that there is still a keen need for novel prognostic and therapeutic biomarkers, and that we need to understand in greater detail what happens when a patient is given a particular therapy, and which particular therapy is best for this patient or that patient. So what have we learned from this part? The major message is that modern technology and biological knowledge have transformed our perception of clinical intervention. It has become clear that there is no one-size-fits-all decision for every patient; medicine should be personalized. Most diseases are polygenic traits, and there is an increasing appreciation that genetic and genomic data should be used in drug labels. Modern technologies have enabled scanning of whole genomes and transcriptomes for additional or better prognostic and therapeutic targets.
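The "corrected for the aging population" curves come from direct age standardization: age-specific rates are averaged using the age structure of a fixed standard population, so a population that is merely getting older does not show a spurious rise in cancer rates. Here is a minimal sketch with two age bands; all numbers are hypothetical.

```python
# Direct age standardization: weight each age band's rate by a fixed
# standard population's age structure, not the current population's.
def age_standardized_rate(cases_by_age, person_years_by_age, std_weights):
    """Weighted average of age-specific rates under a standard population."""
    rates = [c / py for c, py in zip(cases_by_age, person_years_by_age)]
    total = sum(std_weights)
    return sum(r * w for r, w in zip(rates, std_weights)) / total

# Two hypothetical age bands: a low-rate younger band (10 cases per
# 100,000 person-years) and a high-rate older band (50 per 50,000).
rate = age_standardized_rate([10, 50], [100_000, 50_000], [0.7, 0.3])
```

Because the weights are frozen, two calendar years can be compared fairly even if the real population has aged between them.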
We also saw that the integration of data from multiple levels, including genomic, expression, epigenetic, and histopathological data, increases our power of discovery. But it is also clear that much remains to be done to put the principles of personalized medicine into practice. We need new biomarkers, we need new analytical methodology, and moreover we need new legislation for including genetic data in healthcare. And as for cancer, there is still a long way to go to manage it well enough. Now I will take questions. All right, now for something completely different, once again. This part is really meant to be interactive, more of a group discussion, so I hope everyone can offer their perspective and opinions as I go through this material. We are entering a brand new world in terms of data generation. Traditionally in bioinformatics, most studies up until just a few years ago were focused on model organisms and on the single, anonymized genome of the Human Genome Project, and bioinformatics has really gained strength from data being put into public repositories as it is generated, freely available for research. But we have entered a new era where, if we simply do the same with human subjects, we risk introducing complicating factors: identifying individuals who should remain anonymous. So what this session is meant to do is promote awareness, since we are all working with clinical data, of the care and precautions that need to be taken when handling it. We will go over what identifying data means in research, and then the policies of three example organizations: the ICGC, the European Genotype Archive, and The Cancer Genome Atlas project. We'll also talk a little about controlled, or tiered, access to data.
So we've come to a situation where our community is starting to generate lots of data for the purpose of studying cancer and other human diseases. As I said before, this introduces a bit of a conundrum, or dilemma: we want to make advances in research, and we have seen many examples showing that sharing data in the research community, where international collaborators and indeed anyone can download the data, has tremendous benefits for analysis. The more groups that look at the data, the more we will be able to extract from it. So we want to promote this, while still protecting the privacy of the people who actually donated the samples. This is an issue that in bioinformatics has not come up until the last few years, and we as the data handlers really have to be quite careful about it. Hospitals and physicians have had to deal with this for a long time, but as researchers we now need to become aware of it too. Genetic data is identifiable. What that means is that with just a very few polymorphisms, on the order of tens, you can uniquely identify an individual. We are generating, say, a million SNPs on a chip, but all it takes is a few tens of carefully selected polymorphisms to identify that individual. Why is that a problem? We want to preserve the anonymity of the donor in order to avoid things like personal embarrassment, legal or financial consequences, stigmatization, or even discrimination: insurance companies, employment, loans, and so on. There's a very nice paper that outlines this problem.
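A back-of-envelope calculation shows why a few tens of SNPs are enough. Assume, for illustration only, that each chosen SNP genotype matches a random unrelated individual with probability about 0.5 (the real value depends on the allele frequencies of the SNPs you pick) and that the SNPs are independent; the function and all numbers below are assumptions for the sketch.

```python
# Rough model: with k independent markers each matching a random person
# with probability p, the expected number of other people worldwide who
# share the whole profile is population * p**k.
def expected_coincidental_matches(n_snps, population=7_000_000_000, p_match=0.5):
    """Expected number of other people sharing the same genotype profile."""
    return population * (p_match ** n_snps)

few = expected_coincidental_matches(10)    # still millions of matches
many = expected_coincidental_matches(40)   # essentially unique worldwide
```

So under these rough assumptions, ten SNPs leave millions of plausible matches, but by forty SNPs the profile is effectively unique on the planet, which is the quantitative core of the identifiability concern.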
I recommend that you read it; it was published in Science in 2007, and it talks about the balance point between conducting research so as to get the most out of the data and protecting the privacy of the patient. Maybe I'll just pause there and take any questions or comments on this whole issue. What do people think about this era we are entering, and what kind of consequences concern you? Do you have personal stories about data that you want to share? I'll open it up to the floor: what are people concerned about, or not concerned about? If you had donated your samples to research, to what extent would you want your privacy maintained? [Audience member: I don't generally think that much about it; people are not as aware of the privacy of the patient as they are of the impact of the analysis.] So I came across a piece of news, and I'm not sure what the status of it is, I'd appreciate it if anybody knew: there is a bill put forward in the US about discrimination based on genetics. Does anybody know about that? Yes, did it actually pass recently? So there are laws now that are catching up to this issue. Any other comments or questions on this? So that's really the setup one needs to have. I recently had my laptop stolen, and you can imagine how many genetic samples were actually on it. But because I'm on the research side of things, there was no harm that could possibly be done in terms of revealing personal information about the patients, because everything was under anonymous identifiers, and only a few people at the cancer agency actually hold the keys to de-anonymize those identifiers.
I'm sure we all work in institutions where this is safeguarded, but the issue arises when we write papers and want to deposit data in public repositories: it would take some great heroic effort, but it is possible for people to potentially re-identify it. Yes, you would need a matched sample with the linking personal information. It has also been said that this problem might be overstated, that maybe it isn't such a huge problem. What this paper discusses, though, is the growing body of databases, in the police system and other institutions like that, where they hold the samples, the results of genetic profiling, and the personal information together; if there is a breach there, it's a big issue. Okay, so identifiable data and privacy law. Generally speaking, identifiable data does fall under the privacy laws of most countries, and what this results in is controlled and conditional release of data; such data is not available for public release. The question is: does this impact research? As I said before, research and science have benefited tremendously from freely accessible data. Consider the Human Genome Project: as the data was being generated, people were able to analyze it within literally 24 hours of its generation, and labs around the world, the whole scientific community, were trawling through this data as it was produced. Then we have other examples like GenBank, which contains all the sequences that have been generated and put in a public repository; it has been an absolutely invaluable resource when trying to functionally characterize your sequence, or what have you. These are rich sources of data, and we probably want to do something like this for clinical genomics. But will this model work? That's the question.
And so I think we're still all navigating this space. But yeah, please. So one of the things that people do is make, for example, birth dates less precise, so they just give the year in which someone was born, and they certainly don't put an address in the clinical data, things like that. But absolutely, even if you don't have precise information, it doesn't take many variables to specifically narrow down an individual, so that's a very good point. So I just want to go over now how the large-scale data providers are actually dealing with this problem. Here's the International Cancer Genome Consortium. They have a white paper that outlines their policy, and again, I encourage you to visit this because there are some interesting ideas in it. Let's just go over the core bioethical elements in this document. It says, for prospective research, ICGC members should convey to potential participants that the ICGC is a coordinated effort among scientific research projects being carried out around the world. Participation in the ICGC and its component projects is voluntary. Samples and data collected will be used for cancer research, which may include whole genome sequencing. The patient's care will not be affected by their decision regarding participation. So basically, the patient has to understand that this is a donation; there is no communication back to the patient at all. The samples collected will be in limited quantities, access to them will be tightly controlled, and access will depend on the policy and practices of the ICGC member project. At least a small percentage of the samples may be shared with international laboratories for the purpose of performing quality control studies.
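The generalization idea raised here, keeping only the birth year and dropping direct identifiers like names and addresses, can be sketched in a few lines. The field names below are hypothetical, purely for illustration:

```python
def deidentify(record, drop=("name", "address", "postal_code"), date_field="birth_date"):
    """Strip direct identifiers and coarsen the birth date to a year."""
    out = {k: v for k, v in record.items() if k not in drop}
    if date_field in out:
        out[date_field] = out[date_field][:4]  # "1972-03-15" -> "1972"
    return out

patient = {"name": "J. Doe", "address": "12 Elm St",
           "birth_date": "1972-03-15", "diagnosis": "AML"}
print(deidentify(patient))  # {'birth_date': '1972', 'diagnosis': 'AML'}
```

Note that this only removes direct identifiers; as the discussion points out, the remaining quasi-identifiers (year of birth, diagnosis, and so on) can still combine to narrow down an individual.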
Data derived from the samples collected and data generated by the ICGC members will be made accessible to ICGC members and other international researchers through either an open or a controlled access database, under terms and conditions that will maximize participant confidentiality. I was speaking with Francis about this at dinner last night. So the freely available data, through the open database, is going to be pooled data: you'll have, for example, all the patients with a particular cancer, and the SNPs will be pooled together and released as frequency estimates. So it's a summary of pooled data, whereas the individual-level data will be under controlled access. Those accessing data and samples will be required to affirm that they will not attempt to re-identify participants, and there's a declaration that there is a remote risk of being identified from data available in the databases. And then, just to finish off: once data is placed in open databases, the data cannot be withdrawn later. Once it's there, it's available and you can't take it back. In controlled-access databases, the links, the local data that could identify an individual, will be destroyed upon withdrawal, but data previously distributed will continue to be used. ICGC members agree not to make claims to possible intellectual property derived from primary data, and no profit from eventual commercial products will be returned to subjects donating samples. So these are the types of guidelines and rules that are governing the ICGC. Are there any comments or questions on this? This is literally something that I just pulled out of this white paper, but I think it's actually quite relevant to this workshop. Any questions? Yeah, so I'm not sure what implications that has for the ICGC. Hopefully Francis is going to be here this morning, but maybe he'll turn up a little later.
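The open-tier pooling described here, turning individual SNP genotypes into cohort-level allele frequencies, can be sketched like this; the two-letter genotype encoding is an assumption for illustration:

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Pool individual genotype calls at one SNP (e.g. "AG") into allele frequencies."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# Ten patients' genotypes at a single SNP; only this summary goes in the open tier,
# while the per-patient calls stay under controlled access.
cohort = ["AA", "AG", "AG", "GG", "AA", "AG", "AA", "GG", "AG", "AA"]
print(allele_frequencies(cohort))  # {'A': 0.6, 'G': 0.4}
```

The point of releasing only the summary is that no row corresponds to an individual, which is what makes the open tier compatible with the confidentiality terms above.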
But I can give you a perspective from BC. With our ethics review board, we had to keep going back over and over again, and eventually I think their top ethicist said, actually, you don't need to do that, because our patients had in fact consented to essentially any kind of experimental analysis. But generally speaking, I think you're right that most review boards really do require consent for specific testing, so I'm not sure how they're going to reconcile that. The point was made from the floor that it depends on the consent form: if the consent form states that you may access clinical information from the patient's chart, and the patient has consented to that, then you are free to go back to the chart for future research; if the consent form does not cover that, you cannot go back to the chart, and you would have to ask the patient to sign a new consent form. Great, well, thanks for that perspective. Okay, so here's The Cancer Genome Atlas and what its governance says. The TCGA pilot project anticipates that its data will be of high value in a number of research areas and will be used in many ways.
Those include, but are not limited to, the development of new analytical methods; identification of the genomic etiology of individual tumor types and subtypes; and development of new experimental diagnostic, therapeutic, and preventative approaches and strategies for cancer. Thus, the TCGA project recognizes that the data should be available to all users for any purpose, limited only by the need to avoid identifiability of the research participants, and they cite the same paper that I cited earlier. Now, this one seems much more relaxed in terms of its governance: what they're basically saying is that as long as de-identification is maintained, the data should be made releasable. Yeah, please. A point from the floor: the problem with this type of data release is one of control. If you no longer have access to the patient, you don't know that patient's future. You may be assuming someone is a healthy control, and later they develop the disease; if the sample has been anonymized and you've lost the link back to the patient, you lose that information. Right, so anonymizing the samples, and doing whatever you want with them, will jeopardize the quality of your control population. I think that's an excellent point, especially with disorders that can appear later in life: someone you've been calling a control may turn out not to be one, and if you don't have access to follow-up information further down the line, your controls won't be of good quality. They were assuming that there would be follow-up, so that you would have the full history of the patient and could use it, even after the patient's death. Right, right.
Okay, so, continuing with the TCGA: to ensure protection of genetic privacy for sample donors, data users will have to agree to certain conditions, described in the TCGA patient protection policy and controlled access policy, as to how the data will be used. For example, users will have to agree that they will share these data only with others who have also completed a data access agreement, and that they will not patent discoveries in a way that prevents others from using the data. This means that reviewers of a manuscript who need to see any controlled-access TCGA data underlying a result must also agree to these user access conditions before they can see the data. So this is not a situation where, once you have been granted access to the data, you can just redistribute it: the only people you can give it to are people who have also agreed to the controlled access policy. Next is the BC Cancer institutional policy. BC Cancer has supported a collection of de-identified data from more than 1000 individuals, and this is just a description of the resource. The data collected by the study have been stripped of all personal identifiers, but the wealth of data available on them might make possible the individual identification of some study participants. To protect the confidentiality and privacy of these study participants, a recipient who has been granted access to these data must adhere to the requirements of the data access agreement. Failure to comply with the data access agreement could result in denial of further access to study data. Violation of the confidentiality requirements of this agreement is considered a breach of confidentiality and may leave requesting investigators liable to legal action on the part of the study participants, their families, or the Canadian government.
So this is part of an agreement: as part of our New England Journal of Medicine paper, we wanted to release the data so that other people could pull it down, and this is basically part of the agreement people have to sign when they're downloading the data. Where we actually posted the data is a place called the European Genome-phenome Archive, or EGA. It was really created to store and disseminate the data from the Wellcome Trust Case Control Consortium in the UK, which has genotype data on about 17,000 patients; Anna talked a little bit about some of the studies that have come out of that population. They have a white paper as well that I encourage you to read, and here's their policy. The EGA will provide the necessary security required to control access and maintain patient confidentiality, while providing access to those researchers and clinicians authorized to view the data. In all cases, data access decisions will be made by the appropriate data access granting organization, not the EGA. The data access organization, such as the BC Cancer Agency in our case, will normally be the same organization that approved and monitored the initial study protocol, or a designate of that approving organization. So what the EGA says is: okay, we'll store the data and we'll disseminate the data, without any identifying information, and all of the consent and all of the protection is the responsibility of the organization that provided it in the first place. So it's really quite a nice model, and I just wanted to highlight what the process is, since we're all involved in some sort of clinical data analysis and you may come to the point where you want to publish your paper. Here's what we've done with the EGA, and this is basically the policy.
So what we do is first encrypt the data using a key; let me just give you the example from our own perspective. I create a key, I tell the EGA what my key is, and I encrypt the data. Then I upload the data to the EGA, so it's sitting there in its encrypted form; actually, they decrypt it and then re-encrypt it with another key. Then it's posted there on the website, and you can go to the EGA website and you'll see your little study there. What happens next is that a user requests data from the EGA. The EGA informs a committee, which is made up of myself and a few other people at the institution, of the request, and a material transfer agreement is sent to the user. The material transfer agreement is signed by the user, or the appropriate institutional representative, and returned to the committee; this actually goes through our technology transfer office. Then the committee notifies the EGA, the user is given their decryption key, and the user downloads the data and decrypts it. So it's a bit of a process, but still, anyone who wants access to the data can have it, as long as they sign this material transfer agreement. And I think the EGA provides an incredibly valuable service; I would not want to be the person who had to provide this data myself. They have really set up a beautiful infrastructure for this, and I think the ICGC is poised to have something similar for the cancer genome projects that are coming out. So if you're at the stage where you're writing up your paper and you're publishing SNP chips or sequence data, I would highly recommend getting in touch with them; they are amazing people to work with.
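The encrypt, upload, re-encrypt, and MTA-gated release flow just described can be sketched end to end. The XOR keystream below is a deliberately toy stand-in for the real cryptography the EGA uses; it just illustrates that the same key locks and unlocks the payload, and that the approved user gets nothing readable until the committee hands over the per-user key after the MTA is signed:

```python
import hashlib
from itertools import cycle

def keystream_xor(data: bytes, key: bytes) -> bytes:
    # Toy symmetric cipher: XOR against a hash-derived keystream.
    # Applying it twice with the same key recovers the original bytes.
    stream = hashlib.sha256(key).digest()
    return bytes(b ^ k for b, k in zip(data, cycle(stream)))

# Submitter side: encrypt with a key shared with the archive, then upload.
submitter_key = b"submitter-secret"
payload = b"sample_id\tgenotype\nS001\tAG\n"
uploaded = keystream_xor(payload, submitter_key)

# Archive side: decrypt the submission, then re-encrypt under a per-user
# key that is only released once the material transfer agreement is signed.
archive_copy = keystream_xor(uploaded, submitter_key)
user_key = b"per-user-key-after-MTA"
released = keystream_xor(archive_copy, user_key)

# Approved user: decrypts with the key handed over by the committee.
assert keystream_xor(released, user_key) == payload
```

The design point is that the archive never serves plaintext: what sits on the EGA site, and what travels to the user, is always under some key, and access control reduces to who has been given which key.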
And this is a really seamless process; something that could have been very, very painful in terms of how to provide data, they just made easy, so this is certainly the way to go. So, to conclude: genetic data is potentially identifiable, and researchers certainly have a legal responsibility to safeguard the privacy of donors. The way you can do that is simply by working with anonymized, de-identified data. Several models have now been implemented for safeguarding patient privacy, and one should look at the ICGC, the TCGA, and the EGA as different types of models for handling and providing potentially identifying data. So, are there any comments or questions?