With that said, it's my pleasure to introduce Judy Cho, who is going to talk about finding drug targets. Judy. So, I was given the rather daunting challenge of saying something helpful on finding new therapeutic targets through genetics and sequencing, and in thinking about how to construct the talk, I decided to divide it into two parts. First, I'll very briefly give three examples of how genetics integrates with identifying new therapeutic targets in the disease I study, inflammatory bowel disease. Those three examples will be the IL-23 pathway, which was one of the clearest first pathways identified through GWAS; some more recent unpublished data on Nod2, mycobacterial disease, and innate immune cells; and then I'll close with the TNF pathway. And then I'll finish up by putting this in a broader context of how this IBD discovery may allow us to systematically leverage high-throughput sequencing to prioritize new targets, looking at two approaches: one driven by phenotypes, and the converse driven by genotypes, or what was called last night ENCODE data-driven discovery. So in terms of the interleukin-23 pathway in immune-mediated diseases, one of the first signals early in the GWAS era was the identification of multiple protective and risk alleles that independently confer altered risk, initially for Crohn's disease, but subsequently for ulcerative colitis, psoriasis, ankylosing spondylitis, and a whole host of others. And the largest effect size and the smallest p-value that we see in the interleukin-23R (IL-23R) region is for an uncommon protective allele at codon 381. About one out of every seven European-ancestry individuals is a heterozygous carrier of the glutamine allele, and this confers anywhere from a two- to four-fold decreased risk of developing inflammatory bowel disease.
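To make the arithmetic concrete: a one-in-seven heterozygote carrier frequency implies, under a Hardy-Weinberg assumption, an allele frequency of roughly 8%. A minimal sketch:

```python
import math

def allele_freq_from_het(het_freq):
    """Solve 2q(1-q) = het_freq for the minor allele frequency q
    (taking the smaller root, valid when het_freq <= 0.5)."""
    # 2q^2 - 2q + het_freq = 0  ->  q = (1 - sqrt(1 - 2*het_freq)) / 2
    return (1 - math.sqrt(1 - 2 * het_freq)) / 2

q = allele_freq_from_het(1 / 7)
print(f"IL23R R381Q allele frequency ~ {q:.3f}")  # about 0.077
```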
Now in short order, in the first Crohn's disease meta-analysis, one of the big signals that came out was that the IL-23R polymorphisms highlighted a very distinct signaling pathway, interleukin-23, which is critical in the perpetuation of these pro-inflammatory effector Th17 cells. And if you look at the canonical members of the IL-23 signaling pathway (two cytokines, two receptors, two signaling intermediates, and the transcription factor STAT3), five of those seven canonical members are associated with inflammatory bowel disease. Then you combine that with the identification in subsequent years that this protective allele in IL-23R is functionally a loss-of-function allele. As opposed to the PTPN22 story, this has actually been quite consistent between multiple different laboratories: the less common glutamine allele is associated with decreased function, as measured by decreased CD8+ Tc17 cells or decreased STAT3 phosphorylation, so clearly this protective allele is a loss-of-function allele. And because of that, blocking this pathway seems quite logical, because a lot of the criteria previously discussed with respect to PCSK9 would apply similarly to this glutamine allele in IL-23R: as far as we can tell so far, heterozygous carriers of the glutamine allele do not appear to be at increased risk of infectious disease complications. And again, in full disclosure, this was not driven by the genetics. Blockade of the IL-12 and IL-23 pathways, these two related cytokine pathways, was well underway prior to the genetic discoveries, but certainly these anti-p40 treatments have already been approved for the treatment of psoriasis, and with respect to inflammatory bowel disease the phase 3 studies are ongoing, but the phase 2 studies do look promising.
The genetics can further inform blockade of this pathway in a number of different ways. One is that, while we clearly have an enormous IL-23 pathway effect, there are questions the genetics could theoretically help us answer, such as whether it makes more sense to block the IL-23 pathway specifically, or whether we should block both the IL-12 and IL-23 pathways. Again, we see associations in components of both the IL-12 and IL-23 pathways. For example, we see an association not just at IL-23R, but at the p40 subunit, which is common to IL-12 and IL-23. So one of the issues is whether blockade of a single cytokine is going to be more effective, or blockade of multiple cytokines. A related question is, since we see associations all along this pathway, at what level should we block? Should we block at the cytokine level, which is the classical approach, or, as with more recent approaches in industry, block downstream through various JAK inhibitors? So these are open questions, and when you look at the totality of the cytokines and cytokine receptors associated in immune-mediated diseases, I think this is going to be very complex to work out. The second vignette I want to talk about is a more recent story with respect to Nod2, mycobacterial disease, and innate immune cells. This is a result of the Immunochip collaboration in inflammatory bowel disease, which involved a large number of IBD cases as well as controls. The net effect of this very extensive amount of work has been the identification of 71 new loci in inflammatory bowel disease, which brings our present total to 163 loci with genome-wide significant evidence for association. If you look at the boundaries of these associations, they encompass about 1,500 genes. Now, this large number of loci for a complex disorder provides an unprecedented opportunity to perform improved network analysis through systems biology.
This was led by Eric Schadt, using a lot of the co-expression networks he has previously published on. The first question we asked is, if you take all those 1,500 genes within these loci, which of the co-expression networks previously described by Eric is most enriched for IBD genes? And our top module was an omental adipose subnetwork, which is enriched in innate immune cells, macrophages, obtained from obese patients. This particular co-expression module is a 523-gene module from adipose tissue, and this is what it looks like in a Cytoscape file. The pink nodes represent genes contained within IBD-associated loci, and in this force-directed Cytoscape layout we see a cluster that contains Nod2 within the pink cluster in the middle of the gene module. When you drill down and look at the precise Nod2-centric view of the submodule, we can see seven IBD-associated genes, all but one of which, I think, are within three edges of Nod2. And what's striking about this particular Nod2 area is the association and overlap between Nod2 and mycobacterial disease. We identified, for example, a new Crohn's-specific association at LGALS9, which is involved in autophagy and is induced in MTB infection; a genome-wide siRNA screen established that knockdown of LGALS9 modulates intracellular levels of mycobacteria, and it is also involved in apoptosis of activated CD4 T cells. Some genes that have been implicated in candidate gene studies of MTB susceptibility, such as NRAMP1 and the vitamin D receptor, are also clustering close to Nod2. And you can see in the upper left-hand corner a triad of Nod2, IL-10, and HCK.
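The enrichment question here, whether the roughly 1,500 genes in IBD loci are over-represented among the module's 523 genes, is the classic hypergeometric (one-sided Fisher) setup. A minimal sketch; the 20,000-gene universe and the overlap of 60 genes are illustrative assumptions, not numbers from the actual analysis:

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from a universe of N,
    of which K belong to the module (hypergeometric upper tail)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Illustrative numbers: ~20,000-gene universe, 523-gene module,
# 1,500 genes in IBD loci, hypothetical overlap of 60 genes.
p = hypergeom_sf(k=60, N=20_000, K=523, n=1_500)
print(f"enrichment p-value ~ {p:.2e}")
```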
So IL-10 is an anti-inflammatory cytokine, and Mendelian forms of IBD include autosomal recessive mutations in either the cytokine or its receptor. You see highly correlated RNA expression between Nod2, IL-10, and this new IBD gene, HCK, or hematopoietic cell kinase. HCK is critical for the differentiation of these anti-inflammatory, or M2, macrophages that secrete high levels of the anti-inflammatory IL-10, identifying a potential new therapeutic target for inflammatory bowel disease. A couple of comments about the TNF pathway. To a large extent, even though we don't see an obvious signal for the TNF pathway in these IBD associations, I think IBD can be considered a TNF-mediated disorder. If you make mice that over-express TNF, they develop ileitis and arthritis, the two diseases for which anti-TNF therapies are utilized, and anti-TNF really is the mainstay of treatment for moderate to severe IBD. Although we don't see a direct signature of TNF pathway associations, we see multiple associated signals at the NF-κB level, as outlined there. And crucially, when you treat IBD patients with anti-TNF, if they have evidence of latent MTB, one of the early side effects that was noticed was reactivation of tuberculosis. So I think one of the things that might also be useful through sequencing is to better understand how established, effective therapies actually work in the genome era. This was a paper from David Baltimore looking at the transcriptional profile of cells in response to TNF stimulation, and you can divide the genes into red, green, and blue. The blue genes are characterized by having few AU-rich elements in their 3' UTR and very stable mRNA levels. Some, the green genes, are moderately stable and have two to four AU-rich elements in the 3' untranslated region.
And some of the key genes in response to TNF stimulation are these red genes, which are characterized by having many AU-rich elements in the 3' UTR and a very unstable mRNA profile. In terms of specific genes, TNF itself is a red gene. One of the IBD-associated genes, and a very important immune-mediated gene, A20 or TNFAIP3, is also a red gene. For the IBD association at CCL2, the maximal association signal that we see is actually within the 3' UTR, and identifying its functional effects has yet to be done. And then finally, in our latest Immunochip data, when we do GO pathway analysis, one of the big signals that we see is genes that mediate ubiquitination and deubiquitination, as outlined here. Fifteen loci are actually involved in ubiquitination and deubiquitination, which mediates a lot of these kinetics of gene expression. So how do we systematically leverage high-throughput sequencing to prioritize new targets, phenotype to genotype? We've talked a lot throughout this meeting about some of these stories, these protective loss-of-function alleles as ideal therapeutic targets. We gave two examples; other examples would be CCR5 and HIV infection, and IFIH1 and type 1 diabetes, though therapeutically we don't know that yet. But it also highlights the value of targeted re-sequencing around GWAS signals, and as we identify protective alleles, this will provide enormous structure-function data that will be useful for improved therapeutic targeting.
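Those stability classes come down to counting AUUUA pentamers in the 3' UTR, something easy to script. A toy version; the sequence and the exact red/green/blue cutoffs are illustrative, not the paper's definitions:

```python
import re

def count_are(utr_3prime):
    """Count overlapping AUUUA pentamers in an RNA 3' UTR sequence
    (lookahead regex so overlapping motifs are all counted)."""
    return len(re.findall(r"(?=AUUUA)", utr_3prime.upper()))

def are_class(n):
    """Crude red/green/blue binning per the talk's rough description."""
    if n >= 5:
        return "red (many AREs, unstable mRNA)"
    if n >= 2:
        return "green (moderately stable)"
    return "blue (few AREs, stable mRNA)"

utr = "AUUUAUUUAUUUAGCAUUUACCAUUUA"  # hypothetical TNF-like 3' UTR fragment
n = count_are(utr)
print(n, are_class(n))
```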
Other approaches are fairly obvious: early-onset, severe cases. Nowadays, the clinicians who identify these Mendelian cases of IBD from loss of function of the IL-10 cytokine or receptor state that when they see young kids with early-onset IBD and a characteristic clinical picture, they will sequence and proceed with bone marrow transplantation. And mention was made earlier of the Milwaukee case of Nic Volker, a young boy with early-onset IBD who was found to have a mutation in XIAP, which is absolutely essential for Nod2-mediated signaling. It also highlights the value of omics data and systems biology, and two obvious sequencing approaches that can refine systems biology approaches: we are now in the process of doing RNA-seq to improve quantification in these large expression data sets, which will be more quantitative than microarrays, and obviously ChIP-seq. Let me close with cross-phenotype analysis and its importance. We have large, mature collections for immune-mediated diseases, but less mature collections for infectious diseases. In terms of the pleiotropy story, if you look at the IBD loci, the immune-mediated disease loci, what's striking is that by GWAS of leprosy, of the seven leprosy loci, all seven are also IBD loci, as outlined here. If you look conversely at the more Mendelian genes, the primary immunodeficiencies, we can see that of the MSMD genes, for Mendelian susceptibility to mycobacterial disease, six out of eight are within IBD loci, as outlined here.
And so this interferon gamma receptor 2 overlap between MSMD and IBD is of interest, and it provides a good transition to the genotype-to-phenotype approach. This was a very elegant study from Jean-Laurent Casanova, where they were looking at kids who presented with MSMD and had autosomal recessive mutations in interferon gamma receptor 2, and what it was, actually, was a gain of glycosylation. So you can't just annotate the genome once; every mutation that you identify could potentially result in the identification of new functional motifs. So this is my last slide: genotype to phenotype, the ENCODE approach, and the importance of re-annotating as we re-sequence and identify new missense mutations. Looking at covalent modifications, glycosylation, phosphorylation, and ubiquitination sites really should all be part of this, and it's not just looking at the base function, but also, as you identify new mutations, identifying these new functional moieties. There are obvious examples of regulation of expression, as outlined there, and I think what is crucial in the analysis and information dissemination as we do this is really to assess critically the validity and the magnitude of effects. Biologists are not going to care about small effects. So, again, this is fairly obvious: bioinformatic probability versus experimental validation, frequency versus population specificity, and, as we identify these rare variants, distinguishing negative selection from drift. Thank you. That was spectacular. I have to say that, coming from a different discipline, it's just amazing to look and see how rich that world is compared to some of the other worlds. We salute you.
And with that, I'd like to ask a question as you start to look at pathways. Most of the things in my mind that have been published as pathways are evangelically believed by the authors who create them but are rarely really validated in many systems, but here you really have the opportunity to group related things. Are you able to extend that, so that variants discovered in places that were not in the already canonically established pathways of immune response could be pulled in? Because this really is a very good example of where we have something concrete to understand a pathway, as opposed to just trying to organize our favorite genes and making very good hand-waving explanations for why we think they are related to each other, without the kind of gravitas that you clearly have in this field. So for the Immunochip analysis, we did both the canonical pathway analysis and the analysis with these co-expression data sets that Eric Schadt provided. We utilized both of them, but what was really convincing to me in analyzing these data sets was the fact that we were taking preconfigured modules, like this macrophage-enriched module here. When Eric first did the analysis, he called it CARD15. We were actually most interested in LGALS9; that was my favorite new gene. And he said, well, let's look at the submodule around that. And then CARD15 was close to it, and I said, that's actually Nod2. And then you see this clustering, again, of all of these MTB genes. So it's a congregated effect that, from a biologic perspective, was unexpected and convinced me as a kind of proof. I think an additional level of proof will be to try to validate this in an independent data set.
Now, an argument can be made: this was omental adipose tissue, and how relevant is that to IBD? I can actually tell you a story about how it is relevant to IBD, but we'd be very interested; we're in the process of taking ileal tissue from about 500 Crohn's patients and doing RNA sequencing on it. If you see the same network in a more disease-relevant tissue, that will provide additional validation. That was wonderful. It played to some of my longstanding biases about how this field could most constructively develop. I already mentioned earlier that I like the emphasis on therapeutic targets and better treatments that are going to work on a fairly substantial part of the patient population, and I would just reiterate my skepticism that our future really lies in a lot of highly predictive kinds of clinical activity. For a long time, 10 or 15 years, I've been trying to interest various institutes in taking more interest in this sort of loss of function as a model for a perfect drug idea in genetics, and I am very pleased to see that there's starting to be some real resonance; Francis, for example, was talking about this last night. I'd just like to point out that I think this idea is more radical than it is often recognized to be. There are some clear-cut examples, such as the ones that you cited, that seem to fit very normally into our usual way of doing business and come out of fairly conventional studies. But I would argue that the way we're organized to do biomedical research discovers these examples almost only by accident. It has no strategy to go and look for them. And I'll just emphasize that with a couple of points. First of all, the patients of interest here are often atypical. They're people that are not as sick as expected.
But we still insist that they meet some long set of diagnostic criteria for IBD, for CF, for schizophrenia, or whatever, when the real mother lode for this type of analysis is likely to involve patients that don't meet all those diagnostic criteria. Indeed, they may very well be patients that underutilize the health care system because they're just doing pretty well, whereas we focus a tremendous amount of attention on the subset of patients that engage the health care system intensively. To do something about that is not so straightforward, and it would require the will to change our way of doing things fairly substantially. And I would generally encourage, in the context of thinking about cohort sequencing and trying to integrate data-intensive biology with medical records and so forth, that we recognize we can potentially do something really new here, at least new in being done on any large scale. Your other point I liked very much was this cross-phenotype analysis, which of course is ubiquitous in immune-related phenotypes, which is probably a fairly high fraction of all phenotypes; we just have trouble identifying such factors. But in many areas, we're going to have to do this kind of cross-phenotype analysis. The final point is that an interesting feature of this path toward new therapeutics is that it doesn't require the kind of very deep mechanistic understanding that much of our contemporary research focuses on. We don't really know why CCR5 mutations have such dramatic effects. There are a lot of people trying to understand the precise role of this co-receptor in HIV infection, but it's not really terribly relevant, because once discovered, the phenotype of homozygous null individuals is so dramatic that it suggests a novel therapeutic target. And I think that's often going to be the case.
But all of these ideas do run rather counter to the standard Bethesda model of how we discover things that are going to have health benefits. There are many strengths to this model, and I don't want to imply otherwise. But I think the time is here to think more radically about how we can change this model and still meet many of the ongoing requirements of more conventional study designs. Manny, can I just follow up and ask you? I think one of the reasons we're here is to talk about large cohorts and large studies where we would not necessarily have nested case-control biases for selecting and drawing people in. So really, what I'm hearing you say is that it's the most heavily phenotyped group that would be of the most interest to you, that would give the greatest opportunity for discovery of these sort of borderline or under-the-radar variants that probably do have some medical and, most importantly, biologic insights. Not that we have to know what all the biologic insights are. So to that end, how confident are you in the phenotypes? Because there is an awful lot of heterogeneity in how phenotypes are both defined and collected, and we'll only find out once we go down that road, but I'm just curious about your thoughts on that. Yes, the implication of my view is that the richly phenotyped individuals are the best candidates for this strategy, and by rich, as we discussed last night, I tend to mean broad as opposed to extremely deep. But as for how confident I am: not very confident. It's a well-known problem that there's poor concordance between even standard laboratory tests, and certainly in the way that physicians, on the whole, size people up. On the other hand, I think that we have to overcome these problems mostly with large numbers. There is a tendency, I think, in Bethesda to let the perfect be the enemy of the good.
That is, we'd all love to have 100,000 people who had agreed to have virtually anything done to them, who had signed up for life, came back every six months, went to the clinical center, and had everything we know how to do done to them, and so forth. This isn't going to happen, for a lot of reasons, but probably at the core of it is that it's prohibitively expensive and requires a kind of centralized infrastructure that will suck resources for all time to come. It discourages local initiatives. There are a lot of problems with that model. So that leaves us with the healthcare system, and I think that's where we have to go. Now, that doesn't mean we have to start with the most problematic healthcare providers. There are plenty of providers out there that are really very concerned about phenotyping consistency, at least diagnostic consistency, within their own organization, and are pushing their physicians to do things in more standard ways, keep more systematic records and so forth, and those are our friends. Rory, I'm glad you put your hand up before I was going to call on you to respond. Thanks. There's a lot of hand-waving around this issue of deep versus wide phenotyping, and people tend to divide into lumpers and splitters, and we get all these caricatures. But I wondered whether there was some way in which we could try to get a bit more quantitative about the value: are you better off with a thousand people really very well phenotyped, or, at the same cost, with a hundred thousand much less well phenotyped?
And if one looks at, say, the experience of GWAS, a huge amount of effort was put in at the beginning of some of those studies on how to define the disease characteristics of the people who were going to be included, and lots of different groups used different definitions and spent lots of time working out what the definitions were. And then, having done that, they did meta-analyses of all of these different GWASs using all these different definitions, which they don't even bother now to worry about, and got results. So was that original strategy wrong? Is there something to be learned from that for the next 10 years about where one puts one's resources? Is it scale or is it depth? Well, let me ask Peter. You spoke about analytic approaches, and there is a big difference between looking at variants that are at 1%, 10%, 0.1%, and 0.001%. The question is, even if we have the most pristinely phenotyped individuals but only 1,000 of them, is that going to overcome the testing problems of at least discovering what's worth following up? A couple of comments. I think Rory's absolutely right that the lesson from GWAS was that really careful phenotyping wasn't as critical as might have been thought, given the lumping of things together. The way I think about it is that if phenotyping is not quite as good as you think but you can increase sample sizes, then the extra power will get over the noise you introduce through not phenotyping well, and I think that's a pretty clear message from the genome-wide association studies. In terms of sequence-based studies, as I've said a few times, there's a lot of evidence that effects at the rarer variants aren't perhaps as large as we may have thought, and so we need to do quite large studies.
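One way to get quantitative about the noise-versus-numbers tradeoff, at least for quantitative traits: classical measurement error shrinks the observed effect by the phenotype's reliability, so a noisy cohort of size N behaves roughly like a clean cohort of size N times that reliability. The reliability values below are illustrative assumptions, not measured figures:

```python
def clean_equivalent_n(n_noisy, reliability):
    """Classical measurement error: beta_observed = reliability * beta_true,
    so the non-centrality (hence power) of a noisy study of size n_noisy
    matches a perfectly measured study of about n_noisy * reliability."""
    return n_noisy * reliability

# Hypothetical comparison: 1,000 ECG heart rates (reliability ~0.95)
# versus 100,000 self-reported pulse rates (reliability ~0.6, say).
print(clean_equivalent_n(1_000, 0.95))
print(clean_equivalent_n(100_000, 0.60))
```

On these toy numbers, the big noisy cohort still dwarfs the small clean one, which is the point Peter is making.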
So I don't think there's any point in doing 1,000 people who are deeply phenotyped, because we won't have power to see anything in them, even with the exquisite phenotyping. I think we need to do large studies, and then, among those large studies, as much phenotyping as we can get would be good. Could I just follow up? You've said several times that the best evidence is that many of these rare variants have small effects, and I'm just curious: what studies are you talking about, or what kind of analysis? So I have in mind things like, well, others are involved in the studies more directly, so they could speak to them, but the large ESP sequencing project, that's 7,000 exomes across a range of phenotypes; type 2 diabetes sequencing, in the project I'm involved in, of 3,000 people, half cases and half controls, both exomes and whole genomes; and then there's another project which is 5,000 exomes. Those analyses aren't complete, but they're not awash with examples of low-frequency or rare variants of substantial effects, I think it's fair to say. But aren't you conflating rare variants that have large effects with rare variants that have small effects, or even opposite effects? I mean, you don't have any power to tell whether a particular rare variant is having a big effect. That's correct, but the only way we'll see those signals is by somehow aggregating, isn't it? Yeah. Just one comment, in response to Rory's original question of whether you have 1,000 exquisitely phenotyped patients or 100,000 not-so-exquisitely phenotyped patients. When we start to do these studies, what we're going to want to do is look at subgroups, and 1,000 exquisitely phenotyped patients won't get us anywhere, because there won't be enough in whatever subgroup you want to look at, whether it's people who respond to a drug, or people who have inflammatory bowel disease and then go on to get arthritis, or whatever.
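To see why 1,000 people won't do for rare variants, a back-of-the-envelope, normal-approximation sample-size calculation for a per-allele case-control test at genome-wide significance; this is a generic approximation, not any particular study's power analysis:

```python
from math import log
from statistics import NormalDist

def required_n(maf, odds_ratio, alpha=5e-8, power=0.8):
    """Approximate total sample size (cases + controls, 1:1) to detect a
    per-allele log odds ratio at significance alpha with the given power,
    using var(log OR) ~ 2 / (N * maf * (1 - maf))."""
    z = NormalDist().inv_cdf
    z_alpha, z_power = z(1 - alpha / 2), z(power)
    beta = log(odds_ratio)
    return 2 * (z_alpha + z_power) ** 2 / (maf * (1 - maf) * beta ** 2)

# A 0.5%-frequency variant: modest effect (OR 1.3) vs large effect (OR 3).
print(f"OR 1.3: {required_n(0.005, 1.3):,.0f} samples")
print(f"OR 3.0: {required_n(0.005, 3.0):,.0f} samples")
```

Under these assumptions a rare variant of modest effect needs sample sizes in the hundreds of thousands, while only a large effect is detectable in the tens of thousands.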
So I would argue that the nature of the problem, or the promise of the problem, means that we have to have large numbers, because we're going to want to look at these subsets. Everybody wants to say, assuming they're exquisitely phenotyped, what do we do about the genomics? I'd say, assuming the genomics are under control, how do we deal with the phenotyping problem, across the whole set and each subset? I think that's exactly right, and the message of the numbers is that it's got to be large. That means it can't be deep; it's got to be breadth, but it somehow has to have things able to connect onto it. And that can be simple. You mentioned your consortium, Michael; I think you have a family group and you have a case-control group. So part of designing the core, the workhorse, is that it needs to have the 20 major things well done, and it needs to be able to talk to, or link onto, deeper data; even sequencing itself will only be done on a very small fraction of whatever that total common space is. It's got to be broad, with the capacity to go deep where it needs to go deep. It cannot be just one or the other, and it could not possibly have both the depth that people have talked about and the breadth for the numbers; that's out of the question. So we're just looking for a modular system. One point I think we'll talk about later today is that I see, as of premium value, the capacity to revisit or re-contact subjects who are involved, because as the discovery goes on, even if you start with a small subset of phenotypes, there are observations that, together with other ancillary or published information, may take you to conclusions that you would want to come back to that same group about, having sequenced many thousands of individuals. So I think that's certainly something to think about for later today. Just two points.
One is on phenotype resolution, or accuracy. I definitely agree with Rory that larger sample sizes will probably drown out a lot of these things, but I think we don't really know from the GWAS studies to what extent the large sample sizes were drowning out the noise of the phenotypes. I'll use heart rate as an example. There are some GWAS that I've been involved in that have used a pretty exquisitely defined heart rate from an electrocardiogram, and then there are other, larger ones that have used a hodgepodge of both electrocardiograms and pulse rate reported in a record, which I would view as probably not as accurate. Signals come out of that, but the true test hasn't been conducted, which is, with the same sample size, using either pulse rate or ECG, how many more signals do you pull out of a GWAS? My guess would be that you'd pull out more from the more accurate ECG than from the pulse rate. So I don't think we really have that information, especially when we're talking about $10 million paying for only 2,000 whole genomes, which is what we heard yesterday. So are you going to pick those whole genomes based on the ECGs or on the pulse rates? Is that a way of looking at it? It depends whether you're comparing like with like. If you have the same number with the ECG versus without the ECG, then I would accept that you might be better off with the ECG. But if you have 10 times the number based on a record rather than on the ECG, are you better off without the ECG? That's why I say I think one needs to be quantitative, because where should you put your resources to get the biggest return? I would argue that you typically, though not in every case, get a bigger return by putting them into large scale, and then, as Patricia said, having the opportunity to go into detail in certain instances, but not at the beginning.
Well, and I would add to that that it's fine to have ECG-defined heart rate as an example, but you're not going to have that in every cohort. And you're also going to want to use multiple phenotypes to compare to the genome, because once you have the genome, obviously, just as with GWAS, you can relate it to anything. So it's fine if you're studying a very narrow thing, but I think what we're talking about here, if we were to do a million-person cohort, is that we would want it as broad as possible and somewhat superficial, so at least we get some hints and can then pursue more deeply. I have one other comment, because the presentation we just heard was such an elegant illustration of Immunochip. It wasn't sequencing; it was actually a chip that, I would argue, wouldn't have existed unless the NHGRI had supported the 1000 Genomes Project, because there are a lot of these chips: not only Metabochip, but CardioChip, and, what's the other one, Immunochip. Oh, no, no, this used Immunochip; I'm sorry, not Metabochip. So the question is, and I think we're beginning to learn now what we're getting out of those follow-up genotypings, should it be part of this discussion that, with a limited dollar, we do sequencing on some individuals and Immunochip or whatever chip on others? I'm seeing Eric nod his head no. But I'm just curious, because it's a way to extend the findings to a much larger phenotyped population. And similar to that, I think the argument has been made that if you do a dense chip in a population that has been sequenced, when it's the same population, the same ancestry, then the imputation is probably better, and it may help you with being able to impute back. Is that a fair thing to say? I know the Icelanders make that argument; I'm not sure if you'd buy it. Yeah, I think that's certainly true in Iceland.
It's probably true in Finland. It may be true in some other really extreme founder populations. It will also be true in larger, outbred populations: you'll do better imputing if you've got good sequence data from the same population. But imputation gets harder and harder as variants get rarer, so we need to think a little bit about the frequency range in which we're imputing now, or would be able to impute, and how much action there might be in that range. A lot of that is a function of the reference populations that you have.

When we look at a lot of the cancer meta-analyses that are going on, we have not seen the Goldilocks variants, and we've actually used the 5 million and 2.5 million chips in some very large studies and not found things with effect sizes of two and a half, three, three and a half. Again, coming back to your question, I think the imputation has really been hampered by the references, making it difficult to get to 2% and below with as much accuracy as we would like. But that's probably a matter of time, until enough individuals are sequenced at higher coverage to give greater certainty about those less frequent variants. It is a tractable issue, I think, over time.

And couldn't part of this be related not only to the rarer variants, but to the fact that if we were to get more finely defined reference populations, we could assume a certain degree of LD in those populations? So could we take an admixed individual from the mix that makes up the US population and basically say, all right, in this region, here is their ancestry, and this is the reference segment that one should impute to? That's a complicated analysis and may not get you anywhere. But wouldn't that be a better way to approach it than to say you can only look at Finns that you know have four Finnish grandparents?

Yeah, I don't disagree.
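The point that imputation is "hampered by the references" at low frequencies has a simple arithmetic core: a variant can only be imputed if enough copies of it sit in the reference panel. A minimal sketch, with illustrative panel sizes and frequencies (not figures from the talk):

```python
def expected_copies(n_panel, f):
    """Expected number of carrier haplotypes for a variant of frequency f
    in a reference panel of n_panel sequenced individuals (2*n_panel haplotypes)."""
    return 2 * n_panel * f

def prob_absent(n_panel, f):
    """Chance the variant appears on none of the reference haplotypes,
    in which case it cannot be imputed at all."""
    return (1 - f) ** (2 * n_panel)

for f in (0.01, 0.001, 0.0001):        # 1%, 0.1%, 0.01% frequency
    for n in (1_000, 10_000):          # hypothetical panel sizes
        print(f, n, expected_copies(n, f), round(prob_absent(n, f), 3))
```

With a 1,000-person panel, a 0.01%-frequency variant is expected on fewer than one haplotype and is absent from the panel most of the time, which is why accuracy below roughly 1–2% frequency tracks panel size so closely.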
I think it's quite helpful in these kinds of discussions to distinguish between what some people call low-frequency variants, things in the roughly half a percent to three percent range, and things with frequencies lower than that. Imputation has potential mileage: it does better in some populations than others, it does better with better reference panels, and it does have mileage for what I call low-frequency variants, in that half a percent, one percent, two percent range. There's a limit to how well we'll ever do with imputation for variants rarer than that. So I think it's helpful to be careful with terminology and to focus a discussion on one or the other for certain purposes.

Can I turn the discussion around 180 degrees and ask the phenotype question? Some talk about imputing phenotypes: being able to take a certain amount of information from questionnaires and studies. As we look at some of the cohorts and some of the studies, these issues have clearly been addressed; there's a whole epidemiologic literature on this. We've been talking about imputing genotypes, but on the other side, we may get to the point where we have 85 or 90% of the sequence that we want, yet individuals have different parts of what we would think of as the master plan of phenotypes. How well and how robustly can we impute those phenotypes across those populations? This is a very active field; I see some heads nodding, and I'd love some comments on that.

Yeah, this is a great point. So in Framingham, we've been thinking a lot about this, and there are two scenarios.
One is when the phenotype is missing, and the other is when there's a drug treatment and you're trying to ask what the phenotype would have been if it weren't a treated phenotype, which of course is a "what if," but it actually is important if you're thinking not only about genetics but about epidemiology. So the point you're making is that it can be done, it has been done, and I think it's something we should be thinking about.

I'd only add that I'd echo what Chris just said, and that it probably comes back to this business of rare versus really rare. The rare, rare phenotypes are going to be hard to impute, just like the rare genotypes are going to be hard or impossible to impute, and that may be where some of the mileage we're looking for comes from. But it's unlikely that one project or one resource is going to solve all the problems that are around this table.

Do you think, both of you, having spoken about it, that this is a viable aspect that can be added to the study?

Julie's going to talk about phenotyping this afternoon, so I'll let her answer the question then, but my preliminary answer would be that it needs work. As Chris says, people have been looking at imputing phenotypes, and it's an interesting thing to do with an incomplete data set. You'd have to have correlated traits that have been measured in order to do the imputation; that's the bottom line, basically.

Like correlated genetics? Mike had raised the point about being able to go from CF to COPD to smoking or whatever: these related traits, and how well they translate as individual analyses, as opposed to being able to impute to increase your sample size or to identify what the validation would be. And I would argue that the ultimate imputed phenotype is LDL cholesterol. I mean, it's calculated.
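The "bottom line" above, that phenotype imputation requires a correlated trait measured in the same individuals, can be sketched as a simple regression fit on complete cases and applied to people missing the trait. The trait names (systolic/diastolic blood pressure) and all numbers are hypothetical, purely for illustration.

```python
from statistics import mean

# Complete cases: (systolic, diastolic) pairs measured in the same people
measured = [(120, 80), (140, 92), (110, 74), (150, 98), (130, 85)]
x = [m[0] for m in measured]
y = [m[1] for m in measured]

# Ordinary least squares fit of the missing trait on the measured one
mx, my = mean(x), mean(y)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx

def impute_dbp(sbp):
    """Imputed diastolic value for a person with only systolic measured."""
    return intercept + slope * sbp

print(round(impute_dbp(135), 1))
```

The quality of the imputation is bounded by the correlation between the two traits, which is the practical reason "rare, rare phenotypes" with no well-measured correlates stay hard to impute.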
It's calculated from total cholesterol, HDL, and triglycerides, as long as the triglyceride level is below a certain amount, so we've been doing this for decades, and there's no reason we couldn't pursue it a little bit further. All right, I think at this point we'll take a pause, and I think Thomas is going to show a couple of slides.
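For reference, the calculated LDL mentioned just above is the Friedewald equation, conventionally considered invalid once triglycerides reach 400 mg/dL, which is the "certain amount" alluded to. A minimal sketch:

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Friedewald estimate of LDL cholesterol, all values in mg/dL:
    LDL = total cholesterol - HDL - triglycerides / 5."""
    if triglycerides >= 400:
        raise ValueError("Friedewald estimate invalid for TG >= 400 mg/dL")
    return total_chol - hdl - triglycerides / 5

print(friedewald_ldl(200, 50, 150))  # -> 120.0
```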