Thank you, Rudy. Since it's been a little more than two years since we've given you an update on the ENCODE project, I thought I would start with some background and motivation for the project as context for this talk. Back in 2003, when we were anticipating the completion of the human genome sequence, we were asking the question: how can we read the sequence? We knew we had the genetic code. We were moderately good at identifying protein-coding regions, but fine structures were difficult to predict from the sequence, and we had no powerful regulatory code. We knew that evolutionary conservation can help identify functionally important regions. At that time, it was estimated that about 5 percent of the genome was conserved and one and a half percent was protein-coding. We were really interested in finding out what the functions of the non-coding conserved sequences are, as well as the functions of the non-conserved sequences. Also, over the last number of years, we have gained an increasing understanding that non-coding DNA is important for disease and evolution. We know now that about 90 percent of GWAS findings lie outside of protein-coding regions. We know that non-coding DNA variants can cause human diseases and alter human traits, as in Fragile X syndrome and ALS, and also that about 80 percent of recent adaptation signatures lie outside of protein-coding regions.

So back in 2003, we started the ENCODE project, which stands for Encyclopedia of DNA Elements. The goal is to compile a comprehensive encyclopedia of all sequence features in the human genome and in the genomes of select model organisms. One of the hallmarks of this project is rapid prepublication data release. We really wanted to create a resource that was freely available to the community to enhance our understanding of gene regulation as well as the genetic basis of disease.

This slide is a cartoon of the various data types being generated for the ENCODE resource. I don't have time to go into this in detail; I just want to point out that the consortium uses a number of high-throughput technologies that are primarily sequence-based, including, for example, ChIP-seq and RNA-seq, to measure a number of different biochemical properties of the genome, including mapping of transcription factor binding sites, histone modifications, DNA methylation, and RNA transcription. All of these data are then mapped onto the genome to identify transcripts, genes, promoters, and other putative regulatory regions, such as enhancers.

This slide outlines the timeline for the ENCODE project. As I mentioned, we started back in 2003 with the pilot project, covering a well-defined 1 percent of the human genome. At that same time, we initiated the first of a number of technology development initiatives, which we felt would be important in order to create the resource we were interested in making. Based on the success of the pilot project, in 2007 we launched the production phase of ENCODE, where we went from 1 percent of the genome to complete-genome analysis of the human genome. At the same time, in 2007, we launched the modENCODE consortium, which focused on creating a catalog of functional elements in the fly and worm genomes. And then in 2009, with the availability of some economic stimulus money, the ARRA money, we launched a small mouse ENCODE effort.
These projects ended in 2012, when we started ENCODE Phase 3, which we're about halfway through now. We focused on expanding the catalog in the human and mouse genomes, and we also added a computational analysis program; we felt it was important to expand the expertise in computational analysis in order to make the best use of the ENCODE resource. At that time, we funded the fourth technology development initiative as well.

Eric mentioned this this morning — it seems like a couple of days ago already — so I just want to remind everyone about the modENCODE publications. Funding for the modENCODE program ended in 2012, and this summer there was a series of publications that served as a kind of capstone for the project. These included worm, fly, and human comparative analyses. There were three main integrated papers published in Nature at the end of August, focused on transcription, chromatin, and regulation, and these papers found commonalities in these processes across three evolutionarily quite diverse organisms. In addition, at least 15 companion papers were published this summer in Nature, Genome Research, Genome Biology, and other journals. Those joined approximately 150 publications that the modENCODE consortium had already published, plus an additional 150 publications or so from researchers outside the consortium who were using the modENCODE data. We have compiled all of the information about these modENCODE publications at the ENCODE portal at Stanford, at www.encodeproject.org/comparative. You can get links directly to the data as well as to all of these publications, and hopefully that will make things easier for the research community.

Okay, so I just want to expand a little on the goals of ENCODE 3, which is the current phase of the project. We wanted to expand, toward completion, the catalog of functional elements in the human and mouse genomes. We added additional cell types, including more primary sources, and additional data types, such as RNA-binding proteins, which had not been included in ENCODE 2. We're continuing to generate high-quality data using high-throughput pipelines. We're developing new technologies and analytical tools to generate, analyze, and validate the data. And we're providing the data and tools to the community in as useful a form as possible. We want to provide easy and rapid access. A key thing is to provide sufficient metadata so people understand how the experiments were performed. We want to describe various ways to use the resource and also recognize the diverse needs of different types of users.

This slide outlines the structure of ENCODE 3. There are seven data production groups. They deposit their data into the Data Coordination Center, which performs quality assessments, houses the data, and makes it available to the research community. The data are taken up by the Data Analysis Center, which supports the activities of the analysis working groups; this is where the computational analysis groups are involved. The output comprises gene models, chromatin states, and candidate functional elements, which in total represent the encyclopedia. I think Eric also mentioned this this morning, but one of the features of the current phase of ENCODE is a revised data release policy.
The key point here is that external data users may freely download, analyze, and publish results based on any ENCODE data, without restriction, as soon as the data are released. We ask that the consortium be acknowledged, as well as the individual data producers and the specific accession numbers.

Additional ENCODE 3 features include cloud computing — we heard a little about that this morning. The data are available at Amazon Web Services, or AWS. We are creating uniform processing pipelines that will be available at DNAnexus. This provides transparency about how the ENCODE data are processed. These pipelines are also available for the broader community to use on their own data and will be continually updated as the pipelines evolve. We've taken the lead on data interoperability; we're coordinating ontology selection and metadata standards with related projects.

I just want to say a word about the genomic data sharing policy that was mentioned earlier today. In addition to rapid prepublication data release, ENCODE is continuing to work ahead of this policy — you recall it's not going to be in place until January of 2015. We've been working to develop sample consent language for open access to genomic data, and we're currently working to obtain a wide range of samples using the new consents.

There is a new ENCODE portal at www.encodeproject.org, which is hosted by Stanford University, the Data Coordination Center. We hope the community will find the enhancements made to this portal useful in accessing the data. I just want to point out a couple of features. If you click on the Data tab, you can get access to the assays, the biosamples, and the antibodies, along with the characterization of the antibodies. And there are metadata-driven searches that can really help one narrow down to the data types of interest. I've shown one example here, where you can click on different facets on the left side of the panel. If, among all the assays, you're interested specifically in ChIP-seq data, you can click on that. If you're interested exclusively in mouse data, you can click on that. The same goes for different biosample types, if you want to use primary tissues. Then, looking specifically at the organs, you can click on multiple ones — here I've clicked on liver and kidney. And then for life stage, say you're looking specifically for embryonic tissue. Here it shows 28 data sets that fit these criteria. This will hopefully enable researchers to find what they're looking for.
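[Editor's note: for readers who want scripted rather than point-and-click access, the faceted search just described can in principle be reproduced through the portal's JSON interface. Below is a minimal Python sketch. The /search/ endpoint and JSON output are real features of www.encodeproject.org, but the specific filter field names used here (assay_title, organ_slims, life_stage, and so on) are illustrative assumptions and should be checked against the portal's current documentation.]

```python
import requests

BASE = "https://www.encodeproject.org/search/"

# Facets mirroring the portal walkthrough above: ChIP-seq assays, mouse,
# liver and kidney, embryonic life stage. The field names are assumed
# for illustration, not confirmed portal schema.
params = {
    "type": "Experiment",
    "assay_title": "ChIP-seq",
    "replicates.library.biosample.donor.organism.scientific_name": "Mus musculus",
    "biosample_ontology.organ_slims": ["liver", "kidney"],  # multiple organs at once
    "replicates.library.biosample.life_stage": "embryonic",
    "format": "json",
    "limit": "all",
}

resp = requests.get(BASE, params=params, headers={"Accept": "application/json"})
resp.raise_for_status()
experiments = resp.json().get("@graph", [])  # portal JSON is JSON-LD; hits live in @graph

print(f"{len(experiments)} data sets match these criteria")
for exp in experiments[:5]:
    # Each record carries an accession -- the identifier the data release
    # policy asks users to cite along with the consortium and the producers.
    print(exp.get("accession"), exp.get("description", ""))
```

[The design point is that the web facets and the API share the same query parameters, so a filter worked out interactively can be pasted into a script for reproducible retrieval.]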
ENCODE has developed a number of data standards and software tools. If you click under Methods, you can find a lot of this information. There are experimental guidelines, including data standards for many of the common assays used in ENCODE. There is also elaborate information about quality metrics, as well as the software tools used to create the resource — both tools created by ENCODE and tools from other groups that were used for the resource. We also monitor ENCODE publications, as mentioned earlier with modENCODE and as Eric mentioned earlier; this is one measure of the impact of the resource. If you click on Publications under About ENCODE, you can get information about previous ENCODE integrative analyses. You're probably all aware that in 2012 ENCODE published about 30 papers as an integrated publication package, and you can get the details of those here.

You can also look at other ENCODE-funded publications — those published by the ENCODE consortium, by the mouse ENCODE consortium, and by the modENCODE consortium. We're also trying to track community publications, which is a little tricky, since it's really hard to search on "ENCODE" and find all the relevant publications, but we have narrowed it down to publications that actually use ENCODE data. Eric showed the slide earlier that shows, in purple, the number of publications by the ENCODE consortium; there are currently about 360 publications by ENCODE investigators. In blue are the community publications, currently about 660. So in total there are about 1,000 publications, and about two-thirds of these are coming from the research community. We're very excited to see this increase over time. We categorize the outside community publications by what researchers are using the data for. About 160 publications are disease-specific, and we've categorized which diseases they're studying. Not surprisingly, about a third of those publications are in cancer research; 15 percent look at autoimmunity and allergy, 13 percent at neurological and psychiatric diseases, and about 7 percent at cardiovascular disease. But a wide range of other disorders have also been studied using ENCODE data in smaller percentages.

I just want to mention very briefly some outreach activities. ENCODE has held a series of tutorials, many of them in conjunction with the Roadmap Epigenomics Program. We held these tutorials at ASHG in 2012 and 2013, and another one — already sold out — is planned for this year's meeting in San Diego. We've also had a tutorial at the Biology of Genomes meeting. All the materials from these tutorials, including the handouts, are available on genome.gov. We held an ENCODE-CHARGE consortium workshop in January; CHARGE is a consortium focused on heart- and aging-related research. We had a one-day workshop on how to use the data and formed nice collaborations with this group. But we'd like to expand our impact, so we're planning a users meeting in 2015 to bring a much wider swath of the research community in to get to know the ENCODE resource in more detail. You can also follow us on Facebook and Twitter, where we release information about major publications, data releases, and other news about the project.

I just want to make one mention, although we kind of already alluded to this earlier today, about the future of ENCODE. The current project period ends in 2016, so we're initiating a planning process that will mostly take place in 2015, when we'll have a planning workshop. Certainly the relevant discussion at the July sequencing meeting will be a big starting point for this discussion. We plan to have one or more concept clearances at the May 2015 Council meeting, with potential RFAs to be released in the summer of 2015 and review and funding in 2016.

I'd like to acknowledge all of the participants. This is a picture from the consortium meeting we had in July at Stanford, with about 180 participants. I clearly cannot acknowledge everyone, but you can find out who they are on genome.gov. And I also want to acknowledge my colleagues, the great people I work with at NHGRI: Mike Pazin now co-manages ENCODE, and Peter Good has been involved with the project since the beginning.
And Dan Gilchrist, who is just joining us now. I also want to acknowledge the tremendous support we get from Jeff Schloss, our division director. And we couldn't do our jobs without great program analysts; we've had a series of analysts over the years, and our current analysts are Julie Corson and Hannah Norton, who you were introduced to this morning. With that, I will be happy to take questions, and I'll probably bring up my colleagues here to help answer them.

Questions? Actually, before we take questions, I should really point out that real kudos go to the ENCODE group on the point Elise made about really getting ahead of the genomic data sharing policy and working out consents. They didn't have to get ahead of it — they could have waited and simply conformed once the policy took effect — but it really was an example of a lot of hard work, both by program staff at the institute and by members of the consortium, who have been just amazingly willing to try to get ahead of these big challenges and help pave the way for how things will have to be done as the data sharing policy gets implemented. So kudos to all of them.

Thank you, Eric. Can I just point out, also, that this is another example where we had a tremendous amount of help, not only from Laura Rodriguez and her group in policy, but also from the Genomics and Society division. So, Didi.

You've done a great job, with big successes in this program. I'm thinking about our discussions on the concept clearances today: is there a plan to have some discussion at the February Council of your thoughts on what will be coming in May? I think that could be helpful, also, in thinking about opportunities for leveraging the different programs with the sequencing efforts.

Right. We probably won't have held our planning meeting by February, but we can certainly share our plans with you. We will have done much of the planning for that planning meeting and put a lot of thought into it, getting input from additional people, so we can certainly keep you up to date at that point.

We've talked about this before, at least with respect to coordinating the new plans that are coming with the existing sequencing plans — that would be great. And I wanted to know what your thoughts are on coordinating the genomes that are going to be part of these disease-focused SNP or discovery projects with the functional projects.

Right, we talked a lot about that in various ways through the meeting as well as at the workshop. And I think it would be a great opportunity this time, where we missed a few of those last time because of logistics, consenting, et cetera. What are your thoughts on that?

Yeah, so we're very interested in exploring all those kinds of options; we don't want to reinvent the wheel, and we want to synergize the projects as much as is feasible. There are probably logistical and technical issues that may not make it as easy as it sounds, but we're definitely open to those models. And ideally, if we could blow everything up and start over, we would do them all at the same time.

Other comments or questions? Eric, I guess my question — it's related to Joe's comment — is, looking forward, if the large-scale centers, however many there are, are focusing on these common diseases, most likely looking at whole genomes for a major portion of their work, what can be done to coordinate and target cell lines and tissue sources that would complement the diseases being targeted?
And I guess Adam's probably having a heart attack over here, because it probably sounds like, oh, you need a coordinating center, right? But on the other hand, I think a lot can be done at the investigator level and the institute level to make sure these various activities are not just running in parallel — that there's a lot of crosstalk and complementary activity. Agreed.

Just building on that: when they are selecting samples and all that, it should be thought of more broadly — not just for the sequencing centers, but so the samples could be used for ENCODE and other things, where they could leverage all the results from that. So having access to the tissues, and not just the DNA.

David? I'm not sure if this is a question for you, Elise, but just free-associating across the course of the day, I'm thinking back to your comment, Bob — Bob Nussbaum — when we were discussing one of the concepts, and you said surely it would be great if we were to connect the dots between our collective experience with GWAS and the common disease studies going forward. And then I'm reminded of one of Elise's early slides, which highlighted the fact that 90 percent of the GWAS hits are outside of the coding region. So there would actually seem to be a fairly major disconnect between those two worlds, and I would love to hear someone synthesize and bring these together into a paragraph for me so I can understand. I don't know if other people are puzzled by what I'm saying, or if I'm the only one trying to make sense of both the plan going forward and the residue of GWAS.

Are you talking about whole exome versus whole genome sequencing?

Well, yes. The proposal that we have spent much of the time discussing is one that would focus on exome sequencing. And we have talked about the beauty of connecting this to the body of work already funded by NHGRI, and a very simple one-line summary that you provided of GWAS was that 90 percent of the hits are outside of the coding sequence. So I'm just looking for someone to speak a paragraph that will help me put these together.

Jerry, you want to write that paragraph for him?

I can try. The rationale for focusing on the exome in large-scale sequencing, as opposed to the non-coding genome, is basically, I think, a power issue, right? The power you would need with whole genomes — the number of samples would be very, very large.

No, no, I understand that issue. I'm asking more of an existential question, which is: what does it mean? What do we think when we speak of the success of GWAS? Even in the intro to the common disease write-up, one of the early statements is something to the effect that we're building on the success of GWAS. So I'm just trying to connect these intellectually. I understand the power arguments that come into play, which basically pertain to a more concentrated target and also the higher interpretability of coding-sequence variants.

I'll need time to write the existential paragraph response. Elise mentioned the collaboration between ENCODE and the CHARGE consortium, and indeed one of its major focuses is to help CHARGE — this huge worldwide GWAS consortium studying mostly cardiovascular, metabolic, and aging phenotypes — use the ENCODE data to help understand the biology, the biological underpinnings, of those findings, including detailed sequencing in those regions and follow-up with ENCODE annotation information.
So indeed that is the focus. And second — thinking back to the large-scale program discussion — I'm going to be very surprised if, looking three years down the road, we're doing as much exome sequencing as the current write-up indicates. I think we're going to see a more rapid shift to whole genomes and away from exomes. I just can't see us continuing to use the sample-size argument to justify whole exome sequencing; I think we'll need to drive down the price of the assay and have large sample sizes and good study design to promote whole genomes. I see Val grabbing a microphone, so go ahead, Val.

Well, I don't know if this is what David's getting at, but it seems like if you had the complete ENCODE data fleshed out, you wouldn't do whole exome sequencing alone — you'd do whole exome plus whole ENCODE. Like Eric said, eventually we will be using whole genomes, but the analysis will still be focused, at least initially, on the ENCODE regions.

So I guess my point is that this is a very valuable program. And this maybe builds a little on David's point in a different way: if we talk about the success of GWAS, we're really talking about a partial success, in the sense that we have thousands of great associations, but in very few cases do we have definitive causal variants. I'm just curious — maybe this has come up at past Councils — but is there any discussion, beyond the CHARGE-ENCODE collaboration, of a really well-funded, systematic effort to finish the job, in the sense of actually nailing down causal variants, whether through ENCODE or some other mechanism?

Does anyone in the back — any of the program directors — want to speak to that? We are moving in those directions.

I just wanted to single out something in particular that happened to me recently. When I talk about GWAS and many other aspects of genomics to the internal medicine residents I work with, their eyes rapidly glaze over. But when I pointed out that there was a GWAS hit to a spot in BCL11A where ENCODE data showed transcription factor binding, et cetera — leading to perhaps the first opportunity to really tackle sickle cell anemia after 60 years of knowing what the defect is — their eyes did not glaze over.

So I think the coordination of GWAS hits — hmm, they rolled their eyes back, yeah, right. Howard, you really — no, no, you really know how to put in the bon mot. It really made a difference. In fact, being able to coordinate GWAS data and ENCODE data in a way that's accessible even to people who are not command-line programmers is a really powerful and important contribution, and I really want to congratulate you on doing it. And as someone who used to study regulation of fetal hemoglobin expression, I really like that example.

Okay, thank you, Elise and the ENCODE team.