Okay, great. Thanks. So I am going to summarize what ENCODE has been doing for the last 10 or 12 years, in 15 minutes. As Elise said, what we've really been doing is throwing a large variety of different assays at the genome, with the goal of functionally annotating it. These are mostly biochemical assays. I won't go through every single one, but they are basically intended to map transcription factor binding sites, open chromatin, and three-dimensional organization. There are some DNA methylation experiments, a lot of RNA-seq experiments, and then some RNA binding protein analyses and associated types of experiments. The overall structure of this, and this is a slide from Ewan, but updated, is that a large number of different assays, meaning different ChIP-seq experiments against different transcription factors, different RNA binding proteins, have been run on a very limited number of cell lines. I'll dig into this in more detail in a minute. And then a smaller number of assays has been thrown across a lot of different biosamples, different cell lines or different tissue specimens and such, so that you have, again, a more focused set there, with the ultimate goal of trying to extract as much information as possible and at least learn basic principles from these sorts of studies. At the end of the day, there are roughly this many experiments done so far, and each experiment, you'll see, has replicates. These are the major ones that are there, and the total number of experiments is listed up here. So there are over 3,000 experiments that ENCODE has generated that are sitting in the public databases; I think this is updated as of February. I just got an email this morning from Shirley Liu, and if you ask how many other data sets are out there that aren't ENCODE or REMC, the number with replicates, as best as I can tell from her email, is about 1,000. That's transcription factor binding sites; I don't have numbers for RNA-seq experiments, where I imagine there are a lot, and there are a few hundred DNase-seq experiments. So basically, the rest of the community combined, with their replicated experiments, is roughly the same size as ENCODE as far as I can tell, and I can't speak to the quality of those data sets. What I can speak to is the quality of our data sets, which we believe to be very high quality, because a lot of effort was put in to make sure we had reasonable experiments. We demanded that there be at least two biological replicates for each experiment. This seems obvious now, but imagine yourself in 2003 when we launched this: first of all, most experiments didn't have replicates, and second, half the ones that were published actually gave background levels of information. I don't know if you remember those days; as someone who was there at the beginning, a lot of the data was basically noise. I think that's why we installed these requirements, and remember, those were ChIP-chip days, which made all of this a lot tougher. So anyway, we've added lots of quality control measures. There have been some nice algorithms and approaches added, which have actually spread through the community, as we'll see in a minute, to make sure that replicate peaks were being called and that each experiment in fact had good quality data. There are various measures you can follow to track that.
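To make that replicate requirement concrete, here is a minimal sketch of the kind of concordance check one might run on peak calls from two biological replicates. This is only a toy overlap measure over made-up peaks, not the statistics the ENCODE production pipelines actually apply (those use more rigorous methods such as IDR):

```python
# Toy replicate-concordance check on peak calls, in the spirit of the
# replicate QC described above. ENCODE's real pipelines use stronger
# statistics (e.g. IDR); this only measures simple interval overlap.

def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a overlapping at least one peak in peaks_b.

    Each peak is a (chrom, start, end) tuple.
    """
    hits = 0
    for chrom, start, end in peaks_a:
        if any(c == chrom and start < e and s < end for c, s, e in peaks_b):
            hits += 1
    return hits / len(peaks_a) if peaks_a else 0.0

# Hypothetical peak calls from two biological replicates of one experiment.
rep1 = [("chr1", 100, 250), ("chr1", 900, 1_050), ("chr2", 400, 600)]
rep2 = [("chr1", 120, 240), ("chr2", 380, 610), ("chr3", 10, 90)]

concordance = min(overlap_fraction(rep1, rep2), overlap_fraction(rep2, rep1))
print(f"replicate concordance: {concordance:.2f}")  # flag the run if low
```

A run whose replicates barely overlap would be flagged for follow-up rather than released, which is essentially the behavior the quality standards above were designed to enforce.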
Again, all of these checks were installed, and so we think it is high quality data for the most part. We also set up, as you'll see in a minute, standards for ensuring that high quality experiments are being performed. And importantly, all the data is being processed in a uniform fashion, so that you can actually compare data from different labs for the first time. These are some of the steps involved: the mapping, the uniform peak calling, the QC I mentioned already, and then you can actually derive information from this. This pipeline has now spread to other projects, so it's had impact on GTEx, REMC, IHEC, and projects outside of NIH as well. For example, there's a CIRM grant that Joe and I are part of, the beta cell consortium, which is not NIH, and it has used ENCODE standards for processing data. So it's really having impact very broadly. We've set up a lot of standards for carrying out these experiments. These literally took years to work out; they were very painful. Nonetheless, at the end, we came up with what we think are good ways of generating data, and set up guidelines so that, again, information can be compared across labs and across projects.

Here is a little more detail about the types of data we've been generating. I won't go through every one, but you can see it's a diverse array of data. This is a log scale. Blue is what we've done so far; red is what's coming, or at least what people said they're going to do over the next year and a half. And even though these bars look tiny here, in some cases there are many more experiments to come than are here currently. So you can anticipate, I forget what the final number is, but probably about two- or three-fold more over what has been generated so far. And again, quite a few samples have been generated for the mouse; this is human, this is mouse. Once again, lots and lots of assays have been run on smaller numbers of lines and biosamples. These are the number one hits, if you will: the number of assays that have been run on various cell lines. Here are four cell lines, three of which are cancer. Liver didn't quite make the list; that one actually got added in ENCODE 3 and is starting to accumulate more data, so that would be the one tissue that's starting to pile up. But otherwise, these are in fact cell lines. As I said before, a more limited number of assays has been run on lots of samples. So RNA-seq has been run on over 200 samples. The ChIP-seq number is a little misleading, because it's a hodgepodge of factors. But DNase-seq has been run on close to 200, most of them from John Stam's lab, histone marks on a number, and so forth. So these limited assays have been run on lots of different samples, and the net result is that even though they are smaller in number, they still reach a wide range of samples: primary cells, immortalized cells, and tissues are getting hit pretty reasonably, again with a more limited number of assays, for both human and mouse. So the data, as I think Elise mentioned, is all up on the cloud, and in fact so are some of the algorithms you can use for processing and analyzing it. Mike Cherry set this up in a very highly searchable format. One important thing that's new in this phase of ENCODE is that the data is instantly released; there's no embargo. As soon as the data is up there, you can use it. I think people would appreciate it if you talked to some of the producers, but you don't have to; you can publish straight away. This is the portal site, probably the most important thing from this talk, so you can browse it a bit.
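Since the portal is searchable programmatically as well as by browsing, here is a minimal sketch of pulling released experiments from it over its REST interface. The endpoint, query parameters, and field names ("@graph", "accession", "target") are as the portal documents them at the time of writing; treat them as assumptions and check the current documentation before relying on them:

```python
# Minimal sketch of a programmatic query against the ENCODE portal's REST
# interface. Endpoint and field names are assumptions based on the portal's
# current documentation and may change.
import requests

URL = "https://www.encodeproject.org/search/"
params = {
    "type": "Experiment",          # search over experiment objects
    "assay_title": "TF ChIP-seq",  # transcription factor ChIP-seq only
    "status": "released",          # released data carries no embargo
    "format": "json",
    "limit": 10,
}
resp = requests.get(URL, params=params, headers={"Accept": "application/json"})
resp.raise_for_status()

for exp in resp.json()["@graph"]:  # results come back in an "@graph" list
    print(exp["accession"], exp.get("target", {}).get("label", "?"))
```

Because released data is unembargoed, anything returned by a query like this can be analyzed, and published on, immediately.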
My understanding, by the way, is that a lot of this summary is in your packet, so if I'm going quickly, please look at the packet. There's a first-generation simplification of some of this data, if you will, to set up a simplified annotation, what we're calling a prototype for the encyclopedia; Mark Gerstein and others are part of that effort. The idea is to put this into a simplified version many people can use, in case you're overwhelmed by all the different data and all the different cell types. As Elise mentioned, there are a number of computational groups heavily involved in analyzing the data, and I've broadly grouped them into three areas. That might be an oversimplification, but these are some of the areas in which there's extensive activity in analyzing ENCODE data. These various computational groups, along with the production groups, have generated over 30 different algorithms that are out there and publicly available. There's a website, which I should have put up here, that lists where these are. I know most computational biologists hate to use each other's algorithms, but a few of these have actually emerged as standards, so in fact they're getting used widely by the world, which is nice to see. That's what I would call a remarkable achievement in and of itself.

Okay, so here's a summary of the impact. We have lots of open and diverse data types on the same cell lines and tissues, and people actually do use this. You can think of the heavily studied ones especially as reference cell lines, and that's what a lot of people do: they dig very deeply to try new kinds of analyses and such. We've had strong impact on setting up experimental standards and new analysis methods; these methods and standards have been widely adopted by the community, and the data are widely utilized. I think we showed this slide already, but this is the number of publications from ENCODE; purple are the ENCODE folks. If you take those out and look at the non-ENCODE folks who are publishing, you still have a fairly heavy dose of information getting used: over 750 papers, and over 200 of those are actually disease-associated, so it's having impact on human disease. And there's also a related set coming out of papers published by non-ENCODE people. These are just some of the disease areas where those papers fall; again, a wide range of disease, with cancer showing up quite a bit. We have a variety of outreach activities to try to get the information out to the community, so there are tutorials here, and I'm told they're quite good. We've had workshops that have helped set up relationships with other groups that we think would take advantage of ENCODE data, and there will be a users meeting this year, I think in the summer. Other high-level impact is summarized here. We can now view the genome as sitting in interesting segmental elements that you can look at, and I think this information has been useful for helping to organize certain regulatory principles; there's a series of papers on this that came out in the last burst of ENCODE papers.
But this is arguably one of the highest-impact areas where ENCODE has had value, just to reiterate what's been said already. At least 85%, if not more, of GWAS-associated lead SNPs lie outside of coding regions, so you can actually start scanning these SNPs, see where they lie relative to ENCODE peaks, and get excited if one of your lead SNPs falls on top of one. But more often than not, it will be a situation like this, where you have a lead SNP somewhere in the genome, and then you have to look at the larger linkage disequilibrium block and try to find a variant that might be a better candidate for the causative SNP. This is a case from something we looked at, where there was a lead SNP for type 2 diabetes, but the most striking candidate was actually over here, where a lot of ENCODE data suggested that a transcription factor binding site lay, which might be the more causative variant, or at least a stronger candidate for being involved in the disease. So this is one way, and I think many of you are familiar with it, in which these data can be used.

Reflecting back a little on where we were in 2003, when the genome was finished and ENCODE was launched, this was our view of the genome: 25,000 protein coding genes; not that many non-coding genes, mostly tRNAs, snoRNAs, and a few small RNA genes, things like that; and not very much regulatory information mapped in the human genome. Fast forward to where we are now: the number of protein coding genes has probably dropped a little bit; there are, in fact, thousands of non-coding genes that are quite reproducible, some of which have been ascribed function, though not necessarily all of them; and, not entirely due to ENCODE, but certainly with ENCODE playing a big part, regulatory segments have now been mapped throughout the genome. We now appreciate that there are many hundreds of thousands, if not millions, of potential regulatory elements throughout the human genome, which I think gives us a somewhat different view from the one we had 12 years ago. So those are some of my thoughts to summarize what we've done. Again, we showed this picture, but here are the groups that did this, and I think I made it in 15 minutes.
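To make the lead-SNP scan just described concrete, here is a minimal sketch of the interval intersection at its core. All rsIDs, coordinates, LD proxies, and peak assignments below are made up for illustration; a real analysis would take the LD block from a reference panel such as 1000 Genomes and intersect against genome-wide peak sets:

```python
# Toy version of the lead-SNP scan described above: given a GWAS lead SNP,
# the variants in strong LD with it, and a set of regulatory peaks (e.g.
# TF ChIP-seq or DNase), report which variants in the LD block land in a
# peak. All identifiers and coordinates are invented for illustration.

lead_snp = ("rs0000001", "chr10", 114_758_349)

# Variants in the same linkage disequilibrium block as the lead SNP
# (in practice these would come from a reference panel).
ld_block = [
    ("rs0000001", "chr10", 114_758_349),
    ("rs0000002", "chr10", 114_754_088),
    ("rs0000003", "chr10", 114_771_287),
]

# Regulatory peaks from ENCODE-style assays: (chrom, start, end, assay).
peaks = [
    ("chr10", 114_753_900, 114_754_500, "TF ChIP-seq"),
    ("chr10", 114_770_000, 114_772_000, "DNase"),
]

for rsid, chrom, pos in ld_block:
    hits = [assay for c, s, e, assay in peaks if c == chrom and s <= pos < e]
    tag = "lead" if rsid == lead_snp[0] else "LD proxy"
    print(f"{rsid} ({tag}): {', '.join(hits) if hits else 'no peak overlap'}")
```

In the type 2 diabetes example above, it is exactly this kind of scan that pointed away from the lead SNP itself and toward a proxy variant sitting in a transcription-factor-bound region.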
Ewan? I'm really impressed by the 800 transcription factors. How did you scale that, and is there anything coming out of it? 800 is really a sizable level where you can start thinking about the genes. Yeah, it's even bigger than I realized; I've got to go see where they all are. What can I say? I think some things have helped a lot. We've actually switched into a more automated mode, and Rick can comment on his, but we now do things in a 96-well format. So our production rate just in the last six months is three-fold higher than it was before. That doesn't sound like a lot, but for things like transcription factor experiments, that's a good thing. Sorry, is it still all antibody-based? Yes. There is some tagging going on; that hasn't been quite as high-throughput as we had hoped. There's tagging with GFP lines from Kevin White, which I didn't present here, and that has led to some data. But it's mainly antibodies. That's impressive. What's that? It's mainly antibodies. It's mostly antibodies. It does seem high to me too, to be honest. I think it also counts chromatin remodeling proteins and things like that, so it's not all pure TFs; it would certainly have RNA polymerase subunits and things like that. It's a slide I got from Brenton. What's that? What color was the 800, blue or red? It was black. To be fair, blue is done and red is coming. These were supposed to be ones that were done, right? So according to Brenton, they're done; he's the one who pulled all the data together. It sounds high to me too, I have to say. I think there's a decimal place mixed in there somehow or another. No, it's more than 86, that I know. No, I'm joking. I'd be surprised if the number is that high. You know this, but the antibody success rate has always been poor, and it's gotten worse for both of our groups, I think. Well, we can check it. But it is a lot; I'm not sure what the actual number is. I know when I last looked at it two years ago, it was about 200. Yeah. Frank? Yeah, actually, I wondered if you could speak to that antibody quality issue. Maybe two questions. Anecdotally, I always hear about antibody quality being poor, but how does that factor in? I know there are antibody checks, but are they keeping up with the amount of data being produced? And is the antibody quality just being reported, or is it really being linked in a meaningful way to data quality? Yeah, good question. With regard to the latter, all of the antibody characterization data is now getting posted, so you can actually look for yourself at the Western blots or the mass spectra and such. All the metadata associated with an antibody characterization will be associated with each of these experiments, so that part you can follow. On the issue of keeping up, I think there are mixed opinions on how we're doing. I think Rick feels there aren't as many good ones. Our group so far isn't limited yet by antibodies, although will we be? I'm not so sure. There are a lot of groups still producing them, but it is true that we characterize four times as many antibodies as actually become successful for ChIP. That is to say, only 25 percent of the antibodies we start out with do we deem suitable for a good ChIP experiment. So that means that for hundreds of factors, you're characterizing thousands of antibodies. And we didn't use to, but now we actually keep track of those that fail, so they are put up there along with their lot numbers and such, so you don't have to make the same $300 mistake we did. Yeah, let me comment on that. I know this is technical detail, but it may be important to some people: we did start seeing a decline. Mike's group and our group buy as many antibodies from as many different companies as we can, and we've made a few too. And we saw at least a two-fold drop in success rate over the last year or so. We did; Mike's group didn't, but we did, from the companies we're buying from. And we think it's because the earlier ones were vetted; they had been tested at least some. You never trust a company when they say an antibody is ChIP-grade, because who knows what that means, but some of them clearly had at least been tested somewhat. So I think that may be why. Yeah. In general, we can give you some rules, but the polyclonal ones have worked better than the monoclonal ones and such. Yeah. Ross?
I wanted to follow up on that, because regardless of what the actual decline in success rate is, and there is a decline, it raises the question of whether there are going to be enough good antibodies to really cover the space we want to hit. And of course, tagging methods can be applied, and they have their own issues, but at least there you know the antibody will work. Yeah. Well, maybe people know this, maybe they don't, but there is a Common Fund initiative to make these that has not generated them at the levels we had been hoping for, and obviously that would have been nice. There's huge value to having a renewable antibody reagent that works forever, because it is renewable and you can use it on any tissue or what have you, versus tagging, where you're not going to go around tagging humans; you can tag cell lines, but that's about it. So these are the issues you're facing. I don't think the value of having good capture agents is going to go away; I think that's always going to remain extremely high. It's incredibly high in other fields like proteomics, where you start finding markers and you really need a high quality antibody. So I think the need for such reagents is still strong in many communities; it just may take more work to get them. And I think there are creative ideas for doing this, just like there are creative ideas for new sequencers, and we need to keep exploring those. I'm not sure who wants to go next. I had a quite provocative question for you: when do you call it a day? I mean, we now have so many hundred, and they're all in MCF7 or HeLa, where most of them don't belong. How far are you going with this, the whole 1,600? Well, that's the goal; I don't know if we're going to make it in 2016. This is an interesting discussion we have with Elise quite frequently. Obviously, when things aren't expressed in these cell types, we try to move into other relevant cell types; that's why not everything is done in those four cell types. There was a goal initially to pick six; we never quite got that sixth human sample done, but liver plus these other four are very high priority for us, and so we've been trying to do that. They were chosen in part because they're fairly diverse, and we thought lots of TFs would be expressed in those cells, so that does give us a fair amount of leverage. But it is true that we'll probably have to go into other systems, and in some cases, for example, you may have to use human fetal tissue or what have you, early development, or do some creative things with stem cells. We haven't done perturbation experiments; that's always something discussed as well. So we are monitoring this at some level, to see where there are cells where there's still a big hole, where we don't have transcription factor binding information, and which cell lines would help maximize that coverage. So I think that is one thing to consider going forward. Another thing, and maybe this is inserting my bias before Joe speaks, is that I could easily envision us choosing other reference lines as well. Some of these were chosen for historical reasons: they work well, they're transfectable, and people were already working with them. But it's a different world in 2015 than it was in 2003, and maybe some other lines or tissues would be better references than the ones we're using now.
I don't want to get ahead of that argument; I think that will come later. But these are things you could think about, some of which would help fill out the remaining parts we're missing. And getting back to your question, others would also just be useful to the community as references. That's something to consider. Aravinda? I was just going to ask, you know, a cheap way to find out: there are thousands of scientists who use antibodies on their own favorite primary cell line or cell type or tissue, and there hasn't been a consistent way of mining that information. I don't know, given how ENCODE is now being used, whether there's some way to find out their experience; all they have to do is tell you whether the antibody is ChIP-able and how good the evidence is, and that could be excellent in giving you that information. Yeah, so, two comments. We did try: we put out calls to folks saying if you have an antibody, we'll ChIP it for you and give you the information back. So we have put that kind of request out there, and some people have taken us up on it, but not as many as I had hoped. The other comment I can make, and Rick does the same, well, I guess three comments, is that we interface with all the major antibody-producing companies, and some actually give us antibodies at very low cost because we're doing high throughput; in return, they get information back about how good their antibody is, which they like. The third thing I can say is that there's something called Antibodypedia, I believe, the resource associated with the Human Protein Atlas. That is a crowdsourcing area where you're supposed to put this kind of information, but very little there is ChIP-related, as you may know; it's more immunostaining and immunoprecipitation. Maybe one more question. I would just say, we've heard that there are some issues with ChIPping transcription factors, but as an outside user, this is one of the most amazing resources that ENCODE has to offer; it's really unique to ENCODE. So whatever needs to be done should definitely be done to make sure we keep the flow of more transcription factors and more cell lines coming. Oh, that's great to hear. Thanks.