Okay, I'm kind of an outsider here; I don't work with ENCODE at all, but in my own work I think about the same problems ENCODE faces. My background: I spent 30 years training in biochemistry and studying gene regulation from a mechanistic point of view, the past 15 of them doing it on a genomic scale. As I was listening here, one of the things that really resonated with me, and I think Joe mentioned this, was the PI vision, and it's really the how of each of these elements. When we talk about how these elements work, we really mean mechanistically: how are they pushing proteins around, how are they working with each other, the whole conglomerate of factors. And when we think about that whole conglomerate, there's one thing missing from this discussion, the elephant in the room: if you get beyond the functional elements themselves, there's a world of factors there. Beyond the chromatin, beyond the sequence-specific factors, there are the chromatin regulators. I know some of these things are covered under ENCODE, but at least from my perspective they seem to be an afterthought; the primary focus is on the sequence-specific factors. But you have this vast number of regulators, you have the initiation machinery, the transcription machinery, elongation; all of these are going to be working together as one massive complex, and I think it's really important to understand how they all work together, the entirety of it.

So when you think about mechanisms, here's a mechanism: a car steering mechanism. We can think of our bodies as mechanisms too. When things break down, you need a mechanic for the car, but for us we need drugs. And how do we get drugs? We need to understand mechanisms. And how do we get mechanisms? That brings us all the way back to the top, to the question of how we take all this genomic data and get to these mechanisms. The first phase in this process is correlation, and we've been doing a lot of that, and it works really well. But correlation is like saying: okay, you have a steering wheel, a rack, a pinion, here's the target; they're all correlated with each other, they're all in the same place, but that doesn't mean we understand what they do. So we need to go to the next phase, and there are three parts to it as I see it, because correlation is not causation. We have to determine causation, structure, and dynamics. The way I think about it, causation really comes down to making genetic perturbations in the system. Over here, if you took this rack away, what happens? These wheels won't turn, but the steering wheel will turn, and the drive shaft and the gear will turn. So we know parts of it will turn and other parts won't. That's causation: we'll know what the wheel depends on up to this point, and what the steering wheel does not. The second part is high-resolution assays. If you look at this structure here, you can pretty much guess what it does. If you have a really high-resolution view, and I'm talking about a base-pair-resolution view of the genome, you can just get a visual on what it is, and that's going to be quite helpful. And finally, time courses: you get a better sense of how things work when you see them moving and changing with time. But I'd emphasize, I don't think time courses necessarily establish causation.
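To make that knockout logic concrete, here is a minimal sketch, assuming hypothetical site names, signal values, and a two-fold cutoff (none of which come from the talk): compare a factor's binding signal in the wild type versus the knockout, and classify each site by whether it depends on the deleted part, the genomic analogue of asking which wheels stop turning when you pull the rack.

```python
# Hypothetical sketch of the perturbation logic described above:
# knock out one component, then ask which downstream "parts" still move.
# Site names, signal values, and the 2-fold threshold are illustrative only.

def classify_sites(wildtype, knockout, fold_change=2.0):
    """Label each site by whether its signal depends on the deleted factor."""
    calls = {}
    for site, wt_signal in wildtype.items():
        ko_signal = knockout.get(site, 0.0)
        if ko_signal <= wt_signal / fold_change:
            calls[site] = "dependent"      # this part stops turning
        else:
            calls[site] = "independent"    # this part still turns
    return calls

wildtype = {"promoter_A": 120.0, "promoter_B": 95.0, "promoter_C": 15.0}
knockout = {"promoter_A": 10.0, "promoter_B": 90.0, "promoter_C": 14.0}
print(classify_sites(wildtype, knockout))
# {'promoter_A': 'dependent', 'promoter_B': 'independent', 'promoter_C': 'independent'}
```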
They can run along very parallel paths, but that's not necessarily causation. Really, the whole process of knocking things out and looking at what happens downstream may be more effective at getting at causation. I put a red X next to functional assays and states because, at least for me, these concepts, histone and chromatin states and things like that, weren't necessarily resonating with me, in part because they're either out of context or rather low resolution. If we get into causation, structure, and dynamics, I think these things fall out of that: you'll get function, you'll get a more detailed definition, and therefore states go away.

So when I think about states, I think about resolution. Here's an aerial image of a schoolyard. You can get lots of things from it at low resolution; I kind of think of ChIP-seq as this kind of resolution. You get certain types of information that can be quite useful about where things are, but the details are lost. If you get much higher resolution, you can get into the details and change the questions you're asking. You get much more specific; you can see how all the parts fit together. That was the visualization component I was mentioning: you have the details.

One concrete example of that is the FoxA2 transcription factor. We were doing some of this work with Ken Zaret's lab. Essentially, the ChIP-exo assay uses an exonuclease to chew the ChIP DNA back to the crosslink borders, so you get pretty high resolution. With Ken's lab, we found 35,000 sites, each one of them with a binding motif. If you blow that region up, you can see that ChIP-exo gives you the midpoint between the two peaks of the exonuclease-stop pair. Each of these columns is a single base pair, so you get truly base-pair resolution. You can line this up with the structure of the protein. In fact, if you just line up the sequence for each of these columns, you'll see they line up with the nucleotides: these three green Ts you see up here, this G is the gold one, and finally the midpoint between the blue and the red is the crosslink, so you can define it right here. We actually use this for structural information. It's great when you have the actual crystal structure, because you really need it to interpret the data.

So the charge I had here was to come up with a proposal, a proposed set of work, all right? And that's what I'm going to describe to you. Like any proposal, you need some background, and then you need to propose an experiment, and things like that, and feasibility. For the purposes of this cartoon, this will be the genome, the A, C, G, T; the epigenome sits on top of that, that's the blue box; and then the environment. To me there are really two major things one could change in a system: either a genetic change or an environmental change. That's really all there is, if you're talking about doing high-throughput sequencing to look at the epigenome. So there are these two types of perturbations, and each of them creates an epigenome change, and the epigenome change goes on to create a phenotype. That's the framework in which I think about things. And I want to use just one example here; as in a proposal, you might have some preliminary results. This is our ChIP-exo assay of a protein.
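To make the peak-pair idea concrete, here is a minimal sketch of how one might call a binding position at base-pair resolution from ChIP-exo exonuclease stops; the coordinates, the pairing window, and the function name are all hypothetical, and a real pipeline would also do peak calling, strand matching, and motif alignment.

```python
# Illustrative sketch of calling binding sites from ChIP-exo exonuclease-stop
# peaks: the forward-strand and reverse-strand stops bracket the crosslink
# point, and the midpoint of each matched pair is the called binding position.

def pair_midpoints(forward_stops, reverse_stops, max_span=50):
    """Pair each forward-strand stop with the nearest downstream
    reverse-strand stop within max_span bp; return the midpoints."""
    sites = []
    for f in forward_stops:
        candidates = [r for r in reverse_stops if f < r <= f + max_span]
        if candidates:
            r = min(candidates)          # nearest downstream stop
            sites.append((f + r) // 2)   # midpoint = called site
    return sites

forward_stops = [1005, 2310]   # hypothetical stop positions, + strand
reverse_stops = [1025, 2338]   # hypothetical stop positions, - strand
print(pair_midpoints(forward_stops, reverse_stops))   # [1015, 2324]
```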
It doesn't matter what it is. The window here is about 100 base pairs from here to here, just to give you a sense of scale, and this is the promoter regions of a set of genes where this protein is bound. If you elicit a genetic change in a protein that turns out to regulate this one, the conformation changes, but only at a subset of the genes across which this protein is more broadly spread. So through these mutations we can get a sense of what's happening mechanistically. Now, there's a lot more to this, and I'm going to show it in a much more descriptive manner: I'm going to take this data, which you're all familiar with as a heat map, turn it into an abstraction, and then start layering other data sets over it to show how it informs us about mechanisms. That's what you see here. There are about 100 genes here, and the data is just averaged through: one set of genes, another set of genes, and so on. You have a blue protein bound, this red protein spread across here, and then the green and the black. If you go through this genetic change where you remove the blue, the red collapses and the green expands out; that's what happened spatially. But if instead you did an environmental change, the blue also goes away, but the red stays broad, the green goes away, and this black thing moves in. So there are all sorts of these microscopic, base-pair-level changes that take place when you either knock out a factor and ask what happens downstream, or create an environmental change and ask what events happen downstream. But what that requires is looking at the individual proteins at high spatial resolution, so you can separate them all from each other.

In terms of a proposal, the idea would be simple; it's fairly straightforward, anybody can think of this. You have all your various cell systems. I don't care what they are because, from my perspective, we work in yeast, we work in flies, we work in mouse, we work in humans; I don't really care what the organism is. I just care about mechanisms. I care about understanding how all these proteins work together, and fundamentally, from yeast to humans, you can think of those mechanisms regardless of the organism. So you take your cell type and you apply the various tools you have: your dynamics is going to be time points, you can do your CRISPR-Cas9-based deletions, plus or minus perturbation. You go through this, run through your assays, and stick to your quality-control metrics, okay?

Part of any proposal requires a budget, and Ross came up with 800 million. Did you come up with a budget for that, Ross? ("Not enough money in the universe.") So let's get reasonable about it; this is just a down payment on Ross's proposal. Say you took ten cell types, three time points through that, and five deletions of, let's say, sequence-specific transcription factors, and you were trying to look at 22 factors in particular. Of course you'd need replicates; I put three down, two to hold and one in case it doesn't work for that assay. You add that up across the 22 factors, and that's about 10,000 assays right there. It seems rather simplistic, but you do the multiplication and it's about 10,000 assays.
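The multiplication behind that figure, using the design dimensions just quoted, is simply the product of the five factors:

```python
# The arithmetic behind the "about 10,000 assays" figure quoted above.
cell_types  = 10
time_points = 3
deletions   = 5    # sequence-specific transcription factor deletions
factors     = 22   # factors assayed
replicates  = 3    # two to hold, one spare in case an assay fails

assays = cell_types * time_points * deletions * factors * replicates
print(assays)      # 9900, i.e. roughly 10,000 assays
```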
If you can get it down to $200 an assay, plus some fixed costs, you're up to about $3 million just for this project. These are fictitious numbers, but they're hopefully reasonable. That's the kind of project that may be feasible for one proposal, I would guess.

Okay, so now this gets to my last slide, really, and that is how to actually make this thing happen. Again, I'm sort of an outsider looking at ENCODE, so I don't fully understand the details of it. But it strikes me that what you really need, and already have, is basically a pipeline, a data production and processing pipeline: you make the data and then you analyze it. One idea would be that community samples come in; they can come from anybody who's interested in this, or perhaps even through an RFA. What's really key for me is that this be a single monolithic pipeline, if you will: the samples come in, they get processed, they get pushed all the way through, and finally out the other end comes the rapid release that the community can do analysis on. The reason I'm keen on a monolithic pipeline, at least for the purpose of discussion, is standards. What's happened over the past many years has been a compromise. Standards were hard to come by; they were years in negotiation; they finally happened, but really they were a work in progress. As we go on, I think we need to continue to improve those standards, but we also need to somewhat lock them down and hold accountable the different labs that may be processing samples for the various assays.

Some of the enabling components (I'll get back to this in just a second) are what whatever runs the pipeline can control. For example, affinity capture: we talked about this today, antibodies. Antibodies can be a real challenge; I've heard from Mike that 75% of them, in a sort of random test, aren't very good. We need to get a hold of that and maintain strict standards. There's noise minimization, and there's spatial resolution, which is important: you can use exonuclease, MNase, DNase, Tn5, mapping the 5' and 3' ends of RNA. These are all important high-resolution methods that will give you a spatial picture of what's going on. I think it's important to look at these things in a native context rather than a reporter context, because the native context is the proper context in which these things will ultimately be functioning. High throughput is going to be critical, but I also think of it as true production versus variable compliance. For true production, you might think of a car being manufactured in a plant: there are absolute standards, everything is done the same way; it isn't as if one plant manufactures a car slightly differently from another plant for the same model. So the question is variable compliance, and I think that's going to be important.

And perhaps finally, this is something I actually gave some thought to and have really waffled on: should the DCC maybe be something more of a line item? What I mean is, should it be open for funding, with competition determining whether it should exist or not? Or should it already be determined that this is what we need and that it's going to exist, so that what the RFA is really about is how to improve it, how to adjust it, how to tweak it, how to do all sorts of things with it, but not the question of whether it exists, all right?
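As a cartoon of the "single monolithic pipeline with locked-down standards" idea, here is a minimal sketch; every step, metric name, and threshold is a hypothetical illustration, not an actual ENCODE or DCC standard. The point it captures is that each community sample passes through the same fixed QC gates, and only samples that clear every gate reach rapid release.

```python
# Cartoon of a monolithic production pipeline with locked-down standards.
# All metrics and thresholds here are hypothetical; the idea is that every
# sample is held to the same fixed QC gates before rapid release.

QC_STANDARDS = {
    "min_mapped_reads": 10_000_000,
    "max_duplicate_rate": 0.30,
}

def process_sample(sample):
    """Run one community sample through the fixed pipeline; release or reject."""
    if not sample.get("antibody_validated", False):
        return ("rejected", "affinity-capture reagent failed validation")
    if sample["mapped_reads"] < QC_STANDARDS["min_mapped_reads"]:
        return ("rejected", "insufficient mapped reads")
    if sample["duplicate_rate"] > QC_STANDARDS["max_duplicate_rate"]:
        return ("rejected", "duplicate rate too high")
    return ("released", "passed all locked-down standards")

sample = {"antibody_validated": True,
          "mapped_reads": 25_000_000,
          "duplicate_rate": 0.12}
print(process_sample(sample))   # ('released', 'passed all locked-down standards')
```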
Because I do think we need something like this, and I don't think that need will ever go away. And I don't know if it's a good thing to have it passed around, maybe from one academic site to another. It seems to me we may really want to lock it into place. Okay, and I'll just turn it over to this: these are some questions that I think Elise sent over for us to address. I'll just leave the answers up there and take any questions that you might have.

Audience: So Frank, you're proposing sort of the sequencing center model for this, right?

Frank: Well, I mean that you have, that you say...

Audience: Which is sort of different than your $3 million project.

Frank: Yeah, actually, a sequencing center, but not necessarily just one. You might have different pipelines here for different assays, and I think ENCODE already does that, right? Different labs have different specialties, and so you can use that. But I really see the central pipeline as the DCC, coordinating not all of it but also setting the standards for every aspect of it; a lot more centralized control. Also, the last time I bought a Toyota, I got the one that was built in Japan, not the US, because it was better. Because the doors closed differently; they closed better.

Audience: You know, Frank, I would say most of that already exists or is in place.

Frank: Right.

Audience: Okay, so maybe I'm missing what you're proposing. I guess just to continue doing what we're doing?

Frank: Yeah, no, that's exactly the point. A lot of these things already exist; from my outside point of view, these are just independently what I think should happen, and they already are there. The point really is just about the details: improving it, tweaking this, and how it might be centralized.

Audience: Well, wasn't the community sample input, wasn't that a change?

Frank: Yeah, well, that was part of it. Yeah, right, that's what I'm, yeah.