Thanks, Mark. So, let's see, what do I do to advance this one? Okay, we'll find the button here. So this is going to be a little repetitive, sort of continuing on a bit, but with some additional suggestions. And I will say this workshop was really a good one. It was pretty big, about twice as many people, lots of outsiders and insiders, and it was very helpful. And when we say "should" and "recommendations," there was no consensus about everything; you've already heard some things that differ, so these are just suggestions. I think it was really good to be able to discuss it. So, one of the things, and this is partly the original title of the workshop, or at least the way many of us came in thinking: what should NHGRI do next? Now that we've grown up, what's the next type of stuff? So, a lot of identity discussion about what the institute is. Clearly it has supported large-scale projects, and these have been extremely important for a lot of reasons. Obviously the standardization, and I'm not talking about just ENCODE here, I'm talking about everything the institute has done from the beginning, including the Human Genome Project. What comes out of this are quality-control metrics, all sorts of things having to do with the processes you use, economies of scale, the way you share data, which is really quite different than it would be for small grants or small projects. And it allows you to at least start thinking about integration more. This is not the only place you see that, but it allows it. The smaller and medium-sized projects, of course, are critical as well. It's not either-or; they really should go hand in hand. Obviously, with the smaller ones, you get much more detailed knowledge, focus, and interest in particular biological problems, which of course is what NIH is about in general. It's often much more mechanistic.
Well, often that is the goal: mechanism. And often in these big projects, you don't get that mechanism. We should have some discussion; I think Frank Pugh will probably bring up that this is potentially something we should think about integrating into larger-scale projects too, and for testing specific hypotheses, obviously. And also, obviously, the smaller ones support a large number of researchers, but I will say the big projects do too. I mean, ENCODE has a huge number of people in it, I don't know how many all told, and the genome project had even more. So I'm just going to go through some of the cross-cutting themes from the workshop, and then a little bit more about recommendations and quotes. So, clearly, a big part of this is that genome biology itself is important. We can't forget that. We haven't figured it all out, and it probably needs to be something the institute continues to lead in. Evolutionary genomics is really important. It has been extremely valuable, not just for the one point here about finding conserved elements by comparing our nearest relatives, but for almost every step of the way. And obviously R01s and many other institutes use this, but it's been a pretty critical part of ENCODE and other parts of NHGRI. And I just want to emphasize how much we have learned since before the genome project even started. Back then there were about 100 Mendelian diseases that were understood; there are now 4,000 or 5,000, some very large number, and growing. And they really teach us a lot, like knockouts, because most of them, of course, are recessive loss-of-function mutations. Some of them are dominant mutations, and those are interesting. So, these are suggestions, not recommendations; that's the word we're using. So, again, just echoing what a couple of people have said: NHGRI probably shouldn't try to do everything in genomics, and clearly it's not.
I mean, all the institutes, or many of them, are using genomic tools, especially in the genetics arena, but in others as well. So this is actually not a new suggestion to partner with other institutes, because that has happened since the beginning, but there's probably opportunity to do even more. And I do remember in 1990 that it was a big deal to start trying to get the institutes to interact more, and I think it's been pretty successful. So, many whole-genome sequencing projects, and even functional genomics on a large scale, with really large numbers of samples, sometimes hundreds or thousands, are being done by other institutes and will continue to be. These are usually disease-specific. That's not really a suggestion; it's the background for it. But the suggestion is that NHGRI should still play a major role in those kinds of studies, though, as you've heard, it probably should not own them and certainly isn't expected to lead them. So here are some of the ways we thought about that. These are from the workshop, from our colleagues, and partly just from Mark and me talking about this over the last few days. One of the key things is, I mean, it's wonderful that we can sequence a whole genome for whatever it costs now, and don't believe the $1,000 figure, and maybe we can get better at analyzing it too, so that it costs even less. But imagine another 10-fold or 100-fold boost. Probably the only way that will happen is for NHGRI to continue to lead in this, or at least support the development of it.
And I will say, these have been companies, and academic-company partnerships, and I've sat on lots of grant panels that Jeff Schloss and others led for technology development, and a lot of the technology development comes out of sometimes wacky ideas in academia that end up leading to companies, and sometimes monopolies, unfortunately. So, this is something I wish Evan Eichler were here for, so he would see that I'm pandering to him: we should support work so that we really don't give up on the hard parts of the genome. In fact, we almost did in the genome project, and Evan was one of the ones, along with others, who showed how critical those regions are. We really want to be able to look at every base pair in the genome, regardless of its context, and figure it out. And I'll bet you, in all these big projects many of us are involved in looking at undiagnosed diseases, et cetera, that many of the variants we're not seeing are in regions we're not sequencing well. I could do several slides just on this; you've heard a lot about it. I will say one thing about phenotypes. The high-throughput assays for function, especially for transcription, which Joe showed one of, and there are quite a few others, are really valuable. They are ways of helping to link at least molecular function to a SNP or a DNA sequence variant. But that ain't the organism, and that's an obvious point, so do you make a mouse, do you do whatever? I work on psychiatric disorders, which are the hardest and worst things to work on, and trying to link a promoter mutation that clearly affects transcription, maybe in a big way, to whether it's causing the phenotype is probably one of the hardest things we need to do, and I don't think there are easy answers to that, as Joe mentioned.
And again, I'm not pandering here, because even though I'm not a bioinformatician, clearly the rate-limiting step on most of the things we do is not the experimental wet-lab part. Those parts still need to be done well, they are still hard, they still need development, but the bottleneck is figuring out how to handle the data, and that means every step of the way. And while this institute does support that in quite a big way, it almost never is enough, and we probably need to get a whole lot better at it, and part of that is supporting the development of new tools. But I will say, just from my own experience, that development needs to go hand in hand with the biologists, because it does no good to develop algorithms nobody needs. And then the systems biology approach: we and others have already talked about doing perturbations in a big way, and we all see that happening a lot more. So, summarizing a couple of those: emphasize integrating functional studies with evolutionary information. We don't do that very much, and it turns out to be very powerful. That means not just measuring conservation, but looking at all the parts of it, the population genetics especially, and then going back and integrating that with an understanding of how transcription mechanisms work. You have a variant, you know something about its conservation, and then, if we're talking about transcription, you need to understand transcription. And we'd also like to get better and better at predicting whether a variant is deleterious or not, and there are various methods that have been developed that are quite good; we think they can get better too. So this is starting to get into the mechanics of how you might do it, and I think we're going to have discussions about this throughout the workshop.
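The kind of conservation-plus-function integration being described can be caricatured in a few lines. To be clear, everything below is invented for illustration: the weights, the annotation categories, and the variant records are hypothetical, not taken from any real predictor mentioned in the talk.

```python
# Toy sketch: combine evolutionary conservation with functional context
# to rank variants by a crude "deleteriousness" score. All numbers and
# annotations here are made up; real predictors are trained models.

# Hypothetical per-annotation weights: variants in functionally dense
# regions get up-weighted relative to intergenic background.
ANNOTATION_WEIGHT = {
    "promoter": 1.5,
    "enhancer": 1.2,
    "coding": 2.0,
    "intergenic": 0.5,
}

def deleteriousness_score(conservation, annotation):
    """Combine a 0-1 conservation score with a functional-context
    weight into a single number that can be used for ranking."""
    return conservation * ANNOTATION_WEIGHT.get(annotation, 1.0)

# Invented variant records: (identifier, conservation, annotation).
variants = [
    ("chr1:1000:A>G", 0.95, "promoter"),
    ("chr1:2000:C>T", 0.10, "intergenic"),
    ("chr2:3000:G>A", 0.80, "coding"),
]

# Rank variants, most suspicious first.
ranked = sorted(variants,
                key=lambda v: deleteriousness_score(v[1], v[2]),
                reverse=True)
for var_id, cons, ann in ranked:
    print(var_id, deleteriousness_score(cons, ann))
```

The real methods alluded to are trained statistical models; the sketch only shows the shape of the idea, that evolutionary signal and functional context get combined into one rankable score.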
How do you want to do this? And again, I'm not being self-serving here, because I want every one of these, so I guess I am self-serving. You'd probably still need to do some production-size or large-to-medium-size projects like ENCODE, so that you get the advantages of those, but you need lots of smaller grants too, especially to create the technology but also to apply it, maybe partnering more. And I think this is a general point about NIH, but certainly here as well: those should probably continue to be an increased part of the portfolio, and that means things like these couple here, but many others as well. All right, so here are some of the less experimental suggestions. There are all these efforts going on, and they're not really coordinated; they're in different institutes, in different countries. I shouldn't say they're not coordinated at all. Mike and Dan talked about all of those projects, and clearly there are some interactions between them, but you can't go to one place and get this region of the genome across all the different projects and see here's what we've learned from it. I don't know how hard that might be to do, but this whole idea of interoperability, of cataloging the projects and integrating them, matters. And the idea of experimental and data analysis standards: having some set of standards is the only way you'll be able to integrate them or make them interoperable. And this isn't just whole-genome sequencing, of course. It's all the stuff we've been talking about with functional genomics, metabolomics even, epigenetics. And then one more pitch, which I hope we all say something about.
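The "go to one place and get this region across all projects" idea reduces, mechanically, to agreeing on a minimal shared metadata schema and then federating one query over per-project catalogs. Here is a toy sketch with invented project names and records, not a description of any real system:

```python
# Hypothetical per-project catalogs sharing a minimal metadata schema
# (assay, chrom, start, end). The shared schema is what makes the
# catalogs interoperable; all records below are invented.
CATALOGS = {
    "ProjectA": [
        {"assay": "WGS",     "chrom": "chr7", "start": 100_000, "end": 200_000},
        {"assay": "RNA-seq", "chrom": "chr7", "start": 150_000, "end": 160_000},
    ],
    "ProjectB": [
        {"assay": "ChIP-seq", "chrom": "chr7", "start": 120_000, "end": 130_000},
        {"assay": "WGS",      "chrom": "chr2", "start": 500_000, "end": 600_000},
    ],
}

def query_region(chrom, start, end):
    """Return (project, assay) for every record, across all catalogs,
    whose interval overlaps the requested region."""
    hits = []
    for project, records in CATALOGS.items():
        for rec in records:
            # Standard half-open interval overlap test.
            if rec["chrom"] == chrom and rec["start"] < end and rec["end"] > start:
                hits.append((project, rec["assay"]))
    return hits

print(query_region("chr7", 110_000, 155_000))
```

The point of the sketch is that the hard part is not the query loop; it is getting every project to expose position and assay metadata in a compatible way in the first place, which is exactly the standards problem being raised.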
One of the very powerful things about NHGRI, and some of the other institutes do this as well, is that from the beginning it has made a big effort in training genome scientists, because we didn't have genome scientists when the project started, and there have been lots of great success stories, including younger folks in this room and still up and coming. Here's one reason. Mark and his team put this together, and these numbers might even be underestimates; he was worried about TCGA being so big, but this is just RNA-seq, and there are just giant numbers of datasets. And frankly, we do use TCGA ourselves: we go and look at it and apply it back to ENCODE. It'd be nice if that were easier to do. I think that's probably the most important thing, and that's just one type of data; there are others as well. GTEx doesn't have a logo, or I couldn't find one anyway. So here's one example: people have talked about the DCC that Mike Cherry is running for ENCODE, and also the DAC, the data analysis center, which is basically groups of people in ENCODE who get together and put all the data together in a high-dimensional, high-quality, standardized, coordinated way. That's one reason why I think ENCODE data is valuable, because we've been able to do that. If it weren't standardized, and we weren't doing things similarly, it would be much harder. And it'd be nice to be able to do that worldwide and connect it to sequence variants. That's what Joe kept mentioning that Mark and I would talk about; it was a big part of the meeting, as well as what we all think we should be doing, of course. And then, have standardized pipelines for everything we do. Mark liked this one; I can't remember who proposed the Amazon model. Was it John?
John, I think this was you. We'll blame this on John Stam: the idea that Amazon started out being really, really good at one thing and then expanded into everything. So, all right, Mark, do you want to come up for questions? That's what we had to say. I think the key messages were here and hopefully will lead to discussion. On the last part: I think it's really good and really positive the way the ENCODE 3 DCC in particular has developed and evolved. But I think it's inevitable that all of biology interlinks, and there's a point where you just want to traverse all datasets, from structural biology through to ENCODE, through to something else, through to cell models, through to image datasets, et cetera. And that's a big role that NCBI and EBI play. The good thing about the current model is that the way Mike does it is very much in line with these bigger aggregation points. But I think ENCODE has got to see itself in the context of all these other datasets. It's not the case that, because of that interconnectedness, you should try to organize everything, because that problem is just impossible; nor do we think that way at the EBI. It's genuinely impossible. What you have to do is enable all these different communities to correctly link up across this information space. Mike has run the Saccharomyces Genome Database since the '80s or '90s, whenever it started; sorry, he still runs it, I think. It is that. They have whole teams, not just of curators, but of people who try to do that for lots of different types of data, including, I think, cell biology and imaging data. But I'm not sure that's easier; it's a smaller community and more organized data, right? I would just caution against having the ambition to be all of that, connected.
I just think it's going to collapse on itself. It won't work; it's very clear in my head that it won't work, because that's kind of all of human biology, and there's no way you can handle that. One more, and you probably partly answered this: the kinds of data broker and so forth. One question is, how do EBI and NCBI see us? And obviously you're the person to answer that. In the sense that you probably don't want all these projects and all the individual investigators directly putting stuff into the central repository; you probably want something like a genomic data broker that's handling a lot of it, putting things together. I mean, in terms of making the ultimate knowledge base for biology, or for science, in the future, what is the right model, actually? So, we absolutely use the word broker, and a number of resources act as a broker. We have community-driven resources that act as a broker, sometimes model organism databases, like FlyBase, like the yeast database; sometimes portals, like ENCODE or modENCODE. Other people completely back their whole LIMS system onto us: Sanger does that, TGAC does that, and so on. So we have a variety of different modalities for how that can be supported. I do think the broker model is the right model. But that model requires a community to say: these are the datasets and the data items that we really organize, this is the stuff that we as a community do, and this is the stuff that therefore we don't do, and this is the stuff we want to broker back into the system so that everybody else can use it. And drawing those boundaries correctly is the art of the game.
And I think it's quite easy, because everything in biology is interconnected, to suddenly not see any boundary, and then it just becomes an impossible task to handle. You need to have the right boundary on that ambition. Yeah, I think one thing is, to start out, I think it's possible, but it's very hard, as Ewan said. But the first thing is to take similar data and get it in one place, interoperable. So you could just take the ENCODE-like data and get it in one place in the world, right? In the next decade, that would be a good thing. Well, I think your mission one is to make sure that you continue to organize your own data and broker it back into the system and make it accessible. If you fail at that mission, then everything else is no good. So then, for the user, I mean, we should be doing this for the users out there; that's the biggest point. The user might be studying a transcription factor or a cell type or something and not know about all those other resources. That's the problem. So how? I thought that's what we were discussing. Yeah. So those are the roles: making sure all the data is registered, and then people building other user-focused portals on top of those registries and schemes. And some of that, from a genomic sequence perspective, you call a genome browser. But think of a genome browser not as a genome browser, but as a registry of interesting experiments that you can look at on the genome. And so you shouldn't have one; you should have a big variety, depending on the user group. And there should be a separation between the data flow coming in and the portal that integrates data around it. Any other model, I don't think, scales well.
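The separation being argued for, data flowing in through a broker's registry with many user-focused portals sitting on top, can be sketched as two decoupled components. The class names, identifiers, and records here are all hypothetical, chosen only to illustrate the design:

```python
# Sketch of the broker/portal separation: ingest (the broker's
# registry) is one component, presentation (a user-focused portal)
# is another, so many portals can share one registry. All names and
# dataset identifiers below are invented.

class Broker:
    """Owns the data flow in: registers datasets with standard metadata."""
    def __init__(self):
        self.registry = []

    def register(self, dataset_id, community, data_type):
        self.registry.append(
            {"id": dataset_id, "community": community, "type": data_type}
        )

class Portal:
    """A user-focused view onto the registry; many can coexist,
    e.g. a 'genome browser' is one such view."""
    def __init__(self, broker, data_type):
        self.broker = broker
        self.data_type = data_type

    def list_datasets(self):
        return [d["id"] for d in self.broker.registry
                if d["type"] == self.data_type]

broker = Broker()
broker.register("ENCFF001", "ENCODE", "functional")
broker.register("PDB1ABC", "structural-biology", "structure")
broker.register("ENCFF002", "ENCODE", "functional")

# Two different portals over the same registry, no duplicated ingest.
functional_portal = Portal(broker, "functional")
structure_portal = Portal(broker, "structure")
print(functional_portal.list_datasets())
print(structure_portal.list_datasets())
```

The design point is that adding a new portal for a new user group costs nothing on the ingest side, which is what makes an ecosystem of many portals and many brokers plausible.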
So I think if you try to conflate the portal and the broker, you get yourself into a huge mess. Well, practically, though: for a future genome browser that takes in more than just ENCODE data, that browses all the genomic stuff, is that something EBI does? Or is that a portal onto data in EBI? How does that work? Yeah. So, it's not just one portal, don't forget. Portals are really focused on user groups. You might have a portal focused on people who want to start from structural biology and go from there. It's really stepping stones for user groups. So there isn't, and there shouldn't be, one data broker or one portal. All of these different components of the ecosystem have to work together. And, you know, UCSC and Ensembl have forward plans; they want to keep serving their communities well, and they have ambitions for lots of these different things. That shouldn't prevent anybody else standing up and saying, I want to produce a portal around some datasets as well. But you've got to be in a multi-portal, multi-data-broker world. I'm wondering whether I can ask a different kind of question. You raised the issue of sequencing technology, and one part of sequencing technology, classically, has been more sequence: better, more accurate, faster, cheaper. But obviously it would be tremendously helpful, because we are talking about cis elements for the most part, to have long-range contiguity. So where is that, and where does it fit in? Would it make some epigenomic analysis easier? I could think of scenarios as well. That's right.
And where is the technology, and how does it affect things? So obviously, people are still working on this for that reason, and it is important. When you do have it, it certainly helps with de novo genomes; there are still a few groups, not a lot, sequencing hard genomes. But it's really paid off in RNA, I think, more than anything, because you get the contiguity. Your point about really long-range contiguity, putting this element in phase with that one in the DNA, would be a really good one. There are probably people in the room who know more about this than I do, but we use and are trying to stay on top of all the different technologies, and they're really error-prone. They're not high-throughput, they're expensive, but they could get a whole lot better, and I think that's still the hope. Is that what you were asking? Okay. Are the current error rates such that, I can see that assembling a new genome would be terrible, and even resequencing genomes might be bad, but is the sequencing so bad that for many kinds of epigenetic analysis it would be terrible? Yes. Yes, at least in our hands. I don't know; Jeff or somebody, maybe you guys know more. And it's actually not just the error rate: the throughput is a ten-thousandth of the other technology, so it's really expensive. Are we done? Thank you.