We're totally happy to claim Harris as part of our session; it just makes us look even better. All right, so I think there was a lot of discussion in the early talks about the value of chromosome-level assemblies, and certainly the technologies are improving in terms of the ability to generate that data at reasonable cost. I wonder if anybody has thoughts on whether that becomes a new standard, and what it would take to make it one.

I would say that in my experience, cost has been a huge driving factor in the decisions that are made, because almost 99% of scientists want more accurate, higher-quality data. They don't even like to hear the words "low quality." What I've found, from a psychological standpoint, is that if generating a chromosome-level, more contiguous, higher-quality assembly costs three times more than a draft one, then people open up their pockets, whether it's the funding agencies or the scientists themselves, and do it. But if the cost ratio is five times or more, even if it's five dollars versus one dollar, they're going to spend the one dollar. That's just been my observation.

Beth, and then Erez.

I'm just wondering, Carlos: you seem to be promoting the idea that the funding for this can come from outside the traditional avenues, but not really for the early, fundamental, foundational work that needs to be done for plants and animals. Would you have any recommendations about how to go about getting this infrastructure built without relying on these traditional sources?

Yeah, so, great question. I'm going to sort of punt and say: read the book — not my book, but one that's about process. I think it's easy for us to cherry-pick examples, which is obviously what I did with a wonderful anecdote; I wish all EAGER grants and SBIRs had that kind of trajectory. Ultimately, if you're making the case to investors, there has to be some economic return. That economic return doesn't need to come today, right? When Fauna started, it was clear to us that it wasn't going to be a profitable company for the next five to ten years. That means you need that kind of speculation to come in, and obviously in that case the win is the potential pharma that comes out the other side. In pharmaceutical investment, you're exchanging market risk — because you're going to get a monopoly — for technical risk. With these kinds of technologies, a big part of the issue is the market risk: you feel like, yes, I can build a company that's going to standardize the technology, but are people going to buy it? So part of the issue is understanding what the research market is ultimately going to look like. And in that sense, the funders drive it. NIH, NSF, Chan Zuckerberg, the Wellcome Trust — all of these coming in and saying "we're going to place bets on this" will drive the technology development, because then the investors will say, well, maybe there is a business here to be built. If the federal funding goes away, that becomes extraordinarily difficult. And that's actually one of the things Janeway talks about: you can't have any one element on its own do this and accelerate the development of technologies as quickly as the three-party game does. But you also have to be willing to take some of the losses, right?
There's a reason why sequencing has just been a brutal game: every technology continues to crush the next, and it kind of has to, or else you're not going to make the shifts in putting money behind it. Erez?

Hi, I'm Erez Aiden, and our team has developed things like Hi-C and Juicebox, which you've heard a bit about, and we also have a consortium called DNA Zoo, where we're doing things like — together with Eleanor — assembling about 10% of a million species. I wanted to comment on this cost issue, and specifically on the use of long reads, because I think everyone is on the same page that long-read assemblies coming from many technologies are phenomenal, that we should be doing that, and that we should be assembling everything. But I did want to highlight why, for instance, at DNA Zoo — where by and large we've shared about 100 chromosome-length genomes so far, mostly in mammals — we have mostly been avoiding long reads. There are two reasons. One is cost. Today, a short-read chromosome-length assembly, roughly Human Genome Project quality, is about 500 bucks. If you want to do it diploid, with chromosome-length haplotype blocks, that's about 2,000 bucks. So you can start to imagine doing things like large fractions of the mammals at the scale of a couple of R01s. The second thing, which is actually a much bigger factor and which really wasn't addressed at all this morning, is acquiring samples. Acquiring samples with good high-molecular-weight DNA that you can reliably use — without a blowout where you do tons of experiments and end up with crummy data — is incredibly hard. Whereas if you're doing short-read-based techniques, it's possible to use fecal samples, incredibly badly degraded samples, museum samples, biobanked materials. There are extensive biobanks, as we all know, that have thousands of species in them. All this stuff can be mainlined if you're using short-read-based methods. So I think there needs to be a complement to efforts like the VGP: efforts that take critical target clades — for instance primates, or the mammals that are really key for NIH — and say, okay, let's just assemble all of this in a couple of years for a couple million bucks, so that everyone has it while we wait for all vertebrate genomes to be done at incredibly high quality.

Yes, so I'll argue that short reads will be useful for some things, and that's fine — if it's cheaper, it should be used for some purposes. But I will argue that there is a bigger need for long reads than what you might be saying. A lot of genome-assembly folks generate an assembly and then pass it on to biologists to do the analysis, and what they don't realize is the downstream cost: the amount of effort, time, and monthly salaries you have to pay students to correct gene structures. The downstream costs are actually much higher with a draft assembly than with a higher-quality assembly. So I would say, unless new generations of short-read assemblies can simply solve those problems, that's where things stand today.
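A back-of-the-envelope rendering of the cost arithmetic in Erez's comment above. The per-genome prices are the ones quoted in the discussion; the total R01 budget and the mammal species count are rough outside assumptions, not figures from the panel:

```python
# Rough sketch of "large fractions of the mammals at the scale of a couple
# of R01s". Prices are quoted above; R01_BUDGET and N_MAMMALS are assumptions.

COST_HAPLOID = 500      # short-read, chromosome-length assembly (quoted above)
COST_DIPLOID = 2_000    # diploid, chromosome-length haplotype blocks (quoted above)
R01_BUDGET = 2_000_000  # assumed total budget of a typical multi-year R01
N_MAMMALS = 6_500       # approximate number of described mammal species

def genomes_per_budget(budget: int, cost_per_genome: int) -> int:
    """How many assemblies a budget buys at a fixed per-genome cost."""
    return budget // cost_per_genome

for n_r01s in (1, 2):
    budget = n_r01s * R01_BUDGET
    hap = genomes_per_budget(budget, COST_HAPLOID)
    dip = genomes_per_budget(budget, COST_DIPLOID)
    print(f"{n_r01s} R01(s) (${budget:,}): "
          f"{hap:,} haploid ({hap / N_MAMMALS:.0%} of mammals) or "
          f"{dip:,} diploid assemblies")
```

Under these assumptions, two R01s buy roughly 8,000 haploid or 2,000 diploid chromosome-length assemblies — consistent with the claim that whole clades become reachable at this price point.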
Suzanne had her hand up first.

So we heard a lot about the different technologies being used — short reads and long reads, and their advantages and disadvantages. I was wondering, for the big projects like the Earth BioGenome Project, whether there's a push to have standards — to use the same sequencing technologies and protocols — and, if not, whether we need new tools to allow fair comparative genomic comparisons and avoid biases from the different reference assemblies and how they were created.

Thanks for your question. In fact, that's one of the primary things the Earth BioGenome Project is doing. There are five committees dealing with various aspects of standards, from the early stages of sample collection and processing — how you store tissue to make sure everybody gets high-quality DNA. We already heard what a problem that is, and it is the biggest limitation. It's not the technology; Ira pointed this out. The biggest limitation is getting the samples, the voucher specimens, to do this. And there are subcommittees on sequencing and assembly — shared subcommittees with the VGP, since the VGP has been leading the way on those. But there are things very specific to the different taxa that have to be accounted for. So we're trying to agree on a set of standards. It turns out to be relatively easy to do that when you can get enough DNA to do what you want. But if the organisms are small, or they're single-cell microbial eukaryotes, you need to think about it in a different way. And we're beginning to think about it in terms of the amount of high-quality DNA you can get: if it's less than 100 nanograms, you're really forced into other strategies, and likewise if it's less than 10 nanograms. But the progress with PacBio has been quite amazing — five-nanogram protocols now, half a mosquito, a full mosquito. So things are really progressing well. But there's a whole world of eukaryotes out there that are single-cell, and you won't be able to apply the same uniform standard — where you might be able to get 100 micrograms of high-quality DNA — to something that's a single cell. We recognize that. So we'll probably set standards on the basis of DNA availability rather than per individual taxon.

Just to add to that point: if you look at the most successful federal projects, they've all had that flavor of standards production, so that you could turn around the day after the project is over and redo the project for a tenth of the cost. That was the case for the Human Genome Project. That was the case for the HapMap Project. When we started the HapMap Project, we really didn't know how to genotype at scale, and the whole purpose wasn't to get the 270 samples — by the end, you could just throw away the data and regenerate it super quickly. Same thing with 1000 Genomes. When we started 1000 Genomes, it was "let's sequence 170 genomes," and it kind of sucked. By the end, you had GATK and all these processes that made it far, far cheaper to do. So if you're going to tackle this, the design is important. If you only start with crummy samples, then you'll learn how to do crummy samples. But if you want to learn how to do really good genomes end to end, then you have to have the right design, and be able to do it at scale, so that anybody else could turn around and do it. That's how you really democratize the technology — so it's not just us well-funded investigators, but the real community that's able to use it.
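A toy triage of the input-DNA thresholds Harris sketches above. The 100 ng / 10 ng / ~5 ng breakpoints come from his answer; the strategy labels are illustrative placeholders, not an agreed EBP standard:

```python
# Minimal sketch: map available high-molecular-weight DNA (in ng) to a rough
# sequencing strategy. Thresholds from the discussion; labels are hypothetical.

def sequencing_strategy(hmw_dna_ng: float) -> str:
    """Pick a rough strategy based on available high-molecular-weight DNA."""
    if hmw_dna_ng >= 100:
        return "standard long-read protocol"
    if hmw_dna_ng >= 10:
        return "low-input long-read protocol"
    if hmw_dna_ng >= 5:
        return "ultra-low-input protocol (e.g., the ~5 ng PacBio protocol)"
    return "other strategies (amplification, pooling, short-read rescue)"

for ng in (500, 50, 5, 0.1):
    print(f"{ng:>6} ng -> {sequencing_strategy(ng)}")
```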
And that's so true of the Earth BioGenome Project, because most of the collections are in natural history museums, where the samples are not frozen tissue; they're dried leaves or carcasses.

We've learned this on the medical genomics side. How many times have we just started with the samples of convenience — "I'm going to study these cohorts because they're the ones I know how to get access to"? And that's what leads you down the road of, oh well, all of the investment is going into this small pocket, and, well, that was a bad idea, right? You have to do the hard work in the beginning to set it up right.

I'll just say one thing about that — I know there are questions here. We need the funding agencies and others to invest in getting high-quality samples. These animals exist on the planet; you can take an airplane and get to them, or something like that. That's much cheaper than having to sequence a low-quality sample that then costs a lot of money downstream.

And I'll just add one anecdote to that, because it's a great point. When we look at the amount of money being invested by the Wellcome Trust in the Sanger Institute's new project, 50% of the funding is going to sample collection, acquisition, and DNA isolation — 50% — in recognition of exactly this point. And you could also make the case to your institutions as to why they should invest in it.

All right, I'm cutting off the panel and asking other people to talk. Susan, we loved your question. So I think, Rachel, you had a question a while back.

Yeah. So my question is sort of a two-part question based on one phrase that I hear used quite liberally and that hasn't really been defined well: the idea of a chromosome-level assembly. We have two problems here. One is that we don't necessarily know the chromosome complement for many of these species — and, as we've already heard for platypus, odd sex-chromosome complements are actually not that uncommon in a lot of lineages, marsupials, for example. So is that part of this initiative? And number two, is this initiative actually going to be bold enough to tackle T2T? Are we going to include telomeres and centromeres, which really are critical components of chromosome evolution when we think about it at a chromosome scale?

You've definitely touched on a hot-button topic for the Vertebrate Genomes Project, the VGP, and others. Starting with the last point: there actually is a telomere-to-telomere consortium that involves some people who overlap with the Vertebrate Genomes Project, including myself and Adam Phillippy and others. So yes, our goal is not only to reach the G10K metric we talked about earlier, but also to eventually get a telomere-to-telomere — hopefully perfect — assembly. We're not there yet for most organisms. So when do you define an assembly as chromosome-level? That is a hot-button topic, but I would say there are two camps. One camp says you need FISH karyotyping — cytogenetic mapping under the microscope — for a species before you can call its assembly chromosome-level. The other relies on Hi-C, the approach that Erez and others have developed: the Hi-C mapping profile can tell you whether you have something that spans a chromosome arm to arm, regardless of the number of gaps — one gap, a hundred, or two hundred gaps. If it is a scaffold that spans arm to arm and has no other scaffolds mapping to it, then some of us define that as a chromosome-level assembly.
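A minimal sketch of the Hi-C-style working definition just given: one scaffold spanning a chromosome arm to arm, with no other scaffolds mapping to it, regardless of internal gap count. The 95% span threshold and the example data are illustrative assumptions, not a community standard:

```python
# Classify chromosomes as "chromosome-level" under the working definition
# above. The min_span threshold and all example values are hypothetical.

from typing import Dict, List, Tuple

def is_chromosome_level(chrom_len: int,
                        placed: List[Tuple[str, int]],
                        min_span: float = 0.95) -> bool:
    """placed: (scaffold_name, aligned_bp_on_this_chromosome) pairs."""
    if len(placed) != 1:        # another scaffold also maps here -> fails
        return False
    _, aligned_bp = placed[0]
    return aligned_bp / chrom_len >= min_span   # arm-to-arm span, gaps ignored

# Hypothetical example: chr1 has one near-full-length scaffold; chr2 is split.
placements: Dict[str, List[Tuple[str, int]]] = {
    "chr1": [("scaffold_7", 98_000_000)],
    "chr2": [("scaffold_2", 40_000_000), ("scaffold_9", 35_000_000)],
}
chrom_lengths = {"chr1": 100_000_000, "chr2": 80_000_000}

for chrom, length in chrom_lengths.items():
    ok = is_chromosome_level(length, placements[chrom])
    print(chrom, "chromosome-level" if ok else "not chromosome-level")
```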
I think there was a question at the back. Yeah. Yes, you. Sorry, I can't see you.

I'm Hans Cheng with the USDA ARS. It's great that we're trying to get a lot more quality genome sequences, but one of the bullets there is on annotation. I'm wondering: if I gave you the sequences of every species in the world, what's the limitation? The next step, I would think, is the annotation, which has to be very important, even critical. Eric's done this, and Robin — I guess I'd like to hear what your message is.

I would say annotation has come a long way from when, in Arabidopsis, every gene was looked at individually. There are proven methods. One issue for the community to decide is to do conservative annotation, so that you don't over-annotate, but also to give the community different levels of annotation. You may identify potential gene models, and they should be there, but you can have a high-confidence set and then a working set. Part of it is education, because there are lots of tools to annotate now, and I think a lot of communities over-annotate: they use every piece of transcriptome data they have, they get a lot of incorrect gene models, and they present them to a naive community as bona fide models. Instead, there should be a high-confidence set — this is a rigorous gene model — and then, here are some other ones for you to consider if this is your favorite gene of interest. There's some education in there, and there should be some defined standards as well, so the community understands: there's a gene here, this is the most probable gene structure, but here's some other evidence for you to look into.

I guess I'd go beyond just predicting genes — that's the low-hanging fruit. I'm thinking of regulatory elements, et cetera.

I would just say, for the Vertebrate Genomes Project, this is a big black-box unknown. But NCBI and EBI have been pushing us to get transcriptome data for every single species we sequence, instead of just relying on homology-based annotation from other species. We're starting with brain and gonads for as many species as possible, because you have a greater diversity of transcripts there — no epigenetic mapping yet, or the other types of annotation you're mentioning. Some predictions suggest that up to five tissues may be sufficient to get, let's say, 90% of the genes annotated. We don't know that yet; we're going to discover it over the next several years. But it's a real big unknown.

Okay, please. A comment?

On annotation — I think that's a great area, because from the DNA sequence you only know about the structure of these elements, not the real function of each of the regulatory elements, because they are really tissue-specific: at different developmental stages, and even in different physiological states, they act differently. So a huge effort needs to be taken to generate more epigenomic data in order to understand the real function of those elements in different species — how conserved they are, how they evolved — and to understand the real biology of how they regulate the genes.
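A toy illustration of the tiered-annotation idea raised above: publish a high-confidence gene set alongside a larger working set, rather than presenting every model as bona fide. The evidence fields and thresholds here are hypothetical, not any pipeline's actual schema:

```python
# Sketch of splitting gene models into confidence tiers. All field names,
# thresholds, and example genes are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class GeneModel:
    name: str
    transcript_support: int   # e.g., independent transcriptome datasets
    homology_support: bool    # conserved in related species
    complete_orf: bool        # start codon through stop codon

def tier(model: GeneModel) -> str:
    """Assign a gene model to a high-confidence, working, or candidate tier."""
    if model.complete_orf and model.homology_support and model.transcript_support >= 2:
        return "high-confidence"
    if model.transcript_support >= 1 or model.homology_support:
        return "working"       # keep, but flag for the user's own judgment
    return "candidate"         # report only as a possible locus

models = [
    GeneModel("geneA", transcript_support=3, homology_support=True,  complete_orf=True),
    GeneModel("geneB", transcript_support=1, homology_support=False, complete_orf=True),
    GeneModel("geneC", transcript_support=0, homology_support=False, complete_orf=False),
]
for m in models:
    print(m.name, "->", tier(m))
```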
Steve, I think you had a question.

My name's Steve Ellison, from the Division of Biological Infrastructure at NSF, and your comment before about NSF liking hypothesis-driven research is spot on. But as I look around the room, you guys are the PIs — you're also the reviewers. So my question is: how do you get the reviewing community excited about infrastructure support to push these things forward? The EAGER story you told — that's a program officer making a call absent peer review. So it's not that the agencies are averse to infrastructure. We need the community to get excited about it and point us in the right direction. So how do you gin up some support for that?

I would turn that question around and ask you, because the community really is supporting this in many different ways. The community is getting behind each of the individual taxon projects and the larger projects. So I think NSF really should be involved in convening the community, because the community support is there. There is no vehicle to support a project like this. If you look at the infrastructure programs at NSF — I was just talking about this in the break — in terms of biological infrastructure, large-scale sequencing would not be eligible for one of their infrastructure awards, right? You're agreeing with me, Carlos?

Yeah, absolutely. So I think it's all about special programs. You will never get this kind of stuff funded through investigator-initiated R01s — vanilla funding. There's a reason NIGMS didn't build the human genome, right? And NHGRI has always been extraordinarily strategic about this; they've punched well above their weight. It's a $500 million budget out of $34 billion of spend, yet it imbues most of what NIH does. Part of this is being very strategic in how it sets up the mandate for the review. NSF has done very smart things in this regard too, often partnering with NIH. I would point to the Mathematical Biology Program, where I had the privilege of being funded and also of sitting on the committee. The reviewer instructions for that NSF biomath program were very, very specific: you had to meet both NIH criteria and NSF criteria, and push in the direction of getting mathematicians and statisticians interested in biomedical research. The reason that whole mechanism was put together is that there weren't enough of them, right? It's the direction you wanted to drive. So I think, at the federal level, this is kind of on the feds, to really carve out the strategic kind of funding we need to get there, because it's not going to come from the vanilla set of funding we've had so far. This is really where the interagency working groups come in and can formulate what I think we need, which is a national sequencing strategy.

Okay, so we're going to take a few more comments from the audience, and now the panel needs to be quiet, because then we're going to let you all go to lunch, and we can keep the discussion going after that. I think I saw Mark Gerstein. Mark, yeah.

Yeah, Mark Gerstein from Yale. So one thing I was keen to get some of the panelists to comment on: in the human genome community, there's a lot of interest in alternate structures — graph genomes and so forth.
And when we're talking about comparative questions, a lot of the most interesting things discussed today were duplications, SVs, rearrangements. Do you think that will be meaningful in the framework of a graph genome? All the stuff you're talking about from a comparative standpoint — will you be able to understand it graph-wise?

So we're gonna — yes. It's a great opinion, and we're gonna go to the next person now, and we'll discuss it more. Yeah. Okay. Charles. Wow, an opinion.

Yeah, I'm Charles Danko from Cornell. I'd like to get us back to the whole question of functional annotation and adding that to these genomes, because I think it's a really critical point. Several of the panel have made the point that this is a really hard problem because of how tissue-specific these elements are, and the number of assays you have to do to really annotate these things well makes it very costly. So I would like to make the point that we need to pick these assays carefully. If there's one thing we've learned from ENCODE, it's that there's a ton of shared information between the assays that people have already applied in human, and we can use that information to learn what the minimal set of markers is that we need to completely annotate a reference genome, and then scale those assays up for production across a lot of tissues in a lot of species.

That's a great point. Thank you. And Sue, in the back.

Yeah, I just wondered how much thought is going into the selection of the individuals being used as references, because it sounds like in the plant world that's a big consideration, for obvious reasons, potentially in agriculture. But if you're talking about wildlife, you may just be selecting opportunistically. And understanding the phenotype of the individual you're selecting from a very heterogeneous species can be really important when you're referencing that as the prototype.

Excellent. And then, John Lou.

So I think we all agree that comparative genomics is really fascinating and is going to answer a lot of important questions, and the issue has been raised about the samples for this — especially about high-quality samples and the need to do annotation. The infrastructure would benefit greatly, or has benefited greatly, from the establishment of collections of high-quality samples. I'm Oliver Ryder, from the San Diego Zoo. For 40 years we've been banking fibroblast cell cultures, and now these are being used for generating reference genomes; they were cited in the original Genome 10K white paper. There are a couple of sentences in there about how we should be making induced pluripotent stem cells from the species we're banking, so that we can study their development in vitro and have potentially complete transcriptomes for them. And there's been no support for the infrastructure for these kinds of samples.

I'll just echo Harris's point on infrastructure. Quite a few years ago, NIFA was about to set up a genomic tools and reagents program, and we collectively decided not to. Just imagine a small investment in that area — the impact would be huge, because all the science is conducted on the basis of that infrastructure. Otherwise, we're doing the wrong science. It would really liberate manpower in many other fields, including agriculture, medicine, and the sciences in general.
So I would say the National Science Foundation and the USDA really ought to work together to invest in that infrastructure, making sure there's high-quality investment in sequences and so on. Otherwise, all the comparative work and all the related downstream research is just low quality.

Go ahead — just one more person. Well, maybe these last two. Last two, okay.

Hi, I'm Eric Lyons from the University of Arizona. One point that Robin brought up was data release, and one thing I'd like to hear more about — from the panel, from this audience, and from the community at large — is how we're doing on that front. Do we need to do more to ensure the timely and accurate release of data, especially in view of the varying levels of quality that can be associated with it? For example, how quickly do we want things out in a very rough draft form, versus how long do we want to wait to get further up that quality tier?

And the last person — I think Ed, yeah.

So it's really a follow-up on what Carlos said: out of that squirrel project came an attraction for the VC folks. Do you see NHGRI, say, funding PhD-MBAs at some point?

That'd be a great topic for discussion. I'm happy to bring in some NHGRI folks who want to talk about that. Thank you, everybody. I'm sorry for those we couldn't get to. I suspected this would be a problem with this audience — that we would have too many opinions — so thank you for at least validating that. And thanks, everybody, for working hard to get us back on schedule. We're going to be back here right at one. If you've ordered a lunch, go around to behind here and it should be waiting for you with your name on it. If not, go find lunch and come back. You're welcome to join us in the lunch room for more discussion.