We have a couple of scientific reports or presentations to give you this afternoon. The first is an update on the Human Genome Reference Program, and I'm going to ask Xander Arguello, Program Director in Genomic Sciences, to do the introduction here.

Thank you, Rudy. As Eric mentioned earlier, the Human Genome Reference Program's goal is to generate a human reference genome that is more inclusive and representative of the diversity of the world's populations, a pangenome. The awardees of this program have formed the Human Pangenome Reference Consortium, and here to provide an update on its activities and answer any of your questions about its progress are two members of its Scientific Advisory Board: Dr. Deanna Church, who is currently an independent consultant, and Dr. Martin Hirst, a professor at the University of British Columbia. Deanna?

Great. So thank you so much for the introduction, Xander. And just to point out, it does feel a little odd presenting work that is not my own. Martin and I are on the Scientific Advisory Board, and Martin is the chair of that board. But I will say I'm super excited to be able to present the work that this group is doing. It's been very exciting to watch this team come together and do things that I'm super impressed they've been able to do in this short period of time. And I would also like to make sure the group feels free to stop and ask questions at any point during the presentation if something is not clear.

So I probably don't really have to explain to this group how fundamental the human reference assembly has been for science. It's really been transformational and foundational. The current reference is part of what's called the GRCh38 series; I think we're on maybe patch 14 at this point. And despite the great value this reference has brought to science, I would also make the argument, and I think many people have made it as well, that it's really woefully inadequate for the task at hand today. The current structure of the reference assembly is a single coordinate system of sequence, one for each chromosome plus the mitochondrial genome. And realistically, even when you look at the sequence as a whole, it doesn't represent a valid haplotype within the population, because it is a mix of different haplotypes from the BAC clones that were used to construct it, although we do now have a very high quality, valid single-haplotype assembly from the T2T consortium, the T2T-CHM13 assembly. But I would argue that a single-sequence representation is still not able to faithfully represent even an individual genome, much less the genomes of a global population. And we can show this with a somewhat trivial but illustrative example, described in the Xue et al. publication in 2008 from Chris Tyler-Smith's lab, which studied a polymorphic deletion of about 117 kb within the human population. In early versions of the human reference, we actually had an error at this point in the assembly, because the RP11 library, which comes from a single, largely admixed donor and makes up the bulk of the GRCh38 reference assembly, is heterozygous at this locus: one haplotype has the UGT2B17 gene, the other haplotype does not. And trying to make a single consensus at that point actually created an error, because you cannot make a consensus sequence when you have these sorts of insertion-deletion polymorphisms.
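As a purely illustrative aside, and not anything from the consortium's own code, the toy sketch below shows why a single linear consensus cannot carry a heterozygous deletion of this kind, while a tiny sequence graph with one path per haplotype can; the segment names and sequences are made up.

```python
# Illustrative sketch only: why a single linear consensus cannot represent a
# heterozygous insertion/deletion, and how a tiny sequence graph with one path
# per haplotype can. Sequences are toy stand-ins, not real UGT2B17 sequence.

# Shared segments of the locus; "GENE" stands in for the ~117 kb deleted segment.
segments = {
    "L":    "ACGTACGT",   # left flank, present on both haplotypes
    "GENE": "GGGCCC",     # segment present on one haplotype, deleted on the other
    "R":    "TTGGAACC",   # right flank, present on both haplotypes
}

# Each haplotype is a path (an ordered walk) through the shared segments.
haplotypes = {
    "hapA": ["L", "GENE", "R"],   # carries the gene
    "hapB": ["L", "R"],           # carries the deletion
}

def linearize(path):
    """Spell out the sequence of one haplotype by walking its path."""
    return "".join(segments[s] for s in path)

for name, path in haplotypes.items():
    print(name, linearize(path))

# A single consensus must pick ONE of these spellings; whichever it picks,
# reads from the other haplotype will mismatch or fail to map across the
# breakpoint. The graph keeps both alleles and their flanking relationships.
assert linearize(haplotypes["hapA"]) != linearize(haplotypes["hapB"])
```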
And this is just from one individual where we had this difficulty. We can see in this graph from the Xue et al. paper that different populations across the world carry this polymorphism at different frequencies. So coming up with sequences that better represent the diversity we see across the globe, and with representations that let us capture both allelic relationships and these sorts of structural variations, I think is super important. And we know that, because of the biases of short-read sequencing used in so many studies today, we've had a lot of mapping biases and we've lacked the ability to see a lot of structural variation. It's not that we were unaware of many of these issues in the early days of working on the human reference; we really didn't have the technology available to address them. And I think that has changed, which is really exciting for someone like me who spent a large part of my career working on a reference assembly.

And so the Human Pangenome Reference Consortium came together, and these are their primary goals. One of them, which I think is really important, is: how can we improve the representation of global genomic diversity? The initial goal for this project is to create at least 350 diverse diploid reference assemblies. And a good part of this is prioritizing quality. The GRCh38 series of reference assemblies still has large numbers of gaps and missing sequences, and we've seen from the work of the T2T consortium that we really can make high quality assemblies now; we'll talk a little bit more about that later in the presentation. Somebody want to add something? Okay, must have just been a blip. In addition to developing these new sequences and resources, coming up with new data structures that represent sequence and variation within the same data structure will, I think, make using all of these sequences much more facile for the community. And then, of course, outreach, education, and implementation are important parts of this project.

So really one of the first things the team had to do was set up protocols for productionizing the sequencing process. This was a challenge because there really had not been a lot of productionizing of long-read sequencing. And you can see from the slide that this has been a multi-center effort focused both on sequence production and on ensuring that we had high quality cell lines that showed genomic stability, so that we had biological resources we could use as part of developing these sequences. One aspect of this is the development of optimized and consistent long-read HiFi sequencing from PacBio. In this slide we see that these data have been developed primarily at three centers: the Genome Technology Access Center at the McDonnell Genome Institute at WashU, Rockefeller University, and the Department of Genome Sciences at the University of Washington in Seattle. And we can see that these centers have consistently been able to produce both high-accuracy and long HiFi reads across samples and across centers. Additionally, a small group from UC Santa Cruz, Oxford Nanopore, and Circulomics has been working on developing the Nanopore sequencing.
This has been super useful, particularly for making what are called ultra-long reads, reads that are greater than 100 kb in length. These ultra-long reads have been critical for resolving some of the really hairiest parts of the genome. And while this slide focuses on some of the work the team has done to increase the throughput of Nanopore sequencing, they've also done significant work to increase the quality of the individual reads.

And I think really importantly, one of the things we've seen from this team is their ability to deliver on their promises. In this slide, the graph shows in black the proposed number of genomes per year, and in blue the delivered number of genomes per year. They have consistently delivered the number of genomes they promised. These samples have largely come from the 1000 Genomes cohort, and we're going to talk a little bit more about sample selection in a bit. But we can see that in years four and five, not only is the number of genomes per year increasing, the team is also going to start moving away from solely using the 1000 Genomes cohort into new recruitment.

Now, I think one of the things that is super important about this project is that it's great to have all these long-read sequences and this diversity, but really being able to take these data and produce high quality diploid assemblies is where the value is. And this group has assembled what I would consider a dream team on assembly algorithm development, to make high quality reference assembly production turnkey and relatively easy. I'm sure many of you saw the publications in Science earlier this year describing the work of the T2T consortium, which created a high quality telomere-to-telomere representation of a single cell line, CHM13. But that is an assembly of a single haplotype, because the CHM13 cell line is derived from a hydatidiform mole: one genome was duplicated, so you have two copies of the same genome in that cell line. What that team really did was show not only that it is possible to produce a high quality assembly; they also developed the assembly methods, the QC methods, and the evaluation methods that I think lead the way toward the standards the field will be using for genome assembly in the future. But I think one lesson from that paper is that the amount of effort and curation that went into making the T2T assembly is not going to be practical for producing lots and lots of genomes in a high quality production approach. So another thing this team has done is bring together a diverse group of people to test different approaches to doing genome assembly in a very automated way, establishing best practices and turnkey algorithms. And importantly, they're working on diploid assembly. They're now moving to cell lines where you have a maternal and a paternal genome represented, so you have that additional level of difficulty in representing and resolving that diversity. The assemblies being produced from these approaches are not quite of the quality of the T2T assembly.
You can see in gray here some regions where sequence is still missing, because these are really hard to resolve. But these are still very high quality assemblies, where the teams are able to resolve the maternal and paternal haplotypes and certainly create assemblies of higher quality than the GRCh38 reference that most of us use today.

But I think one of the points we were really trying to make earlier is that one genome is not enough. We need to bring together genomes from people with diverse ancestries and diverse sequences in order to try to represent all of the genomes on the planet. And this is one of the significant challenges this team has taken on: how do we define what is enough diversity to represent in a reference assembly? They have a fairly practical definition at this point: they would like to represent variants that have greater than one percent minor allele frequency, or what we would call the common variants in the human population. Of course, this is also somewhat challenging because we still have an incomplete understanding of the full spectrum of human genetic diversity; much of our understanding of diversity is biased by short reads and single nucleotide variants. So this is, I think, going to be an evolving discussion about the best way to do this and how much sequence we really need to represent in this new reference assembly.

As I said earlier, in the first three years the project has leaned pretty heavily on the 1000 Genomes cohort. What we see here are the different representations, using the 1000 Genomes superpopulation labels, of the genomes that have now been sequenced and added into the production line. Currently they've got up to about 150 total cell lines from these various samples. But I think what everyone would like to do is move beyond 1000 Genomes. We don't think that sampling only the 1000 Genomes cohort is going to get us to the level of diversity and sampling that we really need; there are certainly holes within that sampling. So to date, the HPRC has been partnering with two biobanks: the BioMe Biobank at Mount Sinai, which can take advantage of the rich diversity and the sort of melting pot that is New York City, as well as a recruitment center that has been stood up at Washington University to recruit African Americans from within that community. And currently there are three models being proposed for prioritizing the inclusion of new samples into the sequencing pipeline. My current take is that no decision has been made on which of these models to use, or whether a couple of them might be combined. But the first idea is: can you leverage short-read sequence data to add the samples that will best ensure we're representing common variant diversity as we currently understand it? We can also use that same sequencing and array data to look at PCA plots and try to maximize genetic divergence. And lastly, you can think about targeting underrepresented populations based on self-reporting and geographical data, to bring in sequences that we might not have seen before.
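Just to illustrate the flavor of that first model, and this is a hypothetical sketch rather than the consortium's actual selection method, one simple way to prioritize samples from short-read genotype data is a greedy choice of whichever candidate adds the most common variants not yet represented; the sample names and variant IDs below are invented.

```python
# Hypothetical sketch of a greedy "maximize new common-variant coverage"
# prioritization, in the spirit of the first model described above.

def greedy_select(candidates, n_to_pick):
    """candidates: dict of sample name -> set of common-variant alleles it carries."""
    covered, picked = set(), []
    for _ in range(n_to_pick):
        if not candidates:
            break
        # Pick the sample that adds the most common variants not yet covered.
        best = max(candidates, key=lambda s: len(candidates[s] - covered))
        if not candidates[best] - covered:
            break  # no remaining sample adds anything new
        picked.append(best)
        covered |= candidates.pop(best)
    return picked, covered

# Toy input: which common variants each candidate sample would contribute.
candidates = {
    "sampleA": {"var1", "var2", "var3"},
    "sampleB": {"var2", "var3", "var4", "var5"},
    "sampleC": {"var5", "var6"},
}
picked, covered = greedy_select(candidates, n_to_pick=2)
print(picked, sorted(covered))  # ['sampleB', 'sampleA'] covering var1..var5
```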
And the team is currently evaluating these models, trying to generate data and come up with a plan for a principled way of prioritizing new samples.

A key component of the sampling process, and of the whole project in fact, is an ELSI team that is embedded within the project. I think there are a couple of keys to this ELSI component. We want to reach out to diverse populations and include them in this project; one of the really valuable things about the original Human Genome Project was its international flavor, and the aim here is to extend that even further. But ensuring that we engage these populations in an ethical and informed way is really critical to the success of this project. The team has started reaching out to develop more international coordination: this is a new website that's been put together for an international human pangenome effort, to try to engage more international collaborators and populations.

So just to finish up on sampling and representation: the team will continue to leverage the 1000 Genomes reference populations, but they will also try to move beyond that by recruiting new samples. We haven't really talked about the early dependence on trios. Some of the early samples were selected because parental samples were also available, and the team could use that parental data to help phase the assemblies. Methods and sequence data have now developed to the point where the need for parental data to do the phasing has really diminished, so they can start moving toward single samples, which I think is really going to help the ability to include more diverse samples, because the requirement to have trios is really limiting for sample selection. And again, they want to establish new collaborations and international partnerships.

And lastly, I just want to end with some of the work that I certainly find really exciting. The production of all of these high quality assemblies, which right now exist as independent assemblies, has been great, and the team has a very open model of data access; the sequences and the assemblies have been deposited in public repositories. But to make these sequences and assemblies readily usable, we need novel data structures that allow us to put all of that sequence into a single data structure and maintain the allelic relationships, which I think will improve both individual genome analysis and population genome analysis. The team has a preprint, which I believe is in some of the material you received, describing their first efforts at developing some of these data structures. I would consider this super early days; these are really proofs of concept, but they have three approaches that they have looked at and described in this preprint, which is under review right now and, I believe, in revision. So I'm just going to give a couple of little vignettes of what I think the power of these data structures allows us to do. We'll start with our classical example of complexity in the human genome, which is the HLA region.
And so now within this structure, we have encoded four different haplotypes, including the HLA region as it is represented in GRCh38 and three other haplotypes that are available in these data. And you can see, when you de-convolve these into their linear representations, how these block changes lead to the complexity of understanding these different alleles. But all of that complexity can be encoded in this graph and made accessible in a way that I think is going to be empowering for understanding these relationships.

Additionally, when we have the ability to take advantage of all of this sequence, and are not dependent on a single linear reference, we can get more out of the data we have. So these, and I apologize, I forgot to put the reference on the slide, this is from their bioRxiv preprint, a couple of panels from the last figure. When you take chromatin mark data and chromatin conformation data and use the pangenome rather than just a linear reference, the ability to capture additional peaks and information increases, because you now have access to this additional sequence. Additionally, and I personally get very excited about this, the pangenome approach also allows us to do better annotation at heterozygous structural variant sites. These are all places in the genome where there is a heterozygous SV, and now we can annotate that SV and get more information; this is, again, looking at just the H3K4 methylation peaks. Traditionally, if we have a sample that has a large insertion relative to the reference, 100 kb or even a megabase, we just note that there's a large insertion there. We don't tend to analyze that insertion because it's really challenging: it has no annotation, and annotation is what we use the reference for in the first place. This more holistic approach lets us get more insight into those sequences, which I think is going to be really transformative for the way we think about doing genome analysis.

Now, of course, this project is not without some challenges, and what I've tried to put here are some of the remaining challenges that I personally see for the project. I'm sure the team probably has a longer laundry list, but these are the big ones that I tend to worry about. One, again, is sample selection and resource development, because you do want to make this project truly international; and I think the idea of having usable cell lines for further experimental work is hugely valuable, but really understanding how many sequences is enough to represent variation in an inclusive way is going to be a continuing challenge. I also think the pangenome implementation will continue to be an evolving area that we will need to understand better. Currently they've been using these graph representations; there could be other representations. Right now they have three representations that all have different pluses and minuses, and this is the first iteration. So understanding whether we need different representations for different applications, or whether we can really just put everything into one, is going to be an area of very exciting future work that will need to be addressed before we can really see the full value of this.
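To make those two ideas concrete, haplotypes stored as paths through a shared graph, and annotation attached to sequence that exists only on a non-reference haplotype, here is a minimal, purely illustrative sketch; it is not one of the consortium's actual graph formats, and the node sequences, path names, and peak label are invented.

```python
# Minimal illustrative sketch: several haplotypes stored as paths through shared
# nodes, with annotation attached to nodes so that sequence private to a
# non-reference haplotype (e.g. a large insertion) can still carry features.

nodes = {
    1: "AAAA",    # shared upstream sequence
    2: "CG",      # allele block present on the reference haplotype
    3: "TTTTTT",  # alternative allele block (an "insertion" relative to the reference)
    4: "GGGG",    # shared downstream sequence
}

paths = {
    "GRCh38": [1, 2, 4],  # the reference walks nodes 1-2-4
    "hap1":   [1, 3, 4],  # a non-reference haplotype walks 1-3-4
    "hap2":   [1, 2, 4],
    "hap3":   [1, 3, 4],
}

# Node-level annotation: a made-up chromatin peak that falls on node 3,
# i.e. on sequence that simply does not exist in the linear reference.
peaks_by_node = {3: ["H3K4me1_peak_x"]}

def peaks_on_path(path_name):
    """Collect annotations visible along a given haplotype's path."""
    return [p for node in paths[path_name] for p in peaks_by_node.get(node, [])]

print(peaks_on_path("GRCh38"))  # []                   -> invisible to the linear reference
print(peaks_on_path("hap1"))    # ['H3K4me1_peak_x']   -> visible on the insertion haplotype
```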
And then lastly, I think that adoption of the pangenome by the greater science community is going to remain a challenge. I mean, we live in a world where I still read papers where people are using GRCh37, which is a very old assembly; it's not even the current one. So how do we encourage adoption of this new resource, which is going to be super powerful and exciting, but which also clearly comes with some additional complexity that we're going to have to help the community address in order to adopt it? And I did want to make sure to put up the acknowledgment slide before we go into discussion, because, as I said, none of this is work that I have done. This is all work done by this great team of people, who have really impressed me by what they have been able to get done in this short period of time. And with that, I'm happy to stop talking and take any questions.

Thank you, Deanna. Questions from the council? Peter.

Thanks, Deanna. That was a great talk. I think we're nine years after the introduction of GRCh38, but it actually has the same data structure as GRCh37; it just has more alternates, right? And that was impossible for many clinicians to adopt. So I'm wondering what strategies this group has begun to explore to essentially implement a new generation of databases and a new generation of variant calling software to actually take advantage of the pangenome. Because otherwise, it will be very difficult for the rest of us to use.

Yes, that's a great question, and there have been numerous efforts on this. I'll start with the fact that they have been working very closely with the INSDC databases on developing data structures and methods to ensure that these data can be represented in those archives, which is one start. There are different tools that have been developed, for short-read alignment, for example, and for annotation. The last couple of slides I showed were based on being able to align short-read data from some of these chromatin experiments to do that annotation on the pangenome and then compare to the reference. And one of the other things I think is smart is that in most of their graphs they include either GRCh38 or the T2T-CHM13 assembly, and they've been thinking about methods for mapping back and forth. So are there ways you could do your analysis in the pangenome and then translate that back to the sort of flatter structures that we're more used to, as a bridge to being able to use the full data structure? They've definitely started thinking about those things. I think it's super early, and there's definitely more work that needs to be done, but that is definitely something they've been addressing.

Thanks for that very nice overview. You mentioned five superpopulations, and one of them was admixed Americans. How many of those samples will not be admixed, in other words, truly indigenous American genomes?

You mean indigenous, like Native American genomes? Yes. So to date, none. And I don't know if Karen or Amar are on the call, but I don't believe any of those right now are Native American genomes.
I know that the team has been reaching out to more indigenous and underrepresented populations to work with those groups, in order to try to help get some of those genomes represented in this pangenome. But as you can imagine, those conversations need to happen in a thoughtful way that makes those communities comfortable with the data structures. And this is a global challenge for many of these underrepresented populations. So I believe those discussions are ongoing, but I don't believe any of them have been resolved to date. Xander may actually know more about that than I do.

Maybe I can just add that that's my understanding as well, and that these conversations are conversations that are going to occur over many years. I can say that within the Canadian jurisdiction, these conversations are indeed going down a road of separate database structures, where indigenous communities would have control over their genetic information in a way that may or may not allow it to be integrated within these kinds of efforts. So I think these are long-term conversations, and not ones that can be adequately addressed in the context of the current HPRC work. But these are important conversations that need to continue into the future.

And the reason I ask is that, as we are trying to generate polygenic risk scores for different groups, those kinds of data would be very helpful: to have reference samples from which we can construct reliable scores for Latinos and other admixed individuals.

One hundred percent. And I know from talking with the team that they are very passionate about ensuring that these diverse communities are represented in the reference data structure. It's just that, as Martin said, those conversations take a lot of time, because you want to make sure those communities are comfortable, and feel empowered as part of the project rather than just being used by the project. And I think the team has been pretty thoughtful about trying to engage those populations, but it's hard to say what the time frame is going to be for including them.

Other questions? If not, then Deanna... Oh, hold on. Howard Chang, go ahead, please.

Hi, Deanna, that was a very nice presentation. Thank you for sharing the progress. It seems that data visualization is going to be a really important challenge, because a lot of the way we're used to looking at genomic data is based on this one-dimensional track, and we're all so used to looking at that. As soon as there are basically multiple versions and you don't know where to look, it's almost like we have to learn a whole new language. Is this also, I guess, part of the vision going forward, that they're really thinking about how to visualize and make use of this new and important resource that'll be coming down the pike?

Yeah, you've said a couple of things there that I think are important and worth going into a little bit. One is this idea of versioning, and there are already discussions around what versions of a pangenome might look like.
I think one of the really interesting aspects of a pangenome representation is that this, I guess I'll call it the liftover challenge, like going from 37 to 38, in some ways gets minimized, because your coordinate system, while harder to understand, is more stable: you can just continually add things into the graph. So there are advantages to this in terms of adopting new data. The team, I think, has been thinking about the visualization process and how we would do that. And the point I make to them, realistically, is that you have to have things like bedtools. How do you do bedtools on a graph? How do you relate this to things like cytogenetic bands? We've talked about all of these types of annotation and visualization that we think need to be on the pangenome structures to make them useful to everyone who might want to use the reference. But I would say it's still pretty early days on that, because I actually think the fact that they got to a proof-of-concept pangenome, and three pangenome representations, within the first three years was pretty significant. So it's definitely on the radar in terms of representation and visualization; there has not been as much work done on it yet, because they first had to get to the point of actually having pangenomes. But it's definitely something we talk about a significant amount.

Yeah, and I can just add to that. I think adoption is really going to be the critical next phase of this project. I can say, speaking from the International Human Epigenome Consortium, that we just finished realigning all the resources to GRCh38. So that annotation is really critical to enable adoption of these builds, and we need to think about ways to enable that moving forward: do we go back and realign all existing data to the new graph, or are there other strategies that could be taken up? I think if this is not thought through carefully, the value of the pangenome reference is going to be greatly diminished, and it's going to take many decades for the research community to catch up. So this is something that needs to be thought through very carefully: identify key early adopters and build tools and resources, visualization and otherwise, that can help support those communities.

Peter, go ahead.

So disease, and rare disease, is part of who we are as a species. If you take the estimated frequency of rare disease, it's somewhere between 2% and 8%, so if you have 360 people, probably 10 of them have a Mendelian rare disease. On the other hand, one of the purposes of a reference genome is to serve as a standard against which to make a diagnosis of rare disease. So I'm wondering if you've given thought to how to include medically relevant variation in the pangenome reference.

That's a great question, and I actually have been thinking about it. I don't know how much the team has been thinking about actually having those variants represented in the data structure, but I think figuring out the right way to do that is critical; Peter and I have both thought a lot about the rare disease community. Thinking about how you annotate the alleles that you think are contributing to disease in a structure like this becomes super important, I think, and not just for rare disease but also for moving into more common disease, and maybe things that are less rare, in terms of understanding how different alleles contribute to various human phenotypes.
So I think that's going to be a really important question for the team: finding the right people to work with to address it. Because I think one of the challenges this team is going to face is that the work they have done, productionizing the sequencing and the assemblies and even these representations, is a big task. And certainly from my days in the GRC, one of the things I found most challenging was the breadth of people who want to use the reference, and every single one of them wants something different from it. If I had a dime for every time I was at a conference and somebody told me what the reference should really look like, I would be pretty well off. That's because different people want different things, and the reference is so fundamental for a lot of things that go on. I think rare disease is super important, and it will require a lot of thought in terms of how we integrate this structure with that community. But we're also in a world where we can now write genomes, and where we want to think about polygenic risk scores, which just came up, and more common disease. So I think one of the challenges the team is going to have is finding the right partners to start working with, to really understand what their needs are and start doing some of these proofs of concept. Most of their brain space has largely been taken up so far with: can we get to the point of doing these assemblies in a pretty productionized fashion, and can we come up with a way of representing them? The application space for this is going to be huge, and finding the right partners is going to be key for realizing that.

Okay, I've got Judy and then Tim.

You mentioned admixture early on, so I'm just curious: when you think about the data platforms and how you share the data, how are you thinking about incorporating admixture of the populations, the challenges and opportunities? The application I'm thinking about particularly is cell-specific enhancers and so forth, and how you think about cellular polygenic risk scores. So just your thoughts about admixture.

Yeah, I mean, admixture is who we are, effectively, for the most part. And I think the pangenome structures, and we've had some brief conversations about this, help in terms of thinking about representing haplotypes. If you break it down into haplotypes, as opposed to thinking about whole genomes, that may be a pathway for better dealing with some of that admixture, because I'm pretty confident that many of the samples they're using to date are admixed anyway, like most of us, right? It gets really hard to think about a complete, chromosome-level haplotype structure, but if you can break it down into smaller blocks, I think you could think about traversing the pangenome to reconstruct individual genomes by stitching together those different haplotype blocks. But again, it's such early days for thinking about the best way to manage some of this that these are all great questions that I think the team would love to have somebody work on, to think about how we best tackle this piece.

Tim, go ahead.

Thank you. And thank you for this wonderful presentation.
Going back to some of the issues on adoption: as I've been thinking about this, I see a long-term eventuality where we would love to have a universal coordinate system that we can all agree on, and I imagine that's a very long-distance goal. But I'm thinking about some of the intermediate steps we can take to increase adoption in the near term, and I'd like to encourage you to think about some of the functional genomics data sets that the community has invested a lot of time in: the ENCODE project, IGVF, and so on and so forth. Maybe bringing those onto some of these pangenome references, mapping them over as a sort of first step, might be a way to create a lot of early adoption, even before we figure out the complexities of, frankly, where we are in the genome.

Yeah, I think that's a great suggestion, and I am fairly sure that the team has been talking to folks like, say, some of the IGVF folks. I think where we really run into challenges, to even get back to Peter's comment about rare disease, is this: some of the most valuable data sets you would want to put on the pangenome would be gnomAD or TOPMed. And I think there are real challenges in going to get those data, both on the data access side, because there are so many different data access issues around some of those data, and also just the money it would take to manage the compute. I think there has to be some discussion around how that gets funded and how that gets managed. I shouldn't say definitely, because I don't know, but I don't think it's part of the remit of this grant as it exists today. But starting to get creative about how we can do some of that, I think you're absolutely right. The whole reason it took so many people a long time to move to GRCh38 is that ENCODE wasn't on it, gnomAD wasn't on it, TOPMed wasn't on it, right? So yes, you're absolutely right: thinking about how we get those data sets there is critical to adoption, but that's going to require funding and dedicated resources and everything else, and at least I'm not sure what the solution to those things is.

Yeah, no, I very much agree. It's an expensive prospect, yet an important one.

Yeah, very much so.

I would echo those comments. Just from the IHEC consortium perspective, which includes ENCODE: ENCODE, of course, is a special case where the data access agreements allow for sharing of the read-level data, but for the vast majority of data sets, that's not the case. And it took the consortium over two years to navigate the data access committee agreements, even within our own consortium, to be able to get access to the read-level data. So it is a significant effort, but it's an important one, and I think we need to meet this challenge moving forward.

Yeah, just from the standpoint of a member of the TOPMed Executive Committee, I would suggest that you start discussions with the TOPMed group early and start thinking about how one could integrate your efforts with what TOPMed is doing. After all, 200,000-plus whole genomes have been generated across 82 studies. There are issues related to consents and various other things, but in many ways I think it can be worked out, at least through some very frank and open discussions. So I would suggest starting soon.

Other questions or comments from Council? If not, Deanna and Martin, thank you very much for that presentation.
We appreciate you spending some time with us today. Thank you very much for your time. Pleasure.