So I'm hoping I can speak; this is a result of visiting grandchildren over Thanksgiving. What I want to go over with you today is the ClinVar resource that we are developing at NCBI, and to introduce you to it in several different ways. We've had several planning meetings to discuss what kind of resource we should build to facilitate open access to information about medically important variation, and to have that resource support tools that integrate into testing and other methods of interpreting variation. OK, this is going to work well.

One way of looking at ClinVar is as database infrastructure. We are treating it as an archive of information that is submitted about the relationship between genotype and phenotype. We are not doing the interpretation; we are representing what is submitted to us. And because of all of those submitted records, we can provide an interface that facilitates comparison of information from different sources, and that indicates how much a particular report of clinical significance has been reviewed: by different individuals, by expert panels, or whether it has just come in from one source. In so doing, this builds on the foundation of the standard databases we've already discussed and the information that comes from the analysis of sequences, so it brings in information from genes, standard reference sequences, the variation databases, and more. And again, as I was trying to indicate previously, it will provide a data service to be used by all.

Another way of looking at ClinVar is as a record-keeping method: it keeps track of when something was submitted and whether it has been updated, so that the archive is preserved and you can go back and look at versioned information. It keeps track of individual submissions, and it also allows aggregation of information from individual submissions when, as far as we can compute, they all represent the same combination of phenotype and genotype.

To make this a little more explicit, the idea is that different submitters might submit information (excuse me) that differs slightly as to the clinical interpretation. We would initially calculate that there is a conflict; but if an expert group then came in, reanalyzed the data, and came up with a current representation of the clinical significance of a particular variant relative to a disorder, that would also be made available. The whole evolution of understanding would be trackable through the history of these successive submissions; a brief sketch of this aggregation follows below.

So let me briefly go over the data elements we're going to be capturing in ClinVar. Of course, a key element is the phenotype, and that phenotype may be of multiple classes: it may just be the diagnostic name, or it may be the clinical features that go with a particular diagnosis, depending on what was tested and what was reported in a particular submission. The relationships among those phenotypes can also be provided in the record. We're basing this right now on UMLS, the information system that the National Library of Medicine creates to integrate information from multiple vocabulary sets. In case you are not familiar with it, that group keeps track of multiple terminology sets, their definitions, and their local identifiers.
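To make that aggregation model concrete, here is a minimal sketch of how submissions about the same genotype/phenotype combination might be rolled up, with a computed conflict and an expert-panel override. All class and field names here are hypothetical illustrations, not the actual ClinVar schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Submission:
    """One submitter's assertion about a genotype/phenotype pair."""
    submitter: str
    variant_id: str             # accessioned variant identifier
    phenotype_id: str           # e.g. a UMLS concept identifier
    clinical_significance: str  # e.g. "Pathogenic", "Benign", "Uncertain"
    expert_panel: bool = False  # True if this is an expert-panel review
    submitted: date = field(default_factory=date.today)

def aggregate(submissions: list[Submission]) -> str:
    """Summarize clinical significance across submissions that all refer
    to the same combination of phenotype and genotype."""
    # An expert-panel review, if present, supersedes the computed summary,
    # but the individual submissions remain in the archive.
    panels = [s for s in submissions if s.expert_panel]
    if panels:
        latest = max(panels, key=lambda s: s.submitted)
        return f"{latest.clinical_significance} (reviewed by expert panel)"
    calls = {s.clinical_significance for s in submissions}
    if len(calls) > 1:
        return "Conflicting interpretations: " + ", ".join(sorted(calls))
    return calls.pop()
```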
UMLS also assigns a unique concept identifier when it is thought that these different groups are all referring to the same concept. As part of the information we're maintaining about the phenotype, if we get submissions that we cannot match to a concept ID in UMLS, we will add those vocabularies to our system. And if UMLS does not incorporate commonly used ontologies, such as the Human Phenotype Ontology, then we're also bringing those into our system. Although I'm presenting this as a report of what ClinVar does, ClinVar is integrally constructed with the Genetic Testing Registry, which is also under development at NCBI, so the phenotype data I'm talking about will also be part of the public record from the Genetic Testing Registry.

Another data element is obviously information about the variation. We will accept it however it is supplied to us, and then calculate as best as possible how it corresponds to standard reference sequences, whether those be chromosome coordinates, RefSeqGene coordinates, or LRG coordinates, so that we can allow translation among all the different current and historical terminologies that may have been used to describe a particular variant; a small sketch of this cross-referencing follows below. If there are other database identifiers for that variant, those are also incorporated into the record. And if we get a submission for a variant or set of variants that is not currently in the public databases, we transmit those to dbSNP and dbVar so that they can be accessioned and made part of those databases as well.

Obviously a key aspect of making ClinVar a useful resource is keeping track of the evidence that underlies a particular interpretation of clinical significance. We have worked with our collaborators to identify the key elements we need to capture with respect to the study: what was sampled, and all the different types of observations that may be critical for later interpretation. I've not enumerated all of these here; it's an ongoing process. But the idea is that one of the major functions of ClinVar is not only to accession a particular assertion, but to have all the evidence available as well, so that it can be freely accessed by others and be a resource for recomputation and reevaluation of the significance of a particular variant.

In representing the interpretation, we recognize that there needs to be a data model for the mode of inheritance, so that will (excuse me) be part of the record, as will information about whether there might be some concern about whether the variant is actually a variant at all: whether these are regions of the genome in which paralogs or other things might be contributing to a signal from, say, an array-based method of assaying variation. As I alluded to previously, the clinical significance will be reported with different levels of confidence, so it will be readily apparent how much we trust any particular assertion. These will be dated as well, so that if nothing has been looked at for a while, that will also be readily apparent. There will be the usual elements that keep track of who submitted things, when they submitted them, and when the record was last touched; all of these will be part of the database.

So, ClinVar as a resource has been discussed internally at NCBI for a while.
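To illustrate the translation among variant terminologies just described, here is a minimal sketch of one variant record carrying its different spellings. Every identifier below is a placeholder for illustration, not a real cross-reference.

```python
# All identifiers below are placeholders, not real cross-references.
variant_xrefs = {
    "internal_id": "VAR000001",
    "hgvs": {
        "genomic":    "NC_000001.10:g.100000A>G",  # chromosome coordinates
        "refseqgene": "NG_012345.1:g.5000A>G",     # RefSeqGene coordinates
        "lrg":        "LRG_1:g.5000A>G",           # LRG coordinates
    },
    "dbsnp": "rs0000000",
    "legacy_names": ["IVS2+1A>G"],  # historical naming still seen in the literature
}

def all_names(record: dict) -> list[str]:
    """Collect every known spelling of the variant, so a query phrased in
    any of these terminologies can resolve to the same record."""
    names = list(record["hgvs"].values())
    names.append(record["dbsnp"])
    names.extend(record["legacy_names"])
    return names

print(all_names(variant_xrefs))
```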
ClinVar is based on a lot of information that comes from resources already at NCBI. Certainly a key aspect of being able to report on variation is having a good standard for knowing where the exons and the splice junctions are; that sort of started when we worked on the CCDS collaboration. That evolved into generating standard genomic sequences, which we termed RefSeqGene, which is now part of the LRG collaboration. The idea here is to provide all of the information necessary to support a standard way of reporting variation against sequence that is independent of reassembly of the genome.

Another thing I'm putting on the timeline is that in 2008 we started to accept submissions of information about variation as HGVS expressions. Those were tools that would take either a single submission or a batch submission into dbSNP, correlating that HGVS expression with a publication, a phenotype, and an optional interpretation of that phenotype relative to the variation; a sketch of such a submission row follows below. We've recently been ramping up our efforts with different testing groups to capture information from them, and so we've tried to build information structures to capture information about rare variation that is thought to have clinical significance. Because of our work serving as the home for GeneReviews and GeneTests, we've also been mapping some of the tested variants to the genome where they were not already mapped, taking into account some of the historical ways the variants have been referred to. This, again, is a key aspect of our work on building the Genetic Testing Registry, so that we know explicitly which variants are subject to genetic testing. Obviously a key motivation for accelerating the work on ClinVar is the understanding that the ability to capture data about human variation is rapidly increasing, so having a tool to facilitate interpretation of variation in bulk is even more important than it was previously.

So what's the current status of ClinVar? Well, as you can see, we don't have a tremendous number of records processed yet, and we're looking forward to having more groups submit the variations they've identified. We do exist as a website, in the boxed region here, and I'm pointing out, in case you have ever noticed this website, that we recently added more documents from the community discussions we've been having with genetic testing groups about what is necessary and what is desirable for ClinVar. We do have an email address set up, and we're hoping that as a result of these discussions we can engage you in contributing to the direction we should be going and the emphases we should have.

ClinVar is currently in a very quiet production mode. As you may know, dbSNP has put together a tool called Variation Viewer, which brings in information and allows you to filter for only those variants that were submitted from locus-specific databases or through our computational analyses of the allelic variants in OMIM; the interpretations there are actually the reports that we have from the ClinVar infrastructure. We also recently launched a tool to facilitate automated analysis of variation relative to the genome. It takes as input HGVS expressions or locations on the genome, and it's very similar to what I think Paul was talking about.
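Returning to the batch HGVS submissions mentioned on the timeline, here is a minimal sketch of the kind of row such a batch might carry; the column names and values are assumptions for illustration, not the actual dbSNP submission template.

```python
import csv
import io

# Hypothetical batch-submission rows: each correlates an HGVS expression
# with a publication, a phenotype, and an optional interpretation.
batch = """hgvs_expression\tpubmed_id\tphenotype\tinterpretation
NM_000000.0:c.100A>G\t12345678\tExample syndrome\tPathogenic
NM_000000.0:c.200del\t23456789\tExample syndrome\t
"""

for row in csv.DictReader(io.StringIO(batch), delimiter="\t"):
    # The interpretation column is optional and may be left blank.
    call = row["interpretation"] or "not provided"
    print(f'{row["hgvs_expression"]} ({row["phenotype"]}): {call}')
```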
The tool gives back information, whether you submit a variant or just a location on the genome, about whether that variant is known to our variation databases, what the minor allele frequencies are, and whether there's clinical information known about it. There's also a function to download (excuse me) the full report, which gives you the translation of what you submit into HGVS expressions in genomic, cDNA, and protein coordinate systems.

What's coming soon, we hope, is more interaction among multiple groups, so that we can provide information about variations that have been called from different data sources, and so that we can make these data available by FTP or by APIs. We have been trying to mock up what a full report might look like, corresponding to the types of data I enumerated that we would be capturing. There would be a quick overview section with links to more information related to the disorder and/or the variants in question. There would be a list of all the different ways we know a variant might have been expressed, historically and/or in different sequence coordinate systems; in this example, the variant is given in RefSeqGene, LRG, OMIM terminology, dbSNP, and other historical representations. It was very interesting going through this example because, as you probably noticed, I was using the risk factor for susceptibility to age-related macular degeneration, and I discovered when I did this that the RefSeqGene and the LRG that were created for complement factor H actually represent the risk allele rather than the non-risk allele, so that was interesting.

There is also phenotypic information as a way to navigate to more detailed information; displays for the variant on the genome and in its genomic context; and a full enumeration of all the observations that may have been generated, including counts of cases and controls or whatever information may have been submitted. I should make the point that this is a mock-up; we do not yet have this level of detail in what we've been managing. And because of our integration with the Genetic Testing Registry, we may also be able to report whether a test is available and whether a decision has been rendered about the clinical utility of acting on a particular variant if it's been observed.

I just want to close by saying that I'm talking about information that comes from many, many groups, because a lot of work has gone in from all the different groups generating the variation data. And I'll stop squawking now. Thank you.

Brad, from NHGRI. On your evidence slide, there was a bullet that said "reviewed by expert panel." Could you clarify the level of active curation going on?

So, I was trying to represent that we consider ClinVar a substrate for panels to pull the data, review it, and then resubmit information stating the results of their conclusions. And we would point to whatever citations or whatever document came out of that review process. So I'm not saying that the ClinVar staff is reviewing all of the submissions; we are just providing an infrastructure to show all the evidence and to support the representation of whatever conclusions may come from expert panels. Does that answer your question? Thank you.

Chris O'Donnell, NHLBI.
This is a really important resource, and just as we heard earlier about the differences between rs numbers and other ways of characterizing SNPs, there's also no agreement on how to characterize phenotypes and clinical diseases and traits. That's going to be terribly important for being able to use this in any meaningful way. I'm wondering what efforts are being made to develop some consensus about what's going to be used as the phenotype ontology. I think NCBI would be a great organization to help spearhead that effort.

So we've certainly had discussions with the Office of Rare Diseases Research and other groups at NIH to try to determine an appropriate strategy for consolidating all the information coming from all the groups at NIH about how we're going to represent phenotype. And we all agree it's an excellent idea, but there is no one easy answer. So I think all we can do is continue to provide the evidence for why a phenotype was called the way it was, and I hope that that's going to work. We will be able to have a citable object, a concept identifier, for what we're talking about, and maybe that will help.

Just thinking about potential solutions to that problem, and solutions that have had false starts in the past: you mentioned CETT as one of the historical sources for getting some of this data. For those in the room who were involved in CETT, we recall that there were lots of discussions about really trying to standardize the phenotype data that was submitted to a lab when a test was ordered, with the idea that it could then go into these types of phenotype databases. It's not clear to me to what extent that ever happened; but thinking generally, it would seem that the labs doing the tests, to the extent that they need this data anyway, may be a source of phenotype data. And building on what was discussed in the last session, if or when EHRs evolve to the point that they are actually phenotype databases, again blurring the line between clinical and research, that seems like an obvious place to go to get phenotype data as well. If we're thinking about how we build the future, I would think that both the labs and the EHRs are excellent places to look for this information.

Yes. Well, it was explicitly listed that the reason we're using UMLS is because it includes SNOMED CT.

I have to add on to that. It was explicitly listed there, and it might have been hard to hear Donna say it, but we're using UMLS on purpose because it does have the standard medical vocabularies, such as they are, one of which is SNOMED CT, which is a standard for HL7 EMRs. We also got agreement from the SNOMED group that, for what they call the genetic subset, we could redistribute those names and identifiers without a license fee, because a lot of the smaller laboratories doing genetic testing may not be able to purchase a license. So we are aware of that, and we're attempting to open up the standard vocabularies as much as possible, make them accessible, and let them come from multiple sources.

Donna, I have one last question. On your evidence slide, there was a relatively long laundry list of things that we'd like to see provided with a submission. And we heard from Elaine that clinical labs are willing and interested in participating, but resources are scarce.
I'm just curious about the process for the labs collating this information and uploading it. How is that going to be relatively straightforward for them? Have you worked out some sort of submission form that they're in agreement with?

So we have developed two submission forms: one is a spreadsheet form, and one, for the wonks among us, is the XML submission form. These have been developed more to make sure we have the opportunity to capture the kinds of things people might want to submit; we have not yet tested them with groups to see how they would fill them out. The sample submissions we have gotten so far have come through the interfaces we originally developed with dbSNP and worked out as part of the LRG collaboration, which was a simple spreadsheet of variant, phenotype, clinical assertion, publications, somatic or germline status, and a very small number of qualifiers.
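As a rough illustration of that simple spreadsheet format, here is a minimal sketch of validating one submission row. The column names and the allowed values are assumptions for illustration, not the actual submission template.

```python
# A minimal sketch of validating rows from a simple submission
# spreadsheet like the one described above. Column names and allowed
# values are assumptions for illustration, not the actual template.
REQUIRED_COLUMNS = {"variant", "phenotype", "clinical_assertion", "publications", "origin"}
ALLOWED_ORIGINS = {"somatic", "germline"}

def validate_row(row: dict[str, str]) -> list[str]:
    """Return a list of problems with one submission row (empty if OK)."""
    problems = []
    missing = REQUIRED_COLUMNS - row.keys()
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    if row.get("origin") and row["origin"] not in ALLOWED_ORIGINS:
        problems.append(f"unrecognized origin: {row['origin']}")
    if not row.get("variant"):
        problems.append("variant is required")
    return problems

# Example: a row with a misspelled origin value.
print(validate_row({
    "variant": "NM_000000.0:c.100A>G",
    "phenotype": "Example syndrome",
    "clinical_assertion": "Pathogenic",
    "publications": "PMID:12345678",
    "origin": "germ-line",
}))
```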