Okay, thank you for the chance to give you all an update on the NHGRI GWAS Catalog and some recent developments that we have implemented. We originally thought we would communicate some of these developments to you in a slide in Eric's Director's Report, but then we thought Council might be interested in a more detailed report. Before I start my presentation, I would really like to thank Helen Parkinson's group at the European Bioinformatics Institute, or the EBI, who have really done the lion's share of this work. I think we may have one or two team members joining us on videocast.
So by now, most if not all of you are familiar with this view of the GWAS Catalog data that shows the different genotype associations that have been identified from published GWAS studies. We update this type of diagram every quarter. Today I thought I would take you behind the scenes and show you a little bit about our curation workflow, as well as describe some of the recent developments related to developing an ontology for the GWAS Catalog traits and some improvements in our diagram. And then finally, I'd like to give you a preview of our new automated diagram.
So to give you a sense of how information flows into and out of the catalog: we have a weekly literature search, and we survey the NIH news clipping service for eligible studies. We then look at these eligible studies. Our curators pore over every single table, every single page of the paper to identify eligible associations. We have first-level curation that extracts study-level and association-level information. We think it's important to do a second-level check, both for quality and for consistency purposes, so we also implement that level of checking. Before we publish the data, we have implemented a data flow process with colleagues at NCBI, whereby they provide some additional genomic annotation and some additional rudimentary QC.
And then finally, the data are published to the Web. So as I showed you before, we also have a manual quarterly diagram, which is done painstakingly by hand by Teri Manolio and Darryl Leja. So every quarter, when new associations are added, they work together to literally hand-place each new dot on that diagram.
So as part of our ongoing and future improvements, one thing that we're working on now is developing a framework to collect standardized ethnicity information more consistently, and in an expanded way, on the curation side of things. It's worth noting that everything you see here in red is by and large manual. It's very time intensive, and of course there are some ways in which we hope that we'll be able to increase our efficiency of curation through informatics and other developments. And of course our collaborators at EBI are working on a number of developments related to curation, diagram, and ontology features. The focus of today's talk is the diagram and ontology features.
So from a historical perspective, many of you may be interested to know that before the catalog was a database, it was preceded by an Excel spreadsheet, and even before that, a Word table. So this is sort of why you see the word "table" up here. We now have a web entry form that was developed for us by Kent Klemm of the NHGRI web team. This is just the front page that the curators see if they want to start curation, enter a new study, check on the status of a published study, and so forth. This is just a brief subset of the data we collect. Everything you see here as a blank field is essentially a piece of information that the curator manually enters. Cutting and pasting might save a little time, but it's by and large manually entered.
We did, with the help of our collaborators from EBI, add this new functionality where, if the data are in a spreadsheet, they can be uploaded directly into the database, which has been helpful in terms of reducing errors as well as adding more SNPs per paper in a more time-efficient way.
We do monitor the most popular searches that are performed by our users. So here you see on the left the most popular terms that were searched for in the calendar year to date. And for those of you who are visual, this is a word cloud of the same information, where the size of the term is proportional to the number of searches. You can see here that the most common searches are those related to various cancers and diabetes and, to a lesser extent, some other common diseases.
So for those of you that use the catalog, this is a snapshot of our search interface. A user can search in this blank box for a particular string that matches a trait in the catalog, or you can browse this relatively long list of traits and highlight the ones you're interested in, and then that will bring up the relevant results in the catalog.
So Helen Parkinson, who is an expert in developing and adapting ontologies at the EBI, noticed that this is a relatively unstructured trait list. For example, if you look on that long list of traits, you can see that a trait such as diabetes is classified both under diabetes and under the specific type of diabetes, perhaps with other traits in combination, if that's what the paper reported on. She saw an opportunity to integrate these traits into an existing ontology and facilitate categorization of these traits to enable more systematic and more powerful searches. The ontology that we're using here is called the Experimental Factor Ontology, or EFO for short. It reuses multiple resources to produce a controlled vocabulary of experimental variables, such as those related to an anatomical feature or a particular disease.
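As an editorial illustration of the idea just described, the following is a minimal sketch of how mapping free-text catalog traits to terms in a controlled vocabulary can make a search on a broad category return the more specific traits as well. The term names, the parent-child links, and the trait strings are all made up for illustration; they are not actual EFO content, and this is not how the EBI team's implementation is known to work.

```python
# Hypothetical toy hierarchy: child term -> parent term (NOT real EFO content).
PARENT = {
    "type 1 diabetes": "diabetes mellitus",
    "type 2 diabetes": "diabetes mellitus",
    "diabetes mellitus": "metabolic disease",
    "breast cancer": "cancer",
}

# Hypothetical free-text catalog traits, each annotated with ontology terms.
TRAIT_TO_TERMS = {
    "Type 1 diabetes": ["type 1 diabetes"],
    "Type 2 diabetes": ["type 2 diabetes"],
    "Type 2 diabetes and other traits": ["type 2 diabetes"],
    "Breast cancer": ["breast cancer"],
}


def ancestors(term):
    """Yield every ancestor of a term by walking child -> parent links."""
    while term in PARENT:
        term = PARENT[term]
        yield term


def search(query):
    """Return traits annotated with the query term or with any descendant of it."""
    hits = []
    for trait, terms in TRAIT_TO_TERMS.items():
        if any(t == query or query in ancestors(t) for t in terms):
            hits.append(trait)
    return sorted(hits)


if __name__ == "__main__":
    # A broad query picks up all of the specific diabetes traits.
    print(search("diabetes mellitus"))
    # A narrow query returns only traits annotated with that exact term.
    print(search("type 2 diabetes"))
```

With a flat, unstructured trait list, each of these strings would have to be selected by hand; with the hierarchy, one query on the parent term covers them.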
So the ontology also allows relationships to be specified among the various terms, which generates a hierarchy that can then be used to expand a query, such as searching on all immune system disorders. It will also allow for combinatorial searches, such as the one you see here. Being able to categorize traits more broadly means that we can go from about 200 manually defined traits, which is what you'll find on our current GWAS diagram, to on the order of 20 ontology-defined traits, which of course makes it easier to peruse the diagram visually, as well as classify these traits.
The new ontology can also be used to improve the generation of the GWAS Catalog diagram, as I alluded to. To date, the following features are available on a new diagram display, which I will preview for you in a moment. First of all, the diagram is now completely automated. Every new study or result that is added can now be placed on the GWAS diagram using an algorithm. The display I'll show you in a moment is interactive. You can zoom in and out and highlight certain traits of interest. We've consolidated traits into higher-level categories. This means fewer colors and the ability to discern the distribution of the various trait categories among the different chromosomes, or similarities and differences among the groups, if you're so inclined. Many of you are familiar with the PowerPoint progression that shows the number of GWAS hits that have accumulated throughout the various quarters. We now have that in web form in a dynamic time series display. And then if you have the Chrome or Safari internet browsers, there is the functionality to do interactive filtering on a trait.
So let me show you the new diagram and the website at which it's hosted. We will also put a snapshot of this on our genome.gov catalog home page, if you're familiar with going to that page. I just want to say that this is a work in progress, and for that reason, I'm showing you screenshots.
It will continue to be updated in response to your feedback and other things on our list. So first I'll just explain that this zoom bar here is going to allow you to zoom in and out of the diagram. If we highlight a particular region here on chromosome six and click, you can zoom in, and even farther. So you can see the resolution is quite nice. The EBI team put quite a bit of effort into choosing the right colors that would highlight the differences among the groups. If you hover over a particular trait, you can see here that it will bring up a hover-over that tells you what that trait is. I should have also mentioned that if you click "show legend", it will give you that color-coded legend of about 20 traits that I showed you previously.
So if you click one tab to the right, this is the time series that I mentioned. On the bottom here, you see that there are different kinds of dots. This is sort of a slider that will iterate through the different quarters. I'm showing you here a snapshot from June 2009.
Click one tab over and you'll find the filtered diagram. So this is an example of a diagram that is filtered just on diabetes. We have 10 filtered diagrams available so far. These correspond to the most popular terms through 2011 and 2012. The slider on the bottom iterates through these 10 different diagrams.
If you click one tab to the right, that shows you where you can download PNG files, which are just image files, of the diagrams, the filtered views, and, if you're so inclined, the ontology files that underlie the diagram. I don't have time to go into these, but if you click on "help" and "about", there's also some additional documentation, including where the data come from on the GWAS Catalog home page. So as I alluded to, this is very much a work in progress.
What we hope to work on in the near future includes making interactive links from the dots that you saw on the diagram directly to the GWAS Catalog data, as well as to other genome browsers that may have additional data in those genomic regions. Improving filtering features is also high on the list. We'd like to provide an autocomplete, which is where you can type in a few letters and the system will fill in additional traits that start with those letters, as well as synonyms. We also hope to deliver filtering based on PubMed ID, as well as combinatorial queries of more than one trait. Because this process is now automated, we hope to provide a more frequently updated diagram. Currently it's quarterly, and we foresee going to a monthly and perhaps weekly diagram. Also high on the list is improved browser compatibility. If you are an Internet Explorer or a Firefox user, the interactive filtering is a little bit more limited at the moment, but you can access the filtered views. If you have Chrome or Safari, this functionality should be in place. And of course, we'd love to hear your suggestions.
So I'd like to thank our GWAS Catalog team, in particular, on the curation side of things, Peggy, Heather, Jackie, and Joannella. Kent Klemm does our database design. Darryl and Teri have been instrumental in providing the quarterly updates that you all know and love. And then I'd like to thank our colleagues at NCBI for their involvement in our data flow process and for providing some additional genomic annotation. And then finally, for this particular talk, as I mentioned, most of the work has been done by Helen Parkinson, Tony Burdett, and Dani Welter. They've really done an outstanding job on the ontology and diagram improvements, and it's been a very fun and productive collaboration to date. So with that, I am happy to take any questions. Yes, Rick?
Just a little comment. I think this is really good. It's valuable. Is it a huge job?
It seems, I mean, I know making the visuals is a lot of work, but it's something that, once it's done, it's an engine, and then you just add to it and everybody can use it.
That's the idea behind automating it, yes, that there's sort of an algorithm. And we wanted to make it look as familiar as it could, because I think people tend to recognize and use the quarterly diagrams that we have. So we wanted to make it as close to Darryl's diagram as we could. I mean, there's really no substitute for his judgment and discernment and visual acuity, but we're trying to get as close as we can.
And then when you say it's funded by that grant, who's that grant to? I misunderstood who did that. I mean, I know all of you.
So just to backtrack, the grant is to EBI.
Okay, okay. So they're supplemented to do some of this work. So is that a U award? And so is that continuing?
It's, I can't see, that's in the... It's currently funded as a supplement to that group.
Okay.
Yes, I think Jill and then Rex.
So yeah, I think this is really nice and the visualization is great. I notice, as long as they've got this interaction, it would be kind of nice to actually get the exact locus of these variants. When you roll over, you get what the SNP is associated with, but you don't get its location.
Do you mean like 1p13.2?
Yeah, yeah, something. I think it's precise. Yeah, because they have it. Why not, you know, when I roll over, why not see it?
So one of the things we'd like to do is...
At least have it as an option.
Yeah, so as I alluded to, one of the things we're hoping to do is to have that hover-over take you directly to the GWAS Catalog record, which does...
Even better.
But we can include that in the hover-over as well.
Yeah, I mean, because you're only seeing one label at once, and so adding it is not...
Right.
It's not going to be visually...
Yeah.
And we want to reinforce that this diagram has data underlying it, and we want to take people to the actual data, so that that stuff doesn't...
No, I think that's a great idea, yeah.
So yes, Rex?
So now that you've got it driven by an ontology, do you have the option of actually driving it by an alternative ontology? So for example, could you go in and use GO Function or GO Process to drive that? I think that would be extremely powerful.
That's a great question. Actually, EFO uses GO. So I know that ontology can be incorporated easily, because I think we have a lot of emphasis on diseases. I think MeSH is another one people mention frequently, so we could explore that as well. But I think GO is already sort of incorporated here.
Yeah, but it would be really great if you could actually use a GO term, you know.
Enter in a GO term?
Enter in a GO term.
Yes, so one thing that will be made available is, I know that when they have a chance to incorporate this, they will include all of the synonyms, which I believe includes all of the GO terms. So if, for example, the GWAS Catalog trait is called Coronary Heart Disease and the GO term is something else (that may not be the best example), it will also bring up those synonyms.
Yeah, I guess I don't know EFO well enough to know how granular it is. I mean, you might end up with a lot of duplication there if it's not granular enough, but it's all going in the right direction.
Okay, okay, we take your point. So we should definitely pay attention to GO specifically as a way to be able to search on terms.
I mean, we have, like, the ontology portal. It would be really cool if you could actually pull up any ontology and apply it to this. That would be extremely powerful.
Okay, thanks for that. Mike.
I'll just second that. This is a really useful tool and I'm really glad to see it going this way. The automation is great.
Is anything happening on automation in terms of the original data gathering, or is that still very much a manual process?
We are exploring ways to make that better. I think we probably will be working on that in the near future. It's definitely one of those areas where it's obvious that there's a need, but we ourselves don't have that expertise in house.
If it does stay a manual process and it becomes overwhelming (I'm not sure that it is, but if it did), 10 to the minus fifth is not a very high bar, or a very low bar, depending on how you're looking at it. If it came to, gosh, we really don't feel like we can do justice to this, then going to 10 to the minus sixth would not be a terrible sin.
Yeah, I think that's a good point. And in fact, I think there's always going to have to be some aspect of manual curation. I think that data mining can perhaps present eligible associations for us, but just having done this for a while, I think we do, at least for the foreseeable future, want to make sure that the associations meet our criteria and so forth, which you can't always automate. You do need a human to make sure that the association is in line with your criteria. That's a judgment call.
Yeah, and if that is the case, which I think is perhaps absolutely right, I think it's more important to make sure you get all the genome-wide significant results from a paper rather than trying to get everything at 10 to the minus fifth. Because honestly, in this context, 10 to the minus fifth typically is not that interesting. And 10 to the minus fifth versus sixth versus seventh, those are in principle orders of magnitude. They won't quite be, because some papers won't even talk about things at 10 to the minus fifth, but it could be a significant time saving.
Good point, thanks. Yes, David.
You mentioned the most common search terms. Do you also have some sense of who the biggest users of the GWAS Catalog are?
You know what, I probably could have pulled that up. We do have the user logs that show the IP addresses, so we could probably look into that. I don't know off the top of my head.
I mean, I look at it frequently, but I'd be curious how many of the users are researchers versus physicians versus interested members of the public with something that they're struggling with, because I think that also influences what types of features might be the most useful to build in the future, right? If 90% of your users are researchers, then the kinds of things that you want to easily link to are "get me right to the genome position" or "give me all the ENCODE elements that are in a," you know, that sort of thing. Whereas if a lot of the users are looking at it for a different reason, an emphasis on the graphics, et cetera, is, anyway. But I think you have to know that.
Yeah, that's a good point. Yes, Bob.
I'll just ask you something quickly. In a conversation I had with somebody at NCBI a little while ago, they told me they couldn't capture the IP addresses and tell me who it was who was looking at material they put up on their website. Was that true or not?
You know, that's honestly not my area. I thought we might be able to, but I could have misspoken. I'll need to check on that.
The issue is whether the government was allowed to capture that kind of information.
Unfortunately, I don't know the answer to that. I will look into it, though.
Yeah, I was just saying, because unfortunately, government doesn't.
But I think they have a judge standing behind them, don't they, Rudy?
If we can't, I obviously won't be able to look into that for you.
Bob, it might be a level of precision. I mean, I know I've been given summary information about who's hitting various NHGRI websites, and so forth. I don't know the level of detail. I mean, we could try to find out. But I don't know what they were saying they couldn't find.
They said they could tell me if it was a .edu or a .org or .com or something like that.
Yeah, or maybe what country.
Nothing more specific than that.
Yeah, I don't think they could have. Maybe I misunderstood.
No, that's probably right. I probably can't tell too specifically what country it's from and things like that. So I don't know the level of resolution we can find out. But maybe if we can find out .edu, that would at least point us to some academic institutions, which might give us a preliminary indication. I think Tony, and then Russ.
How does this hook into it? Let's say you found a CNV in a gene and you're looking at a browser over that gene. How can you then get access to all the SNPs that have association data, back through to your catalog? Because that's the kind of information you want to get when you get an entry point into a gene. What other associations are there?
So the specific example you gave, of a CNV, we don't currently track in any comprehensive way in our catalog. We start with SNPs, basically, so rs numbers, which you can browse directly or download directly from our catalog. The specific example you gave might be a better fit for another resource called PheGenI, the Phenotype-Genotype Integrator. Some of us from NHGRI collaborate with NCBI on that, and that is actually a more comprehensive data portal that includes association data from the GWAS Catalog and dbGaP linked to other resources such as CNVs, genes, and hopefully in the near future ENCODE and other data as well.
Thank you.
Yes, we can certainly add that to a future Council meeting. We'd be happy to update on that. Yes, Russ?
Yeah, this is really great. When you showed us the interactive diagrams, it was like you handed a toy to a kid on Christmas, and all of us are sitting around here playing with it.
What I was hearing from people was all kinds of suggestions, and I'd like to maybe generalize them. Now that you've got this interactive display, you effectively have a really good genome browser for this kind of research. And if there was some way to set up a query page, so we could query on any field in your GWAS Catalog and then display the results, we would have control over what p-value we want, what terms we want. And I think, well, it may not be that many more steps. I mean, I think getting this interactive display was really a major one, and now this display is useful again. So it's great work, and keep it up.
Thanks, I'll pass that on. It wasn't my work. It was largely the work of the EBI folks. They'll be extremely excited to hear that. Thanks. Any other questions? Thank you very much.
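As an editorial illustration of the query page suggested in the discussion, the following is a minimal sketch of filtering catalog-style records on a user-chosen p-value cutoff and a trait term. The record fields, the rs numbers, the regions, and the p-values are all invented for illustration and are not taken from the GWAS Catalog. The default cutoff of 1e-5 reflects the "10 to the minus fifth" bar mentioned in the discussion; the 5e-8 value used in the example is a commonly used convention for genome-wide significance and is an assumption here, not something stated in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    snp: str        # rs number (made-up values below)
    trait: str      # reported trait
    region: str     # cytogenetic region
    p_value: float  # reported association p-value


# Hypothetical records, for illustration only.
RECORDS = [
    Association("rs0000001", "Type 2 diabetes", "1p13.2", 3e-9),
    Association("rs0000002", "Type 2 diabetes", "6p21.1", 4e-6),
    Association("rs0000003", "Breast cancer", "10q26.1", 2e-12),
    Association("rs0000004", "Breast cancer", "2q35", 8e-6),
]


def query(records, max_p=1e-5, trait_contains=None):
    """Keep records with p_value <= max_p and, optionally, a trait substring match."""
    needle = trait_contains.lower() if trait_contains else None
    return [
        r for r in records
        if r.p_value <= max_p and (needle is None or needle in r.trait.lower())
    ]


if __name__ == "__main__":
    # Everything at the 1e-5 bar.
    print([r.snp for r in query(RECORDS)])
    # Only results passing an assumed genome-wide significance cutoff of 5e-8.
    print([r.snp for r in query(RECORDS, max_p=5e-8)])
    # Combine a p-value cutoff with a trait term.
    print([r.snp for r in query(RECORDS, max_p=1e-5, trait_contains="diabetes")])
```

Tightening `max_p` shows the trade-off raised in the discussion: a stricter cutoff keeps fewer, stronger associations.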