Okay, we have a couple of working group reports to present to the council. There are four working groups of the council, and one of the requirements of them is to give a report to the full council at least once a year. You're going to hear from two of them today. The first will be the data science working group. Trey Ideker is a member of that working group and a council member, and he's going to give the report. Trey?
Great. Thanks, Rudy. So it's my pleasure as a member of that working group to represent it and its activities for the past year and a half since it was formed. First of all, I don't think it will come as a surprise to anyone in this room or in the audience that data sciences are integral to genomics. What's maybe more surprising, to me at least, is exactly how many points of intersection there are. What helps this argument is, of course, that the human genome itself is perhaps the most fundamental form of biological information, or data. And we, of course, have a constellation of databases hosted by NHGRI to host that reference, as well as all the common and rare variants that have been found as we look at populations of genomes. And insofar as we're interested in linking those to disease, that invokes a constellation of clinical databases. I'd also point out that when you talk about G as the genotype and P as the phenotype, we also have these exquisitely complex arrays of databases to hold molecular phenotypes. That's all of the other omes, in my mind. There are more than that, of course, but certainly when you're talking about epigenome, transcriptome, proteome, metabolome, and so on, you're talking about those molecular phenotypes and all of their data repositories.
But to me perhaps the most remarkable point to make about data sciences being integral to genomics is this: if you think about the classical, one would say quintessential, effort of genetics over the years to link G to P, everything in that box is, to a first order, data sciences. Whether you're talking about the statistical models of gene association, the calculation of genetic risk scores, or the functional knowledge bases for annotating the gene functions you find in the genome as a way of translating those to phenotypes, much if not all of that is really a core data sciences activity. And undergirding all of this are, of course, issues of fair and equitable data access while maintaining patient privacy, orthogonal or complementary issues of data visualization, and so on. So data sciences are important, and acknowledging that importance, NHGRI about a year and a half, almost two years ago now, instantiated what is I guess the newest of these council subcommittees, called the data sciences working group or the genomic data science working group. We have met just about every two months by WebEx since spring of 2017. Let me first talk about the general functions of that working group as we've seen them. Generally we meet on the off cycle of council, so when council is not in session and issues related to data sciences arise, it's our purview to provide advice on those issues on an ad hoc basis. But I think far beyond filling holes in time is the second bullet point: we're really facilitating a deeper engagement, beyond the day and a half of these council meetings, of NHGRI with outside expertise on data sciences. In that respect we interface nicely with an internal set of NHGRI staff led by Valentina Di Francesco, which is called the data sciences focus group. So that's the internal body and we're the external advisory committee, is one way to think about it.
Now of course data sciences are critical to NHGRI, but they're also critical to many if not all of the other NIH institutes, and so we try to provide advice when necessary on how particular issues at NHGRI are complementary or synergistic with issues at other institutes. In thinking about all of this, our mandate is pretty broad, and we've sort of defined three broad areas where we can provide advice. One is data management and storage. Two is data usage, analysis, visualization and so on. And three is data policy. These issues of data management, usage and policy have different considerations depending on whether you're talking about basic science or clinical practice, aka genomic medicine. Who are we? Who is the working group at present? We are currently nine outside experts. Some of us, Trey and Mark, are current members of council seated around the table. There are also former members of council and some previous ad hoc members. It's been a particular pleasure to work with Nancy Cox over the past few meetings of the working group. Also on those WebEx calls are NHGRI representatives: I already mentioned Valentina, and of course Eric, Carolyn and Allison. So all of us are convening every two months on those WebEx calls. Okay, so what are some specific topics that we covered over the past year and a half, just about two years? The first issue we tackled was to deal with, and maybe hone down, some recommendations from the last NHGRI Informatics Workshop. That last Informatics Workshop was now two years ago, in September 2016, although we started discussing it just a few months after that.
And one of the pervasive suggestions we got from a number of outside opinions at that workshop was that, with relation to algorithm and methods development, we needed to support more of it, and it should be investigator-initiated, because you wanted to let many flowers bloom as opposed to being more prescriptive about things. After some discussion, what that led NHGRI to do, as you heard from Eric's talk this morning, is release three different program announcements, which were investigator-initiated announcements, if you understand the PAR designation. Those are the R21 mechanism, the R01 mechanism and the SBIR mechanism, all PARs that really started with that Informatics Workshop, filtered through our working group, and were then announced by Eric this morning. The second point that we dealt with was the formation and eventual award, or creation, of NHGRI's genomic cloud platform, which they call AnVIL, which stands for Analysis, Visualization and Informatics Lab-space. There's been a lot of discussion about the cloud and the reasons for putting genomes and other data in the cloud. One point that is often made is that things always boil down to just the number of bits: the number of bits you need to represent the genomic data is far in excess of the number of bits you need to represent tools. And so the idea, or at least one of the main reasons to have a cloud, is that you ship the bits representing your tools to the much larger buckets of bits representing your data, as opposed to what most people do nowadays, which is to download massive data to your laptop and run your bespoke tools on those data. That's just gotten started, and we'll see how it develops over the next year.
Moving right along, we also had a lot of discussion about the importance of model organism databases in human genome interpretation. Owing primarily to the huge benefit of evolution in separating and distinguishing gene function, NHGRI has long supported the so-called MODs, the model organism databases. One challenge that we helped NHGRI deal with was that, as the MODs had evolved, there was of course a lot of commonality between these various databases, but they were independent awards. So there was a desire to seek more synergy as those different databases developed in the future, and that led to the creation of this additional mechanism, or award, I should say, called the Alliance of Genome Resources, or the AGR. This takes the specific activities that have been identified as most common to the MODs, to the Mouse Genome Database, the Saccharomyces Genome Database, and so on and so forth, and puts them under one roof, so to speak. You heard from Eric already this morning about some activities related to implementation of the genomic data sharing policy, and we certainly advised on that. We also provided advice, in this third bullet here, on the Trans-NIH Strategic Plan for Data Science, led by Jon Lorsch from NIGMS. One of the interesting tenets of this Trans-NIH Strategic Plan is that it sort of bins data sciences into three categories, speaking broadly. They define databases, they define knowledge bases, and they define analysis tools. When you think about how those three categories intersect NHGRI's programs, in many cases it's very clear what's a knowledge base, what's a database, and what's a tool, and in some cases it's gray. And so we've, I think, been of some utility to NHGRI in advising in specific cases how to bin these things and how to think about them vis-à-vis this Trans-NIH Strategic Plan for Data Science. Okay, so that's a report on what we largely did for the past year, year and a half.
Let me now turn to the next year to 18 months of activities as we see them. Of course, we'd love to hear feedback on what you think we should be thinking about. The first cluster of topics that we've planned to discuss on our calls, coming to a WebEx near you soon, are all related to helping NHGRI in the strategic planning process called Vision 2020, which you've heard a lot about already. The first issue really relates to what data science means to NHGRI. Can we help them better define that, especially vis-à-vis the different interests running across NIH institutes? What is data science in our space here, and what is it not? And what are the areas of overlap where this institute can partner with other institutes in supporting data science? Those would be the three large categories to place data science aspects in. The hope here is that by the end of this year, if not sooner, our committee will have had some recommendations emerge for what the kernel of data science means to NHGRI: what is the part that we here own most exclusively, and what parts are certainly core interests but can be done in partnership with other institutes. That would be very useful. The other two cross-cutting issues on the slide, bullet points two and three, are actually not between NIH institutes; they're within NHGRI itself. The first of those, this second bullet point here, relates to a fact that may or may not be known to some of you, which is that if you look at the way NHGRI currently supports informatics, there are different programs, or different streams, as they're called. There is a standalone informatics program, a funding stream that does support a lot of informatics, like the MODs and like AnVIL, I believe.
But there's also a lot of informatics support, and support for data sciences, coming out of other funding streams and programs. Certainly ENCODE, the Centers for Common Disease Genomics, and many other centers and programs have their own columns of data sciences: data tools they're developing, in some cases databases they're developing, and those are currently separate. Now, there are lots of reasons to argue that might be the most optimal solution for supporting informatics. You can also think that it might not be the most optimal solution for supporting data sciences, and that's what we might discuss in the coming 12 months. The final point here about the cross-cutting nature of data sciences is between the council working groups themselves. We're the data sciences working group; you're going to hear from at least one other working group just after me, I believe. And there is also, of course, the genomic medicine working group. If you think about what you would typically assign in your mind to data sciences, that would be things like data storage and analysis, which, in the usual serial train of events, follow things like data generation. But I think many of us have long appreciated the dangers of thinking about things sequentially like that, from generation of data to its storage and analysis. The best statement of that point, of course, is this famous quote, which I think is not surprising to many of you, made 80 years ago now by Ronald Fisher: "To consult the statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of." So we need to make sure that our working group, moving forward, is not simply downstream of the recommendations of other working groups in terms of what data are being recommended, but that there's some interaction.
And so that will probably result in a joint WebEx or some interaction between working groups; at least that is our proposal. Final points, and particular areas of focus where we think we can make an impact in the next year, and again, we'd love to hear your thoughts. I think it's not lost upon any of us that outside of genomics, in data sciences at large, there has been a tremendous impact of artificial intelligence, and deep learning in particular, in many, many different domains, not least of which are things like game playing. The Go champion has been beaten, vanquished, and self-driving cars are on the horizon. If you type "cat" into Google Images, you get 1,000 cats. And that's impressive, it turns out. So to what degree can we leverage these really huge strides in machine learning for the G2P problem, and actually many aspects of genome sciences? That's happening, I don't want to suggest it's not, but we think there are maybe lots of opportunities from a programmatic point of view for encouraging developments at that intersection. There's also been a lot of interest in discussing ways we can further foster education and training opportunities. Again, speaking of machine learning, many of our best data scientists in the US are currently not going to academic labs working on genome research. They're going to, yes, you guessed it, Google, IBM, Amazon, and so on and so forth. Is that the way it should be, or is there a better way, and is there something we can do to really harness that power and those particular sets of expertise in our discipline here? And then finally, a last point that we've seen on the horizon is that we would probably like to recommend another data science workshop following that one in fall of 2016. But given that we just had that one in 2016, we thought there might be more specific focus areas related to data sciences that one might invest effort in.
And here are just a couple of suggestions, but these really are just ideas to kind of state the intention. One might contemplate a workshop on visualization methods. It could even be a focused number of individuals brought together for that purpose. Community engagement in terms of data sharing and privacy, enabling specific groups with data and tools, would be another interesting workshop. But we're open to questions and suggestions. So with that, I will stop and turn the floor back over. Thank you.
OK, thank you, Trey. Questions?
I'll start off. I just want to make a couple of comments. First of all, go back a couple of slides to the quote by Fisher. If you had a beard, I swear that was a picture of you.
I am so honored, Eric. I just wanted to let you know. I can't possibly live that up or down, but thank you. Thank you. Circular glasses, that's all you would need. I just said Eric's glasses. Oh, yeah, that's right. I agree.
So you can take this off now, and I'll not embarrass you. But I want to make two comments. One is a clarification: you referred to this internal group, this focus group. I just wanted to emphasize that it's affiliated with the strategic planning process. It's not a long-term internal group, but it's important because it ties into what you were talking about.
In the last few bullet points.
Exactly. There is this apparatus that we have, an internal group that's part of the strategic planning process, to carry along the data science programmatic discussions, and it would be the one to help organize that workshop and so forth. But the other point I wanted to make, and it was one of your bullets, you don't have to find the slide, is that I really want to emphasize, as you pointed out, how this working group is helping me and helping the Institute find our place within the NIH community.
And I really want to emphasize, and Council has heard me talk about this multiple times, so I just want to underline the point here, that the entire NIH data science ecosystem is in a state of flux. I don't mean that to sound negative. For all these reasons there's a lot of catching up to do, and there's just so much bubbling action going on around data science across the NIH. Lots of things are happening, including at the leadership level and also in terms of initiatives, the Common Fund and all sorts of things, even in artificial intelligence, where in the Director's Report I talked about this major workshop that the NIH director just hosted and how he's setting up a new working group of his advisory group on artificial intelligence. So there's a lot going on. And of course, our institute is relevant, if not more relevant. I mean, we're right in the crosshairs of this because of genomic data. Many of us, myself and multiple people at the institute, get called into trans-NIH committees to help move this along. So it's been a very helpful working group for me, just to get input to help us, because a lot of times we're being asked to give input. To have this group to go to every couple of months, to tell them the latest developments and get their input, is extremely valuable. And part of it is that we are going to have to define our role. I actually think NIH is going to get much more situated in data science and in all the different areas we're talking about. And then I think it will be important for NHGRI to define what we need to do, because we can't do everything, and much more is needed than what we could provide. But I think this is going to be a very rapidly evolving area as NIH gets situated, and this group is helping us define what we should be doing across that landscape. So I just wanted to really emphasize that point.
I don't know if it's for your committee or for the workshop, but I think it would be beneficial to have a glossary of data science terms: informatics, bioinformatics, computational biology, AI, machine learning. Because I think there's a lot of confusion; they're sometimes used as synonyms, and sometimes to mean different things.
Well, I've tried very hard not to define data science as part of my talk. That's the goal for the next 12 months. But great idea.
Jeff.
Yeah, thanks, Trey. I think there's a lot of excitement about the AI concepts, and I know there are a lot of activities across the NIH. I don't know whether folks are tuning in yet to the workforce issues there. Across society there's a lot of discussion about who's going to be losing jobs, or what's going to be happening with self-driving cars and such. But I think it probably ought to be on somebody's plate at the NIH to really be thinking about how it's going to impact how science is conducted, in terms of the workforce issues.
Trey, there's one quick question. In your last slide, I think, or close to the last slide, there it is, you talked about visualization. You sort of brushed past that, and it wasn't clear what you had in mind relative to exploring that area.
Yeah, and maybe Valentina can comment here as well, but this came out of, among other places, the workshop that just recently happened in January, which was, and I'm going to butcher the name of that workshop, the variant interpretation workshop. Am I getting that wrong? Sorry?
Genome to phenotype.
So some of us just attended a January workshop here where, surprisingly, there was a huge discussion that happened organically around data visualization as a very separate, but almost equally important, issue to data analysis. And so I think we'll try to debrief from that set of recommendations, and that's going to get written up as well. Yeah, anything else to add?
Really, the idea came from this workshop that we had like three weeks ago. So we're going to explore what, if anything, we need to do to focus discussions on this particular topic. Happy to update you when we're ready.
Aviv.
On the two bullet points, both leveraging advances and education and training, I want to put in a pitch for jamborees: events that are relatively low cost, where you bring people together, give them a data set and a set of challenge problems, and ask them to start solving them together over a period of two to three days, and then they can go home and continue and so on. We've seen that, both in institutional settings and in the HCA, being a huge energizing force. It draws people into these problems, it lets them then sustain their relationships, and it also helps you understand what the good problems are to really work on as a community right now. And for the generation of the data sets, NHGRI can really energize its base of grantees, who usually would be quite interested in engaging with all these AI slash machine learning folks, because sometimes they have a hard time getting to the next level in the analytics.
So you're suggesting that some of the data already being generated under some program could be nominated for such a jamboree, basically?
Yeah, and the program might not be generating precisely those, but it would be quite willing to do that. I'm thinking, you know, ENCODE and SEXSES and so on, where you have some room for maneuver, not necessarily humongous room for maneuver, but some, and where you have a big engine for data production.
So a relatively small investment is one way in which this can be done.
Yeah, these are really low-cost investments and they give very big benefits.
We've had discussions about that with other parts of NIH. So I think that's a worthwhile thing to do.
And I have zero vested interest in it.
It's not something to get a grant from or something like that; it's just an observation.
Okay, thank you very much, Trey. Let's do one more. We still have a couple of hours to go.