Good afternoon, everyone, and welcome to the workshop entitled Future Directions of the NHGRI Analysis, Visualization and Informatics Lab-space, also known as AnVIL. My name is Valentina Di Francesco, and together with Dr. Ken Wiley, I'm one of the co-leads of the AnVIL program. In this brief presentation, I will spend a few words describing AnVIL and the goals for which NHGRI decided to launch this program. We'll go over the workshop goals and the agenda, give some information about the discussions that will happen in the breakout rooms, and say a few words about the workshop participants. Then we'll conclude this presentation with additional information and guidance for participants.
So what is AnVIL? It was established by NHGRI in the fall of 2018 to provide a cloud-based platform for the storage, sharing, and analysis of large genomic and related data sets. In particular, it was established to support the collaborative efforts and data-sharing needs of NHGRI programs and initiatives. An important aspect of AnVIL is that it aims to facilitate training and the development of the genomic research workforce and of genomic data scientists. We have specifically organized today's workshop around these specific aims of the AnVIL program.
Of course, we don't have a lot of time to go into the details of all the operations and activities of the AnVIL program, and that's the reason why we prepared an information booklet for you all to read. It's long, and I apologize for that, but we thought it was important for all of you to have some basic foundational knowledge about all the activities happening in AnVIL. I want to emphasize that this program is only three years old, and in spite of that, NHGRI is very pleased with all the accomplishments and progress made over the past few years. However, we all know that there is still a long way to go in a number of activities. In particular, the reason we want to host this workshop is to hear from the community about potential gaps, challenges, and future opportunities related to NHGRI's investments in AnVIL's cloud-based infrastructure, tools, and services.
I want to pause here for a moment to make sure it's clear that the focus for today is really on the future, not so much on the past. We are building this resource as best we can, and we really want your input and suggestions on potential new key technologies, new ways of training students, and new ways to bring users to this resource. So the focus is on the future, not on the past.
With respect to the agenda: after this presentation, Dr. Vence Bonham, our acting deputy director, will give a presentation on data science at the forefront of enhancing diversity in genomics. Then we'll hear an introduction to AnVIL, with a little more information than I just gave you, from the PIs, Dr. Anthony Philippakis at the Broad and Dr. Michael Schatz at Johns Hopkins. Then we'll go into session one of the workshop, which consists of two breakout rooms, one focused on data submission and consortia engagement and the other focused on analysis tools. Each breakout session lasts about one hour and 15 minutes. The moderators of the breakout rooms will then report to the whole group on the highlights of the discussions, and then we will have a short break. After that, we'll move on to session two, which follows the same breakout-room framework as session one.
I want to emphasize that for both session one and session two, we're not going to spend much time hearing presentations from AnVIL staff; the focus is really on discussion. We want to hear your thoughts and your suggestions, and that's the reason why we set aside 45 minutes for that. Then the moderators will report back from session two, and finally we will close with my colleague Ken Wiley, who will summarize what we heard. We hope you will help us prioritize the areas we should focus on over the next few years, both in the short term and in the long term.
Back to the breakout rooms. We decided it would be useful to set a general framework for focusing the discussions, so we chose the framework of SWOT analysis. SWOT is an acronym for strengths, weaknesses, opportunities, and threats. In the context of this workshop, what we would like to hear from you is: What does AnVIL do well? Where is AnVIL at a disadvantage? Where can AnVIL grow and improve? And which factors jeopardize AnVIL? We hope that providing this framework will help us focus the discussion a little more. As I said, we have only 45 minutes, and it's important to be able to touch on all of these points.
I also want to emphasize that there are some cross-cutting themes, that is, themes relevant to all four breakout rooms. If possible, we would also like to hear your thoughts on how the use of the cloud platform could support, or be an impediment to, some of AnVIL's goals; what types of tools and services would improve the services provided for clinical genomics (and please, everyone, be on mute); and how we can do better at establishing interoperability with other genomic resources, both those at the NIH and those funded not by the NIH but by other entities and agencies.
Who is here? First of all, this is an invitation-only meeting. We have invited about 50 people. In particular, because we wanted to ensure that people were able to lead discussions, we provided, as I said, the information booklet with general background information about AnVIL and its operations. We are following a model similar to the one we use for our council meetings, where we assign discussants to grants, groups of grants, or programs. We did the same here, assigning discussants to the various breakout rooms, and I hope you found your name on page five to see which breakout room you were assigned to. The people here also include the key investigators who support the activities, operations, and services of the AnVIL platform. These are investigators from the Broad and Johns Hopkins as well as the partner institutions. Also here are members of the NHGRI AnVIL staff, as well as other NHGRI staff, who will be here just as listeners. It takes a village to support this project: every single person whose picture you see here is an important contributor to the activities of AnVIL, and I want to thank them all for the amount of work they have put into supporting this program.
This is my final slide, with general information and guidance for participants. We are recording this whole workshop, including the discussions in the breakout rooms, and soon after the workshop ends we will make all the materials, as well as the recordings, publicly available on our genome.gov website.
Throughout the entire workshop, please use the raise-hand feature in Zoom for questions and comments. If you're not speaking, please mute yourself. As I told you, we're going to have breakout rooms, and participants will be automatically redirected to them based either on whether you were assigned as a discussant for that room or on the choice you made during registration. We have also set up a publicly accessible Google Drive folder for the AnVIL workshop. We will post the link in the chat every once in a while so that it's easy to find; if you don't remember it, you should be able to access it through the chat. It's organized very simply: there is a root folder, and then there are subfolders for the individual breakout rooms. If you are lost and do not know what's happening with Zoom, please feel free to get in touch with our Zoom support contact, ideally directly in Zoom, and he will be able to help you. With that, I think there is no time for questions, so I will immediately introduce our acting deputy director, Vence Bonham.
Thank you, and thank you for the invitation to participate in this meeting. Next slide. I just want to make a few comments this afternoon, or this morning, depending on where you are in the country. I want to thank the planning team for inviting me and for this opportunity to share with everyone on this Zoom workshop some of the things that I believe are important for the AnVIL program, what is happening at NHGRI, and why I think the AnVIL program can play such an important role in these efforts. Next slide. Next slide, Tina. Okay, great. I know all of you are familiar with the strategic vision that was published in October of 2020.
It lays out four areas describing how NHGRI sees the future of the field and the work that we as an institute are doing to help lead the field: guiding principles, a robust foundation for genomics, breaking down barriers, and compelling genomics research projects. This comes with the recognition that genomics is now integrated across all of biomedical research and all NIH institutes, centers, and offices. What should we at NHGRI be doing to push the field forward? I think the areas you have identified for this afternoon's discussion are very important to the field from the perspective of AnVIL's work. Next slide.
I want to highlight three of the guiding principles in the vision document, and I encourage you to go back and review the vision document for areas that are important to your own work and to the whole AnVIL program. The first is to strive for global diversity in all aspects of genomics research, committing to the systematic inclusion of ancestrally diverse and underrepresented individuals in major genomic studies. As a field, we continue to be challenged by the question of how to truly enhance the ancestral diversity of the populations in our data, and with it our ability to learn about human genetic variation, which is important to our science. This continues to be a major area of importance for our institute and for the broader field, and it is an area that we, as a field involved in data science, must recognize. The second is to embrace the interdisciplinary and team-oriented nature of genomics research. I think the AnVIL program, and the way it works, is a perfect model of how to develop interdisciplinary teams that work together collaboratively, and it can be a model for the field. I encourage that of this group.
The final one I want to highlight here is to adhere to the highest expectations and requirements related to open science, responsible data sharing, and reproducibility in genomics research. Using the AnVIL model is clearly a way to do that, and a way we can lead for the field of genomics and for biomedical research. But there's a fourth guiding principle that I want to highlight, and on which I want to spend my couple of minutes with you. Next slide. That is to champion a diverse genomics workforce. As we say in the guiding principles, the promise of genomics cannot be fully achieved without attracting, developing, and retaining a diverse workforce, which includes individuals from groups that are currently underrepresented in the genomics enterprise. This is a major area of focus that NHGRI has taken on: how we as an institute can help enhance the diversity of the entire genomics research workforce, including in genomic data science, and how we think about our steps and strategies around our training programs and new engagement programs to enhance the diversity of the workforce. So as you do your SWOT analysis today, think about how we increase the diversity of data scientists, how we increase the diversity of the individuals using the AnVIL platform, and what strategies we can take in that regard. What are the opportunities for us to enhance the diversity of the people using AnVIL for their research? Next slide. We also talk about these issues in relation to the foundation for genomics, and I want to highlight two items within box two of the vision. One is to ensure that the next generation of genomic scientists is sufficiently trained in data science.
From my perspective, that is the work of AnVIL: providing a user-friendly platform that a truly diverse group of trainees and scientists across different areas of biomedical research can use for their research questions. So I believe the AnVIL program has an important role to play in training. Next slide. Also within the foundation for genomics is fostering a diverse genomics workforce. I challenge you to think about where the opportunities are for the AnVIL program to truly help facilitate and support the enhancement of a diverse genomics workforce. Next slide. NHGRI has established an action agenda, and you can see the website here. The action agenda lays out four goals that we've identified as important for our work. The first is to develop and support initiatives that provide exposure and access to careers in genomics. The second is to develop and support training programs and networks that connect undergraduate and graduate education to careers in genomics; can we think about how AnVIL can help support goal two? The third is to develop and support training, career development, and research transition programs that lead to independent research and clinical careers in genomics. This is an area where I think this group, and the AnVIL network, can play an important role with regard to goals two and three. Finally, goal four is to evaluate progress toward achieving greater diversity in the genomics workforce. We're going to evaluate all of our programs, efforts, and activities to determine whether they're making a difference, and modify them as appropriate, so that we truly use our resources, and the programs and initiatives we start, to improve the diversity of the field. Next slide.
I believe AnVIL is a leader here, and I want to applaud all of the individuals who have been involved with the Genomic Data Science Community Network. From everything I've heard about this network, the engagement you now have with academic institutions across the country, particularly minority-serving institutions, including tribal colleges, historically Black colleges and universities, and women's colleges, and with their faculty members and students around data science and the AnVIL program, is a model. I hope this group can build on the work already underway in the Genomic Data Science Community Network and examine how, within your own lab, your own department, and your own academic institution, you can continue to develop programs that increase diversity, bring more students from diverse backgrounds into data science, and bring more faculty into work using the AnVIL platform. With that, I just want to say thank you. Next slide. I look forward to listening in on part of the workshop today, and I hope you have a really successful workshop, and that you raise the question of how you can broaden the tent, both in terms of who uses the platform and who you train with AnVIL. So with that, thank you.
Thank you, Vence. This was excellent. Unfortunately, we don't have time for questions, but if you have questions, please put them in the Q&A or in the chat and we will address them. Thank you.
Great, thank you. The next speakers are Dr. Anthony Philippakis from the Broad Institute and Dr. Michael Schatz from Johns Hopkins. They are both PIs of the AnVIL program. Anthony, Mike, who is starting?
I'll go ahead and share my screen. You should be able to. Great. I'm not quite sure what's going on with my video, but now is not the moment to try to fix it, so I apologize for that.
Thanks so much, everyone, for the opportunity to speak today. Next slide. As Valentina mentioned, my name is Anthony Philippakis, and I'm from the Broad Institute. I'm the Chief Data Officer there; I trained originally as a cardiologist and now oversee many different software efforts at the Broad. Mike?
Hi everyone, my name is Michael Schatz. I'm a professor of computer science and biology at Johns Hopkins, and it's a great privilege and pleasure to be here today to talk about AnVIL.
I just wanted to say that Mike and I had never worked together before AnVIL, but I can't express what a pleasure it has been to partner with him. I think we both feel that we will be friends and collaborators for life. Next slide. Moreover, we are very lucky to have an incredible team on both sides of AnVIL, Hopkins and the Broad; more than 12 different institutions make up the AnVIL leadership team. It's been great to see the groups come together over the last three years and work collaboratively to build a real community. To organize our efforts, we've created a series of working groups that cover important areas of focus for AnVIL, such as data access, data processing, outreach, and engagement. If you're interested in getting more involved in these working groups going forward, we would love that. Many of the members are from outside the AnVIL awardee community, and we really want greater involvement from the overall NHGRI community. Next slide. To motivate what AnVIL is all about: we see an incredible moment in time, with a need for new cloud services to help researchers store, share, and analyze genomic and clinical data at very large scale. It boils down to a few key needs. The first is cloud-based infrastructure, which provides a whole new suite of capabilities that I'll go into in greater detail on the next slide.
Moreover, the importance of data access and security cannot be overstated. The data we handle in AnVIL often come from human subjects and carry substantial regulatory and compliance requirements, so ensuring that we can protect these data is another key area of focus. And finally, the cloud offers many opportunities for collaboration and for new types of analyses that would be much more difficult in an on-prem environment such as your university data center. Next slide. One of the things I think is really important in motivating AnVIL is recognizing that we're in the midst of a one-time sea change in how genomic data are stored, shared, and analyzed. Genomics has always been a leader within the life sciences in the ethos of data sharing. A lot of this goes back to the Human Genome Project, where there was a public effort and a private effort; on the public side, sharing the data openly became a rallying cry, and that persisted in subsequent generations. But while we've been very good about the ethos of data sharing, the way we've operationalized it has sometimes been unsatisfactory. Our current model is to put data onto servers and tell researchers to download them to their local environments, and that presents four problems. First, in order to share data, you have to copy data, which means that NHGRI and other funding bodies have to pay for storing a copy of every data set at every research institute. Second, once data are downloaded, it becomes very hard to audit who has touched them and for what research purpose; given the sensitivity of medical data sets like eMERGE, this is a big problem. Third, there's a real need for a common pool of resources.
Genomics is at risk of becoming a sport of kings, where even to begin working with genomic and clinical data you have to be at a large, well-resourced institution with a security and compliance team and lots of system administrators. But I think we all realize that there are many great researchers outside these traditional environments who could make discoveries if only they had access to the data. And finally, genomics has a great need for elasticity. By bringing researchers to the data instead of bringing data to the researchers, we can address these four limitations. Next slide. What we seek to do with AnVIL is provide a suite of capabilities to our users. First, we provide data that have been processed, QC'd, staged, and made easy to use. Second, we provide computing capabilities that let you deploy large-scale analytical workflows and interactive analyses with ease. And finally, we do a lot of work to educate users on how to use cloud services, data, and tools, and to enable new models of collaboration. Next slide. As I said before, a key component of AnVIL is a secure and compliant environment. For those of you who are new to this, a key concept in the cloud is the security perimeter: you can say that this set of services is within our boundary, and we are able to attest to its security and compliance. For many applications, trying to meet these regulatory requirements on their own would be very cumbersome and very expensive. Becoming FISMA Moderate requires several hundred thousand dollars and a third-party audit, and not every application can afford that.
So by building AnVIL as an ecosystem in which different applications, like Bioconductor or Galaxy, can be deployed within this secure environment, we save them the challenge of re-creating a secure environment over and over, and instead take that on for them. Next slide. A few things by the numbers: AnVIL now has nearly four petabytes of data, a large number of users, and a great array of workflows used by many different researchers, and it has the beginnings of a real community engaging via social media and other tools. Next slide. As you'll see, most of the large NHGRI consortia have now been onboarded and ingested into AnVIL, and many future consortia will be engaged going forward. The early wins focused on human genetics: the CCDG, CMG, and GTEx consortia. More and more, we're taking on clinical use cases such as CSER and the next phase of eMERGE, which is quite exciting, as well as functional genomics efforts like Telomere-to-Telomere and, soon, IGVF. Next slide. With that, I'll turn it over to Mike for the second half of the introduction.
Thank you, Anthony, and again, thank you all for being here today. There's a lot of really impactful work going on within AnVIL itself, and another dimension of that work is our interoperability efforts. NIH has established the NIH Cloud Platform Interoperability effort, or NCPI, and AnVIL is one of its founding members and, in my opinion, is leading the way. In addition to the data sets you just heard about, about four petabytes within AnVIL, we've established a number of key technologies so that AnVIL can interoperate with other NIH cloud platforms. Collectively, we have about 11 petabytes of data across all these platforms. To put this in perspective, the entire SRA is currently about 40 petabytes.
So we're already about a quarter of that size, in just two years, and growing rapidly. In addition to the data and the analysis, a major thrust of AnVIL has been to interact with users, inform them, bring them on, and support their research needs. Over the next couple of weeks we have a few events: next week at Genome Informatics, then ASHG, and some workshops beyond that. We've partnered with individual research labs to get them onboarded onto the platform through the Cloud Credits Program. We've also been working with a number of major consortia, like CCDG and CMG, and new consortia like PRIMED that are coming on. We've also given classes and outreach at different institutions: we took part in a data science program organized by Howard University, and, as you just heard from Vence, we've been organizers of the Genomic Data Science Community Network. We're making a focused effort to reach out to communities that have traditionally been underserved by this type of research. In addition to all these activities, we're starting to see some really impactful science coming out of AnVIL. A manuscript describing AnVIL and its technical components is currently under review, and we look forward to its publication later this year or early next. We're also seeing tremendous science taking place on top of AnVIL. An early scientific success used AnVIL and Terra to analyze the spread of COVID-19 around Boston, surveying the relationships among the viral genomes being transmitted in the area. Beyond that, we've had successes at the basic science level: through work with the Telomere-to-Telomere Consortium, AnVIL was the platform for analyzing 3,202 genomes in a global analysis of human diversity using the new reference genome that is now available.
Similar efforts are supported through the Human Pangenome Project, which is using AnVIL as its platform to develop new reference genomes that catalog genetic diversity worldwide. On the clinical side, we've had great successes with CCDG, which has already ingested more than 100,000 human genomes and where joint calling is underway right now. There are very large-scale efforts in CMG, eMERGE, and other programs to bring in these types of data. To support these analyses, we've developed and launched a number of major analysis tools. On the top right is a screenshot from a major tool called seqr, which lets you study transmission across families to support Mendelian genomics. At the bottom is a polygenic risk score report that can be generated inside AnVIL, in this case for coronary artery disease. Our vision over the next several years is to grow AnVIL in many different ways. Number one is growing the data that are there, but more than just having static collections of disorganized data, we want to lower the barrier to discovery by having the data harmonized and presented in a way that makes them very easily accessible, so that we can make tremendous discoveries with them. In terms of activities and data sets, we want to support as many consortia as possible; our vision for the not-too-distant future is to support all consortia across the NIH and beyond. Related to that, as you heard, many of these data are controlled access. Historically, that's been a barrier to research, because there's an extended process in which you have to apply for access and have the application reviewed in many different ways. We still want to have that review process, of course, but we want to streamline it and make it as efficient as possible so people can access those critical data sets.
On the computational side, we'll definitely be increasing the number of tools and the number of analysis types that are possible. Part of this is supporting a diverse set of users, so it'll be interactive computing, batch computing, and visual analytics. We want to support everyone from people who are very comfortable at the command line to those who just want to see higher-level summaries and the big picture. A major focus moving forward, now that we have the basic infrastructure and tools in place, is to be more predictive and forward-looking in the types of analysis we do, and we expect a big thrust into machine learning capabilities in the coming years. Finally, in terms of user activity, we obviously want to increase the number of users and the number of consortia, and furthermore, we want to empower them to do analyses on their own. We've had a number of early wins showing off some of the capabilities that are possible inside AnVIL, and we want to see more and more of that, so that AnVIL can serve as the platform for cutting-edge biomedical research for all. There's a great opportunity to democratize this: it doesn't need to be just the big institutions, the Broads and the Hopkinses. It can be anyone, anywhere, taking advantage of this platform to make discoveries. Just to offer one final thought: I think we're at a tremendous moment in time. I'm old enough to remember the era before the internet, when I was dialing up on my modem to individual servers and individual computers. It was very inefficient and very clumsy, and data were siloed. It was a mess.
I think we have a tremendous opportunity to reverse that model and have things harmonized and interoperable, but we have to get it right. If we're successful, we will catalyze the creation of an open, federated data ecosystem, the internet itself being a premier example of how successful that can be. If we fail, though, there is a real risk of degenerating into a collection of monolithic data silos, which we obviously want to avoid. So I'm really excited about the successes we've had in AnVIL in just a few years, and I'm delighted to spend the rest of this afternoon talking with you about how we can make it even better in the future. Thank you, everyone. Now we're going to shift into our deeper dives through the breakout sessions, but hopefully that gave you a little bit of context. Valentina, can you take it over from here?
Yes, I'm taking it over. As mentioned earlier, we're now going into the breakout rooms. You will be automatically reassigned to your breakout room, either as a discussant or because you chose that particular breakout room. So stay tuned, and hopefully this will work quickly and smoothly. Okay, thanks. Stay on.