All right, okay. Why don't we start, because we don't have a lot of time. You already heard from me: I'm Valentina DiFrancisco, and I'm going to be one of the moderators for this session, together with Dr. Adam Reisnick. Adam?

Hi, I'm thrilled to be here. I'm at the hospital in Philadelphia and have the privilege of serving on the external council in ways that have been really fruitful, and I'm looking forward to this discussion.

So I just have two or three slides to lay down some ground rules for these discussions. Okay, do you see my slides? Great. Let me see if I manage to project the whole screen. All right. To summarize what's going to happen during the breakout session: we'll go through this presentation, two or three slides, and then we'll hear a presentation from Brian O'Connor at the Broad and Fred Tan at the Carnegie Institution. That will be followed by up to 45 minutes for discussion. Adam will moderate the discussion while I take notes for the SWOT analysis, and those notes will be used at the end of the session to prepare the breakout report that Adam will present.

Some general guidelines for the discussion. Participants are going to be muted, and as you heard, some people here have been selected as discussants. We're going to ask you to speak up and express your opinions and thoughts in the context of strengths, weaknesses, opportunities, and threats. For the other participants, please wait until there is time for you to share comments and opinions. You can also use the chat, but I'd actually prefer that you use the document that is available on the shared Google Drive. Please navigate to the relevant subfolder for our breakout room session and feel free to add your comments there.
As I said, we're going to collect everything and address it when we can. If you use that document, please write your name before you share your comments. The other thing is, again, we unfortunately don't have a lot of time, so if you speak up, please be candid and be polite — the usual recommendations — and make sure that you give time for other people to express their thoughts. And if you have any concerns whatsoever about what is happening during the discussion, feel free to reach out to me or to other NHGRI staff and we'll try to help.

Okay, so for this particular breakout room, the names of the discussants are here. And the last thing: because we have only 45 minutes and four topics to go through — strengths, weaknesses, opportunities, and threats — that's about 10 minutes each. We will try to keep everybody on time, and when the 10 minutes are up for a particular topic, we'll move on to the next one. Just a reminder that there are some particular themes we would like to learn more about. One is the use of the cloud: what is needed for cloud-based systems to better meet the needs of the genomic research community, and whether there are tools and services that would better support clinical genomics research. The other theme is interoperability: what is needed to improve interoperability across genomic resources in the federated ecosystem. So I think that is basically all I have to say.

Can I ask a quick clarification? For clinical genomics, I think of CLIA, but you're saying research, correct? I just want to make sure I'm clear.

Yeah, it's research. It may go into CLIA, but at this point we're really thinking of research.

Perfect. Thank you.

All right. So, Adam, do you have the names of the discussants handy? It's going to be Brian first, I think. We'll start. Yes. Okay.

Okay. Can folks see my screen? Perfect. All right. Let me try and do slideshow. Hold on. All right.
Hopefully that's continuing to share my screen. Yeah. Okay. Excellent. All right. Thanks for that intro, Valentina. Today, Fred and I are going to talk about data submission and consortium engagement for AnVIL. I'm really excited to talk about this. And of course my slides aren't advancing — all right. I'm going to mainly talk about the submission system, the progress that we've made getting data into AnVIL, and where we're going in the future with this, and then largely hand it over to Fred. I'll touch on consortium engagement with regard to submission of data, and Fred is going to dive into using consortium engagement to prompt analysis of data on the AnVIL platform, as well as overall future directions for engagement.

Starting with data submission, taking a step back: I'm sure everyone on the call is very familiar with the dbGaP model. I just wanted to compare dbGaP with AnVIL a little and clarify how we're actually still leveraging some parts of dbGaP in the AnVIL process. In broad brushstrokes, in dbGaP one goes in and creates a study, goes through a data onboarding process for phenotypic and genotypic data into dbGaP and into SRA, and goes through an approval and validation process. In AnVIL, it's different. We're not actually submitting data to SRA, for example — we're storing it directly in the cloud environment, directly in AnVIL. We still go through a process of approving which projects are going into AnVIL, and we also register studies within dbGaP as a way of having a centralized authorization location, which is very convenient for us and for other projects. But the phenotypic data is being input into integrated tables within AnVIL workspaces, and genomic files are being directly onboarded in the cloud.
And the really lovely side effect — and you're going to see this as Fred talks about the analysis being done with consortia and other groups — is that this is information that's ready to go for analysis within the AnVIL workspace environment powered by Terra.

Okay, so let's take a look at the lifecycle of data coming into AnVIL. There are four personas, I would say. There are the data collectors, who are aggregating and preparing data for a given consortium. There are the data submitters, who are actually transforming data model information into something that AnVIL understands, preparing the submission — preparing those phenotypic tables and submitting genotypic information, large files, to buckets — and going through a review process. Notice that this can be a cycle: there can be multiple rounds of review before data is handed off to the data ingesters, also known as data wranglers. These are Tandis and Valerie on the Broad team, who work very closely with the data submitters to get the data onboarded into final workspaces in AnVIL and accessible through a release process. And finally, data analysts can use that data through workspaces or create synthetic cohorts for use in their own workspaces within the AnVIL platform.

One thing I want to point out here is that the arrows going back are really important, and we've seen that with CCDG doing a joint calling on, I think, about 140,000 genomes. The idea is that when we perform analysis within AnVIL, that can shuttle information back into a new submission process. So there's a virtuous cycle of being able to use AnVIL for not only data submission, but also data analysis and resubmission.

So what does the submission checklist look like? It starts with obtaining approval for a particular dataset or consortium to upload data to AnVIL, and developing a data model.
We're moving away from the model of every consortium having its own data model to one where we have common elements — you saw that in the discussion and the pre-read on the Terra Interoperability Model. We work with them to prepare the data in the appropriate formats for upload to cloud buckets and to tables in workspaces, and then run the AnVIL data ingest process. The prerequisite steps for consortia bringing their data into AnVIL are the study registration process in dbGaP, to define the authorization information; mapping to the AnVIL model; and ultimately making decisions about how data is parsed into workspaces per consent group.

Okay, so having given the overview of how the flow works, I wanted to take a step back and look at the overall one-year plan for data ingest. What's remarkable here is that we currently have over 20 consortia engaged with AnVIL to bring their data into the platform, which is absolutely amazing. We're seeing continued data submission from established consortia — this is a timeline looking out into the future — but we're also seeing new opportunities with ENCODE data, developmental GTEx data, an NIA dementia long-read dataset, and things like recount3 for bringing in RNA-seq data. So there's a lot of diversity and excitement about new consortia engaging with AnVIL and bringing new data types into the platform. It's lovely to see this posted on our website; it really brings it home in terms of how much data is in the system now. We're getting very close to 300,000 participants loaded into the AnVIL platform. And if you look at the data growth over time, we've gone from essentially one petabyte of data about a year ago to almost four petabytes.
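The data-model mapping step described above — translating a consortium's own field names onto common elements — can be sketched in a few lines. Everything here (the field names, the `custom:` namespace for unmapped fields) is invented for illustration; it is not AnVIL's or TIM's actual schema.

```python
# Hypothetical sketch of mapping a consortium-specific data model onto a
# shared set of common elements. All field names are invented.

def map_to_common_model(record, field_map):
    """Rename consortium-specific fields to common-model fields,
    carrying unmapped fields through under a 'custom:' namespace."""
    mapped = {}
    for src_field, value in record.items():
        if src_field in field_map:
            mapped[field_map[src_field]] = value
        else:
            mapped[f"custom:{src_field}"] = value
    return mapped

# Example: one consortium calls the subject column 'participant' and
# records sex as 'gender_at_birth'; the common model uses other names.
field_map = {
    "participant": "subject_id",
    "gender_at_birth": "reported_sex",
}

row = {"participant": "S001", "gender_at_birth": "F", "cohort_flag": "1"}
print(map_to_common_model(row, field_map))
# {'subject_id': 'S001', 'reported_sex': 'F', 'custom:cohort_flag': '1'}
```

The point of the namespace is that nothing a consortium submits is silently dropped — unrecognized fields survive the mapping and can be reviewed later.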
And so that's really exciting to see, even though we're still working through the process of reducing the manual steps and adding more automation. It's also wonderful to see how AnVIL's data growth has positively impacted cross-NIH efforts like NCPI, which is looking at making data widely available across systems. You can see AnVIL has had a huge impact over the last year on the amount of data that's accessible NCPI-wide — including systems like BioData Catalyst, the GDC, Kids First, and AnVIL — making approximately 11 petabytes, as of a couple weeks ago, accessible to researchers working in the AnVIL platform. They can work with data across all of these datasets, including the AnVIL data. So it's just really awesome to see that.

Again, we're really trying to focus on automating as much as possible. We've made a huge amount of progress on data ingest, but we want to continue to make that process smoother, more automated, and faster, and so we've recently been working on improvements. We've developed scripts that help us set up submission workspaces for consortia to come in and prepare their submissions. We've provided templates, and we've provided a self-running QC — a submission-checking tool — that allows submitters to do a lot of the work on their own, where previously it was a combination of their work and working with data wranglers to check data and do validations. The scripting we've done has already paid off quite a bit in streamlining the way researchers in these consortia can upload their data and validate it themselves.
In terms of focusing on improvements, part of it is the tooling — the scripting infrastructure I mentioned — but it's also simple things like refining our data submitter instructions on the anvilproject.org website and refining the critical path that researchers and data submitters go through for ingesting data into AnVIL. A big part of the consortium engagement here is getting feedback on this process and bringing those improvements into the documentation. The other thing I will say is that in addition to the scripted improvements and automation we have already rolled out, we're working on a more fundamental submission system improvement over the next year-plus that will bring a lot of benefits, including the ability to automatically map data models from submitters to a common AnVIL data model based on TIM, and also using that common data model for automated validations, instead of having a notebook, like we do right now, that validates a submission. We want that to be schema-driven — driven by the data model — so that as we update our data model, the validation suite can be updated as well.

All right, so I'm going to talk a little bit about consortium engagement from a submission perspective before handing over to Fred to talk about it from the perspective of using the platform. We have four pillars of consortium engagement. The first is building awareness of AnVIL: what the platform does and how consortia can engage with us. The second pillar is recruitment: how do we reach out and recruit new datasets, go through the approval process, and work with the submitters? That leads to the third pillar, submission, and the fourth pillar of then using the submitted data in analysis within working groups and research consortia beyond the submission process. So how do we think about consortium engagement?
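The schema-driven validation idea Brian describes — where the checks come from the data model itself, so updating the model updates the validation suite — might look roughly like this. The schema, field names, and rules below are hypothetical placeholders, not AnVIL's real data model.

```python
# Minimal sketch of schema-driven submission checking: the validation
# rules live in a schema object, so changing the data model changes the
# checks without rewriting validation code. Fields here are illustrative.

SCHEMA = {
    "subject_id": {"required": True, "type": str},
    "age": {"required": False, "type": int},
    "consent_group": {"required": True, "type": str},
}

def validate_row(row, schema=SCHEMA):
    """Return a list of human-readable validation errors for one row."""
    errors = []
    for field, rules in schema.items():
        if field not in row:
            if rules["required"]:
                errors.append(f"missing required field: {field}")
            continue
        if not isinstance(row[field], rules["type"]):
            errors.append(f"wrong type for {field}")
    return errors

print(validate_row({"subject_id": "S001", "age": "forty"}))
# ['wrong type for age', 'missing required field: consent_group']
```

A real version would load the schema from the published data model rather than hard-coding it, which is what makes the suite track model updates automatically.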
We're looking at it from the perspective of developing personas, and we have four core personas: the PI, analysts, teachers, and consortia. Within that last consortia persona, from a submission perspective we've focused on data managers and submitters as the key persona subtypes. If you look at the online documentation that goes with our submission system, we've really tried to use those personas to craft high-quality instructions about dbGaP study creation, how to work with our data model, how to actually do the submission, and how to run QC tools on the submission itself, along with the data QC tooling that we provide.

With that, I just want to say that the personas work, and the feedback we've gotten from projects like CCDG, CMG, GTEx, and 1000 Genomes — which have been a core part of our success story for onboarding data over the last year — has been very instrumental in refining the engagement process, the paths that we give people to onboard data, and the documentation and tooling that goes with that. That's led to things like the Telomere-to-Telomere (T2T) project that Mike was mentioning earlier being able to onboard their reference genome into AnVIL, along with sample data from 1000 Genomes and others. So we've been able to streamline the upload and submission process, and that has ultimately transitioned into how people are able to use — in this case — T2T data in a very lovely featured workspace that shows how you can actually leverage this data uploaded to AnVIL. So with that, I want to transition over to Fred to talk through the consortium-level engagement that leads to analysis opportunities. Fred?

Thank you, Brian. Next slide.
One of the great things about engaging the consortia is that not only are they able to contribute their data, but they're able to use the datasets that already exist. Shown here is just an example: the T2T consortium reanalyzing roughly 3,000 genomes from the 1000 Genomes Project. And one of the benefits of this shared ecosystem is that all of their analysis — the WDLs, the workflows they're developing — is now accessible, reusable, and extendable by anyone else who wants to do a similar kind of analysis. Next slide.

Clinical genomics is something that's on everyone's mind, and some of the consortia we're engaged with include the American Heart Association and eMERGE, along with work on social determinants of health. One of the great things about engaging with AHA is that they've been conducting focus groups with clinicians who have genomics experience, finding out exactly what kind of tools they need. One of the premises Brian got a chance to touch on is that the more a system is used, the better it becomes for everyone involved. Finding out which tools are highest priority led to the incorporation of PharmCAT — coming soon — one of the most requested tools, which helps clinicians interpret variant alleles and suggests clinical dosing guidelines. Next slide.

As for the activities we've been using to increase awareness and help recruit people to the platform, we've been working on these over the past couple of years, now that the system is coming online. If you go to the next slide, the first one — which I think some of you even attended — was the NHGRI GSP Magic Jamboree. This happened last summer, with over 100 attendees. It was a two-day virtual hands-on event where people got to do activities using the platform itself.
One of the great things was seeing people's feedback — telling us that they loved what the platform already has available, where it's going, and the kinds of activities we're creating for people to use. Next slide.

Picking up Vince's point about trying to increase the size of the tent: another group we've worked with is RCMI, this past spring, to bring AnVIL and cloud genomic data science to broader audiences — specifically, institutions that have fewer resources than others. Next slide.

And then finally, the Genomic Data Science Community Network, where we're intentionally reaching out to institutions across the country: HBCUs, tribal colleges, minority-serving institutions, and community colleges. One of the great things about this activity is building up the network of researchers, creating a white paper that speaks to the exact needs of these institutions, and developing curricula so that we can start incorporating AnVIL into the curriculum — so we can get people early in their training to start building the skills they need to take advantage of these great resources. Next slide.

Two slides here. One is that all the consortia are now invited to the four working groups that have stars next to them. For increased opportunity, increased questions, increased awareness — they're able to come here and ask us questions, and I think this is one of the great ways to help people really understand what the platform is. The last slide: our vision for future engagement is to continue this marketing funnel, this community acquisition funnel — awareness, evaluation, intent, conversion, and ultimately loyalty — so that the community members themselves are able to support each other. That, I think, is going to be our biggest hallmark: when people start interacting with one another, supporting each other, and recruiting each other to the platform.
And so I just want to thank you, on behalf of Brian and myself, for the privilege of being able to work on such a great platform, bringing access to people in this community. Thank you.

Thanks, guys. That was really awesome. It's my privilege to try and engage all of you in a SWOT analysis. And despite the traditional context of SWOT as something implemented for competitive, market-based efforts, this is really a forward-looking strategy. I wanted to start with a framing: in the context of consortia, there are a couple of different ways to frame the strengths and opportunities here. One is, what are these in relationship to the consortia themselves? The other is, what are these in relationship to the consortia's data and the empowerment of its secondary use? The narrative for the latter is oftentimes better developed than the former, and I want to frame that as an opportunity to begin really thinking within this framework.

To structure the discussion, we can start with strengths, which I think Brian already covered to some extent — beginning with, for example, ease of data submission and the capacity to intersect that data with existing datasets. But I'd like to push a little bit on that and get some input from that perspective, particularly as it relates to what we've experienced: the consortium lifecycle as connected to the data lifecycle, and where platforms like AnVIL can come in. So maybe we'll start with the strengths — we have essentially 10 minutes to cover the strengths perspective. I would say AnVIL can replace a lot of data coordination functionality within consortia and make it more economical through reuse of the infrastructure. This is a key opportunity.
Currently, most consortia already have some existing DCC or other coordinating activity, typically separate from a platform setting, with many datasets within AnVIL to date really being provided at the last stage of the data lifecycle — after the consortium's own activities, with the data provided for secondary use and distribution. The Telomere-to-Telomere example is a really great example of the consortium itself leveraging the platform not only for secondary use and distribution, but for what the cloud and the AnVIL platform actually enable as a use case, in ways that typically don't exist within a DCC's framework. I think it's a great example of a strength.

Just to add a little bit of color: as Valentina said at the beginning, AnVIL is about three years old, and when we first started, a large number of consortia were already underway. So it really was a model where we had to focus on getting data that already existed into AnVIL. But what's quite exciting to see is that if you look at a lot of the awards that have been given in the last year — whether it be the new wave of Mendelian awards, or Telomere-to-Telomere, or the PRIMED consortium, or eMERGE IV — those are all ones getting going after AnVIL existed. The model by which we're engaging with them is exactly what you saw in Telomere-to-Telomere, and we very much hope that will be the dominant model going forward. David?

Yeah, Anthony's already talked about the strength of this, but I do think managing the individuals within the datasets based on consent level, downstream, is going to be super helpful. I've been around situations where, once people have downloaded data, all you can do is send an email saying these individuals should not be used in analysis. So I do think post-study management is definitely a strength.

Thank you very much for that, David.
There were a few questions about DUOS during the talk. You'll hear more about that in the second session today, but David alluded to it, so let me clarify a little. One of the things the future of AnVIL holds is that as we ingest these datasets, we collect the data use restrictions and model them using a formal ontology. It's quite nice — there's a paper coming out in Cell Genomics in about a week showing empirically that this works very well. What that means is that right now, if you are a researcher and you say, "I'm studying diabetes; tell me all the samples that I can use as a control," that is weeks of project management time to answer. Where AnVIL is going, it's a simple SQL query. It also simplifies a lot of the dbGaP work of reviewing applications. Switching the whole way that you manage data access is a big lift — that's a big, big challenge — but we're quite far along, and you'll hear more about it today: there's now a large-scale pilot of DUOS by six NIH DACs. If that works well, then we'll start to really scale up and automate a lot of the framework of data use oversight. So you'll hear more about it later today, but I just wanted to clarify because I saw quite a few questions in the chat.

And data use is an interesting thing to bring up, Anthony, because most consortia, at least in our experience, initially have the notion that they are the body of governance — at least until submission for secondary use and discovery. I think that's one of those shifts in opportunity that is arriving: most consortia are quite challenged in the distribution and access of their data pre-release, before becoming datasets for wider secondary use. And that activity is oftentimes slow and challenging for the DCC itself to manage beyond the primary stakeholders of the consortium. Elizabeth?
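The "simple SQL query" point above can be made concrete: once data use limitations are encoded with a formal ontology such as DUO, the control-cohort question becomes a query rather than weeks of manual review. The table layout and term codes below are invented for this sketch, not DUOS's actual schema.

```python
import sqlite3

# Hypothetical table of datasets tagged with data-use ontology terms.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset_use (dataset TEXT, duo_term TEXT)")
conn.executemany(
    "INSERT INTO dataset_use VALUES (?, ?)",
    [
        ("study_A", "GRU"),     # general research use
        ("study_B", "DS-CVD"),  # disease-specific: cardiovascular
        ("study_C", "GRU"),
    ],
)

# "Which datasets can I use as general controls?" becomes one query:
# datasets whose use limitation is general research use.
rows = conn.execute(
    "SELECT dataset FROM dataset_use WHERE duo_term = 'GRU' ORDER BY dataset"
).fetchall()
print([r[0] for r in rows])  # ['study_A', 'study_C']
```

The real win is that the same structured terms that answer researcher queries can also drive automated review of access applications, which is the dbGaP-workload point made above.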
Hi — just in terms of strengths, I can imagine that once you're used to having your data in this environment and you've got your toolkit set up, your pipelines and so on, this is going to make it a lot easier to train students and get them up to speed quickly. Reflecting on the need to diversify our workforce, this makes it so you can have somebody do a rotation in your lab and do something meaningful once it's all pulled together. Right now it's a little bit cobbled together, but I can see that being a real advantage of the space in the future.

Yeah, I love Anthony's Sport of Kings analogy here. This is clearly a main opportunity for these efforts. Again, in the context of consortia: consortia themselves are oftentimes academic, discovery-focused, data-generation efforts more than data-sharing efforts, and the capacity to support the consortium's own scientific initiatives in the context of training, and to widen that out, really is a fantastic resource opportunity — the data generators themselves know the data extremely well and can use it as a training platform within their own stakeholder community.

Sorry, really quick — I don't want to take too much time. The other thing I would comment on is that you all are putting together all these great workshops and so on. That's another way to get people very quickly engaged, even if they're not a trainee who's going to be joining your program. So it will be great to see how that gets integrated.

Just related to this point, one of the things I put in the notes was that the Diversity Action Plan training programs at NHGRI are a really ideal opportunity for mentored experience with AnVIL as part of that whole outreach, diversity, and workforce training effort. I think those could be integrated really nicely.

I was going to change topics slightly — sorry if you were responding directly to Carol. First of all, Carol, I agree.
And I think we've been thinking about some of that integration with our training program — Joe has certainly been very active in that — and that's a really good suggestion. I was going to throw a question out to the group, and I'd be particularly interested in hearing from Liz and Steve Rich. Part of our hope, as a strength of AnVIL, is that by bringing a lot of this data from different programs together, it has the potential to help people connect across consortia, or be more aware of what data is out there. My question to you is: does that seem true? Would you agree that's a strength? Do you think we're not there yet, or do you worry that something I'd be hoping would be happening isn't? I'd ask Liz and Steve Rich first, and then others can also comment.

All right. I guess I'm a little odd: I work on Alzheimer's disease and I work on rare disease. The NIA is not well integrated into this. So for me, this is awesome, because I'd love to connect to things like eMERGE really easily, where you may have people who have dementia — maybe I can't call them Alzheimer's disease, but I can get something that is kind of mimicking replication that way. That would be fantastic, and I think that's something that could be developed in the future. But for now, that's kind of a no-go; they're siloed still. And for the rare disease work, I feel like the phenotype categorization is really tricky, because I can't just say, "give me all the patients who have a rare disease" — maybe somebody's been misdiagnosed, or maybe there's phenotype expansion. So then I want to do things like search on features, which I don't see being very easy right now. And I think in order to build an analysis set that takes data from multiple resources, we really need to have a tool — which I think is being developed.
I actually got a notice of interest this morning on it — some sort of natural language processing or something that says: you've been diagnosed with such-and-such a syndrome, and I know from OMIM that it has these features, so if you're looking for this feature, I'm going to give you those patients as well. Things like that. Steve, let me know what I missed.

I think one of the issues will be that for many of the consortia there are common players — and I'm coming at it from the NHLBI and NIDDK as well as the NHGRI side of things. You basically have the same groups of people and multiple datasets contributing. One of the great advantages that might be available through AnVIL is sorting through that, so that you don't get the same person represented six times in a dataset that you think you're using as a control, because they're from the same study that's contributing to multiple consortia. The other issue, which is just as obvious, is that diagnostic characteristics are difficult for a number of these diseases and phenotypes we're interested in, and if there's a way of really automating that process, that would be great. So I think there's going to be use there, but there's going to be a lot of training as well.

I'd like to speak to the question of how to make datasets findable. Our experience with the exRNA Atlas is to make the data FAIR, which means registering the APIs with FAIRsharing.org. And unbeknownst to us, Google Dataset Search indexed all the metadata, so individual datasets are actually findable using Google Dataset Search. I would say AnVIL would do well to make its datasets FAIR; in that way, it can make them findable not just within AnVIL, but across other NIH efforts and also globally.

Anthony, I saw you come off mute — I wasn't sure if you were going to comment. Oh, sorry — I thought you'd come off mute, but maybe I should go on mute, actually.
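As a concrete note on the findability point: Google Dataset Search crawls schema.org `Dataset` metadata published as JSON-LD on a dataset's landing page. A minimal record might look like the sketch below; every value here is a placeholder, not a real AnVIL or exRNA Atlas entry.

```python
import json

# Placeholder schema.org "Dataset" record of the kind dataset-search
# crawlers index. Embedding JSON like this in a <script> tag of type
# "application/ld+json" on the landing page is what makes it crawlable.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example genomic dataset",
    "description": "Illustrative metadata record for findability.",
    "identifier": "doi:10.0000/example",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

jsonld = json.dumps(dataset, indent=2)
print(jsonld)
```

The appeal of this route is exactly what the speaker observed: publish standard metadata once, and external indexes pick the datasets up without any extra integration work.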
So let me pull on a couple of the threads that Elizabeth raised, just to distinguish them for additional comment — and it's natural to shift into the opportunities landscape as we talk about strengths. When I think about the consortium lifecycle, the hope is that more and more consortia use AnVIL as their data coordinating center platform, though in reality there's always going to be a mix. In that setting, a key incentive is to empower the consortium's own capacity to advance discovery — which I think is what Elizabeth pointed to — across other datasets. By definition, that's going to be an exponential curve: it'll start out slow, because initially there's a smaller number of datasets relevant to you, but as the number of datasets that are cross-queryable with your disease of interest grows, presumably there'll be a feed-forward loop of interaction.

But I want to also pull on the second thread, which is a slightly different use case, at least in my mind, Elizabeth: within the consortium setting, or even outside it, the notion of even a single file or a single patient being interpreted in the context of an AnVIL release. This really broaches the clinical-use crosscutting theme that Valentina pointed to. I'd love to hear both the current strengths or possibilities that people think exist, and what the real opportunities are — what would it take to empower such efforts, Elizabeth, if you're pointing to real-time diagnostics or interpretation of cases in a meaningful way, leveraging these resources?

I'd love for somebody else to chime in, but I feel like this thing that I'm talking about is really useful for rare disorders. I don't know how helpful it would be if I'm working on type 2 diabetes, for example.
But it would kind of help for things like, I know that if you've got type 2 diabetes, you might also be interested in blood sugar measures or whatever. So I think it's worth investing in, and it would be useful to larger groups, but it will be particularly helpful to those kinds of projects. So I'd be interested to know how the AnVIL is going to deal with data ingestion, especially for diseases, and how they're going to try to share the standards that other NIH programs are using. So I've been working with the All of Us program and the Researcher Workbench. And I think one of the opportunities and potential strengths would be if there's a harmonization of the way that clinical data can be presented in these workbenches, because then you're not going to have to keep relearning a whole new process for dealing with data when you're trying to, say, find patients with certain phenotypes. So I think that's maybe an opportunity as well as a potential strength, if the AnVIL can work with some of these other large data efforts that NIH is currently pushing for. Yeah, I think it's a great point. And actually, you're following the perfect SWOT analysis, right? You're identifying a current weakness, which is often a lack of these types of phenotypic harmonization. The field has gotten very good, and the AnVIL is exceptional, at the genotype harmonization effort. But in order to fully empower that, you have to convert that existing weakness of non-searchable phenotyping into an opportunity or a strength. I think it's really key, especially in rare disease. And to your point, Elizabeth, about rare disease, I actually think that one of the themes of precision medicine is that every human is rare, if you collect enough information about that human and make it searchable. So what are other potential weaknesses here? Oh, actually, David, you had your hand up before I move forward. Sorry about that.
Yeah, I was just going to say that we're sort of dealing with that in CSER. CSER is a little bit of a different animal. I'd say that we started right at the beginning, when CSER was starting, and so we actually have a data coordinating center for the phenotype data. So we do have a chance to come up with that. So the PIs are actually meeting to decide how exactly we're going to do that before depositing it into the AnVIL. Right. And even for you, even upon depositing, how do we ensure that whatever you decided aligns with whatever already exists in the AnVIL? That chain of propagation still sounds like a challenge. And that's still being negotiated. But we have other researchers that are in All of Us and so forth, and so it is a bit cross-pollinated. So I've got one potential weakness, and that is, I think getting people into using the AnVIL is going to be one of the biggest challenges: getting the message out that there is this platform, and really trying to encourage people to use it. I don't really have a great opportunity suggestion for that. I mean, I've heard about the AnVIL; Mike Schatz has spoken about it a few times on the GDSN network. But the energy required to start using something like that, compared with when you're just focused on small projects, is quite high. So I think that's somewhere where people smarter than myself may think of ways to lower the energy required to start using these shared programs and processes. Also having ways where one can have a free fiddle, basically see what it can do, give it a test drive, get an idea of how much it would cost to use a platform like that before you really get into using it for something serious. Because the big thing that puts me off cloud computing is the lack of understanding up front of how much it's going to cost.
So let me just pull on a couple of threads, and then I'll turn it over to Manoli. You essentially identified a threat to the success of the program, which is cost, as compared to what are typically university-subsidized resources, in ways that investigators don't have to think about, although the NIH does. But most investigators don't care that the NIH has to think about it. So that's the threat context. And you mentioned the barrier to entry, which is having to learn to do something different than what you already know. And again, in a SWOT analysis specifically, you either want to mitigate threats or weaknesses, or convert them into opportunities or strengths in some setting. So I'll turn it over to Manoli, who has a hand up, and then David. I wanted to follow up on that comment, because I really feel that comment, in the sense that I don't use the AnVIL right now. I use a lot of dbGaP data, and the process is just so difficult to go through to find data sets that are appropriate for what I do. What I do is pharmacogenomics, which is even more complex because you're often looking at disease and drug together, which are not available in these data sets. But I wanted to say one of the ways maybe to turn this into an opportunity, and I say it because I think it would be helpful for me just to understand, is to show, for people who want to be engaged in this, such as myself and the consortia we have, how you could use it in a very concrete manner, right? In a very easy manner, of even just identifying the data sets for a very specific phenotype. And in my case, the phenotype is really specific when you have drug and disease. Great point. David. Yeah, I just want to talk a little bit about the costing. Within CSER, the coordinating center was actually funded to fund all the sites. So I'm not only worried about my site, I'm worried about all the other sites.
So we have a student that's working on a sort of pipeline to surveil, or keep an eye on, what's going on, what's happening. I would say it would help if that was a bit more automated, and we'll be glad to share the code if we come up with something clever. But that's something that, at least for me, is coming out of our budget. Yeah, I think it's a great point, and cost consideration is not AnVIL-specific; it's broadly an issue across cloud-based implementations. A couple of threads that at least we've found, and if others have comments on this: for engagement with the cloud, one, as you pointed to, is cost transparency and planning around those resources. But second, even if you give people money, there's the barrier to entry of, do I really need to invest effort in doing something different than what I'm already doing? And that opportunity landscape of either increasing the speed or the scientific capacity, right, is the communication opportunity. What we found is that when people feel they can do something much faster than they could otherwise, or have access, maybe even searchable access in the way that you mentioned, to something that they couldn't have otherwise, those are two drivers that then mitigate the risk they feel around cost, and might even engage them in experimentation on behalf of that process. So even experimenting with costs is, I think, a valuable opportunity, and if we're investing in it, they'll have to get a sense of that, as you highlighted, David. Manoli, you might have had your hand up. Okay, Stephen. Yeah, just one point related to cost: I've been involved with BioData Catalyst as well, and one of the things that happened there is that NHLBI provided BioData Catalyst fellowships for people to get involved, and provided the first year of funding.
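The kind of cost-surveillance pipeline mentioned above can start as simply as flagging workspaces whose burn rate exceeds a budget. A minimal sketch, assuming a hypothetical per-workspace daily cost export; the workspace names, amounts, and threshold are all invented, and a real version would read from the cloud provider's billing export:

```python
from collections import defaultdict

# Hypothetical daily cost records, e.g. parsed from a cloud billing report:
# (workspace, day, usd). Field names and the export mechanism differ per
# platform; this only illustrates the alerting logic.
records = [
    ("site-a-analysis", "2024-05-01", 12.40),
    ("site-a-analysis", "2024-05-02", 14.10),
    ("site-b-qc",       "2024-05-01", 310.00),
    ("site-b-qc",       "2024-05-02", 295.50),
]

DAILY_BUDGET_USD = 100.0  # per-workspace alert threshold (made up)

def flag_overspend(records, threshold):
    """Return workspaces whose average daily spend exceeds the threshold."""
    totals, days = defaultdict(float), defaultdict(set)
    for workspace, day, usd in records:
        totals[workspace] += usd
        days[workspace].add(day)
    return sorted(ws for ws in totals if totals[ws] / len(days[ws]) > threshold)

print(flag_overspend(records, DAILY_BUDGET_USD))  # → ['site-b-qc']
```

Run on a schedule, a check like this gives a coordinating center the kind of early warning David describes having to fund out of his own budget.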
But a lot of the time was spent just getting people up and using BioData Catalyst, which meant that after about six months, they were finally ready to actually start working. And of course, their project can take a year or more. And so it became a cost conundrum: it takes six months of work just to figure out what to do, another six months to get started, and then they're only just getting going on the analysis. So I think it's important to think through this cost structure so you don't just get people started and then, in the middle or at the beginning of a project, say, oh, now you have to come up with the funds to continue. And I think it's important to make a commitment, because it's an ethical commitment as much as it is a cost commitment. I think it's a great point. And I think it's distinguishable from cost as a labeled threat; sustainability, as associated with cost, is a second threat. Even if you have enough funds to do X, the overhead you take on by engaging, and then having to leave the platform or go elsewhere, and the downstream harm you might incur as an investigator or consortium, is a second threat, essentially. So that's sustainability. And I've asked the BioData Catalyst folks and NHLBI leadership, well, what's going to happen after the next couple of years? And they said, well, we're trying to figure that out. That's not necessarily a great answer. Right. I'd like to switch gears a bit, is that okay? Yeah. And bring up the topic of the changing landscape of data-intensive research. Certainly the cloud has changed the landscape, and the AnVIL is a step in that direction, changing how we operate, how consortia operate. I would say the weakness is that the roles are now not clearly articulated. What will be the role of the AnVIL?
It should define what it is, and in what kinds of ways it will interact with consortia in the future, in this changed landscape where the AnVIL can take on much of the role of data coordination. Also, we are in a world of co-opetition, right? We compete for NHGRI funding, then we get funded, then we collaborate, and then we compete again. How does this new resource affect that process? Do we all need to be collaborators of the AnVIL to get funding? If you're not a member of the institutions that are part of the AnVIL, do you have a chance of getting funded for a consortium? I think if it was a kind of commercial service organization providing the AnVIL's services, it would be easier, but the AnVIL is run by academic institutions, and its infrastructure is at the Broad Institute. Do we all become part of the Broad Institute? How do we compete with somebody from the Broad Institute? Can we? I am not sure about that anymore. So clearly defining the parameters of competition in this new environment is actually very important for the community's buy-in, I would say. And maybe some kind of uniform letters of collaboration, standards for sharing technical data, agreeing to collaborate with whoever is granted funding in a competitive grant review process. These are all new questions, and unless they're addressed, they'll actually create a major weakness for the whole project. The whole community may silently turn against it. And that's a weakness and a threat, I would say. Before I engage with the next comment, I think I see another hand up. Is that related to this? Which section are we at, opportunities, threats? So I think we went through strengths and then some weaknesses. I think we're not exactly following the order; I think we're really talking about a couple of threats that can be converted into opportunities or strengths, which I think is fine.
So what Alex just highlighted is that a real threat is narratives of ownership and governance as they intersect with competitive practice. In many ways, the AnVIL as a resource is poised to enhance the competitive success of investigators who adopt that environment, right? It increases access, speed, and process; the better a user of the cloud you are, potentially, the more competitive you are. It actually does two things: it accelerates the scientific process, and it moves competition beyond simple resource limitation or access limitation, which have historically been barriers to competition, right? You can access the data, or you can do this fast; those have essentially been variables of competition, and essentially we're making the field more equal in some respects. But to your point, I think one of the really exciting things about the AnVIL and many other programs is this intersection between NIH and academia and even commercial entities in developing the overall ecosystem, and your narrative is the risk you're suggesting, right? Unless we make that narrative clear, those could be barriers to entry, but also an opportunity, if we can drive that narrative clearly and assuage fears as part of the engagement process. Did I read that between the lines, Alex? Not exactly. I was mostly speaking, from my perspective, about the data coordination role in consortia, right? This is the role that's basically going away, at least to some degree, I would say. And that's justifiable. I think it's a change in technologies. And certainly, it's justifiable that if something can be done more economically, less redundantly, with a lot more reuse of resources, that's completely justifiable. What I'm saying is that this new role of the AnVIL needs to be understood as also changing the dynamics of competition. And the question is, do we want competition, or do we want just one huge collaboration?
You know, my sense is that the talk all week is that we all collaborate, we'll work together, right? But the reality is we also compete. So if that step is eliminated, what will happen is that only those who are part of the AnVIL, or part of its institutions, the Broad Institute, etc., will get the funding for these consortia. I'm not sure that's the intention of the funding agency. So the question is how to manage that transition, where some of the role of the consortium, say data coordination, is taken up by the AnVIL, while at the same time keeping a level playing field when it comes to the competition step, right? So that the institutions that participate are not privileged during the competition, that information about the AnVIL is shared so that different institutions can develop competitive proposals, and also, post-competition, that if they get funded, they can participate, maybe by contributing to AnVIL activities and so on. So I would say it's a complex issue. You know, I think you actually pulled on a thread I hadn't prioritized, right? We're talking about consortia, membership, the scientific process itself, but you're pointing at a separate competitive landscape: what is the DCC competitive landscape, and what is the role of these platforms, in some respect? Like, how can you compete as a DCC, for example, and should there be a DCC? Yeah, and the answer may be that there is no DCC, that consortia should propose that the AnVIL be the DCC. In that case, the AnVIL, for example, would need to agree to be the DCC for every consortium, right? Not privilege this consortium or that consortium. I wonder, Anthony, if you could comment on this. Or I see David frowning; maybe David is frowning about this topic, but maybe Anthony can comment. Yeah, no, thank you. It's a really important conversation. I don't see DCCs going away, honestly.
I think that they play just the same important role now that they did before. As I mentioned, even with these new consortia like PRIMED or eMERGE that have launched since the AnVIL began, we have great interactions with the DCCs and work very collaboratively with them. And I think what happens is it's a collaboration where the DCC knows the data sets better than we will. You know, there's that insider knowledge: these three genomes are blacklisted because I know they were done by this tech on this day. That's something the consortium knows, and it would be very hard for us to get visibility into. And then there's a matter of propagating that information to us. So at least from my perspective, I don't see DCCs going away, and I think it's a great opportunity to engage an ecosystem of participants in the AnVIL. And one thing I would appreciate input from this group on: my approach is that whenever I'm asked for a letter of support, I unequivocally say yes. Even in cases where it's competing with someone from my institute, I always write a letter of support as an AnVIL PI when asked. I've never said no to anyone. That's great to hear. I would say that's really important; I had a different experience with some other platform providers in academia, who withdrew their letters of support after deciding to compete with me. Anyway, I don't want to name names here, but another aspect of it is openness with technical information, right? Sharing enough technical information with these consortia so they can plug in to the AnVIL in an informed way, so their applications are as competitive as others. That would be helpful to me. I see Iftikhar has his hand raised. Go ahead. Yeah, hi. I mean, this is probably just a broadly philosophical comment, and you're all aware of it, but I think this is more in the opportunity realm: the opportunity for the AnVIL to perhaps level the playing field a bit.
What is happening in this world is an enormous amount of data being produced. And as a result, there's competition for data analysts and data scientists and experts, and many of them are not in the biomedical arena, because they go to the tech companies on the east and west coasts. And so what is emerging is a disparity, not only, I would say, racial but also geographic, in that much of this expertise resides in academic citadels. And there's an opportunity for the AnVIL to help level the field there: doing your road shows, doing your MOOCs, doing your courses, and making the AnVIL something that is reachable and accessible not just to the data scientist, but to the clinician-investigator and the student. I think that's a tremendous role the AnVIL can play, because I think this gap is widening. Even if you compare the Midwest with the east and west coasts, we have a hard time recruiting data scientists and experts. And I think if we had tools like the AnVIL that are accessible and kind of level the playing field, that would be a huge opportunity for this program. Yeah, and I think, Anthony, you're probably nodding your head, inside and out, regarding that opportunity landscape. I think we see this ubiquitously in this setting.
I wonder, and maybe this is a form of opportunity, which Alex I think was also bringing up around technical sharing, but pushing on that DCC landscape: what we also see emerging, and I think Brian touched on this, is that interop is another version of democratizing the DCC landscape. Meaning, the capacity to interoperate with other environments and resources expands that interop landscape, which I think the NCPI, and especially the NHGRI, has been leading, and it will continue to essentially create opportunities for additional platform development and for resources that can interoperate and connect moving forward. I'm not sure if that's a shared view, but I think that's another way to mitigate the risk of the monolith sort of setting that I think you're pointing to. Agreed: interoperability, right? API-centric, for example, standard data models, you know, FAIRness, right? Okay, over to David. David wasn't frowning for any other reason except the lighting, apparently. I'll try to smile more. Right. So we have, I think, a few more minutes. What else can we pull on as threads of either weaknesses or threats that we either need to mitigate or convert into strengths and opportunities? And again, this can be either from a cloud-centric perspective or AnVIL-specific. Yeah. So one threat would be that you have an investigator who has, let's say, a particular domain of biology or medicine that they're interested in, and they would like these resources to be more interoperable. And I think Stephen alluded to this: for example, how well can the NHGRI work with other NIH institutes to have either a conduit between the AnVIL and their cloud, or actually assimilate those data sets onto the AnVIL, so that I don't have to go hopping from cloud to cloud?
If NHGRI were able to build these bridges, I could potentially just stay in the AnVIL and not have to go elsewhere. I think that's a great point, and it is a very clear threat. It is the still-challenging cloud-to-cloud context, in ways that broach Anthony's earlier comments about the need to copy, or where you create copies, and whether that's the current solution across the inter-cloud context. Titus? I did want to point out, though, that the work of NCPI has underscored the importance of what you're saying here too, right? The fact that, at least within those organizations, those platforms, between BDCat and CRDC and Kids First and the AnVIL, we can share, and in the AnVIL I can pull and use data from all four of those. I think the bigger question for me is how we replicate that across all of the NIH and beyond. And that's where you start running into the issues about cross-region and cross-cloud work, which is going to be something that we need to look at in the future. But I'm happy to see the progress that we've been making there; I think it's really exciting. Titus? Well, Brian more or less took half of what I was going to say and said it. So I'm just going to say, I think a major threat is if there's only one set of back-end systems that is capable of interoperating, in which case it's not really interoperability. And I think NCPI is a partial solution, but that's an example of what Cory Doctorow calls cooperative interoperability: we're all going to get in a room and we're going to talk and make sure our systems work together. But I think there's also a strong future role for what Cory calls indifferent interoperability: I don't care that you're interoperating with me, because it all just works. And I think that in the next five years, that's what I want to see.
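The NCPI-style sharing described above rests on common APIs. One concrete example is the GA4GH Data Repository Service (DRS), where a platform-neutral drs:// URI resolves to the same REST endpoint shape on whatever platform hosts the file. A minimal sketch of that mapping for hostname-based DRS URIs; the host and object ID below are hypothetical:

```python
from urllib.parse import urlparse, quote

def drs_to_object_url(drs_uri: str) -> str:
    """Translate a hostname-based DRS URI into the GA4GH DRS v1
    object-info endpoint (GET /ga4gh/drs/v1/objects/{object_id})."""
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs" or not parsed.netloc:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri}")
    object_id = parsed.path.lstrip("/")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"

# Hypothetical object on a hypothetical host:
print(drs_to_object_url("drs://data.example.org/abc-123"))
# → https://data.example.org/ga4gh/drs/v1/objects/abc-123
```

Because every compliant repository serves the same endpoint shape, a client never needs platform-specific code to locate a file, which is close to the "indifferent interoperability" Titus describes.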
Like, I don't have to talk to anyone. I don't have to talk to Brian ever, no offense, Brian, to use your systems; it just all works. And if it doesn't, there's a help desk and standard channels. And I think that speaks to a lot of the stuff that's been brought up, that Alex brought up about funding and so on, as well as some of the other things. But I just wanted to make sure it was really noted. I think it's an awesome vision. That is, we all seek not to have to care, essentially. David. Just want to quickly say, I see an opportunity. I know we're focusing on ingestion here, but the transition over, and I know this is what everyone wants here, the transition over to tool development and analyses would be a super great opportunity for this group. And I know CSER is sort of in that role right now. And yeah, I'll leave it at that. Yeah. And I think it's a terrific opportunity, and again, another area that the interop context begins to broach. Essentially, ensuring that tool development is interoperable, and not necessarily required to happen within a particular setting, is a key opportunity. I know some of you were essentially appointed discussants by NHGRI. Any of the other discussants who haven't commented, who wanted to add? I was worried about hogging airtime. Okay, I was taught to count to seven and then start talking again. So, the title here was really about the submission process.
And at least one of the things that I think is interesting is that maybe that's actually not the right pairing, thinking about submission and consortia, right? Because it invokes, again, a data lifecycle and process that we're potentially trying to undermine. At least what I heard is that some of the opportunities are to transition the AnVIL from being seen as a submission site to being a platform for discovery and use, supporting the data coordinating centers' roles in ways that are fair, transparent, non-competitive, and interoperable. And then submission is really an act of use, not necessarily on behalf of fulfilling an NIH requirement for data sharing. I mean, those requirements are going to exist, right, but dbGaP can also fulfill those roles to some extent. So I think that's a real opportunity: to really think about that context. Let me just push on the consortia part, but again coming back to the clinical use: I think this is an area where the AnVIL is really thinking to innovate. Somebody mentioned CLIA, for example, earlier. What are the threats or challenges in the context of clinical use, if we really want to move this forward? If folks can chime in on that. I actually had a question, which maybe, if I knew the answer, would help. When they were showing the clinical genomics aspects, there were two parts that struck me, and I realize it was probably cut for time. One was: was any of that curation process actually curated with the kind of SOPs we use in ClinGen for curation for ClinVar? It doesn't have to be an expert panel working group, but it could actually leverage some of those same processes, so you're getting a better quality of annotation. And the other was that I just had no idea how those areas were chosen. Social determinants of health is really important, but without a champion or a use case, not so much.
And so I didn't get a good feel for that, and I realize this is probably just a time issue. So I put my notes in the doc; this is Shannon. But this is an area that's really critical to me, and unfortunately, maybe it needs its own session, another time, another workshop, but I didn't get a good takeaway. Okay. I definitely agree that it could have its own session, in some respects. And that points to this notion of clinical use. Let me throw out just a couple of points to see if they spark something in the last couple of minutes. Whenever you say the word clinical, most institutions, hospital systems, shudder; all the doors close, and it's very challenging for most institutions to think about a collaborative cloud environment as a clinical use environment. Those typically don't go together in the same sentence for most institutions. I wonder if that's just my own experience. I suspect not, but it clearly is the case that most institutions don't have the local capacity to leverage everything that the AnVIL might offer on behalf of clinical use locally. And so that's the opportunity. Sorry, Valentina, go ahead. I didn't say anything. Okay, sorry, somebody else did. Yeah, it was me, Adam. I'm sorry. I was just going to share that comment. You struck a chord when you said it: it's trust. On the clinical side, I'm in an academic hospital. We have no problem with sharing if we trust the entity. And that's a relationship-building thing, right? And I guess my point is this is not unheard of. You know, ClinVar and ClinGen sit right in the middle, because I'm on these calls. Our CLIA lab is heavily involved in that curation; I sit on an expert panel, so it's right at the heart of it. So it is possible. It's not like they're afraid of it, but it has to be trusted, and it has to be done right.
And I just think from the research side, from the pure research side, we tend to think of it as an afterthought, and that's what I think is the best opportunity. All right, Steven. Yeah, I think one of the questions that will come up is: what populates the AnVIL? The most likely data that will populate the AnVIL will be from ongoing consortia, and those consortia are composed of studies, whether they're longitudinal cohort studies or other studies, that typically are for research use only and actually have consents that will not return information to participants. So that side of things, I think, will not necessarily be CLIA, but you could use that information to motivate the CLIA appropriateness. And actually, if there are things that can then be taken back to participants, if there is a specific actionability, then I think that may happen. But I see the initial widespread use as more discovery of things that could then be utilized by the appropriate studies that have CLIA interests. Yeah, that makes sense. Iftikhar? Oh, you're on mute still. I mean, I wasn't coming into this meeting thinking that we'd talk about clinical application; that's an enormous challenge. But one issue would be that institutions are going to regard data as currency. And so there'll be competition in the future that is based not on how many MRIs you have, but on how much data you have. So that's going to be probably a factor when you're thinking about collaborating with institutions. But if you're thinking about participant-level collaboration, that's a very, very interesting domain, though I'm not sure whether that's on the AnVIL's radar, in terms of, for example, individuals contributing their genomic data for research and potentially even for monetization. But I think that's going in a very different direction. Yeah, that's right.
You know, obviously the AnVIL and many of these efforts will begin with genomics. And what we're seeing is that data generation around genomics is moving into the clinical domain, in ways that put those data sets under the purview of the hospital ecosystem rather than the research ecosystem, in some respects. So what used to be clinical data versus research whole-genome sequencing efforts is shifting, so that large-scale big data are beginning to be generated at the clinical interface, which drives a different governance and decision-making process as opposed to the cohort-based sample characterization efforts that currently exist. I don't know, Anthony, if you, in your efforts to engage this clinical domain, can comment. Yeah. You know, I said this at the beginning: I'm a physician by training, so this is an area that I care very, very deeply about. One of the things that was quite successful in the first three years of the AnVIL was that the AHA led a scoping project around common use cases. And two areas emerged where we felt the AnVIL could provide early wins for the clinical community. One was in supplying best-practices pipelines for use by medical centers. For example, when you think about polygenic risk scores, that's a new approach to a new risk factor that a lot of hospitals want to start implementing. And so by providing the best-practices pipeline to compute them and do all the imputation work, it removes the need for the hospital to have expertise in how to compute that pipeline. Similarly, the implementation of seqr as a tool for helping genetic counselors and genetics professionals make diagnoses in the setting of rare disease. We also view that as a nice clinical win. So I think a lot of the work so far has really focused on providing tooling to medical centers to make clinical diagnostics more widely available.
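For readers outside the field, the core arithmetic behind the polygenic risk score pipelines mentioned above is a weighted sum of effect-allele dosages across variants. A minimal sketch; the variant IDs, weights, and dosages are invented, and production pipelines add the imputation, allele harmonization, and ancestry-aware normalization steps that Anthony alludes to:

```python
def polygenic_score(dosages, weights):
    """Raw polygenic score: sum over variants of
    (effect-allele dosage x published effect weight).

    Variants missing from the sample's dosages are simply skipped here;
    real pipelines would impute them instead."""
    return sum(dosages[v] * w for v, w in weights.items() if v in dosages)

# Hypothetical per-variant effect weights (e.g. from a published score file)
weights = {"rs1": 0.30, "rs2": -0.12, "rs3": 0.05}
# One sample's effect-allele dosages, each in [0, 2]
sample = {"rs1": 2, "rs2": 1, "rs3": 0}

print(round(polygenic_score(sample, weights), 3))  # → 0.48
```

Packaging exactly this kind of computation, plus its upstream imputation, into a shared best-practices workflow is what spares each hospital from building that expertise locally.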
But where I think it would be really great to go in the future is thinking more creatively about how to conduct clinical research studies through the AnVIL. For example, think about something like REDCap, which is so widely utilized and made by our very good friends and collaborators at Vanderbilt. I think it would be a real win to have a much tighter integration between REDCap and the AnVIL, so that as REDCap studies are run, the data seamlessly lands in the AnVIL, where researchers can use it. That's one example among many of where I'd like to see us go, but again, we're only three years into this. I see a thumbs-up. And I think that's a great use case: the clinical trial, even if observational, and the CRF-to-workspace transition. These are still key challenges. 100% agree. One minute and 42 seconds left. Let me open it up more broadly across the SWOT landscape: any strengths, opportunities, weaknesses, or threats that may not have been addressed that come to mind? I just have one that I guess I always raise because of the population I work with, which is that I don't know what efforts are in place to ensure diversity in the AnVIL. And I know that there are lots of diverse studies being put in, and that's not what I mean. I mean, we still continue to have Eurocentric databases, even with pushes for diversity. So what engagement are you doing with people and consortia that are dedicated to these diverse populations? Right, so here you're pointing at diversity in the data, not necessarily the users or researchers. Okay. Anthony, any comments on this? Yeah. So I don't know that we yet have any input into that process, in the sense that we don't design the studies that are being run. Certainly PRIMED is a great example of a consortium that is focused on increasing diversity, especially for polygenic risk scores.
And I certainly would love to see us play a bigger role in helping to make sure that studies are designed so that diversity is increased. But at least right now, it's not an area where we have any purview over that, that I'm aware of anyway. Anthony, this is Imer. I could also comment: at least in CSER, we're working closely with the AnVIL to put in place some of the tools that you need to analyze diverse populations. Some of those include tools that infer genetic ancestry, and you often need to account for that in the types of approaches that you use. And certainly I would also mention that the CSER consortium has a mandate to recruit 65% or more participants who meet NHGRI's definition of underrepresented and diverse. Perfect. Adam, I'm sorry I need to interrupt this discussion, but we're running out of time. Are you ready for the presentation?