focusing on analysis and tools that I mentioned before. We're gonna start with a brief introduction of the moderators, which I've just done. And then we're gonna go into our AnVIL presentation, which follows the format that you heard from Anthony and Mike in the previous session. The presenters from the AnVIL team will be Dr. Vincent Carey and Dr. Anne O'Donnell-Luria. We will then go into our discussions, and once most of the discussions are completed, we'll start working on a breakout report. So I'm gonna give some more information about the discussion so we're all on the same page. For this session we also have discussants for this group, as shown here. I'm gonna thank all of you for being discussants and providing an opportunity to share your thoughts with the group. In regards to the discussion topics, as Valentina mentioned, we're gonna take the SWOT approach for developing the breakout reports for this topic. So we're gonna address the strengths of AnVIL, where does AnVIL excel; weaknesses, where is AnVIL at a disadvantage; opportunities, where AnVIL can grow and improve; as well as identify threats, those factors that could jeopardize AnVIL. As mentioned in Valentina's opening presentation, there are three cross-cutting themes that we wanna incorporate in our discussions. Those include: what cloud uses are needed for cloud-based systems to better meet the needs of the genomics community; what tools or services would better support the clinical genomics research community; as well as understanding what AnVIL needs to do to improve interoperability with other genomic resources in a federated ecosystem. To help stimulate an engaging and respectful discussion, we have a few ground rules for the attendees. Right now the attendees are all automatically muted; if you aren't, please mute yourself.
The moderator will initially solicit comments from the discussants about AnVIL's SWOT, but then open it up to other participants. Questions and comments you can put in the Zoom chat, or you can use the raise-hand feature and the moderator will call on you. And there's also an opportunity to provide comments and ideas in the notes section. I'll put the link in the chat real quick; yes, if you look in the chat, you'll see the direct link to the comment section for this particular topic. So feel free to provide any ideas that you have on that Google sheet. For the social engagement, don't be shy about identifying problems and risks. This is really a chance for us to understand what we think will work and what will not work going forward for AnVIL. Please feel free to be candid, and you will be heard. And as always, we're all gonna be polite. If you're a talker, please remember to give others the time and space to talk. And if you are quiet, take advantage of any opening when those opportunities present themselves. If you have any questions or concerns, please feel free to reach out to me or to the panelists. So this is basically the format that will be shown for the different components of the SWOT, and I will turn this over to Dr. Marylyn Ritchie, who will start up our session, and I will stop sharing my screen. Marylyn. All right, good afternoon everyone. Thank you, Ken, for that introduction; I thought that was really helpful. So as Ken indicated, what we wanna do is go through this SWOT analysis. Before that we have a presentation from, let's see who's gonna give the presentation. So we have Dr. Vincent Carey and Dr. Anne O'Donnell-Luria. So which one of you wants to start? I'll be starting. Okay. Vince Carey, thank you. Shall I share my screen then? That would be great. Okay.
So keep the SWOT in mind as you're listening to this presentation, and then once we hear those, we'll start the discussion. All right. I'll be presenting all the slides, and when I transition over to Dr. O'Donnell-Luria, she will tell me when to advance the slides. So thank you all for coming. This is a discussion of analysis tools. Is it looking okay? Yeah, it looks great. Okay, good. So the basic roadmap is: I'll be telling you some of the details of the core components, and then Anne will take over and discuss basic science and clinical science aspects of work on the AnVIL. And then it'll probably come back to me to talk about some of the issues of extending it from a technical point of view. We've already seen this slide: a security perimeter for genomics, interactive genomics, collaborative genomics, is very important. And really just this year this certification has been achieved, and it allows developers in different spaces related to genomic computing to provide their tools to the community in a secure way. You've heard already mention of Bioconductor and Galaxy as tools, the latter for individuals who are really not doing any programming at all but still setting up genomic workflows, and the former a little more in the interactive computing and programming and even packaging space. And then in terms of workflows that can be managed and distributed and reused, Dockstore is an important component of the AnVIL. And the number of resources available to users in these different spaces, which we'll say a little bit more about shortly, is very substantial. This is a layout of the components, again suggested by Fred Tan: there are two dimensions that we should think about, basic science to clinical, and large-scale batch computing to interactive. And the different components can be laid out in that space, from those that are very large-scale workflow to command-line or interactive.
And the consortia that have been leveraging these tools, specifically the Telomere-to-Telomere and Pangenome projects, and then the clinical dimensions that can be worked on, which Anne will be telling you more about. Terra is the basic cloud computing platform that we're using right now. It's composed of workspaces, and workspaces are a very user-friendly device for getting people into cloud computing. The dashboard of a workspace can have a long prose and graphical discussion of what you can do. The data component is a set of tables that describe, from a metadata point of view, the types of things that can be used in the workspace, the types of data. And then you can have Jupyter notebooks or R Markdown documents, workflows in the Workflow Description Language (WDL), and the history of all the workflows that have been run, all unified for a given task topic; and cloud environments can be defined that involve clusters or single multi-core machines, what have you, in order to do the work that you want to do. And there are many of these workspaces; as we've already noted, there are featured example workspaces that people can immediately copy to get rolling in one or another domain. RStudio and Bioconductor have been mentioned a number of times; they basically work in the most familiar way you could imagine. You come on, you fire up RStudio, and then you can bring in data or work with data from the AnVIL immediately. And this is a tested RStudio environment, and we keep it up to date so the latest Bioconductor releases are always available. Galaxy is specifically for individuals with no programming expertise to compose workflows and to work with a very large community, lots of curated data, and many, many tools; it's right there, it's ready to go. So those are the tools that are available, and I'll talk at the end about extensibility. I want to turn it over to Anne now to discuss some of the clinical applications. Great, thanks so much, Vince.
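The "data component is a set of tables" idea can be sketched concretely. A minimal sketch, assuming the common Terra convention that workspace data tables are loaded from tab-separated files whose first column header names the entity type (e.g. `entity:sample_id`); the column names, sample IDs, and bucket paths below are invented for illustration:

```python
import csv
import io

# Illustrative only: a tiny Terra-style entity table. The rows hold metadata
# pointing at data objects in cloud storage, not the data itself -- that is
# the workspace "data table" idea described above.
SAMPLE_TSV = """entity:sample_id\tcram_path\treference
NA12878\tgs://bucket/NA12878.cram\tGRCh38
HG002\tgs://bucket/HG002.cram\tT2T-CHM13
"""

def load_entity_table(tsv_text):
    """Parse a Terra-style entity table into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)

rows = load_entity_table(SAMPLE_TSV)
# rows[0]["entity:sample_id"] -> "NA12878"
```

A workflow can then iterate over such rows, passing each `cram_path` as an input, which is how the tables and the WDL workflows in a workspace fit together.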
So there's a lot going on; you've heard a lot in Science about the telomere-to-telomere (T2T) assembly and T2T analysis, including from people who are on this call today. This is a great example of some exciting workflows in the scientific field that a lot of people are going to want to access, and definitely with big data like this we don't want to be reproducing it at individual sites. So this workflow has been set up so that others can run the T2T analysis workflow here. And then on the next slide, we have an example from the Schatz lab; Drs. Miga and Phillippy on this call, and probably others on this call, were also involved in this paper. So, reanalyzing a large number of human genomes from diverse ancestries and trying to look at how we can change variant calling. And what this is showing us is that we're getting much more accurate variant calling when we use the T2T assembly. From this, they were able to both increase the number of interesting variants found and also decrease the total number of variants, because it's a much cleaner assembly; a lot of the duplicated regions have been much better resolved. In addition, the paper finds that for 269 medically relevant genes known for their disease associations, there was a 12-fold improvement in calling variants in those genes. There's a class of genes that are going to be hard to analyze from the original GRCh38 or GRCh37 and short-read sequencing data, and so this is going to be an important improvement for analyzing those genes. Move on to the next slide. So this is one of the platforms I work on, and where I'm most comfortable sharing information: seqr is open-source software very focused on the rare disease and Mendelian analysis space. And this has now been brought onto AnVIL, so that any researcher can go and put their joint-called VCF in a Terra workspace and push a button to get their data loaded up into seqr.
And it really deeply annotates it, lets you see read data, and it brings in lots of outside data sets; you can look at GTEx in there or link out to a bunch of things. And it's also a Matchmaker Exchange node, so you can actually submit candidate genes and match with other researchers. So again, all available on AnVIL now, and we have a lot of exciting development plans for the future, including bringing a lot of different variant types in. But the one thing I did want to mention is a potential possibility: I think it would be really great for the field to have more of an "I have a CRAM, I put it in AnVIL, and I push a button" experience that takes you through all the workflows to generate all the different types of variants, and then on into loading into seqr, to really empower the clinician and the researcher to take it from part one into the future. Next slide. And PRS, polygenic risk scores. There are notebooks now that set up a lot of these nice workflow analyses so you can come in and analyze your data sets. And we've already heard about these interactive reports that can be generated right in the AnVIL to help share this information from the polygenic risk scores, in ways that have been studied very well in terms of maximizing how effectively this information is communicated. So, next slide. The AHA/AnVIL working group has thought a lot about what kinds of features clinical genetics, particularly for cardiac disease, needs, and they interviewed a panel of scientists; PRS, from the prior slide, are one of the things that was really noted as a large need here, along with pharmacogenetics. And so while some of this is set up, there's a lot of different method development in the PRS space, and bringing additional workflows and additional methods into AnVIL is going to be really important to continue to empower this community.
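At its core, what a PRS workflow computes is simple: a weighted sum of allele dosages. A minimal sketch, with invented variant IDs, effect weights, and genotypes (real PRS workflows would pull weights from published score files and dosages from VCFs, and handle missingness far more carefully):

```python
# Hypothetical per-variant effect weights for a score (made up for illustration).
PRS_WEIGHTS = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}

def polygenic_score(dosages, weights):
    """Sum effect_weight * allele_dosage over the variants in the score.

    Variants missing from the genotype data are skipped here -- one
    simplistic handling; real tools impute or re-weight instead.
    """
    return sum(w * dosages[rsid] for rsid, w in weights.items() if rsid in dosages)

person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}  # allele counts (0/1/2)
score = polygenic_score(person, PRS_WEIGHTS)  # 2*0.12 + 1*(-0.05) + 0*0.30 = 0.19
```

The method development the working group points to lives mostly in how the weights are derived and validated across ancestries, not in this final summation step.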
And this group put together a list of 17 tools that they thought were kind of the leading edge in PRS. Additionally, over the past year Dr. Casey Overby-Taylor has led focus groups to discuss resources for pharmacogenetics; move on to the next slide. And here we're talking about PharmCAT. Dr. Ritchie has been involved in, or leading, this work for clinical genomics in the pharmacogenomics space to really support clinical decision making. This is coming really soon, in the next month or so, hopefully. And the idea here, just for people who don't know, is that because of how star alleles are often defined, it's kind of a different nomenclature. We need tools that can take genomic data, translate it into the types of variants used here, and then guide you through medical decision making. The PharmCAT tool takes you end to end on that, and it's going to be a really great resource for the community. And next slide. And I'll hand it back to Vince. Well, thanks very much. The last points we want to raise here concern the future of AnVIL in terms of technical capabilities conferred on developers outside the immediate technical group to bring new tools into the space. By registering tools in Dockstore or uploading WDLs into your own workspaces, you extend the capabilities of AnVIL computational genomics. You can also use the AnVIL APIs, or extensions or repurposings of those in other languages, to use any of the components of the AnVIL. And this is all done in a very standardized way, with OpenAPI (Swagger) techniques for conveying services to client users. And wrapper libraries in Python and R are working now, and programmers who know how to take advantage of these can build new resources for use in genomic analysis. And then adding new web applications, containerized using Kubernetes: this is all doable now. Finally, there's this idea that third-party applications can be run, and security protocols for these are being formulated now.
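The wrapper-library pattern described here boils down to authenticated HTTPS calls against OpenAPI-described endpoints. A minimal sketch of that pattern only; the host, path, and token below are placeholders, not real AnVIL endpoints, and no request is actually sent:

```python
from urllib.request import Request

API_BASE = "https://api.example.org"  # placeholder host, not a real service

def build_request(path, token):
    """Assemble an authenticated GET request for a Swagger-described endpoint.

    A thin wrapper library would generate one such helper per documented
    endpoint, with the OpenAPI spec supplying paths and parameter names.
    """
    return Request(
        API_BASE + path,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )

req = build_request("/api/workspaces", token="ya29.example-token")
```

Because the spec is machine-readable, the same generation step works in R or any other language, which is what makes the "repurposing in other languages" point practical.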
So the idea is that you wouldn't even have to go into a workspace necessarily, but all the authentication and so forth would be taken care of by an app that is able to use the AnVIL. There's also work now to bring in more machine learning methodology, and to use the GPUs that are available from the various cloud providers to carry out this type of learning. Basic science work to ensure that individuals have access to all of the results of ENCODE, Roadmap, IGVF, GTEx and so forth, all inside the platform, ready to be interrogated. And as has been mentioned, new work in clinical genomics extending the usability of eMERGE and all the tools that can be deployed in order to deal with genetics questions. And I think that is the last slide. Yeah, so thank you very much. I think I will stop sharing now. Or maybe I should leave the slides up in case there are any questions about them. Well, maybe we should do that just briefly. Are there any questions for Vincent or Anne before we get into the SWOT analysis? Just wondering if there are any timelines for some of these, specifically the integration of the ENCODE and Roadmap data. We've been in discussion with ENCODE. Their total data footprint, if I recall, is about one and a half petabytes, and today it's stored in AWS. It'll come in phases. The first phase is to kind of mirror the data in GCP, and we've actually started working towards that, but that'll just be raw access. And then over time we'll build out more and more sophisticated capabilities for working with it. So that's underway right now. Roadmap, we've really only just started the discussion there, so that'll be a more extended timeline. Yeah, Roadmap's built into the ENCODE portal, so you just get it by default. That's right, you're right. The reason I mentioned it is because ENCODE is wrapping up and we're starting to write up papers and such.
And one of the issues that we're having is that there is no home for models and analysis tools. The portal and such is really designed primarily for the data. And we've been playing with AnVIL, and actually it is fantastic; everything we've tried just works out of the box. The only thing that's missing is the ENCODE data itself. So if that could be brought in, then actually it'd be a fantastic potential showcase for AnVIL. When we publish these papers next year, all the toolkits could be really exposed through AnVIL, and all the data would be at the portal or in the cloud. So I'm just mentioning that as a potentially very interesting collaboration. Excellent, let's do it. That sounds like low-hanging fruit, a quick win. So, great idea. All right. So why don't we transition; I think we're one minute behind schedule, not according to the timer, but according to the original agenda. So we're gonna go through the SWOT analysis, and Ken is gonna help put our slides together. Ken, are you just gonna type into them and we'll talk through, or were you gonna share them as you type? So my plan is to let you first just have your discussions; I'm gonna take the key points from them, and then in the last 10 minutes I'm gonna put them all together, and then we can see if I transcribed them correctly and put them in the right order. Sounds great. Okay. All right. So our goal here is to try to spend about 10 minutes on each: strengths, weaknesses, opportunities, and threats. So let's start with strengths, and the goal is to hear from the discussants first. So those of you who are identified on the slide, if you could feel free to contribute first, that would be great. And we can do it two ways: one, you can use the raise-hand feature, which you get by hitting Reactions at the bottom of your Zoom screen and hitting Raise Hand, and I'll call on you; or if you wanna type your points into the chat, that's fine as well.
And so, Luke, I see that you have a question typed in there; I'll go ahead and just ask it for you while others formulate their points. So the question that Luke raises is whether there are restrictions on programming languages for third-party applications. I think that is an important one that may fall into a weakness if there are restrictions. Yeah, I can address this. So as you just heard, there are several different entry points into the AnVIL analysis suite, depending on what you wanna do. One key distinction is: is this going to be something you do in a workflow at very large scale, or something that needs to be run more interactively? Our preferred environment for very large-scale workflows uses WDL, the Workflow Description Language. In my mind, and I'm teasing a little bit, this is like a very fancy bash script, in the sense that you set up: here are the command lines that need to run, here are the inputs, here are the outputs. So anything that you can execute in a bash script can be run; that could be in Python, C, Java, or Rust, you name it. It's quite universal at that level. And then the amazing thing is that behind the scenes there's a lot of technology; there's something called Cromwell that, if you need 10,000 cores, will just orchestrate those machines and orchestrate the data in and out. So if you need to scale up a workflow in any language, that's an opportunity. At the other extreme are more interactive tools. If it's just a matter of a few parameters you need to set, maybe Galaxy would be a good choice. If there's a very sophisticated user interface, where you need custom visualizations, that's where that new Kubernetes technology is really important: you can package up an entire web server with whatever GUI you want, and then it can be launched and deployed in the environment.
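The "fancy bash script" idea can be made concrete with a toy runner: declare inputs, run a shell command, collect outputs. This is plain Python standing in for what Cromwell does at scale; the task and command are made up, and real WDL adds typing, scattering, and cloud orchestration on top:

```python
import subprocess

def run_task(command, inputs):
    """Run one 'task': substitute inputs into a shell command, capture stdout.

    Any language reachable from bash (Python, C, Java, Rust, ...) can sit
    inside `command`, which is the point made above about universality.
    """
    result = subprocess.run(
        ["bash", "-c", command.format(**inputs)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

# Rough analogue of a WDL task whose command counts lines of input:
out = run_task("printf 'a\\nb\\nc\\n' | wc -l", {})
# out == "3": the captured stdout becomes the task's declared output
```

Cromwell's value is everything this sketch omits: provisioning the machines, localizing and delocalizing the files named in inputs and outputs, and retrying failures across thousands of such tasks.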
Today, that's how we deploy Galaxy, in that Kubernetes environment. It's very feature-rich; it sets up web servers, proxies, databases; there's a whole cluster behind the scenes. So the sky's the limit as to what is possible, although obviously, if it's more complicated, it just requires more engineering to make it so. Thanks, Michael. All right, so does anyone wanna go first with strengths? Anshul, I was just about to pick on you, since you already spoke up that it's been great. So please go ahead. Yeah, so we've been very, very pleasantly surprised; we thought we'd have lots of teething problems. We work with a lot of these deep learning frameworks and libraries, and with datasets that aren't necessarily currently the primary product in AnVIL, which is variation data; we're primarily working with molecular data, and everything has really just beautifully worked out of the box. In fact, I would say we had been struggling to scale things up, partly because we're running these models on thousands of datasets, and scheduling the right machines and moving data around, all of that stuff. The fact that AnVIL is taking care of all of that, whatever is under the hood, is just unbelievable. You can just specify it; and one of my postdocs spent maybe two days and he was completely up to speed. He got all of it working. So I would just say, whatever you guys are doing in terms of documentation and the way it's set up is really quite spectacular. Even just learning Google Cloud took us weeks, but jumping into AnVIL took maybe two or three days. So I would just say that's a big plus for you. That's great. Barbara? Yeah, my experience with infrastructure like this is from GTEx, mostly, I guess.
And when we were working with GTEx V8, we were told that we had to take the data on Google Cloud and download it; the back-of-the-envelope calculation was like $20K for downloading it and storing it on our local system. So the fact that all of this is kind of done for you is just amazing. And I totally agree with Anshul; the fact that it seems fairly easy to spin up from nothing to work on these data is so much better than what we had before. So I really appreciate all of that. I have to say, another thing that we do a lot of internally is work really hard on setting up these servers that protect the sensitive data, and the fact that this is done under the hood, that the accesses are managed appropriately, really saves us a lot of justification; we all do this a thousand times in a thousand different places, and having one place where it's done, I think, is amazing. Yeah, great points. Karen? Hi, just speaking on behalf of my team here and my Telomere-to-Telomere consortium colleagues, and really emphasizing how useful AnVIL was for our variants team in particular, as highlighted in the presentation by Anne and also by Mike Schatz's team; he was the co-lead of the variants part of our working group, which has just been tremendous in building workflows. A lot of the conversations that I've been having past that paper are about how to utilize short-read datasets against this new reference and how to develop even more tools in this space; it was really a lovely way to showcase the strengths of AnVIL in the use of this new reference. Also, many of our other working groups, such as the one that I lead on centromeric satellite annotation, are now using WDL workflows and moving everything onto Terra, making sure that we have more tooling on the AnVIL platform. And this is not for our Telomere-to-Telomere consortium per se, but this is to move into more reference genomes, more into that basic science space of trying
to improve annotation work, and the Human Pangenome Reference Consortium. So it's just been a really tremendous resource. We have a lot of education and outreach programs in place here at UC Santa Cruz through our computational genomics laboratory, led by Beth Sheets, and so we're trying to really utilize these workflows that were established in 2021, and use workshops through ASHG and other venues to showcase that and collect information on our end, because we've just had such tremendous use of AnVIL. Great, thank you for that. We have maybe one or two minutes left on strengths, in terms of the analysis tools in AnVIL; does anyone else wanna comment before we move on to weaknesses? Luke. I'm not a discussant; is it okay for non-discussants? It's okay, go ahead. Thank you. I just wanted to say, for the third-party extensibility: the fact that you have planned, from an infrastructure standpoint, to allow for third-party applications to extend the system is beautiful. And not only are you allowing for these endpoints, you have, in the background material, very clear instructions for putting security first, enabling and empowering developers. And I also wanna applaud you for having maintenance and user support, because sustainability is a big issue. Thank you. Tim? Yeah, one thing I've loved; I mean, I'm often thinking in terms of interactive environments. First of all, thank you for setting all those up; it's been great. It's also wonderful that you guys have all the kernels already built, with updated R and everything like that. The amount of time I have to spend convincing sysadmins here to keep that stuff up to date... it's nice to have just one group doing that well, as opposed to fighting lots of groups to get that done. So I really appreciate that. Yeah, I would second that. All right. So why don't we move on to weaknesses?
So, with the tools that have been shown, both in the background materials and also in the presentation this morning, did anyone identify any weaknesses in what's currently there? Anshul? So these are not really weaknesses, more like a wish list. I'm glad that the topic of the model zoo came up, because in my ideal world, I would have the data, the code, the visualization tools, and a model zoo all encompassed in the same cloud. So I'm glad to see that that's potentially in the works, and I'd love to brainstorm more on those aspects. The second thing is, I think, that alongside these standardized data sets coming from consortia, which are a major source of data, there are additional efforts like recount3, which many of you are involved in, and of course Cistrome DB and so forth, that are uniformly processed versions of all of the data on GEO, which is also an amazing resource. And it'd be good to do some outreach and see whether those teams would be able to create portals for those data sets. Those aren't technically consortia, but they are really bringing the power of all public data, and most of it is basically open access, at least the versions of the data that exist there. So that, I think, would be very nice too. And of course, thirdly, I didn't specifically see any plans for incorporating a lot of the single-cell data from many of those consortia, HuBMAP, HTAN, and so forth. That's obviously slightly more difficult, but I just wanted to recommend that that hopefully is on the timeline as well. Thank you for those comments. I agree, the model zoo is really important. And I'm kicking myself that you brought up recount3 first; that is one of the groups that we've been reaching out to. They have a workspace; they're loading their data into it right now. So I think that'll be available, hopefully, by the end of this year. And then I agree, single cell is an important dimension.
Part of the complexity is just that there are so many different teams we wanna work with, and we just had to prioritize. So we started primarily with large-scale WGS data, but now that those pipelines are in place, absolutely, we're looking at other types of data, and single cell is definitely on our roadmap as something of interest. Can I just say one more quick thing? I think it'd be awesome to have an outreach group that hooks into the data coordination centers of all of these consortia, because I think if they inherently incorporate AnVIL as one of their primary platforms, it will just make everything better. Often there's a lot of fractionation, and AnVIL could be one of the strategies for bringing it all back together; inherently supporting AnVIL as part of their DCCs could be, I think, really nice. That's definitely part of our strategy. We've reached out to the DCCs at IGVF, GREGoR, CCDG, CMG, you know, all of you. Okay, great, great, great. We've gotta do a better job with them. Awesome. And why don't we make sure that that ends up in your session this afternoon on outreach and training, just to make sure that's captured as an opportunity; it sounds like one that you're already pursuing, Mike, but one that is fruitful and something the group should keep doing. Do you wanna go ahead? You had something to say? Yeah, I just want, especially now, to chime in on what Mike was talking about before. I'm not a discussant; I'm on the external consultant committee. But Mike, if you could give a little bit more of the detailed history about the previous ingestions, I think that might be helpful, along with how you'd describe what you're doing currently, briefly. Okay; a whole other parallel breakout session was about this, so I'll give you maybe the 30-second version.
It kind of started with our local collaborators and data types that we knew really well; specifically, I think, arguably it started with CCDG. CCDG is a very large project with hundreds of thousands of genomes, but it's organized into different cohorts, and importantly, each cohort has certain credentials that are necessary, certain restrictions on who can access those data, and then there's the umbrella project that it works within. So we did it cohort by cohort, where you work with the DCC to ingest it; there are pipelines built up to harmonize it, do QC, and populate the workspaces. And now that those workspaces are populated for individual cohorts, we can do the meta-analysis where we do joint calling across them. But more broadly, we're actively reaching out to groups that we're interested in working with. And also, if you look at NHGRI, they're starting to write into RFAs that a new consortium is strongly encouraged to talk to the AnVIL about how that interaction could work. So it's going both ways: some of this is coming top-down from NHGRI, and we're also doing a bottom-up approach, where we're reaching out to individual consortia to make them aware and to support them getting onboarded. Okay, thank you, Michael. Karen? I just have a very brief comment. Based on our team here at UC Santa Cruz, it sounds like there could be some improvement in the documentation of AnVIL organization using Dockstore. It seems like a lot of the information is on our side, kind of building up Dockstore, and maybe that could be an improved place for us to have engagement with AnVIL. Great, thank you. And Tim? Yeah, my comment might be along similar lines. I love that you guys have an opportunity to share workflows across groups and publish workflows. But right now, for example, I tried to search for some.
There are 2,000 methods and there's a simple filter bar, so it's really hard to find what other people have done. And I think it was the same with Dockstore: there are lots of facets, but they're not the facets that tell me what a tool does, in some ways, right? And so I think it's absolutely a hard problem. But the more there are ways to curate, organize, and present the workflows that are out there, in some way that helps people with certain analytical goals in mind get to lists of tools that'll help them, the better; I think making that easier would be helpful for a lot of people. I'm wondering what folks think about the tools that are currently there in the clinical genomics or genomic medicine space, and whether there are things that are missing that the team should be thinking about. Barbara? Yeah, I was just gonna say, I think there are a lot of methods that I would really be excited to see there for the interpretation of these SNPs, or for mechanistic interpretation. So I'd just like to throw out one that my group uses all the time: we love LDSC, and having all those annotations there to start understanding what's going on would be great; but others like colocalization analyses, or any of the mediation and Mendelian randomization ones, I think would be really great for starting to understand mechanism. I'd be really excited to see those. Great suggestions; I agree with all of those. And what about any related to, so I noticed in the future directions, one of the future directions in terms of data that may be ingested, especially from eMERGE, is some of the electronic medical record data. Obviously it won't be a dump of the full EHR, but my assumption is that it will be something like tables from OMOP, or something like that, since eMERGE is using OMOP as the data model.
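Of the methods suggested above, Mendelian randomization has a particularly compact core that is easy to sketch: the single-instrument Wald ratio, which estimates the causal effect of an exposure on an outcome as the ratio of SNP-outcome to SNP-exposure effects. The effect sizes below are invented; real MR tooling also handles multiple instruments, standard errors, and pleiotropy checks:

```python
def wald_ratio(beta_exposure, beta_outcome):
    """Causal effect estimate from a single genetic instrument.

    beta_exposure: SNP effect on the exposure (e.g. per-allele SD change)
    beta_outcome:  SNP effect on the outcome
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    return beta_outcome / beta_exposure

# Hypothetical instrument: raises the exposure by 0.4 SD, the outcome by 0.1 SD.
effect = wald_ratio(beta_exposure=0.4, beta_outcome=0.1)  # 0.25
```

Methods like these are natural candidates for notebook-style workspaces, since the inputs are just GWAS summary statistics rather than individual-level data.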
Are there tools that folks can think of that, if there were some OMOP data tables there, for example, you would want to see so that users could do clinical genetic or clinical genomic linkage with the EMR? I guess I'll throw out one that I thought about, although I'm looking at Chunhua and George, and I can't believe you don't have any suggestions. Oh, George, well, okay, good. But one thing — and I was actually just emailing with Ken and Michael about this week — we're developing different EHR phenotype algorithms, and the Phenotype KnowledgeBase (PheKB) is a common place to put these rule-based algorithms. It did occur to me that in the future, if some of the medical record data is in AnVIL, being able to take these rule-based algorithms and deploy them in AnVIL could be really useful. Right now what happens is that you put the algorithm into PheKB, and then folks download it and rebuild it within their own system. But that might be a future space: if there are data tables there, things like diagnosis codes, labs, or procedure codes, being able to take algorithms that are in PheKB and deploy them in AnVIL might be a future use that would be valuable. It's certainly not there now. And George, did you wanna say something? Yeah, I was just gonna say, I can offer what kind of tools would be available, say from the OHDSI initiative, for processing clinical data, and how that might align. We're going through this for the All of Us Research Program, which has OMOP on one side and the genomic data on the other side. How do you pull those together? It's mostly Jupyter Notebooks over there and OMOP-related tools on the other side. So I'd be happy to spend time offline going over what the possibilities would be. It's mostly based on R.
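The deploy-a-PheKB-algorithm-against-OMOP-tables idea discussed above can be sketched in a few lines. This is a toy illustration, not anything that exists in AnVIL or PheKB today: the table shapes follow OMOP CDM naming (`condition_occurrence`, `drug_exposure`), but the concept IDs, the rows, and the rule itself are invented for the example.

```python
# Hypothetical sketch: a PheKB-style rule-based phenotype algorithm run
# against OMOP-shaped rows. Concept IDs and the rule are made up.

# Minimal stand-ins for OMOP condition_occurrence and drug_exposure rows.
condition_occurrence = [
    {"person_id": 1, "condition_concept_id": 201826},  # e.g. a diabetes code
    {"person_id": 1, "condition_concept_id": 201826},
    {"person_id": 2, "condition_concept_id": 201826},
]
drug_exposure = [
    {"person_id": 1, "drug_concept_id": 1503297},      # e.g. a qualifying drug
]

def case_definition(person_id, dx_concept, rx_concept):
    """Toy rule: >= 2 diagnosis codes AND >= 1 qualifying medication."""
    n_dx = sum(1 for r in condition_occurrence
               if r["person_id"] == person_id
               and r["condition_concept_id"] == dx_concept)
    n_rx = sum(1 for r in drug_exposure
               if r["person_id"] == person_id
               and r["drug_concept_id"] == rx_concept)
    return n_dx >= 2 and n_rx >= 1

# Person 1 meets both criteria; person 2 has only one diagnosis code.
cases = [p for p in sorted({1, 2}) if case_definition(p, 201826, 1503297)]
```

The point of deploying such a rule where the data lives, rather than rebuilding it per site, is that the concept IDs and thresholds travel with the algorithm instead of being re-implemented by hand.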
I think it fits in very well with what you've already done, but there are some special-purpose stack tools that have their own user interfaces. So it depends on how sophisticated a clinical analysis they're gonna do on the data in AnVIL. But if we look forward — and we're talking about the future of AnVIL, not the current state — there may be an increasing amount of clinical data. And if you have 100 million patients with a million variables per patient, these are the tools you use for that. SPSS, SAS, none of that works when you have 100 million by a million; these tools exploit the sparsity. So you might be able to use some of them — lasso over, you know, trillions of combinations — and they might be useful even on the genomic side. That didn't occur to me until this second: there may be things there that would be useful on the genomic side too. Great, thank you, George. So, anything else that folks think is missing in terms of weaknesses, and then we can go on to opportunities. Anshul? Yeah, just following up on Tim's idea: I think one of the hardest problems in any of these kinds of portals is going to be doing search and recommendation effectively. So one thing I was wondering is whether there is any kind of anonymized logging of how users are putting together different pieces, right? Data with code, with models, with workflows. Because you can imagine that the data has really good metadata and search built in; it's hard to know how to do that exactly for workflows and such. But if you connect workflows back to models, the way users actually use them, you may be able to use the gigantic metadata you have for the datasets to shape how you present workflows and so forth.
So I'm just wondering if there is any infrastructure planned to at least have a mechanism to potentially anonymously log how different parts of the entire ecosystem are being used together. The same issue will happen for models too. It's very hard to have some kind of standardized metadata, but if you can hook it all back up, and you have these knowledge graphs and so forth, you could imagine doing very powerful search and recommendation. So it's just something to think about, potentially on the back end. It's a great question, and it's something that we're interested in, but like you said, there are some really major security considerations that we're mindful of. I see David Burnett's on the call; he's kind of our chief security officer for all of AnVIL. To achieve this thing called FedRAMP certification, we're mandated to have a lot of logging and behind-the-scenes accounting of who's doing what on the system. So at the very lowest level, we know who's touching what datasets, and we have some information about what they're doing. I'm with you, though; it's a real tension. At some level, we would love to capture everything, make it anonymous, and show it off, but there are some real privacy considerations. So in the here and now, we do not make those generally available. But I agree that that is an important source of metadata. In a different context, on some of the public servers for Galaxy, we have been mining the metadata about which tools are very popular to make sure that we have good coverage of them inside of AnVIL. So I kind of see it from both sides.
As a research endeavor, it's great to have that information, but for security and privacy considerations, we're on purpose not capturing super fine details right now. Yeah, that's a fair characterization. Sorry, no, I was gonna say that's a fair characterization. There are times when I want to log a lot more for security reasons, and we have this balance; this is where we are on it: you can run things, we capture some stuff, and we can't capture everything. I completely agree. I think it's a very difficult problem on the security front, but I just wanted to put it out there. My second comment is that I think one very useful piece missing in almost all genomics data portals and ecosystems is a mechanism for feedback and public discussion. One simple way is to just have votes and let people rank workflows and tools, so there's an automatic user-curated community around these tools. But even better would be having some mechanism like Discourse, or one of these more formal mechanisms, where you can provide feedback: "oh, I think there's a bug in this," or "this thing doesn't work for that," right? Having that kind of user commentary is super useful, because the feedback can help everybody, including the developers of each of these tools. We have both of those. You can star workflows if you like them, and we also have a Discourse where you can make general comments; I'll put the link in the Zoom chat. Amazing, okay, so you thought of it already, great. Great. Anything else on weaknesses, or shall we switch to opportunities? So where can AnVIL grow and improve? And there are certainly a few discussants who we haven't heard from yet.
I won't call you out, but feel free to either comment or raise your hand if you're one of those who tends to stay quiet. All right, Chunhua. Okay, so I think one area that can potentially be extended is this interoperability model for Terra. Looking at it, it defines some data fields, such as age and anatomical site, but it applies no additional data standards to standardize the content of those fields; they are just strings. I wonder if additional data standards — either OMOP common data model concepts, or UMLS, or related standards — could be adopted to make sure the content that goes into these fields is further standardized, to enable interoperability and to enrich and extend the current Terra interoperability model. It's a great question. I'll just say the challenge is that we're trying to support so many diverse NHGRI consortia. Within a consortium, after maybe some infighting, they can sometimes settle on a standard, but trying to get every consortium that NHGRI participates in to agree on one master standard is a challenge. That being said, we are pushing and advocating for those sorts of developments. To get started, it's free text. As we move forward, I think more and more is going to be integrated into, say, FHIR as a container, and I'd like to think that we could standardize on some of these ontologies moving forward. Realistically, it's more of a social problem: getting all of the NHGRI researchers to adopt these standards. And another thing is to make sure that when you say age, everybody understands whether it's age at diagnosis, current age, or some other age, so that all the data contributors are putting the same age into that data field. So they agree. Mark?
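The free-text-to-standard-concept problem raised above is essentially a normalization step. Below is a minimal sketch of that idea only; the mapping table and the concept IDs are invented placeholders, not real OMOP or UMLS codes.

```python
# Hypothetical sketch: map free-text field values to standardized
# concept IDs. The mapping and IDs below are invented for illustration.

SITE_TO_CONCEPT = {
    "left lung": "C0001",
    "lung, left": "C0001",   # differently-entered strings, same concept
    "brain": "C0002",
}

def normalize_site(free_text):
    """Return a concept ID for a free-text anatomical site, or None."""
    key = free_text.strip().lower()
    return SITE_TO_CONCEPT.get(key)  # None flags an unmapped value

# Two different contributor entries resolve to the same concept:
a = normalize_site("Left Lung")
b = normalize_site("lung, left")
```

The value of this kind of layer is exactly the interoperability point made in the discussion: two consortia can enter a field differently and still be queryable together, while unmapped values (`None`) surface what still needs curation.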
Yeah, so my impression — and correct me if I'm wrong — is that most of the AnVIL users to date are members of NHGRI-funded consortia. And it seems to me that an opportunity is to broaden that user base to every R01 PI funded by NHGRI, or better yet, every trainee on an NHGRI-funded training grant. So maybe this opportunity goes well beyond just the analysis tools discussion we're having here, but it seems to me that broadening this user base is an opportunity. And perhaps some of the challenges that come along with that are that, as you broaden, you have this growing plethora of tools that people want to bring, as well as other datasets: individual investigators wanting to bring their small datasets to join with the large NHGRI datasets that are already there. Well, perhaps what we should do is catalog that. I do think that will come up in the afternoon outreach and training session, but in terms of tools, I think you've raised a good point: as the outreach grows, make sure there is a mechanism for the AnVIL team to collect what the tools are, because there's an opportunity to really expand the tool base as the user base expands. Users are going to say, these are the tools we want to bring to the data that you have. William? Yes, thank you. I'm going to follow up on Mark's comment as it relates to expanding the footprint. You can call me Bill; William shows up on the screen, but it's mostly Bill. I'm sorry. I was thinking that probably the greatest potential for footprint expansion lies beyond the traditional STEM fields and disciplines, to reach a wider variety of people. A lot of those students, postdocs, and trainees may not be comfortable just hearing the standard techie dialogue; they may feel that they don't have good access just listening to the conversation.
So I was wondering: there may be an opportunity to provide tool-based use cases as entry points, to get people in those areas more comfortable and to help them understand how they can really use this to solve problems we may never have thought of, because we are not in that space. But that's where the actual growth and impact really are, in my opinion. And when you look at it from that perspective, there's probably an opportunity for tools and outreach to work together, where tools can really be a draw for the outreach community to expand the footprint. I can just summarize it like that and stop right there, I guess. Bill, thank you for that. I think that's very helpful. And to echo what Ben Spahnam said earlier today, increasing the diversity of the workforce in this space will require the outreach, but also having the tools available to them. Because for a lot of students, especially in non-computer-science disciplines, the idea of plugging into an API and setting up a VM is just not something where they're gonna even know what we're talking about. And that's where the majority of the impact is gonna reside: beyond the traditional fields and disciplines we are so comfortable with. That's where the societal impact will occur. That's right. We're definitely gonna continue this discussion in the breakout session for outreach and training. That's a really great point, Bill, and I really wanna highlight that it was brought up in a different session too. So thank you. Thank you. Tim. Yeah, thank you. In terms of analysis, a lot of the people that I work with in my lab are the kind of people who know enough Unix and Python to be dangerous. But if I also have to get them to learn Google Cloud and a whole bunch of other stuff, little APIs, et cetera, forget it.
They're just not gonna engage with what I think could be some really cool opportunities for using AnVIL. So I think there's an opportunity to make the interactive environments even easier. What I mean is, I have this dream that one of my students could log into AnVIL, go to the ENCODE directory, and all of ENCODE is there. Then they can just start pulling what they need, as opposed to having to get the Google Cloud addresses and copy specific pieces of data into their workspace. I can imagine the same with GTEx, and maybe, with the right dbGaP permissions, even having all of them. If it could be just that much closer, you'd be able to pull in the students and trainees who aren't in bioinformatics or computational programs, but know enough to look for the features they want across 100 BED files, if those files were all in the same place. Yep, great point. Barbara? Yeah, just to echo things that people have said: Bill's point I think is really essential, and Tim just made a really good point too. I wanted to add that some of the datasets we've been working with recently are very heterogeneous in terms of population — not just UK or African data exclusively, but broadly admixed individuals — and they're also especially fragile. Obviously the Million Veteran Program would fall in here; I've heard that they're developing, what is it called, a post-trauma database as well. And I definitely work with the Fragile Families Study, which covers underrepresented children who are mostly enriched for low socioeconomic status. And none of those databases feel comfortable putting their data up publicly.
So I feel like we have two opportunities here. Number one is creating essentially a safe space for them, where they feel comfortable putting their data up and granting access through a particular granting mechanism — controlled access where it's really, really safe — as opposed to all of us throwing up desktop servers locally at our universities and begging them for the data. That would be great. Secondly, encouraging people to do the right thing with respect to diversity is easy when we have the ability to put whatever methods we want on there. So if we include methods that handle highly admixed individuals for particular types of genome-wide association studies and PRSs — although I know they don't necessarily exist yet for some of these categories — I think that would really encourage people in the community to think more widely about who they include in their studies and what studies they include. Yeah, great points, thank you. Karen? I just wanted to make a brief comment: a lot of the datasets that we've been discussing so far, these big siloed consortium-based data, are usually mapped and analyzed against a single reference genome. We're about to enter a new era where there are lots of different reference genomes, and tools that allow for a liftover equivalent or something to increase fluidity in making comparisons between different reference genomes — and to make new inferences with genomes that offer more genomic diversity — will, I think, be really critical for tool development. I suspect there are many consortium efforts, including the HPRC and the T2T Consortium, that are building these tools, but having this accessible across all the different consortium groups would help unite new discoveries and really help across the board. Very good point. Barbara, do you have another point, or is that your hand left over from before? Okay, just checking. Any other opportunities?
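The liftover idea mentioned above reduces to mapping a coordinate from one reference to another through alignment blocks. Here is a toy sketch of that mechanism; the block coordinates are invented, and real liftover would use chain files via tools like UCSC liftOver or pyliftover rather than this hand-rolled loop.

```python
# Toy liftover sketch: map a position between references via alignment
# blocks. All coordinates below are invented for illustration.

# Each block: (src_start, src_end, dst_start), same strand, half-open.
CHAIN_BLOCKS = [
    (1000, 2000, 5000),   # src [1000, 2000) maps to dst starting at 5000
    (3000, 3500, 9000),
]

def liftover(pos, blocks=CHAIN_BLOCKS):
    """Return the destination coordinate for pos, or None if unmapped."""
    for src_start, src_end, dst_start in blocks:
        if src_start <= pos < src_end:
            return dst_start + (pos - src_start)
    return None  # position falls in an unalignable gap between blocks

mapped = liftover(1500)   # inside the first block
gap = liftover(2500)      # between blocks: no equivalent coordinate
```

The `None` case is the important one for the cross-reference comparisons discussed: not every position has an equivalent on another assembly, which is exactly why shared, well-maintained liftover tooling across consortia matters.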
I'll say this out loud in case folks are not following along in the chat. Teri Manolio made a point, and I've heard this statement before: especially as we think to the future, where there would be sets of medical record data in AnVIL, being able to search across patient records and find an individual who looks like — the quote is "patients like Ms. Jones." This is a Dan Roden quote; if you've been at a genomic medicine meeting with him, you've heard him say this. And that matters especially as we think about running machine learning algorithms across electronic health records and identifying interesting longitudinal patient trajectories, or just interesting characteristics of patients. One of the things we struggle with is: is this unique, or is this a trend? And you want to run that kind of model against other patients. It's hard enough to do that within your own electronic health record data, but doing it across sites is very complicated, especially because, to the earlier point, a lot of health record data isn't easily shared. So if there were a way to do something like that in AnVIL, that would be amazing. I really like that. At the genomics level, we have a good start on this. One of our major components is Gen3, which takes the data, metadata, and whatever phenotypes are available at the genomics level, so users can build what we call synthetic cohorts, where patients or populations with certain characteristics across many different consortia can all be aggregated together. I see that as an incredibly powerful way to take data that we already have and make it more valuable, so it can be repurposed for different studies. I think that will come in time for health care data, using FHIR kinds of representations and whatnot, but yeah, I agree. Moving forward, that's going to be an important area of focus. Great.
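The "synthetic cohort" idea described above is, at its core, a filter over harmonized sample metadata pooled from multiple consortia. This sketch only illustrates that shape; the records, field names, and consortium labels are invented, and it is not how Gen3 is actually implemented.

```python
# Hypothetical sketch of a synthetic cohort: select samples matching
# shared phenotype criteria regardless of which consortium holds them.

samples = [
    {"id": "S1", "consortium": "CCDG", "sex": "F", "diagnosis": "T2D"},
    {"id": "S2", "consortium": "CMG",  "sex": "F", "diagnosis": "T2D"},
    {"id": "S3", "consortium": "CCDG", "sex": "M", "diagnosis": "none"},
]

def synthetic_cohort(records, **criteria):
    """Return IDs of records matching every criterion, across consortia."""
    def matches(r):
        return all(r.get(k) == v for k, v in criteria.items())
    return [r["id"] for r in records if matches(r)]

# One query spans both consortia's contributions:
cohort = synthetic_cohort(samples, sex="F", diagnosis="T2D")
```

Note that this only works because the fields are harmonized first — which is why the earlier point about standardizing field content (age semantics, ontology-coded values) is a precondition for cross-consortium cohorts.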
All right, we should probably switch to threats, as we're running out of time. So threats here would be: what factors jeopardize AnVIL moving into the future? George? Thanks. First, let me say I want to talk about complexity as the threat. We've already touched on it, but I think that's the big threat. As you build this thing up, it gets more and more complex. The insiders don't see it as so complex because they're used to it, but then new people can never get in, because it's just grown in such a way. We struggle with this in OHDSI with our tool set, and with how to bring new people into the community who haven't been there from the beginning. And there's no easy answer. What happens is that every 10 or 20 years some savant comes along, throws away all the workshops and focus groups, and just knows what to do and does it. So FHIR: Grahame Grieve just said, this is ridiculous, after 10 years of the RIM — we're going to do this simply, and I know what we need — and he built FHIR, and then people caught on. And now FHIR is actually getting very complex. So whether there'll be a Grahame Grieve 10 years from now who redoes it and does a new, simpler thing, I don't know, but every once in a while you need to revisit and simplify. I couldn't agree more. If I'm honest with myself, it is such a cultural shift from the way genomics is traditionally done — on your own laptop, on your own on-premise institutional systems — to move into a cloud model. In the chat, I see lots of people talking about costs, and I'm totally with you. It's just a cultural shift. There has always been a cost associated with these analyses, but to put it front and center and really shine a spotlight on it is intimidating; it's terrifying at times.
You know, I've had students click the button on a $50,000 workflow, and it's just a very different experience from buying a server. What I see is that we're moving into what I call a consumables model, right? Now, if you want to sequence a genome at, say, Illumina, you know that it's gonna cost you, I don't know, a thousand bucks or so. If we can have that sort of model for popular workflows, it'll be: oh, variant calling will cost you $10, or $5, or whatever it is. The cost per sample tends to be pretty small. Even these days, for whole-genome assembly, especially with HiFi data, we're talking, I don't know, $100 or something; the individual cost per sample is not so high. As we move into studies with 10,000 or 100,000 samples, that's where it adds up. But in my mind it's more the cultural shift: the way it's executed, the way it's paid for, is so different. I definitely see that as a threat. It's a complexity. We're trying to address it at technical levels to make it simpler; we're trying to provide documentation; we're trying to help people. I'm very pleased to hear what Anshul said, that once his postdocs were encouraged, it took them a couple of days to figure it out, but I'm the first to admit it's a big cultural paradigm shift as we move into cloud computing. Yeah. And I just want to echo that the runaway costs, I think, are the scariest part. That happened to me — not in AnVIL, but in a different cloud system. We had something run over the weekend that didn't write a file, and it was $9,000, but nobody knew it was even running; we thought it was dead. And then we got this bill. So yeah, the runaway costs are real and frightening. Bill. Yes.
I just want to add that a potential threat could be that, as this wonderful tool is continually developed — as it should be — there may not be a conscientious and unrelenting effort to make it available and accessible to people we may perceive as non-experts. We have to make a conscious effort to bring that cadre of investigators along. I don't know what that would look like, but we've got to think about it. Yeah. Thank you. Tim? Sure, yeah. So I guess on the cost point, part of it is that it's really hard to estimate costs. I don't mind if it costs thousands of dollars, but it needs to come from somewhere, and if I can write a grant for that, that's great. But if someone asks me how much it's going to cost to analyze a new dataset that we've generated, I don't know, because I don't know how many times I'm going to have to align it, and that's going to change over time. So I think that threat can be mitigated with the right funding models. Maybe there can even be administrative supplements to grants to support this. For a grant that's already been running for four years, we didn't allocate cloud costs, right? So we can't switch to the cloud for that, because we don't have any funds for it. But if there were ways — whether through supplements, or through some estimation of anticipated costs that makes it easier to write grant budgets and to feel confident they're in the right ballpark — the easier that is, the less there is to worry about. So that's on the cost front. Another threat — and this is coming up with things going on at Duke, for example — is that there are other clouds out there, and AnVIL seems very tightly wedded to Google. I'm worried that there might be other silos building up in other clouds, and it's going to be hard to get that data in here. Yeah, Tim, that's a great point.
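The budgeting difficulty Tim describes above is partly a parameterization problem: the unknowns (how many re-alignments, how long data sits in storage) can at least be made explicit. Below is a back-of-envelope sketch of that idea; every rate and parameter is a placeholder, not a quote from any cloud provider or from AnVIL.

```python
# Hypothetical grant-budgeting sketch: make the cost unknowns explicit
# parameters. All rates below are invented placeholders.

def estimate_cost(n_samples, cpu_hours_per_sample, rate_per_cpu_hour,
                  gb_per_sample, storage_rate_gb_month, months,
                  realignments=1):
    """Rough compute + storage total; realignments captures re-analysis."""
    compute = (n_samples * cpu_hours_per_sample
               * rate_per_cpu_hour * realignments)
    storage = n_samples * gb_per_sample * storage_rate_gb_month * months
    return compute + storage

# e.g., 1,000 genomes, re-aligned twice over a 4-year grant:
budget = estimate_cost(
    n_samples=1000, cpu_hours_per_sample=32, rate_per_cpu_hour=0.05,
    gb_per_sample=60, storage_rate_gb_month=0.02, months=48,
    realignments=2,
)
```

Running the same function across optimistic and pessimistic values for `realignments` gives the kind of ballpark range a budget justification needs, which is closer to the "consumables model" per-workflow pricing mentioned earlier in the session.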
And that's one that I was just thinking about: as other institutes as well as other commercial platforms emerge, we need to worry about interoperability for AnVIL. Just this week, the UK Biobank, for example — their research analysis platform now has 450,000 exomes in DNAnexus, which is a different platform that is not necessarily interoperable. And we only have about two minutes left for threats before we go to summing things up. So I just want to give a chance: there are a few folks in the discussant group who haven't had a chance to speak yet. Would any of you like to speak up? Anyone else? Anyone who's not a discussant who would like to make a comment about a threat? This is perhaps more for the outreach section, but what about onboarding single-user researchers? So not groups, not labs. As it stands, it's a very expensive process, I guess, to set up a billing account, get a credit card, get it all approved, just to trial something. So what about streamlining that process so there are pilot versions of tools available without having to go through the full signup process? Alternatives already exist in other forms to some extent, and they're pulling people away from AnVIL. Are those people ever going to consider AnVIL, given that signup process? That's an interesting point. Well, we're right at about time. Ken, do you feel ready to start to go through the summary? Yeah, I'm about as ready as I'm going to get. So let me share my screen. Let's go. Okay. So I apologize in advance. Can you still see my screen? Yes, we can. Okay. There we go. All right. So for the strengths — actually, let me get out of this; I may have to make edits to it. So for the strengths, what I had listed was: documentation, tools, and workflow setups are done very well. Data access seems to be a strength for AnVIL.
The variant interpretation tools and workflows, and the ease of developing new tools in the workspace, were considered a strength. The plans regarding informing third-party groups on how to build on the platform were considered another strength. And putting security first in regard to those third-party tools and workflow plans was what I had for strengths. So before we see whatever I missed: is there anything on these bullets that needs to be rephrased, if I didn't capture it accurately? Okay. Is there anything I missed that should be added in this section as a strength? Going once, going twice. All right. So that's what we have for the strengths. I'm gonna go to the weaknesses. What I had listed was: developing tools for analysis on open-access datasets; developing more tools for single-cell analysis. There should be improvements in the documentation related to accessing and using Dockstore. It's hard to find what tools and workflows people have already developed on AnVIL, so improving the curation of those tools and workflows. Tools for the interpretation of SNPs, better annotation, and mediation and Mendelian randomization analyses were considered missing — a weakness of AnVIL. Tools that allow for the analysis of datasets built using clinical data models, and possibly having algorithms from PheKB ported over to AnVIL for analysis, as well as the OHDSI tools — AnVIL not having those resources was considered a weakness. Mechanisms to anonymously log usage as an option, and improved mechanisms for feedback. Do any of these bullets need to be rephrased or corrected? I was typing very fast, so I may have missed a couple of things — I'm pretty sure I missed several things. Okay. Would anybody like to add anything? Wow, okay. I'm doing pretty well for somebody who has a thumb with four stitches in it; that's pretty good.
So, for opportunities — where AnVIL can grow and improve. The first is adding additional data standards and data models to improve the interoperability model for Terra; I need to rephrase or restate that. Improve the ability to search all available records for matches to an individual patient — I think this is what Teri had highlighted in the chat. Improve the ability to link the necessary tools to expand and diversify the AnVIL user community. In addition, create a safe space for groups that are hesitant to host diverse datasets in public repositories. So this is what I have for the opportunities. Do any of these bullets need to be revised? Am I missing something that was mentioned that should be included here? This is where we said something about — well, I guess it's kind of in the third bullet — this idea that through outreach we think we'll identify many more tools. I guess that's there. Oh, what about the tools-related opportunity: as more diverse human genetics datasets become available, as well as additional reference genomes, having tools that accommodate or explicitly focus on admixture and diverse populations? Right, I just wanted to say also, I think the third bullet did capture the spirit of what I was trying to say there. Specifically, I was just using that example off the top of my head: tool-based use cases could be a way of getting non-experts more involved, and that creates an opportunity, in general terms, for a collaboration between tools and outreach, with tools used as a draw to expand the footprint. All right, so one moment. Let me make sure I get the — so, Bill, I have a note put right here and I'll fill this out. Let me first get to the one about accommodating admixture and diversity. What was the last part of that — could you finish it? Diversity of the human genetics datasets and the reference genomes.
And just to comment briefly on that point: that's also to ensure backward compatibility between the references, so any liftover tools or things that can allow people to move quickly between them would be useful. All right, so Bill, what did you have on yours? Sorry, I want to make sure I captured this — could you say that again? Yeah, sure. I think your bullet certainly captures the spirit of what I was trying to say, but specifically I suggested that tool-based use cases could be developed, and that would be a way of introducing, encouraging, or making the AnVIL platform more accessible to non-expert users. And out of that, it seems that a general approach could be a collaboration between tools and outreach, where tools could really be the drawing factor to expand the footprint as well. But your comment captured the spirit of it well. Okay. Does this work here? Yeah. Okay. All right, so this is what I have for the opportunities. Does anybody wanna make changes or add anything? One other thing that I just thought of that we didn't talk about: I do wonder if there are other tools, specifically around genomic medicine implementation, that there will be opportunities to house in AnVIL. I'm not entirely sure I know what tools, but I'm thinking that as more institutions around the country start to implement genomic medicine — certainly, PharmCAT is one that we already heard about — there will be other tools around the annotation of relevant clinical genomic variants and pathogenic variants, and maybe something around clinical decision support and nudges, and the build information for those, that could be tested or deployed. I'm sorry, this is not necessarily helpful as you're typing, but I do think there's gonna be opportunity there. I just don't fully know what it is yet. Oh, that's fine.
I mean, we don't have to know; this is looking to the future. It would be up to us to figure out what those are. Yeah.

I mentioned it more when I had the opportunity in the talk, but building off Marilyn's point: just more support for this kind of end-to-end workflow from a CRAM, to empower people to pull out all the different classes of variants from a clinical perspective, and then, yes, figure out how to interpret them and think about reporting them back. So that whole workflow, I think there's a lot of opportunity there.

All right, so did I capture this, or did I miss something? Just so I'm clear. I just think the idea, like from CRAMs to variants to an interpretation, is the path I'd wanna make sure is covered. Okay.

And I just wanted to comment here briefly as well, since it was brought up that we should have tools tailored to the clinical community, that it might actually be useful to have tools directed toward the educational space that could support graduate coursework or something along that line, to implement training with clear costs for running certain tools. And I think that would really address some of the concerns that were brought up in this group for outreach too.

You wanna focus on graduate training or training in general? I just wanna make sure. I think training in general is appropriate; graduate is an example. And I might add basic and clinical to that: basic scientists and clinical researchers. And we only have maybe one minute left, so do you wanna get to the threats?

Yeah. So for the threats, a recurring one was cloud costs; difficulty in facilitating culture shifts to the cloud; challenges in making tools and resources available in a manner that meets users where they are; difficulty in making AnVIL interoperable with other platforms; and hurdles required to access AnVIL just to test the platform. Do any of these bullets need to be rephrased? I may not have captured them correctly.
Did I miss anything that anybody would like to have added? Luke. With respect to the clinical genomics, I just wanna make sure that we're considering liability. Okay. Great point, Luke, thank you.

All right, well, it looks like our breakout room is closing in 40 seconds. Thank you all very much for your input. I think this was really great. Yes, thank everyone for your input. Marilyn, I'm gonna email this deck to you so you can go and use it. Okay, thank you. Thanks everyone. Thank you, thank you. Thanks for leading the session.

Hey everyone, awesome. Welcome to session number two. We'll hang out just for a second or so to see if anyone else is gonna come into the room. Thanks for putting up the slides, Chris. Do we have everyone we're expecting, or do you think we're waiting on anybody else? We should be good if everyone has come back from the break. Could you just do a real brief run-through on the intro slide? Please, thank you.

Okay, happy to do this. So yes, I'm Chris Wellington, I'm one of the co-moderators, along with Sid, who'll be doing most of the work here during this session. And you've all been in a previous session, so I don't wanna spend too much time on this, but basically we're gonna have a couple brief presentations, and we're gonna set aside most of the time for discussion. At the end, we're gonna try to save a little bit of time to get ready for the report back, like what you just heard from session one. So with that, I'll stop sharing and turn it over to you, Sid.

Oh, actually, if you could keep your slides up. Let's just go to the next one. Yeah, I was hoping we could take a second to meet our discussants, or our panelists. I'm from Nashville, so full disclosure, I make everything a music reference, and you guys are gonna be put on the spot. If we could, you guys go ahead and unmute yourselves. And I just wanna hear where you're from.
You know, your connection to AnVIL and your favorite musical artist, or your favorite artist, but I would say obviously a musical artist would be appropriate here. So if we could just very briefly go down the list.

Good afternoon everyone, my name is Cinnamon Bloss. I'm from the University of California, San Diego, and I'm actually a brand new member of the External Consultant Committee. So I haven't had a lot of involvement with AnVIL up until this point, but happy to be here today. Favorite artist, favorite band, Cinnamon? I'm not willing to commit at this point.

Titus? Carol? I guess I'm next. Titus Brown, I'm at the University of California, Davis, in the School of Veterinary Medicine. I have had long associations with various different components of AnVIL. And in addition to doing a lot of training, I also lead the training, outreach, and engagement component of the Common Fund Data Ecosystem, which is hoping to interoperate with AnVIL and NCPI and all that other stuff. I think the last thing was favorite music. I'll just say one of them: Imagine Dragons, works great for long car trips.

Over to you, Carol. Great, hi everybody. Carol Bult, I'm a professor at the Jackson Laboratory and I'm on the AnVIL ECC group. Favorite? I guess it would be a toss-up between the Canadian Brass and Seraph Brass. I'm a big brass music fan.

Awesome, John. Hi, my name is John Creejan. I'm an associate professor at Howard University, and we ran a data science training program over the spring, and the AnVIL team supported us and participated in that program. So we're very appreciative of their support and participation. Awesome. What about music? Yeah, I'm gonna be shy on that one.

Go ahead, Andrew. Hi, coming to you from about six miles from my colleague, John. Andrew Lee, Northern Virginia Community College, Alexandria, working with AnVIL through the GDSCN.
So I have been climbing up that steep learning curve of AnVIL over the last year or so. And to go with that hometown theme that Sid was talking about, I'm gonna pick DC's own Thievery Corporation, kind of good background music if you need to do some work. So Sid's there with me, yeah.

Cool, Rob. Hi, I'm Robert Miller, professor of neurobiology at Morehouse School of Medicine. I run a lot of the genomics projects that go on at Morehouse School of Medicine, which is one of the HBCUs in Atlanta, Georgia. And my favorite band would probably be James. Yeah, I'm seeing a UK bias there, Rob, but okay. And a Manchester bias as well, but there you go.

Peter? Dr. Robinson? Okay, chime in if you can, Sourav. Hi, I'm Sourav Roy from the University of Texas at El Paso. I'm an assistant professor of computational biology. And my link to AnVIL is, again, through the GDSCN, like Andrew. And my favorite artist is Cliff Richard.

And last but not least, Bill. Hi, my name is Bill Sutherland. I'm a professor of chemistry at the Howard University College of Medicine, and I'm also the PI of Howard's RCMI program. And I don't have a long history with AnVIL; it's primarily through the activities that John mentioned, which is a virtual data science training program.