And without further ado, we'll move to the next topic of this first meeting, which is a roundtable dedicated to some of the many questions you have raised. So I will pass the floor immediately to José.

Okay. So yes, we are happy to start this roundtable. Its purpose is to discuss how to shape the group, and it is a pleasure for us to host its participants. First, Phil Ewels, who is a co-founder of nf-core and currently works at Seqera. Sven Nahnsen couldn't join, but just in case he steps in: he was involved in the creation of nf-core, and he is from QBiC in Tübingen. His research is more focused on human genomics, but he has also been involved with this group before, as many of you know. Also Sarah Djebali, who is currently an Inserm researcher working at IRSD, and Sylvain Foissac, from INRAE Toulouse, who were both highly involved in the GENE-SWitCH project in EuroFAANG. And of course Christa, our speaker, who has already been nicely introduced by Cedric and was the coordinator of the BovReg project. As the roundtable will be about how to shape the special interest group, I will first let Phil introduce the idea behind the nf-core special interest groups. So yes, Phil, the stage is yours. Please.

Thanks very much. I won't take up much of your time. Firstly, thank you to all of you for being part of nf-core and being involved. And thank you, Christa, for a fantastic talk. It's amazing to hear about the impact that the community has, and to hear back from people in the community about how the pipelines are being used and how that's affecting the way people work. So I really appreciate that. Thank you very much, and it's great to have you all here. We're meeting because of this new nf-core concept called special interest groups. I just wanted to point to a couple of resources about them.
I put out a blog post just the other day, so this is the main place to go right now for any details. You go to Community and Blog, and then you'll see there's one called special interest groups. It sets the stage for what we're doing and why, with a little section about your group at the bottom. At an overview level, the idea behind this was that until now we have really been focused, as you said, Christa, largely on developers within the community: bringing developers together to build pipelines and trying to help people collaborate and share ideas and development practices. Because of that, the structure of the community has been focused on the pipelines themselves and, to some extent, the tooling. But what we're starting to see more of is quite a lot of work with different external communities and consortia, the tightest integration being with BovReg and EuroFAANG. What's really interesting there is that we have groups of people with common interests, but they're not focused on a single pipeline. They're working across a set of pipelines, but in a specific manner. So this perpendicular axis is the founding idea of special interest groups: that we can help people collaborate not just across multiple institutions on a single analysis type, but also across specific analysis fields. And by bringing together people from all over the world, all working on similar ideas, we can hopefully collaborate on standardization, not just of the pipeline code, but of the way that specific pipelines are run and the way that specific data is handled. So that's it, really. If you're interested in more, I encourage you to have a read through this blog post. It explains a little about the mechanics that we've decided on within the nf-core governance for how special interest groups will function.
You can find yourselves, and any future interest groups, under this dedicated page. We really hope that this will expand over time and that all of you will help contribute to these pages. Like any other page on the nf-core website, if you just click the edit button at the top right, it takes you straight to GitHub, and it's just Markdown files. So you can submit a pull request and edit and add to these pages as you see fit. Most of all, I want to hear from you all about what you need and what would be useful for you as end users and members of the community. Because this is new. We've never done this before. We just want to be as useful as possible for the people in the community, so we're very much on the listening end rather than the dictating end of this whole process.

Okay, happy to do this. Yes, I have to maybe unpin you. Okay. Yes, thanks, Phil. Where are you? Okay, sorry. Okay, you are not spotlighted anymore. So thanks for the nice introduction to the special interest groups. To make the discussion a little more agile, we have prepared some topics to discuss. But as Christa also had this nice slide with some questions, maybe we can start with one of those somewhat provocative questions, about how we see the equilibrium between the progress of Nextflow and nf-core, standardization, and productivity. Do you think that this could be an issue at some point? This question is maybe for you, Phil, or for Cedric to answer: how do you see this equilibrium?

I'd love to take a crack at this. I thought about asking a question, but as we were going to be doing a roundtable, I thought I'd save it. Yeah, this question of standardization and doing the final mile, which is how I often think about it.
You hear about this with public transport: you get dumped by a train in the city centre, but how do you get everybody out to the final little destinations, which are different for everyone? We have the same problem in nf-core. We want everybody to collaborate and we want everyone to standardize, but in order to do that, we have to make the pipelines generic to some extent. And that always leaves most people with the final mile of analysis, where you need to do the thing to your data that is specific to your project and your exact data type. How do we deal with that? Again, I'm very open to suggestions here. At the moment, we basically tell everyone: okay, it's over to you now. We've helped you get this far; it's up to you to do the last bit. There's some work ongoing to try to streamline that final manual work. For example, on the Seqera side, at the Nextflow Summit at the end of last year, Evan announced a new feature launching within the platform called Data Studios, which will basically be a way to launch interactive workspaces on the same compute location where you run your data. I hope that we can tie these two features together in the future, so that when your pipeline finishes, the pipeline developers will be able to set up a custom analysis setup and you can jump straight into this data visualization tool, or this exploration tool, or this Jupyter notebook, or whatever. So there are some things we can do like that with the tooling. Another thing that I think we can do to address it is to attack this idea of chaining pipelines, or meta-pipelines. As you mentioned, Christa, with the advent of DSL2 in Nextflow, we've adopted this notion of modules, where we've gone from pipelines down: we've broken everything up into these tiny reproducible components, which we can share between pipelines. That's been very powerful, but it still stops at the pipeline level.
So something we've been discussing a bit recently is: well, maybe we can go in the other direction and treat an entire pipeline as a module, effectively, and either build a meta-pipeline where you can import multiple pipelines and just stick them together, or improve the experience of chaining pipelines together, maybe with or without Nextflow. That, I think, if we can get it to work, will be a very powerful idea. And that's something we're working very hard on at the moment in the core Nextflow development team. Yeah, so far.

Yes, please, Eli.

Yeah, just to contribute to the discussion. I like the idea of the last kilometer, and of the need to reach out to users to get them involved in the community. From my point of view, there are two major bottlenecks, and you already cited them. The first is the technological accessibility issue. So of course, if you develop, like, easier... oh yeah, yes, I automatically converted it to kilometers. Sorry. So of course, if you bridge this gap with a web-based tool or some platform that makes it easier for people to try things and play a bit... of course, the point is not to develop a Galaxy web interface, but this could be something to work on, I guess, for nf-core, to help people get a sense of what it is to run something and get some results. The second point, apart from technical accessibility, would be customization. One aspect is totally addressed by what you call modularity, chaining pipelines and combining pipelines to adapt the tools to the needs of the scientists. But there's maybe another level of customization that, for me, is key to making sure people can make it work on their experiments, and that is the experimental design file. For instance, where you assign the experimental factors to every file: whether it's a paired experiment, whether it's a time series experiment, and so on.
And this is a key part of the analysis that is actually shared by many workflows, because it is a central point for RNA-seq, ATAC-seq, or whatever omics you use. You always have samples and you have categories of samples, and you might want to compare groups. From my perception, this was not very obvious, because it was not homogenized across workflows and pipelines. I know that there has been a lot of work on this, and I'm totally unaware of what is going on. So yes, that could be something to share.

Yeah, sample sheets are a very difficult topic. One of my least favorite topics, because it's such a thorny issue. We had quite a lot of discussion with a group in the US led by Nathan Sheffield. He's the author of refgenie, which is where we originally started with him, and he's also behind a project called PEP, Portable Encapsulated Projects. Basically, the idea of PEP is to create a standard for sample sheets, along with a server and schema and all this kind of stuff, with the idea that different pipelines and different standards could adopt that standard. It never really went anywhere, because we never quite got the degree of adoption within Nextflow. It was very Python-based and we needed it to be Groovy. I don't know, it never quite happened. But some of the core ideas were pretty nice. What we have been pushing more, I guess, is sample sheet validation, and that's something I hope to see more of. I'm not sure how visible this is, but over the past two years we've had more and more of these validation steps when you launch a given Nextflow pipeline, which check that the parameters you've supplied are valid, rather than failing halfway through the run because you used a string when it should have been a number. And we're slowly pushing that in the direction of sample sheets now as well.
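A minimal sketch of the fail-fast sample sheet check described here, written in plain Python rather than with any nf-core tooling. The column names (`sample`, `replicate`, `paired`), the `SCHEMA` dictionary and the `validate_samplesheet` function are all hypothetical and only illustrate the idea; this is not how nf-core pipelines implement validation.

```python
import csv
import io

# Hypothetical schema: column name -> function returning True for a valid value.
SCHEMA = {
    "sample": lambda v: v != "",                # any non-empty identifier
    "replicate": lambda v: v.isdigit(),         # must be a whole number
    "paired": lambda v: v in {"yes", "no"},     # fixed vocabulary
}


def validate_samplesheet(text, schema=SCHEMA):
    """Return a list of human-readable errors; an empty list means valid."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    errors = [f"missing column: {c}" for c in schema if c not in header]
    if errors:
        return errors  # values cannot be checked without the expected columns
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, is_valid in schema.items():
            value = (row[column] or "").strip()
            if not is_valid(value):
                errors.append(
                    f"row {row_number}, column {column}: invalid value {value!r}"
                )
    return errors
```

Run before launching anything expensive, a check like this reports a word where a number was expected, or a missing column, at submission time rather than halfway through a run.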
So a pipeline developer can write a schema defining what the sample sheet should look like, and then the sample sheet contents can also be validated: it should have this column name, and the values within that column should look like this. The initial use of that, which is coming into some pipelines now, is just to fail quickly if you've got something wrong. But my hope has always been, since about 2020 when I started that project, that one day it would form a foundation for a set of tooling that makes it easier to create those files in the first place. Because now we have a file that says your sample sheet should have four columns, this one should be yes or no, et cetera, and we should be able to build tooling around all of that. If you dig deep enough in the nf-core/tools repository, you'll find a whole load of GitHub issues where I've sketched out how some of this tooling could look. We just need some developers to actually go and build it.

Phil, if I can bounce off this: one of our challenges now is that we are building increasingly complex pipelines. They are no longer linear; they have plenty of crossroads. Eventually, one way of using the pipeline is one valid path through this complex metro map. But many paths are topologically valid without necessarily making scientific sense, or if they do, they need to be explored. So we now face the challenge of having pipelines that are as inclusive as possible but also increasingly complex, where each set of parameters is effectively a pipeline. How do you see the future of these things in nf-core? How do you think they should be handled?

Yeah, and this comes back a little to the notion I mentioned earlier of meta-pipelines and chaining pipelines. You take the really big pipelines, like Sarek, for example, and they're beasts. You can do so many different things in them.
And if we can unlock this ability to chain pipelines, part of that discussion is: okay, do we take the big pipelines and chop them up? Then we have a pipeline for just doing variant calling, and if all you care about is variant calling, you don't have to worry about all that other stuff. It removes a lot of that complexity. But we can still also have these meta-pipelines, which would look very much like the current Sarek, where you can run the whole way through from start to finish and do everything in one go.

That's interesting. So just to make sure I understood: you mean that pipelines should become linear again, taking advantage of the modularity?

Exactly, but with different levels of complexity and a bit more specialization of top-level pipelines. That's a bit of a hand-wavy answer, because how you take that concept and actually make it into something concrete that is not overwhelmingly confusing is not trivial. But I think there's something there.

If I can speak from my point of view, which is not everybody's point of view here: as a methods person, I like to explore combinations of methods. One of the reasons I like very big, bulky pipelines is that you can suddenly run crazy loops and explore all kinds of combinations. These things are of a lot of interest to me because they allow me to explore the method space. But they are not very useful for end users, because end users want one solution that works. So it's an important and interesting tension, I think, that could be explored.

So if I may: you reminded me while you were talking, Sylvain, of something I forgot to say in the earlier section about the different ways we can do the final mile and the customization of the final results. It's not quite the same thing, but it's something I'm quite excited about specifically with special interest groups. And also, Cedric, it ties nicely into the concept of complex pipelines.
You take RNA-seq, you take Sarek or something: the parameter space that you love to explore is huge. But actually, if you're doing animal genomics, maybe much of that parameter space is not relevant, or you don't care about it. So one idea that I would love to explore is collaborating on the configuration of pipelines. At the moment, if you go to Resources and Shared configs, we already have the idea of collaborating on and sharing configs, but they're at the compute infrastructure level. So I can go through and say: okay, I'm working in Sweden, here is a config to run on the shared academic cluster in Sweden, and I can just do -profile uppmax. We've had that since quite the early days of nf-core. It's extremely powerful, especially for bringing in new users. But I would love to see if we could do something similar where we say: okay, I'm working with bovine genomic data with the RNA-seq pipeline, and here's the configuration profile, which has sensible defaults not for human but for bovine data, or whatever. Again, the work there is collaboration and domain expertise, which is maybe a level of collaboration that we've not yet tapped into. I'd be really interested to see if we could do something now.

And zooming back to the purpose of this roundtable, which is the future focus of this channel: do you think that establishing this set of common parameters is as important a goal as establishing common pipelines? Should it be one of the purposes of the special interest group?

I think the answer to that question lies with you all, because the special interest group is not for my benefit. It's for your benefit. But it's one idea that I think is interesting, and I would hope it could be useful for you.
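The domain-specific profile idea raised above can be sketched as a simple layering of settings: pipeline defaults first, then a shared domain profile, then whatever the individual user sets. This is only an illustration in plain Python, not Nextflow configuration syntax, and every parameter name, value and profile name below (`genome`, `min_mapped_reads`, `skip_qc`, `bovine_rnaseq`) is a hypothetical placeholder.

```python
# All parameter names and values below are hypothetical placeholders.
PIPELINE_DEFAULTS = {
    "genome": "human_reference",
    "min_mapped_reads": 5,
    "skip_qc": False,
}

# Community-curated overrides, one entry per domain and pipeline.
DOMAIN_PROFILES = {
    "bovine_rnaseq": {"genome": "bovine_reference", "min_mapped_reads": 10},
}


def resolve_params(profile=None, user_overrides=None):
    """Layer settings: pipeline defaults < domain profile < user overrides."""
    params = dict(PIPELINE_DEFAULTS)
    if profile is not None:
        if profile not in DOMAIN_PROFILES:
            raise KeyError(f"unknown profile: {profile}")
        params.update(DOMAIN_PROFILES[profile])
    params.update(user_overrides or {})
    return params


if __name__ == "__main__":
    # A user picks the shared profile and changes a single setting.
    print(resolve_params("bovine_rnaseq", {"skip_qc": True}))
```

The ordering is the design choice that matters: a shared profile changes only what the domain experts agreed on, and an individual user can still override any single value for their own experiment.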
During this discussion, and José, you're in charge, but I was wondering, Christa, to finish it, whether we could come up with typical questions, typical things you would want to ask on this channel. I proposed one: typically, I'm doing this kind of analysis; is anybody doing the same kind of analysis? Which kind of tools are you using? That's one thing, but probably all of you have different takes on this. And Christa, I see your hand is raised.

Yeah, I very much like this idea of having some tested, validated parameters. I mentioned the problem that, increasingly, this nf-core technology gives non-bioinformaticians access to rather standardized, well-developed pipelines. And you mentioned that, with the non-linear concept, you can glue together things or parameters that don't make a lot of sense. So for the different applications, I think it would be very beneficial from a user's point of view to come up with these parameter definitions. The core group, this animal genomics interest group, could provide a very important service there, because here are the experts who would be able to decide what makes sense and what doesn't. Even if it's just one or two of them who are real specialists, it could be briefly discussed in the group and then published as a group output for the community. That would definitely be a very strong point.

And Phil, if I can ask, where will these parameters live? Because these sets of parameters beg for a home, just like the pipelines.

Yeah, I mean, this is up for discussion, but we already have a GitHub repository called nf-core/configs, which is where all the institutional pipeline config files live. So I could imagine us having new subdirectories under there: you could have an animal genomics directory and then have a config there for each nf-core pipeline.
And then we can also build that into the website and have a web page that renders those and has associated help text or whatever. So please, Trevor, you raised your hand.

Yeah, I guess one of the things that I'm interested in a lot, and this ties into looking at different methods or doing these various parameter sweeps, is sharing everybody's experience of how changing things affects the results. You know, if you use GATK for variant calling versus DeepVariant for variant calling, even major decisions like that. As end users, we spend a lot of time tweaking and seeing: okay, if I change this parameter, what happens to my results? If I use this argument versus that argument, how does it affect runtime or resource usage? And I think that, as a general bioinformatics community, we don't do a great job of detailing those things. A lot of the time there might be benchmarking papers when somebody releases a new method, and you just have to hope that you can extrapolate from whatever is found in those papers to your own data. But I think that having an interest group to share the insights and experiences that we all have could be extremely useful for cutting down on useless computations, when we could already know what happens when you use GATK versus DeepVariant, just to run with that example again.

Yes, I agree. And who was going to talk? Someone was going to say something?

I would just like to plus-one this. My impression is that the kind of discussion Trevor mentioned should be the real heart of the channel.
These are very important discussions that should take place in the channel, because where else can you ask this kind of question, if not to people who are doing the same kind of analysis you're doing? So how should it work? Should it be threads, with one thread dedicated to a specific question, for instance? Because otherwise it can get very messy very rapidly. I guess these could be self-organizing threads that gradually add up. How do you see it?

Yeah, I guess I'm not sure I have a great answer. I just have a question and no solution for it.

One of the things I was wondering about is a couple of ways I see that coming up. One is through regular meetings and note-taking. If a regular meeting is happening, people such as yourself could submit questions like that, everyone could discuss them, and then hopefully someone takes notes during that meeting to record those discussions for posterity, and/or we have the recording online, though written notes are always a bit more accessible. The other thing is just making use of the Slack community that we have: you can just pop in and ask a question, and have at your fingertips an audience of a global community of people working with the same technology and the same data type in one place. Sorry, you had your hand up and I spoke over you.

It was about parameters. I think there are two things. It's always nice to have reference points, of course, with configuration files that are proposed by people, and you could get them from publications. Having all of that centralized is quite an interesting idea, I think, because it's always better to look at what others have tried, and if it works, maybe you can start from that. But I think, Cedric, we have the feeling that it's not only the methods people who are interested in playing with parameters, because end users have to.

With combinations of tools as well.
Yeah, because you always eventually end up looking and thinking: okay, but I don't get my gene, or maybe for these ones I'm not sure if they are really overexpressed, or whatever. So maybe this one has a low expression value; maybe I could raise the threshold a bit. We all do that, not only those of us interested in methods. So this is a critical thing, and the more reference points, the better. And I'm wondering whether the main factor impacting the parameters is the species, or maybe more the technology, and sometimes the kit, the enzyme, the experimental protocol that was used. Experience shows that this has a huge impact on how you're supposed to analyze your data. In that regard, having a lot of parameter files, if they are properly documented, which is another challenge for you, could be very useful. And then I have a comment about the questions and the channel. I'm wondering, and this is a question for you guys, how you see the articulation with the existing channels for the dedicated pipelines. Because if someone has a question about RNA-seq, wouldn't they ask it on the RNA-seq channel rather than in the interest group?

Yeah, absolutely. That's something I touched upon in the blog post, and it's a bit of a concern of mine that the new interest groups could pull discussion out of the wider channels, which would be a potential negative consequence. So my hope is that there'll be some kind of self-organization: if a discussion thread is clearly technical in nature and not specific to animal genomics but specific to the RNA-seq pipeline, then absolutely, that discussion should stop in that channel and switch over to RNA-seq. But there's nothing that I can or should do to actually enforce that. It's just going to be up to you to decide.

I wouldn't be so confident.
I think it's the other way around: the discussion would start in the specific channel, and there won't be any discussions sparking in the interest group, you know.

But you know, Sylvain, I think it's a very important and critical point, because if the discussion does not take place in the right place, that generates chaos. And that touches on the last items of this discussion, because I see the audience is dwindling and we'd like to finish on this. We need to organize this channel. We need to have some good practices, we need to agree among ourselves, and we need to rapidly identify when a discussion is not taking place in the right place so that it can be moved elsewhere. It's important because this will make things effective. Most of the members of this channel will also be following the pipeline channels in which they are interested. We have to make sure that the discussions in this channel are about homogeneity of analysis across animals and across model systems. We have to make sure that this is all about interoperability, all about decreasing the level of fragmentation. Anything that has to do with the specific use of a tool is probably something that should take place in other channels. I was hoping we would have more time for this, but we should probably draft a code of good practice for the channel, an indication of the kinds of questions we think belong in this channel and the kinds that belong in other channels. If you guys agree, maybe we will share a Google Doc at some point and we can prioritize the questions. That is what I was trying to say before: a typical question is, I am doing this kind of analysis on that animal; what kind of tools are you using? This kind of thing.
And if we could gradually come up with a list of the questions you want, I think it could be useful.

Okay, yes. And then, maybe related to this and to the shape of the group: what should the regular activities of the group be? For instance, do you think that we should host seminars or regular roundtables? Should we run a series of seminars, or alternate one with the other? I don't know, maybe it is a question for Cedric or for anyone who wants to jump in.

So, I can start on the seminars. I can tell you what my vision was, which is a little bit biased, of course. Because of the environment in which I work, at the CRG, it was to bring to this group the state of the art of what is going on in human genetics. So, for instance, we're going to try to have Roderic Guigó, representing ENCODE and GTEx, talking very soon. We're going to try to have Tomàs Marquès, who was leading the primate pan-genome. We're going to try to have this kind of people, because I think they will provide inspiration to this channel. But of course, we would also like to have people from your community. And that brings up an important issue on which I'd like to finish. It's especially for Christa and Sylvain, and sorry for singling you out, but you are the animal people I know best on this call. We represent a little bit of Nextflow, a little bit of nf-core, and a very tiny little bit of the animal community, because we've been interacting with you guys. But we are not really legitimate as representatives of the animal community. So I think it will be very important that the interest group is run jointly with people who carry more weight than we do in the animal community. It does not have to be formal from the start.
It can be, if you guys want, but I think it will be important to identify a core board for the channel that will administer it, propose speakers and themes, and run it jointly. We are happy to take care of all the logistics, and Phil and nf-core are providing us great support, and we're going to have even more of this. But I think it would be great to have some of you involved. So, Christa, I know you must be extremely busy now with your new duties, but if you can propose someone who you think could be involved... I think Roslin is a little bit underrepresented today, but I know they are interested, and I'm sure we could motivate Emily or Dan Laquine and these people. Phil, sorry.

Yeah, I just wanted to extend on that and maybe pick on someone else in this meeting, being very biased and completely going on accents. Trevor, I guess from your accent that you are not part of the European collaboration. So, A, am I correct in that assumption?

Yes. Yeah, I'm in the United States.

So that's something else I would really love to see: we want nf-core to be as inclusive and global as possible. I'd love any insight from you, because this interest group is starting off from Europe-centric consortia. How do we reach other US-based consortia that we can tap into? How do we increase the reach, the network effect?

Yeah, that's a difficult question. For one, I'm not too deep in the animal science pool here. I was a molecular biologist and then just started working in animal science on the bioinformatics side, so I'm not too up to date with all the consortia that are present over here. I think it's maybe more fragmented, and it's maybe a matter of appealing to people in individual groups. Animal scientists here tend to be very friendly in person, and a lot of them know each other personally.
So I think word of mouth is maybe the best way to get this going here in America.

We have contact with Chris Tuggle and James Reecy. They couldn't make it today because there seems to be something at NIFA taking place in the US these days, but we hope that we will be able to engage them. That's something Christa probably has more to say about than me.

Yeah, you already mentioned Jim Reecy, and Chris Elsik was on the call today. She could be some kind of connecting person, because she also has a lot of links to the human side and she's very much interested in providing resources for the community, even though she's maybe not one of the core users, because she's more into databases than into data analysis herself. And then, I think, in the US they also have a kind of bioinformatics group from some funded projects. I just can't recall the name of the person right now, it has slipped my mind, but she's also very much into bringing the bioinformatics community together. I will look up her name; it would be nice to contact her as well.

And it's a fact that the reason we are having this meeting at 4 p.m. is to be more inclusive of our US colleagues. I don't know what time it is for you, Trevor; not too early in the morning, I hope. But yes, this is something very important. It's very important that the group expands toward the US, and that's one of the things we'll be trying to achieve. I think we've been on the call for about an hour and a half now. José, you want to...

Yes, this is what I wanted to say: maybe we have to wrap up and finish the call, because people may have to move on. So, yes. Go ahead, Cedric.

No, I think we are still figuring out the speaker for the next session, which will take place again on the third Wednesday of the month, and we will communicate that to you. We already have speakers lined up until July.
We will take a break after the third Wednesday of July, resume on the third Wednesday of September, and keep you updated through the Slack channel.

Exactly. And I really want to thank all of you. We've run a bit long and the audience has decreased a little, but we started with a pretty nice audience for your talk, Christa. So I think this was a pretty nice success for the launch. Thank you all for being here and for joining this roundtable; this has been terrific. So, see you all a month from now.

See you, and thank you so much. Thank you, Cedric. Thanks, everyone.