Welcome back to talk session four about new communities, and we'll start with Gregory.

Thank you. So actually this is really cool for me because when I graduated from university, the first job I got was at Griffith University. And so I moved to Brisbane, I was here for 10 years, and I worked at, actually founded, the Queensland Parallel Supercomputing Facility, which is now QCIF. Anyway, that's a long story, but it's really cool to be back and talking about stuff that we've been doing over the last year or so. Anyway, this talk is, I guess I don't wanna give the impression that, can we, it's really loud here, getting a bit of feedback, really? It's on, yeah. Oh, this one? Okay. Yeah, I don't wanna give the impression that we're using Galaxy for everything that we do, okay? We're really just transitioning to Galaxy. We're trying to convince our neutron scattering scientists that Galaxy is a great platform for them, so I just wanna preface with that. Anyway, let me get on with this.

So, introduction: I wanna talk about some of the challenges, and you probably heard from Sergey about some of the things that we've been doing, so I just wanna reiterate that a little bit. Then I'll talk about some of the workflows that we've actually created, and how those workflows fit into where we see the future. That's the really exciting part and really what I wanna talk about.

So for people that don't know, Oak Ridge was formed back in the Manhattan Project era, and the idea was to create some nuclear reactors. That's where it all started from. Today, it's a pretty massive organization. We have about 6,000 employees, and many people come to Oak Ridge from all around the world to utilize the facilities that we've got. So it's a pretty diverse and interesting place to work. What I wanna talk about today, though, is the neutron sciences facilities that we've got. There are actually two facilities: one is called the High Flux Isotope Reactor, or HFIR for short, and the other is the Spallation Neutron Source. Each one of these facilities produces neutrons, and people come from all over the world with their bits of crystals and materials and things and stick them in the neutron beam. They figure out what happens to the neutrons, and that in turn lets them figure out what the structure of those materials is. This is really important for materials science, for things like coming up with new battery technologies or other kinds of technologies; that's what it's been useful for. Now, each of these facilities has around 30 instruments attached to it. The neutron beam gets deflected and the beam lines go to different kinds of instruments, and they give you different information about the material. There's diffraction, there's imaging, there's reflectometry, there's a whole bunch of different things. So it's quite a complicated facility.

Now, for data analysis, there are a lot of challenges. Sergey talked about some of these yesterday, but really the main thing is that we have 30-plus instruments, and each instrument has multiple data analysis workflows. So we're talking about maybe 60 or more data analysis workflows for all the instruments in the facility. And currently, pre-Galaxy, it's all manual. So people have to get their data.
Once it comes off the instrument, they have to move it to somewhere they can run some compute on it, run a simulation or do some kind of modeling, and eventually produce some kind of image from which they can figure out what the properties of their material are. And so even though a lot of science has been done at the lab, it's been slow and painstaking, right? So what we see is the value of something like Galaxy in accelerating this. There are some other challenges, like conflicts over compute resources, the fact that we've got multiple administrative and security domains, and it's a total and utter mess how you access compute resources and all this kind of stuff. These are all challenges as well, but they're more about how we utilize Galaxy.

Sergey presented this, so I won't go into it, but we have an architecture now that we think is pretty good and is gonna at least meet our needs in the short term. If people wanna know more about the details of this, we're happy to talk about that, and this is all open source, so people can utilize it. We've been going for about 18 months on this one project. We've created roughly 25 tools in that amount of time, which correspond to six workflows. So we haven't really even got to the 30 workflows yet, we're only up to six, but we are making progress, and the people that are funding our project are pretty happy with what we've done so far. We've made some decisions, like: all our tools are in Docker containers, except for the ones that run on the HPC systems, where we're using Singularity. We're trying to be very standardized in what we do, because we're gonna have so many of these tools eventually, in so many workflows. And we have tools written in Fortran; we use OpenMP and MPI and things like that, which really complicate how you run a tool. We're not just running our tools on a simple compute farm or something; we have all this kind of stuff that has to happen as well. So it's quite complicated. Anyway, we're making good progress.

So now I wanna switch a little bit to some of the workflows. I'm not gonna go into the details; I mean, number one, I don't really even understand the physics behind all this stuff. What I wanna focus on is what we've achieved with our workflows. This particular workflow, which is for analyzing these quantum materials, runs on Summit, which is the fifth largest supercomputer in the world. We have this running on 6,000 CPUs and it still takes 24 hours to run, so we could even scale it up further. But the important point here is that the first four main tools here run on Summit. They need MPI, they need the HPC, the GPUs and everything that Summit has. The last two steps don't need HPC resources, and so we don't wanna run them on Summit; we wanna run them on our cloud, our on-prem cloud, for example. So with this workflow, we've been able to demonstrate, for the first time ever, really being able to run a neutron scattering workflow on our own HPC resources, but across multiple types of resources, from HPC to cloud. That sort of verifies the architecture in a way, but I'll also get into why this is really important.

We have another workflow that has machine learning components in it. This uses PyTorch, I think, behind the scenes. I think the big box there is a training step.
So every time you run this workflow, it actually trains the neural net as well as then using it to speed up the actual simulation. So it's kind of a training and an application of the network at the same time. But the point here is that we've been able to integrate machine learning and artificial intelligence into the workflow, and that's got some really interesting, exciting possibilities for the future.

Monte Carlo ray tracing analysis. This is a demonstration of using what they call a digital twin; it's essentially a simulation of an instrument. Using Monte Carlo ray tracing, we simulate where a neutron goes, or thousands or millions of neutrons, and we can simulate what would happen to a particular sample. So with this workflow, we have essentially a software version of the facility. Just keep that in the back of your mind, and what the implications of that could be. But this has demonstrated that we can do that with Galaxy.

The last one I wanted to look at is this cross-facility one. I mentioned that we're already able to run on HPC resources and our on-prem cloud; this is an example of us being able to do that end to end, but also with the capability of going inter-facility as well. Ultimately, what we wanna be able to do is run a workflow not only internally on the Oak Ridge facility, but on another facility as well. We're working with some of the other light sources, for example, which do different kinds of physics. There are the X-ray sources and other light sources which give you different information about the materials. So if we can run experiments that span not only our facility but other facilities, then that also opens up some really interesting capabilities.

So this is really what I wanted to talk about, which is what we can do with all those technologies that I've just talked about. We've demonstrated this capability, so let me just go through it here. We can leverage our distributed resources, our massively parallel supercomputers; we also have edge resources, which are compute resources close to the instruments. Using Galaxy, we can leverage all of those. We can incorporate machine learning models and AI tools into the workflows; we've demonstrated that capability. We can connect to other labs and other facilities to run cross-facility experiments. And we have the digital twin capability, so in addition to running real experiments, we can run simulated experiments. For example, if someone else is using the machine, the actual facility, but you still wanna run an experiment, you can do a simulation of the experiment. That can also help you set up the parameters for your actual real experiment when you wanna do that. So there are a lot of benefits to being able to do digital twins. And so we combine all of these together with our workflows that are driven by Galaxy. Galaxy will be driving the workflows at the edge: for example, when we capture data from the instruments and we need to process that data so that we can use it for data analysis, Galaxy will be doing that. And it will also be running workflows for the whole experiment as well: driving the different instruments, collecting the data, analyzing the data, sending it out to the HPC resources, all that kind of thing. And so really this provides an unprecedented opportunity for doing experiments in materials science. This is a capability that's never been possible before.
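To make that edge-driven, multi-resource picture a little more concrete, here is a minimal sketch of how an edge service could hand newly captured instrument files to Galaxy through its API, using the BioBlend Python library. Everything here is hypothetical: the server URL, API key, workflow ID, directory and file extension are placeholders, and ORNL's actual implementation may well differ; in particular, routing individual steps to Summit-like HPC or to the on-prem cloud is handled by the Galaxy server's job configuration, not by a script like this.

```python
"""Minimal sketch: trigger a Galaxy workflow when a new instrument file lands.

Assumes a Galaxy server whose job configuration already routes the heavy
workflow steps to an HPC destination (e.g. via Pulsar) and the light
post-processing steps to a cloud destination. All URLs, keys, IDs and paths
below are placeholders, not ORNL's real setup.
"""
from pathlib import Path
from bioblend.galaxy import GalaxyInstance

GALAXY_URL = "https://galaxy.example.org"    # placeholder Galaxy server
API_KEY = "..."                              # placeholder API key
WORKFLOW_ID = "analysis_workflow_id"         # placeholder workflow ID
WATCH_DIR = Path("/data/instrument/outbox")  # placeholder edge directory

gi = GalaxyInstance(url=GALAXY_URL, key=API_KEY)

for raw_file in sorted(WATCH_DIR.glob("*.nxs")):
    # One history per measurement keeps results easy to find and share.
    history = gi.histories.create_history(name=f"run-{raw_file.stem}")

    # Upload the raw detector data captured at the edge.
    upload = gi.tools.upload_file(str(raw_file), history["id"])
    dataset_id = upload["outputs"][0]["id"]

    # Invoke the analysis workflow; Galaxy dispatches each step to the
    # destination (HPC, on-prem cloud, ...) configured on the server side.
    gi.workflows.invoke_workflow(
        WORKFLOW_ID,
        inputs={"0": {"src": "hda", "id": dataset_id}},
        history_id=history["id"],
    )
```

The same pattern would cover the full experiment orchestration described above, with the server's job configuration (or a mapper such as Total Perspective Vortex) deciding which steps go to HPC and which to the on-prem cloud.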
And so we're actually now submitting funding proposals to build this facility. This will be like the experimental facility of the future for neutron materials science. None of that would have been possible without something like Galaxy, or we would have had to build a Galaxy replacement ourselves in order to be able to do this. So to me, that shows you the versatility of Galaxy and how it can be adapted to other types of environments, but also the fact that, when it was designed, it was done in a way that is not specific to any particular domain, and so we can leverage it to build these kinds of capabilities in the future. So it's a pretty exciting future, I think. With that, I'd just like to acknowledge a bunch of people that have been involved. This is really 18 months, but we've had a lot of people involved, from both the domain science side and the software engineering and computer science side. So it's been a big team of people. Thank you.

Thank you, Gregory. And next, Enis is speaking.

Hi, good afternoon. I'll talk about a bit of the future-facing developments that are planned over the next five years or so with respect to the AnVIL project that was mentioned in the previous session. If we take a look at what we've heard a lot about thus far, it's the accessibility of Galaxy. We've heard quite a bit about the UseGalaxy services that offer this unprecedented ability for people to just log on and start operating on data around the world, and this is phenomenal for all the training infrastructure and for all the open data that exists out there. So that's one prong of how Galaxy increases accessibility. The other one is exemplified by the talk we've just heard, where it's this highly customizable, very powerful local installation of a Galaxy instance that works great for specialized tasks or larger teams. And then the third prong is what originated as Galaxy on the cloud, as a way to leverage technologies without having to deal with the infrastructure itself. Over time, that has now evolved into Galaxy on AnVIL, as a way to offer Galaxy instances to operate on sensitive and protected data that is otherwise very difficult to obtain.

We've seen this slide pop up a couple of times already today. AnVIL as a project has been in existence for five years now, and it's building this data repository; today it sits at about five petabytes of predominantly NHGRI (a branch of the NIH) human genetic data that's easily ingestible into the software that runs on this platform. The software that's available on AnVIL includes Galaxy as one of the apps, but there's also Jupyter, Bioconductor, integration with Dockstore for methods sharing, and WDL and Cromwell for batch analyses. So it's a library, or a lab, of software that you can use to operate on the data; pick your own selection. And then there's the community aspect, of course, to mirror a lot of the activities from Galaxy. We heard Natalie talk earlier about the GDSCN efforts as a way to increase the diversity of people to whom this unique platform is available. And all of this runs in this FedRAMP environment that gives us the guarantees needed for working with sensitive and protected data sets; again, something that is otherwise very, very expensive, very difficult and unattainable for most individuals and groups.
And in this case, it's ready. So that's what we've spent the last five years working on. anvilproject.org gives an overview if you wanna check that out, and if you wanna actually try Galaxy and work on some of the protected data sets, that's at the anvil.terra.bio URL.

What we're kicking off this month is what we're referring to as AnVIL phase two. We are awaiting that notice of award that Mike mentioned, but it's hopefully coming any minute now, and that'll carry us through the next five years. So I wanna go through the main activities that are planned for this phase two, so that we can be in sync with the rest of the community and develop things collaboratively. The four activities that we're focusing on are about increasing the usage: improving the UX for users; lowering the cost, because here people do actually pay for what they use; streamlining how people interact with Galaxy through some machine learning and AI methods, by making them available in Galaxy; and improving the interoperability of AnVIL with other similar platforms.

Taking a slightly deeper look at this: again, this is all forward-facing, so most of this is certainly not in AnVIL yet, even though some of these components are starting to pop up in Galaxy proper already. As far as improving the user experience: today, Galaxy is considered the interactive platform on AnVIL, and batch workflows are done through Terra with WDL. However, not all of the Galaxy workflows exist in WDL, so we wanna make sure that you can run those workflows in an easy batch format, which would help with scalability. We'll also enhance how users interact with Galaxy: today that's fairly clunky and takes about 10 minutes to get going, and we're gonna get that down to seconds, preferably. Second is of course this cost-efficient computing element: you pay for every minute that you run on AnVIL, so as you scale up, we need to make sure that we can dynamically scale resources, something that you'd think comes for granted on the cloud but doesn't today. So that's a step in the forward direction, and then leveraging data-local computing, again hooking up some of the work that's already been done in Galaxy.

Some of the more interesting things that haven't been talked about thus far are these recommender systems. With thousands of tools that exist in the Tool Shed and in Galaxy, how do you select the most suitable one? There's been some preliminary work done in Europe on this topic, so again, integrating that and taking it to the next level, including some parameter sweep options, particularly when it comes to the machine learning methods. And if we can predict which individual tool you might wanna run next, well, we can also experiment with being able to auto-draft workflows and give you a subset of tools that a particular flow might fit. And then the last set of activities is multi-cloud support: we're moving from Google Cloud, where all this has been deployed over the past five years, to Azure, and increasing the use of GA4GH APIs, some of which have now been implemented in Galaxy but not yet in the AnVIL version. And with that, as I said, this is a collaborative effort between the Broad and Hopkins as the main institutions running this program. Thank you.

Mario's next. Okay, so hi everyone, can you hear me well?
So I'm here today as a representative of the French biodiversity data infrastructure, for Galaxy Ecology. Galaxy Ecology is a transnational initiative for research and expertise on biodiversity. It sits in the context of ecoinformatics, where we lose a lot of information over time, from the time of publication to the depth of investigation, with data degrading throughout time. This degradation matters because of all the links that the different data sets have among themselves. So in order to fight this degradation of data and this loss of information, the goal of Galaxy Ecology is to enhance reproducibility.

Galaxy Ecology has been supported since 2018 by the French biodiversity data infrastructure, or PNDB for short. The PNDB wants to make data and metadata available; it tries to identify knowledge gaps and participate in communication; and finally, it promotes products and services at multiple scales: regional, national and international. Right now it is being integrated into the Data Terra research infrastructure as the fifth domain, alongside the atmosphere, the solid earth, the ocean and the continental surfaces.

In ecology, for now, analyses are mostly done with huge scripts. With Galaxy, the huge improvement is that when you divide those huge scripts into multiple tools, users can use different tools with different data sets. So it's improving usability and interoperability. For instance, for data management, the PNDB put multiple tools in place throughout R&D projects. We have MetaShARK, which helps create Ecological Metadata Language (EML) files, and recently they added MetaShRIMPS, which produces quality metrics and data paper drafts based on Ecological Metadata Language metadata documents. But there are also non-interactive tools, like XMLStarlet, tools to extract metadata and input data information, and conversion to the ISO 19139 metadata standard.

Beyond these data management tools, we also have new workflows that we implemented during the last year. For instance, we have these four new workflows: one on biodiversity data exploration, one on remote sensing that I explained yesterday, one on boulder turnover biodiversity indicators, and the last one, recently implemented, on ecoregionalization. The first one, the biodiversity data exploration workflow, allows you to have a complete overview of your biodiversity data. You get a kind of state of the art of your data; you can run analyses and statistical checks to be sure that your data are correct, and you can also view some variables in your data. The next one is the boulder field turnover analysis, where you have three indicators that let you see the pressure of recreational foot fishing on boulders; you can see a picture of the field here. It also lets you calculate the impact of foot fishing on the diversity of the boulder field. The goal of this project is to run an experiment on sustainable and concerted management of recreational foot fishing in France. All of these workflows are available and explained on the Galaxy Training Network. And the final workflow is the ecoregionalization one.
So the goal of this workflow is to propose an accessible, reproducible and transparent IT solution for the processing and analysis of occurrence data, applicable to ecosystem conservation in Antarctica. The main steps of this workflow are the mobilization of species distribution data, then taxon selection. From this, you can determine the optimal number of clusters with the silhouette index and then do the clustering. And with the final map here, you get the visualization of the ecoregions; here, for instance, you have five of these ecoregions. It will soon be on the Galaxy Training Network too. So, in conclusion, the Galaxy Ecology instance offers a solution for citizens to see the extent of their contribution and appreciate what an ecologist's work is. And to end, a beautiful picture of the Marine Station of Concarneau, where the PNDB is working. Thank you for your attention.

Thank you. We have time for one question. Sorry, how does boulder turning work? You turn the boulder and count how many things you have there, and that's like an indicator of diversity? Yeah, you have multiple ones. You have an indicator to see, when the boulder is turned over, whether the diversity has changed between the upside and the downside. And you have a fixed boulder that is the reference point, and you compare the diversity of this reference point with the other boulders that are turned over. Thank you again.

Thank you. Our next speaker is Wendy. Hello, Galaxy community. I am thrilled to be speaking on behalf of the brand new single cell community of practice. We seem to be growing every single week, with a new institute joining. Today we build a very big bridge between Europe and Australia, so hopefully we can get more of our teammates across the community together. This really started with a problem that we identified. Single cell analysis, as someone who does it, obviously I'm very keen on it, is a really cutting-edge field in bioinformatics. We're constantly having new tools, new tool suites, new technologies, new users, new uses. There's so much to do in this field, and Galaxy has a little bit of an awkward history here, where we managed to wrap the exact same tool suite twice. We'd really like to avoid that going forward. There's a lot to do: this is something that came out of running one single cell training course for 30 students, and this is not even a comprehensive list of the different tools, workflows and tutorial requests that came out of that single training course. We have a lot of work to do, and we cannot be duplicating effort going forward. The other thing that we found is that a lot of our users would be finding tools that seemed familiar; they looked like single cell tools, but they were a bit different. Also, shout out to the tool search, thank God that is improved. Okay, but they were finding all these different single cell tools, and it turned out they were often used by a single group that was pre-processing their data outside of Galaxy and then using the tool in a very specific way, which meant that other users couldn't use it. And this is so frustrating for me as a trainer, because it seems like 90% of the work is done. You've wrapped the tool, the tool functions, but it's that 10% of making it work within the workflow, that 10% of talking to the user and just fixing things, making it a little bit less code-y, adding a little help text.
That is the difference between me being able to say, here you go, user, or me being able to say, avoid that one, it doesn't work, right? So close to glory. So why do we need a community of practice? I feel like we may need to turn the volume down for me; it's just like on Zoom meetings as well, I have this problem. So why do we need a community of practice? Well, obviously, who doesn't love another meeting? Also, single cell analysis is cool, right? We are asking really exciting questions. What are the different cell types in an eye? How do cell types get affected distinctly in disease? How are cell types affected distinctly by treatments, or by drugs, or by whatever parameter you're interested in? We're asking very cool questions, so obviously we need a community. Of course we need to prevent work duplication and, more than that, we have all of these people across different institutes and different disciplines; we might as well use that. I'm a big believer that trainers are your best user advocates. We are the people that see problems, log the problems and also follow up on the problems. So if we can connect these trainers, who are these user advocates, with the developers, we can really fulfill this pipeline of making things usable. And all of that feeds into outreach: I can pin down all of the different single cell tools for people who've never encountered Galaxy a lot more easily when I know that they're working together.

How do we function? This is where I'm really hoping to get advice from you all throughout this conference, because we put this together to avoid this problem going forward, and I really don't wanna add a meeting to anyone's docket that's not efficient or useful or, ideally, God forbid, even enjoyable, right? So we meet every six weeks. We've finally added ourselves to the Galaxy workflows calendar after a number of time zone issues, and we're gonna start having an Aussie, Brisbane-friendly time, so every third meeting is gonna actually be usable for y'all. All right, we've got a team forum on Element, which we're finally all added to; the Gitter-to-Element transition did not treat the single cell community well, but that's hopefully sorted. We've got rolling notes; we almost went to a GitHub issues board or project board, but we ended up sticking with the Google doc for now, where we're putting people together. I send out post-meeting notes; I automate it so they get sent out three weeks later, so I don't have to follow up on anyone and make sure things are working, and then we get an email as well reminding people of the actual meetings. Our templates for how we run a meeting are pretty simple. We always have new people, so we always start with introductions to get people to know each other. We then go through updates where people say what they're working on, to prevent work duplication. I think my favorite bit is putting people together: when people are finding issues and need them to be solved, we're starting to make little Element subgroups where we have people working across different institutes together to fix these problems or follow up on user needs. And then, of course, we have, which you slightly can't see, the other cool stuff that we get involved in. The Smörgåsbord is a fantastic example of this: people in all different institutes, and we all agreed who was doing what and who was captioning what, in order to make sure that we had a nice array of materials available for the Smörgåsbord.
Seemed fitting, I've got to say; when I saw this on the State of the Galaxy talk it made me very happy about the way that I like to title things. So that was quite nice, which I sometimes get a bit of flak for in England, for not being, I don't know, Attenborough enough. So, cool stuff we have. I have to give a shout out to, obviously, my students Julia and Marisa, who did some amazing talks today. I think the coolest thing that's happening in Galaxy for single cell is that we had a very amicable divorce from transcriptomics last year, which was very important because we've gone multiomic. So now we've got chromatin happening, we've got protein, we've got spatial, which I think you're gonna be seeing more of. There are posters here that I didn't even look at. So just lots of cool things are happening in this area, and it's only gonna get more exciting going forward. Yes, that was Marisa, that was for you; otherwise I'd be a bad supervisor and they'd kick me out. I think the other thing that is exciting is that when we all get put in a room together, we all start to dream together. So we have a whole bunch of different things that we want to look at going forward. Better visualization; we saw the learning pathways, which were the bomb. We have more than a week's worth of materials, so we're gonna have to make separate learning pathways for what people want. We wanna see multiomic and spatial single cell analysis grow. Something as a trainer I'm very excited about is having multiple languages for the same tutorial. So if we're thinking about Galaxy as a gateway to learning to code, what if we start them with the buttons and then move them on to doing it in an R notebook, as opposed to starting in an R notebook, right? Can we make that a little bit more accessible for people? So here are a bunch of links, since you all have access to the slides anyway. If you'd like to join our mailing group, which we've just gotten, thank you Nate, help us run better; I'd really love to hear your thoughts on that. We've got the Matrix group going on, and then also I'm just gonna give a shout out to a couple of PhD students who are very ready with some surveys. So if you have anything going on in the training space, I'd really love to talk to your participants. I don't want to get dinged. Yeah.

Thank you, Wendy. Our next speaker is Anthony. Hello, hello. You hear me? Cool. So I'm Anthony Bretaudeau, I'm working in France, in Rennes, and I'm gonna talk about genome annotation in Galaxy. We have this group named Galaxy Genome Annotation, which is a community born in 2017, and our goal is to ease the annotation of genomes within Galaxy. We work on tools, workflows, visualization, training and all the material to ease annotation. We have a website on github.io and a subdomain on usegalaxy.eu.

So first, just in one slide, what is genome annotation, for those who don't know. In fact there are two types. The first one is structural genome annotation: in this step, we try to find the position of the genes along the sequence of a new genome. That's really important because it's the basis for many, many applications afterwards, for transcriptomics, epigenetics, comparative genomics; you need to have genes properly defined by software. And it's not an easy problem for software, because these genes have some motifs to recognize them on the genome, and these motifs are very short and not very specific, so software has trouble finding these genes.
So often they need some evidence data, like RNA-seq data or alignments of proteins from other species. And the second step, when you finish the structural annotation, is the functional annotation. In this step, you try to assign functions to these genes based on the sequence: we look at known motifs or similarity with known sequences to give names and symbols and attach some Gene Ontology terms to each predicted gene.

So we have a full catalog of annotation tools that are available now in Galaxy; most of them were pushed to the IUC. You have tools for repeat detection, like RepeatModeler and RepeatMasker, and Red, and you have structural annotation tools for prokaryotes and eukaryotes. You have functional annotation tools, scary ones like InterProScan and eggNOG-mapper. And you have other utility tools, like BUSCO, to make some quality assessment of your annotation, and alignment tools like BLAST, Exonerate, DIAMOND and miniprot to align sequences along the genome and compare them to the annotation. You can also annotate other things than genes, like tRNAs or, here, non-coding RNAs.

Just to focus on two new tools that are coming to Galaxy these weeks. The first one is Helixer, which is a new tool based on deep learning. This one is great because it doesn't need any evidence to predict genes along the sequence, and it uses GPUs, so it's super fast. From the first tests, it seems to work great on some genomes and not so great on others, but it's quite a new and interesting way to annotate genomes. It was merged last week, we are currently testing it on usegalaxy.fr, and we will deploy it on the other usegalaxy servers later. The second one is BRAKER3, which should be merged in the coming days. That's a new version of BRAKER that supports both RNA-seq and protein evidence at once, which was not possible before. It seems to give great results too, so it should be in Galaxy pretty soon.

So what's so special about annotation tools in Galaxy? Well, these tools are often big pipelines that run a lot of subcommands. Some of them are quite old, so that gave us some packaging issues, especially with Perl; I have great memories about that. And for computing, they launch a lot of subcommands in parallel, which is not very easy to debug when something crashes. Some of them use MPI, like MAKER, and most of them also use big reference databases, tens or hundreds of gigabytes, that need to be updated regularly with a data manager, which is not super fun. And the last problem, which is really annoying, is that some software or subcommands may have non-free or strange licenses, which makes it quite difficult to make FAIR tools and FAIR workflows. That's really a problem, because you can't easily distribute the workflow, and testing it automatically is not easy either. We have made some fixes like this to try to work around the problem: for GeneMark, for example, you can upload your own license when you run the tool, or for InterProScan, you can choose whether or not to use the non-free components, depending on what you want to do.

That's it for the tools. We also have some specific tools for visualization of annotations. Here you can see JBrowse: from Galaxy, you can generate a genome browser by loading your assembly and annotation and an RNA-seq track or whatever track you have, generate it in Galaxy, have it in your history and share it with anyone you want. You can do the same with Circos, at the top, to generate colorful plots like this.
And recently, I've worked on GeneNotebook, shown here, which lets you deploy a small web app to explore the functional annotation of the different genes. You also get nice alignments of motifs and similarities on the genes. So, based on all these tools and visualizations, we have some workflows available, so you can get from the assembly that you've done in Galaxy to a structural annotation, a functional one, and then visualization of all these results. These workflows are available right now on the GTN, and we're also trying to deposit them in the IWC.

Okay, that's it for the tools and workflows. There's another nice thing about annotation, well, nice... The problem is that automatic annotations are never perfect. As I said, the motifs on the genome are not really strong; they are often short and not very specific. So every time you make a structural annotation, you will always get errors: genes that are not complete, exons that are missing, or genes that were fused that shouldn't be. So on some genome sequencing projects, you often want to perform a manual curation step: you ask your friends to look at the predicted genes and see if there are errors. Usually you do it for some gene families where you have experts able to detect these errors. For this we have implemented Apollo, which is a web app that can be summarized as the Google Docs of annotation. It looks like this: you can look at specific genes and, with the mouse, change the limits of exons, introns or the whole gene. You can give names to the genes, and it is collaborative and multi-user in real time, which means everyone working on the same genome sees what the others are doing in real time. This is available as a self-service at usegalaxy.eu/apollo, so anyone can come with their genome and their annotation, launch a new manual curation project, give access to their friends on the server and produce a new release of the annotation by manual curation.

Apollo is part of the GMOD family of tools, but we have also worked on other GMOD tools; JBrowse is also in GMOD. We have worked on Chado and Tripal, which are used to make web portals displaying annotations to users. We've worked on Docker images for these applications, Python libraries, a command-line interface, and Galaxy tools using this command-line interface to load data and make it available to other users. And of course we have a lot of training material available. We use all the possibilities of the GTN, with slides, tutorials, workflows, videos and two new learning paths we added two weeks ago, I think. That's an output of the Gallantries EU project, which is finishing in two months. One thing that is interesting is that it seems to be a topic that is quite popular. If you look at the participants of the Smörgåsbord, there was a form at registration where people could say which topics interested them, and two-thirds of users checked the genome annotation topic, which is quite good. And during the whole week, we had 3,000 page views and more than 200 video views, so it's quite great.

And now, for annotation in Galaxy, there are some interesting projects: VGP, ARGA, ERGA, ATLASea. These are big sequencing projects where the members try to sequence tens of thousands of genomes of different kinds of species. You have already seen what has been done for the assembly of genomes in VGP, so you have some workflows available.
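A recurring need across these projects, which comes up in the challenges below, is simple quality-check metrics for comparing competing annotations of the same genome (gene number, gene length, mean exon count, alongside BUSCO scores). Here is a minimal, hypothetical sketch of computing a few such metrics from a GFF3 file; it is a toy illustration only, not one of the community's actual tools, and a real pipeline would use a proper GFF3 parser and add BUSCO completeness.

```python
"""Toy quality-check metrics for a structural annotation in GFF3 format.

Counts genes, mean gene length and mean exons per mRNA. This is only an
illustration; real comparisons would also include BUSCO completeness scores
and more careful GFF3 parsing.
"""
from collections import defaultdict
import statistics
import sys


def annotation_metrics(gff3_path):
    gene_lengths = []
    exons_per_mrna = defaultdict(int)

    with open(gff3_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 9:
                continue
            feature, start, end, attrs = cols[2], int(cols[3]), int(cols[4]), cols[8]
            if feature == "gene":
                gene_lengths.append(end - start + 1)
            elif feature == "exon":
                # Attach each exon to its parent mRNA(s), e.g. Parent=mRNA1 or Parent=a,b
                for field in attrs.split(";"):
                    if field.startswith("Parent="):
                        for parent in field[len("Parent="):].split(","):
                            exons_per_mrna[parent] += 1

    return {
        "gene_count": len(gene_lengths),
        "mean_gene_length": statistics.mean(gene_lengths) if gene_lengths else 0,
        "mean_exons_per_mrna": statistics.mean(exons_per_mrna.values()) if exons_per_mrna else 0,
    }


if __name__ == "__main__":
    print(annotation_metrics(sys.argv[1]))
```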
Now the next step is to take these genomes and try to annotate them in Galaxy with similar workflows, and to be able to treat as many genomes as possible in a scalable and FAIR way. There are a few challenges. First, the availability of evidence data: as I said earlier, to have a good annotation you often need RNA-seq data, and these projects often plan to have RNA-seq data, but a bit later, after the assembly of the genomes. So for now, most of these RNA-seq data are not available yet. Also, we'll need to have a standard workflow for all these genomes, but probably we'll need some phylum-specific workflows, because some annotation tools work for some species and will not work, or will need some special tweaking, for other species. So we'll need to work on that. There's also the problem of having quality-check metrics to determine which annotation is the best for a genome, by looking at BUSCO statistics, the gene number, gene lengths or the mean exon count of genes, for example. So that's a big piece of work to do. And there are computing challenges too. We are a partner of the EuroScienceGateway project, with a specific use case on annotation and biodiversity, and the goal is to be able to annotate all these genomes using Pulsar endpoints in the European infrastructure. There are different projects: the VGP is international, ERGA is European, BGE too, and ATLASea is a French project. All of them have the same goals, so we'll need some coordination between the communities to work on the same workflows.

And just to finish, maybe the next step, for all these genomes, after the automatic annotation, would be to provide a manual curation infrastructure and some web portals and data exploration systems. So I started a project with a few colleagues which is named BEAURIS. The principle is, well, genome portal as a service: the main goal is, for a set of assemblies and annotations, to easily, in a few clicks and a bit of configuration, deploy a whole website like this, proposing standard services like an Apollo instance, a JBrowse website, a BLAST server and a GeneNotebook, for example. In fact, this is based on writing a YAML file, on the left, describing all the data you have for a specific species, and then using CI processes in GitLab and GGA workflows to generate all these portals automatically. We plan to use it for ATLASea, to put all the data of this project online in an automatic way that is still user-friendly for the biologists. That's it. Thanks for listening, and thanks to all these people who are contributing in any way to this.

Thank you, Anthony. We have time for a couple of questions. Anthony, I've heard from the VGP people that genome annotation is missing. Do you see an opportunity there? It's missing, yeah. Surely, yeah; as I said, we absolutely need to synchronize between all these projects and to propose the same workflows between the different communities. So I'm wondering about the metadata for genome annotations, once you have managed to create them. How are they searchable, what metadata standards do you use, or do you have solutions for that part? Yeah, that's important. Part of the solution for me will be that, because for ATLASea, for example, we'll have a lot of genomes that will be tracked in a specific database, and we plan to use this database to generate this kind of YAML file using basic metadata extracted from it.
And also, based on that, to be able to generate RO-Crate data to deposit on Zenodo or things like that. Surely, yeah, that's on our roadmap. Maybe not next year, but yeah. Thanks very much.

Our last speaker is Katharina. Here you go. Thank you very much, and a bit of fresh air. I'm going to talk to you about this from a very different angle. My name is Katharina Heil and I represent ELIXIR here at the meeting today. We have a lot of communities in our infrastructure; I'm going to very briefly present who we are and then what our communities do. ELIXIR is an intergovernmental organisation that brings together life science resources: some of these are databases, software tools, training materials, interoperability resources and compute resources, as well as data management support. What we try to do is make all these virtual things a federated infrastructure, a single go-to place for researchers and people like you to find all of this. We are 24 different nodes, which is basically equivalent to 24 countries, and within those countries about 245 institutes and universities are contributing to this. We coordinate all the efforts from one central office; the coordinating activities for ELIXIR happen in the secretariat, which we also call the ELIXIR Hub. We're all about a network of people, and together we accelerate the understanding of life. So again, it's about connecting national infrastructures to mobilize data for life sciences in the European Research Area and beyond. Whatever we have is open source, and that makes it accessible for all of you as well.

How do we do this? We have five central platforms. They look at topics such as compute, data, tools, interoperability and training, and per se, they really build the infrastructure: they host the things that the researchers and the end users, but also the intermediate users, need. These are things like cloud compute, storage and access to services. We're looking at robust, long-term, sustainable data resources. When it comes to tools, it's all about finding them, registering them and benchmarking them, and also looking at best practices for tool development. In the space of interoperability, we can mention words like discoverability and accessibility, but also integration and analysis of biological data, looking at standardized formats, metadata and vocabularies. And in the scope of training, it's really the training infrastructure and bringing the capacity to our members to train the researchers and the individuals that want to be using some of our resources and tools.

And this is our communities portfolio. We have about 17 different communities, and this is maybe also where you as the Galaxy community have different sub-communities. They look at different things; you can see the vast diversity of areas that ELIXIR communities are looking at. That ranges from research data management, which is our newest established community, to plant sciences, people that look at rare diseases, microbial biotechnology. And one of these communities is also the Galaxy community. Communities are formed around the domain experts in ELIXIR nodes, but we're also very welcoming to non-ELIXIR members as partners in our communities. And they provide a mechanism for long-term collaborations with other projects and communities, like yours as well.
Our communities drive service development in the ELIXIR platforms, and they provide a framework to develop and maintain community standards. This is just a different summary of how our communities work, and the one to highlight, potentially, is supporting collaborative standards development. It's really about having community-driven approaches here, and we're always very interested in a bi-directional conversation around all of this: the people that have the technology on one side, and on the other the users and user representatives in our communities that are really applying what our platforms are supplying, and vice versa.

So I have two slides for you on our Galaxy community. You know this much better than I do, but this is kind of how we are pitching those connections in our context as well. The ELIXIR Galaxy community fosters a Galaxy community in Europe, beyond Galaxy Europe or together with Galaxy Europe, and together with that, Galaxy resources and training. And again, it monitors and fosters the use of Galaxy within ELIXIR as well. Every community has defined goals and every community has leads; two of them are in the room here today, Frederik and Björn, and Nicola might have been online. I've seen his picture on some of your screenshots as well, so many of you probably know Nicola or have met him in the past. And here are just some of the highlights that we are also very happy about and showcase in our ELIXIR context when it comes to Galaxy. I'm not going to talk through this slide, because I think we've heard and seen really good examples in the last hours, which I have also thoroughly enjoyed so far.

I did want to highlight this slide and really also say that it's always about the bigger community. ELIXIR collaborates with the Australian BioCommons, and that's how ELIXIR also comes into play in the communities here. That's really for sharing technical expertise, but also giving both organisations a more global perspective, and I think Galaxy is right in the middle there. One last example, the BioHackathon: that is an event that is supported and run by ELIXIR in Europe, and here again, it's really the people from the communities that drive the projects. They make them happen and they implement them. It's been great to also see all the collaborations across the globe. We're having our next edition in November; I think there'll also be representatives from this part of the globe with us again. Online registration is still open, so do let us know if you want to participate. And with this, I have a slide for you on community resources. If you're interested a little bit more in seeing how we, from the ELIXIR angle, manage communities, I'm also very happy to exchange and learn from you how to really engage and foster the people that form the community and make the community. With this, really just thank you, everybody, for having me here as well, and enjoy the rest of the day and the conference. Thank you.

We have time for one question. Hi, very interesting talk, thank you. So I saw infrastructure and I saw all these communities. What's happening with the software? Is that something that you maintain, or how is that managed within ELIXIR? The way we work is that the people in our nodes offer services, and they can be software, they can be tools, they can be anything. So really the maintenance of the things goes back to the people in our infrastructure, in the nodes, that are maintaining them, and they kind of commit to doing so with the tools that they offer and bring into the infrastructure.
I'm not sure if that answered the question; happy to chat later as well.

Thank you again to all the session speakers. This brings us almost to the end of today. It is definitely the end of this session, and I'd like to thank all of today's speakers, all of the trainers, and also our session chairs, who've been doing a fantastic job.