Okay, everyone. Welcome to the last talk session for the day. This is a series of success stories with Galaxy, or a series of speakers being surprised when the bell rings earlier than they thought. We'll see how we go. I think we've got it sorted now. We might get cracking, because we've got a series of short talks and a couple of long talks. As I said, success stories with Galaxy. Dan from the Cleveland Clinic is going to talk to us a bit about MetaSBT and metagenomics. Thank you. Good morning, or afternoon I guess, everyone. I'm going to be giving a talk that was originally supposed to be given by Fabio, but he unfortunately had some visa issues so he wasn't able to come, but I get to tell you about all this great work that he's been doing. So when I say "we", I very much mean that a lot of this work has primarily been done by Fabio. First, I want to quickly give a brief introduction to what the microbiome is. As you're aware, we have many, many cells within our own bodies, but there's actually a larger number of microorganisms that live inside of your own body. They have large influences across metabolism, the gut-brain axis, and so forth, so it's important to realize that these microorganisms living inside of you have a large impact on your health. The compounds that they're generating, the metabolites, have a large influence on your own human health. So what can we do to investigate these microorganisms and better characterize them? I want to talk today about something called MetaSBT. Metagenomics allows us to study not only well-characterized microbes, but also a large number of microorganisms that cannot be cultured.
There is also an increasingly large collection of metagenome-assembled genomes, also known as MAGs, and these have paved the way to get from higher-order taxonomies down to individual species, including strain-level resolution of these microorganisms. It's important to get down to these more precise categories, because it's really not just which microorganism is there, but the genes they have that are actually acting within your body, creating these compounds and metabolites and affecting your health. What we're really missing now is a systematic procedure for organizing and processing hundreds to hundreds of thousands of these MAGs, along with all of the reference genomes that have been obtained from isolate sequencing. And how can we then further leverage these, along with human health metadata, for example, to determine how we can assess and improve human health? So what is MetaSBT? It is a computational framework composed of several different modules, different subroutines. The first one is the MetaSBT index: we can take a collection of genomes along with their taxonomies and build a database that we can then query with our own sequencing data, our own MAGs that we've assembled, for example. MetaSBT leverages a tool called HowDeSBT, which is written by Bob Harris from Penn State, and it greatly improves upon the Solomon and Kingsford 2016 approach for building sequence bloom trees. It also includes CheckM to deal with completeness, contamination, and so forth.
So the index allows us to build the database. There's a boundaries function that then allows us to determine the common k-mers, the minimum and maximum of those, as well as the k-mers between the different taxonomic levels. When we're actually building these, we're building seven different SBTs for the different taxonomic levels, from species all the way up through kingdom. The reason why it's important that we do this (sorry, the screen went blank) is that we want to end up having an update process. With profiling, we can provide our own sequence data, plug it in, and find out where along this taxonomic tree our data resolves. The problem with currently existing approaches is that there's no way to update them; but because we're building different sets of trees at different levels with MetaSBT, we can have an update process where we update individual leaves without having to rebuild the entire database. This is important as we discover or sequence additional MAGs and fully assembled genomes. So far, Fabio has indexed all of the viral reference genomes, as well as a large set of MAGs from NCBI. We're extending this with even more MAGs, as well as building bacterial, archaeal, and fungal species into it. We're currently investigating how to simplify this by building sketch representations of these larger genomes, because some of the clustering steps are quite computationally intensive. We have a Galaxy tool suite to build the indexes and databases, to run the profiling, and also to update. We will also be making the databases available on CVMFS, so that everyone will be able to use them within Galaxy or outside.
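The pruning idea behind a sequence bloom tree query can be sketched as follows. This is a toy illustration, not MetaSBT's or HowDeSBT's actual implementation: each node's bloom filter is simplified to a plain k-mer set, and internal nodes hold the union of their children, so whole subtrees can be skipped when they don't share enough k-mers with the query.

```python
# Toy sketch of a sequence-bloom-tree-style query.
# Each internal node stores the union of its children's k-mer sets
# (real SBTs use bloom filters); a query descends only into subtrees
# that contain enough of the query's k-mers.

def kmers(seq, k=4):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

class Node:
    def __init__(self, name, kmer_set, children=()):
        self.name = name
        self.kmers = kmer_set
        self.children = list(children)

def build_parent(name, children):
    union = set()
    for c in children:
        union |= c.kmers
    return Node(name, union, children)

def query(node, q_kmers, theta=0.7, hits=None):
    """Return leaves whose k-mer sets cover >= theta of the query's k-mers."""
    if hits is None:
        hits = []
    frac = len(q_kmers & node.kmers) / len(q_kmers)
    if frac < theta:
        return hits          # prune this whole subtree
    if not node.children:
        hits.append(node.name)
    for child in node.children:
        query(child, q_kmers, theta, hits)
    return hits

# Two "genomes" as leaves under one parent.
g1 = Node("genome_A", kmers("ACGTACGTAC"))
g2 = Node("genome_B", kmers("TTTTGGGGCC"))
root = build_parent("root", [g1, g2])
print(query(root, kmers("ACGTACGT")))  # ['genome_A']
```

The same tree shape is what makes the per-leaf updates mentioned above cheap: adding a genome only touches the filters on one root-to-leaf path.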
I don't have time, but I also just want to briefly mention... okay, I'm out of time. Yep. So if you want to access these databases and the tools, they're all open source; they're available on GitHub and on Bioconda, and there are Galaxy tools that are also being released soon. So, thank you. We don't have time for questions at the moment, but next up we've got a talk from Linelle and Alex about Vertebrate Genomes Project workflows within Galaxy. I'm Linelle and this is Alex, and we're here to give an update about the Vertebrate Genomes Project and the Galaxy workflows we've been using to assemble a whole bunch of vertebrate genomes. The Vertebrate Genomes Project, as a bit of an introduction, is a bit of a lofty project where we're aiming to sequence, as it says on the tin, all 66,000 vertebrate species. But baby steps: we are chopping it up into phases, the first of which is sequencing a taxonomic representative from all vertebrate orders, and that amounts to about 260 species, of which we are almost finished; we have a little over 200 sequenced, curated, and in NCBI ready for annotation. We have all of them publicly available as well; this table is just a screenshot from our public-facing portal, where all the data is publicly accessible and viewable, as long as you adhere to our data usage policies, which you can email Erich about, and not me. Aside from that, these are the 51 genomes that have been done so far. Well, more have been done since the 51 described in the paper that just came out and is on bioRxiv; it is currently in submission at Nature Biotechnology. Yay. So it's 51 species assembled across the vertebrate tree of life.
That's sharks, fish, mammals, reptiles, birds, and it even works on some invertebrates; we've used it on some spiders and some mosquitoes, because people don't read the lab name. It works across a variety of genome complexities too, from very homozygous, critically endangered species from zoo populations, to very heterozygous, almost strain-cross-like zebra finches in a colony. Also genome size: it has worked on the 800-megabase-pair chub mackerel all the way to an eight-gigabase-pair frog, which crashed every single node I tried to run it on. About a third to half of these were run on Galaxy EU using freely available public compute, because it was before I was able to set up our local server and really utilize the compute my institution has, and also stop giving Björn a headache. The actual workflow itself depends on HiFi data to make contigs using hifiasm, and we have a variety of different workflows available for if you have additional phasing data, like trio parental data, or Hi-C data, which you can also use later on for scaffolding. There's also a scaffolding pipeline for Bionano, and there are export workflows that help get all your reference-quality genomes up and out into our GenomeArk, where they are also freely and publicly accessible. There's also an independently invoked mitogenome assembly pipeline that's available as a workflow too. And yeah, progress updates: as I mentioned, the paper is submitted to Nature Biotechnology and is also on bioRxiv if you want to take a look at it. There are 51-plus genomes assembled and available, and they're directly available from the data importer on the major servers; you can just click GenomeArk and navigate to the species you want, and there are more genomes that have been done since this was updated, but I'll take it.
And I'll give the update on the actual enhancements to Galaxy that we've made for this. All of the workflows that are being used by the VGP are now available on Dockstore, so you can pull them into a Galaxy instance very easily and run them, if all of the tools are available on your instance. They're all available from the VGP page on Dockstore, like I mentioned; everything is currently runnable on both .org and .eu, and we're hoping to get it available on the Australian server soon. Those will have extensive training material available through the GTN, which will allow you to assemble your own genomes, as well as any data that is available in GenomeArk, because again, that is available from the importer now. We're hoping to do some more genomes: increase pace by running against multiple servers as well as the VGL's compute, increase automation, and just get it running in more places; and as people become more aware of the VGP, they can start running it themselves and helping out the project. We also want to be using some of the IWC enhancements; for example, there are alternative workflow paths, such that depending on what data is available for any given species, you can run a different version of the workflow from a single invocation. More will be discussed about that in posters that Tyler and I will be presenting later, as well as a talk on Wednesday. Furthermore, we want to start annotating the assemblies that are output, so we're going to have some workflows for that as well, plus availability on the Australian server. Thanks to everybody who has helped out on this project, as well as our funders, and we'd love to have some questions. Thank you. Wait, sorry.
One more thing for the update I forgot to mention, very importantly, related to genome sequencing hubs: we've got some buy-in from local projects, such as the Sidra HPC in Qatar, which has set up a Galaxy instance to sequence species native and culturally important there, and the African BioGenome Project; I'm also helping them set up their local instance, which I think has its compute located in Johannesburg, but I could be wrong. That's a fun update. So, you said a lot of this was run on public infrastructure; approximately how much did you save by giving Björn a headache versus using AWS? You mean money, if I had to pay for the compute with AWS? How much money did I save? Yeah, I haven't run the numbers, and I don't think I will. Okay, but it's a good question. It is in the metrics; I could tally it up, but I won't. Probably still worth Björn's headache. It's there, it's in the metrics; they're just not counting where I was troubleshooting some of the stuff as well. Thank you. We've got enough time for one question. I will do a shout-out to Anna, who is going to wave at the back. You two should talk to Anna about getting the VGP pipelines onto Galaxy AU, because Anna's already started down that path. All right, time for a 30-second question. What's the annotation workflow going to do, and how heavy is it in compute? I don't know; I am new to genome annotation. I will be deferring to Anthony and a couple of other experts for that, because we have usually just been waiting for NCBI to annotate once we get our RNA-seq and Iso-Seq data uploaded. It would be nice to also have a quick-and-dirty annotation, maybe just repeat content of the finished assemblies, but it's TBD, and if you want to, help make that happen. And also, what about the transcriptome? Is that something that will also be done, like a separate transcriptome assembly workflow?
I don't know if we usually generate enough data for that as part of ours. And a really quick plug before we walk away: we didn't really get to get into the nitty-gritty of the pipeline, so if you want to talk to us about that, we have a poster on Wednesday, and you can come to the VGP training on Thursday. Let me start with: thank you again. Thank you for organizing us and getting us here; it is a great opportunity to see this great country. But oh my gosh, to get here: I arrived today at 5 a.m. after a 26-hour trip, so if I fall asleep, don't blame me, please. I'll try to do it quickly. That's a short slide for motivation and introduction. Basically, we have these great compute resources in our lab: supercomputer number one in the world, supercomputer number five in the world. We have great experimental facilities and great scientists, but the part which is missing is the one that glues all this stuff together; basically, a lot of work has to be done manually. Scientists don't want to focus on infrastructure, on copying data, on starting things; they want to concentrate on their work. That's why we started looking into Galaxy, and we're lucky that we found Galaxy. Basically, what we've been doing for a year or so is adjusting Galaxy to our structure. If you read the tutorials, Galaxy can run everywhere: laptop, cloud, HPC, Kubernetes. That's true, but still, to make it work takes quite some effort, and we've been doing that. We have multiple resources available for users: an experimental analysis cluster, some cloud machines, HPC in the cloud, DGX nodes, and we have the big machines, Summit and Frontier. So, I split what has to be done into three parts: basically, we have to make sure that we can use these multiple clusters, multiple computers.
There's the HPC cluster, the cloud, whatever, but these are all multiple isolated islands of compute resources, and what we want to do is make them transparent for the Galaxy user. Then you have the authentication and authorization problem, and again you have different islands with different security requirements: on HPC there should be two-factor authentication, on the cloud there could be some other authentication, and on the analysis cluster we can have external users and internal users. Our goal, again, is to make it transparent for Galaxy users, so that they use Galaxy and don't care about authentication. There is also the requirement that jobs should run on behalf of the user; we can't just have some service account running Galaxy jobs, because of accounting, data access, and so on. The third part is maybe the most complicated one. Again, we have multiple islands of storage, multiple islands of data: we have data on HPC, we have data coming from experiments, we have data stored in the cloud. Access to this data is scattered all over the place, and again we want that to also be transparent for the Galaxy user. And here the condition is that we cannot duplicate the experimental data, which is petabytes, and we cannot just create another object store in Galaxy and copy data there; the data ideally should stay in place. So how did we implement all this? Compute was relatively easy. Thanks to Galaxy, we have Pulsar, Galaxy, and RabbitMQ, and the combination of these three basically did the work for us. You have these different islands, we start Pulsar on each island, and then Galaxy sends jobs to Pulsar via RabbitMQ; and RabbitMQ is very useful for changing the direction of the flows, to deal with firewalls. For access to Summit, we have a Kubernetes cluster where you can start a pod, and this cluster has a connection to Summit and can submit jobs to Summit. So we also start Pulsar on Kubernetes, and then it can accept jobs from there.
That's the easy part. More difficult was authentication and authorization. What we did is use OIDC tokens, which were already available in Galaxy. We don't have Galaxy users in a local database; we have a central identity provider, a Galaxy user can log in via this provider, and then we receive the OpenID token. We then pass this token to all resources where we run Galaxy jobs, and using this token and different authorization mechanisms, we allow or disallow starting the job, and allow or disallow access to the data. For this we had to make some changes to Galaxy and Pulsar. The first one was to auto-refresh this token, because previously, when you logged into Galaxy and received this token, it was stored in the database and basically never updated. Now we do this auto-refresh, and we expose this token via a Galaxy endpoint for Pulsar, so that the running job, and Pulsar, can get a fresh token from Galaxy and use it. We also implemented an option to run a Docker container as a specific user: currently you can hard-code in the configuration which user to run the Docker container as, but we wanted it to be dynamic, so we also use this token; we check the parameters for the destination, extract them, then check the token and run the Docker container as that user. We also created a couple of Pulsar plugins to authenticate and authorize jobs on a per-user basis. Currently Pulsar does not actually know which user submitted the job, so it cannot check whether the tool is allowed for the destination or not. With what we implemented, it can look into the token, again extract the user name, and using different plugins decide whether this user is allowed to run the job or not. That's briefly about authorization. Now, data management. Just a reminder of what we have and what we want to do. Again, Galaxy has this object store. We use it to put data into and get data from Galaxy, and we also use it from the Pulsar side to get data from the object store before we submit the job.
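The auto-refresh decision described above can be sketched roughly like this: decode the (unverified) payload of a JWT access token and refresh it shortly before it expires. The claim names and the toy token builder are illustrative assumptions, not the actual Galaxy/Pulsar implementation.

```python
# Sketch of a token auto-refresh check: inspect the JWT's `exp` claim
# (without signature verification, which the identity provider handles)
# and decide whether a refresh is due. Purely illustrative.
import base64
import json
import time

def jwt_payload(token):
    """Decode the middle (payload) segment of a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def needs_refresh(token, leeway=60, now=None):
    """True if the token expires within `leeway` seconds."""
    exp = jwt_payload(token)["exp"]
    now = time.time() if now is None else now
    return exp - now < leeway

def toy_token(exp):
    """Build a fake header.payload.signature token for demonstration."""
    seg = lambda obj: base64.urlsafe_b64encode(
        json.dumps(obj).encode()).rstrip(b"=").decode()
    return f"{seg({'alg': 'none'})}.{seg({'exp': exp})}.sig"

tok = toy_token(1000)
print(needs_refresh(tok, leeway=60, now=950))  # expires in 50 s: True
print(needs_refresh(tok, leeway=60, now=800))  # expires in 200 s: False
```

In the real setup, when this check fires, the refresh token is sent to the identity provider's token endpoint and the fresh access token is served to Pulsar via the Galaxy endpoint mentioned above.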
And to put the output datasets back. But I'm saying "put" and "get" instead of "upload" and "download", because we don't want to move data where it's not necessary; in some cases we just keep the data in place and only work with the metadata. This Galaxy object store should be able to work with these different islands of data, and we implemented it using a Rucio plugin. Rucio is a distributed data management solution from CERN; they use it to store data in different tiers around the globe. We created this Rucio plugin for the Galaxy object store, so basically Galaxy only talks to the Rucio plugin through a couple of functions, and then Rucio is configured to talk with the different islands of data. A little bit more about the implementation: for Rucio you have to configure storage elements for all of these data locations; there are a lot of different resources, for HPC, for example, for cloud, and for the experiment, on this slide. Then you can configure the different protocols which each storage element can use to deliver, upload, or download data; and then, depending on the location where you run, you can use different protocols to get this data. So basically, if you're on the Pulsar side running on HPC, for example, you ask Rucio to download a file, and Rucio, as an answer, just gives you some protocol. It could be POSIX, it could be S3; everything is configurable. You ask Rucio "I want to download this file", it answers "use this protocol", and then you go to the storage and get your data using that protocol. One of the protocols can be a symlink, where you really don't copy data: if your job is located close to the data storage, then you don't have to move the data, and the protocol which Rucio gives you lets you avoid moving the data at all and just get a symlink.
Or a simple copy, whatever; but if you're somewhere away from this storage, if you're in the cloud and want to get data from HPC, then Rucio would return some different protocol, since a symlink would not work, and give you S3 or a custom protocol to get this data. Authorization and authentication also work with these tokens. We also implemented, as I said, in-place ingest: we don't want to download, we don't want to move the data, so we created a tool, a protocol, and changes to the Rucio plugin, so you can just ingest the data into Galaxy, into Rucio. Galaxy still executes the usual upload method, but then Rucio knows that it should not do the upload, just ingest the metadata. And the nice feature is that Galaxy still sees this dataset as a normal one: you can click on the eye icon and view it, and it will be downloaded even though the data is not close to Galaxy. So, as we said, for users it is transparent, and behind the scenes all this data movement happens, or doesn't happen. Some changes still have to be made: first of all, the Rucio plugin we created is quite similar to the existing S3 plugin, and it still has to be refactored, because I learned today that the other plugins will be refactored too. Another feature is that we had to run set_metadata from Pulsar, not as part of the job: normally, when you submit a job to Pulsar, the job script runs the tool and, at the end of the script, we execute setting the metadata and uploading files to the object store. We could not use that, because the job can be executed on some Summit node where we don't have control and cannot install Galaxy dependencies; so we had to change it and introduce an option such that, after the job has finished, Pulsar calls basically the same script to upload the data. Downloading from the object store only happens when needed, again, as it was implemented before.
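The locality-aware protocol choice described above can be sketched as a toy model. All the names here (storage elements, site labels, protocol strings) are invented for illustration; the real plugin delegates this decision to Rucio's replica and protocol configuration.

```python
# Toy sketch of locality-aware protocol selection: if the job runs at the
# same site as the storage element, hand back a symlink (no data movement);
# otherwise fall back to a copy protocol. Not Rucio's actual API.
import os
import shutil

STORAGE_ELEMENTS = {
    "hpc_scratch": {"site": "hpc", "protocols": ["symlink", "posix_copy"]},
    "cloud_s3":    {"site": "cloud", "protocols": ["posix_copy"]},
}

def pick_protocol(storage_element, job_site):
    se = STORAGE_ELEMENTS[storage_element]
    # A symlink only makes sense when compute and storage share a filesystem.
    if job_site == se["site"] and "symlink" in se["protocols"]:
        return "symlink"
    return "posix_copy"

def materialize(src, dst, protocol):
    """Make `src` visible at `dst` using the chosen protocol."""
    if protocol == "symlink":
        os.symlink(src, dst)        # no data movement
    else:
        shutil.copyfile(src, dst)   # actual byte copy
    return dst
```

A job on the HPC island would get `pick_protocol("hpc_scratch", "hpc") == "symlink"`, while the same dataset requested from the cloud island falls back to a real copy, which mirrors the behavior described in the talk.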
As soon as you touch the get_file_name function, the data is downloaded from the object store into the Galaxy cache. You can also split this functionality, so that until you click the eye button, the data never leaves its original location; and in many cases you don't need it to. If you run workflow steps on different resources, then data will be copied between those resources, but it will not be unnecessarily copied to Galaxy until you really want to view or download it from your client. And a small one: displaying a binary file instead of downloading it. Some PRs have been accepted, one or two are pending, and we still have a lot of them to do. So that's basically the final architecture. Altogether, we have the Galaxy client, authorized via the identity provider; we send the job via RabbitMQ; Pulsar talks to the object store, gets the data, submits the job, and pushes the data, or registers it, back. Work in progress: a colleague recently implemented live job output, which is what our users wanted: when you run some tool, you don't actually see the output, and now it has been implemented so that you can have this live view in Galaxy. It's pretty nice. Interactive tools on remote resources: I discussed this with Nate, and it's already working, but for us it did not work when we started, because Pulsar starts the interactive tool somewhere but we don't get the feedback to Galaxy. We have to see how that works. Generic tools for HPC jobs: we also had to do quite a lot of hard-coding inside the tool itself to make it run on Summit. We don't like it, and we are discussing how to make it better. Another one is workflow stop and restart. I'm not really sure if we like it, but our users sometimes ask for it: they start a job, and with this live job output they see "oh, it's already finished" or "the convergence is good enough", or not. They want to stop it, but without killing it and deleting all the data.
And if it's a workflow, maybe then restart the next step of the workflow using these results. It kind of breaks reproducibility, and that's why we're not sure we like it; if it's considered as a kind of interactive part of the workflow, which we're still investigating, it might be a nice feature as well. And we owe a lot of PRs to the main Galaxy repo; we should be working on that, and we will. That's basically it. Thank you for your attention. I have to mention that this is only the first part; tomorrow we'll do the second part, basically how we use all of this. We've got time for one question, up the back here; it's for the Zoom folks, we need the microphone. Really an amazing talk. I remember you came last year when you were starting out, and you have a fantastic vision there. It's not a question; really, really good job. I do want to say, all the things you've listed are really solid changes that we should have done already for other projects in the community. Please do also open the list of things you want to see. Everything you said makes a whole lot of sense, the architecture is really good: let's do it. I'm really happy with how you've done this, and your contributions are amazing. Keep going, really good. Thank you. We do our best. I think that's because we try to get feedback from our users; we're lucky that we have several users, we work with them, and we ask them constantly what they need. But I agree with what you said: we need to collaborate with you guys more, because I've learned that you did things we were already doing ourselves, so maybe there's already a solution and we don't reinvent the wheel. But thank you. Thanks again. Up next, and I think I'm going to use your nickname, you're going to talk about gene polymorphism and the links to Parkinson's disease, and how you've investigated that in Galaxy. Thank you very much.
Thanks for inviting me; this is a great opportunity, so I can extend my thanks to Galaxy for extending this support to us regular biological scientists, who are sometimes stuck in the world of bioinformatics. So well done, guys. I think Galaxy is improving every year; it's better and better, so I will definitely recommend my students use it more. I will probably talk about something completely different from what Sergey was talking about, so it could be closer to some hearts, or further, I'm not sure. It's about superoxide dismutase and how polymorphism in this gene could impact certain diseases; in this case the main focus is on Parkinson's disease. But just before I start, sorry, I have a tendency to talk too long, and it's seven minutes, I realize, so I will have to be quick. Just to mention a big thank you to the people listed here: my student Mia, who did the majority of the work, and the other colleagues involved in the project; I'm thanking them for all their support. Now, regarding the impact of a certain genetic background on Parkinson's disease: I don't know if you've watched Back to the Future. Maybe anyone watched Back to the Future? I did; maybe the younger generation not yet, but the majority of people have. So you know Michael J. Fox, the famous actor. He got Parkinson's disease quite early, which is very unusual; it mostly impacts the population above 70, but there are some people who are impacted very early, like Michael J. Fox. In the case of Parkinson's disease there are a number of interesting things, and I will try to give a short overview. Many of the people impacted by Parkinson's disease have a bit of a shaking movement; they have the so-called TRAP clinical picture: rigidity, stiffness.
They also have impairments in voluntary movements, so there are lots of involuntary movements and a bit of a posture issue. I even have a family member who is impacted, so working in this area is close to my heart. This is a little bit of an overview; there are certain molecular markers that overlap between dementia and Parkinson's, but because of time I will not go further. Lewy bodies are found to be present in certain people with Parkinson's. Regarding the development of the disease, it's a complex, multifactorial disease, because a number of factors have an impact on it. Today I'm focusing on genetics, but bear in mind that there are of course other factors, like environmental exposure and even the gut microbiome; the earlier talk was very relevant, because there is a real synergy between the gut microbiome and predisposition, so eat lots of fiber, if I may say, and keep your gut healthy. Today we are focusing on genetics, and superoxide dismutase is one of the genes listed as potentially important here. They say around 5 to 10%, now even more, of affected people have a genetic predisposition, which is very likely in the case of Michael J. Fox, as he developed the disease very early. So we will focus on that aspect: the genetic polymorphism of superoxide dismutase, and how this polymorphism increases oxidative stress and has a very good hypothetical chance of leading to the development of Parkinson's disease.
So that was the goal; just a quick overview of what we did. We applied Galaxy tools; there are multiple other tools, and I'm aware that we should extend our use of Galaxy even further, so we will definitely be more inclined to use Galaxy for other databases and to access them. The poster is tomorrow, so if you have specific questions, or if I miss something, please come tomorrow. Sequences were retrieved from databases and then filtered. Okay, very close. Anyway, we used Galaxy; I will give you an overview here. Databases were used to get the sequences, then we did a bit of filtering using certain tools such as SnpEff, and we then went through this whole process to get to the deleterious variants. Yeah, it's a long talk for seven minutes. We ended up with seven variants, which were selected through this pipeline, this workflow. SnpEff was used, plus the NCBI clinical significance annotations, to find out which of these variants would potentially have an impact, because there are over 334,000 of them, even more. So you needed to filter really closely on where the significance is and which sequences could have an impact; multiple tools were used, multiple sequence alignment, and then protein modeling and identification of domains, to see and quantify the impact on the functionality of the protein. To make the story short: we basically further extended the knowledge regarding this genetic polymorphism by exploring genetic variants and how they could have implications for functional domains, and we identified seven very likely harmful variants. That's basically it; future studies are to come. Thank you very much. I just wanted to say that I'm an editor for the journal Marine Drugs, so you are really welcome to submit your papers there; one of my papers was published there recently on genome mining, so I think Galaxy users could look towards that in the future.
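The narrowing step described above, from hundreds of thousands of annotated variants down to a handful of likely deleterious ones, can be sketched as a toy filter. The field names, thresholds, and records here are illustrative assumptions, not the study's actual criteria or SnpEff's output format.

```python
# Toy sketch of variant prioritization: keep only variants that an
# annotator (SnpEff-style predicted impact) marks HIGH and that a
# clinical-significance source flags as (likely) pathogenic.
# All records below are invented examples.

variants = [
    {"id": "rs001", "impact": "HIGH",     "clin_sig": "pathogenic"},
    {"id": "rs002", "impact": "MODERATE", "clin_sig": "benign"},
    {"id": "rs003", "impact": "HIGH",     "clin_sig": "uncertain"},
    {"id": "rs004", "impact": "HIGH",     "clin_sig": "likely_pathogenic"},
]

def likely_deleterious(variant):
    return (variant["impact"] == "HIGH"
            and variant["clin_sig"] in {"pathogenic", "likely_pathogenic"})

selected = [v["id"] for v in variants if likely_deleterious(v)]
print(selected)  # ['rs001', 'rs004']
```

The real workflow chains this kind of filter with multiple sequence alignment and protein-domain mapping, so only variants landing in functional domains survive to the final list.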
Thank you very much for your time. Thanks. We don't have time for questions, but that was a great example of a Galaxy success story. Thank you. Last but not least, we're going to have a change of pace and move away from molecular biology towards ecology, and hear about Galaxy for Ecology and the Galaxy Earth system project. So, I'm Marie Jossé from the French National Centre for Scientific Research (CNRS). I will speak about Galaxy for Earth system and biodiversity, and about the FAIR-EASE project and how it wants to create an Earth analytic lab using Galaxy. First I will talk about Galaxy Ecology and how it's an inspiration for the Galaxy Earth system; then what the Earth system model is; and finally, what an Earth analytic lab is. Galaxy Ecology was created in 2018 by the French biodiversity data infrastructure; it is a transnational initiative for the analysis of biodiversity data in research and expertise, and it's a European instance. I will talk about two of the workflows that are already on Galaxy, and I will say more about Galaxy Ecology tomorrow during an update. One of the workflows on Galaxy, which was presented yesterday, uses satellite data to create biodiversity indicators: you can download Sentinel-2 data from the Copernicus or European Space Agency platforms, process it into a raster format, and then use that to create biodiversity indicators. There are different kinds of biodiversity indicators that you can compute. The first is the identification of biodiversity hotspots, with indicators such as Shannon, Simpson, and others. Then, once you have the hotspots, you can compute biodiversity indicators only for the canopy, with alpha, beta, and functional diversity analyses. For instance, here you have a map of the alpha diversity indicators, and here a table of the Bray-Curtis dissimilarities.
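The Shannon and Simpson indicators mentioned above are straightforward to compute from per-species abundances; here is a minimal sketch (the real tools operate per pixel or per site over rasters or occurrence tables):

```python
# Toy calculation of two common diversity indicators.
# Input: a list of per-species abundances for one site or pixel.
import math

def shannon(abundances):
    """Shannon index H' = -sum(p_i * ln p_i) over nonzero proportions."""
    total = sum(abundances)
    ps = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in ps)

def simpson(abundances):
    """Simpson diversity 1 - sum(p_i^2): the probability that two randomly
    drawn individuals belong to different species."""
    total = sum(abundances)
    return 1 - sum((a / total) ** 2 for a in abundances)

even = [25, 25, 25, 25]   # four equally abundant species
skewed = [97, 1, 1, 1]    # one dominant species
print(round(shannon(even), 3))   # ln(4) = 1.386
print(round(simpson(even), 3))   # 0.75
```

An evenly shared community maximizes both indices, while the skewed community scores much lower, which is exactly what makes these usable as hotspot indicators.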
Then you can also do an analysis of spectral indices, which allow you to see the wellbeing of the vegetation; for instance the normalized difference vegetation index (NDVI) or the canopy chlorophyll content index. Everything in this workflow, going from the Sentinel-2 data to the biodiversity indicators, is explained on the Galaxy Training Network. This new workflow is a first step toward having an analysis of the Earth critical zone, namely land degradation. Another set of tools available on Galaxy Ecology is focused on the biodiversity of the ocean. For that there is the Ocean Biodiversity Information System (OBIS), which is a global, open-access data and information clearing house on marine biodiversity for science, conservation and sustainable development. In order to visualize the marine data, OBIS created a package of diversity indicators, which I used to create a Galaxy tool that computes these five indicators; here you have an example of the output as a map for the Shannon index. So this set of tools gives you some information for marine omics. I'm talking about land degradation and marine omics because these are two topics of the FAIR-EASE Earth system model. Earth system models describe the atmospheric and ocean circulation and thermodynamics, and the biological and chemical processes that feed back on the physics of climate, over the surface of the Earth and underneath the surface of the oceans. So we have two main topics here: the Earth critical zone, with land degradation using Sentinel-2 data, which I explained just before, and marine omics with the OBIS indicators. These two topics are among the topics of FAIR-EASE, which was designed to have an Earth system model. We have five main topics; the first is ocean biogeochemical observations.
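The NDVI mentioned here is a simple per-pixel ratio of near-infrared to red reflectance. A minimal sketch follows; the sample reflectance values are invented for illustration (on Sentinel-2, NIR is band B8 and red is band B4).

```python
# Sketch: per-pixel NDVI = (NIR - RED) / (NIR + RED), in [-1, 1].
# Sample reflectances below are illustrative, not real Sentinel-2 data.

def ndvi(nir, red):
    """NDVI for one pixel; returns 0.0 for an empty (all-zero) pixel."""
    if nir + red == 0:
        return 0.0
    return (nir - red) / (nir + red)

# Toy (NIR, red) reflectance pairs: dense vegetation, sparse
# vegetation, and bare soil / water.
pixels = [(0.45, 0.05), (0.30, 0.25), (0.10, 0.40)]
values = [round(ndvi(nir, red), 2) for nir, red in pixels]
print(values)  # [0.8, 0.09, -0.6]
```

Healthy vegetation reflects strongly in the near-infrared and absorbs red light, so values near 1 indicate dense, healthy canopy, which is why NDVI works as a vegetation-wellbeing index.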
Then we have coastal water dynamics, the Earth critical zone, the volcano space observatory, and marine omics observations; for these five topics, the FAIR-EASE project is creating an Earth system model. FAIR-EASE is a European Open Science Cloud project; it's a new approach to observation and modelling of the Earth system, environment and biodiversity. It aims at providing interdisciplinary data discovery and access services, and at creating an Earth analytic lab and a data lake. It's for policy makers, resource providers, civil society and the general public, for scientific user communities, and for operational forecast services. We will focus on the Earth Analytic Lab, as we want to try to create one with Galaxy. An Earth analytic lab is an easy way to visualize, analyze and process environmental data on demand. Here, one of the solutions is Galaxy: the idea is to create a Galaxy instance with a set of tools for Earth system, environmental and biodiversity data. That way we can use multiple datasets, process them and chain them in one environment. To do so, we were inspired by Galaxy Ecology and Galaxy Climate. From the Galaxy Ecology instance we have tools for accessing data, for instance the occurrence tools that can retrieve data from GBIF or OBIS. From Galaxy Climate we have the Copernicus climate and atmosphere data store tools that allow you to retrieve climate data. And, as I told you earlier, there are the biodiversity analysis tools that we can rely on to start our work on the different topics, for instance the Sentinel-2 biodiversity workflow. For integrating tools for the Earth system, I use the Pangeo ecosystem tools, with Jupyter notebooks and interactive tools, as an inspiration. And finally, I reuse all the Galaxy Training Network materials that are available from these two instances.
One of the main issues here is that we have multiple topics with multiple kinds of datasets, so it's a real challenge. FAIR-EASE wants to open gateways for the field of Earth and environmental sciences by addressing the limitations of the current digital architecture. The questions that everybody is asking nowadays are: where is the data, how do we store it, how do we access it, and is a data lake the magical answer? For now we don't really have an answer to that, but we know what kind of data we want. For coastal water dynamics we have data on river discharge and on meteorological and oceanographic conditions, and data from satellite sensors. For the Earth critical zone we also have satellite data such as Sentinel-1 or Sentinel-2, and we need soil databases and national and regional Earth critical zone datasets. For the volcano topic we also need remote sensing measurements and, furthermore, access to the Copernicus Sentinel open data hub. And for the two other topics it's much the same: we need access to in situ data and to remote sensing data. For marine omics, however, we already have some of the tools on Galaxy Ecology that we can use, for instance the occurrence tools I talked about earlier, which allow us to access data from the Global Biodiversity Information Facility (GBIF). Once we have data access, we need to process this data. The coastal water dynamics workflow focuses on the coastal marine environment near river estuaries; the aim is to follow the evolution of plankton blooms, and the transport and fate of nutrients, carbon and contaminants. To create this coastal water dynamics workflow, the aim is to chain three tools using NetCDF files as inputs and outputs, so we will have three interactive tools.
We already have one of them, DIVAnd, which performs an n-dimensional variational analysis and gridding of arbitrarily located observations. This is a Jupyter notebook tool that is already available on Galaxy Europe; it lets the user browse among all the notebooks available for this kind of analysis, and can take NetCDF files as input and output. Those files can then be used in, for instance, Ocean Data View, a desktop interactive tool for the exploration, analysis and visualization of oceanographic and other georeferenced profile, time-series, trajectory or sequence data; this tool is also available on Galaxy Europe. The final aim is to chain these two tools with SOURCE, which will also be a Jupyter notebook tool, one that can calibrate and validate an ocean model within a selected spatial domain using in situ observations. This is the workflow we are working on the most, and it will soon be complete, I hope. Then we have the Earth critical zone aims. The Sustainable Development Goals created a target to combat desertification; from this target, an indicator to monitor the proportion of land that is degraded over total land area was implemented, and for this indicator three sub-indicators were chosen: vegetation productivity, land cover changes, and soil organic carbon changes. These three sub-indicators are calculated by the Trends.Earth software, which processes raster datasets and produces a pixel-based qualitative index of potential degradation. This software is a QGIS plugin; I'm trying right now to take the scripts out and create three different batch tools that can process the data and compute the vegetation productivity and the other sub-indicators. For now I'm having access issues with Google Earth Engine, so if anyone has an idea about that, I'm all ears.
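The land degradation indicator described here (SDG 15.3.1) is commonly aggregated from its three sub-indicators with a "one out, all out" rule: a pixel counts as degraded if any sub-indicator flags degradation. A minimal sketch of that per-pixel combination follows; the toy "rasters" and the -1/0/1 coding are illustrative, not the exact Trends.Earth implementation.

```python
# Sketch of a per-pixel "one out, all out" combination for SDG 15.3.1:
# a pixel is degraded if ANY of the three sub-indicators (vegetation
# productivity, land cover, soil organic carbon) shows degradation.
# Coding (illustrative): -1 degraded, 0 stable, 1 improved.

def combine(productivity, land_cover, soc):
    if -1 in (productivity, land_cover, soc):
        return -1  # degraded if any sub-indicator is degraded
    if 1 in (productivity, land_cover, soc):
        return 1   # improved if any improved and none degraded
    return 0       # otherwise stable

# Three toy sub-indicator "rasters", flattened to per-pixel lists.
prod = [0, -1, 1, 0]
lc   = [0,  0, 0, 1]
soc  = [1,  0, -1, 0]

sdg = [combine(p, l, s) for p, l, s in zip(prod, lc, soc)]
print(sdg)  # [1, -1, -1, 1]

# The indicator itself: proportion of land that is degraded.
print(sdg.count(-1) / len(sdg))  # 0.5
```

Splitting the three sub-indicator computations into separate batch tools, as the speaker proposes, would leave this final combination as a small, easily testable last step of the workflow.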
For the volcano topic, they propose to make a joint analysis of heterogeneous remote sensing observations for the monitoring of global volcanic activity, allowing a focus on any major volcanic eruption worldwide. For the final two topics, we have ocean biogeochemistry, which wants to provide a common platform for data scientists to qualify, calibrate and validate biogeochemical (BGC) data from sensors deployed on various platforms, and we want to use Galaxy for that. And finally we have marine omics, which wants a platform that can analyze spatially and temporally comparable marine microbial metagenomics datasets, for the exploration of biodiversity and its correlation with environmental quality. For all these topics there are multiple datasets and multiple workflows that overlap, and we want to reuse these different workflows and chain them into other kinds of processes. To do so, we have a collaboration with the EuroScienceGateway project. Partners and colleagues from EuroScienceGateway came to France to give a two-day Galaxy training, helping to teach the FAIR-EASE partners how to use Galaxy and why. We had one day of hands-on tool integration with the attendees, and we were able to integrate five tools by the end. Two of these tools were LGF viewer and Scoop, which check the quality of oceanographic data. The collaboration between the two projects can produce efficient cross-discipline workflows by creating, sharing and reusing tools and workflows on Galaxy. This collaboration is here to help us integrate all of these Earth system topics into Galaxy and create, we hope, at the end, an Earth system Galaxy. Thank you for your attention. And if you have any advice, comments or other remarks, I'm all ears.

Thank you, Marie. We have time for one question. Great talk, Marie, thanks a lot.
Can you tell us a little bit about the problematic, or maybe not problematic, aspect that your tools are more or less running as either Jupyter notebooks or even desktop applications in the cloud? Is that a specific and more complicated problem than just integrating a Galaxy tool? Where do you see problems where we might, or can, help you make it easier to run those? Yeah, VREs, virtual research environments, for you.

Right now, for the tools that are already on Galaxy, I don't think there is any kind of trouble, but I think that for some of the topics, the FAIR-EASE partners would like a persistent service, where we don't have a tool that stops at one point: it keeps running. I don't know exactly how to say it, but the aim is to use Galaxy also as a service, and not just as tools that will one day stop and that we then have to relaunch to use again. That's one part. And then we have, I think, a lot of tools that will be coming to Galaxy, and some of them use multiple kinds of languages and very demanding kinds of data access, and that's something we will need help with. I don't know if I'm being really clear or not. Great. Thanks again, Marie.