Hello everyone, welcome to day three, talk session five on new Galaxy tooling. I'm Stephen Manos, and I'll be the emcee for this session. Speaking first we have Alex and Tyler from Johns Hopkins University.

Thanks. All right, I'm Tyler, and together with Alex I'll be talking about our work on a single-cell spatial pipeline, some of the advances we've been drawing on from the IWC workflows, and how to implement them. If people aren't familiar with single-cell experimental design, it essentially tries to capture transcriptome information from within single cells and is used very often for identifying cellular subtypes. It's a fairly common experiment at this point and gets high use on Galaxy. You'll often see it used in drug-design experiments, where you're trying to determine a drug's effect across a variety of cell types, or when you're trying to determine changes in a tumor microenvironment, for example.

While we've been working on this, there have been a lot of advances in the field. One of the big recent changes is the introduction of spatial omics data. Rather than only being able to look at single cells in aggregate across an entire tumor, you can now combine high-resolution tissue imaging data with the single-cell transcriptome information. Combining those two pieces of information gives you, in a way, a three-dimensional cellular atlas: you can look at how cell types are arranged in a tumor microenvironment in space, whether certain immune cells cluster around particular cancer cell types, and how gene expression changes in certain areas of that tumor. It's very useful for tissue analysis.

However, when it comes to workflows, we've had several problems trying to create them. Most single-cell data and input data types are very non-standardized and diverse, and as a result it's very confusing to design a workflow around the inputs. That problem has only been compounded by the inclusion of spatial omics data and several other advances in the field. So we've been working together to use some of the new features introduced into the Galaxy workflow editor to make a more general workflow that will be helpful for other people, and I'll pass over to Alex to talk about that.

Thank you, Tyler. So the IWC exists to provide standardized workflows that are available for everybody, so each community can focus on its own types of analysis, which is great.
We have dozens available: variant calling, the VGP has all of ours up there, epigenetics. But the non-standard approach to single-cell sequencing makes it really difficult to have a standardized single-cell workflow on the IWC, because you might be coming in with 10x data, you might be coming in with other data formats, and you might have different analyses you want to continue with, which requires a different workflow each time. Currently we have a bunch of single-cell workflows available from Galaxy Training, and a few are being put into the IWC at the moment, but again, these are not catch-all cases where anyone can just bring their data and start running it.

To that end, the enhancements that Marius highlighted earlier, which have been made to workflows in Galaxy, can be used to create much larger and more complex workflows. Direct input parameters, conditional parameter inputs, parameter mapping, and parameter generation based on inputs allow for what we're starting to call meta-workflows: entire domain workflows where you choose a path.

Currently we are working with Squidpy, as Tyler mentioned. Squidpy is a tool made by the scverse team that does spatial omics analysis after a basic 10x single-cell RNA analysis. However, a user might come in wanting a different analysis direction, or they might not have 10x data. So this is an example with Squidpy where you come in and we ask you immediately: do you have spatial omics data available? If the answer is no, we can hand you off immediately to one of the existing training workflows. If you do, we ask you to input the specific Squidpy data type that we have implemented, which is Visium data, and give you an entirely different workflow.

This can evolve further into full domain areas, not just single cell, where we ask at the beginning: what type of data do you have? What direction do you want to go? Do you want to continue a workflow, or terminate early? A user can basically design their own workflow as they go. As I mentioned, this allows for domain workflows, and it also allows for project workflows. With the VGP work we've been doing, it means that depending on what type of data you bring in, you can run a different version of the VGP analysis that was already discussed earlier and come out with proper results.

We wanted to say thank you to everybody who has been working on this, the IWC team and especially Marius; thank you for all of your help. And I'd like to ask for any questions.

Following on a little from the Squidpy development: we see a lot of strange tools coming to Galaxy that have really unstructured and very complicated outputs. Do you see any way we could approach this upstream, to make those tools more user friendly in general, independent of Galaxy or not?
So currently the Visium input data is a large folder that involves multiple data files and a bunch of image files. Support for non-standard-format inputs, like bringing in an entire directory as a single data object that doesn't have to be prepared by the user beforehand, allows for a much more diverse downstream experience. It helps if a user can pull in the input folder they got from somewhere, rather than having to know how to tar and gzip it, which is what we currently have to do for Visium. Thank you very much.

Thank you, and next up is Julia from the University of Edinburgh.

All right, so hi again. Today I'm going to tell you about what I've actually done as a newcomer, which was introducing new Monocle3 tutorials for trajectory analysis in single-cell RNA sequencing. I'm glad that single-cell sequencing was already introduced. Having gene expression at the level of individual cells, we can group cells based on their expression patterns and identify cell types, and trajectory analysis basically answers the question: how are those cell types related to each other? To do so, a trajectory analysis computes pseudotime, which measures how cells progress through a biological transition. It's very useful for studying differentiation and development of cells, as well as how different cell-fate decisions are made.

But it's important to know that not all trajectory inference methods can be used to infer all kinds of biological processes, and this slide explains why. As you can see, there are lots of trajectory inference methods, and they use various algorithms to calculate pseudotime. Given all these different methods, we need to compare the results so we can make sure they are actually accurate and make biological sense. However, in our single-cell case study tutorial series we only had one method, which used Scanpy, while in the field the gold standard is Monocle3. It turned out that we had lots of Monocle3 tools in the Galaxy ToolShed that no one had used in a tutorial. That's why I decided to step in, and I developed two tutorials based on Monocle3, plus a slide deck to reinforce learning.

I want to show you a real-life example of how important it is to have different methods to assess the reliability of the analysis. Here you can see the cell types plotted using Scanpy and Monocle, from the development of T cells, and this graph shows the pseudotime calculated by Monocle. I want to draw your attention to this cluster that branches out from the main trajectory. This branch wasn't seen in the pseudotime calculated by Scanpy, so that made me wonder whether the branch was an artifact or whether I had discovered a new cell type. So I decided to check.
I checked all the gene markers, and a couple of weeks later I discovered that it was just a batch effect. As you can see here, after applying batch correction to this dataset, the weird branch simply disappeared. However, I had to do it in Monocle in R, because the batch correction function simply hasn't been wrapped as a Galaxy tool yet, along with some other functions that are available in Monocle in R but not as Galaxy buttons.

That inspired me to extend the first tutorial I developed, which is aimed at users who prefer to interact with the Galaxy interface instead of coding. I reproduced the same tutorial using the same dataset, to allow users at different levels to perform the same analysis and to open the door to extended learning. A quick shout-out here to the people who introduced the automatically generated Jupyter notebooks feature, because it was super helpful in creating this tutorial and made the work much easier and quicker.

By introducing those two new tutorials, I really wanted to highlight how we can use the full potential of Galaxy within one tutorial series, basically on one dataset. Here is how we can take the user from curated, tutorial-oriented data, just to introduce them to the topic, then go a step further and introduce more difficult-to-analyze data while still using Galaxy buttons. Then, for trajectory analysis, which is a more advanced analysis, the user can choose either the easier route, which is more restricted to Galaxy buttons, or an optional tutorial that takes them through the whole analysis at an advanced level. Obviously, I want to thank all the institutions and people who made it possible for me to create those tutorials, and with those key points I want to open the floor to any questions you might have.

So let's move into a slightly different field of science: mass spectrometry. A mass spectrometer can qualitatively or quantitatively determine the presence of basically any molecule in a sample, so we use it to measure the composition of samples or to determine the presence or absence of specific chemicals. We do this with a very specific type of instrumentation for which this software is suitable. What we do is compare spectra from fragmentation events that we induce in the mass spectrometer, and we can identify what kind of compound we are measuring and fragmenting by comparing it with a spectral library. That library is material we have measured before: we measured a standard, we know what it is, we observed the experimental data, and we stored it so we can use it later. This is possible because this type of instrumentation provides very reproducible and reliable data. The mass spectrometer is usually coupled to chromatography, which adds another dimension to our data besides mass, namely time, and from time we can compute a certain index called the retention index.
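For a concrete picture of what such an index looks like, one common definition (van den Dool and Kratz) interpolates the analyte's retention time between the n-alkane standards that bracket it. The sketch below is general background with made-up example times, not necessarily the exact index calculation used in this pipeline.

```python
# A hedged aside, not from the talk's own tooling: one common definition of the
# retention index (van den Dool and Kratz), which interpolates the analyte's
# retention time between the two n-alkane standards that bracket it.
def retention_index(t_analyte, t_lower_alkane, t_upper_alkane, carbons_lower):
    """Retention index of an analyte eluting between two n-alkane standards.

    t_lower_alkane and t_upper_alkane are the retention times of the alkanes
    with carbons_lower and carbons_lower + 1 carbon atoms.
    """
    fraction = (t_analyte - t_lower_alkane) / (t_upper_alkane - t_lower_alkane)
    return 100 * (carbons_lower + fraction)

# Example: an analyte at 12.4 min bracketed by C12 (11.8 min) and C13 (13.0 min).
print(retention_index(12.4, 11.8, 13.0, 12))  # -> 1250.0
```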
So in the end we have basically two dimensions we can use to identify our data: the mass-to-charge ratio, with the peaks we get from the signal, and the time domain. We start with the raw data, do some pre-processing magic, and pull out spectra that we believe come from a single molecule. We compare them to our library, and by comparing against something we already know, we find out the structure of that specific compound.

How do we do this comparison? We use the matchms package to compute mass spectral similarity. There are many software packages out there that do this, each in a slightly different way, but in the end it always comes down to needing some form of quantization of your data, because comparing vectors in a continuous space is not really possible. So we essentially need to bin our spectra into a fixed-length vector, which we then compare. We can use different methods for this, and then we can compute a score between those vectors, which can be, for example, the cosine of the angle between them; we could also use Manhattan distance or whatever similarity metric we wish. Based on the score we can then say whether these spectra actually match or not. The matchms package takes two sets of mass spectra, computes a scoring matrix between them, and from that you can continue your analysis.

So what we did is implement a bunch of Galaxy tools based on this package, and this is just to highlight the functionality we use. Mass spectral libraries, or the unidentified mass spectra, are represented in the format you can see on the top left. It's usually a text-based format which is not really standardized, so this library tries to push standards and so on, but basically it comes down to a metadata section and a peak section. You can filter libraries based on the metadata, convert them to different formats, and export all the metadata to use in other tools.

So this is the kind of tooling you can build around the spectral similarity computation. You can compute the spectral similarity, but you can also compute similarity based on metadata. For example, you can encode the chemical structure using SMILES strings, store it in the metadata, and then compute the Tanimoto score between the SMILES strings to get an actual structural similarity, if you're comparing two mass spectra for which you have the structural annotation.

You can then also do molecular networking: take your scores matrix and put it into a network where the connection between nodes is determined by the score, the spectral similarity of those nodes. You can download this and take it into software dedicated for that, such as Cytoscape, or use it in MetGem as far as I know. We also implemented a wrapper for machine-learning-based scoring, which is a topic in itself, and you can format the output results and so on. The scoring is implemented in a very flexible way: it supports sparse computations and NumPy arrays, just to make it scalable, because you can have very large reference libraries, and in this way you can compute your scores between basically arbitrary amounts of data.
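For anyone who wants to try the scoring step outside the Galaxy wrappers, here is a minimal sketch using the matchms package; the .msp file names are placeholders, and the exact call for pulling out the best matches can vary a bit between matchms versions.

```python
# A minimal sketch of the scoring step described above, using matchms directly
# rather than the Galaxy wrappers; file names are placeholders.
from matchms.importing import load_from_msp
from matchms.filtering import default_filters, normalize_intensities
from matchms import calculate_scores
from matchms.similarity import CosineGreedy

def clean(spectrum):
    spectrum = default_filters(spectrum)      # harmonize metadata fields
    return normalize_intensities(spectrum)    # scale peak intensities

references = [clean(s) for s in load_from_msp("library.msp")]
queries = [clean(s) for s in load_from_msp("unknowns.msp")]

# CosineGreedy matches peaks within an m/z tolerance and computes the cosine
# similarity between the aligned peak vectors, one score per (reference, query) pair.
scores = calculate_scores(references, queries, CosineGreedy(tolerance=0.1))

# Best-scoring library entries for the first query spectrum
# (the retrieval call differs slightly between matchms versions).
for reference, score in scores.scores_by_query(queries[0], sort=True)[:3]:
    print(reference.get("compound_name"), score)
```

The Galaxy tools described in the talk wrap essentially this flow, plus the metadata filtering, network export, and machine-learning scoring variants.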
With this I would like to come to an end. I would like to acknowledge the Czech Ministry of Education, Youth and Sports, which is making this work possible, and ELIXIR for funding us to come to this conference. Thank you very much, and now we have time for questions.

Thank you. We have time for one question, maybe two. Any questions? Thank you, Björn. All right, thank you very much.

Next up is Andrew from the Peter MacCallum Cancer Centre in Victoria. Thank you. I've got a lot of slides, so let's go quick. This is FastQC; everyone has probably seen it for assessing the quality of sequencing data. This is FASTQE, which is better, because it does the same thing but with emoji. So the question you're asking is: why does that even exist? Let's go back to a parallel session at PyCon in 2016. I'm speaking in the middle; to the left of me there's a talk about bioinformatics, to the right a talk about emoji, and I wish I could have gone to the other talks. By the end of my talk maybe my audience felt the same way. So what did I do? I did a "one more thing" and combined everything that was happening: my love of side projects, bioinformatics, and emoji. If you look at FASTQ data, we encode quality scores as ASCII characters. Why use ASCII characters? Let's use emoji instead. And then my last slide was: let's do something semi-realistic, let's take the average quality in a couple of sequencing files and represent it with emoji. And that should have been the end of it, really.

Then I got a support request. I put it on Twitter and said it would be good for education and public talks. My friend said, no, Andrew, finish your PhD. And it was hard, I'll admit, but I had a brilliant solution: I'd think of some excuse to work on it, and then I'd work on it. So we've limited it to once a year; every year I do something with it and put it on PyPI. It takes gzipped files, and it was going pretty well, until now, when you ask why it's on Galaxy.

So, once a year, but then the pandemic happened and we had BCC2020, which was half BOSC and half GCC. I spoke about it and said, please contribute to FASTQE so I don't have to try to build up the community, so it's not just me. But I was actually speaking in a parallel session with a Galaxy talk, so I did another "one more thing", which was: would this work on Galaxy? And that could have been the end of it there as well, but then we had the CoFest, and Helena wrote a wrapper for Galaxy, and Maria started doing some training materials, and suddenly the once-a-year development approach became a Galaxy story. The wrapper was put in, then it got added to the GTN, then the last couple of Smörgåsbords; FastQC has been in there quite a bit, and I love that the silent hero was FASTQE.

But it can get better. FASTQE doesn't have an HTML interface, so the wrapper is very elegant: it does a lot of heavy lifting with sed and wraps the output into HTML with that line down the bottom there, and that's the report you get in Galaxy at the moment. So can I improve it? Well, there's one more version beyond the one linked to the wrapper, there are some custom options that aren't in there, and the tutorial makes it clear that you use FASTQE for short reads, but it's not going to work for long reads, and that was a bit of a challenge for me. So, Galaxy-informed design: there should be an HTML output, because HTML is rich and interactive, and it needs to work with long reads. So this is the new interactive report. I wonder if this video will work.
Yes, it works. So the new version is a lovely interactive report. There's a mouse-over, you can see the max, the minimum, and the mean at the same time, you can scroll through and see that the quality at the end drops to those emoji, and you get some sequence information as well in the new version. It makes the wrapper simpler. You have to enable HTML because there's a lot of JavaScript going on in there, but the code's a lot cleaner, you get those tips, and the window option now makes it work with long reads: you can summarize, as at the bottom here, and really collapse long reads and compare short and long, which might be good from an educational point of view. And now that there's an HTML report, it's not just Galaxy: look out for MultiQC in the future, where FASTQE is going to be in there as well.

So my summary is that improving FASTQE for Galaxy has actually just made it better. It has simplified development, it's now easier to add new features, and the training network materials can maybe put it in a few more places. I'd just like to thank everyone who contributed during the CoFest and put the wrapper in, and Alicia for indulging me when I said I wanted to come up here and talk about it. I've gone pretty fast, so I've probably got time for one more thing.

Back at the CoFest we went back to another idea: originally I had asked, could we convert a file into FASTQE rather than just summarizing the quality? And out of the CoFest came Biomojify. So instead of just summarizing, we can now convert files completely to an emoji format. If you've ever looked at FASTQ format, this is arguably easier to read than what's going on there. And why stop with DNA? You could do it with proteins, or even variants, which I can maybe relate to my work in cancer. So Biomojify exists, and now I'm thinking, could we get that on Galaxy as well? Everything I've talked about today I'm now thinking about for Biomojify, and maybe we'll see you next year and talk about that. Thank you.

Do we have time for a question? We've got time for one question. Anton: what about CIGAR strings? That is a great idea; the only thing stopping me is funding, so if anyone wants to fund it, please come and talk to me after this, that would be great. Also BAM files, I've got ideas about that, and phylogenetic trees, it actually works really well. Anyway. Thank you very much.

And next up we've got Nate from Penn State University. All right, thank you. It's going to be a tough act to follow, but I'll give it a try. I'm talking today about the Intergalactic Data Commission, a project that's been worked on by a number of my colleagues and me. What are we talking about with the Intergalactic Data Commission? We're talking about reference data. And of course the first question you're going to ask is: what is reference data? It's anything in Galaxy that is expensive to pre-compute, and/or gets reused a lot, or consumes a lot of space. The canonical example is genomes and the various indexes you have to build for different tools. You might be surprised to learn that, for example, BWA, Bowtie2, HISAT2, and STAR all have different indexes that have to be built for each genome you want to make usable.
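For orientation, built reference data like these indexes is published to Galaxy tools through tool data tables, which can also be inspected programmatically. Below is a hedged sketch using BioBlend, with a placeholder server URL and key; this part of the API generally requires an admin key.

```python
# A hedged sketch, not from the talk itself: listing the tool data tables a
# Galaxy server uses to publish built reference data to tool forms.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://galaxy.example.org", key="ADMIN_API_KEY")

# List all tool data tables (e.g. all_fasta, bwa_mem_indexes, ...).
for table in gi.tool_data.get_data_tables():
    print(table["name"])

# Show the entries of one table to see which genome builds are installed.
print(gi.tool_data.show_data_table("all_fasta"))
```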
To make a genome usable, you have to have the genome file itself, then you have to build the indexes that are specific to each tool. Occasionally something can use a different tool's index, which is nice, but that doesn't happen very often. So why do we want to do something special with this reference data? Just take the GRCh38 build of the human genome as an example. In our copy of Galaxy we have three different versions of it, canonical, female, and full, and for each of these genomes the FASTA files are about three gigabytes, which is not really that much. Then you can see we have the different indexes, and this isn't even all of them: Bowtie2, the old Bowtie, and so on. Each of those indexes is in fact larger than the genome itself, all the way up to the BWA-MEM index, which is 17 gigabytes, and the STAR index is also generally much larger than the genome. In total the GRCh38 folder contains about 80 gigabytes. You can imagine that with your 250-gigabyte quota in Galaxy you don't want people making lots and lots of copies of that, not just for their own space, but because you as a Galaxy administrator don't want that many copies of the data either.

We've been able to deal with reference data in Galaxy pretty much from the very beginning. It's data you have on disk that you make available through the Galaxy UI, in the tool form: you select the option, in HISAT2 in this case, to use a built-in genome, you pick which one you want, and if your genome isn't in there, you do this nebulous "contact the Galaxy team" thing, whatever that means. Originally you had to hand-build all of this; then in 2014 Dan added Galaxy data managers. Data managers are special, admin-only tools that you can run to do all of the heavy lifting you previously had to do yourself, automatically through Galaxy. This is great for fetching those reference genomes and building all those different indexes, and it's also great for non-genomic tools; it doesn't have to be genomic data, although that's what we talk about a lot. Data managers install the data into your Galaxy server, wherever you run them.

So if you look across the class of different Galaxy servers, we've got the usegalaxy.* servers up here, and maybe yours at the end. How do data get updated in these different servers? Each server probably has a different method, some of them more manual or less controlled than others. And what's the process for that data actually being built? There's a process on usegalaxy.org, which is kind of broken at the moment, but unfortunately every server is different. At some point I took all of the data we had on usegalaxy.org and dumped it into a CVMFS repository, and a lot of people use that, but at this point it's also very out of date. So when we're talking about data managers and sharing data between Galaxy instances, what do we get from data managers?
We get that admins can easily install and build data, in a relatively uniform way: it controls how the BWA index tool gets run, it controls how the Bowtie2 index tool gets run, the command-line arguments and so forth. And data managers can be used to generate the data that I then later dumped into CVMFS.

But there are things we can't do with data managers. They are awesome, but as far as sharing between instances goes: when the admins run a data manager for Galaxy Europe and I run the same thing, we may not actually get the same thing out. We should, but it's not guaranteed. You're running two separate tools on two completely different infrastructures; maybe you have a slightly different source of the data, or you ran them at different times. UCSC, where we get a lot of our genomic data, used to update things silently behind the scenes; I don't know if they still do, but that can be an issue. Additionally, some of these indexers require a lot of resources to run, and there are manual processes involved throughout. As I said, our data on usegalaxy.org is fairly out of date, largely because of the manual process required: we build the data, then I have to dump it over to CVMFS. There used to be multiple people who could do this, but now it's pretty much just me, so that's not great. But the biggest issue with this process is that you end up with a lot of duplication of both effort and space. All of the big Galaxy servers want all these model organisms and all these indexes for the common tools, so why are we all building them individually for our own servers, hosting them, using all that space and spending all that time?

The solution we've come up with is this project called the Intergalactic Data Commission, or IDC. If you go and look at the repository, you'll see that the first commit was five years ago, so why am I only talking about it now? We've been trying to get it going all these years. What were the blockers? We needed to design a usable system for community curation and maintenance of reference data, we needed adequate compute resources to run some of the really intensive data managers, and we wanted to generate a single canonical source of data that's reusable by everybody, not just building off that old, out-of-date, hand-built repository I had generated. And I think we needed to admit to ourselves that, while we kept saying we have all the pieces that do the things we need, putting them together is a lot of work.

So here are the pieces. We have a way for people to contribute and say what they want, in the form of GitHub pull requests. We have a system for running long-term jobs and reporting on results.
That's Jenkins. We can run data managers programmatically with Ephemeris. We have Galaxy servers we can run our data managers on, the usegalaxy.* servers themselves; they have huge compute back ends, so lots can be done there. And we have a script to export all of that to CVMFS. But we had to build the pieces in purple to actually put the whole thing together.

So what did we need? First, we had to define what data should go into this repository. We sort of had a version of telling Ephemeris how to run data managers, but that's not really what humans are interested in. They want to know things like: what's the DOI associated with this data? Where did it come from? What's the source? Which indexers should we run on it? So Simon came up with a format for genomes.yml, where we can define what it is we actually want to run our data managers on.

The next step is figuring out what we have and what needs to be done. This seems like a simple problem, but it's a little more complicated than you might think. My colleague John Chilton wrote the split-data-managers utility, based on work started by Simon. It takes the set of everything in that genomes.yml file, subtracts the set of things we've already built (the things that have histories on usegalaxy.org), works out which entries in genomes.yml are not built yet, and spits out a bunch of tasks for us to turn over and produce data.

The next piece of the puzzle was figuring out how to automate the process of putting all these pieces together, and it turns out we already had a fairly useful model for that, because of the way we install tools on usegalaxy.org. I don't click through the admin UI and install tools that way; we have a GitHub-based pull-request build system. People submit which tools they want installed, it runs in the background on Jenkins, installs them, and pushes everything out to CVMFS, which is exactly what we want to do with the data. We just need to do something slightly different, in that we need to run directly against usegalaxy.org for the compute resources.

We also wanted, and I won't talk a whole lot about this, a breakable server: rather than running directly against usegalaxy.org, we want something we can mess with and not break usegalaxy.org. So we have an ephemeral server that we can stand up, but it uses usegalaxy.org's resources and database, so the results are persisted forever. Mike showed this picture yesterday; there will be a quiz at the end, so study it. I love this diagram, so I sneak it into every talk I can, but it's mainly just to show what's possible. It's the infrastructure behind usegalaxy.org, and it's not really meant to be readable. What we needed to do was connect the database, which lives in Texas, and the data storage, which lives in Texas and is NFS-mounted, across the internet to where the builder runs, on an OpenStack cloud in Indiana. If anyone here is a sysadmin, you know you don't send NFS over the internet, but we came up with a solution. If you haven't heard me evangelize Tailscale, it's a magical VPN solution; come talk to me afterwards and I'll tell you how it will solve all your life's problems. It made it trivial to connect these pieces.
We added MinIO on top of the storage as an S3 backend, and then we needed the ability to run the data managers without having the data installed into the server where they were run. For this, John wrote tool data bundles: you can run a data manager, but it doesn't install the data; you can export it out of that Galaxy and import it back into a different Galaxy, which is how we move the data over to CVMFS to publish it out to the world. We write it using OverlayFS; I don't have any details on that here.

So we've put all of these pieces together, and you can request data today; we'll work out some of the issues afterwards. You go to the repository, you make a pull request, and eventually it ends up in a repository you can mount on your Galaxy server. You don't even need root on your cluster, and you can just use it. It's distributed around the world through all of these different sites, and you can set up your own Stratum 1 mirror if you want. (That slide is out of order.)

So what's left? We need some curators for the data. I'm a system administrator; I don't know what should go into the repository. We need scientists to help. And finally, what's with this funny name, the IDC? Well, the IDC is to reference data what the IUC is to tools and what the IWC is to workflows. The IDC is a project that Simon was fairly passionate about, especially after we solved some of his other problems. He wrote the proposal that determined the structure, and he wrote the GTN tutorial about reference data. Tamara tells me he was pretty excited to be part of something as nerdy as the Intergalactic Data Commission. So, in remembrance of Simon, we wanted to do something to remember him by with this project that he cared about, and we're calling it the Simon's Data Club. We're cheating on the acronym; that's fine. Thank you everyone; go forth and make data.

Thank you, and thank you for that tribute to Simon as well. We've got time for one question. Thank you very much, this looks amazing. Does this currently only support specific data types, so only genomic data, or can it be scaled to any type of reference data? It should work, with a little bit of extra work, on anything we currently have a data manager for. If it doesn't have a data manager, then someone needs to write one, but it should work on anything that essentially already exists as tool data table data in Galaxy. Thanks.

And next up we have Nuwan from the University of Melbourne. Hi. I'm here to talk about some recent changes that were made to Galaxy that make it a little bit easier to fetch protected remote data sources into Galaxy. For example, say you have a protected web server, an HTTP server or an FTP server: how do you get that data into Galaxy, and, if your users are already logged in, can you do that without requiring any more passwords or another login process?
To answer that question, let's first look at how Galaxy currently handles data sources. Broadly, they can be divided into two groups. One group is URL handlers: general protocol handlers for things like HTTP or FTP and, more recently, the GA4GH DRS API, for which John Chilton added support. The second group is file sources, which are plugins you can write for Galaxy that interface with arbitrary browsable file sources like Dropbox, Google Drive, S3, and so on.

If you look at the first group, URL handlers, these are typically the URLs you would paste into the upload dialog box: you paste them in and Galaxy faithfully fetches that data. The catch is that the data usually has to be public. In contrast, file sources are a lot more sophisticated. It's a pluggable system, you typically see it as a browsable list of files, you can fetch data into Galaxy from all these various file sources, and you can add your own plugins. Very importantly, you can inject things like user preferences or credentials, or read data from a vault, and send that into the plugin so that the data can be fetched transparently and without any interaction from the user.

That brings us to the motivation for unifying these two things. The Australian BioCommons wanted to find a way to integrate the Bioplatforms data portal with Galaxy Australia, and to do it in a seamless way; I think Caroline talked about this in her keynote address. That data is exposed over the GA4GH DRS API and it's password protected, so we needed some way to get it into Galaxy Australia.

So what is GA4GH DRS? The Data Repository Service is a way to give a dataset a logical ID that abstracts away the underlying physical location of the file. For example, your file could be on S3 or HTTP, but you get one single DRS URL for it and you can fetch it without needing to know where the file is stored. Galaxy supports DRS, as I mentioned, but only for public URLs, and the work here was to figure out a way to unify these two mechanisms so everything can take advantage of the kind of features that file sources offer.

And that's what we did: we managed to unify these abstractions, and now everything is a file source. What that means is that all of the features file sources offer are now available to all protocol handlers, and they don't need to be browsable; everything can take advantage of those features. As a result, you can inject credentials or do any of the things you can do with file sources, and administrators have a lot of fine-grained control over the process.
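For context, resolving a DRS identifier by hand against the GA4GH DRS v1 HTTP API looks roughly like the sketch below; the host and object ID are placeholders, and inside Galaxy the unified file-source machinery (plus any injected credentials) handles this for you.

```python
# A hedged sketch of manual DRS resolution following the GA4GH DRS v1 API.
import requests

host = "https://drs.example.org"   # placeholder DRS server
object_id = "abc123"               # placeholder logical dataset ID

obj = requests.get(f"{host}/ga4gh/drs/v1/objects/{object_id}").json()

# Each access method carries either a direct URL or an access_id that must be
# exchanged for a URL with a second request.
method = obj["access_methods"][0]
if "access_url" in method:
    url = method["access_url"]["url"]
else:
    access = requests.get(
        f"{host}/ga4gh/drs/v1/objects/{object_id}/access/{method['access_id']}"
    ).json()
    url = access["url"]

print(url)  # the physical location (e.g. an https:// or s3:// URL) behind the logical ID
```

This is also why unifying DRS handling with file sources matters: the credential injection described above can be applied at the point where these HTTP calls are made.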
On the administration side, for example, different sites can have different credentials injected, you can have role-based restrictions, and you can even stop certain sites from being downloaded from altogether; parental controls, if you like. So all these things are possible now. I'd like to acknowledge Uwe Winter, who did a lot of the work to translate the requirements of the BPA data portal into something we could implement in Galaxy, and who also did a lot of the testing. In future I think we'll be able to do some more interesting things, like injecting OIDC tokens and providing single sign-on across services. If there are any questions, I'm happy to take them.

We have time for a couple of questions. Excellent talk. I think you said that DRS URIs are available for public data on Main; is that automatically generated, or is it something a user has to initiate? So if you paste a DRS URL, Galaxy should resolve it and be able to download from a DRS API server; that should be fairly transparent. That's something John did, and it's available since, I think, the latest release. In addition, I think he also added functionality to expose Galaxy datasets over DRS, so that now every Galaxy dataset also has a DRS URL. I don't quite know how to find that URL yet; I know there's a standard URL for it, but I don't know if it's exposed anywhere you can just copy and paste it from, so I need to check on that. One more question from the audience? All right, thank you everyone. Thank you, Nuwan.

And next up we have Michelle from Johns Hopkins University. Good morning, my name is Michelle Savage. I'm a software engineer in the Michael Schatz lab at Johns Hopkins University, and I work on both the Galaxy team and the AnVIL team. Today I'm going to give you more of a marketing pitch and a refresh of an existing tool called gxadmin, which is for administrators, and talk about how it's going to be leveraged additionally by researchers, strategic tool developers, and so on. This challenge arose out of wanting to empower those types of users to make data-driven decisions based on tool metrics, either on usegalaxy.* or on their own individual instance of Galaxy.
This is an example of a chart of data that's representative of the visualizations those users would find helpful in making decisions. It shows users in a given month by tool, for roughly a 22-month period ending in May 2023. It's a popularity chart showing just the top 10 tools, and it starts to tell a story about which tools are useful. In this instance the visualization is on an online notebook platform called Observable, and you can adjust it from 10 tools to show all the tools, and as you can see, the story gets a little more complete. We see some patterns, like an 80/20 rule or a long-tail effect, where a few tools are used a lot and a bunch of tools aren't used as much.

What we can answer from this is still a little subjective; it's not the entire story. You could look at it and say the tools that aren't being used very much should be deprecated; on the other hand, you could say the tools that aren't being used very much should be promoted. It depends on your goals as an admin or a researcher. Here are some other charts; I won't go through all of them because there are many, but they give an example of the kinds of reports we're trying to make available instantly, without necessarily having to go through your network admin: anyone with database access should be able to run these alongside Galaxy from the gxadmin install. As complementary information, here we have jobs per month, total CPU time, and total memory.

So who are the people that need these tools; who is consuming this data? As we know, there are different types of users on Galaxy: scientists, trainers, tool makers, developers like me, and then admins and researchers, which we'll talk about more. We all wear a variety of hats, so you might see something here that's useful for your role even if you're not strictly an admin or a researcher. On the exclusively admin side, we have everything from community management to resource management; on the researcher side, everything from a PI who might need this information to a professor or an academic, and the various roles they might fill. Again, there's a lot of overlap for everyone in this room: IT support, resource management, resource acquisition, stakeholder support, proving that different tools are working, and even strategic tool makers, somebody who makes tools and maybe wants to build something that hasn't been developed yet, or make something popular better. And this is just an example of some of the software tools those users are currently cobbling together to get these kinds of more complex reports.

Complementary repos: we found two complementary repositories, repositories that aren't in the Galaxy code base but can run alongside Galaxy. One is gxadmin, which has existed for a while, and the other is a usage-metrics repository. gxadmin has been around for a while, with a steady stream of contributors and a stable code base, so we decided to consolidate the usage metrics into it. Now that everything is in the same code base, the next question is how you actually get the data and run the queries.
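As an illustration of the downstream analysis those queries feed, and of the long-tail pattern in the popularity charts above, here is a hedged sketch that assumes a CSV export of a gxadmin tool-usage query; the file name and the column names are hypothetical.

```python
# A hedged sketch of the long-tail analysis behind the popularity charts above;
# "tool_users_per_month.csv" and the columns "tool_id" and "unique_users" are
# hypothetical, standing in for an exported gxadmin query result.
import pandas as pd

usage = pd.read_csv("tool_users_per_month.csv")

totals = (
    usage.groupby("tool_id")["unique_users"]
    .sum()
    .sort_values(ascending=False)
)

top10_share = totals.head(10).sum() / totals.sum()
print(f"Top 10 tools account for {top10_share:.0%} of monthly tool users")
print(totals.head(10))
```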
On your own instance of Galaxy, previously you would have to download the data from a network admin and then maybe run things alongside it, and there were various manual steps that we've now removed, at least on the research side; they always existed on the admin side. Over here you can see the process is much more streamlined: it's basically installing gxadmin and querying for some key metrics. There are some new queries in gxadmin now, and we anticipate adding more to cover tool-metric and performance-tracking information.

Then, how do we process the data? Again, we're right in the middle of removing some manual steps, for users but also on the usegalaxy side, so we don't have to go through a network admin and we can alleviate some of their requests, because they get a lot of requests. So now it becomes a very simple process of installing gxadmin via a curl command and querying. Here in the second box you can see just gxadmin plus the name of the query, and you can also pass parameters: you can look at a certain time period, or, if you have a server that needs a particular calculation for memory, which happens because of different load balancers, and you know that calculation, you should soon (this one isn't there yet) be able to add a JSON object that makes that calculation for you, so all the data is normalized. So that's where we are now. Are there any questions?

And next up is Dan from the Lerner Research Institute. Great, so once again, I'm Dan, and I'm here to give Jay's talk; unfortunately he also had visa issues, so he's unable to attend. I realize I'm one of the last things holding us back from the CoFest, which means it's great that Jay sent 29 slides for a seven-minute talk, so buckle up, let's go.

I'm going to be talking about a plugin for JupyterLab that allows us to use Galaxy through a GUI inside JupyterLab notebooks. Hopefully we're all aware of what Galaxy is: it's an open-source platform that allows you to do all these awesome things. I'm sure many of us are also familiar with Jupyter; if you're not, very briefly, it's also really great open-source software that lets users interactively write code and do analyses. So what can we do if we want to empower users to get the full complement of what you can use inside Galaxy, inside Jupyter notebooks? That's the overarching goal of GiN, Galaxy in Notebooks. GiN is an open-source JupyterLab extension. On the back end it uses Galaxy's BioBlend package to interact with the API, and it provides a custom JupyterLab extension interface, giving GUI access to, for example, Galaxy tools, datasets, and histories.
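Since GiN drives Galaxy through BioBlend, a hedged sketch of the kind of calls it makes may help orient readers; the server URL, API key, and file name are placeholders, and GiN's actual internals may differ.

```python
# A hedged sketch of the kind of BioBlend calls a notebook extension like GiN
# issues behind its widgets; URL, key, and file name are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

history = gi.histories.create_history(name="GiN demo")            # a fresh history
uploaded = gi.tools.upload_file("reads.fastq.gz", history["id"])  # push a local file

# The tool panel shown in the extension comes from the same API; here we just
# look one tool up by name.
print(gi.tools.get_tools(name="FastQC"))
```

Running a tool from the notebook goes through the same client (gi.tools.run_tool), with the form the extension renders mapping onto that call's tool inputs.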
All of this lives inside JupyterLab notebooks. This is a pretty standard-looking Jupyter notebook, and what you can see here is that we have the GiN extension loaded. Over here on the left-hand side, you can think of this as our tool menu, very similar to Galaxy. You can click here to bring up a login widget inside a new cell in Jupyter, and then you have the option to either log in using your credentials, your email, user name, and password, or log in directly using just your API key. There's a drop-down list where you can select from known servers, or you can provide a URL directly to your own custom Galaxy instance. Once you log in, it tells you that you logged in successfully, if you remembered your password correctly, and then the tools menu fills up with tools retrieved from that Galaxy instance. Once you click on a tool, a new cell appears; this is inside the Jupyter notebook itself, and we have a very similar-looking tool form to Galaxy's. We have our histories over here, we can switch between histories, we can configure our tools, we can drag and drop between our history and the Galaxy tool inside the Jupyter notebook, and we can run those tools directly from within the notebook interface.

Does this actually work? We went ahead and mirrored an analysis from the GTN training materials inside the JupyterLab notebook, just to make sure we can reproduce things inside the notebook just like we can when using Galaxy directly, and of course you can. When you run tools you get a lot of really nice feedback, including the colour changes just like a standard Galaxy instance, and you can view your results; in this case we have a Fisher's plot, just as in Galaxy.

What are some unique features? You can log in to multiple Galaxy instances within the same Jupyter notebook. In this case we've logged in to Galaxy Main, a local Galaxy server, and Galaxy Europe; all the tools appear over there under their individual sections, and you can see here that we're using a local version of this particular tool. We can expand the datasets, and we can send datasets, for example, to GenePattern or another plugin inside here, or send them to a different Galaxy server directly. And because you can open notebooks that have been shared by other people, maybe you want to run your analysis at a different site: you can select from the different Galaxy servers you're logged into and it will refresh the form. Obviously any datasets that were not transferred or don't exist there will not automatically be transferred at this point, but the other parameter settings will be correct. If someone has shared a notebook with you, it saves its state, but there's a nice button here to update it to the currently logged-in user. I'll keep going.

Another really nice thing is that, because we're using Jupyter notebooks, we can select an option here, take a variable that's been defined in a different cell in Python, and plug it directly into the Galaxy interface. For uploading, you can upload files from your own computer.
You can also send files from the Jupyter server itself. When you upload from your own computer, it will attempt a chunked upload based on the tus interface; if that fails, for example due to CORS, it will send the file to the Jupyter server first and then on to the Galaxy server. I'm running out of time, so we'll just pretend there are no limitations.

Some of the implementation details: you need a Galaxy server; it uses Python and Node.js; it's built on top of a package called nbtools; and it's a standalone extension that you can install from PyPI, so you can pip install it directly into your Galaxy or your Jupyter notebook environment. It's available on GitHub, on Docker, and so forth. Thank you.

Thank you very much, Dan. We have time for one question. Björn: do you envision having this extension on Google Colab, or do you actually want to have it on our Galaxy servers, inside the Jupyter notebooks on the main Galaxy servers? Yeah, so a really nice thing is that we're actually already working on adding it as an interactive tool inside Galaxy, so we can launch it directly within Galaxy. You could then automatically configure it with the API key of the Galaxy user that launched it, and they could also log in to different Galaxy servers if they wanted, and so forth; we are developing that currently. Colab, I'm not sure; I haven't played with it too much, but maybe, if it's useful.

Thank you again, and thank you to all the speakers for this awesome session, much appreciated. Thank you.