Great, thanks for that. I'm going to share, it's just two slides, some notes to get things organized. Actually, I want to be able to see people, so I'm not going to go into full-screen mode, so hang on just a second. I'll share my screen there. Okay, so it's just two slides, just to put a little context on this. Right, so every year at GCC there's a presentation from the PIs that tries to summarize all the amazing progress that's happened since the last GCC. So we have an extensive backlog of material to go through, so none of us are concerned about having enough to talk about. But we have about a month's time, so there's a bit of an opportunity here to think about: is there a demonstration, some new technology that could be shown off, that'll tie together major progress across all the different working groups? So back in September or so, at the last updates of the working groups, one of the topics that came up as a common theme was remote data, which touches all the working groups at many different levels, from the UI that exposes remote data to the backend services that can actually process it and work with it. And what's interesting is that this comes up in many different scientific contexts. In the case of COVID, for example, there could be data sets distributed around the world for various reasons, and you want to do an aggregate analysis of all of that. It's going to be a many-to-many relationship, where all servers may want to be able to talk to all other servers. In the case of the Vertebrate Genomes Project, there are workflows that have been developed to assemble genomes and do a bunch of analysis. They often have very special data and memory requirements, like a terabyte or more of RAM.
Some of those nodes are available in the EU. It's very precious to get access to such resources, but nevertheless, we want to be able to tap into them from other instances. So you can imagine a user logged in in Maine in the United States, trying to remotely manage workflows in, say, the EU to get access to that. In the case of AnVIL, where there's protected data, some of that protected data right now lives in Google Cloud, and some of it may live in Azure or AWS. We want to be able to execute on all these different data sets, but we also want to avoid moving from cloud to cloud to cloud, because there are egress fees associated with that. So there are security considerations, and egress as well. And then a very similar story in the case of ITCR, where there's patient data stored across multiple clouds or multiple institutions, and we want to do aggregate analysis. We want unified analysis, where from one portal you can tap into all these different resources in a secure and federated way, essentially. So when you think about what it takes to do this, there are obviously a lot of moving parts needed to work with remote data in a really seamless way. Here's a super-high-level skim of how this might impact the different working groups. Obviously a lot of details are not displayed here. But as you think through the working groups, it hits every single one. On the UI side, we need all these user-facing components: select where the data are, launch jobs remotely, monitor them. And I think a critical part of that will be credential management, especially to ingest your credentials and make sure they're propagated through securely to these different remote data sets.
On the systems and backend side, we need APIs and implementations for remote data registration, job execution, and monitoring, plus tools and workflows that serve these scientific use cases, ensuring the tools are available and accessible for remote execution. Maybe they need special packaging, maybe they need to be tagged in a special way. Testing and hardening, so that we can routinely run these remote data tests all the time to make sure they execute seamlessly end to end. The outreach and training folks also play a really important role, developing new tutorials, new documentation, new trainings, from the user side on how remote job execution works, and I'm sure also on the administrative side: how can you set up servers that can participate in remote execution? And this also touches all the scientific projects that we have, for the reasons we were talking about. So the goal here is just to open the forum. I don't know how we want to do this. We could go working group by working group and talk through where things stand. We do have about a month or maybe a little more, but we just want an honest assessment of what's working, what's not working, what's possible, what's not possible. What we're especially interested in is whether there are key integrations that need to happen, where one working group needs to be aware of what's going on in another working group, to make sure all the different systems can talk to each other. So I know that we're putting everyone on the spot here, but we wanted this to be a real casual conversation, an open discussion. I'm very interested to hear everyone's thoughts from the different working groups: this dream of remote data analysis, how far away is it? Those are the goals. So that's my last slide.
I'm curious if there are any initial thoughts, or maybe we can describe a little the flow of the demo in the ideal case, and then perhaps the working groups will have a better understanding of where their chunks start and end and what's involved. I mean, the one thing that strikes me here is that the work that has to be developed is purely backend at this point, I think. And the idea of taking things from another server, you have to flesh that out a little bit. You can cheat, but that sounds like a multi-year project, and we'd need more resources for that. There are some simpler things we can do in a demo, like get things from a file source, S3 or whatever, and make sure it doesn't end up in Galaxy. That's something we can definitely do for GCC. So I think one of the early things we had in mind was something like: okay, I'm at usegalaxy.org, and there is a large set of, I don't know, for example COVID data sets sitting somewhere in the EU, and I can analyze them there, but from usegalaxy.org. Is that possible? So who runs the job? How does that look? Is it us that runs the job on the resources we have? Or? That's the plan, yes. Unless I understand this wrong, and John and Bjorn might correct me here. I mean, in a sense that's possible if we open up each other's backends, each other's object stores, and allow accessing them from each other, but that sounds like, I don't know. It doesn't sound like something useful for the community except for those two large servers. Okay, maybe. And there are also legal questions, I guess, that need to be addressed. So, Marius, correct me if I'm wrong, but I think the question we are asking is: at GCC we need to give a presentation, and this presentation should cover most of the awesome work that was done in the last year. And we were thinking especially about the remote data stuff that we know, the new history, and so on.
And the question is simply, and we are not asking anyone to implement anything in the next month or two, what can we realistically demonstrate that will be deployed in the next release? Right, so how can we utilize the remote data work in a way that also shows the new history and what cool stuff is actually possible with the remote data work, maybe in combination with Pulsar? The Australians, we know, are now using a Pulsar endpoint in AWS that they submit the AlphaFold stuff to. These are all super cool things that are working currently; how can we bring them together, and what role does the remote data stuff play, so that we can give a very nice demonstration at GCC that captures all of that, right? I mean, the thing that we wanted to work on, and that was the starting point for the remote data, is that Galaxy doesn't need to ingest, into the object store it controls, remote data sets that are on S3, for instance. So that would be possible now, right? You can say in your file source there's an S3 bucket, the user can select it, and instead of uploading it into the history, you get this deferred data set. Only very minimal metadata on the Galaxy side will appear in the history. And then once you run a job, we can materialize that data set on the job worker. So Galaxy doesn't hold on to the data. Now, from a user perspective, I don't think that's exciting, and it's not what I've heard now as a suggestion. So it's... No, no, we're asking you for a suggestion, right? That's the point. I mean, the notion that from the user standpoint it's not exciting, that's the wrong way of thinking about this, because there are lots of things in Galaxy which from a user standpoint are not that exciting, like submitting jobs anywhere, in fact, and running things on a cluster. Users don't know about this.
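The deferred-dataset flow described above (register minimal metadata in the history, fetch bytes only on the job worker) can be sketched roughly like this. The class and method names are illustrative, not Galaxy's actual model code; only the state transition matches what's described.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch of the deferred-dataset idea: a deferred dataset stores
# only the source URI and light metadata; the bytes are fetched
# ("materialized") next to the compute, never into Galaxy's object store.

@dataclass
class DeferredDataset:
    source_uri: str              # e.g. an s3:// or https:// location
    state: str = "deferred"      # minimal metadata visible in the history
    local_path: Optional[str] = None

    def materialize(self, staging_dir: str) -> str:
        """Stage the bytes on the job worker; Galaxy never holds them."""
        # A real implementation would stream from the file source here.
        name = self.source_uri.rsplit("/", 1)[-1]
        self.local_path = f"{staging_dir}/{name}"
        self.state = "ok"
        return self.local_path

ds = DeferredDataset("s3://covid-bucket/sample1.fastq.gz")
assert ds.state == "deferred"            # in the history, no bytes held
path = ds.materialize("/scratch/job_1")  # happens on the job worker
assert path.endswith("sample1.fastq.gz")
```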
If you ask any biologist, for them it's just: you push a button, you get results. It's not exciting. But it's exciting for the people who I think will be at GCC, because it's not really average users who will be there. And it will be exciting for anybody who does any development, or who is thinking of Galaxy as a platform they can use at their place. That's the goal. So from that standpoint, it's actually very exciting. I mean, it's important work, for sure. But, you know, there are so many cool things happening with the display of workflows, with the new history. The question is, what's the coolest presentation we can put together given what we have? That's the question. I mean, that's what I said, right? If we have something in a cloud, if we have S3 and we have some compute in the cloud, then we can run jobs there without Galaxy holding on to the data, right? Basically skipping the upload part of the whole thing. Is it cool enough for you? It's certainly cool enough for me. I mean, John, you've worked on this, right? And I'm sorry that I called it not exciting. It's totally exciting to me, but it's just... No, I talked to the PIs yesterday and I had the same comments. I mean, you articulated the scope of the work a lot better than I did, but in terms of it just not being a flashy demo, I have the same comment. Even if I think of the work I've done over the last year, I don't think deferred data is the most exciting thing. If I had to go up on stage and talk about something: the tools, clicking on the tags in workflows. There are GUI things, things that you click on and interact with, that feel more exciting. And then there's the political... There's the Pulsar network, right? If I think about remote data, I think, you know, it's really...
I mean, it's a technical question in a lot of ways, but it's also a political question, and it's a question of... So, to my mind, stuff like the Pulsar network is more exciting. But John, how can we demonstrate the Pulsar network together with remote data? Is there a cool way where you can... That's happening in June, right? I think we need to figure out how to get the remote... The remote data stuff is cool, but it doesn't quite work with Pulsar yet, right? I mean, hopefully we get there, but... Yeah, if we could do that, that would be great. And maybe that's the point of this meeting, to figure out how we get there. But yeah, I think the Pulsar network is enough, I guess, is what I'm saying. But maybe I'm wrong. I mean, there should really not be a shortage of exciting things to show that were developed in the last year, and in a sense we could go back a little longer even than that. The new history, you could do a full talk on that, I think, with all the little new cool bells and whistles. But there will be talks, right? There will be talks about the new history and so on. We could fall back to the usual PI talk and just give an overview without big connections, but we just thought it would be nice to have a real demo where we can highlight a lot of things that magically come together, right? The workflow user interface and the new history, of course they are part of that, but can we come up with a coherent story? That was more or less the question. Can we include the Pulsar network, and zoom out of... I don't know, I'm not the creative here. Zoom out of Galaxy and then show the Pulsar network where the job is currently scheduled? I don't know. I mean, this is exactly what we were asking you. I think we do something like... Okay, so we do some analysis on remote data.
As Marius said, it doesn't look exciting in a slide presentation. But then that demo also shows the new history. And once this analysis runs, we actually show another thing: by the way, this runs over there. So it's a demo that highlights big things that will be discussed separately during the conference, such as the history, for example. So one thing we could do to build on that is show the things that don't look that exciting, and put side by side how we would have had to do this previously, right? So you say: how do we get, I don't know, two terabytes of VCFs or something like that from an S3 bucket? And we say, okay, now we start running the job. And then you have this slide and you say, okay, previously we had to get all of that into our Galaxy instance, and that would have taken a long time, if we didn't run out of storage first. That's the sort of thing you could do. And there are some user interface components in John's work already where you can track where things are. So, the PIs here, Mike, me, Jeremy, Bjorn, we're very good at spinning things. We'll spin it, don't worry. Just walk us through it. And that's not a question to you, Marius, it's a question to everybody. So how would you... Because usually the previous versions of this talk were like, okay, here are the PIs, they sit down, and this is the stuff that we think is cool. But I think that should be the other way around: it should be what you think is cool, and then we make a big deal out of it, and then you go into the details. So we're trying to reverse the way this is done. And so the question is: what are the important pieces here? I mean, I think you could... Like if you were really... And I'm sorry, others should definitely chime in if they have opinions. I don't want to monopolize all the time, but I'm doing it. So, three to one.
But one thing we could do is: Delphine has developed these cool workflows for VGP, or Wolfgang has developed the workflows for SARS-CoV-2, or we could go with another workflow. We say, okay, these are in the IWC, which has seen growth since last GCC. We will get them through a TRS server, which is also relatively recent work. If you say this is a priority, we can improve the import interface a little; John has worked on the workflow list. We say, okay, we're running this now, and the input data sets are not actually in Galaxy, they're coming straight from the remote source, and at no point in time do they enter Galaxy's object store. You can show this in the data set. And you can end with Dan's new improved Jupyter notebook tool that can produce outputs. I don't know. This doesn't touch everything, but I guess it's a coherent story. John, do you want to say something? No, I'm fine. John, not to put you on the spot, but Assunta reminded me that you submitted an abstract to talk about remote data; maybe you could say a few words about what you're envisioning you would present. Oh, I mean, I have no clue at this point. That's a ways off. I think my interest around it might be more of a workflow-centric perspective on it: extracting artifacts, maybe RO-Crates, something like that. But it is a point, though, that I need to figure out how to come up with a demo; I have 15 minutes to fill or whatever. But usually my demos aren't very flashy. So I would probably talk about the object store, the dataset states and job components, architecture diagrams, boring stuff, technical details. And then, yeah, so I'm not too worried about an overlap there. Yeah, I'm not worried at all about overlap.
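The "inputs come straight from the remote source, never entering Galaxy's object store" part of that story corresponds to a deferred upload via Galaxy's data fetch API. A sketch of the request body is below; the field names follow the `/api/tools/fetch` endpoint as I understand it, so verify them against your Galaxy version, and the example URL is made up.

```python
import json

# Hedged sketch: build a request body for Galaxy's data "fetch" API so the
# input lands in the history as a *deferred* dataset (metadata only; the
# bytes stay at the source until a job materializes them). Field names are
# an assumption based on the /api/tools/fetch endpoint.

def deferred_fetch_payload(history_id, urls, ext="fastqsanger.gz"):
    return {
        "history_id": history_id,
        "targets": [{
            "destination": {"type": "hdas"},
            "elements": [
                {"src": "url", "url": u, "deferred": True, "ext": ext}
                for u in urls
            ],
        }],
    }

# Hypothetical history id and URL, purely for illustration.
payload = deferred_fetch_payload(
    "abc123", ["https://genomeark.example/sample.fastq.gz"]
)
print(json.dumps(payload, indent=2))
```

In a real setup you would POST this (authenticated) to `/api/tools/fetch` on the Galaxy server rather than just printing it.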
It's more a question of: what do we think will be working in the next release? Yeah, I think what Marius said is the correct thing. You can upload data and it doesn't need to go into the object store. You can have data sets in Galaxy that don't actually materialize into the object store, which we were told is a hugely important thing to do, right? Sure is. Yeah, I mean, I don't know that I agree, but it is what we've managed to do. And it was a lot of work, right? Yeah. I mean, there are also other things that we could point out. We've worked over the years on scalability. There is, for instance, the task queue work, which interfaces nicely with the remote data stuff, in that sometimes you will have to fetch data from somewhere, or do things that you don't want to hold up the entire job scheduling or workflow scheduling. These are things that I think you can mention, even if we're not actually using them in production yet. So, for instance, if you want to put up a big vision: I think we should probably make sure that Galaxy can handle 10 to 100 times the volume of jobs it can handle now, in a given time window. An important part of that is making the whole scheduling machinery more efficient, and the task queues are one way to do that. I mean, the entire API framework is different now, too. You can even say, okay, like in the outline that I mentioned, you can show some of the well-defined endpoints and say: hey, if you're a developer, we've never had documentation as good as we have now. You can show: you click on the info page, and if the interface doesn't provide what you need, you can look here, it's all well described.
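The point about not holding up job or workflow scheduling with long fetches can be illustrated with a toy in-process task queue. Galaxy's real implementation uses Celery; everything below is a simplified stand-in with made-up URIs, just to show the scheduler enqueueing slow work and moving on.

```python
import queue
import threading
import time

# Toy stand-in for offloading slow remote fetches to a background worker
# (Galaxy uses Celery for this) so the scheduling loop stays responsive.

tasks = queue.Queue()
results = []

def worker():
    while True:
        job = tasks.get()
        if job is None:          # sentinel: shut the worker down
            break
        time.sleep(0.01)         # pretend this is a slow remote fetch
        results.append(f"fetched {job}")
        tasks.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

# The "scheduler" enqueues fetches and returns immediately.
for uri in ["s3://bucket/a", "s3://bucket/b"]:
    tasks.put(uri)

tasks.join()                     # in real life we'd poll dataset state instead
tasks.put(None)
t.join()
print(results)
```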
I think that's a bit of an invisible piece of work, and I don't know if it's interesting enough for the PI presentation, but there are a lot of things like that you can mention in passing. And maybe, Mike, maybe we need to get more practical here. So if we go back to the list of four things, that was SARS-CoV-2, VGP, AnVIL and ITCR. If we actually do this, then what's the most... So SARS-CoV-2 is realistic, it's easy because there's no protection, you just stage a lot of data sets somewhere, that's very straightforward. With VGP, how would that look? I mean, VGP is in that sense a good example, because they already use S3 to get the raw data. Yeah, they have these GenomeArk S3 buckets with the raw data, the PacBio and the Illumina data. So this you can get in with remote data, for example. But where would we compute then? I mean, it's unfortunate that Nate's not here, but I think Penn State is connected via Internet2, and I don't think there are egress or ingress fees, so it could be on .org actually. Well, TACC is also on Internet2, all of that is on Internet2. So it's not like we're directly in the cloud, but the same thing holds: when you import it from the bucket, it doesn't end up in your Galaxy account for storage, it doesn't end up in the object store. And they're public data sets, by the way, you don't pay anything, and they can also be computed on the EU and on the Australian server. So it's really public. My question here is: what's remote here? Perhaps I'm asking this question because I don't understand what I'm talking about. So in the case of, for example, COVID data, you store it somewhere and you compute there, probably nearby, right?
But in the case of VGP data, you can't compute nearby. You need to move it somewhere. Or am I completely out of line here? No, you're right, but you have to do that anyway, however you do it, even on the cloud. There are maybe a few tools that can stream directly over S3, but even in the clouds you put it in your working directory and start doing stuff. Okay, but in terms of the actual presentation. With COVID data it's easy, because it's small, it's just lots of small data sets that we can move in. Then there's the VGP data. So for example, with COVID data it would look like: the COVID data is on EU, and I just run my workflows on it on EU infrastructure. That's the overall picture. But with VGP, you get it from GenomeArk to the big-memory nodes and then start running. Is that how this works? Okay. And with AnVIL or ITCR, what would be the similar scenarios? For AnVIL, this is absolutely essential, right? Because Galaxy there is effectively single-tenant, where each user boots up their own server. The data reside in buckets, but as it is now, step one is this huge upload, a transfer task from the bucket into the Galaxy instance. If we could skip that huge upload and just, you know, somehow ingest the list of URLs, or whatever the right mechanism is, that would be huge. That saves users time, a huge amount of space, a huge amount of everything. So it's a huge win. Jeremy, do you want to say a bit about ITCR? Yeah. In thinking about this over the last 40 hours or so, there's a key distinction that we should all try to make: there are two pieces of the puzzle that are perhaps sometimes conflated. Number one is that when you upload a data set into Galaxy, when you use deferred data now, it doesn't go into the store. But as Anton was saying, the question then still becomes: when you actually do the compute, can you do data-local compute?
And I guess I had naively assumed that those would be intertwined, but they're not, right? You can defer the data, not put it in the Galaxy store, but also not do data-local compute, and still have to bring it over. In the case of ITCR, it would be nice to have both, to be honest. We would not like to put it into the Galaxy data store for all the reasons Mike articulated, but we would also like to do data-local compute. And it seems like data-local compute is still a ways out, and the more easily demoable thing, in some sense, unfortunately, is to say: look, I don't have to move these terabytes of data off the cloud. So yeah, we don't have a solution where you can say, whatever you do, it will be data-local, without writing a job rule or something like that. But I think there are ways in which you could determine, based on where the data is, which destination, in Galaxy terms, to choose, so that the compute is closer to the data. I mean, it's not a streamlined thing. It's not as fancy as what the big systems offer. But overall you can set it up, especially if... yeah, well, I don't really know how the ITCR setup looks, but if you know your infrastructure, you can write a rule to match that. That's with a big question mark, because I haven't done it, but overall it seems like you can probably do this. The ITCR setup is pretty straightforward. There are terabytes, hundreds of terabytes of data on both GCP and AWS. Some of it is public, but we still don't want to move it, because it's just too large. It would be nice if people could use usegalaxy.org or usegalaxy.eu to analyze those data sets on the cloud. What I think I'm hearing, if I understand correctly, is that the hang-up right now is that we don't have compute set up on either of those clouds.
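The "choose a destination based on where the data is" idea mentioned above could look roughly like the function below. Real Galaxy dynamic job rules (or TPV rules) receive much richer context than a bare URI, and the destination names here are made up; this only shows the URI-to-destination mapping being discussed.

```python
# Sketch of "compute near the data": pick a Galaxy job destination from the
# input's source URI. The destination names are hypothetical.

DESTINATIONS = {
    "s3://": "aws_pulsar",   # hypothetical Pulsar endpoint in AWS
    "gs://": "gcp_pulsar",   # hypothetical Pulsar endpoint in GCP
}

def pick_destination(source_uri: str, default: str = "local_cluster") -> str:
    """Route the job to compute near the data, else a default cluster."""
    for scheme, destination in DESTINATIONS.items():
        if source_uri.startswith(scheme):
            return destination
    return default

assert pick_destination("gs://anvil-data/cohort.vcf.gz") == "gcp_pulsar"
assert pick_destination("s3://genomeark-raw/reads.bam") == "aws_pulsar"
assert pick_destination("https://example.org/small.bed") == "local_cluster"
```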
So deferred data would work just fine today, right? You go to usegalaxy.org, you say, I'm going to point at this set of 100 files. It won't be slurped down into usegalaxy.org's object store. Fantastic. But when you run the job, you would slurp all that down onto the worker node, and that is still pretty prohibitive. Yeah, I mean, depending on where your worker is, right? So it would be your responsibility to make sure that you have a worker in GCP and one in AWS. And then I believe we could do a little bit of magic to determine whether we want to send the job to GCP or AWS, based on the URI. Yeah. I mean, at least for things that come straight out of the upload. Yes. And after that, intermediate data sets still go to the object store, right? I understand the intermediate data sets still go there, for instance. So I guess what I had envisioned is: if we could analyze these big data sets sitting on the cloud from .org or .eu, we would be able to demonstrate all the fantastic UI advancements and the task queue and stuff like that. But right now, as it stands, we wouldn't argue that data-local compute is part of this solution, probably, because it doesn't seem like we could bring that up in a reasonable period of time. We'd still be doing this transfer to the worker node. Yeah, and I would feel more comfortable with that, because at some point we do want to do data-local compute, and we don't want to say last year that we could do it, and then this year we spent a ton of time so that we could actually do data-local compute, right? One approach here is to forget the data-local compute part, focus on large-scale analysis of some data sets, and just walk through how the UI facilitates large-scale analysis through one of these scientific drivers.
If it happens to be a personal instance on GCP through AnVIL, that's okay, because there data-local compute just means transfer within a cloud, and that's fine. If it happens to be on EU or .org, that's also fine, as long as the data sets are small and can be moved quickly, or we can, as Mike would say, do a Julia Child type of demo, where we cut out the middle piece where the data transfer in the analysis takes a long time. I don't remember what context we talked about it in, but recently we also talked about Dan's, was it 2017, scalability talk, about basically all the things that didn't work in Galaxy at scale. We thought it would be cool to revisit that: basically try the same types of things that he did and just show the difference. I don't know if that's suitable for this particular talk, but we thought it would be a cool thing to look at for some context. I guess it's a bit of a developer-motivation thing. Yeah. I mean, that's the kind of thing where, you know, Nikolas is processing 600,000 data sets, and he says, well, it's taking a long time, and then we can zoom in on why that is. By large scale, Jeremy, you mean... Collections. Yeah, more or less. Well, perhaps I'm missing something, but collections in the new UI seem like a large part of this demo. Of course, for instance. But how would... so suppose we go to... well, I guess with VGP it actually works well, because you have this file browser now. But for example, how would you as a user access... to be honest with you, I'm so tired of COVID, but say it's some set of sequences for the COVID problem. Some set of sequences at EU, accessed from .org, how would the interface look? It's going to be like a library; there's going to be the same file browser. I think that's good. Go ahead, Marius, I'm sorry. I think that's the part that I don't think is realistic, unless, you know, we sync this manually in the backend.
But, you know, accessing data from one server on another, with a user interface and not with a bunch of cheating involved. Okay, so that's what we want to know. That's basically what would guide us to the actual demo. So that's not possible. Okay. I didn't say it's not possible; I don't think it's a good idea to present it. I mean, if I had a budget and I wanted to do a cool demo, I was thinking about something Marius said: it would be kind of cool if you could have, side by side, loading, let's say, 2000 COVID sequences into a history, one without deferred data and one with deferred data. Presumably, without deferred data, the whole demo is just sitting there, and with deferred data you can go in and start showing off history features, start running analyses on them as needed. It's a really cool demo. It would be flashy, at least, to see one side just spinning while the other side is getting going faster, and it gets around the issues. And Anton, I sent you, for instance, a GIF with upload, right? I uploaded 100 data sets using the message queue. Very, very small data sets, but all 100 data sets just turned green, with no intermediate thing going on. If you did that side by side without the message queue, it would take like a minute or so. Yep, I guess that falls into a similar category. Yeah, and that minute may become an hour or several hours if it's large data, right? That's undeferred, yeah. Okay. But where do I start? So how do you start then? Suppose you're doing this demo, what's there? What's the starting point? I think the starting point has to be that you have a public data set out there in the world, Anton, that is very large. Okay, that's easy. But, I mean, your question is really interesting.
I don't want to derail us, but you're talking about federation, so to speak, when you ask about .org and .eu talking together. And federation is a really interesting concept. I don't think we've really pushed on it much, but from a user-experience standpoint it may have some legs down the road, where you could potentially use a single sign-on to access data across .eu and .org. I just don't think it's possible for GCC. So I guess the distinction is: the two servers can't talk to each other. What we want is a server that talks to a large data set that's out there in the world. I think I framed it wrong. What I really meant is not that .org has to talk to EU, but that I'm on .org and have data stored on de.NBI infrastructure in Germany, the same infrastructure that EU is using. Yeah, I think that was the original plan, not instance to instance; it's just: I'm on .org, but the data is in Germany or somewhere else, not in the US. It can be in the US, but it's flashier for it to be somewhere else. Yeah, and this goes back to data-local compute, this idea that we've got the pieces in place to say: oh, I know there's data out there that I want to work with, and I can now talk about it and operate on it in Galaxy. But when I do that operation, I still have to localize that data, copy it over. I can't run the job way over in Germany right now, because we don't have the compute set up. I mean, that would be really cool, some sort of collaborative scheduling over shared infrastructure. Can you use Pulsar for that right now? So if you have a Pulsar instance running in AWS somewhere, can you connect to it from a local Galaxy on a laptop? Because that would be a cool demo to show: you have data in AWS, a Pulsar node running in AWS, and you're on conference Wi-Fi, but it actually works, right? I mean, that has worked for some time now, right? With the deferred data and... Well, no. Right, so that's... Okay, okay. That's all I've got.
So, explain that again for me. Go ahead.

You have a laptop, okay? You've got a laptop with a local Galaxy that you just set up, and some Pulsar node running on AWS. You're able to dispatch jobs to that Pulsar node and use deferred data to get data out of S3. So the data is local to AWS, local to your Pulsar node, but you actually run the jobs from your laptop and see the results at the end, right? And there's what John was saying: there is a problem between deferred data and a Pulsar node.

Yeah, that piece isn't quite there.

Okay. Can we put a Celery worker on Amazon? I'm just wondering... I don't know if that's still Galaxy on your laptop at that point.

John, we talked about this: if we push the remote tool evaluation to Pulsar, and the tests are passing, right? We have this tested with Pulsar. The remote tool evaluation should handle the materialization of the deferred data set. So what piece is missing there?

You know, I could be wrong. It feels like that happened. Wait, you've tried that with Pulsar and it worked?

Yeah.

Okay, then perhaps I'm wrong. That needs all of Galaxy available to it, though. Pulsar needs to have a copy of Galaxy available; it needs to be importable. And that might not be true for some Pulsar setups, right?

Yeah. And certainly the way Main runs, it does have that available. I hadn't realized that you had tests and had done this. To me it looked like magic happening in the handler that didn't work with the Pulsar runner. But if you've done it, then it should just work with deferred data, right? It's happening in the remote evaluation.

We haven't sat down and really looked at it, but we have tests, unit tests.

Yeah, we have tests, but sometimes tests work so well only because some other thing made them magically pass. We can check this more carefully. Okay.
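The laptop-to-AWS setup being described maps onto Galaxy's job configuration. As a rough sketch only (environment ids, the tool id, and the Pulsar URL are all made up, and the exact options should be checked against the Galaxy job configuration docs), a `job_conf.yml` on the laptop Galaxy might look like:

```yaml
# Hypothetical job_conf.yml excerpt: route one tool's jobs to a Pulsar
# node running in AWS, next to the data in S3.
runners:
  local:
    load: galaxy.jobs.runners.local:LocalJobRunner
  pulsar:
    load: galaxy.jobs.runners.pulsar:PulsarRESTJobRunner

execution:
  default: local
  environments:
    local:
      runner: local
    aws_pulsar:
      runner: pulsar
      url: https://pulsar.example.org:8913/   # assumed AWS Pulsar endpoint
      remote_metadata: false

tools:
- id: my_aligner          # hypothetical tool id
  environment: aws_pulsar
```

Whether deferred datasets are materialized correctly on the Pulsar side under this kind of setup is exactly the open question raised in the discussion above.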
Maybe that laptop demo would work just fine, then. That would be amazing; I didn't know it worked so well. That's awesome.

So it would be nice to continue hacking on it to the point where, if you pip-installed Pulsar, you'd also get Galaxy in order for it to work, and to do the two-pod Kubernetes version of this so it works with Kubernetes as well. I mean, the Galaxy app itself should be pip-installable, so with some luck that might work out. It's just that, I guess, a normal Pulsar installation probably doesn't bring in the Galaxy app package. But as an extra, it could work. That would be cool. But that's speculation: no guarantees it's going to work by GCC. We should check it. It might.

Again, none of this is meant to create any additional work. We're just trying to understand which pieces we can put together; I guess it reflects the PIs' ignorance.

So here's an idea, possibly non-technical, just flashy, related to what Marius said a few minutes ago about the side-by-side slow-versus-fast comparison of the same thing. Marius, correct me if I'm wrong, but it can easily be 20 times faster with the message queue, right?

The upload, yeah.

So one second versus 20 seconds is completely unimpressive to an audience, but one minute versus 20 minutes is very impressive. You start describing it, launch it, and you're done describing it within one minute and it's done on this side, while the other one finishes precisely by the end of the presentation, if it's timed just right. That might be a little more impressive. Maybe not.

No, actually 20 seconds is also very impressive for a user. Trust me.

I think that would be quite possible, because... well, I haven't personally played with John's branch, but I think there's a checkbox in the upload UI for whether you want to defer the data or not. So you can say, okay, let's stop this.
And we say, oops, I didn't click "deferred". Is that how it could work, John?

Yeah, there is a checkbox.

Yeah, that would be cool. It's noticeable if it finishes 20 minutes later, once we're done with the presentation.

So going back to the laptop abstraction: you'll have a laptop with a Galaxy running on it, talking to a preconfigured Pulsar instance somewhere that also has access to the data local to that Pulsar, right? Okay. And in that setup, where can we show the difference between the queues?

One second. You would just see it in your history, right? One data set will be done and the other will still be running.

Yes. And in this analogy, we could actually have two separate laptops and have them race against each other.

A laptop race. Yeah. Okay. We'll have to have two projectors, though. That complicates our GCC preparation. Who suggested that? Was it because we haven't modernized the multi-history view?

No, this is exciting. This has been a great conversation for brainstorming what's possible, and a lot of great ideas are coming out of it.

If I could, real quick, maybe back to Jeremy's point and maybe even some of Marius's skepticism: to me, data-local compute has been possible in Galaxy for five years. I would like, at some point, a checklist or something of the pieces that allow an admin to deploy a solution that takes advantage of data-local compute, and that is true with or without deferred data. So, I don't know. It would be nice to know what it would take to change the minds of the people in this room that Galaxy can do data-local compute, and what we feel other services have that we don't.
I mean, some of the answers are kind of obvious, but in terms of allowing you to flexibly deploy things, I think we actually do better than everybody else on this narrow point, not worse. That was just a comment, an end cap. I wanted to add on to that comment because it was a little concerning to me. Sorry.

I can give a quick answer, which would be that you need to write a custom rule in Python, right? For other systems that would be perfectly fine, I don't know. And it's not even that our interface is well defined, in a sense. So the people who could set this up are Galaxy developers, or admins with intimate knowledge of how Galaxy works, if you want to set up the rule. We set it up, for instance, per group.

I think you can do this without a custom rule. But I should shut up; I'll try to find that.

Yeah, that's fair.

Okay. What I was thinking about when I said data-local compute is that you have multiple data locations and multiple compute locations, and you want to match them: this data set is here, so the compute should go there. I don't know if that's the same scenario we're talking about.

Yeah, it is. And I would assume there's some way, in any of the YAML description languages we've implemented on top of Galaxy's job config, to do that. But maybe there isn't; maybe that's a hole I should try to figure out. Ultimately, though, I don't think the five lines of Python are a problem. That's exactly how you would want to do it. It's the most flexible thing you could do, and it's going to be better than anything canned, because there are a million different little considerations here and you should have full control over them. But maybe I'm wrong.

Yeah, I would agree. I apologize, John.
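The "five lines of Python" referred to above would be a dynamic job destination rule. A minimal sketch, assuming the rule is placed where Galaxy's job configuration can load it; the environment ids (`aws_pulsar`, `denbi_pulsar`, `local`) and the object-store naming convention are invented for illustration:

```python
# Hypothetical dynamic destination rule: Galaxy can pass arguments such
# as `job` to rule functions by name. Everything below (environment ids,
# object-store id prefixes) is made up for illustration.

def route_by_data_location(job):
    """Send the job to the compute environment closest to its inputs."""
    for assoc in job.input_datasets:
        store = getattr(assoc.dataset, "object_store_id", None) or ""
        if store.startswith("aws_"):
            return "aws_pulsar"      # hypothetical Pulsar environment in AWS
        if store.startswith("denbi_"):
            return "denbi_pulsar"    # hypothetical environment on de.NBI
    return "local"                   # fall back to local compute
```

This is the data-location-to-compute-location matching described in the conversation: inspect where the inputs live, return the id of the matching execution environment.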
We aren't giving due credit here to what's out there. I do concur with Marius that it feels a little hard right now to do data-local computing, but I definitely see a path, with all these different pieces now, that we could do it. It does require a dynamic job destination specifier, apologies for mangling that particular term, and it requires setting up the remote compute: for instance, if we set up a remote Pulsar on AWS or GCP or something like that, that would be a requirement too. And then connecting the two and doing the configuration. But you're right, there is a path to data-local compute. If we cleaned it up just a little bit and documented it, we'd make a lot of progress there.

Yeah. To a large extent this might just be that somebody needs to walk through it and describe what is possible now, and from that it should also naturally follow what improvements we can make.

Yeah. I'll ask one of my software engineers to start looking at this more closely, John. It's also worth looking at Nuwan's work on spinning up jobs on clusters. The Pulsar configuration piece is a thing there, but I know Nuwan has done a lot of work to make that automated.

Yeah, that's good. No, thanks for the clarification. And sorry if I came off wrong; I just wanted to understand where people's heads were at so that we can go forward and get on the same page. We'll never be completely on the same page, but understanding what needs to be done... and documentation is a great point; obviously there's a lack of that.

All right. John, you didn't even say that you hate Galaxy today, so I think we're improving.

Yeah. And we're past the hour mark, so it's too late.

I've been asked a good question: can we schedule workflows in a way that intermediate steps are not transferred? And I think the current answer is no.
But it's a cool task, right? Something we should work towards, too.

With the extended metadata you kind of can, right?

It depends on what you mean by transferred. But one thing in the deferred-data branches is exporting invocations: basically everything is in the invocations branch needed to spin up a Galaxy, run a whole workflow on a remote cloud, and just pull back the metadata you want as the result. And this is work that Kyle had wanted to do forever ago. It's in good shape if we wanted a really good demo of it in 2023.

Spinning up a Galaxy means using Pulsar, or...?

No, I was thinking a whole Galaxy, not just the workflow. You run a whole Galaxy without a UI, just the API: send the workflow invocation request, let it run and schedule however that Galaxy is configured to run it on the cloud, and just pull down the model store at the end that describes the invocation.

Do you think there would actually be a route for people to bring their own compute and storage? So you do this on a public Galaxy instance and say, okay, these are the inputs, we will generate URIs for the inputs. And then you say, here's the URL for a Galaxy instance that I configured, or bought from some provider or whatever, and you just get the results back in that instance.

I mean, in this particular case, I don't know if you really want the intermediate workflow results, right? I don't know if this is that problem. In my head they're not the same, but maybe I'm wrong.

Okay, I think we're past time; I don't want to keep anyone longer. For me it was very useful, but also kind of mind-blowing. I'm still confused about new things. So, I'm just curious, what's the next step?

The next step is probably that we should come up with some kind of example.
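The "pull down the model store at the end" step described above has a corresponding Galaxy API flow: request an export of the invocation, then download it from short-term storage. A hedged sketch, where the endpoint paths and payload keys follow current Galaxy API documentation but should be verified, and `session` is assumed to be a `requests.Session` already carrying an API key header:

```python
import time


def store_download_request(model_store_format="tar.gz"):
    """Payload for POST /api/invocations/{id}/prepare_store_download.
    include_files=False pulls back only the invocation metadata, not the
    dataset bytes, matching the 'just the metadata' idea above."""
    return {"model_store_format": model_store_format, "include_files": False}


def download_invocation_store(session, galaxy_url, invocation_id, out_path):
    # Ask the remote Galaxy to package the invocation, then poll
    # short-term storage until the archive is ready.
    r = session.post(
        f"{galaxy_url}/api/invocations/{invocation_id}/prepare_store_download",
        json=store_download_request(),
    )
    r.raise_for_status()
    storage_id = r.json()["storage_request_id"]
    while not session.get(
        f"{galaxy_url}/api/short_term_storage/{storage_id}/ready"
    ).json():
        time.sleep(5)
    with open(out_path, "wb") as fh:
        fh.write(
            session.get(f"{galaxy_url}/api/short_term_storage/{storage_id}").content
        )
```

With `include_files` toggled, the same flow could also bring back full results, which is the distinction the conversation raises about whether intermediate workflow outputs need to travel at all.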
And then ask whether that works, and through a few iterations of that we'll come up with a plan.

Yeah, I think we should spend one dinner in Montpellier to sketch out the initial ideas here, and then come back with a more concrete plan. And John, I guess we'll be Zooming you in.

I'll be there, Marius. You clearly have the better head for all of this anyway, so it's fine; I can just relay what you've done. I'll just eat a baguette in your honor. And a hash brown.

Bye bye, everyone. See you next week. Bye bye.