I would like to propose a toast to getting a special night started on time. Cheers. We're not on time. All right, so an announcement: some of you are giving talks tomorrow. If you would, please submit your slides to the Google Drive that Jennifer has set up, and if you don't know where to put them, find Jennifer, she'll be floating around. And with that, without further ado, I would like to start the session. I would like to introduce Sian Kim, who is going to give a talk on the secure deployment and hosting of Galaxy workflows, tools, and infrastructure.

All right. Good afternoon, Galaxy community. My name is Sian Kim, and I work as a product lead at the Centre for Infectious Disease Genomics and One Health at Simon Fraser University. Nice meeting you. I had a whole lot of personal struggle trying to make it out here leading up to this conference. I was diagnosed with stage three cancer while I was actually preparing for this presentation, so chemo and radiation and so on. It was no fun at all. But I'm very glad to be here and I would like to thank the organizers for accommodating me. I do see a lot of joy, though, in accepting and allowing myself to be, and in finding strength to do what I feel passionate about, which is why I'm here. I've chosen a DevOps topic to speak to you about. I believe that for researchers developing tools, especially in healthcare, it's important to understand the production environment and the requirements for tools to be used in healthcare. Due to my illness, I wasn't sure if I could make it today, so my co-presenter and I put together a little recording. Unfortunately, he won't be here, but the recording will still play. If there are questions that come up, I'll be sure to relay those and get them answered for you. There's also a Slack channel as well. Lastly, I'd like to thank my 13-year-old son, who helped his mother prepare this recording. So, if you're watching this, thank you so much and I love you. Without further ado, I'll play the recording, and I'll do a little bit of live Q&A at the end.

My name is Sian Kim. I'm the product lead at the Centre for Infectious Disease Genomics and One Health at Simon Fraser University. Today, I'll be sharing information related to DevOps, particularly regarding secure deployment of Galaxy workflows. I'll also share our experience of what it takes to deploy modern tools within a Canadian healthcare setting. Up until recently, I was managing multiple IT teams within hospitals, and I've experienced firsthand how challenging deployment can be in healthcare. Even with well-tested research software, there are a number of barriers and issues when it comes to deployment. I refer to these as being akin to last-mile delivery problems. I'll share our story of working together with stakeholders to overcome these last-mile problems for Galaxy deployment. My key message is about the primary importance of laying the groundwork for partnership, in order to benefit from the multiple perspectives and requirements within public health organizations as well as externally at partner organizations such as AWS. The perspectives of partners prevent extensive refactoring efforts and enable our team to get the job done right the first time. Our capacity and understanding as a research team increase as a direct result. On a personal note, I was diagnosed with cancer while I was preparing for this presentation.
This has only served to increase my belief in the solutions available through the open community of Galaxy, which can improve patient outcomes, perhaps even mine. I'm delighted to be given an opportunity to connect with you today, and I'd like to thank the organizers for accommodating me. I'd like to acknowledge the folks I work with at the Centre for Infectious Disease Genomics and One Health at Simon Fraser University. We're primarily a bioinformatics lab. We provide end-to-end services, including sample preparation, sequencing, and analysis. We're a development shop comprised of software engineers, microbiologists, and bioinformaticians. I would also like to acknowledge Kenny, June, and Nolan and the teams at AWS, as well as the Public Health Agency of Canada. Our mission is to solve real-world infectious disease challenges through a One Health lens using advanced computational approaches, innovative knowledge engineering, as well as stakeholder engagement.

Regarding our mission, I would like to begin by sharing a story. We approached healthcare IT and said, hey, we have this tool called IRIDA, a platform for public health genomics. What do you think? This is what they said: Was a comprehensive architecture review completed? How was the tool vetted and tested? Was there a security threat assessment or privacy impact analysis? These refer to specific compliance mechanisms used in BC, Canada. Is it compatible with cloud? In British Columbia, the regional health authority and government ministry offer preconfigured cloud environments. They have a heavily weighted preference to host all new projects within that environment, especially big data projects. They have an existing architecture, which allows execution of a shared responsibility model between central IT and application owners, leveraging a number of cloud-native services and controls. Who are you going to trust? It's all about lack of trust. Many have asked: who built that tool? How many people on your team had access to the code? How big was your dev team? How many people will maintain the code base in the long run? We all understand that a highly trained workforce can be transient given market demands, when thinking about the long term. We researchers are sometimes, perhaps often, focused on initial development and are susceptible to summit fever, racing to get the code into production but lacking perspective on, and appropriate consideration for, long-term operational requirements. One example could be adequately preparing for incident management and user monitoring. Some aspects of incident management relate to who does what; others relate to the role of the tool itself, such as system health monitoring. Another example is infrastructure as code. Infrastructure as code not only guarantees reproducibility; in the case of a cyber incident, it also provides the ability to contain and remediate the incident within a limited blast radius, a critical advantage. The question remains: can we embed best practices and long-term operational considerations as part of our development process, in order to promote ease of use for public health organizations? After receiving feedback from partners and other stakeholders, our team came up with a game plan. This is a high-level summary of our process flow to de-risk. We use an AWS environment for product development; however, we use academic cloud and clusters for ad hoc analysis.
First, we conducted an architecture review with counterpart AWS teams, as well as our regional health authority partners. Second, we reviewed, collated, and consolidated various Canadian compliance requirements. These include the Canadian federal PBMM profile, which stands for Protected B, Medium integrity, and Medium availability, in addition to a thorough review of privacy schedules and of best practices such as the AWS Secure Environment Accelerator, ASEA. Third, we identified gaps between what we have and what we need, adding services and features to the existing stack in a modular fashion. We did not create the infrastructure-as-code template from scratch. Where's the fun in that? We used a tool called Former2 in order to automate Terraform code generation. This will be covered in more detail in the presentation by my colleague Reginald at AWS. Fourth: security, security, security. Penetration testing, security threat assessment, and privacy impact assessment. The results of these efforts typically produce a list of remediations before code is released. We recognized that it is critical to have the ability to iterate and update refactored code as efficiently as possible. Delays in doing so could cost lives. Again, I cannot emphasize enough how important the feedback process is and how that input needs to be incorporated directly into the DevOps workflow. Compliance requirements are thereby embedded as part of infrastructure as code. I'm going to hand over the presentation to my colleague from AWS, Reginald, to address the architecture itself, our findings before and after the review process, and how we implemented those recommendations. Reginald?

Hello Galaxy Community Conference. My name is Reginald Johnson. I'm a Senior Solutions Architect with Amazon Web Services. I've been in the technology industry for the better part of the past 20 years, spending a bit more than half of it working with NGOs and nonprofit organizations, as well as with public cloud, supporting a variety of scenarios including public health and research. Right now I work for AWS as part of a team helping nonprofit entities and healthcare organizations leverage public cloud to support their missions. I was asked by the CIDGOH team to identify how IRIDA can be run in AWS using infrastructure as code to deploy at scale. In the diagram that you see on the right, we identified a number of architectural issues. One key issue is the perimeter security necessary to meet certain compliance requirements. Anywhere you deploy an application that will be collecting or processing health data, there will be a number of security or compliance requirements that need to be met. This is especially true for the perimeter, but often extends to the interior of the workload itself. Perimeter security controls will be the simplest to meet. Many of these controls are enabled by default by AWS, and additional controls can be added to the environment as it's deployed and after it's deployed. These controls can include encryption for data in transit and at rest, intrusion detection, intrusion prevention, and protection from DDoS, or distributed denial of service, attacks. One way that we're able to optimize this architecture, provide greater availability, and mitigate security concerns is to decouple the Galaxy workflows into their own pods on a cluster managed by Amazon Elastic Kubernetes Service.
If you're not already familiar with Amazon Elastic Kubernetes Service, or EKS, it provides a fully managed control plane for Kubernetes environments on AWS. Think Kubernetes on autopilot, where you're only responsible for managing the underlying nodes, if you choose to manage them, and none of the control plane. All of that is completely managed for you. Moving things to separate pods is going to enable us to not only secure portions of the workload and its workflows, but also scale those portions as necessary, independently of one another. This will enable us to mitigate the concern of compute node idling. How do we know if what we're provisioning is not only sufficient for the workload but also not oversized, so that we can prevent resources from running idle, incurring cost, or otherwise not existing in an optimal state? To address this, we're going to have to balance not only the availability of the workload but also bursty or unpredictable usage patterns. Once the workloads are decoupled into separate pods, it does make it easier to provision and scale resources against the usage patterns of each pod. Because each pod is separate and may have separate usage patterns, we can scale one portion of the application without necessarily needing to scale the other, reducing the amount of underutilized resources. Additionally, we're going to implement and enforce encryption in transit and at rest for parts of the workload, such as communication with Amazon RDS Postgres and Amazon EFS, and address a key problem in the workload, which is Docker Hub rate limiting. As we are using Docker containers as part of the managed EKS cluster, we're going to encounter rate limiting with Docker Hub in any case where there is either no subscription in place or a subscription impacted by rate limiting. As the number of pulls from Docker Hub increases, latency on the workload applications and specific jobs is going to be impacted. This is also going to impact deployment flexibility.

Now, as we look at the refactored architecture, we're going to see a number of things that changed for this particular deployment. The first thing we mitigated would be the further decoupling of the Galaxy workflows into separate pods in EKS. From this perspective, the pods are separated for IRIDA, Galaxy, and BioContainers. Again, this is going to allow us to scale each part of the workload independently of the others and handle the compute node idling problem. Each pod can be provisioned with the underlying compute capacity it requires, allowing us to mix and apply both persistent compute on Amazon EC2 and serverless compute using AWS Fargate where necessary, without over-provisioning resources anywhere. For encryption in transit and at rest, we're able to leverage AWS Key Management Service, or KMS, to manage not only the security keys for at-rest disk encryption and object encryption, but also the TLS certificates that would be used for encryption in transit. This is used by the EFS share shared by all containers in the workload, and also by the underlying storage for RDS. KMS will also manage the TLS certificates and associated keys supporting encryption in transit for the connections to RDS and EFS. It's also going to manage the certificates presented by the application load balancer, which for EKS can be implemented as an ingress controller. If we need to manage other pieces of secure data, that can also be offloaded to KMS for secure key storage.
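As a rough illustration of the encryption-at-rest point above (this is not from the talk; identifiers, region, and the key ARN are placeholder assumptions), a minimal boto3 sketch of creating a KMS-encrypted EFS file system and an encrypted RDS Postgres instance might look like this:

```python
import boto3

# Hypothetical illustration of KMS-backed encryption at rest; all identifiers are placeholders.
efs = boto3.client("efs", region_name="ca-central-1")
rds = boto3.client("rds", region_name="ca-central-1")

KMS_KEY_ARN = "arn:aws:kms:ca-central-1:123456789012:key/EXAMPLE"  # placeholder customer-managed key

# Encrypted EFS file system shared by the workload's pods.
fs = efs.create_file_system(
    CreationToken="galaxy-shared-efs",
    Encrypted=True,           # encrypt data at rest
    KmsKeyId=KMS_KEY_ARN,     # use the customer-managed KMS key
    PerformanceMode="generalPurpose",
)
print("EFS id:", fs["FileSystemId"])

# Encrypted PostgreSQL instance for the application database.
rds.create_db_instance(
    DBInstanceIdentifier="galaxy-db",
    Engine="postgres",
    DBInstanceClass="db.t3.medium",
    AllocatedStorage=100,
    MasterUsername="galaxy",
    MasterUserPassword="change-me",   # in practice, manage this with Secrets Manager
    StorageEncrypted=True,            # at-rest encryption backed by KMS
    KmsKeyId=KMS_KEY_ARN,
)
```

In the deployment described in the talk, these settings would live in the infrastructure-as-code template rather than in ad hoc API calls; the sketch just shows where the encryption knobs sit.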
We've implemented AWS WAF at the perimeter to protect ingress from outside the application and outside the workload. AWS WAF will integrate directly with the application load balancer, or ALB, to block malicious traffic before it can be processed by the ALB, reducing load on the ALB itself as well as the underlying workload. AWS Shield is enabled by default, providing always-on, managed Layer 4 DDoS protection for inline network attacks. With AWS WAF and AWS Shield both enabled and configured, the workload is going to be protected from Layer 4 network attacks as well as Layer 7 application-side attacks.

Now, on to the Docker Hub rate limiting problem. We're able to address this concern using Amazon Elastic Container Registry, or ECR. ECR is a managed container repository service with a feature known as pull-through cache. This will allow ECR to cache a copy of Docker images held in a third-party repository, accelerating your ability to deploy them in EKS. ECR will also check the source repository for updates and cache those as well, keeping both its cache and your application and workload up to date on every pull. With Docker Hub, you're going to experience rate limiting on the first pull and every subsequent pull. When pulling from ECR, the rate limit you would otherwise experience when scaling your pods no longer applies.

That said, there will be a number of next steps that we have to look at in terms of this architecture. A lot of these are going to come down to how you reuse this architecture, manage your workload, and what direction you choose. A primary concern is going to be the sizing of persistent compute. This comes down to your specific workload and how busy it may be. When you deploy these workloads, you may need some persistent compute, and that is going to impact the availability of various parts of the workload. For anything control-plane that doesn't need to scale, you may choose persistent compute to maintain availability, and use serverless elsewhere for more ephemeral parts of the workload. What you choose will impact your spend, your spin-up time, and your availability. Larger resources may require more time to spin up, and that will vary based on your usage patterns and how consistent they are. Another concern will be moving to object storage for data and execution. Right now, as designed, the Galaxy workflow will require using local or network-attached storage such as EFS. Using Amazon S3 for object storage will allow you to reduce cost per gigabyte and scale as necessary for working-set data during execution, but this is going to require some changes to Galaxy in order to enable that. Another item to address would be replication within and across regions. How available or fault tolerant does your workload need to be? This is going to be inherently supported by EFS, RDS, S3, and EKS; how you choose to implement it is going to vary based on your requirements and your need for disaster recovery. Increasing availability for EFS and RDS can be done in a matter of minutes simply by enabling multi-AZ on RDS. For EFS, multi-AZ is enabled by default, and the same for the EKS control plane. However, you define the availability of your pods and the underlying worker nodes. So if you need those to be highly available, fault tolerant, or provide for disaster recovery, that is something you have to plan for when you are ultimately designing the EKS cluster or scaling that cluster. You ultimately define how you want your workload to be replicated.
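To make the Docker Hub mitigation described a couple of paragraphs back more concrete, an ECR pull-through cache rule is created with a single API call. Here is a minimal boto3 sketch (not from the talk; the repository prefix is a made-up example, and depending on the upstream registry and ECR feature version a Secrets Manager credential may also be required):

```python
import boto3

ecr = boto3.client("ecr", region_name="ca-central-1")

# Images cached through this rule then resolve as
# <account>.dkr.ecr.<region>.amazonaws.com/docker-hub/<image>  (prefix is hypothetical).
resp = ecr.create_pull_through_cache_rule(
    ecrRepositoryPrefix="docker-hub",
    upstreamRegistryUrl="registry-1.docker.io",
    # For some upstreams, current ECR versions require a credential, e.g.:
    # credentialArn="arn:aws:secretsmanager:...:secret:ecr-pullthroughcache/docker-hub",
)
print(resp["ecrRepositoryPrefix"], "->", resp["upstreamRegistryUrl"])
```

After the rule exists, pods reference the ECR-hosted path instead of Docker Hub directly, so scaling out many replicas no longer hits the upstream rate limit.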
Implementing this architecture in EKS is going to allow the components of the workload itself to be incredibly portable. You have the ability to replicate RDS across availability zones and across regions, as well as the EFS share and the ECR repository used, whether that is using the replication features of the platform or simply using AWS Backup to back up a copy of your data and then restore it to another region. Spinning up a new EKS cluster in another region can then be done in a matter of minutes. Now, how portable is this? AWS provides a feature for EKS called EKS Anywhere, which allows you to deploy an EKS cluster using any hardware and environment you choose. So if you have a requirement to use hardware on-premise or in a private cloud environment, you have the ability to deploy an EKS cluster, connect it to AWS, and maintain a consistent control plane across your deployments. And any AWS service needed by the cluster can easily be consumed over VPN, allowing you to use the same features available to you as if the cluster were running in AWS.

Finally, we have to review local compliance requirements for security and privacy. There are ever-changing requirements for security, privacy, and various guardrails that would need to be applied to this or many other architectures. Through this review, we were able to apply a common set of controls. As the number or set of controls increases or is otherwise modified, we will be able to implement the additional controls at the perimeter as well as the interior of the workload and its subsequent applications. This is a key benefit of infrastructure as code. Everything in this deployment has been distilled to a template that can be easily redeployed in any AWS availability zone or region. As new requirements are added, we can enable those requirements either by updating the infrastructure as code and redeploying, or by making the change imperatively via the console. You're going to make these changes; time is going to pass. How do we integrate changes into the infrastructure-as-code template as time goes on and maintain things like version control, consistency, or simple auditing? How do we maintain the ability to redeploy? If you're not familiar with updating the infrastructure-as-code template, encounter difficulties doing so, or need some help, there are a few options. There are tools such as Former2 that will allow you to ingest your existing deployment and output a template. That template can easily be redeployed using CloudFormation, or it can be redeployed using Terraform. Comparing this new template to your previous one is as simple as doing a diff in your tool of choice. If you're using Git, you'll be able to see the changes as soon as you commit or push, and revert accordingly. Now, what is Former2? Former2 is an open source and freely available tool allowing you to ingest the components of your existing environment and export a template that you can use to redeploy or modify as needed. What is Former2 not? It is not going to grab the details from your non-AWS environments, it is not a part of your CI/CD pipeline, and it is not going to generate Kubernetes Helm charts for you. It is a tool to help you map your existing architecture and easily redeploy to AWS using CloudFormation, Terraform, or your tool of choice. This is going to enable you to build a practice around capacity building. Sian?

This collaborative exercise led to capacity building and investment in our own informatics infrastructure.
For example, we're streamlining our CI/CD pipeline to de-risk last-mile problems and to increase our ability as a development shop to iterate quickly for real-world deployment in healthcare. I'm sharing two different architectures we're working on: one with GitHub Actions together with infrastructure as code, such as Terraform and CloudFormation; the other using CodePipeline in AWS. If you're a development shop or a bioinformatics lab like us, these items should especially resonate. Our informatics infrastructure reflects the reality of working with multiple partners and environments, as well as security and privacy best practices. I'm highlighting some of the key areas, such as working with multi-cloud environments, partnership, and role separation as part of security best practices. With that, I'd like to close our presentation and open up for questions. The source code is publicly available at the following link. I encourage people to reach out and connect. You can find my contact information on this slide. Thank you.

Okay, I don't think we have time for questions, but I think everyone here can join me in expressing just the amount of respect for you coming here after all of that. Let's give a round of applause. Okay, so let's move on. Luke Sargent is going to be up next; I'm going to make sure you have your time. After two years of using these, you'd think we wouldn't need to check the distance. It seems good. Zoom people, can you see this presentation? Awesome. All right. Luke Sargent is going to talk about scaling Galaxy in the cloud, particularly data-local cancer analytics on AWS.

Don't worry, I happen to be Luke, and that is the title of my presentation. So, our lab does multiplex tissue imaging analysis in Galaxy. What you see on the right is not modern art in technicolor; it's actually bona fide scientific data. If you want more in-depth information on that, I suggest you go to the talk on Wednesday, "A Galaxy platform for multiplex tissue imaging analysis in cancer research and translation," which Cameron is going to give, because that will get into the nitty gritty. I'm going to focus more on the infrastructural gist, which is that we have a set of workflows with these containerized tools that operate on fairly large data and are fairly computationally intensive. So we have some constraints we have to deal with, namely that the key data sets live on AWS in requester-pays buckets, and it's compute- and storage-intensive to analyze them. Accessibility is also important for collaboration with external folks, ideally for free. A naive approach might be to say, okay, what is the beefiest Galaxy instance I can create on a large machine, point Galaxy at the local Docker runner, download what I need, and press go. Except you're sort of constrained by the most demanding tool that you have. So you're going to have this monolith that runs at those high specs, and you're going to be incurring that cost for usage that might be sort of streaky. And what happens if you run multiple invocations of a tool? You end up provisioning for the most that you're willing to support at one time. You can use institutional resources like HPC, and we indeed do that; however, one of the downsides is that it's less accessible to external collaborators.
You also don't have quite as much control over what you're deploying; you have limits you have to work within. It's often cheaper, or free to the lab, to use, but that comes with those downsides and is subject to congestion and what have you. So we decided to go with a data-local approach: to go all in with AWS and compute right next to the buckets of stuff. It's obviously scalable, that whole stable-cloud thing where you can throw as many resources as you want at something, and it's as scalable as you want to make it. It's much more customizable than, say, the restrictions that your local institution might impose on your computational resources, and critically, intra-cloud transfer is free, and that's pretty good. The rest of it's not, of course, and that's a bummer. And there were no job runners available initially for the services that we wanted to target, namely Batch and ECS.

So at a high level, these are the architectural details. I've included a glossary to the right; as much as everyone loves acronyms, I think we get enough of them in our lives, but these are the ones that matter here. We have a head node that runs Galaxy and whatever associated services, the proxy, etc. Key to all of this is the EFS drive, the Elastic File System, which is a network file system that can be shared across various computational resources. We send jobs on demand to either, depending on the job: the Fargate serverless container runtime gets the little baby jobs, and beefier jobs go to EC2, and both use those containerized tools. That's sort of similar to some of the Galaxy Kubernetes setups that some of you may know, which also use a shared network file system. This is a visual representation of the architecture: the cloudy box is AWS at large, and we operate in a single subnet, so we don't replicate across zones, to save on costs. We have our Galaxy instance on the bottom left within EC2, running on a constant, dedicated VM. As I mentioned before, we send jobs to Fargate at the top right, or to another EC2 instance for the larger things, for the containerized tools, and all of this is backed by the EFS drive. Within the cloud we can transfer data around, and of course to the right, represented by the kitty cat, is the greater internet at large. So we can pull data from the internet, and that's free; we can pull data from the buckets, and that's free; and that's about the end of the free part. We submit jobs to Fargate or EC2; as I mentioned, the runners that we created just create a job template and submit it, and that part is free. And backing it all is the EFS drive: we have this EFS drive that is mounted in every place, so changes in one are reproduced in the others. The glass-half-full component of cost: intra-cloud transfer is free, that's good; ingress is free, we like that. You get a few other small freebies, like a free elastic IP for the static address for your domain name, some ephemeral storage if you need it for your Fargate instances, and 100 gigabytes of transfer out, which is not nothing. The takeaway from this, and I'm not going to read a whole bunch of numbers, is that the main components of the cost, probably starting from most egregious to least, are the tool runtime, then the EFS drive, then the head node runtime, and then various other things. For a particular instance that we run on us-west-2 with EFS, the profile is this: you have a constantly running EC2 instance that we might even be over-provisioning and don't fully use.
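As mentioned a moment ago, the runners essentially create a job template and submit it to the target service. A stripped-down boto3 sketch of that submission step against AWS Batch might look like the following; the queue, job definition, and command are all hypothetical placeholders, and the real Galaxy runner does considerably more (staging, state polling, error handling):

```python
import boto3

batch = boto3.client("batch", region_name="us-west-2")

# Submit one containerized tool run to a Batch queue (all identifiers are placeholders).
response = batch.submit_job(
    jobName="galaxy-tool-12345",
    jobQueue="galaxy-tools-queue",        # backed by Fargate or EC2 compute environments
    jobDefinition="galaxy-tool-def:1",    # points at the tool's container image
    containerOverrides={
        "command": ["bash", "-c", "./tool_script.sh"],
        "resourceRequirements": [
            {"type": "VCPU", "value": "4"},
            {"type": "MEMORY", "value": "16384"},  # MiB
        ],
    },
)
print("Submitted Batch job:", response["jobId"])
```

The per-job resource requirements are what make the data-local approach affordable compared to a single over-provisioned monolith; back to the cost discussion.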
Reserved instances are a tweak we will add, as it's a good savings, but right now we just pay the flat rate, because of reasons. Then a little bit more on storage: right now we're sitting at around three terabytes, which is pretty lean for a large server, you understand; as we grow, this number will change, and you can tune it with different policies. Key to this figure, if you're dealing with EFS costs, is the ratio of hot to cold storage: you have cheaper storage that's slower, data moves from one to the other, and the prices on the slide reflect that. We find in practice that most of the data just sort of sits there while the things we actively work on are a smaller fraction, so we end up close to that ratio. Below that, you see that the tool invocation cost can vary quite a lot: it can be cheap for the little baby jobs and quite expensive for the beefier half-terabyte-RAM machines, as you'd expect. To speed this up, we did a rough cost analysis: the dots represent the cost of just the tool runtime on whatever the smallest capable instance is, plus a month of S3 storage, plus EFS over the job runtime, versus the pure egress cost, which is the line. By and large, minus that one outlier, it seems to be more or less worth it. In the future, we will use Spot instances more; that's something we're just starting to roll out, along with dynamic destinations, to make sure that we are more closely tailored to the resource requirements of each individual tool rather than over-provisioning, which the costs so far reflect. Further down the road: data lakes look cool, but that's an area I don't understand super well yet, and integration with external data commons, so that you can utilize your permissions, whatever you have access to. You can look at it by actually going to cancer.usegalaxy.org. Try not to be too malicious if you don't mind, but we'll see. We have a runner that is being worked on to be added to Galaxy proper. It requires a little bit of infrastructure setup, as it won't work purely out of the box, but we have Ansible roles in development for that. These are the people I worked with, and this is where it happened. That's the presentation.

All right, for time reasons we're going to skip Q&A until lunch, so if you have questions, keep them in your head. All right, so moving on. Enis Afgan is going to talk about an open bio reference data catalog for Galaxy.

Good afternoon. Jumping right in. To talk about reference data and anchor everybody in the same place: where does reference data come from? It's something you actually see in Galaxy as a little drop-down on some of the tools, such as mappers, where you have this long list of a few hundred entries. So the question is, where does that data actually reside, where does it come from today? That's on CVMFS, as we refer to it in the Galaxy world, the CernVM File System, which is a read-only (in our case) file system that hosts these couple hundred and fifty references that the Galaxy project provides, only about six terabytes of data. So this has been working great; it works great. But we thought, well, let's look at some alternatives. How can this evolve, given that we've encountered a couple of drawbacks with it? One of them has been this notion that it's centrally managed and transaction based, so every time a change needs to be made there's one person that can do it. He's in this room. And so that made community contributions kind of challenging.
The other thing is that we as a project have to provide the hardware for this locally replicated data and maintain the service. And lastly, for somebody new coming into the Galaxy environment as a sysadmin, CVMFS in particular is not exactly part of the average toolbox. With that in mind, we went to Amazon and looked at the Registry of Open Data that they offer. This is a repository of open data sets from various domains, and the data is hosted by Amazon for free, including the egress. We applied and were accepted into this program, meaning that the Galaxy reference data is now considered one of these high-value data sets by Amazon's standards. What we do with that is take the contents of CVMFS and mirror them into an S3 bucket, namely the open data repository for Galaxy. From now on, anybody can consume it: any Galaxy server can be configured so that everything you get off CVMFS you can alternatively get from the S3 bucket. Like I said, we encountered some of these challenges with the CVMFS-only solution, so this isn't another resource, it's a mirrored resource, with the intent of addressing three different topics that I'm going to go into in turn.

The first one is this notion of repository expansion. We've been talking about community contributions to Galaxy's reference data repository for years, and this is something that hopefully helps: S3 is a common tool in many sysadmins' toolboxes. It lowers the contribution bar, because you put a file into a bucket and it's there, in addition to being completely Galaxy-agnostic. What really also helps is that it opens possibilities where you can consume the public data that the project offers along with additional data that exists in other projects: once Galaxy is able to consume data from a bucket, it can consume data from any number of buckets. This also works for private data that an institution may want to offer. It doesn't have to be only Galaxy's reference data that is housed in this repository. It includes things other than mapper indices, which is what we've done; a lot of it includes annotation files and sources in similar domains, but it can also include data from other projects. Bioconductor, a project that offers packages and data for the R environment, has been joining us in this project, and they have deposited compiled R software and libraries, probably about seven terabytes of data at this point. So from an R session you can install this annotation package, for example, which comes from AWS, even though as a user you don't really see it; you just benefit from some of the scalability that S3 offers, which I'll mention in a bit.

The second thing is increasing the accessibility of the repository and the data that's in it. There's a registered domain you can go to, and it's a client-side browser that allows you to browse the data that's in this repository, which at this point houses both the Galaxy data and the data Bioconductor has provided, so you can see what's there. The last thing is relying on this piece: scalability. We've done some benchmarking just to see how the jobs performed. We set up two Galaxy instances, both on the same cloud platform, one with a CVMFS-based installation and one with an S3-backed installation for the indices. The jobs ran for only about five minutes or so, even though the data is actually hosted in a different region.
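Since the bucket is public open data, it can be read anonymously with standard S3 tooling. A minimal sketch of browsing and fetching a file (not from the talk; the bucket name and key layout below are invented placeholders, not the project's actual structure):

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous (unsigned) access to a public open-data bucket; bucket and key names are placeholders.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

BUCKET = "example-galaxy-reference-data"
PREFIX = "genomes/hg38/bwa_index/"

# List part of a prefix to see what is available, then download a single file.
listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
for obj in listing.get("Contents", [])[:5]:
    print(obj["Key"], obj["Size"])

s3.download_file(BUCKET, PREFIX + "hg38.fa.amb", "hg38.fa.amb")
```

The same anonymous-read pattern is what lets a Galaxy server, an R session, or any downstream application consume the mirrored reference data without special infrastructure. Back to the benchmark.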
There was no discernible penalty in fetching data from S3 as opposed to from a CVMFS replica close by, which means there's a lot of bandwidth available, we have no storage infrastructure to maintain as a project, and in principle we can also grow this. In summary, I guess, the idea behind this project is to explore an evolution of how Galaxy's reference data can be provided to the broader community. Galaxy and Bioconductor in this case provide some data, and the same principles can be used to house, offer, and consume data from other buckets. These are available today on S3: you can hook the bucket up to a Galaxy instance, you can also use common commands like wget or curl to fetch individual files, and there are downstream applications, so you can do this from RStudio. And in the future, maybe other APIs for the workflow engines. With that, I'd like to thank the people I worked with closely on this project and the organizations that made it possible. Thank you.

Awesome. Okay, next up: Defying Gravity, painless process management for Galaxy servers, by Nate Coraor.

So, a real quick history lesson on how you run a Galaxy server. Fifteen years ago, it looked like this: you would run a number of processes under Paste, which is a Python-based application server. To load balance and run more processes, you would just run more of these commands, which was not the easiest thing to do, but at least you just run this one server lots of times with different server names and you get lots of servers. So we made the run script that you're probably familiar with for running Galaxy. But there were lots of limitations there. So, in the mid-2010s, we started to move towards uWSGI, which, if you've run Galaxy up until 22.05, is the server that has been running your Galaxy system. The nice thing that we got with uWSGI, well, lots of things, but one of the nice things was that it essentially acted as our process manager. For web processes, it would fork off a number of additional workers and route requests to them itself, internally. And then we used this feature in uWSGI called mules to run the processes that handle your jobs in Galaxy. So this is one entry point: instead of having to run these multiple Paste commands as before, now you just run uWSGI and it fires up all the processes that are here. But okay, now it's 2022. We want more features than what we can get with uWSGI: newer web technologies, asynchronous requests, uploads, the proxy for Galaxy Interactive Tools, and so forth. Plus, there were some feature conflicts: you can do zero-downtime restarts with uWSGI so that your users don't notice when you restart the Galaxy server, and we also had this ability to start the job handlers from uWSGI, but you couldn't use both of those at the same time. So now, in Galaxy 22.05, you've got all these different processes: uWSGI is gone, replaced by Gunicorn; certain tasks run with Celery; your job handlers now run as their own independent processes; and so on. We've got tusd for uploads and the node proxy for Interactive Tools. That's a lot to manage, and we lost the sort of free process management that we had with uWSGI. So, what are we going to do? We now have Gravity. This is essentially a little wrapper around supervisor: in addition to controlling supervisord, it writes and manages the config files for supervisor for you.
It's also a project that I started working on in 2015 and poked at every year or so as my sort of solution of choice for my own work, but now it's pretty much done, very close to done, and it is included with Galaxy 22.05. So when you run run.sh, run.sh uses Gravity to start Galaxy. Why did we pick supervisord? If you are a Galaxy admin, you are probably familiar with it already; many, many Galaxy admins have been using supervisor for a long time on their Galaxy servers. So how do you start Galaxy using Gravity? You can of course still use run.sh; this is like the development way to start a Galaxy server. But now, when you install Galaxy's virtual environment, you also get this command, galaxy. Just like run.sh, it starts up and, as you can see, it starts all the necessary processes: you can see Gunicorn, Celery, and Celery Beat. How do you run it in the background as a daemon? The galaxy command is just sort of a front end running a foreground Galaxy process, but there's this bigger utility called galaxyctl that will be familiar to people who have used supervisor in the past. You can see that there's galaxyctl start; this starts the processes and then detaches. It tells you where the log files are, so you can go look at all of the logs of the processes as they run. There's a status command, passed straight through to supervisor, which as you can see tells you what the processes are. And then there's a stop command to shut it all down. One thing I didn't put on this slide: you can see here we have three processes, just Gunicorn, Celery, and Celery Beat, but on a production Galaxy server you run separate job handlers as separate processes, and to do that there's a new gravity section of galaxy.yml. It's documented in the sample config; check it out. You can configure handlers to run there, and now all you do is run galaxyctl update, and you can see it starts up and runs those processes. You can now also do zero-downtime restarts, just like you could with uWSGI's Zerg mode. This is thanks to a separate piece of software, unicornherder, that herds Gunicorn processes, but it is integrated with Gravity. You can change your application server from gunicorn to unicornherder, run galaxyctl update, and then everything just happens: now you've got a unicornherder process running, and you can run galaxyctl graceful. That, as you can see, signals unicornherder to restart. It only fully restarts the processes where a restart doesn't matter, like Celery, so those are restarted, but unicornherder stays running; it just gets a signal. There's still some stuff we need to do: some enhancements for people who are using systemd. Tomorrow, come check it out; I'll show you everything about how it works. Thanks.

All right, moving on. Thank you. So we're going to talk about the Total Perspective Vortex, which we like to call TPV for short, I think because we might run out of time today. It's basically a dynamic job router for Galaxy, for routing jobs to the appropriate destinations with appropriate settings. I'm going to give you an overview of why we wanted it, and then Catherine will follow up with some real-world examples of how it works in practice. The basic problem is routing Galaxy jobs to destinations: you might want to assign things in ways that are appropriate for your environment. So, for example, you might want to assign a specific number of cores or amount of memory for a job.
You might want to route certain assembly jobs to a high-memory node, and maybe set specific environment variables. And quite often we find ourselves repeatedly configuring these things across a large number of environments. There's been prior work to handle this. We're already familiar with the dynamic job rules that Galaxy supports, which are simple Python functions that you can specify, and those functions can make these kinds of decisions. Of course, some of that is going to be hard to do by hand. Galaxy also has support for something called dynamic tool destinations, which is a variant of a dynamic job rule that you configure using a YAML file. There are variants of this: usegalaxy.eu uses something called the Sorting Hat, and usegalaxy.org has its job router, but they're essentially variants of the same idea of using a YAML file to route jobs. With Galaxy Australia, we wanted to do something similar, adapted to our environment as well, and we realized that we needed a more general-purpose system that could be generalized and used by others, and that could potentially accommodate a large number of Pulsar endpoints, as you know. We also wanted to be able to do a kind of meta-scheduling, so that all these Pulsars would be utilized more effectively, and we wanted to reduce the amount of repetitive work we were committed to. Basically, as administrators of Galaxy instances, we wanted to see whether we could combine efforts. Those were kind of the overarching goals of TPV. Basically, TPV is itself a dynamic job rule, and it's an extensible mechanism for routing jobs. It comes as an installable package, included with Galaxy 22.05, or it can be installed into older versions, and it is configured in a similar way through your job configuration.

So let's look at some of the capabilities that TPV provides. We're going to look at five basic capabilities today, but there's documentation on all of its features in the docs. The first thing I'm going to look at is how you specify basic resource requirements for a tool. Again, it's configured through a YAML file, and here we see a simple tool definition for a Bowtie tool, where you specify the number of cores and the memory that should be assigned to that job. You can also list the destinations available in your environment in the same YAML file. Straight off the bat, you might notice that while cores is assigned a number, memory is an expression of cores. It's actually a Python expression, and in fact all TPV fields are Python expressions: they're evaluated at run time, and you can have complex multi-line Python code if you so wish, but quite often they're just simple expressions. The destinations are whatever is available in your environment, and TPV will automatically find the best matching or fitting destination for your job. So that's the basic thing. Next, what if you have more complex routing requirements, like you want to send high-memory jobs to a high-memory node, and so on? The way TPV allows this is through tags: it matches the tags on your tools and your destinations, and you can express preference for or aversion to particular destinations through this tagging system. There's more in the documentation, but in this example we express a preference for high-memory nodes and we reject offline nodes.
In this particular case there's only one destination that matches, and that's the Slurm destination down below. Using this type of system, we can also do meta-scheduling. Again, when you have a whole network of Pulsar servers, we can balance the jobs across multiple servers: TPV can basically query the status of each server and allow you to rank destinations, so the least-loaded destination gets those jobs. Again, it's an extensible rank function to which you can add your own code. The next thing we're going to look at is how you conditionally adjust job requirements based on input parameters. Here, TPV provides rules with an if condition, so you can conditionally make decisions and adjust resources accordingly. You'll notice that it's a simple Python expression again: if the input size is greater than 10, adjust the cores to 8 and the memory to 32. You can use these embedded Python expressions to adjust settings conditionally, or you can also fail the job with a user-friendly message. Next, we're going to look at how you reduce the repetitiveness of configuration. Here we have one tool, and that tool is inherited by another. It's a very simple mechanism: you can just inherit from a tool and override settings. And the nice thing about this is that when you override it, you see that cores is 16, but mem, which it inherits, is actually an expression. That's evaluated in a delayed fashion, so you get 16 times 4 for memory. Okay, the last thing I'm going to look at is how you further reduce repetition. For that, TPV supports loading rules from remote HTTP URLs. So you could have a whole bunch of rules, and you can load multiple rule files; all you do is specify them in the job configuration. The advantage here is that we could potentially centralize all these rules and share them amongst the community, so that we don't all have to discover, okay, how many cores should be used for a HISAT job or how much memory should be assigned to it. We can contribute our experiences back to a shared database. With that, I'm going to hand over to Catherine, who will talk about TPV applied in the real world.

So Simon and Dylan started developing the Total Perspective Vortex mid last year, and we deployed it on Galaxy Australia in November. The biggest change is that instead of mapping a tool to an individual destination, we are mapping a tool to a set of destination characteristics, and for each tool we now have more destinations where the tool is allowed to run. This is a much better utilization of our resources; it's led to lower queue times and far less admin overhead. This is a graph of the load on two of our compute nodes over 24-hour periods, one week apart. The first one is before TPV was deployed and the second one is after. Because TPV is using feedback from the load on the Pulsars, in the second graph the load tracks together fairly well: TPV preferentially schedules jobs to nodes that are more available. For the configuration, we have all of our tools in a YAML file, starting with the default tool. The default tool is what we've defined for all of the other tools to inherit their values from. The default tool runs with one core; its mem is an expression, four times the number of cores, in gigabytes.
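Pulling the capabilities just described together (resource expressions, tags, and inheritance), here is a small self-contained sketch; it is illustrative only, the field names only approximate TPV's YAML schema, the tool IDs are made up, and the expression evaluation is deliberately simplified compared to the real package:

```python
import yaml  # PyYAML

# Illustrative TPV-style configuration: a default entry and a tool that inherits from it.
CONFIG = yaml.safe_load("""
tools:
  default:
    cores: 1
    mem: cores * 4            # expression, evaluated at scheduling time
    scheduling:
      reject: [offline]
  toolshed.example.org/repos/devteam/bowtie2/.*:
    inherits: default
    cores: 16                 # override; mem stays "cores * 4" -> 64 GB
    scheduling:
      prefer: [highmem]
""")

def resolve(tool_id: str) -> dict:
    """Merge a tool entry onto the entry it inherits from, then evaluate expressions."""
    entry = dict(CONFIG["tools"].get(tool_id, {}))
    merged = dict(CONFIG["tools"].get(entry.get("inherits", ""), {}))
    merged.update(entry)  # shallow merge: child values win
    cores = merged.get("cores", 1)
    mem = merged.get("mem", 0)
    if isinstance(mem, str):  # delayed evaluation of the Python expression
        mem = eval(mem, {"__builtins__": {}}, {"cores": cores})
    return {"cores": cores, "mem_gb": mem, "scheduling": merged.get("scheduling", {})}

print(resolve("toolshed.example.org/repos/devteam/bowtie2/.*"))
# -> {'cores': 16, 'mem_gb': 64, 'scheduling': {'prefer': ['highmem']}}
```

The point of the sketch is the delayed evaluation: because mem is an expression of cores, overriding cores in the child entry is enough to get the right memory, which is what keeps the real-world configuration below so compact.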
Because all of our jobs are running on Slurm, we've defined native specification parameters, substituting in the values for cores, mem, and the Slurm partition, which we've given defaults in the default tool. These might be overridden by individual tools. The only scheduling rule for the default tool is that it can't run on any destination that's marked as offline. To set a destination to offline: in this configuration we have Pulsar and Pulsar B, and Pulsar B has been set to offline by just adding the offline tag. The configuration changes take effect immediately. If you need to take a node down for maintenance, you can just set it to offline, and jobs will no longer be scheduled there. Another tool example is Picard. We've defined an entry that matches all of the tool wrappers in the Picard repository, so that's about 30 wrappers. In this case, we don't need to explicitly set the mem value because that's inherited from the default tool, so the mem is just the number of cores times four. We can also put an environment variable in here that uses the mem value. Another example is AlphaFold. In this case, we've explicitly set the mem to be 160. This also uses the GPU. It's running in a Docker container, and we can set the Docker parameters for the tool. It also has a funny scheduling rule, because we only allow particular people to use it. So the scheduling rule in this case is checking for non-membership of the AlphaFold group; if this evaluates to true, it fails with a user-friendly custom message. Sometimes we might want to give tools particular resources based on what the user has put in the tool form. For example, for a k-mer abundance distribution tool, the user has selected large animal genome as the sample type, and in this case we know that they need far more memory. This emphasizes the importance of having a shared resource of scheduling needs for tools, and there's a Birds of a Feather session tomorrow afternoon on building a global database of tool resource requirements. And thank you to everybody who contributed to Galaxy Australia. I know we're skipping questions, but ask us anything, anytime, about this.

Sorry, I kind of messed up your time. All right, we've got one more talk before lunch. I'm going to introduce Reid Wagner. For some reason, the name is very popular. Okay, I'll take us home.

Before I start, I just want to give a special acknowledgement to JJ, because he's actually done the bulk of the implementation, and he said he'd buy me two beers if I presented. So I'm presenting on it. I'll be talking about our implementation of a wrapper for FragPipe. FragPipe is an orchestrated software suite for mass spec-based proteomics analysis. It incorporates a whole host of software, as you can see, but the main tool powering it is MSFragger, which is a proteomics search tool and was actually, I think, presented on in depth at last year's GCC. It's maintained by Alexey's lab at the University of Michigan, and in particular Dr. Chow, who is attending GCC, will give a more in-depth talk on FragPipe tomorrow. FragPipe was originally a Java GUI-based application, but since last year a headless mode was added so that it can be used in Galaxy and similar sorts of use cases. I'm not going to focus too much on the technical details of FragPipe or the science, and more so just on our own implementation in Galaxy. So this is FragPipe; this is what it looks like.
I show it to point out that it's highly configurable. For us, what that means is that, because headless FragPipe works mostly the same as the original application, our work is primarily in the tool XML file, basically recreating this GUI in Galaxy. And because it's so configurable, it's given us some interesting challenges to recreate. This is what it looks like now; our work is basically converting from this to this. One feature I want to point out that was interesting is one of the first inputs, which is the LC-MS files. You can see we've got four of these, and in some FragPipe analyses it's important to be able to group these files by the experiment that they came from. In the Java interface, there are two different ways to do that: you can click into these columns and manually set it, or there are a few options to automatically set it. The question for us was how do we recreate this in Galaxy, because there isn't an exact equivalent way of assigning this metadata or this grouping to input files. This is what we came up with in Galaxy. You can see there are obviously these two groups, there's an experiment column there, and what we've done is basically create a separate input section for each of these groups. So instead of one list where you assign this metadata, we create a new entry for each grouping. The way we do this: this is a somewhat common sort of Galaxy interface for an input, but we're able to create an arbitrary number of groups with this interesting tag called the repeat tag. What the repeat tag allows you to do is basically take any parameter XML that you have, and Galaxy will automatically generate this button so the user can simply add an arbitrary number of repetitions of that input. In this case we had two groups, but we could add more. You can see how it could be useful for all sorts of use cases. Moving on: FragPipe uses a configuration file, which is not an uncommon way to configure an application that has so many options, as opposed to command-line options or something like that. A user might ask, well, how can we do this equivalently in Galaxy, and also how can we incorporate default settings, which are a feature of FragPipe? The configfile tag is really useful for this. What this does is basically allow you to write a block of templated code, and that template, once evaluated, is written directly to a file, which can then be referenced when you're invoking the application. Here on the left is just a very abbreviated snippet of our configfile definition; there's kind of a lot there, but it boils down to this in the end. On the right is a little abbreviated part of our command element where we reference the config file, that workflow config file. The other thing is that FragPipe comes with a set of predefined workflows for different types of searches: open, closed, nonspecific, HLA, etc. We wanted to be able to hit the ground running and run these without recreating all of them, and we also don't necessarily need the user to input or go over every single parameter.
So what we were able to do is this: the workflow file can be input from the user's history, or we have defaults stored on the file system that are provided by FragPipe, and in this code block you can see that, depending on which one is chosen, we read that in. Then we basically create a dictionary of all the options, and in the rest of the tool XML file, when we're setting parameters, we use that same dictionary and overwrite those values. Then we have a loop, and we write out every single key-value pair. So that's how we tackled that problem. We've got a lot of work still to do on this, and we will be working on it a lot at the CoFest, so anyone who's interested, please definitely stop by; you can ask questions and help us with ideas. We've also implemented Bioconda recipes for it, so those are out there as well if you want to check them out. Thanks.

I think it would be a good idea if the speakers for this session could each take a table and field questions in one location. So sit at different tables, and if you're looking for Q&A, find the table with the speaker you're interested in. Let's give one more round of applause for all the speakers here. Go eat.