Hi, I'm Dannon Baker. I work at Johns Hopkins as a member of the Galaxy team, and I contribute to the back-end working group that I'm representing here. This is a short talk in our series covering recent efforts we've made towards modernizing the Galaxy back end. I'll cover decomposing some of the work that the Galaxy server does into smaller units for asynchronous consumption by a task queue. We're doing this both to improve the end-user experience and to increase the scalability and robustness of the platform.

Okay, let's start with a good example of one of the primary problems we're working to solve. Galaxy has traditionally performed many, maybe even most, activities other than actually running jobs within the synchronous request-response web or API transaction. To make this a little more concrete, consider the diagram on the right. When a user in the Galaxy web client requests the purging of a dataset, which is actually deleting it off the disk, the server receives the request and then does the work, all of it, including the disk IO and updating the database. Then it responds with a confirmation that the request succeeded, and that's what you see in the Galaxy client.

This has worked pretty well until recently, but with advances in the number of datasets Galaxy can process, it can now be a serious bottleneck for any functionality that performs file IO or expensive database queries. Think about the dataset purge task again, but considering the tens of thousands of datasets you can have in a collection from a modern analysis pipeline: the synchronous response when you click to purge a collection can take minutes or more. I probably don't have to tell you that clicking on something in the interface and just waiting tens of seconds, or even minutes, without any feedback is pretty bad. Even worse, for very long requests the Galaxy proxy server can think Galaxy is dead and drop the connection, returning an error to the client even when it's
actually still working on the back end.

Okay, so what can we actually do about this? The gist of it is that we need to refactor Galaxy to encapsulate these expensive units of work as tasks that can be run asynchronously whenever appropriate. In this diagram I've added two more components: a queue and a worker process. The client will now make the same dataset purge request to the server, but instead of actually doing any work, the server just puts a task on the queue with the right parameters and then tells the client, "Yeah, got it, it'll get taken care of." The worker is watching the same queue, asking if there are any tasks it can handle. When one comes in, it does the work, and the task is marked as completed, however long it takes, without blocking the user.

If you're familiar with the Galaxy code base, you might be thinking that we already have a queue and a queue worker. We do, kind of. These older modules use the kombu library for messaging, but those tasks are handled as part of the existing Galaxy process, just in a separate thread. It turns out that's not as flexible as what we need, and what we implemented there ended up being mostly useful for inter-process communication. Celery is a library that sits on top of the same kombu messaging layer, providing some very nice abstractions for formalizing chunks of code as tasks, distributing them, and making sure they get taken care of.

Okay, let's take a quick look at the code for this, focusing on the purge dataset task again. The left side is the original code for purging a dataset, taken from the HDA manager. It gets a handle to the user, figures out how much this dataset purge should affect their quota, and synchronously calls the method to actually delete the dataset from disk and flush the results to the database. The code on the right is kind of what it looks like after refactoring.
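The request/queue/worker flow described above can be sketched with nothing but the standard library. This is an illustrative stand-in, not Galaxy code: the handler, worker loop, and task names are all invented for the sketch, and a thread plays the role of the separate worker process.

```python
import queue
import threading

task_queue = queue.Queue()
completed = []  # records work the worker has finished

def handle_purge_request(dataset_id):
    """Web handler: enqueue the task and respond immediately."""
    task_queue.put({"task": "purge_dataset", "dataset_id": dataset_id})
    return {"accepted": True}  # the client gets an instant "got it"

def worker_loop():
    """Worker: watch the queue and do the real work, off the request path."""
    while True:
        task = task_queue.get()
        if task is None:  # sentinel so the demo worker can shut down
            break
        # ... disk IO and database updates would happen here ...
        completed.append(task)

worker = threading.Thread(target=worker_loop)
worker.start()

response = handle_purge_request(42)  # returns without waiting for the purge
task_queue.put(None)                 # stop the demo worker
worker.join()
print(response, completed)
```

However long the purge itself takes, the handler's response time stays constant; that is the whole point of the refactor.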
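Since the slide itself isn't reproduced here, the following is a minimal, self-contained sketch of the refactored shape using Celery. The names `celery_app`, `purge_hda`, and the `_purge` stand-in are illustrative rather than Galaxy's actual code, and Galaxy's real wrapper also carries the dependency-injection decorator discussed later in the talk, which is omitted here. Setting `task_always_eager` makes `.delay()` run in-process so the sketch works without a message broker.

```python
from celery import Celery

celery_app = Celery("galaxy_sketch")
# No broker for this demo: run tasks inline when .delay() is called.
celery_app.conf.update(task_always_eager=True)

purged = []  # stand-in for "dataset removed from disk, DB updated"

def _purge(hda_id):
    """Stand-in for the original synchronous purge logic."""
    purged.append(hda_id)

@celery_app.task(ignore_result=True)
def purge_hda(hda_id):
    # The task wrapper just calls the same code the synchronous path used.
    _purge(hda_id)

# Dispatch: with a real broker this returns immediately and a separate
# Celery worker picks the task up; eager mode runs it inline instead.
purge_hda.delay(7)
```

In a real deployment the broker and workers are configured separately, and the dispatching code never waits on the result.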
That's the purge method now: it checks to see if Celery tasks are enabled, and then just dispatches a task by calling `purge_hda.delay`. The `purge_hda` method is below, and that's the Celery task wrapper itself. It hopefully demonstrates how easy some of this refactoring can be: the wrapper is just a function that calls the exact same code you see on the left, just now as `hda_manager._purge`, with the right parameters.

Looking at the `purge_hda` task again, you probably noticed the two decorators and the additional typing in the function signature. The first decorator, `celery_app.task`, just defines the function as a Celery task and says that we don't need a return value from it; pretty standard Celery stuff. The second decorator is where some magic happens. That's a custom decorator we've added that uses dependency injection to bind an appropriate context to the method from wherever it gets called, in this case from the Celery worker. I just wanted to point it out here because this is really cool new stuff in Galaxy, and if you'd like to learn more, there's a much deeper dive in the architecture content.

So, wrapping up: you can try this right now. You'll need to enable Celery tasks in your Galaxy config, and then you can start Galaxy up using Circus to handle the Galaxy and Celery process management for you. Circus is the other new bit of the stack that we don't have enough time to go into, but in short, it's a nice Python process and socket manager, kind of a simpler Supervisor, that allows us to manage the Celery worker processes and maybe more in the future. The issue linked here is where we're collecting tasks that should be offloaded to Celery, if you'd like to take a look to help out or recommend particular pain points you might have noticed.

Lastly, some specific feature work that we expect to be pretty impactful is to rewrite job handling and workflow scheduling as tasks. Right now both of these processes are fairly rigid, but by decomposing this work into tasks,
we can have much more flexible deployments. As an example, in a Kubernetes-based Galaxy you could have robust horizontal scalability pretty easily by linking the number of Celery worker pods to the size of the task queue. Your deployment would then autoscale up to meet an influx of requests or workflow submissions, and then back down as the queue goes idle, all while leaving the now-thinner Galaxy core web app running at high performance. Thank you.