Hi everyone. My name is Flavio. I'm a senior staff engineer at Dyno Therapeutics, and I'm very excited to be here with you today to talk about how we're scaling gene therapy research and development with Argo Workflows and Hera at Dyno Therapeutics. This is not going to be the canonical deep-dive tech talk into a piece of technology such as Argo Workflows or Kubernetes; rather, it will be an illustration of some applications of Argo Workflows specifically to the field of gene therapy, across a diverse set of use cases that will hopefully help your organization as well. Often talks open with an introduction and end, optionally, with acknowledgments, but it's my talk, so I'm going to start with the acknowledgments. Then I'll offer a brief introduction to gene therapy and illustrate how we're using Argo Workflows and Hera at Dyno Therapeutics.

We started using Argo Workflows about two years ago, and throughout this time I've met some absolutely astonishing individuals in the community. I'm extremely grateful for all the interactions I've had with you folks, and for all the contributions everyone makes that indirectly help the field of gene therapy and patients globally.

With that, let's talk a bit about gene therapy. Gene therapy, broadly, is a medical modality meant to repair something that's broken in cells or add something that's missing. It is universally accepted in the scientific community that all organisms, including humans, are made of cells, and as you can see in this diagram, cells are incredibly complex environments. These cells were stained so they fluoresced under a microscope, and all those tiny dots are components that contribute to the chemical reactions that help us function properly.
If some of those components are missing or broken in some way, you get all sorts of diseases. But delivering one of these genetic therapies to the right location is very, very hard, because these locations are, for example, the back of the eye, the brain, or the liver, and it's just very hard to get there without encountering a lot of obstacles along the way. This is what we do at Dyno Therapeutics: we're interested in solving this giant delivery problem. We take naturally occurring molecules called adeno-associated viruses, or AAVs for short, and we apply modifications to their sequences to make them better suited for targeting specific tissues or specific cells. We can do this thanks to advances in sequencing from the past 20 or so years.

Here I have a diagram and a seemingly random string of letters that actually codes for that structure; this is the known structure of such a virus as it's found in nature. At Dyno Therapeutics we apply modifications to the sequence. Take the part I bolded, MGND: you can imagine changing that to something like CDNS to modify a specific part of the structure so it works in a different way, hopefully making it better at targeting, say, the back of the eye. Now, these sequences are generally about 750 characters long, and at each position the alphabet size is 20, so you have 20 to the power of 750 possibilities, which is an immense number of strings to work with. Thanks to biology we don't have to explore all of that, but we can change these letters, or at least a subset of them, through software. We do this via a proprietary set of algorithms that run primarily on Argo Workflows. At Dyno, Argo Workflows was the workflow engine of choice for three reasons.
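As a back-of-the-envelope check on that number (assuming, as stated, a 750-position sequence and a 20-letter amino acid alphabet):

```python
import math

ALPHABET_SIZE = 20     # amino acids available at each position
SEQUENCE_LENGTH = 750  # approximate capsid sequence length

# Total number of possible sequences: 20 ** 750.
total = ALPHABET_SIZE ** SEQUENCE_LENGTH

# Express it as a power of ten for intuition.
exponent = SEQUENCE_LENGTH * math.log10(ALPHABET_SIZE)
print(f"20^750 ~= 10^{exponent:.0f}")  # roughly 10^976
```

That is a number with 976 digits, which is why exhaustive search is off the table and guided design is needed.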
First, we get immense flexibility from Argo Workflows. We focus on flexibility because we've noticed increased uptake of tools such as Docker containers in the scientific community, which allows us to take one such container and run it in our Argo workflows to solve a problem that Dyno has. Second, through the examples I will showcase in a few minutes, you'll see a lot of fan-out and fan-in, and to my knowledge Argo Workflows is one of the few workflow engines that's very good at helping you orchestrate at that capacity. And lastly, the ability to integrate with other CNCF products: when you adopt Kubernetes and Argo Workflows, you're not just taking those two products, you're opening up the opportunity to integrate with the whole constellation of products in the CNCF ecosystem, which is really empowering for organizations.

At Dyno, I will show you that Argo Workflows is the engine of sequence design: taking sequences and modifying them to generate more of these variants from the universe of available sequences. We also do biological data processing through Argo Workflows, taking data from the lab, processing it in specific ways, and storing it through data ingestion mechanisms we have constructed.

So, sequence design. The main problem of sequence design is to pick a starting point, one of those naturally occurring viruses as I mentioned, propose some changes to it via whatever approaches, and then choose the best among them. Here I have two example workflows.
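Before looking at them, the general fan-out/fan-in shape these workflows follow can be sketched in plain Argo Workflows YAML; the template names and the split/process/aggregate steps here are illustrative, not Dyno's actual pipeline:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: fan-out-fan-in-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: split
            template: split            # emits a JSON list on stdout
          - name: process
            template: process          # fan-out: one pod per item
            dependencies: [split]
            withParam: "{{tasks.split.outputs.result}}"
            arguments:
              parameters:
                - name: item
                  value: "{{item}}"
          - name: aggregate            # fan-in: single aggregation pod
            template: aggregate
            dependencies: [process]
    # leaf templates (split, process, aggregate) omitted for brevity
```

The `withParam` fan-out creates one `process` pod per item emitted by `split`, and the single `aggregate` task fans everything back in.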
The one on the left has about 500 pods running at the same time, and those pods are all proposing modifications to those sequences, exploring this space of available AAV sequences and then choosing the best. We actually have about 10 to 15 of those 500-pod workflows running at the same time in a sequence design session, and the reason we chunk them is size limitations that I will cover in a minute. The one on the right illustrates this nice fan-in we have constructed, which takes the output of all of these roughly 500-pod workflows and aggregates it into a final file containing the sequences that are actually manufactured in the lab.

We also train machine learning models for sequence design and selection purposes, and we have a variety of models that are trained on Argo Workflows in a highly parallel manner. This workflow is again quite simple; we have 10 to 15 or so workflows running at a time for a model training session, and these models are generally trained on four or so GPUs per Kubernetes node. I want to emphasize that a lot of our workflows are based around script templates. Script templates are incredibly important at Dyno Therapeutics because they offer us a giant opportunity for innovation: you can change a script template really quickly. You don't have to rely on your integration system to create a template and then use it in your workflow; instead, you can rely on a script template, modify the source code however you wish for your specific experiment or design session, and just ship it. This is important for Dyno not only because of fast iteration, but because we're a research organization: we don't have external-facing APIs, we have a lot of batch workflows and a lot of operational work that is executed on Argo Workflows. Now, I mentioned that we parallelize these
workflows quite a lot, and the reason we do so is limitations with gRPC, etcd, and Kubernetes annotations. I don't know exactly whether the size limitations are the same across the three, or perfectly correlated, but between them, and primarily etcd and Kubernetes annotations, we encounter a lot of problems, because we pass giant config files via parameters. That's important for innovation; I know we could use artifacts, but the flexibility of parameters really matters to us, and we would rather chunk the workflow up than introduce the complexity associated with artifacts.

Of course, once you have a bunch of sequences, things have to become a reality at some point: you have to take all of those sequences, actually manufacture them in the lab, and test them in cells and animal models, which involves a lot of experimentation. Then we have to evaluate the tissues of those animal models and again perform genetic sequencing on them, to understand how the variants we have tested actually performed. This also looks like a nice workflow; maybe there's a future in which it is orchestrated on Argo as well, but we're not there yet.

Now, working with sequencing data is very challenging. Sequencers are huge machines, probably as tall as I am, they're incredibly complex, and their output is a series of highly specialized files that span sizes from 50 gigabytes to terabytes. So you need a lot of specialized software to parse them, and this is where we see a clear impact of Argo Workflows at Dyno, because it allowed the team responsible for this parsing software to implement some quite extraordinary workflows, which are on the next slide, and you'll see
hopefully how important fan-out and fan-in are through those workflows. Through this example I would like to illustrate the scale of these workflows: they're so large that I had to chunk them up, because they didn't fit on my screen. Reading from left to right, you see that we're processing about a hundred or so complex branches, such as the one in the middle and on the right, in a parallel manner. All of those 100 branches are actually fanning out further, into 90 or so pods in the middle there, and ultimately everything is aggregated into a final output on the right-hand side. With this one I would like to further showcase the complexity associated with some of these workflows. Because these files are highly formatted text, you can process them in a highly parallel manner, so we often have workflows with this very beautiful fan-out, fan-in, fan-out, fan-in pattern, ultimately culminating in a single pod that creates the final human-interpretable output; there's a lot of processing that happens on the way to that.

Once we've parsed everything out, we have computational biology workflows that run to aggregate all of that parsed data and construct performance representations of these viruses that were tested in experiments. Here we again see high parallelism, 150 or so pods, but we also see that we're using things such as exit templates for error handling. For instance, here we check the data, we enrich and insert chunks of data, we put them in a storage solution, we record some metadata, and we test the queries; but if any of these steps fails, we roll the whole thing back and all of that data gets deleted, because it's more important for Dyno to have correct data than partially correct data. Now, if you're the type of organization that has a similar operational
workflow, in the sense that you start with something, feed it through multiple steps, and then use the final output as your next input to keep improving, Argo is actually very empowering for that type of pattern. In Dyno's case, we start with some data collection, we build some models, then we design some sequences. Once we design those sequences we have to select among them, then we perform some experiments, and then we parse the data to construct these performance evaluations. But that means we now have more data, which means we can build better models, design better sequences, and have more selection opportunities, and we keep going around this loop until we meet whatever goals we have set for a specific research program. So if you have similar workflows, not necessarily in gene therapy, you can take advantage of Argo Workflows to model this specific business domain.

When we adopted Argo Workflows, though, it wasn't that easy to use. A lot of my colleagues don't come from an engineering background; they're primarily scientists who focus exclusively on solving a specific problem, and they don't necessarily care about the workflow engine, they just need to get the job done. So it was actually very hard to work with the YAML at the beginning, and with the other SDKs that we tried. That's why we created Hera, a Python SDK that's released under argoproj-labs. The reason we created it is that we wanted to take advantage of the Python ecosystem; at Dyno we primarily use Python for all of the data science libraries it supports, and that gives us the opportunity to have code over YAML. Code is much easier to check, much easier to interpret, much easier to share and understand, and also much easier to talk about conceptually. At the same time, while Hera is a Python SDK, it still shows its love for YAML
because Argo Workflows still ultimately relies on YAML files. So if you have GitOps patterns that rely on YAML files to define templates, for example, you can still use Hera specifically for that. The biggest win we have noticed with Hera is that it has created a more DIY, do-it-yourself environment, in which people create their own workflows, rather than a do-it-with-me or do-it-for-me environment, which is clearly not sustainable. I like to describe Hera as an SDK that offers complex simplicity, in the sense that it's simple to use but at the same time gives you access to the full feature set of Argo Workflows.

I have a very short example, the canonical hello-world example that we see in multiple Argo Workflows tutorials, where we create this nice diamond of four tasks, A, B, C, and D. Here I import DAG, Workflow, and script; I decorate my function; and notice that I still get the flexibility to orchestrate the workflow myself. I'm not forced into a linear program by decorating my functions; I simply decorate them, perhaps with some parameters for the script templates, and then I call them with the familiar arguments or named parameters that come from Argo Workflows tasks or steps.

Overall, we've seen great success at Dyno with applying Argo Workflows to the field of gene therapy. It has allowed us to scale our research and development efforts to degrees we may not have reached with other workflow engines, and I want to emphasize how important the ability to integrate with other CNCF products is for Dyno: we don't have a large engineering group, so it's very important for us to take advantage of all the open source offerings out there and use everyone's contributions to solve problems in the field of gene therapy. Now, of course, there are still loads
of problems to solve, not only in Argo Workflows but in the community broadly, and I'm very sure that as a community we're going to tackle them all together. That is all I had for you today. If you have any questions, I'm happy to address them; I will be around for the week, so I'm happy to chat about Kubernetes, gene therapy, or anything else for that matter. Any questions?

Audience: I actually do have a question. You're using the Argo Workflows UI at a large scale, so have you ever tried to give feedback to the community on how we can improve the UI?

Flavio: The question is whether we have given feedback to the community with respect to the UI. We did notice some pain points with the UI, especially at scale. We haven't had the opportunity to discuss improvements with anyone yet, but one thing we are excited about is plugins, because we might be able to construct our own plugins to display our own data in Argo's UI.

Audience: Okay, please do supply us with feedback. We have people we can push to start looking at it and improving the Workflows UI.

Flavio: Awesome, thank you.

Audience: Does the part of the team that does the actual research on gene therapy leverage the UI, or the server, or the information coming out of the server? How do you serve that to them?

Flavio: We take advantage of logs a lot, because people are very familiar with, for example, the Python logging package, so they rely a lot on that. Other than that, I believe we're very comfortable pushing files to something like GCS, so we're not really using artifacts and serving them in specific ways, but rather using what's already familiar to us and relying on Argo as the execution and orchestration engine for all of the work that we want to do. Thanks.

Thank you.