I'm your host, Elliot. For background, I'm a senior software engineer at Bloomberg on the ML platform team, and one of the maintainers of Hera, which I'm going to tell you all about today. And joining me is JP.

Hi everyone, I'm JP Zivillich. I'm CTO and founder of Pipekit. We are a services and software provider for the Argo Workflows ecosystem. And yeah, excited to dive into Hera and orchestrating Python functions natively. Let's go.

Here's the outline for the day. First we'll go over why you'd build on Argo Workflows, then the Argo developer experience for some problem motivation. Then we'll introduce Hera and show off some of its features, with example code for you to walk away with. Then we'll get into some case studies of how Hera actually helps in the wild, and finally the main takeaways of today's talk for you all.

So let's start with the whats and whys. Why build on Argo Workflows? Spoiler alert: we are at ArgoCon. But for anyone new to it, Argo Workflows has become the de facto Kubernetes workflow orchestration standard. Bit of a mouthful, but it has beaten out the competition because of its features, scalability, and robustness, all built on Kubernetes. And it has benefits beyond the code, like the active and growing community that we just heard about in the keynote. And with long-term support and being vendor neutral, we can trust it for the long term.

So now let's look at some of the problems facing Argo Workflows. Firstly, Kubernetes YAML, our favorite language, is the default language to communicate with Argo, which can actually be a barrier to entry in a few ways. It's hard to test, because it requires a Kubernetes cluster with Argo Workflows installed. We can test a workflow end-to-end, maybe manually, but we want to be able to test every template, test the whole workflow, and inspect it programmatically, and right now there's no easy way to do that. It's also hard to reuse code outside of these fairly heavyweight workflow templates that already have to be on your cluster. What if we just want something lightweight, a couple of lines of glue code between our steps? And finally, it's hard to maintain long workflows, simply because a single YAML file can become too hard to manage. And if you want semantically versioned and auto-documented workflow templates, for example, that's something you have to set up yourself.

Furthermore, stats from the 2023 Argo Workflows user survey showed that only 13% of Argo Workflows users were data scientists or data engineers. Maybe they prefer another language. Feedback in the 2021 survey included the need to, quote, "convince folks that containers aren't scary and YAML isn't evil," and, quote, "honestly, something like a good Python DSL would be fantastic."

So a Python DSL: why would we want that? Well, Python has a mature developer experience and extensive libraries, and it's the preferred scripting language for many developers. I could go on and on about all its popular features. So how about we build on Python and leverage these existing solutions and features, rather than reinventing the wheel for Argo, no pun intended? How can we get the best of the Python developer experience while still using Argo Workflows as our Kubernetes workflow orchestrator?

Well, introducing Hera, the Python SDK for Argo Workflows. It lets you write your workflow in Python, but it can do so much more.
Every Python function can be used as a template in your workflows, which means you can test templates individually or the workflow as a whole. Hera also provides a Python client wrapper around the Argo Workflows API, letting you interact with Argo Workflows entirely through Python. Hera's main goal is to let you focus on your business logic and make orchestration easy, so it's the all-in-one solution for writing your business logic, writing your orchestration logic, and submitting, all in Python. And the extensive documentation for Hera is something I'm proud of contributing to. It offers a comprehensive introduction to Argo and Hera with a walkthrough from zero ("from zero to Hera hero"), and offers user guides for its more complex features.

But enough talk, let's code. This isn't a demo, but... Let's take this YAML workflow defined over here, which looks like the DAG diamond: it runs A, then B and C in parallel, and then D, and rewrite it with Hera. First, we'll need Hera's custom classes to help you author your workflow with code completion. I know, right? These classes, DAG, Container, Parameter, Task, and Workflow, are all custom classes that Hera provides to help you write your workflow. Next, you'll see this context manager pattern for workflows, DAGs, steps, and more. It mirrors the YAML syntax, so Argo experts should feel right at home. And if you know Argo Workflows, this Python version of a container template should make sense to you: it will run the echo command in the alpine:3.7 image, taking an input parameter called message, which it will print to standard out. Objects like this container are automatically added to the workflow's templates, along with the DAG, which is also a type of template. The tasks below are running that container template we just defined, passing a message that matches each task's name. And because they're under the DAG context, they're added to the DAG. We can sprinkle some syntactic sugar into this workflow to avoid some extra typing by calling the container template directly. Plus, we can describe task dependencies with the right-shift operator (>>). Pretty cool.

But container templates running commands? No, I want more Python. So lucky for you, in Hera, functions are templates. You just need to decorate them with this script decorator, and Hera will do the rest. Here, we're replacing the container template we just had with a function, running on the Python 3.12 image, and all it does is print the message. Functionally equivalent, but now it's in Python. Everything else in the workflow is just as before; it's still a diamond.

But let's say you don't want to keep flicking between your IDE and the Argo CLI or the UI to keep starting your workflows. You want to just call a magic function that creates the workflow for you. All that takes is a little setup, and then you can call create directly on your workflow.

But before we get too carried away, there are still some missing pieces. Hera has to compile your Python workflow into YAML for Argo to understand it. And for script templates, that means we have to dump the function body into the source field of the script template. In Hera, we call this an inline script, because the code is inline in your YAML, as opposed to something coming up next. But our function is essentially typeless and practically untestable. And when Hera compiles your workflow to YAML, it helps a bit by deserializing your parameters for you, but we're still at the mercy of whatever the user gives us.
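As a rough sketch (not the exact code from the slides), the script-based diamond and the "magic" create call described above could look something like this in Hera; the server host, token, and image details are placeholders for your own setup:

```python
from hera.shared import global_config
from hera.workflows import DAG, Workflow, script

# Placeholder connection details for your Argo Workflows server.
global_config.host = "https://my-argo-server.example.com"
global_config.token = "my-token"


@script()  # an inline script template: the function body is dumped into the YAML's source field
def echo(message: str):
    print(message)


with Workflow(generate_name="dag-diamond-", entrypoint="diamond") as w:
    with DAG(name="diamond"):
        A = echo(name="A", arguments={"message": "A"})
        B = echo(name="B", arguments={"message": "B"})
        C = echo(name="C", arguments={"message": "C"})
        D = echo(name="D", arguments={"message": "D"})
        A >> [B, C] >> D  # run A, then B and C in parallel, then D

w.create()  # submit the workflow to the cluster
```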
So what if we could do better? The Hera script runner is a way to run your code through an image built with your dependencies, including Hera itself, meaning your code runs exactly as written. It encourages the use of container tech, because users need to build an image with their code and dependencies, which is pulled and run on Argo. If you're curious about how it works, this is a quick look behind the scenes at the exported YAML: the command is still python, but now there are args, the image has to be one that you built, and the source is the list of input parameters. And what this means, with the image intact, is that we can write type-safe functions.

So to use the Hera runner, we first need to specify the constructor as "runner", and we need an image that we're going to build from this code later on. Same as with an inline script, the function parameters are exported as input parameters to the template, but now, at runtime, Hera can deserialize and type-check the inputs. And with the Hera runner, we can even return values straight from the function. Which means, drumroll please, we can actually test script template functions just like any other normal Python.

But right now, you might realize we're limited to the basic JSON types. You might be able to fit a dict or a list in there, but what if we could do even better? Have you heard of Pydantic? It's a friend of mine. It's a Python library predominantly used for data validation, and it's able to convert between JSON strings and Python objects. Hera actually already integrates with Pydantic; I didn't mention it before, but that's how Hera validates your function inputs and does the type checking. It also means that Pydantic classes themselves, base models if you know Pydantic, can be used as your function inputs. So you get rich Python type hints while you're writing your function, and Hera knows how to automatically deserialize your template inputs when running your workflow, using the type hints of your function, ensuring type safety at runtime. Round of applause.

So let's give it a try. First, we need a Pydantic base model class. Here's a very complex example: a Rectangle with a length and width, and a method called area that returns the area. I'm sure your business logic is much more complex. With the Hera runner, we can now just use this class as an input parameter to the function, which means the underlying script template now takes this rectangle as input, and we can test it just like any other function.

So who else likes testing their code? Well, good news everyone: workflows are testable. In this example, we're going to set up a workflow and wait for it to complete. I'm calling create just as before (I've left some of the code out). Then, with the completed workflow, we can check the status and make sure the phase is Succeeded and not Failed. And we can even get individual nodes from the workflow and check their outputs.

That's enough code examples. We have a few notes on the future goals of Hera to keep building on the Python developer experience: bringing Python versioning and dependency management to Argo, things that don't really exist right now and would help teams with lots of interdependent workflow templates.
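Before the case studies, here is a minimal sketch of the runner-plus-Pydantic pattern described above. The registry and image name are made up for illustration, and depending on your Hera version you may need to enable some of these features via global_config.experimental_features:

```python
from pydantic import BaseModel

from hera.workflows import script


class Rectangle(BaseModel):
    length: float
    width: float

    def area(self) -> float:
        return self.length * self.width


# constructor="runner" asks Hera to run this function via the Hera runner inside
# your own image, instead of dumping the function body into the YAML.
# The image below is a placeholder for one you build with your code and its
# dependencies (including Hera) installed.
@script(constructor="runner", image="registry.example.com/my-pipeline:1.0.0")
def calculate_area(rectangle: Rectangle) -> float:
    return rectangle.area()


# Outside of a Workflow context the decorated function behaves like plain Python,
# so it can be unit tested without a cluster:
def test_calculate_area():
    assert calculate_area(Rectangle(length=2.0, width=3.0)) == 6.0
```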
But let's see how Hera is used in the wild, beginning with Bloomberg, since I'm a Bloomberg employee. For some background, we're a financial news and data company. In the AI group, we have constant streams of data coming in that we want to use to retrain models and keep them as up-to-date as possible; we call this model remediation, addressing model drift. And even though Argo Workflows looked like a strong contender to orchestrate and run pipelines for this model remediation, we saw very low adoption of Argo Workflows, despite even having an in-house team to support it. For teams that had adopted Argo, we saw YAML workflows with entire chunks of Python code dumped into the script templates, which looked pretty untestable outside of end-to-end system tests.

So how did we solve this? Being the ML platform team, we aim to provide a Python experience for our AI developers to use Argo Workflows. So we worked on Hera itself for the major version 5 release, which was a complete rewrite of the library. And we provide accompanying tools around Hera to help our Python AI developers get started: things like a cookiecutter, a Python tool that gives you boilerplate starter code, in this case for a workflow template (we encourage all our developers to write workflow templates and reuse them); a testing framework for them to test the workflow template they're working on and iterating on; and a documentation generator for our workflow templates, because as the ML platform team we provide lots of workflow templates and we want them to have a unified reading interface. We also ran workshops to kickstart our Hera adoption, and we have support channels where developers can reach out and talk to us. The key takeaway for us was that Hera enabled the 40 teams in our AI group to adopt Argo Workflows and use it in production, running workflows that can look a little bit like this: doing data collection, retraining multiple models, and evaluating them at scale.

So now we'll hear from JP.

Thank you, Elliot. We got some nervous looks when you mentioned who likes testing their code. Sorry guys, your secret is safe with us. So as mentioned in the intro, my name is JP. I work at Pipekit. We are a services and software provider in the Argo Workflows ecosystem. A problem that many companies run into is how to manage Argo Workflows at scale, how to build that self-service data science or CI platform for their team, and that's what we help with. We have a couple of solutions, some closed source, some open source. One of the open source ones I'll highlight is the Pipekit SDK that we're rolling out in the open. This is a tool that allows you to submit to multiple clusters from a single Jupyter notebook using Hera, and it also allows you to stream the logs back into that single notebook.

And really, from the vendor perspective, what we've learned serving customers is that we need to meet both the platform team and the data scientists where they're at. Platform teams and Kubernetes operators are very familiar with pods, deployments, service accounts, all of the many Kubernetes primitives that we've come to know and love. But data scientists and other people who aren't yet in the Kubernetes ecosystem want to just interact with code, right? They enjoy Python. They want to stay within their ecosystem. They like Jupyter notebooks. And if you're trying to shoehorn these developers or data scientists into using YAML and the tools that you're accustomed to, it's a lot of mental overhead for them to overcome, right?
They've already got to worry about their models, their data science, and whatnot. So it's best if you can just meet both of these core groups where they're at.

We'll go over a couple of use cases that we've seen in the wild. The first one I'll highlight is one of our customers, Curb Battery Intelligence. What they do is detect when batteries are most likely to fail, by analyzing a bunch of battery data. We've given an ArgoCon talk with them previously where they highlighted how they're using a combination of Kubernetes and Dask deployments running on Kubernetes to achieve lots of parallelism. At first, they were doing this with just native YAML Argo Workflows, and over time we've been helping them transition to Hera. The problem they were running into is that the data science team just didn't really like interacting with the YAML. It wasn't flexible. They would have to take their workflow YAML and hand it off to the platform team to make any sort of major modification, and then it would get handed back to the data scientists. That slows down operations considerably. Even things like setting resource requests and limits were just not as flexible and self-serve as they could have been. Adopting Hera has allowed for a lot greater flexibility: the data scientists are able to keep using the Dask primitives that they know and are accustomed to, they're able to update the workflows on the fly and set those resource requests and limits themselves, and it has led to greater productivity broadly.

Next, I'll get into an energy company that we've seen in the wild. This is a problem we see at energy companies that run algorithmic trading desks. If you're an energy company, you need to normalize your energy prices. Throughout the year, energy prices are highly subject to seasonality and macroeconomic events, which is awful if you're doing business planning; you want a relatively flat price that you can plan for. This company adopted Argo Workflows because they needed to process lots and lots of historical returns, and that's a level of scale that most companies aren't able to function at without Kubernetes. So Kubernetes and Argo Workflows were the obvious solution. But since they're competing in a market against other algorithmic traders, they need their data scientists to be able to iterate very quickly. And so they adopted Hera in order to be more self-serve, give the data scientists the flexibility they needed, and avoid handoffs between the data scientists and the platform teams.

So I'll hand it back to Elliot to get into some of the key takeaways.

Thank you. So for some main takeaways of today's talk: we've seen what Hera is, some of its core features, and how it makes Argo Workflows easy to use for Python developers. Argo is still doing the heavy lifting of orchestrating your Python functions at runtime, so you still get everything you've come to expect from the Argo Workflows project. And we've seen how, across multiple companies, Hera is trusted and used in production to help teams use Argo Workflows and scale up. The "too long; didn't read" for today, if you're still recovering from jet lag, is that Argo Workflows and Hera can give you the best cloud native workflow experience. And if you want to connect with us, contribute to Hera, or just learn more, you can find us on GitHub or the CNCF Slack. So thank you all for coming. Good job.
I think we have time for questions. You can take it. Cool. I think we have about five minutes for questions. Is there a mic for people to use? I don't know if there is. If not, we can have people just raise their hands and ask their questions.

Yes, for sure. I'm going to repeat the questions just so everyone can hear them. So the question was: why build on top of Argo Workflows instead of another tool like Dagster, Airflow, et cetera? I'm happy to take this one, unless you want it, Elliot? Okay. That is a good question. I think one of the core issues is the amount of scale. Argo Workflows is built on top of Kubernetes and uses all of the normal Kubernetes primitives that we're familiar with, and we've seen that Kubernetes can scale pretty well. It's not an infinite scaling solution, right? You can't just take a Kubernetes cluster and expect it to scale to infinity. But it does achieve a level of scale that might be a little bit tougher to reach, especially with some legacy systems like Airflow, where you're having to stand up multiple components and figure out which of them can run on Kubernetes and which cannot. Argo Workflows was built on top of Kubernetes from the ground up and is able to hit scaling thresholds that some other tools cannot. Then, really, the question becomes: what is the developer experience like? Writing YAML is something that platform engineers are fine with, right? If you're very accustomed to writing YAML, then great, you can still do that. But a lot of data scientists, and just other engineers, weren't quite up to speed on it. So we saw Flaviu from Dyno Therapeutics start working on Hera, and then Bloomberg adopted it, and we've seen broader community adoption improving the developer experience. Anything to add, Elliot? Cool. Did that answer your question? Thumbs up? More or less? If not, happy to talk after. Attend the next session. Yeah.

Other questions? Yes. I don't think there's a comparison table that exists currently. What's interesting is that both are built on top of Argo Workflows, so you're using the same runtime under the hood, but I'm happy to dive into that a little bit more. Do you know off the top of your head? Do you mean, are there feature comparisons? For the Argo Python SDK that's auto-generated, Hera also auto-generates a lot of its models as well, so you have a fallback mechanism in case we haven't implemented something yet. But we've been pretty up to date so far, and there isn't really anything missing.

All right. Other questions? We've still got three minutes. In that case, we are happy to wrap it up early. Thank you guys so much. Thank you very much.