I'm Elliott. I just gave a talk about Hera. I'm a senior software engineer at Bloomberg. I'm Alina. I work at Centrica Energy Trading as an MLOps engineer, where we help support our algo traders and analysts in their work. Hello everyone. I'm Sambhav. You can call me Sam. I lead the ML productivity team at Bloomberg, and I'm also one of the co-maintainers of Hera alongside Elliott. Nice to be speaking here at ArgoCon about Argo and ML. We're excited to have you all. Let's dive in. What are the challenges with machine learning and orchestration that you all have been running into generally? Sam, I'll kick it to you first to get Bloomberg's perspective on the topic, and then we'll pass it to Alina. Yeah. So for those who are not aware, Bloomberg is a financial data company. Most people associate it with the news channel, but we sell, or rather provide, high-quality data and experiences to explore, track, and analyze that data. And as you can imagine, with this large amount of data, we also provide various AI integrations in our products to help you better explore and make sense of it. So to really highlight the challenges we face in terms of orchestration and MLOps with AI, I think a few things are important to note. Models, unlike code, tend to decay over time, so they often need to be continuously trained and deployed, which is what we call the model issue detection plus remediation cycle at Bloomberg. The issue with these remediation cycles is that they're almost never fully autonomous. They're always human-in-the-loop workflows that often require domain experts to come in, do appropriate QCs, and provide appropriate inputs before you can hand things off again to an automated pipeline. Apart from all of that, the AI landscape, as you know, is evolving and rapidly changing. Every single day there are new ways to extract your features and train your models. What stays common is the actual step itself.
So I guess your training code might change, but the fact that your training code accepts feature vectors and outputs a model remains the same. So you need an orchestration layer that can glue all of these steps together and remain stable and flexible through all of these changes. Thank you, Sam. Alina, could you dive into Centrica's experience with ML and orchestration? Yes. So I guess I should preface this with a little bit of an explanation about energy trading, if people aren't too familiar. Essentially, we trade in several different markets, and that means we have some very sharp deadlines. For example, for the day-ahead market, you have a deadline at 12 noon every day, and that means we have to run a lot of crons, essentially at 11:30, to get the newest data. We also trade in other markets, like once every hour, once every 30 minutes, or every 15 minutes, where we have a lot of data and some machine learning models that need to make new predictions on that data as close to the deadline as possible. So we have this very spiky computation need, essentially. Yeah. Thank you. I guess that factors into why you chose Argo workflows. It seems like there are some obvious tie-ins where the problems you all are laying out are things that Argo workflows might solve. Sam, could you dive into Bloomberg's perspective on that first? Yes, sure. So Kubernetes is not new to Bloomberg. We built our Bloomberg ML platform on top of Kubernetes back in 2017, so we have seven years of experience scaling and building multi-tenant ML platforms on top of Kubernetes. When it came to choosing an orchestration platform, Argo fit in perfectly. It brings in all the same things that Kubernetes does in terms of multi-tenancy and scalability.
And most importantly for us, vendor neutrality, which means we can be confident about the future of the project, and we can also come in, contribute, and help with the project. Apart from all of this, in terms of the core requirements that Argo really satisfied, one of them was multi-tenancy. We needed ways for our AI developers to self-service and run their workflows without affecting other teams. These models often have to work with data, and a lot of this data is sensitive. We want to go by the principle of least privilege and ensure only the people who need to access a certain kind of dataset have access to it. Kubernetes provides really great primitives for multi-tenancy, workload isolation, resource quotas, et cetera. Apart from that, some of the other features that Argo really has going for it in the ML space, and for the human-in-the-loop workflows I talked about, are artifact visualization and intermediate parameters. We use those heavily. Artifact visualization is great because if you've trained a model and you have a report on the evaluation metrics, you can see it right there, and then combine it with intermediate parameters to figure out: okay, do I need to retry this loop with a different set of parameters, or am I happy to proceed to the next step? All of this, coupled with the fact that it's declarative and built on YAML, means that we can take it and adapt it to any sort of user experience that we want. So it's not truly tied to one language. For some set of our users, Python is important, so we've been able to provide an appropriate user experience using Python and Hera. It seems like strong multi-tenancy with sensitive data is a common thread between the two companies. Alina, could you touch on Centrica's Argo workflows adoption? Yeah, that was a very long, considered answer. We maybe just chose it because it seemed... Had a good logo.
We wanted a fancy cron wrapper and we were like, yeah, maybe this one. But it turned out to be a great choice. And yeah, especially the multi-tenancy was really important to us, because of course we have different teams. A fun fact about energy trading is that even within the same company, these teams often compete. So we need total privacy, including from us in some cases, where we cannot even see the code, or rather they don't want us to. We technically can, but we don't. It's fine. And yeah, so what else was important? We needed cron jobs that could spawn workflows, and especially big fan-outs were important to us, because we have a lot of data sources and then we have to make predictions on those separately. So Sam touched a bit on the flexibility and building out some Python functionality on top of Argo workflows' native YAML. I want to turn it over to Elliott. We just gave a talk introducing Hera, but for people who weren't in the audience at the time, or people who are watching the video later, could you give a quick summary of what Hera is? What is the role of Python on top of Argo workflows? Yep. So Hera itself is a custom-written Python SDK, mainly for Argo workflows, with some support for Argo Events. It provides all the extra functionality you need to communicate with Argo workflows entirely within Python. You can write your functions, which become script templates, you can orchestrate them in your workflows, and then you can submit them. That's all in Python. And yeah, what's the second part of the question? What role does Python generally play? I mean, especially at Bloomberg. Yeah. So maybe this is obvious in a room full of ML engineers and data scientists: you're all into Python. But if you want to use Argo workflows, it's pretty hard right now, so you might have steered towards Airflow or some other alternatives.
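To make Elliott's description concrete, here is a minimal sketch of the pattern he describes, with a fan-out of the kind Alina mentions. It assumes Hera's v5-style `hera.workflows` API; the function and parameter names ("predict", the data source list) are illustrative, not from the panel.

```python
from hera.workflows import Steps, Workflow, script


# A plain Python function, which Hera turns into an Argo script template.
@script()
def predict(source: str):
    print(f"making predictions for {source}")


# Orchestrate the steps entirely in Python.
with Workflow(generate_name="predictions-", entrypoint="fan-out") as w:
    with Steps(name="fan-out"):
        # Fan out one step per data source, one prediction each.
        predict(
            arguments={"source": "{{item}}"},
            with_items=["prices", "weather", "demand"],
        )

# Calling w.create() would submit this to the configured Argo server;
# w.to_yaml() shows the generated Workflow manifest.
```

The point of the design is that the YAML never has to be written by hand: the workflow spec is generated from the Python definition.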
And at Bloomberg, we thought: we have Argo workflows and we want to use it, but our ML developers are a bit resistant to using it. So how can we bridge the gap between them? Hera was the answer in this case. That's excellent. So both of these companies, Bloomberg and Centrica, have adopted both Argo workflows and Hera. Now let's talk a bit about what the impacts of that choice have been. Sam, I'll turn it over to you to talk about how Bloomberg has been affected and changed by the decision to adopt Hera specifically. Yeah, I think the Hera adoption was a massive turn in terms of Argo adoption in the company, especially for our ML teams. We have a workflow orchestration team in-house that provides Argo as a multi-tenant offering to the rest of the company. The initial set of users for that platform were largely CI/CD use cases or cloud deployment pipelines. Now, despite the fact that this was a well-supported platform offering, it was not gaining a lot of traction with our AI and ML teams. Even after all of this, the AI teams were so entrenched in Python that they would rather spin up their own custom Airflow clusters than use the common platform offering. Hera was a massive change or shift in that. As I said, we have a lot of experience building ML platforms on top of Kubernetes, which means we know the audience, we know what they're like, and we know how to adapt Kubernetes paradigms to Python. And Hera was pretty much our answer to that. To give you some numbers, we have an AI engineering group of over 300 engineers who now use Hera. We've been able to migrate, and even have multiple teams adopt, continuous training and deployment pipelines because of Hera, so that's over 40 teams. And yeah, in general, people are very happy and productive. We have a support channel which gets more than five to ten questions a day.
So yeah, it's been great. Ironically, if the support channel is active, it means that people are happy and using the product. Alina, could you dive into how adopting Hera has impacted Centrica? Yeah, so similar to you guys, we also set it up and just handed it off to the teams with the YAML, like, okay, you can go do that now. And we had some brave early adopters who took on that challenge and wrote YAML workflows. And then it kind of stalled out a little bit, because people would rather just stick with whatever they were used to. And then I was actually asking around, like, does anyone know who found Hera? And I don't think we could come up with an answer. We think maybe it was one of the data scientists themselves who found it and independently started using it. And then it sort of spread like wildfire, until we had to support it officially too. And they really love it. It's made a huge difference. Even the people who were the early adopters using YAML now also use Hera, just to work together with their teams better. It's amazing to hear how one developer's decision has really impacted the broader company and led to such a positive outcome. Flipping to the other side of the coin, what have the challenges been for adopting Argo workflows and Hera? And how have y'all worked through them? Sam, I'll turn it over to you. Yeah, I think it's no surprise that Argo has been tested at scale at multiple companies. Where we found it really challenging to adopt Argo was largely the developer experience, especially for people who are not that familiar with Kubernetes paradigms. To give you a bit of background, we operate very differently as a platform team. Rather than just developing platform features, we develop a feature.
Then we go through a cycle of actually working with our tenant teams to help them onboard onto the platform, gather feedback, requirements, and feature requests, iterate on them, and then continue the cycle again. Largely for us, there have been two challenges around the developer experience, and we've tried to solve them in two different ways. One is attacking the core of the problem: there is a set of common functionality, or workflow templates, that we provide to our users. We work with the teams, so we know what their requirements are and what kind of workflow templates they're developing. As we see a common set of requirements bubble up from multiple teams, we take that on as a platform offering and provide it as part of our internal workflow template library. So that's one aspect. The other aspect is the developer experience around Hera itself, when we cannot provide functionality out of the box. Hera is still maturing around its developer experience, so I'll hand it over to Elliott to talk about how we're tackling things on the Hera front. Yeah. So as a first-time open source contributor, Hera is the first project I've been maintaining. And it's been interesting to see all these patterns coming out of people using it, and how they're using it. We didn't even think people would be using Hera entirely to avoid YAML and interacting with Argo entirely through Python. So that's been interesting. But another thing is that at Bloomberg, I can see the code that people are writing. And yeah, we've been seeing design patterns coming out of this. There's the Hera design with all the context managers. And it's like, maybe we made some missteps here and there, but overall the project is maturing, and we're getting there in figuring out what we need to do next.
It's interesting seeing how closely you all get to work with the developers at Bloomberg and build out that workflow template library to help them down the golden path. Whereas Alina, it seems at Centrica the teams are competing against each other, so you may have to limit yourself, and may or may not see the code that they are working on, to give them the privacy they need. How does that contrast with your experience, where Hera was adopted like wildfire and you have to just kind of let the data scientists do what they do? Yeah, so it's pretty different, because it puts a lot of pressure on the data scientists themselves too, because they have to very independently make their own workflows and make them work. Of course, we help if they ask us to, but generally we are not really that hands-on, and we can't be. And luckily, Hera has gotten better. It has very nice, user-friendly documentation at this point, after the work done over the last year, and it's really good. But it's almost too good, because a lot of people using just Hera have never interacted with YAML, and they don't really know about Argo's core concepts, because most of the time they haven't had to. Often the teams share code a lot, and that means that everything just works until it doesn't, and then we're not really sure what's going on. Can I quote you on the documentation being too good? I think that's the only time I've ever heard that about an open source project. I mean, you link more to the Argo documentation now, so it's actually getting better. Okay. Interesting to hear. It's getting just good instead of too good. All right. I won't continue riffing on that front. Anyway, our final topic that we want to dive into is about the future. The ML and AI field is evolving really quickly. What do y'all think is in store for the future of ML and Argo workflows specifically? And what does a complete ML stack look like?
How does Argo workflows play into it? It's a great tool, but it's not the end-all, be-all platform where you throw it at your stack and then ML is done. So, Sam, I'll toss it to you, and then we'll have Alina get the final word in. Yeah. There are a couple of questions there, so I'm going to tackle them one by one. Please. The first one is, in my opinion, how do I see Argo workflows and the ML field evolving in the next few years? I think, at least for Bloomberg, as I mentioned, Hera adoption, and just Argo and workflow orchestration adoption, has spread like wildfire. So we have hundreds of models, all of which are being continuously trained. Now the problem becomes: how do we track which model is deployed, and how do we make sure that we have an accurate picture of what data went into it and what parameters were used? It becomes a sort of deep cataloging and provenance problem, not just for finding out what's running in production, but potentially what issues can occur, how we can fix them, alert people, and so on. So yeah, provenance tracking, I think, is becoming really important, and I think workflows, or the orchestration platform, has a big part to play there. As the piece that's gluing everything together, it's the perfect source of truth for figuring out what exactly ran and what connected the multiple steps together. If there were deep integrations, like being able to export workflow parameters, outputs, and inputs to a common ML metadata store, where the user doesn't even have to write the code and it just automatically exports itself, I think that would be great in terms of provenance tracking. But yeah, I think the other question was: how do we see this fit into the entire stack? The other side of it, which we haven't really talked about, is production monitoring and eventing. So yeah, this is one part of the problem: being able to catalog everything.
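The kind of provenance record Sam describes, tying a deployed model back to the workflow, data, and parameters that produced it, can be sketched in plain Python. The schema below is purely illustrative, not an actual Bloomberg or Argo format; the idea is that a final workflow step could export such a record to a metadata store as an output parameter.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class ModelProvenance:
    model_name: str
    workflow_name: str  # e.g. the Argo workflow run that trained the model
    dataset_digest: str  # fingerprint of the training data
    parameters: dict  # hyperparameters used for this run


def dataset_fingerprint(rows: list) -> str:
    """Stable digest of the training data, so reruns can be compared."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


def record_provenance(model_name: str, workflow_name: str, rows: list, params: dict) -> dict:
    """Build the record a workflow step could export to a metadata store."""
    record = ModelProvenance(
        model_name=model_name,
        workflow_name=workflow_name,
        dataset_digest=dataset_fingerprint(rows),
        parameters=params,
    )
    return asdict(record)
```

Because the digest is deterministic, two runs over the same data produce the same fingerprint, which is exactly what makes "what data went into this model?" answerable after the fact.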
The next part of the problem is being able to automatically detect drift issues and then redo the cycle all over again, where you can continuously train and deploy, potentially without humans in the loop. Getting to a fully self-serve, automated system. It'll be good to see when that future comes about. Alina, I'll pass it to you for your thoughts on the future of Argo workflows and ML, and where Argo workflows fits into the stack. Yeah, much like Bloomberg, we're definitely also looking into production metrics and the monitoring and tracking of different models and different assets. Argo sort of leaves off at that point, so it's the natural next step, I think, for every stack: we just need a solution in place. Beyond that, we're also testing out Argo Events, so we can react directly to new data coming in, rather than relying on crons that hopefully run after the data has come in, to speed things up a little. And also some visualization of metrics, hopefully data-scientist friendly, that they can set up themselves, especially for the development of new models and such. Metrics and observability seem to be a hot topic. Hopefully there is some progress on that in the future. Yeah, once it works, you have to actually check that it works, like track it, unfortunately, it turns out. And maybe test things once in a while. We should try and convince the data scientists of that. It's a process. All right. That's all the questions that I had for y'all. Thanks everyone for coming out. We do have four minutes for some questions from the audience. I'll hand a microphone to anyone who wants to ask a question. So we'll open it up. Does anyone have a question? If so, just raise your hand and I'll come find you with a mic. All right. So all three of you used the phrase that Hera spread like wildfire, and it seemed like adoption was smooth and effective.
Despite that, this question is directed towards Elliott, but any of you can answer. At Bloomberg, when the 300 engineers and different teams adopted Hera, was there any resistance? How did you meet that resistance? And perhaps, how will that affect how you improve the platform and Hera in the future? Okay. Yeah. An interesting question, because I think we went into it with goodwill. Once we publicized that we were working on this project and would have a new way to use Argo workflows coming out, and then publicized that there would be a workshop for people to get stuck in and get a kickstart on using it, I think that helped avoid any major resistance. And then within Bloomberg itself, we directed it towards certain teams that we knew would benefit from being able to write these model remediation pipelines more easily. So they were our initial adopters, and then teams started adopting it by themselves, at which point I was like, okay, well, so yeah. All right. I think we have time for one more question. Yep. I know this guy. Hey, what's up, team? A quick question. I was wondering if you have any vision for Hera to expand to any of the other Argo projects, and any thoughts on whether that's possible, or what you're thinking there? We've certainly explored the idea, I would say. I think Elliott mentioned it in his previous talk, but we do have some support for Argo Events. We've largely focused on workflows, because I think we can improve a lot there, and that's what we are most familiar with at this point. But yeah, definitely, I think we have seen a lot of response from the community that the way we have married Kubernetes concepts, especially with Pydantic and features like the script decorator, makes these platforms and systems more Python-native, and people want the same experience for other things. And yeah, we're certainly open to it.
I think it largely depends on where the community drives us. All right. I think that's all the time we have. Thank you so much to our expert panel. Let's give it up for them. Thank you to our host JP. Thank you. Thank you.