So yeah, my name is Camilo Ortiz, and I'm presenting with Philip how we use Kubernetes to speed up over-the-counter trading. Hopefully it's a little bit different from what Ed mentioned, but let's see how it goes. I've been at Bloomberg eight years, Philip twelve, mainly working on NLP and AI applications at Bloomberg. As for the outline of the talk: I'll try to do a crash course on over-the-counter trading, what that means and how traders and other people in the finance world talk to each other; Philip will go over why being able to detect security offers in these dialogues is important; and then we'll bring it back to the reason for this conference, how we use Kubernetes throughout this model development lifecycle. So, over-the-counter trading. If you look at stocks and the other well-known securities, the ones familiar even to people who aren't into finance, they are normally traded on central exchanges. These are the stocks in our pension funds, the things you can trade in your apps. Normally there's a central, regulated exchange like NASDAQ where people can trade, and everyone sees a single price. However, and I'll give more stats later, the vast majority of financial products are actually traded over-the-counter, in decentralized business networks. The people who trade FX have their own social network, because the volumes become too high to trade on an exchange; same thing with commodities and mortgages. They all have their own kind of social network where they talk to each other. Bloomberg in particular tries to bring transparency to that market, and when people cross from one asset class to another, Bloomberg tries to connect all of those, but most of those prices are still only visible if you are in that group chat or in that social network.
And that's basically what over-the-counter trading means: people trade any of these financial products without the supervision of an exchange. Back in the '80s you would see traders with, I don't know, ten phone lines, trying to talk to everybody and repeating the same thing over and over again. That moved to email, which quickly became a spam problem, and nowadays, in the same way we talk to our family and friends, they basically chat with each other. Some fun facts: bonds, although if you check this year that's not going to be the case, have historically been bigger than equities if you add up the market volume, and most of those bonds are actually traded over-the-counter. FX, if you add up all the transactions traders do against each other, is normally about 20 times bigger than what is traded in equities, the stocks, and at those high volumes it's also traded over-the-counter. And if you just count the number of things that can be traded in the market, equity only accounts for 1.5% of them; there are only maybe 30,000 equities and ETFs at most that you could trade, but all the other products, like municipal bonds and structured products, where they combine, say, a bond with a hedge against some commodity, are actually traded by people talking to each other. So how do they talk to each other? Here's an example with Alice and Bob, who want to buy some bonds, and Carl, who is going to serve those orders. What's really happening here is that there are two threads going on at the same time. First, Alice asks for IBM 25s, Carl gives an offer, Alice does some negotiation, Carl goes a little bit lower, and they make a trade. That last post is actually legally binding.
They need to log that into some database so that the government can track that price. At the same time, in the other thread, Bob tags along with Alice and wants to buy some of those bonds too, but it seems Bob took too long, and Carl cannot meet that price anymore. So in this conversation you have at least two threads, and that's structure you can extract from these dialogues. In addition, in the first post we can detect that there's an inquiry intent and a particular bond, which we need to link to an entry in our knowledge bases. When they say IBM 25s, at that point in time they normally mean the IBM bond whose coupon is seven and that matures on October 30th of 2025. And the size: when they say 10M, in this context it's actually 10 million. They want 10 million of those notes. If we continue: the second post also has an inquiry intent but a different size, the third one gives a quote with a price, then there's more negotiation over the price, a trade is confirmed, Bob tries to negotiate again, but then there's no trade, right? So what happens behind the scenes? It could be, for example, that Alice checks the news, checks the portfolio, and wants to get some of those bonds. When Carl, or whoever's serving the order, gets that inquiry, they check their relationship with that client, they check their RFQ management, since they're getting quotes and inquiries from many people, they see what price they can give, and they give a price. Same thing with Alice: she jumps back into her Bloomberg or her management systems and checks all of the quotes she's getting, because maybe she's not only talking with that salesperson but with other people too, and she decides, okay, this seems to be a good price, and makes an order.
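The kind of extraction described above, turning a free-form post like "IBM 25s, 10M" into an inquiry with a linked security and a normalized size, can be sketched minimally. This is a toy illustration, not Bloomberg's system: the regexes, the tiny bond table, and the field names are all made up, and real systems use trained models plus a live security knowledge base.

```python
import re

# Hypothetical mini knowledge base: (ticker, maturity-year suffix) -> bond record.
KNOWN_BONDS = {
    ("IBM", "25"): {"issuer": "IBM", "coupon": 7.0, "maturity": "2025-10-30"},
}

def parse_inquiry(post: str):
    """Toy parser: find a bond mention like 'IBM 25s' and a size like '10M'."""
    bond_m = re.search(r"\b([A-Z]{1,5})\s+(\d{2})s\b", post)  # e.g. "IBM 25s"
    size_m = re.search(r"\b(\d+)(MM?)\b", post)               # e.g. "10M" = 10 million
    if not bond_m:
        return None
    return {
        "intent": "inquiry",
        "security": KNOWN_BONDS.get((bond_m.group(1), bond_m.group(2))),
        "size": int(size_m.group(1)) * 1_000_000 if size_m else None,
    }

print(parse_inquiry("any offers IBM 25s? 10M"))
```

Posts with no bond mention simply return `None`, which mirrors the point that most chatter never mentions a security at all.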
Once an order is made, same thing: they go back and forth between their systems and the unstructured information in the chat, and then a trade is done. Or, as in the previous example, they take too long and the opportunity passes. What we really want, and where the real value of dialogue understanding in these conversations lies, is this: most of these traders have around 200 active group chats at the same time. So if we can detect that IBM was mentioned, that it's a bond, and that there's an inquiry intent, we can raise an alert. And maybe the conversation from then on can be completely structured; you can just hand it off to a bot that handles the order. So next, we're going to talk about what we call the MDLC, the model development lifecycle, as Alejandro was mentioning, for security offer detection, and I'll hand off to Philip, who will talk more about what that problem is. All right. So before we actually go into the MDLC, let's talk a little bit about the natural language processing that needs to be performed here. Named entity recognition and named entity linking are two standard tasks in natural language processing. Here's an example: "Apple today announced that its board of directors appointed Tim Cook as CEO, effective immediately." We can recognize that Apple is an organization and Tim Cook is a person; that's the recognition part. Further, we can link Apple, the organization, to Apple Inc. in California, a database entry with an identifier that identifies this entity, and similarly Tim Cook can be linked to Timothy D. Cook at Apple Inc. The same kind of named entity recognition and linking can also be performed on the kinds of chats that Camilo just talked to you about a second ago. Here's an example post where our systems would consider whether, for example, the word F is a ticker, or something unknown, or something we don't care about.
And it would then link the ticker F to Ford Motor Company in Michigan. Similarly, 5.291 could be a price or a yield; it turns out it's actually a coupon, which is a sort of interest rate, and our systems are able to identify this as well. In a similar fashion, all the words in this post get labeled with some sort of type, and all together, this identifies and extracts the trade being discussed in this IB post. Here are some more examples. AT&T stock to be sold at $34.5. LOL is probably just traders having fun, but believe it or not, there used to be a Canadian energy company with that ticker; they no longer have it. Here's a post in which a trader sends an entire table's worth of trades they want to do today. And just to be clear, all of these posts are made up, but they're realistic. Camilo already hinted at how some traders may benefit from the alerting capability that becomes possible once you have done NLP on chat posts. Here's another example: a table where a client can see all the trades we extracted from their own instant messages and emails. On this screen, clients are able to filter, for example, by the bid price or by the ticker, among other things. Imagine searching for a trade opportunity with a price smaller than $94 by doing a Ctrl-F on your inbox. Of course, that's impossible. But once you've done NLP and extracted the trade attributes, you can do filtering, searching, and alerting, and that enables workflows that are much richer and more effective for our clients than just working with a simple instant messenger. All right, next Camilo is going to tell you about Kubernetes in our MDLC. Thanks. So I will just go over the lifecycle of how we build a solution like this at Bloomberg, which blends with the similar workflows we have seen in the previous talks.
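The recognize-then-link step described above can be sketched as candidate generation: each token gets a set of possible types, and tickers are resolved against a reference table. The two-entry ticker table and the candidate-type logic here are made up for illustration; real systems score candidates with trained models against full security master data.

```python
# Toy ticker reference table; real linking uses a full knowledge base.
TICKER_DB = {"F": "Ford Motor Company (Michigan)", "AAPL": "Apple Inc. (California)"}

def label_token(token: str):
    """Return (type, linked_entity) candidates for one chat token."""
    candidates = []
    if token.upper() in TICKER_DB:
        candidates.append(("ticker", TICKER_DB[token.upper()]))
    try:
        float(token)
        # A bare number could be a price, a yield, or a coupon; context decides.
        candidates.extend([("price", None), ("yield", None), ("coupon", None)])
    except ValueError:
        pass
    return candidates or [("unknown", None)]

print(label_token("F"))
print(label_token("5.291"))
```

Note that "F" and "5.291" both produce candidates: the real task is picking the right one from context, which is what the trained models do.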
But it all starts with gathering data, right? To illustrate the challenges in this case: only a small fraction of those posts actually mention any securities. A lot of the chatter is building relationships, right? How are your kids doing? Should we have coffee? There's a lot of relationship building happening, and every now and then, especially when the market opens, they talk about these securities. So we're forced to take biased samples, but we're also forced to track what that bias was. The annotation task, as Philip mentioned before, is not easy. Training annotators to understand these dialogues takes us a long time; it's a big effort just to get a few thousand quality annotations for a given asset class in a year. And then there's the sensitivity of this data. Most of the things chatted about in these groups hint at how the market is going to move five minutes or an hour from now. So we need to make sure, with access controls, that only the people with access to this data are allowed to look at it, and of course it's always encrypted at rest. Building the model: even if we have perfect data, the NLP task is one of the hardest. It's not only linking, it's also slot filling; many offers can be mentioned at the same time, so we need to group them accordingly, and that's without even talking about the thread disentanglement that comes into play. So it requires a lot of compute for us to get to an accuracy at which the client will be comfortable automating based on what we detect. And since we're liable for everything we detect, we need to be able to reproduce and explain, end to end, any prediction that we make.
And for us to train a model and be able to trust its evaluation, we need to account for the bias that came from the sampling. As for the knowledge bases we link all of these securities against: some of these securities only span one week. Repurchase agreements, for example, are contracts that only last a week and might be created within the day. So we need a knowledge base that can not only handle that in real time, but also, when we train on annotations that might be from a month ago, simulate the market as it was a month ago. Deployment is the usual: a bunch of queries per second, millisecond latencies, depending on the application and how much automation the client requires. We need to keep the dependencies in sync, of course. And for any new deployment, we need to make sure we do better than before, at least on the examples we were already getting right, so we need the ability to sandbox any new change and compare it against production traffic. I'm not going to go into monitoring and updating, data drift detection, capturing feedback and creating new data from that feedback; that's another set of problems. But the theme across this whole lifecycle is that at each step we need to manage all of the dependencies: the software, the framework, the hardware (even how we train our models changes depending on the hardware we use), and, what makes it even harder in ML, the data. And that's where we rely on our machine learning platform teams, who use CRDs to normalize how we do this across teams. This model development lifecycle doesn't only happen for security detection: the same applies when we want to do intent classification, thread disentanglement, theme detection, summarization, you name it, right?
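The "do at least as well as before" check mentioned above, sandboxing a new model against production traffic, can be sketched as a simple regression gate: replay recent traffic through the candidate and require that it keeps every prediction the production model already got right. The function and the stand-in models here are illustrative, not Bloomberg's actual tooling.

```python
def regression_gate(samples, prod_model, candidate_model):
    """Return inputs where the candidate regresses relative to production.

    samples: iterable of (text, gold_label) pairs drawn from recent traffic.
    An empty result means the candidate never breaks a previously correct case.
    """
    return [
        text
        for text, gold in samples
        if prod_model(text) == gold and candidate_model(text) != gold
    ]

# Stand-in models: the candidate handles everything production does, plus "bid".
prod = lambda t: "inquiry" if "offer" in t else "chat"
cand = lambda t: "inquiry" if "offer" in t or "bid" in t else "chat"
traffic = [("any offers IBM 25s?", "inquiry"), ("how are the kids", "chat")]
assert regression_gate(traffic, prod, cand) == []
```

In practice this gate runs as a batch comparison between deployed serving instances, but the acceptance rule is the same shape.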
So nowadays we have hundreds of models with the same type of problem, and as an organization we want to remove those redundancies and not have everybody try to solve the same problem all over again. That's where we rely a lot on custom resource definitions, or CRDs. For gathering data, we define a way to stream data across the company, a way to do batch compute, and of course workflows, especially for how we annotate our data, where we need to break the task down into smaller, simpler tasks: I see a post, is there anything interesting here? Then go to the next level: is there a bond mentioned? And if there is a bond, go to another annotation task where you actually annotate all of the slots. Same thing with building the model, which is more batch problems. We have a definition of how we hyperparameter-tune our models: run a bunch of batch jobs optimizing some objective function that you want to optimize. And distributed training, which is, I guess, another batch job, where the model doesn't fit on one single node and we need to figure out how to put it in memory. For deployment, out of all the CRDs I mentioned, KServe is the only one that is open source. That's where we deploy our models, and we do regression tests where we compare these KServe instances against the previous versions. And for monitoring and updating, we do data drift detection and stream the data we think is important for gathering feedback. Overall, with these solutions I've mentioned, built on Kubeflow and Argo, we have eventually completely declared our whole model development lifecycle, and for models that are mature enough that we don't need to experiment much at each step, we can get to a position where we actually automate the entire workflow with another CRD that runs all of them.
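To make the declarative deployment step concrete: a KServe model deployment is itself just a Kubernetes resource. A minimal `InferenceService` manifest might look like the following; the name, namespace, model format, and storage URI are hypothetical illustrations, not Bloomberg's actual configuration.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: security-offer-ner        # hypothetical model name
  namespace: trading-nlp          # hypothetical team namespace with its own quota
spec:
  predictor:
    model:
      modelFormat:
        name: tensorflow
      storageUri: "s3://models/security-offer-ner/v7"   # hypothetical location
```

Because the deployment is declared like this, a sandboxed candidate version is just a second manifest, which is what makes the side-by-side regression comparisons straightforward to run.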
And next, Philip will go a little deeper into the latest we've done in terms of building the model for security offer detection. Yeah, let's focus a little bit more on that model building component for a moment. Historically, the mode of operation in machine learning has been that you would gather a large number of annotations, thousands, maybe tens of thousands, which in our case would have been annotated posts where a human painstakingly annotates that 10 million, for example, is a size, or that IBM is a ticker and 25 identifies a contract end date. So very, very laborious. But afterwards you can train, say, a recurrent neural network on a GPU farm, perform some hyperparameter tuning, and eventually deploy that model. Camilo mentioned that we have millisecond latency requirements, because there's a human sitting in front of a screen, effectively interacting with that model. More recently, there have been innovations that enable you to get by with way, way fewer annotations than previously. Fewer annotations means less human labor, so that's desirable, and there are other advantages too. Let me tell you how that works. You begin with a large amount of text, which could be Wikipedia and other text from the web, for example, and you do unsupervised pre-training of the model, a task where the model just learns to consume text and produce more text of a similar kind. Next, you take some annotated posts of the same kind we would have used historically, but way, way fewer, and you perform fine-tuning. After fine-tuning, you have a large language model that is able to perform the task we're looking for it to perform, which is identify tickers, identify prices, and other things, but there's a problem. The problem is that large language models are really, really slow.
They're big, billions or tens of billions of parameters, and even if you deploy them on a GPU, or multiple GPUs, you're going to be looking at inference times of around a second. That's too much when you have a human, especially a client, looking at the screen. One solution is what's called model distillation. Here, the large model acts as a teacher to a small model and generates training data for it. And the small model is the kind of recurrent neural network we've been deploying historically, a model with millisecond latency that is good to deploy. So overall, this lets you work with way, way fewer annotations. Now, how does that connect to what Camilo just said? For one, the pipeline at the bottom of the screen is way, way longer than the pipeline at the top, and that means that robustness, reproducibility, and the kinds of things Camilo just outlined become even more important. Second, you're working with large models, and again Kubernetes, and the ability to scale work effectively in multi-GPU, multi-machine environments, becomes important. So: Bloomberg aims to provide highly accurate data, information, analytics, and insights for the global capital markets ecosystem. Human interactions still dominate a significant proportion of that market, the over-the-counter market, and there's an opportunity for AI to make relevant connections between the humans who operate in these markets. Kubernetes has been instrumental in enabling us to build robust industrial ML solutions and a reliable model development lifecycle. If you're interested in finding out more about machine learning infrastructure, do attend the panel we included at the bottom of the screen, where some colleagues of ours and other people, some of whom I know are in this room, will be discussing it on Thursday at 5:25. Our team is growing.
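The teacher-student loop Philip describes can be sketched with stand-ins: a "teacher" function labels unlabeled posts (in reality an expensive fine-tuned LLM call), and a cheap "student" is fitted on those pseudo-labels (in reality a compact network with millisecond latency). Both models here are toys, chosen only to make the shape of the distillation pipeline visible.

```python
from collections import Counter, defaultdict

def teacher(post: str) -> str:
    """Stand-in for a slow, accurate LLM labeling a post's intent."""
    return "inquiry" if "offer" in post or post.endswith("?") else "chat"

def train_student(unlabeled_posts):
    """Distillation: the teacher generates labels, the student learns from them."""
    votes = defaultdict(Counter)   # word -> label counts seen in teacher output
    for post in unlabeled_posts:
        label = teacher(post)      # the only place the expensive model is called
        for word in post.split():
            votes[word][label] += 1

    def student(post: str) -> str:
        # Cheap inference: sum per-word label votes, no teacher call needed.
        score = Counter()
        for word in post.split():
            score.update(votes.get(word, Counter()))
        return score.most_common(1)[0][0] if score else "chat"

    return student

posts = ["any offer on IBM 25s", "how are the kids", "lunch today?", "offer 10M pls"]
student = train_student(posts)
```

The key property is in the comments: the teacher is called only at training time, so serving cost is determined entirely by the small student.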
If you're interested in joining us, do scan this QR code and you'll be taken to a page that lists our open positions in the AI and MLOps areas. We also included the email addresses of our recruiters for both the London and New York City positions. And we have a blog, techatbloomberg.com/ai. Camilo and I will be around for the rest of the day; don't hesitate to talk with us during the breaks. We'll be excited to talk with you. Thank you. Brilliant, yeah, thank you so much. I'm glad you did the talk and I didn't, right? My summary was not the best, but what an interesting problem, right? Doing NLP on multiple chat threads seems like a really hard thing to do anyway, but when the stakes are that people are trading tens or hundreds of millions of dollars, you really do have to get it right. So yeah, really cool use case. We do have a little bit of time if people have questions in the room, so just stick your hand up and I'll come around with the mic. Don't be shy. Okay, yeah, one over here. Yesterday there was a panel, or sorry, an entire day, focused on Kubernetes and batch, and you didn't really go into much detail on your batch side. I was wondering if you could talk a little bit about whether you're running batch processing on Kubernetes and what you have found with it? I can answer that. We basically have our own CRD for batch processing. Each of the teams at Bloomberg has their own namespace with a quota, so that they don't run out of resources. All of the dependencies are put in a YAML as usual, and then the batch process runs Python, TensorFlow, sometimes Spark. So most of these processes are managed in namespaces, and teams share the results of their experiments because they're all running in the same shared workspace. That's pretty much it; but we're mostly on the user side, the people who build those CRDs are not us. Great.
Any more questions? Okay. Right. Well, let's have another round of applause for Camilo and Philip.