Hi everyone, thanks for joining this session. Today we'll talk about high performance multi-region messaging with NATS. Before we do that, let's talk about who we are. I'm Cyril Baker, I lead infrastructure at XBTO, where I've led and managed the infrastructure teams since 2022. I started as a Linux HPC sysadmin, and I have more than 15 years of experience in open source infrastructure and services. I discovered Kubernetes and AWS in 2017, and I previously designed many highly scalable platforms in the public cloud for the ad tech and mobile game industries. And I'm Vincent Bernaud, I've been a DevSecOps at XBTO for more than three years now. I'm mainly focused on the dev and trading teams, and I have five years of experience in sysadmin and cloud native applications. So, XBTO: who are we and what do we do? We are a crypto trading firm. We were founded in 2015 as a proprietary crypto trading and investment firm, but today we are a full service company offering different services around crypto and investments. In 2023, we acquired a custodian trading platform called StableHouse, and StableHouse has since secured some big deals with the fifth-largest firm, called Apex. Maybe some of you already know us because we signed Lionel Messi at the Miami football club, Inter Miami, and we are also the main corporate and jersey sponsor of the club, co-owned by David Beckham and Jorge Mas. We are licensed by the Bermuda Monetary Authority and have a presence in five locations: Bermuda, Miami, New York, London, Paris, and Abu Dhabi next. What about the agenda? First, we'll present our needs and why we chose NATS, then we'll quickly present NATS for those of you who don't know the technology. Then we'll deep dive into the platform, the architecture, and the challenges we faced.
And at the end, we'll present a quick tool that we developed ourselves, and we'll answer all of your questions. So let's talk about our needs and our use case. What we want to do at XBTO is to share collected data between processes located in different regions around the world. We want a full mesh technology that can connect several local message buses together to route our messages. We also want something with clustering for redundancy and availability; something stateless that does not require writing to disk and keeps all the data in memory; something secure, with modern authentication, authorization, and multi-tenancy; and also something really scalable that allows us to ingest a large quantity of messages that will increase over time, without changing our current architecture. So let's talk about NATS. How many of you are already familiar with NATS? OK, nice. And how many of you are using NATS in production in your current company? Great. So, a quick intro on NATS for those who are not familiar with this tech. What is NATS? NATS is a simple, secure, and high performance open source messaging system built for cloud and microservice architectures. It has been available since 2011 and is used by top companies around the world. It's written in Go, so it's cross-platform, and there are client libraries available for most languages. Its core design principles are performance and scalability. It's developed mainly by Synadia, which provides different types of services and support around NATS. So why is it cool and why is it great? There are multiple advanced features included in NATS, but we will focus on four of them. The first two are linked together. The first great feature is the supercluster. So what's a supercluster? A supercluster is a set of clusters that have gateway connections established between them.
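To make the routing discussion concrete before we go further: NATS addresses messages by hierarchical, dot-separated subjects, and clusters only forward a message when someone remote has subscribed interest in its subject. Here is a toy Python sketch of the subject matching rules (a simplified illustration, not the server's actual implementation; the subjects and patterns are made up):

```python
# Toy illustration of NATS subject matching (NOT the server's actual code).
# Subjects are dot-separated tokens; in a subscription pattern,
# '*' matches exactly one token and '>' matches one or more trailing tokens.

def subject_matches(pattern: str, subject: str) -> bool:
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # '>' is only valid as the final pattern token and needs at
            # least one subject token left to consume.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # subject ran out of tokens
        if tok != "*" and tok != s_tokens[i]:
            return False  # literal token mismatch
    return len(s_tokens) == len(p_tokens)  # no leftover subject tokens

# A cluster only forwards a message to a remote cluster if some remote
# subscription pattern shows "interest" in its subject.
remote_interest = ["marketdata.*.btc", "trades.>"]
msg_subject = "marketdata.us.btc"
needs_forwarding = any(subject_matches(p, msg_subject) for p in remote_interest)
print(needs_forwarding)  # True: a remote pattern matches this subject
```

With that picture of subjects and interest in mind, back to the supercluster.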
This means that, for example, if a client connected to cluster A wants to subscribe to messages hosted in cluster B, this will be possible seamlessly. Messages will transparently flow across all clusters without the client needing any knowledge of the physical location. So let's say, for example, you have three different local message buses split across three different regions. You will be able to build a supercluster with all of these clusters connected together. For doing that, the supercluster uses gateways. So what's a gateway? A gateway enables connecting one or more clusters together into a full mesh network; this is what allows the formation of superclusters. Gateways exist to reduce the number of connections required between servers and to optimize interest graph propagation. Another great feature included in NATS is the leaf node. A leaf node extends an existing system of any size, optionally bridging both operator and security domains. The leaf node will transparently route messages from local clients to one or more remote NATS systems. And the last one is monitoring. What's great with NATS is that you have a built-in HTTP server that provides JSON endpoints. This is really cool if you want to scrape metrics with Prometheus or something else. Now, let's talk about our architecture and platform at XBTO. A few metrics about NATS at XBTO: currently, we have more than 100 nodes, 34 clusters, five superclusters, four leaf nodes, and more than 5,000 clients. We handle around 80 billion messages per month, which means approximately 60 terabytes of traffic each month. About the stack: we mostly use NATS 2.9 and 2.10, deployed mostly on Kubernetes, but we also use some standard Linux binaries; we'll talk about that later. Most of our clients are written in Python, .NET, and JavaScript. We also use Protobuf; it's really cool.
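Putting the pieces together, a single NATS server config ties the cluster, gateway, leaf node, and monitoring features above together. A hedged sketch, where all names, ports, and hostnames are illustrative placeholders rather than our production values:

```conf
# Hypothetical server in cluster "us" of a supercluster.
port: 4222

# Built-in HTTP monitoring (JSON endpoints such as /varz, /connz, /routez).
http: 8222

cluster {
  name: "us"
  port: 6222
  # One seed route is enough; gossip propagates the rest of the topology.
  routes: [ "nats-route://us-node-1.example.com:6222" ]
}

gateway {
  name: "us"
  port: 7222
  gateways: [
    { name: "jp", url: "nats://jp-gw.example.com:7222" }
    { name: "uk", url: "nats://uk-gw.example.com:7222" }
  ]
}

leafnodes {
  port: 7422   # remote leaf nodes can dial in here
}
```

Each regional cluster would run a variant of this, with its own cluster name and the remote gateways pointing at the other regions.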
We have built a custom Prometheus exporter: we added new features, extending the NATS Prometheus exporter available on GitHub. Our monitoring stack runs on Mimir, Grafana, and the OnCall service hosted at Grafana Labs. And at the end of this talk we will also show you a homemade tool that we created, called NATS Map, which gives us a custom monitoring interface. Here you can see our current architecture at XBTO. We have three main regions: one in the US, one in Japan, and one in the UK. We use a hybrid infrastructure: we have some private cloud, and we also use AWS. Between the US and Japan, we use a Direct Connect link, a dedicated fiber link between AWS and our own data centers. This is really great for stability. Between the US and the UK, and Japan and the UK, we just use a good old VPN over the internet, and that does the trick. In each region, all our processes run in multiple Kubernetes clusters: EKS on AWS, but also RKE on-prem, the Rancher distribution. I think soon we will migrate to Talos; it's on the roadmap, but for the moment we still use RKE. And in each region, we have a local NATS bus: in the US, in Japan, and in the UK. It's a great setup, and the great thing is that the Direct Connect with AWS allows us to route any traffic from the United States to Japan with what they call the Direct Connect gateway. So we can reach every region we want over the same link in AWS. Now let's deep dive into the supercluster and how we built it. The goal was to interconnect the different regions. We started with only one region, then we had to expand to new regions, and we keep expanding and adding more regions in the coming months. The big thing is that it gives us a full mesh network inside each regional cluster.
As you can see on the schema here, in region A we have a full mesh NATS cluster, and we have a second cluster in region B which is also full mesh, but every node in cluster A is connected to one node in cluster B, so the link between clusters is not completely full mesh, I would say. It works with inbound and outbound gateways: a gateway can receive connections (inbound) and initiate them (outbound). This is a really important feature for another thing, which is scalability and decentralized architecture, because it allows us to quickly add new clusters as we expand into new regions. It also minimizes round-trip delay; we are a trading company, so latency is key for us, and we need to have the best routes and optimize them as we go. Two important features that we use: the first is interest-based routing. For example, in this schema, market data process A publishes on a subject and it's consumed by trading process A; since it's local, it stays local. It will never be propagated to region B, which reduces costs and the number of messages. That's great, but if you need to exchange messages from market data process C to trading process C, which is in another region, it will be completely transparent. You won't even have to know on which cluster your process is deployed. And finally, the gossiping. Gossiping is a really interesting feature in NATS where the cluster topology is updated dynamically without any interaction. Meaning, if we want to add a region C to this setup, we would just have to put at least one node of cluster A or cluster B in the config of the new cluster, and it will discover and connect to every cluster, with nothing to change in cluster A or cluster B. That's really great for us. Now, let's talk about one of our superclusters. We have five of them like this, and we also have other smaller clusters, but this is the main one. We use EC2 instances to run on AWS.
We'll dive into this later. We have a bunch of processes on Kubernetes, so EKS on AWS, and on-prem we also have processes on Kubernetes. We have some time series databases that are also connected to NATS using the clients, and we have standalone processes too. So we really have a lot of processes, with different stacks and different clients, that are connected to the same infrastructure and communicate easily between the regions, building a truly decentralized architecture: if we lose, say, region B, the rest will still be functional. Now let's talk about monitoring. How do we monitor everything? One thing that is really important for us is that if we have an issue in one of these core regions, we can still see what happens and continue to monitor the other ones. So we decided to set up our monitoring platform outside the production infrastructure. For that, we use a self-hosted Mimir, deployed with two different providers in two different countries: one in North America, in Canada, and one in Europe, in Germany. We chose single instances with high CPU and memory and fast local SSD storage. We also use K3s, because we wanted to stay in the Kubernetes ecosystem; it's easier for us to deploy Mimir and maintain its lifecycle, that type of thing. And on each node, we have local Mimir object storage, so we have great IO for writing the Mimir data to the object storage locally. On this platform (it's not just for NATS), we reach 5 million active series, which means we ingest around 90,000 samples per second through the ingesters that sit behind the Mimir API. We also chose Mimir because, as some of you may know, it is maintained by Grafana Labs; it was called Cortex a few years ago and they rebuilt it. If Grafana Labs uses it for their customers, we figured it can scale really well for our use case. Now, how do we collect the data from each NATS bus?
As I said previously, we have JSON endpoints with the NATS metrics available inside NATS. We use our custom Prometheus exporter, with extra features, which scrapes those JSON endpoints to get the metrics. Then we use Grafana Agent to scrape the data from the Prometheus exporter and push it to the Mimir API with the Prometheus remote write feature. Why do we use Grafana Agent and not just a big Prometheus server that scrapes the exporter? Because Grafana Agent is like a small Prometheus: it's really light, and for our use case it's better. We have this agent on each node, so we don't need a big Prometheus server where you need redundancy, at least two replicas in case of failure, that type of thing. This is really great. For visualization and alerting, we don't want to have to maintain our own Grafana; the folks at Grafana Labs are doing a great job, and they build advanced features that are only available on their platform. So we use Grafana, the SaaS version at Grafana Labs, to visualize and create alerts. We also use a great service called OnCall for managing urgent and critical alerts; it can send SMS, or a bot can even call you if there is an emergency on the system. And the last great feature that Grafana Labs provides is synthetic monitoring. Basically, it's like the Prometheus Blackbox exporter that is officially available on GitHub. The only difference is that with the Grafana Labs version, you can create all the configuration on the portal, in the web UI. Synthetic monitoring is mostly used for doing probe checks.
So I just have to go to the Grafana portal and say: OK, I want to pick this probe. You can set up your own probes, or you can use probes provided by Grafana all around the world. You just say: I pick this probe, I want to check the latency to this service, whether it's ICMP, TCP, or even advanced HTTP where, for example, you need an API token; you can set that up on the platform, you just click on set up, and it's done. And you automatically get graphs that show you what happened: the latencies, or the HTTP status codes if you're probing HTTP. Now, a slide that some of you may not like: why we migrated out of Kubernetes. First, we still have NATS running on Kubernetes for some use cases, but we decided to migrate out for the big superclusters and the trading-intensive clusters. Why? First, dedicated resources: for NATS networking and latency, you need dedicated resources. We really want high throughput, and we don't want CPU issues when you have shared instances and some other process misbehaves. Dedicated instances are better for that, but they're more expensive, obviously, and maintenance is harder because you have to maintain everything yourself; it's not the same stack, it's not standardized. Another thing is the dependency on Kubernetes updates. We are using EKS, which has its own release lifecycle, and we had issues with disconnections: we need our NATS buses to run for months, even years, without any disconnections. When EKS forced us to update, it created downtime, and we obviously don't want that for some kinds of clusters. That's also why we migrated out, even though it means harder lifecycle management, because you have to do everything yourself.
And for tuning, it's also simpler and it works well, because you don't have the Kubernetes overhead, which is small, admittedly, but still there. It's easier to identify NATS traffic because you have dedicated instances, so you can directly see what traffic is going in and out. One thing that was really difficult for us at some point was the load balancer. We used ingresses and load balancers in Kubernetes to expose NATS, but NATS has really intelligent features to optimize routing itself: when you have a disconnection, it tries to find another node that is still up, and the load balancer was adding complexity on top of that. So we decided to get rid of the load balancer, and it works really well by itself. And the last point, which is also more difficult, is configuration management, because we had to create custom tooling, Terraform, and all that stuff to ensure that we have the same configuration at all times on all of our nodes; secret rotations are also a bit harder. Next, the challenges we faced. It's mostly about tuning and optimizations. We've been running NATS for three or four years now; we started small, we built on NATS, there were updates, and many features keep coming. The first one I just spoke about: reliable networking. We can't have downtime; even a disconnection is considered downtime for us. We wanted something that can run for years without having to touch it. That's why we migrated to the new infrastructure, which is fully standalone, on VMs on-premise and on EC2 instances on AWS, and we rely only on NATS for routing optimizations and everything related to the network. Basically, it runs on Docker with host networking; we really try to have the simplest networking possible. The second point was scalability. We started with only one cluster, then we expanded to a new region; we are expanding to a third region, and we will expand to a fourth region soon.
So we really want something scalable, where we can grow dynamically without downtime. As some of you may know, you cannot hot reload the routes and gateway configurations: you have to restart the NATS binary when you want to update those configs, which is disruptive for us. What we did is extensively use the gossiping system, where you can dynamically grow your cluster, as I said before: you just point at an endpoint of a running NATS node and everything connects together. And we use the mandatory downtime that we create to update NATS and do hardware maintenance, and at the same time to put the new cluster into the configurations. So we do two things at once. The third thing was unified authentication and authorization. We started with nothing, then we added basic authorization. And at some point, when we moved to the supercluster, we also had to unify authentication between all the NATS clusters. That was really difficult to handle, because handling authentication at the regional cluster level is fine: you can enable it quite easily, and it will work even if a remote cluster doesn't have authentication, but only if you use the default account. And that was our case: we used the default account, the $G account. But when you run superclusters and you want to enable new accounts, as in our case, where we wanted to use the system account and logically separate things, because we have multiple clusters physically separated on EC2 instances but we also want to use the powerful logical separation in NATS, to enable that, we had to take a downtime, because you cannot enable authorization at the supercluster level without bringing everything down and bringing everything back up. There might be ways to do it, but it was really difficult for us, and we decided it was better to just bring everything down and bring it back up. Another feature for authentication is no_auth_user.
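As a rough sketch of what those authentication pieces look like in a NATS server config (the account names, users, and passwords here are invented for illustration, not our real setup):

```conf
# Hedged sketch -- accounts give logical separation inside one supercluster.
accounts {
  SYS: {
    users: [ { user: "sys", password: "s3cret" } ]
  }
  TRADING: {
    users: [ { user: "legacy", password: "changeme" } ]
  }
}

# Dedicated system account for server events and monitoring.
system_account: SYS

# Clients that connect WITHOUT credentials are mapped to this user,
# and therefore inherit its account and permissions.
no_auth_user: "legacy"
```

The last line is the no_auth_user feature: unauthenticated clients are treated as the named user instead of being rejected.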
I don't know if some of you have used it, but it's really great, because we have a mix of authenticated and non-authenticated clients due to legacy software. no_auth_user lets you give a default set of permissions that will be used when a client does not provide authentication. It really enabled a smooth transition from non-authenticated to authenticated clients. And at the end, the global overview: we have more than 100 nodes, and we keep expanding. The metrics are great, but we also wanted a better way of visualizing the whole infrastructure, because it's really critical for us; it's the core of our business. If NATS is down for even a few seconds, we can lose a lot of money. So we need to know at all times what is happening, and if NATS is failing, we need to know before the business does, basically. We also need to quickly identify bottlenecks in the infrastructure, because as we rely only on NATS to decide by itself how to load balance traffic, we need to know if at some point a node is overcrowded or has too much traffic on it. For that, we built custom tools. We forked the official Prometheus exporter to add more metrics and compute everything, and we created a new, simple web dashboard to have better observability on NATS. I will present it quickly. It's not a live demo, because I could not replicate the whole infrastructure without leaking some information, so it's only screenshots, I'm sorry. Basically, it's a web UI with a Python backend, and some TypeScript and React for the front end. And obviously, it's running on Kubernetes, because there we can afford downtime; it's not an issue. We extensively use the NATS monitoring endpoints, the REST endpoints that are built in, but we want to switch to the system account and basically use NATS to monitor NATS. It's used by the infra team and the developers.
On the infra side, we use it to monitor the infrastructure, and the developers use it to know, at any point, for their processes: where are they connected? What are their subscriptions? And how many messages are going through? You have all of this in the metrics, but here you can see it visually, in a list; it's easier for them. So yeah, that's it for the NATS Map. We built a graph; this is one of our clusters. You can see three dots, our regional clusters, connected to the others. This is not dynamic on the screen, but if you click on a node, you get a list and an overview of the other nodes that are connected to it. It's really great for us when we need to do maintenance, because we can know beforehand which nodes are connected to this one, and when we reboot, we can see dynamically which node clients actually reconnect to, because it's not something we can predict. One thing you don't see here is that you can also zoom in and out to see, for example, what happens between these three local buses. This is a big overview, but you can get more details. This one is just a list, but there are many, many details in the details pane that I cannot present today. It's really useful for us to have an overview of all of the nodes, all of our buses, and everything running there. It's really simple, but it does the job right now. So now, let's conclude. To conclude, we are really happy with the current architecture. It's really scalable, and we think NATS fits well with our use case, so we will continue to use it. The folks from Synadia are currently including many great features in the next release, so feel free to check the latest release notes.
I think the new version is coming in a few weeks. We will try to expand and keep working on this infrastructure, and also add more features to our NATS Map. It's not open sourced for the moment; maybe one day we will push it to GitHub, I don't know. I'm not sure I want everyone to look at the code. But we will see. That's it. Any questions, maybe?

Hi, I'm just curious why you chose to go stateless and in-memory, and not on disk. Wouldn't that be safer?

Yeah. First, performance. And because persistence in NATS is available now with JetStream; before that, it was not really available. When we decided to build this architecture, what we wanted was just full performance. And because we have three replicas in each region, we don't really need persistence: if a failure happens on one bus, for example, we still have the two other ones running. So that's why we decided to go stateless: for performance, and because JetStream was not available before. And basically, the trading processes need to go fast. We have reconciliation built into the trading platform, but it needs to go fast; even if a message is not delivered, it will be delivered again, but it needs to go fast, and we don't want writes to disk for that. I think we will stay stateless. It's much better for us because it's a really specific trading use case. For other services at XBTO, we do use persistence; we've started using JetStream. We still have one old Kafka cluster for another project, but maybe we will replace Kafka with JetStream in the future for that project. So it's mainly the trading part where we don't use JetStream; we use NATS with JetStream for other specific use cases.

Do you have latency metrics within your supercluster, between each region that makes up the supercluster?
So currently, between Europe and Japan, we have around 200 milliseconds, if I remember correctly. We're currently working on a new project to lower that: we will use a new global private network provider, so we should achieve around 100 milliseconds between Europe and Japan. Between the US and the UK, I don't remember exactly, it's around 80 milliseconds, something like that. But between the US and Japan, what's great is that we have the Direct Connect link, so we use the internal backbone of AWS to route everything. We did that because when you're on the AWS backbone, it's really stable, and you can reach every AWS region. It's not the best latency available on the market, but it's really stable and we don't have any issues. And that's the infrastructure part; the most latency-sensitive processes usually live in the same region. We put everything in the same region, so regional clusters are really latency sensitive, but for superclusters, it's basically OK to have a bit more latency. When we have really, really latency-sensitive processes, they run directly on the same host and are optimized in another way.

Hello, thanks for the talk, it was really great. I have two questions. Do you do performance tests on your NATS infrastructure, and if you do, do you recommend any tools that helped you? And I also wanted to know, how do you handle back pressure on your NATS infrastructure? Thank you.

Thank you for the question. Basically, we have two setups. We presented the prod one, but we have a fully similar setup as a testing environment: exactly the same VMs and EC2 instances. So it's expensive and less used, but we do pressure tests on that one.
Usually, we have all the same stuff running for testing, obviously, but at some point we just generate more and more messages to test beforehand whether the NATS infrastructure will handle the new load, I would say. And when we do migrations, we test everything on UAT; we try all types of scenarios to really verify that our clients are able to reconcile everything. One feature that will be really great, normally in the next release, is lame duck mode. Lame duck mode is a signal sent from NATS to your clients to say: hey, I'm stopping in two minutes, so do what you have to do, but I'm stopping and you should know. That's something we're really working on adding to our processes, so we can directly reconnect to another node if we have an issue. And later, we would also like to measure latency between the client and NATS, and if we see that it's too high, connect to another node directly to reduce the load. Also, there is nats-box, a great tool provided with NATS that also lets you do benchmarks. You can use that to test your performance and see how many messages you can reach; that's really, really cool.

Hi, a couple of questions about your Kubernetes setup. You're using RKE1, probably; do you manage it with Rancher server, and how big is the cluster? Is it bare metal nodes or VMs? And last question: why did you decide to migrate to Talos?

So, as you know, Rancher was acquired by SUSE more than two years ago now. First, we are just using the Rancher manager to be able to give access to our developers with the projects feature from Rancher. This is really great; it's basically a Kubernetes proxy that allows your users to easily connect to any cluster managed by the Rancher manager.
Also, two years ago we were not using GitOps: we were deploying our clusters by hand, setting up Helm charts by hand, for example Prometheus or the ingress controller, so it was a pain. What we wanted at that time was to move to the GitOps methodology. I love Flux, for example, but Rancher was providing a tool called Fleet. When we explored the market, we told ourselves: OK, we have this Rancher multi-cluster manager, which is great, and they now integrate a Flux-like component inside the Rancher manager, so we will use Fleet. It was a big nightmare, to be honest. Because Fleet is not like Flux: the product was not really stable, and they were shipping major releases that broke many things on the Kubernetes clusters managed by Rancher. So we decided to stop that, and now we want to reduce our footprint with Rancher. And to be honest, we discovered Talos in the last few years, and I think it's a mature technology now. What's great is that it's a light distribution: you don't need SSH on it anymore, which is great for security. You don't have to take care of, for example, patching Ubuntu each month because of security fixes. And you can deploy your cluster easily with just the Talos command line, in a second; same for maintaining the lifecycle, upgrading your cluster, or adding more nodes. To be honest, I think for the on-prem part, moving to Talos will be a great benefit for us, and it will be easier to manage our Kubernetes clusters in the future with just the Talos CLI. And just to add to that: also the lifecycle, because on EKS you're almost always at the latest version of Kubernetes, and on Rancher it's a bit more difficult to stay at the latest, so sometimes we have different Kubernetes versions for the same projects. That's also one of the reasons for moving away from Rancher. But we are also really happy with the product.
It was working well and is still working well. All right, no more questions. So thanks a lot, everybody. You have a QR code on the last slide, so if you want to give us some feedback, that would be really cool. Feel free to do it. Thanks.