I'm Alexis; you can find me on the web as Ultrabug. I'm a Gentoo Linux developer, often part of the Python team and the cluster team, and I maintain various packages related to NoSQL key-value stores, such as Redis, and message queuing technologies. In my professional life, I am CTO at a company named Numberly, where we do programmatic and data-driven marketing and advertising for our clients. Today I'm going to talk to you about designing a scalable and distributed application. As you may have seen, this year we have quite a lot of DevOps talks at EuroPython, which is actually a good thing, and this is one of them. I will not show a lot of Python code on the slides themselves; instead, I will demonstrate a full stack for running distributed Python at scale. Just as a disclaimer: there is no definitive way of doing this kind of thing, so I will simply share my experience and give some guidelines that I have found interesting for addressing this kind of design, and I will do something very perilous, which is to showcase a real application in a live demo. That's what I promised, so I will try my best to pull it off. So what are we going to do? We are going to design a geo-distributed page hit counter web application, and I'm going to quickly explain the steps we will follow. I will start by explaining and agreeing with you on the application's contract, which defines the goals and functionality we expect our application to provide. Then I will continue with some philosophical guidelines that I used to design this application. Then I will present the stack I chose and explain why, then we will talk about service discovery and implement it in our application, and we will end with the live demo and maybe an open discussion. So let's start with the first point of our contract: geo-distributed means multi-data-center.
We expect our application to provide the same level of functionality around the world. In this talk I will not cover how you do GeoDNS to direct your users to the closest data center; instead I will focus on the application itself. The second point of our application's contract is that the web application will display the sum of the page hits from all our data centers around the world. The counter displayed should be the same for all users, wherever they come from. The third point of our contract is that scaling our application out or down should take no manual operation or reconfiguration on the application side, even when we add a whole data center. That means our application will be able to grow or shrink itself automatically, and this will obviously provide some kind of fault tolerance. The last point of our contract is that the background color of the web service will be configurable, and when we change this configuration, it should be made available to all the web services immediately, in all the data centers. So we configure it once, and it's deployed and taken into account immediately everywhere. Okay, that's our contract. What I think is that solving this kind of complicated problem usually depends more on pragmatic technological choices than on pure coding skills. Let me first share some guidelines I have found interesting over time for addressing this kind of issue. The first one is that your stack is what makes your code run and what makes your application accessible. For this reason, I really favor choosing tools that offer a maximum of features that developers and operations people can both benefit from. That's an important point: it allows all the involved parties to use and reuse robust functionality instead of having to implement it over and over.
The second point, which you may all know, is that the Zen of Python is a good philosophy that can help you choose the right technologies and implementations for your architecture. I avoid, as much as I can, using any black-box or magical technologies. This usually means that I tend to avoid technologies that I wouldn't be able to explain to my mom or dad in less than five minutes. The other one is a good story that you may already know: one day there were those guys who had to build tools to manipulate text files. They could have written one big program that did everything on its own, with a lot of options and so on. Instead, they created tools like cat, grep, and sed, each specialized in one particular task of file manipulation, and then they created pipes so we could combine them all. Et voilà. This proved to be a great design over the years, I guess, and we can absolutely apply this kind of design to distributed applications. It ends up with breaking our application down into small components, where each component provides a small and simple service. Isolation also means that these components should be really autonomous, so we really have to resist the temptation of sharing any kind of state between them. You can relate this to REST APIs, which are a good example, because we use and reuse them in our own applications, and they can be seen as components isolated from our application. But how big can a component really be? Well, that's when you start thinking microservices, right? That's the trendy word today, and it's getting more and more popular, which is a good thing. But actually, microservices are nothing new: they are just the extreme version of component isolation. They are, in a sense, a distributed architecture style, and they actually suffer from the same trade-offs. And one more, which is that when you talk about microservices, you talk about micromanagement.
You have to work hard on automation, and this can take time. So I recommend finding a good balance between your needs and the added orchestration complexity they put in place. You have to ask yourself: do I really need to split this up? Because the more you split your components, the more you rely on their ability to communicate with each other. This is true for every distributed architecture and every distributed design, actually. Remote communication is slower and can be unreliable in case of network failure or latency, which you cannot control. This gets even worse when you start using the internet to connect your components together. I find that message and job queuing technologies are a good choice for addressing these kinds of issues, because they provide by themselves some kind of network fault tolerance mechanism. But what about when our components still cannot communicate with each other? That's when our application can become eventually consistent. I think this is a major point, because it has a great impact on how you design your application. We have to decide where we can accept that kind of state in our code or architecture, and make compromises. So now let's talk about our application stack. In my case, I chose nginx and uWSGI. I chose nginx because it's very fast and offers a lot of interesting features at the HTTP level; we will use it as the main entrance of our web services. uWSGI, on the other hand, is a fast and pluggable application server which will run our code. It's written in C and was designed from the start with Python as its primary supported language, but it doesn't only support Python. It offers strong and proven features which we can use natively with Python, such as async loops for gevent and asyncio, virtualenvs, pooling, and metrics support. And what's good about them is that they integrate with each other easily: there are configuration options in nginx to speak with uWSGI.
So now let's review our application's components. The first one is what we will agree to call the collector. The collector receives HTTP requests, and for each HTTP request we generate a hit job for our backend processor. We then query the total hit count, which we display back to the user. It's pretty straightforward on paper. Then we have the processor. The processor consumes jobs and increments a counter for each job it consumes. There again, it's pretty straightforward. Now let's see how we can use the stack for those components. We end up having the collector component on the left; this actually represents a whole server, and the processor runs on its own server as well, so there is a clear separation between them. nginx is at the front of our collector web service, and it passes the request through uWSGI down to our code, which in this case is written in Flask and uses a gevent asynchronous loop. The processor component is the backend responsible for calculating the total hit sum. uWSGI runs this simple Python code as a mule; that's the uWSGI term, but you can see it as a sort of daemon. So that's just pure Python execution, nothing else. Now we have these two separated components running on their own servers, and we need a tool to exchange jobs between them. That's exactly what beanstalkd was designed to do, in a blazing fast and reliable way. I chose beanstalkd over other queuing technologies such as ZeroMQ or RabbitMQ because its core design is just like memcached's, for those of you who have already tried it: it does one thing and does it simply and fast. It is very simple to set up and very easy to operate: there's almost nothing to configure, and it offers persistence for fault tolerance. Spawning it using uWSGI is really simple: uWSGI monitors our beanstalkd and respawns the beanstalkd server on the fly for us in case of a sudden crash, and the configuration line is just the actual command that you would run in your terminal.
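To make the two components concrete, here is a minimal, hypothetical sketch of their logic in pure Python, with a deque standing in for the beanstalkd tube and a plain dict standing in for the counter store (the real collector is a Flask app running under uWSGI with gevent, and the real processor is a uWSGI mule talking to beanstalkd; the function names here are illustrative, not from the actual source):

```python
import json
from collections import deque

def collect_hit(tube, counters, dc="eu-west"):
    """Collector: enqueue one hit job, then return the sum to display.

    `tube` stands in for the local beanstalkd tube, `counters` for the
    per-data-center counter store.
    """
    tube.append(json.dumps({"type": "hit", "dc": dc}))
    return sum(counters.values())

def process_jobs(tube, counters, dc="eu-west"):
    """Processor: consume pending jobs, incrementing this DC's counter.

    With real beanstalkd this would be a reserve / handle / delete loop.
    """
    while tube:
        job = json.loads(tube.popleft())
        if job["type"] == "hit":
            counters[dc] = counters.get(dc, 0) + 1

tube, counters = deque(), {}
for _ in range(5):
    collect_hit(tube, counters)   # five hits queued, not yet processed
process_jobs(tube, counters)      # counters is now {"eu-west": 5}
```

The point of the split is visible even in this toy form: the collector never touches the counter directly, it only enqueues work and reads whatever the processors have already counted.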
But now the last question is: where do I run this beanstalkd service? Do I split it onto its own server, the microservices way, or do I include it within one of our components, and if so, which one? Well, in this case I want to make sure that I never lose any hit count. That means I need strong locality between the collector and beanstalkd; I don't want to have communication problems between them, so I bundle them on the same server, in the same component. The compromise I accept with this is that my processor can become eventually consistent in case of a network failure in between: if the processor can't get the jobs from beanstalkd, then my counter will not be incremented. That means my application becomes eventually consistent. Now let's see how this scales out within one data center. Duplicating collectors has an impact on the processors: every time I add a collector, I need to reconfigure, for now by hand, each of my processors so that they connect to the beanstalkd instances of all the collectors and pull jobs from them. Duplicating our processors means that we need some kind of external database so they can share a single counter. The two processors you can see here increment the data center's counter for each job they get, together, in parallel. Then the collector can access this same counter and display it to the users. How does it work if I start spanning this over multiple data centers? Well, we can actually keep one local counter per data center; the collectors then need a way to access the count from all the data centers so they can sum the counters, which in this case would give 500, and display the result to the user. We need a system that will allow our components to detect each other automatically. That is what service discovery is for. You can see service discovery as a sort of dynamic implementation of DNS.
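The manual reconfiguration problem above is exactly what service discovery automates: instead of editing processor configs every time a collector appears, each processor periodically compares the beanstalkd hosts it is connected to against what the catalog currently advertises. A tiny sketch of that decision, with made-up host addresses:

```python
def refresh_connections(current, discovered):
    """Return (hosts to connect to, hosts to drop), given the beanstalkd
    hosts a processor currently uses and the set the service discovery
    catalog advertises right now."""
    to_connect = sorted(set(discovered) - set(current))
    to_drop = sorted(set(current) - set(discovered))
    return to_connect, to_drop

# A second collector was just added: the catalog now lists two beanstalkds.
connect, drop = refresh_connections(["10.0.0.1"], ["10.0.0.1", "10.0.0.2"])
# connect == ["10.0.0.2"], drop == []
```

When a collector disappears, the same comparison tells the processor which connection to tear down, so scaling in either direction needs no manual operation.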
With DNS, you request a domain name, you get a server IP whether it's up or down, and your browser connects to it. Service discovery, on the other hand, is dynamic. That means you query a catalog for a given service and you get a list of all the available hosts providing that service. If one of those hosts becomes unavailable, through a clean shutdown or a failure, it is removed from the catalog immediately and your application stops connecting to it. It's as simple as that. There are a few service discovery servers available, providing different kinds of features, such as ZooKeeper, etcd, and Consul. I chose Consul because it provides all the features I need to address the limitations we just talked about. It is written in Go, and it is very easy to use and deploy. There are several Python libraries for Consul; in our demo I used Consulate for this application. So now let's dive into Consul and see how it works. In each data center we have a Consul cluster, which is usually made of three Consul servers. One of them is the local data center's Consul leader. Each Consul cluster offers a key-value store; it's just a local key-value store, and you can put anything in it. Then you use agents to interact with the Consul cluster: your services register themselves through the Consul agents, and your clients can look up a service in the catalog or query the key-value store using the agents of the Consul cluster. The Consul cluster can also be queried using standard DNS, or an HTTP API if you want to do it yourself. Finally, we connect the different Consul clusters to each other using the WAN gossip protocol, through the internet here. This is a simple configuration that you add on each Consul cluster so they know where to connect; then they join and communicate with each other naturally.
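The difference from static DNS can be modeled in a few lines. This is a toy stand-in for what the Consul catalog gives you, not Consul's actual API (the real thing is the Consul HTTP API or a client such as Consulate); the class and host addresses are invented for illustration:

```python
class Catalog:
    """Toy model of a service discovery catalog: a service maps to the
    set of currently healthy hosts, and failed hosts disappear."""

    def __init__(self):
        self._services = {}

    def register(self, service, host):
        self._services.setdefault(service, set()).add(host)

    def deregister(self, service, host):
        # Called on clean shutdown, or by health checking on failure.
        self._services.get(service, set()).discard(host)

    def lookup(self, service):
        # Unlike a static DNS record, only live hosts are ever returned.
        return sorted(self._services.get(service, set()))

catalog = Catalog()
catalog.register("beanstalkd", "10.0.0.1")
catalog.register("beanstalkd", "10.0.0.2")
catalog.deregister("beanstalkd", "10.0.0.1")  # this host just failed
catalog.lookup("beanstalkd")                  # → ["10.0.0.2"]
```

A client that always goes through `lookup` before connecting simply never sees dead hosts, which is the whole point.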
One great thing about Consul and uWSGI is that there is a plugin integration of Consul into uWSGI. It automatically registers your application in the cluster once it has started successfully. Then uWSGI handles the health checking for you: it periodically sends health checks saying, hey, the application is still alive, you can keep it in the catalog, which keeps the catalog fresh, and you don't have to code any of it yourself. Once again, your stack helps you. And if our application happens to fail, or even if uWSGI as a whole fails, then the service will be removed from the Consul cluster automatically. It is very easy to use; it's just one line you add in the uWSGI file I showed you earlier. There's not much more to say about it. Okay, so we have all those bricks, and we are finally ready to put all the pieces together. So let's build this up. We had the collector and processor components. Then we had communication between them using beanstalkd. Then we had a central counter, and we could start scaling out our collectors and our processors. Then we used the key-value store of the Consul cluster to handle the counter for us, and we used service discovery to allow our processors to detect every beanstalkd service in our topology and get jobs from them. And now we can add another data center to our topology and connect the two Consul clusters together through the internet. All our components are aware of the presence of the two data centers, and we're done. One last note before the demo: our collectors should be doing the sum of each data center's counter. But how do you implement this kind of thing?
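The effect of that periodic health checking can be sketched with a heartbeat model. This is not how Consul implements its checks internally (Consul supports several richer check types); it is just a minimal stdlib illustration, with invented names, of why a crashed uWSGI disappears from the catalog on its own:

```python
class HealthCheckedCatalog:
    """Toy TTL-style health checking: a service instance stays listed
    only while heartbeats keep arriving within the TTL window."""

    def __init__(self, ttl=30):
        self.ttl = ttl
        self._last_seen = {}  # (service, host) -> last heartbeat time

    def heartbeat(self, service, host, now):
        # In our stack, uWSGI's Consul integration does this for us.
        self._last_seen[(service, host)] = now

    def healthy(self, service, now):
        # Instances whose heartbeats stopped are silently filtered out.
        return sorted(h for (s, h), t in self._last_seen.items()
                      if s == service and now - t < self.ttl)

cat = HealthCheckedCatalog(ttl=30)
cat.heartbeat("collector", "10.0.0.1", now=0)
cat.heartbeat("collector", "10.0.0.2", now=0)
cat.heartbeat("collector", "10.0.0.1", now=25)  # .2 has stopped beating
cat.healthy("collector", now=40)                # → ["10.0.0.1"]
```

Nobody deregistered `10.0.0.2`; it fell out of the healthy set simply because its process stopped reporting, which is exactly what happens when uWSGI dies.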
One option is that they connect to every available data center and query the counter there: the collectors connect to this Consul cluster and get the counter of 100, then auto-detect that there is another data center, connect to it through the internet, get the counter over there, and make the sum. But what happens if there is an internet problem, a communication problem between our data centers? That doesn't scale, right? So instead, each time a processor increments its local counter, each time it does plus one, it will itself detect that there is another data center available in our topology, connect to it, and write its own counter value (the US counter here, the Europe counter there) into the key-value store of the opposite data center. Now, when the collectors need the list and the sum of all the counters from all the data centers, they can just query the local key-value store of their own Consul cluster. That's all. That's locality, and we avoid any kind of problem or inconsistency for our web application. At the end of the live demo, I will showcase an internet problem, so you will be able to actually see this happen. Wow. Okay. I don't know about you, but I feel kind of relaxed now, though that sounds mind-blowing, you know? Okay, so here I take the big risk of the day and we go for a live demo. Let's go. On the right side, you have the European stuff; on the left side, you have the US stuff. I will start by showing you the Consul UI, which comes directly with Consul. So we start with this: the European data center, EU West, and we will have US West. For now, it's down on purpose. We only see the Consul leader, which is available. There is only one node, which is itself, and there is nothing in our key-value store. Fine. So that, I forgot to prepare; that's the stress. Okay. And this is the Consul log. All right.
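The push-on-increment design described above can be sketched as follows, with plain dicts standing in for each data center's Consul key-value store (the real application writes over the WAN through Consul; the function names are invented for this illustration):

```python
def increment_and_replicate(local_dc, kv_stores):
    """Increment the local counter, then push our value to every other
    data center's KV store.

    `kv_stores` maps data center name to its (simulated) Consul KV dict.
    The remote writes ride the WAN link; collectors never read remotely.
    """
    key = f"count/{local_dc}"
    kv_stores[local_dc][key] = kv_stores[local_dc].get(key, 0) + 1
    value = kv_stores[local_dc][key]
    for dc, kv in kv_stores.items():
        if dc != local_dc:
            kv[key] = value  # replicate our counter to the remote KV

def local_sum(dc, kv_stores):
    """What a collector displays: the sum of every counter key found in
    its own local KV store, with no cross-data-center query."""
    return sum(v for k, v in kv_stores[dc].items() if k.startswith("count/"))

kvs = {"eu-west": {}, "us-west": {}}
for _ in range(100):
    increment_and_replicate("eu-west", kvs)
for _ in range(400):
    increment_and_replicate("us-west", kvs)
local_sum("eu-west", kvs)  # → 500, read entirely from the local KV
```

If the WAN link dies, the replication writes stop, but each collector keeps serving a consistent (if momentarily stale) sum from its local store, which is exactly what the end of the demo shows.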
So now I will start a collector, right? I'm getting my web service up. What you can see here is that uWSGI already detected my collector and registered it in Consul for me. We can see now that in the services, I indeed have a collector and a beanstalkd service available in my cluster. A new node appeared, but the key-value store is still empty. I will now load my web service. Okay, it responded correctly, and the sum is zero for now. I just did one hit. I do two hits, three, four, five. What's happening? Well, I still don't have a processor service available, so for now my collector is just putting hit jobs in my beanstalkd server. They are staying there; they need to be pulled from there and processed by my processor service, so they can be inserted into the key-value store, which is then displayed over here. Okay, so let's do it. I'm going to connect to the processor machine now and start my processor service. Okay, my processor service started, right? And what we can see, okay, is that some people actually connected to the service, yeah, because we have nine jobs around here. Okay, well, anyway. My processor service started in the EU West data center, and that's what you can see here. Once again, uWSGI registered it in the Consul cluster, and that allowed our processor to detect that there was one beanstalkd service available in this data center. It connected to it, which is this machine, the collector machine, discovered that there were nine jobs on it, and then went and incremented the count/eu-west key to 9, which is the number of hits we had at the time. Now, I think, yeah, okay, now some of you are just hammering it. Anyway, it's perfect, really. So you can see here that, yeah, all right, this is not a talk about load testing, guys. All right, it's okay.
So yeah, actually, if I now reload my web service application here, I can see that it discovered that the EU West data center is available, that the hit count was 247 at the time, and that the sum of 247 plus itself is just 247. I can also see in my services that the processor is here and working as expected, there are three nodes now, and in the key-value store, indeed, I have count/eu-west, which is now 273. Okay, so this is working as expected. And right, just for the audience: if you could try not to hammer it too much, because now I'm going to start a new collector service. What's usually interesting to see is that as soon as I start it, it will be picked up by the processor, and the processor will say: hey, now there are two beanstalkd services available, and I will start pulling from them too. No, you don't want to stop. All right, okay, we'll try. Thank you. And, no, okay. You see that our processor immediately picked up the new one and said: okay, now I have two beanstalkd services, and now the load is distributed between my collectors, et voilà. So it scales out really easily. I can also see that my Consul cluster now has four nodes up, and that the key value keeps on growing, right? Okay. Another thing I promised in my contract is that the background color is configurable, so I will query this myself and reload it every second. Okay, I will now set the color in my EU West data center to green. This actually puts a job on the beanstalkd of the collector; it is then picked up by the processor, which in turn detects all the available data centers and sets the right color in the key-value store, which is in turn picked up by the collector web service and displayed to the user. This is what just happened before your eyes. And indeed, in the key-value store, I can now see that the color is green. Okay, so this seems to work, actually, which is pretty amazing, no?
And I can change it whenever I want, and it's immediately picked up by our web services, wherever they are. Okay, this was fun with one data center. Now, how does it work with another data center? Here, these are the Consul leader logs, okay, for Europe. I will start by adding the new Consul cluster in the US now. Okay, they picked each other up. Now you can see that in Europe, they see that a US West DC is available, and in the US, they picked up that the EU West DC is available. Fine. This time, I will start my processor in the US first. What happened here is that I implemented some kind of synchronization at the initialization of my US Consul cluster. When the processor started, it got registered as usual by uWSGI in the US data center. Then it picked up that there is one European data center available, so it went there, discovered that there was a key named color with the value purple, and synchronized it into the US store. Then it did the same thing for the counter available there, which was 1882 at the time. And finally, it saw that there is no beanstalkd service available in the US yet. This is normal and expected: we didn't start any collector in the US yet. So let's connect to the US Consul cluster. We can see that we only have one service other than the Consul leader, which is the processor we just started. There are two nodes. And in the key-value store, I have the configured color, purple, available, I already have the EU West counter here, and the US West counter is initialized to zero. Okay. What you can actually see, if I keep hitting here, is that it's growing. Yes, because the processor on the European side picked up that the US data center is available, so now it's not only copying its own counter to Europe, it's also copying it to the US. Okay. But I still don't have a web service available in the US.
So that's what I'm going to do now. And here, what do you expect? Well, here we expect that our application will really start. And then, oh, you'll see for yourself. Look at this. Nothing. Okay, for now that's normal, because we haven't had any hits from the US side yet. So I will just start querying my US web service, which appeared in Europe as soon as the counter got incremented in the US. And now you can see that they are pretty much doing what our application's contract asked for, which is to display the sum of the hit counts from every data center. And I can still play a bit with my color thing and change the color, and it gets picked up everywhere around the world. It's not finished. Thank you. All right, just to finish up, and I'm on schedule: I promised that I would cut the communication between the two data centers, and you will hopefully understand why this copying of the counters is efficient at addressing inconsistency problems. So here I just stop the Consul server in the US. Okay, so now in the US, indeed, the sum is zero: I can't query the local key-value store, which depends on the Consul cluster. But what you can see here is that my summed hit count still remains consistent, and that is because the US counter was synchronized from the US to the European Consul key-value store. That was the point I was trying to make earlier; maybe I was not very clear about it, but now you can see it live. Then I can just start my Consul cluster in the US again, and everything gets picked up once more: the US side will reconnect to the cluster by itself and start doing its job as usual. Okay. Well, thank you. The source code is available on GitHub, so I encourage you to check it out. Maybe now that we have some time ahead of us, we can discuss this. It's not only about questions; like I said, it's an open discussion, because this is just the way I happened to implement it.
But I'm sure that you may have implemented it in other ways. You can... oh, yeah, there is one good thing about the source code: it is not just source code. I also provided all the Ansible playbooks to actually orchestrate and automate the installation of the whole stack you've just seen, so you can play with it. I did it on Amazon Web Services, since I guess that's the common standard for most people, but you can definitely adapt it to containers or whatever you want; that's not a problem at all. Well, thank you.

First of all, thank you very much for this awesome talk. I have a question: was there any reason not to use Redis for the key-value store? Because it provides a lot of rich functionality for working with values.

Yeah. That would have meant another component in my topology, which I really didn't need for this kind of application. So it really depends on what you are designing, actually. In this case, all I needed was a key-value store, and I could very easily implement its replication thanks to the service discovery also offered by Consul. So with only one technology, I could address everything at once. I didn't really need the added complexity or features of Redis to achieve anything else here. But yeah, you could definitely use it for your own needs.

Any more questions?

Thanks again, wonderful talk. And the UI is nice, I mean, the tiling window manager and stuff. Cool. So let me imagine a case where I have, let's say, a similar topology, but a slightly different task. For example, I implement a computer game. So I have two tanks; one client can fire a bullet, another client can fire a bullet, like a game. And I face the transport problem: transport can lag, and I have to calculate somehow both on the client and on the server. What would you suggest: either to calculate latency and somehow adapt on the client, or to put more logic on the server? If yes, how? Thanks.
That's a tricky question, which is about latency management. Well, if you need 400 milliseconds to go from one part of the world to the other, whether you implement this on the client or on the server is almost the same. The advantage you have on the client side is that usually you use GeoDNS to make sure that your user is close to the data center serving your game. So in this case, I would try as much as I can to mitigate this and put some logic on the client side as well. But if the user from the US fired first and someone from Europe fired after, whatever you do on the US client side, it will take 400 milliseconds for this information to reach the other server. So maybe you are looking for peer-to-peer types of connections instead of a star topology, you know. Maybe I would look in that kind of direction for your example.

Thank you for that. I want to add some points about gaming. Gaming is a better example for this talk, because in gaming you should implement all features on the server, because the client can be compromised, and you should work with this in mind. So in this case, the better way is to implement the logic on the server. Thanks.

Yeah. There is no definitive way. WebSockets are also a good thing for event-based interactions, so you can also benefit from the client side to handle these kinds of things. If you were here in 2013, there was a very good demonstration of a game, which was not distributed, but with a client side and a Python side using uWSGI, by the creator of uWSGI, Roberto. Maybe you should check it out, just to see how he played with the server part and the front part. But yeah, it's event-based on the front. And just to finish on games: when you look at MMO games such as EVE Online, which I played a lot at the time, there is no miracle solution. All the clients connect to one and only one data center. That's the point. So yeah, there is no big miracle anyway.
So you can mitigate, but not fully solve it.

Hi. First of all, thank you for the talk. I have a question. This case is a simple one, it's a counter, but what happens if we have something bigger to process and one of the processors dies? Let's say it's on a different machine. Then basically we would have an inconsistent state. How would you address that?

Okay, so the failure of a processor while it is processing a job, which can take time. Well, actually, it's already handled by beanstalkd. You have this reserve-and-delete mechanism, which is an acknowledgement protocol. That means that when you take a job, you reserve it, and you have by default two minutes to process it before beanstalkd puts it back in the queue. So if your processor dies within those two minutes (and that's configurable, you can choose), the job will re-enter the queue, be pulled by another live processor, and be processed in the end. So that's persistence, and the delay is the time to live; in beanstalkd it's called the time-to-run (TTR) value you put on the job. That's all.

Yeah, and what about if we have something critical and we need a consistent state across data centers, maybe? How can we do that? Let's say we have, I don't know, something very important to count, and when someone checks that counter, it has to be accurate.

At 100 percent? Then no, it's a simple answer: you cannot split the collector part and the processor part. You have to make them stick together in the same component, on the same server, duplicate this component over multiple data centers, have them all be aware of every available data center, and do everything at once. That's all. You almost don't need beanstalkd at that point; you don't need it at all.

Okay. Thank you.

beanstalkd is indeed here for the asynchronicity in our case. Yeah. Okay.
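The reserve/delete protocol with TTR described above can be modeled in a few lines. This is a toy simulation of the behavior, not beanstalkd's wire protocol; the class and the simulated clock are invented for illustration:

```python
import itertools

class Tube:
    """Toy model of beanstalkd's reserve/delete acknowledgement protocol.

    A reserved job that is not deleted within `ttr` seconds (the
    time-to-run, 120 by default in most beanstalkd clients) goes back to
    the ready queue for another worker to pick up."""

    def __init__(self, ttr=120):
        self.ttr = ttr
        self.ready = []      # [(job_id, body), ...]
        self.reserved = {}   # job_id -> (deadline, body)
        self._ids = itertools.count()

    def put(self, body):
        self.ready.append((next(self._ids), body))

    def reserve(self, now):
        self._requeue_expired(now)
        job_id, body = self.ready.pop(0)
        self.reserved[job_id] = (now + self.ttr, body)
        return job_id, body

    def delete(self, job_id):
        # Acknowledgement: the worker finished the job successfully.
        self.reserved.pop(job_id, None)

    def _requeue_expired(self, now):
        for job_id, (deadline, body) in list(self.reserved.items()):
            if now >= deadline:          # worker died or stalled
                del self.reserved[job_id]
                self.ready.append((job_id, body))

tube = Tube(ttr=120)
tube.put("hit")
first = tube.reserve(now=0)     # a processor takes the job, then dies
second = tube.reserve(now=150)  # past the TTR, the same job is back
```

`first` and `second` are the same job: because the first worker never called `delete` within the TTR, the job silently returned to the ready queue and the second worker got it, which is the fault tolerance the answer relies on.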
But it's still doable; service discovery will allow you to do it easily.

I'm just wondering what your experience is with using this kind of thing for more unreliable tasks. The main thing that I'm thinking of here is sending out emails: you have a queue of emails to send out, or something like that. And so when, you know, your processor is handling something that...

You're going too fast for me, sorry.

Sorry. If your processor is handling a job that is unreliable, like sending an email or an SMS, how do you handle that resiliently and effectively?

Well, I think it's the same as before. You have to see a job as a representation of... maybe some of you use Celery, so it's a task, actually; you can see it as a task. It's just the same thing. The difference here is that I implemented it myself; I didn't need an extra library that comes with all its dependencies, which in this case would be RabbitMQ. So if my job in beanstalkd represents an email, and my processor once again fails before it manages to send it effectively, then the job will re-enter the same queue and be picked up by another processor, which will try in turn. Does that answer your question?

Sure. If I can add on to it a little bit: in the case of an email, you send it out, your processor says, yes, I sent my email, that's awesome, and then seven hours later it comes back as bounced. And so you're in that situation of trying to link your jobs back together.

Yeah, yeah. You would have to implement some kind of bounce parser. Depending on your MTA, that's maybe something fairly easy to do, or not. Then you detect this kind of event, and I guess you would need a sort of database between them to store all the data, the HTML or the source of the email you were sending, and just generate another job from it to redo it.
But usually, well, we do a lot of emails at our company, so you actually shouldn't do this kind of thing in real life: detecting the bounce and trying to resend the same email to the same address would be considered abuse by most email service providers. But yeah, you could end up with something like this.

Hi. Let's assume one of your data centers is DDoSed by someone. You don't have the ability to spread the computation across many data centers, right? Yes, you can. If you are DDoSed in the U.S., for example, only U.S. processors will compute the addition for the U.S. incoming hits. How can you spread that over the world, you know what I mean? I guess I see what you mean. There are two problems with a DDoS. The first one is that it kills your internet connection, so you end up like when I stopped the Consul leader on the U.S. part: that's a split-brain, right? So you can compare it to a network failure. And if you have a network failure like this, your hit counts and your regular users' requests don't come in, so there's nothing much to process anyway, you know? And the communication between the processors and the collectors is a local one, so this won't be affected at all. But you still need to get... You can have your U.S. queue full of messages and your... No. Ah, okay, okay, your EU queue filling quite normally. Yeah, that's possible. Then you have to do some kind of auto-scaling or filtering or, well, rate control. But you can't ask the EU processors to process the U.S. jobs being stored there. Yeah, you could, you could. The first implementation that I did was actually like you said: the processors were getting jobs from all the Beanstalkds across all the data centers. But then you still face the problem that when you are DDoSed, the processors from Europe wouldn't be able to communicate with the U.S. Beanstalkds anyway. So, well.
And I found it to be a bit more complex to explain, so I ended up with this topology, which I hoped was easier to understand and demonstrate. Thanks.

More questions? Comments? Discussion? Suggestions? Do you know, for Consul, what is the size of the data it can maintain? In the key-value store? Yeah. I don't remember the actual maximum size of the value associated with a key, but it's a couple of megs. Overall, it depends. The size of the Consul database, not of a single key? Okay. I'll try to answer. Sure. It's not a good idea to keep 100 megabytes in Consul. I know people who tried to keep maybe 500 megabytes in Consul, and it was a big problem to synchronize it between all the Consul agents. So Consul is a key-value store for small amounts of data. Yeah, it's meant mostly for configuration; it's configuration distribution mostly. The main problem is the CAP theorem, you know: consistency, availability, and partition tolerance; you can only choose two of the three. Yeah. Then you would need some kind of cross-data-center solution: you would need a database that has XDR, cross-data-center replication, support, and use that one. I guess that would be perfect, but that's still a problem. Yeah, I know, but we are bound to internet speed, so yeah, that's the point. But it works in real life. There are a lot of people using cross-data-center replication of databases, and there are quite a few nice databases that have some pretty neat implementations of this. I think one of the most mature ones for this, in real production under very high load, is Aerospike. If you've ever heard of them, they're really good at this and have been doing it in big deployments and at very high workloads for years now. So maybe check it out.

Sorry. Okay, thanks for the talk. I have a question. How will the Consul server realize that some process has died, if a process dies?
You have a timeout between health checks. So it's heartbeat-based: if Consul doesn't hear back from a process, it's removed from the catalog. Okay. So if a collector is my Python process, how exactly is the heartbeat sending implemented? Once again? Well, I'm just wondering, from an architecture perspective, your collector process is implemented in Python, right? Yeah, everything is implemented in Python. So you have some kind of thread which is sending heartbeats? No, I use the uWSGI Consul plugin, which does it all for me. Actually, the HTTP worker source code looks like this: I just connect to Consul to get every key under my count folder. Can you see? I can change... sorry, the lights. Is that better? All right. So, get data from Consul: it gets the color, with a default value, finds every key under the count folder in the key-value store, and then sums them up. So in my collector, all I have to do is connect to Consul, nothing else. The health checks and the service registration are done by the uWSGI Consul plugin for me. That was one line I added in the collector's uWSGI initialization file. Then uWSGI spawned this code, ran it, and when it was sure it was up, it registered it in the Consul cluster catalog, and then it became available. Okay, and how is uWSGI doing this? Using the Consul cluster HTTP API and polling it. You can also check the source code of the Consul plugin, which is written in C++, and it's really easy, but you don't have to code it yourself. Your stack is here for you, and that's good.

Okay, we're almost at time for coffee, so if we can have the discussion after this, that would be great. Let's thank the speaker again.
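The worker logic described above — read the configured color with a default, list every key under the count folder, and sum the values — might look roughly like this. The payload shape below matches what the python-consul client returns from `kv.get(..., recurse=True)`; the key names (`count/`, `config/color`) follow the talk's description, while the helper names are mine, not the speaker's actual source.

```python
def sum_counters(entries, default=0):
    """Sum the integer values of all KV entries under the count folder.

    `entries` has the shape python-consul returns from
    kv.get('count/', recurse=True): a list of dicts whose 'Value'
    field holds the raw bytes stored for each key, or None if the
    folder is empty or missing.
    """
    if not entries:
        return default
    return sum(int(e["Value"]) for e in entries
               if e.get("Value") is not None)

def get_color(entry, default="white"):
    """Read the configured background color, falling back to a default."""
    if entry is None or entry.get("Value") is None:
        return default
    return entry["Value"].decode()

# With a live Consul agent, the data would come from something like:
#   import consul
#   c = consul.Consul()
#   _, entries = c.kv.get("count/", recurse=True)
#   _, color_entry = c.kv.get("config/color")
```

Each data center's processor writes its partial count under its own key, so summing all keys under `count/` yields the global counter, and changing `config/color` is picked up by every worker on its next read, which is how the "configure once, applied everywhere" contract point is met.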