Good morning. Good afternoon. Greetings. Hello. Hi, Louise. Hey, Raffaele. Good morning. Good afternoon. Am I pronouncing your name right when I call you Raffaele? Very correct. Very right.

We'll just wait a minute or two and then we can move into Raffaele's DR presentation. Raffaele, could you put the presentation into the chat window? Is it possible to get a comment-only or view-only link? I was going to say it's locked. Yeah, it's locked right now; access has to be added individually. I mean, if there is an email group I can enable that, otherwise I don't know how to do it. You should be able to set it to "anyone with the link", view only, and after this presentation it's probably best to keep it that way. Raffaele, if it's on your internal drive and you can't share it, I can make a copy and share that instead if you want. Yeah, please do that. Okay. I can only enable the link for Red Hatters; I cannot make it publicly available, but I can enable people individually. I see. Got it. Alex always has a way. When I was at Red Hat I used to copy it to my personal account for community stuff and then share that using Gmail; you could probably do that too. That's basically what Alex is doing, so after today I'm going to use his copy if there is any update. Let's post the link. Can you open that one? Great. Cool.

I think it's five past; we should probably start. So I'll hand over to Raffaele, who has put a few slides together to summarize the documents we've been working on, to see if it makes it easier to get some of the concepts across and maybe gather some feedback. So, over to you, Raffaele.

Thank you. Yes. As you know, there is a much more detailed document that we are considering publishing, where my intention, and hopefully the group's intention, is to talk about a cloud-native approach to disaster recovery. I created this presentation just to make it easier to communicate, but that document has many more details. The idea is that you can still use all of your traditional DR approaches, but we think there may be a new way to do things with cloud native, and that's what we're going to talk about. So again, it's not to say that the other approaches, typically the active-passive ones, cannot be used. But we're going to try to give a definition of what cloud-native disaster recovery might mean.

This is my slide comparing the two approaches, and I'm going to go through it line by line. As far as disaster detection goes, traditionally there is a human decision: something goes wrong and somebody triggers the DR procedure. The procedure itself may be automated, but very often there is a human decision. What we want to get to with cloud native, what we define as cloud-native disaster detection, is that it's autonomous: the system automatically detects the disaster and triggers whatever reaction needs to be triggered. Then there is the DR procedure itself. What I find with my customers is that it's normally a mix of automation and manual actions; in cloud native we want it to be fully automated. In cloud native, the recovery time, that is, how long our system is down, needs to be near zero.
It cannot be exactly zero because there are caches and load balancers that need to switch over, but it can be very close to zero. Traditionally, we see it being around minutes to a couple of hours in modern data centers. The recovery point objective, that is, how much data I lose, or how much inconsistency I have between copies of my data, I see being between zero and hours in traditional disaster recovery, depending on how the sync or backup and restore is done. In cloud native it can be exactly zero, so full consistency.

Process-wise, I see this pattern, and you can tell me if you think it's not true: in traditional disaster recovery, the formal ownership of the DR procedure is with the application team. But what the application team does is turn around to the storage team and ask, "What is your disaster recovery SLA?" The storage team gives an answer, and they say, "Okay, for this app, that is the SLA we're going to get." So even if formally the DR procedure is owned by the application team, in reality it's owned by the storage team.

Sorry, just a quick question, I hope it's okay to interrupt with questions. Absolutely. Thank you. When you say zero, or almost zero seconds, is that because the assumption is that it's the same cloud or the same region, not a DR across far-apart regions or across clouds? No, that's not the assumption. The assumption is we have geographically distributed workloads, possibly across different clouds, and we still get near zero. Okay, thank you.

Going back to ownership: in cloud native it's an application responsibility. The other observation I've made is about technical capabilities. In traditional disaster recovery, most of the capabilities come from the storage side: backups, volume syncs, and that kind of thing. To build this cloud-native disaster recovery architecture, we need capabilities from networking instead. In particular, we're going to see that we need the ability to communicate east-west, so if I am in two different clouds, those clouds have to be able to communicate, and we need a good global load balancer capability. That's where the switch happens.

Just a quick comment here: I think we may need to differentiate between what the high-level objectives are and what happens in reality, because a recovery point objective of zero is certainly doable and plausible, but it also implies that every transaction, every database action, every file action, whatever the application is using, is happening synchronously across multiple sites, which may or may not be the case. So I think zero can be the target, and it's achievable, but it's because we're enabling the automation; I think that's the point we're trying to make here. Yeah, I see what you mean, Alex. Not every workload needs to get to zero. The point I'm trying to make is that now you can get to zero.
And it's not that complicated. The old narrative was: you can make DR as good as you want, as long as you're willing to spend a lot of money. I think with a cloud-native approach that narrative changes a little; these architectures are not that much more expensive than the traditional active-passive ones. So much so that in an article I wrote about this I called it the democratization of zero downtime during a disaster, because anyone who can swipe a credit card and start deploying on different clouds can achieve this in a way that is not that expensive. So that's the point: it's achievable and it's not that expensive.

I think it's not just about cost, to Alex's point. We had a presentation last week, I think it was called full FS or something, where they were talking about performance trade-offs and policy compliance. So it's not just about cost; it's also about performance, because if you want zero RPO you have to write to every single zone and wait for acknowledgments. There are certainly some use cases where performance trumps consistency. And that's totally fine, I totally acknowledge that. Cool.

Okay, so at a high level this is the reference architecture. We assume there are, say, three data centers in three far-away regions if you're thinking about the cloud, or three data centers in different geographical localities if you're thinking about on premise; the architecture works either way. There is a stateful workload, you can imagine a database or a queue, that is distributed across these data centers to form one logical entity, but obviously there are different instances. These instances communicate with each other via this horizontal, east-west connectivity. We don't say, in this reference architecture, how that is implemented, but they need to be able to communicate east-west: find each other, discover each other, and talk to each other. That's how they achieve data sync, state sync. We also have a volume, because we still need storage, storage doesn't go away, but we don't ask that volume, that storage implementation, to have any particular capabilities besides the ability to store data. Then there is presumably a front end, a stateless front end, and a global load balancer.

The idea is that when one of these regions goes down because of a disaster, the stateful workload adjusts itself, because it has some kind of leader election and state sync protocol, which we can analyze in detail. It adjusts itself almost instantaneously, there is no data loss, and the global load balancer has some level of health checks, so clients will start going only to the regions that are still active. So we reacted to a disaster in a completely autonomous way. The clients keep working; maybe they get a glitch of a few seconds. I work with a database where the glitch can be up to nine seconds. But then everything continues to function normally. Okay, so that's the idea.
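(For illustration: a minimal sketch of the kind of control loop an autonomous global load balancer might run for this architecture. The region endpoints, the /healthz path, and the update_routing helper are hypothetical placeholders, not part of any product discussed here; this only sketches "health checks plus automatic removal of a failed region".)

```python
import time
import urllib.request

# Hypothetical per-region endpoints for the stateless front end.
REGIONS = {
    "us-east": "https://fe.us-east.example.com/healthz",
    "us-west": "https://fe.us-west.example.com/healthz",
    "eu-west": "https://fe.eu-west.example.com/healthz",
}
FAILURE_THRESHOLD = 3            # consecutive failed probes before a region is pulled
failures = {r: 0 for r in REGIONS}
active = set(REGIONS)            # regions currently receiving traffic

def probe(url: str) -> bool:
    """Return True if the region's front end answers its health check."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except Exception:
        return False

def update_routing(healthy: set) -> None:
    """Placeholder: push the healthy set to DNS / the global load balancer."""
    print("routing traffic to:", sorted(healthy))

while True:
    for region, url in REGIONS.items():
        failures[region] = 0 if probe(url) else failures[region] + 1
    healthy = {r for r in REGIONS if failures[r] < FAILURE_THRESHOLD}
    if healthy and healthy != active:   # react only on change; never empty the set
        active = healthy
        update_routing(active)
    time.sleep(10)                      # health-check interval
```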
I think it's a general model; the trick is to find stateful workloads that can actually work that way. There are some prerequisites that the stateful workload has to implement in order to do this. I mentioned queues and databases, but it could be distributed storage, although performance could become an issue there, it could be a distributed cache, it could be anything that needs to manage state. So, in order to understand why this works, I need to bring a few concepts to mind.

Sorry, a question first. In this reference architecture we're basically saying that, for DR, the state sync is always going to be done by the application, right? And the application has to be able to operate replicas across different data centers, which might potentially have very high latency. So is there any other model we're considering, or is this going to be our only reference architecture for DR? Well, for this model, that's how the application has to work. Like I said, and like the slide says, you can still do your active-passive or master-slave models, which have always worked, but then you don't get all the properties that I described.

Maybe I'd like to suggest a slight refinement here, mostly to do with terminology. When we put the storage landscape paper together, we talked about different ways of persisting data. That could be some sort of volume, but application-level stores like databases, key-value stores, or object stores are also valid ways of persisting data. Whether it's distributed storage providing volumes, a distributed database, or a distributed key-value store, I think what we're saying here is that the stateful workload needs to have a distributed way of persisting the data. That could be distributed volumes, a distributed file system or distributed storage system, a distributed database like CockroachDB or Yugabyte, or a distributed object store, and in that case the application has that functionality available to it. So it might be useful to change "stateful workload" to some sort of distributed storage layer, and the volume is ultimately where data is persisted, but it could potentially be a distributed volume in that blue layer.

I agree. I didn't say what service this stateful workload offers to the layer above it; that service could be storage, a key-value store, SQL, a queue, anything. I think I can improve this slide by adding that piece of information. And the volume here, like I said, could be the disk on which the stateful workload is running, or it could be another layer of software-defined storage. It doesn't really matter, because the state sync is managed at the blue layer. Makes sense. One second, I'm taking a note on this.
Okay. So the document that I wrote tries to explain why this is technically feasible, because you might say, "I don't believe this can be done; we haven't done it this way for many years, so why is it possible now?" So I try to explain why, and I just have to remind you of a few concepts.

I think everybody knows what high availability and disaster recovery are; I just wanted to define them in relationship to the failure domain. A failure domain is an area of our IT system where a single event can make everything running in that area fail. It could be a node, which means all the processes running on the node fail; it could be a rack, which means all the nodes in the rack fail; or a Kubernetes cluster, a network zone, a data center. A failure domain is sort of a fractal concept: it is self-similar at different scales. What we need to remember for this discussion is that when we talk about high availability relative to a failure domain, we are really asking: what happens to the system when one component fails within this failure domain? Assuming an HA of one, a fault tolerance of one, that is what we mean by HA here. When we talk about disaster recovery, we are asking: what happens if everything in this failure domain is lost, so the whole failure domain fails? What happens to my system? Obviously I need to have other failure domains somewhere. Conventionally, in this case the failure domain is the data center: when we talk about disaster recovery, we talk about an entire data center going down.

With this in mind I'll continue, because I'm sure everybody knows these concepts. By consistency here we mean that all instances are observing the same state and reporting the same state. It's not the consistency of ACID, which is more about multi-threading on a single instance, where every thread sees the same state. Yeah, I think we define consistency in the white paper pretty well. I really like this slide; what we should pop out of it is that high availability is about recovery from a single point of failure, whereas with disaster recovery we're talking about the failure of an entire failure domain, and that's a really useful differentiation to have. Right, and I felt the need to make this differentiation because these two concepts are starting to overlap when you talk to customers these days, rightfully so, because they would like to treat a disaster recovery event as if it were an HA event. In the model I just described, that is exactly what happens. But that also brings confusion between the two concepts, so it's important to understand the difference.

Continuing, the other thing we need to remember is the CAP theorem. Again, I'm pretty sure you all know what it is. The common way to explain it is that between consistency, availability, and partition tolerance, you can pick two but not all three. I like to tell it in a slightly different way, which I think helps in this discussion: you don't choose partitioning. Network partitioning is something that happens.
Partitions will happen, so assuming you need to be partition tolerant, how do you design your workload: do you design it to be available, or do you design it to be consistent? That's really, in my mind, the choice you have, and I have a table here showing some of these choices made by well-known products. Obviously every stateful workload that attempts to be distributed has to deal with this theorem and make a choice here. The other thing we need to keep in mind is the concept of consensus protocols.

Hey, Raffaele, sorry, just one question. When you say the CAP choice for those examples, for example MongoDB's choice is consistency, does that mean it gives up availability, or that it's optimizing for uptime? What did you mean by that? So, consistency in the CAP sense means that when the system goes into a network partition, which is when the system can no longer establish whether another piece of the system is actually down or simply unreachable, it puts itself into a not-available state, which could mean read-only or maybe just rejecting calls, because the objective is to keep the state consistent. Certainly it doesn't accept writes anymore. Cassandra and DynamoDB, on the other hand, will keep accepting writes even if they don't know the state of the rest of the system, because they assume they can do eventual consistency once all of the instances are able to communicate again.

Eventual consistency is an appealing approach, and it has been explored a lot. But there is a line of thought, you may agree or not, that eventual consistency is a kind of dangerous path, because eventual consistency does not imply eventual correctness. It just implies that at some point in the future, and there is really no SLA you can put on that statement, all the instances will agree on the state. It doesn't mean that the state is what you would have expected from a business-logic standpoint. So the developers have to take extra care to catch the cases where the conflict-resolution algorithm in the stateful workload, the one that decides who is right when there is an inconsistency, maybe with a timestamp or something else, makes the wrong call. There are threads from people at Google and elsewhere discussing how painful it was to remediate those kinds of things. So this line of thought, and I like this line of thought, says: let's keep everything consistent, at the risk of taking an outage, because in many cases it's simpler from a developer point of view and simpler to operate. There are situations where inconsistency doesn't matter too much, and in those cases it's fine to use those databases, but I work a lot with financial institutions, and consistency is very important there. This was to explain why I focused here on consistency, and that is really what we mean when we say zero RPO: there is no inconsistency, there is no data loss.
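(For illustration: a minimal, hypothetical sketch of the "eventual consistency does not imply eventual correctness" point. Two replicas accept writes during a partition and a last-write-wins resolver picks a winner, silently dropping a business-relevant update. This does not describe any specific product; it only sketches the class of problem.)

```python
from dataclasses import dataclass

@dataclass
class Write:
    value: int
    timestamp: float   # wall-clock time used by the conflict resolver

# During a network partition, both replicas keep accepting writes (the AP choice).
# Replica A: customer deposits 100 on top of a balance of 500.
replica_a = Write(value=600, timestamp=100.0)
# Replica B: customer deposits 50 on top of the same balance of 500.
replica_b = Write(value=550, timestamp=100.5)

def last_write_wins(a: Write, b: Write) -> Write:
    """Typical timestamp-based conflict resolution: the later write wins."""
    return a if a.timestamp >= b.timestamp else b

merged = last_write_wins(replica_a, replica_b)
print(merged.value)   # 550 -- the replicas now agree (eventually consistent),
                      # but the correct business outcome was 650.
```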
So, consensus protocols. I coined two terms here, shared-state and unshared-state; this is really my terminology, and we can change it. The concept is this: I have a distributed workload that needs to act as a single logical entity, so the various instances need to agree on the actions to be taken, and there are two kinds of protocols for agreeing on actions. The first one, shared state, is when all of the instances have to agree on doing the same action: we share the state, we share the action. At least from an academic standpoint, the way you solve this problem is with a leader-election kind of consensus protocol, where a strict majority can agree on the action to be taken and commit it, and the followers, or the instances that were not able to take part in the agreement, just have to follow and apply the action later when they come back online or are able to rejoin the network. The major algorithms in this area are Paxos and Raft, and Raft is gaining popularity because it's much easier to understand; even I could understand it, whereas Paxos is just magic if you try to read it.

Then there is the unshared-state consensus protocol, where the participants in the orchestration can potentially do different actions. Maybe I'm writing to a database while another participant is sending a message to a queue. In this case we have the historically well-known two-phase commit and three-phase commit algorithms. But notice that these algorithms essentially require all of the instances to be online: there is no tolerance to network partitioning when you do an unshared-state consensus protocol. And that's understandable: we are not all doing the same thing, or potentially not doing the same thing, so we all need to know what each of us is doing individually; we cannot ask later.

There are a couple of papers from Google showing that, based on the shared-state consensus algorithms, you can build a reliable replicated state machine, which is a generic way of agreeing on state. Raft actually gives you a generic way of agreeing not just on a state but on a series of actions to be taken, with the concept of an operation log; that log is really the state being shared between the instances, and every instance has to apply the operations written in it. Building on top of that, there is the concept of a reliable replicated data store, where the operation in the log is to write something to a data store. This is a highly reusable concept that can be implemented generically, and on top of it you can put an API to serve some kind of storage service: it could be an API for a queue, an API for SQL, and all the things we said before. And this has gone beyond theory now, because if you look at the Apache BookKeeper project, it's here in the notes on the left, that is exactly a reliable replicated data store, where the abstracted operations are really the things that Kafka does, so append-only writes to something like a file. In fact, Apache BookKeeper is being used to implement highly distributed, geographically distributed queue systems; Pulsar, if you want to take a look, is one such implementation.
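(For illustration: a minimal sketch of the "replicated operation log with majority agreement" idea described above, under the simplifying assumptions that a leader is already elected and that acknowledgements arrive synchronously. Real Raft or Paxos implementations handle elections, terms, log repair, and retries, all of which are omitted here; the class and method names are hypothetical.)

```python
from typing import List

class Replica:
    """One instance of the stateful workload; stores operations from the log."""
    def __init__(self, name: str, online: bool = True):
        self.name = name
        self.online = online
        self.log: List[str] = []

    def append(self, op: str) -> bool:
        if not self.online:
            return False          # unreachable during a partition / disaster
        self.log.append(op)
        return True               # acknowledgement back to the leader

class Leader:
    """Commits an operation once a strict majority of replicas has stored it."""
    def __init__(self, replicas: List[Replica]):
        self.replicas = replicas
        self.committed: List[str] = []

    def propose(self, op: str) -> bool:
        acks = sum(r.append(op) for r in self.replicas)
        if acks * 2 > len(self.replicas):   # strict majority of replicas acknowledged
            self.committed.append(op)
            return True                     # durable in more than one failure domain
        return False                        # not enough replicas: reject, stay consistent

# Three replicas, one per region; one region is lost in a disaster.
regions = [Replica("us-east"), Replica("us-west"), Replica("eu-west", online=False)]
leader = Leader(regions)
print(leader.propose("INSERT ..."))   # True: 2 of 3 acknowledged, so the write commits
```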
So, putting it all together: a stateful workload can have replicas, and we have just seen how we can coordinate those replicas with Paxos or Raft. And then we can have partitions, where I partition the data set so that each group of replicas manages a subset of the data, and I do that to be able to scale horizontally. Partitions are one of those cases where partition A and partition B are doing different things, so if I happen to have a transaction that touches two partitions, I have to use one of the unshared-state protocols. So I can use a shared-state protocol between replicas and an unshared-state protocol between partitions, and that is how I can create a highly scalable stateful workload.

Here I have collected some examples of these workloads, because there are starting to be many. Like I said, not everything will work in the way I described in the initial slide, but there are starting to be many. I tried to capture what they do for their replicas, what the consensus protocol for their replicas is, and what the consensus protocol for their partitions is. Some of them don't support partitions; some don't have inter-partition operations, so you can only work with a single partition at any given time. In general I thought this was a good exercise, and I thought these are actually the right questions to ask when you are examining a stateful workload and deciding whether you want to use it, because they tell you what you can or cannot do in terms of reacting to failure in a distributed way.

Sorry, could you explain what you mean by the partition consensus protocol? What do you mean by partition? So, is the concept of a partition clear? Some people say shards; there are several terminologies. The point is that a client may try to do an operation that needs to touch multiple partitions. For example, I think in Elasticsearch each index is a different partition, or something similar. So if I try to add a document to two indexes in a single transaction, I need to do that operation across two partitions. How do I coordinate that? The operation is not going to be the same on each, because the partitions deal with different data sets. Coordinating that usually happens with one of those unshared-state coordination protocols. Maybe I wasn't very clear.

Okay. I guess it depends on how you look at it, because replicas can also be partitioned, but here basically you use the term replica to talk about different copies of the same object, whereas with partition you're talking about a grouping of objects, or operations across different objects. That's what you mean by partitions? So, let's say my data set goes from A to Z. I could say that I want partition A to deal with the keys from A to M, and partition B to deal with N to Z. So I divide my data set into different ranges.
And each partition is essentially a standalone stateful workload operating on a narrower interval of the data set, and a partition doesn't have to know anything about the other partitions, except when a client wants to do an operation that logically touches two partitions. If I'm working only on the range from A to M, I'm just interacting with partition A, and I only have to make sure that each replica is replicating the state across partition A. But if I am inserting a piece of information into partition A and into partition B at the same time, and I want to do it in a single transaction, I have to have a way to make sure that partition A and partition B agree that they're both doing that operation.

Maybe I see what you're pointing out. I think the use of "partitions" here is possibly unhelpful from a terminology point of view, and maybe it would be easier if we just call them shards, simply because partitioning, as the verb applied to the CAP theorem, is different from a partition in the sense of a shard. So maybe we want to call them shards here rather than partitions. Okay. I find that each stateful workload uses a different term for this concept; I thought partition was the closest English word, but I don't have a problem changing it to something else. Yeah, in fact this was one of the things we debated when we were putting the landscape paper together, and we ended up putting in a table to describe shards, replicated shards, and sharded replicas, because different storage systems apply them in different ways. But I think it would just make it easier for everyone if we call them shards on this slide. I can do that. Cool. All right. And sorry to the gentleman who was asking the question, did that answer it? Yeah, it did, thank you.

Okay, cool. So these are just some stateful workloads that I have classified along those parameters. I think we should extend this table to more products, but that's what I have so far. What I've explained so far is really generic; it would work anywhere, with any deployment. But I thought we could take a closer look at Kubernetes and how this would work there. This is essentially the same slide as before, except that now there is a Kubernetes cluster in each region in which our workload is running, so we can translate it more closely into Kubernetes concepts: we have persistent volumes, we have ingresses, and the global load balancer has to balance across these ingresses, where "ingress" is a generic term; it could be a LoadBalancer service or it could be an Ingress object. And this is where you can see better what I meant by needing this east-west networking capability, because building that across clusters is not necessarily straightforward today with Kubernetes; it can depend on the CNI implementation you're using or the cloud where you're running. But if you can do it, this is the requirement for these workloads to be stood up this way.
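(For illustration: one way to picture the east-west requirement is that every instance must be able to resolve and reach its peers in the other clusters directly, without going through an ingress. Below is a minimal, hypothetical reachability check; the peer hostnames and port are assumptions chosen to resemble multi-cluster DNS names for a stateful set, not real names from the demo.)

```python
import socket

# Hypothetical peer addresses: one stateful-set pod per region, resolvable
# across clusters thanks to some multi-cluster DNS / tunneling layer.
PEERS = [
    ("db-0.db.us-east.svc.clusterset.local", 26257),
    ("db-0.db.us-west.svc.clusterset.local", 26257),
    ("db-0.db.eu-west.svc.clusterset.local", 26257),
]

def reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a direct east-west TCP connection to the peer can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host, port in PEERS:
    status = "ok" if reachable(host, port) else "UNREACHABLE"
    print(f"{host}:{port} -> {status}")
```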
Okay, I'm assuming everybody's familiar with Kubernetes, so there isn't much else to say here. I didn't bring up the demo; I mean, I have it set up, but I wasn't planning to run it today, and I don't know how much time we have. I could certainly run it in one of the next meetings. Just to explain what the demo is about: we have a CockroachDB database that is distributed across clusters in three different regions. Right now my setup is on AWS, but it could be anything. I'm running OpenShift, so in the case of OpenShift we deploy a network tunnel to make these clusters able to talk to each other in a horizontal way, without doing egress and ingress; we are essentially merging the SDNs into a single larger software-defined network so that everything is routable and discoverable. To do that, we use an operator and a product called Submariner, which was initially developed by Rancher but I think is now joining the CNCF; it basically establishes an IPsec-based VPN across the SDNs of the clusters. Then I deploy a global load balancer with health checks on Route 53, using an operator that talks to Route 53 and makes that configuration.

From a CockroachDB perspective we have nine instances, because of the way CockroachDB works: it's better, not mandatory, but better, if you can do local leader election as well as global leader election. So we have nine instances, but they behave like a single cross-regional entity. What the demo shows is: I take down one region, and we see that the clients just keep working normally. We set up some clients that run the TPC-C test, which is a standard benchmark for highly transactional operations on a SQL database; this could be our stateless front end, but here we're just generating a bunch of transactions.

Hey, so is some sort of network tunneling, like with Submariner, mandatory for this sort of architecture? Well, the way all of these stateful workloads that I'm working with operate is that each instance needs to discover and establish a peer-to-peer connection with all the other ones; that's necessary for the Raft coordination to work. So discovery and connectivity are needed; the way you implement them is up to you. For example, I know that if you use the Google Kubernetes service, you can build a cluster and switch a flag such that, if all the other clusters are in Google regions, they will just be able to talk directly. So they give it to you, but other distributions of Kubernetes may not have this capability, so you have to provide it somehow. I can only speak to OpenShift for this particular capability, and on OpenShift this is how we do it.

And in this example, the database has nine nodes total, three in each region, and it behaves like a single logical database? Is that kind of the gist of this? Yes, that's the gist, and it's exactly what happens. It's actually nice to see; that's why maybe next time. Yeah, if you want to see this demo, I'll be very happy to show it to you. But yes, it behaves like a single database from the client's perspective.
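(For reference: a global load balancer of the kind used in the demo can be approximated with DNS health checks and health-checked records. The sketch below uses boto3 against Route 53; the hosted zone ID, domain name, and endpoint IPs are placeholders, and the operator mentioned above automates this rather than running a script like this. It is a sketch of the pattern, not the demo's actual configuration.)

```python
import boto3

route53 = boto3.client("route53")

HOSTED_ZONE_ID = "Z123EXAMPLE"      # placeholder hosted zone
RECORD_NAME = "db.example.com"      # placeholder public name for the front end
REGIONS = {                         # placeholder per-region endpoint IPs
    "us-east-1": "203.0.113.10",
    "us-west-2": "203.0.113.20",
    "eu-west-1": "203.0.113.30",
}

for region, ip in REGIONS.items():
    # One health check per regional endpoint (CallerReference must be unique per call).
    hc = route53.create_health_check(
        CallerReference=f"hc-{region}",
        HealthCheckConfig={
            "IPAddress": ip,
            "Port": 443,
            "Type": "HTTPS",
            "ResourcePath": "/healthz",
            "RequestInterval": 30,
            "FailureThreshold": 3,
        },
    )
    # One weighted record per region; Route 53 only answers with healthy records,
    # so a failed region drops out of DNS automatically.
    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD_NAME,
                    "Type": "A",
                    "SetIdentifier": region,
                    "Weight": 1,
                    "TTL": 60,
                    "ResourceRecords": [{"Value": ip}],
                    "HealthCheckId": hc["HealthCheck"]["Id"],
                },
            }],
        },
    )
```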
Could you highlight on this slide, just to follow the previous slides, where you're doing your replicas and where you're doing your shards? It's pretty obvious that you want your replicas in the regions and then you're sharding within that, but since you have three and three here it's not clear. Yeah. So this is where we start talking about the second generation of stateful workloads, which decide the sharding by themselves. CockroachDB, based on how you use the data, can reshard and can decide how to shard. When you create tables you can hint how to shard them, but you don't have to, and it knows what to do. I think they use the name "tablets" for shards, so that's yet another name. It creates its own tablets; you don't have to decide. And these are nine replicas, so the database is fully replicated everywhere, except we don't have to have all of these instances agree in order to proceed with a transaction, and that's how they can make it efficient.

I did this with the Cockroach guys, and really for this question you need to talk to them, but we ran a performance test. Keep in mind that between the US East and US West regions in Amazon there is about 70 milliseconds of latency; that's just physics, there is nothing you can do about it. With that kind of latency we were still able to run the TPC-C test with 97 percent efficiency. That was TPC-C 1000, so emulating 1,000 warehouses, using OLTP-style, highly transactional operations: not data warehousing or big queries, more insert, select, insert, select kinds of things. With that kind of traffic pattern we did, I'm sorry, 64 percent, which is almost the same as what you would get from a monolithic database; a monolithic database can probably do a little more, but it's close to the theoretical limit of 100 percent. So they were happy with the result. They could already achieve those results running on VMs, but the exercise obviously was running on containers and inside OpenShift.

I guess, from a concept point of view, this applies to just about any distributed storage, right? If you have a logical instance that combines sharding and replicas across multiple cluster instances, and you have some sort of network tunneling, then this can apply to distributed file systems, key-value stores, and object stores, so we can probably make this a fairly generic slide as well. Right, that's my objective here. I don't think it matters what the stateful workload does; what we are finding a solution for here is replicating state across regions, or keeping state in sync across regions. This one happens to be a SQL service, but I think it can be done with other types of stateful services, and in fact I would like to be able to showcase this same architecture with other kinds of workloads.
Because it proves the point. Right now one might say, okay, it works with CockroachDB, but it's not a general solution. If I can make it work with other products, then it starts to be more of a generic statement. So I'm collaborating with other partners to see if we can recreate the same kind of deployment. The part that can vary across different distributed databases or file systems is how they consume this topology.

So for this demo, how did you convey the topology, that there are three different availability zones? How did you make CockroachDB aware of this topology so that proper sharding happens across AZs, as opposed to within the same AZ? This approach has some parameters that you need to pass to the process when you run it to make it topology-aware. Using the downward API and other approaches, I make the pods aware of where they run, and that's how it decides to do the sharding; like I said, a nice property is that it does all the sharding itself. I see, so something like the node labels? Right. CockroachDB understands one level of topology. I'm now working with another database, Yugabyte, which can potentially understand multiple layers of topology: cloud, region, and AZ. By passing these parameters you make it aware of where each instance runs, and then it can make a decision on how to distribute the data.

I guess it probably therefore makes sense to have a short section or a slide to cover discovery and topology: how the nodes discover each other, and how you define the topology somehow, somewhere, because it could just be labels, but equally they could be looking it up in a discovery service. Yeah, I can create a slide on that; I talk about it a little in the document. It becomes implementation-dependent very quickly, that's all. Yeah, I'm not suggesting we start to define how they do it, just that we need to tell people that if you're looking to build this architecture, you need to figure out how you're going to do your discovery and your topology. Yes, discovery and topology are fundamental. In the case of Submariner, it comes with a discovery service, so if I know what to look for, if I know the names of these individual instances, and these are a stateful set, so I do, I can look them up from the other cluster, because I have a globally distributed discovery service. If you don't use Submariner, you will need another way to do that. For example, Cilium, if you know it, is another CNI that you can configure in your Kubernetes cluster, and Cilium supports this, not out of the box, but as a switch you can turn on, I think. And what is the other famous CNI? Calico. I think Calico has the same capability, if you look into it. Interesting; maybe it's worth pinging SIG Network and seeing if they have any information about those product capabilities. You can do that, and I think it's the Multicluster SIG, but there is a SIG that has defined a standard spec for cross-cluster discovery. They don't define the tunneling, but they define the cross-cluster discovery, and Submariner implements that spec.
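(For illustration, going back to the topology parameters mentioned a moment ago: a minimal, hypothetical helper that assembles a locality string for the database process from environment variables assumed to be injected into the pod, for example via the downward API or an init container reading node labels. CockroachDB and Yugabyte accept locality-style startup parameters, but the variable names and wiring shown here are assumptions, not a prescription.)

```python
import os

def build_locality() -> str:
    """Assemble a hierarchical locality string, e.g. 'cloud=aws,region=us-east-1,zone=us-east-1a'.
    The environment variables are assumed to be injected into the pod by the deployment
    (downward API, an init container reading node labels, or similar)."""
    levels = [
        ("cloud", os.environ.get("TOPOLOGY_CLOUD")),
        ("region", os.environ.get("TOPOLOGY_REGION")),
        ("zone", os.environ.get("TOPOLOGY_ZONE")),
    ]
    return ",".join(f"{key}={value}" for key, value in levels if value)

if __name__ == "__main__":
    # The resulting string would be passed to the database process at startup,
    # e.g. as a locality flag, so it can place replicas across failure domains.
    print(build_locality() or "no topology information available")
```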
Yeah. We're actually a minute over, so I think we're going to have to call time, but this was brilliant, Raffaele, and I think we've got something solid to work on. Okay. Thank you. Thanks everyone, and we'll see you all in a couple of weeks. Thank you. Thank you, Raffaele. Bye bye.