Hello everyone, thank you all for coming. My name is Shalin Shekhar Mangar. I work at Lucidworks, and I work almost full-time on Apache Solr, focusing specifically on SolrCloud: finding its bugs, hardening it, testing it to its limits, and doing that over and over again. In this talk we're going to talk about Jepsen and flaky networks. What is Jepsen? I'll come to that in a bit.

Let's start with what I mean by a flaky network. This is from Jeff Dean, who works at Google, at LADIS 2009. Admittedly the stats are a little old, but the message is still true today. In their experience, in the first year of a new cluster, about five racks go wonky: 40 to 80 machines see something like 50% packet loss, so half the servers randomly cannot talk to each other. You have network maintenance events with sporadic connectivity issues between servers, router failures, DNS issues, what not. And this is Google; these guys know a few things about running large clusters.

So the fact is that reliable networks are a myth. They don't exist. When you have asynchronous communication happening in a data center, things will go bad. And it's not just the network, not just the network equipment that goes faulty; it's also your systems. For example, if you're running Java, you have garbage collection pauses: processes that fail to send a message, or fail to respond to one, because they are in a garbage collection pause. Looking from the outside, that is not any different from a network partition. If I'm trying to send a request to a server that is in a garbage collection pause, it's no different than if there were a bad firewall rule preventing the packets from reaching that server. There is no difference between the two.

So network partitions are a reality. It's not just the network equipment that goes bad; it's process crashes, it's scheduling delays in the Linux kernel. For example, say you're running your systems in a cloud, on AWS or other shared hardware, with virtualization. If the other virtual machines on the same physical host are very busy and the scheduler decides to give more priority to some other virtual machine, the same thing happens: your processes cannot run. They'll be paused until the scheduler comes back and gives them priority.

When we talk about network partitions, most people think a partition happens something like this. This is a five-node cluster, with machines named n1 through n5, and this dotted line in the middle can be a bad firewall rule, it can be a faulty switch that refuses to send packets across, it can even be a garbage collection pause happening on n4 and n5 that keeps them completely cut off from the other nodes in the cluster. But this is not the whole truth; this is not the only way network partitions happen. In an asynchronous network, packets are allowed to be lost, completely dropped; they can be duplicated, they can be reordered, and they can be delayed. And if the delay is too long, what do you get in your applications? Timeouts. Connection timeouts, read timeouts. All of these things are allowed in asynchronous networks.
So when you have these kinds of problems, how do you figure out what a distributed system is supposed to do? Let's come back to a topic that has been discussed at least twice yesterday in different talks: the CAP theorem. The CAP theorem says you have consistency, availability, and partition tolerance, and you can choose only two. This has been discussed to death, yesterday and in the literature and in different research papers as well.

So what is consistency according to the CAP theorem? The CAP theorem is actually very specific about the kind of consistency it requires: linearizable consistency. What that means is that, from an external client's point of view, all operations happening in the system should look as if they happen at a single instant, even though they do not actually happen at exactly the same instant, in the presence of concurrent read and update requests. From the outside, they should look like they happen at the same instant.

Availability basically means that every request received by a non-failing node should get a response. It does not say how soon; there is no explicit bound, but the response must come within finite time. It's not like you can just give up and never respond at all. So you should respond at some point, but it does not say how soon.

Then you have partition tolerance, which means that during a partition an arbitrary number of messages may be dropped. It only talks about dropping messages; it does not talk about reordering or delaying them. Well, a long enough delay is actually quite similar to a drop, but let's not go there for the moment. Partition tolerance says that unless the complete network is shut down, there should never be an incorrect response coming back from a machine.

But when it comes to the real systems we use, real data stores and databases like Cassandra, Elasticsearch, Solr, Mongo, Redis, and so on, how does that manifest practically in your applications? Partition tolerance is a must; you cannot choose to not tolerate partitions. Partitions are going to happen no matter what you do, no matter how reliable your systems are. So you must choose to tolerate partitions, which means you can either give up consistency or give up availability for different operations. And it's not a hard and fast rule that you have to make the same choice for all kinds of operations; different systems can choose different guarantees for different kinds of operations.

When you choose to forego availability, what happens? Either the server, your data store, responds with an error saying, you know, I'm not available right now, maybe try again later; or you might not find a route to that host, so you get connection timeouts; or maybe that host is actually trying to call a resource on some other node, in which case your request can hang for a long time and you get timeouts in your applications.

On the other hand, if your data store foregoes consistency, you can get stale results: results that are out of date, that have been overwritten by another process on some other part of the data store, on some other machine. Or you can have dirty reads. What is a dirty read? A dirty read is a read that shows incorrect or garbage data.
For example, a system might choose to give the client new data even though that new data has not yet been acknowledged back to the original writer. That is a dirty read, and some systems will do that. The third thing, and the most catastrophic one, is losing your data: data you added or updated, and for which you got back an acknowledgement from the system, is now no longer there at all. It's just gone. This is the most catastrophic kind of failure when you forego consistency.

So how do you plan for these things? To plan for them, you first have to figure out whether your data store actually exhibits these behaviors, and which ones it exhibits. Now, a lot of the literature and documentation for different data stores is sometimes incomplete, sometimes a little hand-wavy and not very precise, and sometimes not there at all. The only way you can actually know the guarantees of your systems is by testing them. So how do you test for these kinds of problems? That is where Jepsen comes in.

Jepsen is a tool written by a guy called Kyle Kingsbury, who goes by the handle aphyr online, and he has written prolifically about different data stores. He wrote this tool in Clojure. What the tool does is set up a small five-node cluster and then wreak havoc on it: it uses iptables to drop or delay packets, it SSHes into the individual boxes and crashes processes without any warning, or pauses and restarts them, and it can even do things like change the system clock or flip bits on disk underneath the data store. So it allows you to create very powerful tests, which you can then run against your own systems.

What does a typical Jepsen test look like? It has these components. First, you have a bit of code to set up the data store; this can be Cassandra, Mongo, whatever you want to test. You set it up in an automated fashion and you can tear it down. For example, the way I test is that I have one large desktop machine with eight cores and 32 GB of RAM; well, it's not that big now, but it was big four years back when I built it. Inside that I have Linux containers running: five nodes running as separate Linux containers, and in each of those five nodes I set up Solr. The clients run as part of the Jepsen process on the host machine, and the host machine has permission to SSH into each of the Linux containers so that we can do these kinds of operations.

So you have the automated database setup. You have the client, or the test definition, and we'll talk about that in a bit. You have something called a nemesis, which is the thing that creates chaos; it's like a chaos monkey kind of thing, creating chaos in your cluster. Then you have a schedule of operations. You can say, you know, I'm going to write a new piece of data or a new document every second, then wait a random amount of time, and maybe every 30 seconds I'm going to create a partition between nodes and heal it after another minute or so, and I'm going to do that repeatedly for maybe five or ten minutes.
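To make that concrete, here is a rough sketch of what such a test definition and schedule look like in Clojure. The shape follows Jepsen's API as I remember it, but key and function names vary between Jepsen versions, and `solr-db` and `cas-set-client` are placeholders for the data-store-specific pieces discussed below, so treat this as illustrative rather than copy-paste ready.

```clojure
(ns example.jepsen-sketch
  (:require [jepsen.core      :as jepsen]
            [jepsen.generator :as gen]
            [jepsen.nemesis   :as nemesis]
            [jepsen.checker   :as checker]))

;; The data-store-specific pieces you write yourself (setup/teardown and client).
(declare solr-db cas-set-client)

(defn adds
  "An endless stream of :add operations with increasing integer values."
  []
  (->> (range)
       (map (fn [x] {:type :invoke, :f :add, :value x}))
       gen/seq))

(def sketch-test
  {:name      "cas-set-test"
   :nodes     ["n1" "n2" "n3" "n4" "n5"]        ; the five Linux containers
   :db        solr-db                           ; automated database setup/teardown
   :client    cas-set-client                    ; the test client
   :nemesis   (nemesis/partition-random-halves) ; the chaos-maker
   :checker   (checker/set)                     ; did the final read match the history?
   ;; The schedule: clients add an integer roughly every second, while the
   ;; nemesis waits 30 seconds, starts a partition, heals it a minute later,
   ;; and repeats, for 300 seconds in total.
   :generator (->> (adds)
                   (gen/stagger 1)
                   (gen/nemesis
                     (gen/seq (cycle [(gen/sleep 30)
                                      {:type :info, :f :start}
                                      (gen/sleep 60)
                                      {:type :info, :f :stop}])))
                   (gen/time-limit 300))})

;; (jepsen/run! sketch-test) would run the whole thing and return the analysed history.
```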
So you have the schedule of operations. Whenever a client invokes an operation on a node running in one of the Linux containers, you either get back an OK response, a success; or a failure; or it times out, which is basically ambiguous: it could have succeeded without you knowing about it, or it could have failed, so you have to treat it as an indeterminate response. All these success, failure, and indeterminate responses are logged, and that log is what we call the history of operations. The history is important because it lets you figure out what the final state of the system should have been under the guarantees the data store provides. Then you have a consistency checker, which figures out whether the final read was consistent, whether data was lost, or whether there is some extra garbage that we never actually wrote in the first place.

The first three things are data store specific. If you are testing Mongo, you will write a client for Mongo, you'll write the automated setup code for Mongo, and the nemesis types; some of the nemeses are actually provided by Jepsen, and we'll talk about that in a bit. But this part is data store specific, so if you're going to test something of your own, you're going to write this part in Clojure for your own data store. The other three things are provided by Jepsen. So in the diagram you have these nodes, created as Linux containers, and these clients, created inside the Jepsen process. The clients make requests concurrently to the different nodes and get responses, which can be OK, failure, or indeterminate, and collectively those responses are the history of the system.

Now, what is a nemesis? The dictionary describes a nemesis as the inescapable agent of someone's downfall, and that someone is your data store. It could be Mongo or Solr or whatever. So let's look at a few of the nemeses that Jepsen provides for you.

The first one is pretty simple; it's called partition random node. It randomly chooses a node from your cluster, SSHes into that box, and uses iptables to drop all external traffic, both incoming and outgoing, so that node will neither receive any traffic nor be able to send any requests out. The second one is very similar, but instead of using iptables it just goes onto that node and does a kill -9: it kills the process without any warning at all, and then the test continues. The third one goes and changes the clock, maybe moving it five minutes back or five minutes forward; you can actually specify the threshold for how much the clock is allowed to change. Then you can get into more complicated tests where you partition into halves: with five nodes you get a majority partition and a minority partition. You can choose them linearly, so you always say that n1, n2, and n3 are going to be together and n4 and n5 are going to be in a separate partition, or you can choose them randomly, which is the second one here, partition random halves.
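As a rough guide, these built-in nemeses correspond to constructors in the `jepsen.nemesis` namespace. The names below are from memory and may differ slightly between Jepsen versions, so double-check them against the version you're using.

```clojure
(require '[jepsen.nemesis :as nemesis])

;; Isolate one randomly chosen node from everything else using iptables.
(def isolate-one    (nemesis/partition-random-node))

;; Split the cluster into fixed halves (n1 n2 n3 | n4 n5) ...
(def fixed-halves   (nemesis/partition-halves))

;; ... or into two randomly chosen halves on every :start.
(def random-halves  (nemesis/partition-random-halves))

;; Skew the system clock by a random amount within a 300-second window.
(def clock-skew     (nemesis/clock-scrambler 300))

;; The bridge partition discussed next: one node keeps connectivity to both
;; halves, but the two halves cannot see each other. Shuffling randomizes
;; which node ends up in the middle.
(def bridge-nemesis (nemesis/partitioner (comp nemesis/bridge shuffle)))

;; Crash-and-restart nemeses (the kill -9 style) are typically built with
;; nemesis/node-start-stopper plus data-store-specific stop/start commands.
```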
The third kind of partition is even more interesting: it's called a bridge partition. In a bridge partition, the node n3 can talk to the nodes on both sides of the partition, but n1 and n2 cannot talk to n4 and n5, and vice versa. This is a very tricky sort of partition, and it's not like it never occurs; this kind of partition can happen in real, physical networks too.

So how do we write the client? What are we actually going to test? This is a simple test that I used for Solr. It was originally written for Elasticsearch by the guy who wrote Jepsen, and we adapted the same test for Solr as well. In this test we create one document inside the data store which holds a set of integers; those integers should not repeat. We use the compare-and-set primitives available in these data stores to add a new integer to the set. So we read the data store, we see that the current version is v1, then we say: add the integer 2 to this set as long as the version is still v1. And we keep repeating that from each client. Presumably only one of the concurrent compare-and-set operations against the same version should succeed, and whatever has succeeded should never be rolled back. I'll show a rough sketch of what this client loop looks like in a moment.

We do this concurrently from each client. If we set up a five-node cluster to test, we create five clients as well and ask them to use multiple threads to write new integers to this set stored in the data store. Then we create and heal partitions in the middle, and when all the partitions have finally healed, we go and read the set of values stored in the data store and see whether they actually match what should have happened.

Here is a visual representation of the same thing. You start with an empty set, do a compare-and-set, and add the value 1 to the set of integers, so you have the value 1 stored as a document inside Solr or Elasticsearch or Mongo or whatever. Then you have two clients trying to add a new value to this set concurrently. Because they are both using the same version, one of them will succeed and one of them will fail. We repeat this process again and again, and during this time we create a partition, heal it, and keep doing that a few times to see what happens. Whether each operation succeeded, failed, or was indeterminate makes up the history of operations. The history is nothing but the time at which the operation came back to you, the operation that was actually being performed, and the result of that operation.

Originally the author of Jepsen wrote a series of blog posts, which are really good and which you should all go and read if you're interested in distributed systems. He did tests on Redis, Mongo, Elasticsearch, ZooKeeper, Cassandra, Aerospike, Disque, a lot of different data stores. He did not test Solr. But when those results came out, a lot of our customers, a lot of people, started asking how well Solr performs in the face of these kinds of partitions. Since it is one of my responsibilities at work to do these kinds of things, I started learning Clojure and writing these tests, adapting them from the ones he had written for Elasticsearch, and running them against Solr.
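Here is the rough sketch of that compare-and-set client loop I mentioned. It is a stripped-down illustration, not the actual test code: `read-doc` and `cas-update!` are hypothetical helpers standing in for the data-store-specific HTTP calls, and the important part is how each outcome is recorded in the history.

```clojure
;; Sketch only: `read-doc` and `cas-update!` are hypothetical helpers that
;; stand in for the real calls to Solr/Elasticsearch/whatever.
(declare read-doc cas-update!)

(defn invoke-add
  "Try to add integer x to the set stored in the single document, using
   compare-and-set on the document's version."
  [x]
  (try
    (let [{:keys [version values]} (read-doc)          ; e.g. {:version 7, :values #{1 2}}
          resp (cas-update! (conj values x) version)]  ; applied only if version is unchanged
      (if (= :conflict resp)
        {:type :fail, :f :add, :value x}    ; someone else won the race; safely a failure
        {:type :ok,   :f :add, :value x}))  ; acknowledged; must never be rolled back
    (catch java.net.SocketTimeoutException _
      ;; A timeout is ambiguous: the add may or may not have been applied,
      ;; so it goes into the history as indeterminate.
      {:type :info, :f :add, :value x})))
```

At the end of the run, the checker expects every `:ok` value to be present in the final read, allows `:info` values to be present or absent, and flags anything else as lost or unexpected data.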
That process took me about two to three months; we found a few bugs, and we fixed them. So I'm going to talk a little bit about what we found and how Solr actually performs, and then I'll come to the other data stores.

How many of you are familiar with Solr? Okay, a lot of you. So, a quick recap: Solr is a search server that uses Lucene, not very different from Elasticsearch in that sense, but there are a few differences. For example, Solr uses ZooKeeper for distributed consensus: it relies on a ZooKeeper cluster to figure out which node should become a leader, who the current leader is, whether a particular node is live or dead, and so on. It has a Lucene index, so you can add new documents, do full-text search on them, and do autocomplete, highlighting, faceting, analytics, and so on. It also has compare-and-set: you can atomically update a given document by providing the last version you know about, and if the version is still the same, Solr will go ahead and update the document; otherwise it will say there's a conflict and refuse the update. Presumably, because it provides compare-and-set as an atomic primitive, it should be linearizable. So let's see what happens. When you add a new document to Solr, it is synchronously replicated to all the nodes that are available at that time, and which nodes are live at that time is determined using ZooKeeper.

If you want to read the whole thing, the complete test methodology, the results, the configuration, everything, there's a blog post I wrote last year in December that has all the juicy details; you can go and read it. In this test I use a compare-and-set client to simulate a set of integers: I add an integer every second, concurrently from five different clients against five different Solr nodes, I partition the network randomly in half every 30 seconds, and I keep each partition up for 200 seconds. So for 200 seconds at a time, nodes in the majority and minority partitions cannot talk to each other. I know you cannot read this slide, but it's not that important; it's just Clojure. I wanted to show I can write Clojure.
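Since the slide is unreadable, here is a rough approximation of the compare-and-set piece of such a client: an update that carries Solr's `_version_` field, which Solr rejects with a 409 when the stored version no longer matches. The URL, collection name, and field names below are made up for illustration; only the `_version_` / 409 behavior is Solr's documented optimistic concurrency.

```clojure
(require '[clj-http.client :as http]
         '[cheshire.core   :as json])

(defn cas-write!
  "Overwrite document `id` only if its stored _version_ still equals
   `known-version`. Returns :ok or :conflict."
  [id known-version values]
  (let [resp (http/post "http://n1:8983/solr/jepsen/update?commit=true"
                        {:body             (json/generate-string
                                             [{:id        id
                                               :values    values
                                               :_version_ known-version}])
                         :content-type     :json
                         :throw-exceptions false})]
    (case (:status resp)
      200 :ok        ; Solr accepted the update at this version
      409 :conflict  ; someone else updated the document first
      (throw (ex-info "unexpected response from Solr" {:response resp})))))
```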
So, are we actually safe? Does Solr actually do what we want it to do? Well, maybe. The thing is, every application is different. If you are going to use Solr as your only data store, you're going to have some problems, but we did not find any of the catastrophic data loss problems that were found in other data stores.

What we did find is that when a partition happens, the leaders on the minority partition become unavailable for up to the ZooKeeper session timeout, which is 30 seconds by default in Solr. For those 30 seconds, until Solr realizes that the leaders are no longer reachable from the majority partition, it will not elect a new leader. After 30 seconds, when the ZooKeeper session timeout expires, it elects a new leader on the majority partition and things continue as before. So for those 30 seconds, maybe a little more or less, writes to Solr for the leaders that were on the minority partition will be unavailable. You can figure out whether that is reasonable for your application or not; there is no hard and fast rule on these things.

Some writes will hang for a long time, so of course timeouts are essential. When I say a long time: some writes that happened right as the partition was forming would just hang for as long as the partition was kept up, which is pretty bad. But it's not so bad in practice, because in real applications you'll set a timeout of maybe three or four seconds, and after that you'll say, hey, the operation did not succeed, maybe I'll try again later. So not that big a deal.

Under the compare-and-set operations and these kinds of partitions, the final reads had all the integers for which we got an OK response, plus some extra integers whose fate we could not determine, for example because the request timed out, so we could not tell whether they failed or succeeded; some of those actually succeeded and were found in the final set. On the right-hand side you have a small snippet of the results generated by Jepsen, which shows the recovered fraction as a ratio: for 14 requests the results were ambiguous and the values were actually found in the final read, so those 14 were recovered. Zero were unexpected, so there were no dirty reads, nothing in the final set that we never added in the first place. We did not lose any documents. And the OK fraction was 1251 out of 2359 total attempts. So when it comes to availability, well, probably not that great. But this also depends a lot on the cluster topology you're testing. For example, if you're running just one shard with five replicas, the availability results might be very different than with three shards and five replicas, or five shards and three replicas; it all depends on where the leaders of those shards end up and whether they land in the minority partition or not.

So, no data loss yet, which is good. But we have not proved a linearizable history: we did not look at the intermediate reads and check whether the intermediate history was also correct, whether there was a total order. We have not proved that yet; we'll do that in later iterations of these tests. And just because we did not find very bad bugs doesn't mean they don't exist, so we're going to keep looking. This is now a process: every time we have a new release of Solr coming up, we run these tests, figure out whether anything failed, dig in, investigate what really happened and whether it's a valid bug, and we just keep doing that for every release. So that's Solr.
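For reference, the snippet on that slide comes from Jepsen's set checker. Its output is roughly a map like the following; the key names are from memory and may differ between Jepsen versions, and the numbers are just the ones I read off the slide.

```clojure
;; Approximate shape of the jepsen.checker/set result for this run
;; (key names from memory; the actual sets of values are elided here).
{:valid?          true
 :lost            #{}        ; acknowledged adds missing at the end -> would mean data loss
 :unexpected      #{}        ; values present that were never acknowledged -> dirty data
 :recovered-frac  14/2359    ; ambiguous (timed-out) adds that did make it in
 :ok-frac         1251/2359  ; acknowledged adds out of total attempts
 :lost-frac       0
 :unexpected-frac 0}
```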
So what were the bugs that we found? I'm not going to cover these in much depth, because you can go and read about them online if you just search for them.

One very bad bug we found was that if you issue commits to Solr during a partition, it could put a random node into the down state, and then that node would not receive any traffic from clients, which is pretty bad. What was happening was that all writes in Solr are coordinated via the leader, but commits are simply broadcast by any node, because you don't need to be a leader to broadcast a commit; a commit doesn't change the actual data inside Solr, it just makes it visible. So the node broadcasting the commit was also acting like a leader for that operation during that window, and therefore it was able to put another node into a bad state when it couldn't reach it, which was pretty bad. We fixed that; commit operations don't go through that code path anymore.

The second one was not catastrophic, but it was a bit of an annoyance. When nodes reconnected to ZooKeeper, they would invoke the startup logic, which is to recover from the transaction log and then become available. That wasn't bad in the sense of operations being wrongly repeated, because all of these operations are stored with their versions in the transaction log, so replaying them doesn't change anything at all: the versions are old and Solr will just discard them. But it takes time to go through that transaction log, which can be hundreds of megabytes, and actually replay all those operations just to figure out, hey, this was all useless in the first place.

Then there were a few others. I already mentioned that some requests hung around for a long time, which we fixed. The cluster status API of Solr was a little flaky: it used to go to ZooKeeper each and every time, which gives you very consistent responses, but during partitions it tends to hang. So instead of saying, you know, I cannot reach ZooKeeper, it would just hang for a couple of minutes and not give you a response at all, which is bad. We fixed that in 5.2. The last one was especially tricky to figure out: it was a race condition between how a leader marks a replica as down and the exact time at which the partition happens. The juicier details are available in the JIRA issues; if you're interested, you can go and look them up.

Now, Elasticsearch. Elasticsearch is quite similar to Solr: it also uses a Lucene index and a transaction log, and it's a full-text search server not unlike Solr. It promises data safety, consistent single-document reads, and compare-and-set operations. Search, like in Solr, is eventually consistent: when you commit, that's when the results of the add or update become visible to searches. So it should show similar behavior to what Solr showed, right? Unfortunately, that's not the case. Elasticsearch uses a homegrown algorithm they call Zen Discovery, and it's not based on any peer-reviewed consensus algorithm like Paxos or Raft or whatever. I don't know what it actually does; I've never looked at the code. And these tests are not something I ran myself.
These are tests that the guy who wrote Jepsen, aphyr, ran himself, and he wrote nice blog posts summarizing them, first for version 1.1, when he initially wrote the tests, and then again for version 1.5. In both cases he found a lot of data loss issues during different kinds of partitions, and not just during network partitions but also during garbage collection pauses. So they're getting better between versions, but they're still not quite there; they still have a little work to do. I think if they just used something like ZooKeeper they would avoid a lot of these problems, but they choose not to, probably to make it easy to set up, and that bites you in production.

Okay, let's come to Mongo. How many of you use Mongo? Okay. And how many of you know about Mongo generally? Almost all of you, right? Okay. So Mongo is different: it's supposed to be a proper data store, not a search server like Elasticsearch or Solr. It's a document-oriented data store, which means you have a document with a list of key-value pairs; the values can themselves be further JSON documents, and so on, and you can nest them. It provides atomic updates on documents as well. So it's supposed to be an alternative to a proper database like MySQL or Postgres, and in theory it should give you some of the same guarantees those systems give you, or maybe a little less. But what exactly are the guarantees it gives?

First of all, Mongo does not use anything like ZooKeeper. It has the concept of replica sets, which elect a primary, and the others become secondaries to that primary. All the write operations go to the primary, and the operation log is replicated asynchronously to all the secondaries in the replica set. The replicas decide among themselves when the current primary is to be demoted, or when one of them is to be promoted as the new primary. What I'm going to talk about here applies to versions 2.4.3 and 2.6.7. The way it figures out whether nodes are up or down is very simple: every node sends heartbeat messages, and if a node does not receive heartbeats from another node for about ten seconds, it decides that node has gone down, and if that node was the primary, it will demote it and promote itself or somebody else, and so on. It's very simple in that way.

So Mongo claims to have atomic writes per document, and it supports what it calls fully consistent reads. What is a fully consistent read? That is again a bit ambiguous; it's not actually fully consistent. When you start reading the documentation, they first say it's fully consistent, but further down the documentation says that if you want strict consistency you should always read from the primary, because updates are replicated asynchronously to the secondaries; if you read from a secondary, you might get stale data that has already been changed on the primary. Okay, can you guys still hear me?
Okay. So presumably, if we always read from the primary, then we should be safe, right? But that's not the complete story. There is something called a write concern in Mongo, which has different levels. When you're writing a document to Mongo, you can specify one or more of these write concerns, and it behaves accordingly. In 2.4.3, MongoDB by default used simple fire-and-forget messages to send data one way to the server; it would not even know whether the server received the request at all, let alone acknowledged it. So it was completely unsafe. They changed that after aphyr published a blog post on MongoDB, and now they use something called the acknowledged write concern, which means the primary acknowledges to the client that, yes, I have written this data. But that data is only written in memory; it is not journaled or written to the actual on-disk files, it's just in memory. Which also means that if the primary goes down before the data can be replicated to the secondaries, your document might be lost, which is bad. Then you have the journaled, fsync, and replica-acknowledged write concerns, and then something called majority, which means Mongo acknowledges only once the write has been sent to and acknowledged by a majority of the replicas. This is supposed to be the safe setting, right? So it should be perfectly safe.

But is that true? No, not really. First of all, if you're not writing with majority, then when a new primary is elected in the majority partition, the data written to the old primary will eventually get rolled back when the partition heals. So you can have rollback of data for which you got an acknowledgement from Mongo, which is pretty bad. But there's a silver lining: the rollback is limited to 300 megabytes. Only 300 megabytes worth of data will be rolled back; if it needs to roll back more than that, the node will error out, go down, and say, you know, you've got to do some manual recovery steps, I cannot roll back more than this amount of data. So, safe, right? Safe.

And even when the majority write concern is enabled, with fsync and journaling and all those things, you can still have inconsistent reads, and that is because the isolation level in Mongo is read uncommitted. What does that mean? Say client A sends a new document to Mongo, for example client A adds the integer 1 to a document stored in Mongo. The primary writes it to its own oplog and its own index, and then it tries to replicate it to all the replicas and waits for acknowledgements. But if, in the middle of that, another client comes in and tries to read that value from the primary, it will be given the value, even though that value has still not been acknowledged to the client who actually wrote it. So what does this mean in practice? Say you have a service where you allow users to change their username, or display name, something like Twitter, maybe. What happens is: I change my name to X, and you also try to change your name to X, which should not be allowed, right? But the primary I wrote to went into a temporary partition; it was in the minority.
So eventually this primary will give up its primacy, because it is now in a partition, and the one on the other side, in the majority partition, will take over as the primary for that shard. But in the meantime somebody comes along; for example, you change your name and then try to look at your account preferences. That read may go to the old primary, and it will see the new data. And if it was you looking at your preferences page, Mongo might show my data to you, which is very, very bad. So anything that requires any sort of linearizable constraint should not be done on Mongo at all; it's just not safe under any write concern, under any replication strategy.

So what do we take away from this session? The first thing is that network communication is flaky: if you have asynchronous communication, things are going to get dropped, delayed, reordered, whatnot. If you decide to use a data store based on how much buzz it has, which is what I like to call hacker-news-driven development, you're in for a surprise. So it's better that you find that surprise yourself, before your users find it for you. Jepsen is what you can use to test for these kinds of problems. And if you already know Clojure, or if you want to learn Clojure and do some more testing, we can always use more help testing Solr, so help us find some bugs so that I can fix them.

That's pretty much what I wanted to cover. Before I close, I just want to say one thing: we are going to have a Solr/Lucene meetup next week, on Saturday, at the Target office in Manyata Tech Park. So if you're around and free, come by; we're going to have a few good sessions. And now I'll open it up for questions. Thank you.

Audience: For Solr, I mean through Jepsen, would you also put ZooKeeper under Jepsen as well?

Yes, I do. We tested two different topologies. In the first one, we had ZooKeeper running on the host machine, which means it is never partitioned away. In the second topology, we had five ZooKeeper nodes co-located with the five Solr nodes, running on each of the Linux containers, so whenever there was a partition, ZooKeeper was also partitioned into a majority and a minority. ZooKeeper actually performs pretty well. Aphyr himself tested ZooKeeper and gave it a clean bill of health, a good okay. ZooKeeper's compare-and-set operations are linearizable. Of course, because it's a consistent data store, it does have some problems with availability, but that is to be expected, and it's actually a good thing for ZooKeeper to behave that way.

Audience: Have you tested Elasticsearch's behavior when it undergoes a bridge partition?

Yes. Actually, I think one of these... okay, so there is the bridge partition, the second bullet here; these are the results in the case of the bridge partition. Aphyr tested it against random single-node partitions, fixed halves, random halves, and the bridge partition, and he also simulated a garbage collection pause: he would go onto the box and send SIGSTOP and then SIGCONT, pausing the process and then unfreezing it again to simulate a garbage collection pause. And it had data loss issues in almost all of these partitions.
So it's getting better, it's definitely getting a lot better; for example, it had something like a 29% loss in 1.1 and only a 3% loss in the second round of tests. But, you know, it comes down to whatever is acceptable to you.