Hi, welcome to Cloud Native Live, where we dive into the code behind cloud native. I'm your host today. My name is Whitney Lee, and I'm a CNCF ambassador and a developer advocate at VMware. Every week, we bring new presenters to showcase how to work with cloud native technologies. We'll build things, we'll break things, and we'll answer your questions. I'm so excited. This week, we have Justin Barksdale and Oz Tiram here with us to talk about two-node HA for edge Kubernetes, a new approach, so I'm super psyched. Now, as always, this is an official live stream of the CNCF, and as such, it's subject to the CNCF code of conduct. So please don't add anything to the chat that would be in violation of that code of conduct. It's really easy to keep track of: basically just be kind, be kind to each other, be kind to the presenters, be kind to me, and I'll be kind to you too. So friends who are joining us live, please say hello in the chat and tell us where you're from. It's so cool that we have a global community here with us each time. And as always, if you have questions during the presentation, please do post them in the chat. We're going to have today feel more like a conversation, and we're going to answer your questions as they come in. So with that, I'm gonna hand it over to our friends, Justin Barksdale and Oz Tiram, to kick off today's presentation. Tell us about yourselves. Hi everyone. My name is Oz Tiram. I'm a principal software engineer on the advanced project team at Spectro Cloud. What we do in this team is usually explore different cloud-native technologies and see how we can bring them to our customers. And I'll hand it over to you, Justin. Hi, thanks Oz. Yeah, my name is Justin Barksdale. I am a principal architect on the sales side. So I help support our customers and help them design and implement solutions around our products. I'm happy to be here. I do want to quickly go through the agenda.
So we are here to talk about two-node HA at the edge. Our focus is around edge in this particular conversation. I want to talk a little bit about what the three-node problem is: why three nodes really doesn't work for some, or many, of the customers that we talk to. What are some alternatives that we came across as we were building this solution? What solutions have we come up with? We have a couple of different options and a couple of different demos to go through. And then, what are the next steps? So going into that, really, what's the problem with three nodes at the edge? We've been running Kubernetes on three or more nodes for quite some time. What are the challenges that we see at the edge, and why is this important? I think the first thing is really to talk about scale. Edge scale is much, much more massive than what we see in data centers or in clouds. And a lot of this comes down to cost. The number of nodes I have at the edge drives up the cost of doing certain things in that particular location. And oftentimes three nodes becomes overkill for what the workloads require. We are buying an extra node just to support the ability to have high availability; the reality is the workloads themselves don't need that many resources. Could we do better? When you think about edge and thousands of clusters, tens of thousands of clusters in some places, every time you add a third node that just adds up, and the need to have that third node only for HA really takes away from the model of building your infrastructure to support your application, versus having to have this extra high-availability node. And across thousands of nodes, that adds up in dollars.
When you think about the types of devices being deployed at the edge, whether they're small form factor devices or single-socket boxes, they're still hundreds of dollars each, and we've seen most of them between $500 and $1,000. So when you think about that across 10,000 locations, it just adds up and becomes very, very expensive to achieve HA. And most edge deployments that we're seeing, in terms of what we're replacing, are just two-node virtual machine hosts running some hypervisor. So a lot of customers want that same kind of pattern. So really, what's the problem? If you think about Kubernetes and how its database is interacted with, etcd is a key-value store that typically requires a quorum, hence the odd number of control planes. Quorum is just the minimum number of nodes required to form a majority. In the case of a single-node cluster, one node is a majority. In the case of a three-node cluster, two nodes are a majority. So a three-node cluster can afford a single node failure, but if a second one of those nodes fails, then we run into this problem where the database is in read-only mode: the applications continue to function, but I can't make changes. And those changes include things like promoting etcd, telling etcd, hey, this is the leader, or deleting nodes; you can't do any sort of etcd operation when the cluster is in an impaired state. So when we looked at this from a design perspective, it was: what are the considerations we wanna look at? We focused on K3s first, primarily because, again, we're talking about edge, so we wanna keep the footprint as small as possible. Application high availability is the most important thing. We don't necessarily care if the control plane goes down for a period of time. Of course it needs to be recoverable. We need to be able to replace nodes. We need to be able to do the actions that would warrant having high availability.
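As a rough illustration of the quorum arithmetic Justin walks through (a sketch for illustration, not code from the project), the majority rule and its consequence for one-, two-, and three-node clusters look like this:

```python
# Quorum math for an etcd-style cluster: a majority of voting members
# must be reachable for the database to accept writes.
def quorum(n: int) -> int:
    """Minimum number of nodes that forms a majority of n."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """How many nodes can fail while writes are still possible."""
    return n - quorum(n)

for n in (1, 2, 3):
    print(f"{n} nodes: quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}")
```

Note the crux of the talk falls out of the math: a two-node cluster needs both nodes for quorum, so it tolerates zero failures, which is why plain etcd on two nodes buys you nothing over one.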
But if we lost the control plane for a period of time, five, ten minutes, that was not something that we were really concerned with. So with that, I'm gonna turn it over to Oz, who is gonna talk through some of the etcd alternatives before we come back and talk about our solutions. All right, thank you, Justin. So when we started looking at this project, we thought, okay, we know there's Kine, which is a component in K3s. So how about we use that to replicate a relational database? For those of you who don't know what Kine is: Kine stands for "Kine is not etcd," it's a recursive acronym. And it's basically a shim that you can stick between the Kubernetes API server and a data store. Originally Kine supported SQLite, PostgreSQL, and MySQL, and in recent months there's also an option to use NATS as a backend. And we thought, let's give it a shot. Let's look at what we can do if we use PostgreSQL or MySQL. But the thing with those network databases, once we started looking at them, is that there's no single understanding or solution for achieving high availability. If you look at this link below here that we put in the presentation, just in PostgreSQL there are 16 different ways, or even more, to achieve high availability. And each of those high availability modes comes with trade-offs you have to understand. So what we want to achieve is basically the following. We want to have Kubernetes, Kine, and some kind of network database in the end, and ideally we have high availability of the control plane, so we can insert, update, or delete records through Kine, in the database, from both nodes. This is the ideal state we want to achieve. So imagine in between those two nodes there's no etcd, there's some kind of database. We didn't know yet which one, and to choose, we had to get an understanding of what high availability actually means in PostgreSQL or MySQL.
So first of all, relational databases can do logical replication or streaming replication, and each of those can be synchronous or asynchronous. You can also do connection pooling to improve the availability of your PostgreSQL or MySQL, and you can also have proxies and load balancers between the client and the database. So let me go into a little bit of detail on what these types of high availability mean, okay? The first thing is logical replication. You replicate an object from a primary database to a secondary database. An object can be a column, a table, or a complete database in your database management system. So obviously, first of all, you see that it doesn't really bring us to where we want to be, with two control planes on two nodes at the edge, because with logical replication we can only write and read on one node, and the second one is read-only. Streaming replication is similar, except we don't just replicate an object in the database. Rather we, I'm sorry, I apologize, the slideshow is acting up, okay, apologies for that. So with streaming replication, we don't replicate a single object, but we replicate the whole database system by replicating the write-ahead log of the database. There's a process on one system that sends the database actions over the network to a subscriber on a secondary system, which then sees what actions the primary database did and writes them into its own system. But again, on the secondary you can only perform SELECT actions, so it's read-only. The next thing you have to consider with those high availability systems is asynchronous versus synchronous. With logical or write-ahead-log replication, you can insert on one node, and then there's a secondary which picks up those actions after an interval of time and writes them to its database.
That means that, again, we don't have two control planes, and there's a delta of time where the data is inconsistent. So if the primary crashes, there's a chance of the secondary losing some data. To mitigate that, there's synchronous replication, which is a little bit more complex. The client performs an action, the action is propagated to the secondary, which then sends an acknowledgement to the primary, and only then does the client get a message of success, that the insert, update, or delete arrived successfully on both nodes. Obviously, there's a lot of room here for things to go wrong. It could be the network between the client and the database, and again, from primary to secondary there could be loss of data. So there's a complexity here which you have to consider when you go to synchronous replication. Again, for us, using MySQL or PostgreSQL with this kind of replication is a no-go, because we still don't get two control planes. Another type of high availability with relational databases is improving the availability of the primary database by having some kind of load balancer or proxy between the client and the database, which distributes requests by type to the right database. So the proxy can see, okay, you want to select something, so based on my knowledge of your previous requests I can send your request to either the primary or the secondary. And you can also have a tertiary, like a third or fourth member in the setup, which improves the availability of the database in general. But that's high availability of the database system with more nodes, which is not relevant for us, because we want to have, again, two control planes on two nodes at the edge. To make things worse, many high availability solutions for relational databases actually require a quorum, by having Consul or etcd in the background, which synchronizes the databases.
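The data-loss window of asynchronous replication that Oz describes can be shown with a toy model (purely illustrative; the class and names are made up here, not PostgreSQL internals):

```python
# Toy model of asynchronous replication: the secondary applies the
# primary's write-ahead log after a delay, so a crash of the primary
# can lose any writes that were acknowledged but not yet shipped.
class AsyncPair:
    def __init__(self):
        self.primary = []    # rows visible on the primary
        self.secondary = []  # rows visible on the secondary
        self.wal = []        # pending write-ahead-log entries

    def write(self, row):
        self.primary.append(row)  # acknowledged to the client immediately
        self.wal.append(row)      # shipped to the secondary only later

    def ship_wal(self):
        self.secondary.extend(self.wal)
        self.wal.clear()

pair = AsyncPair()
pair.write("a")
pair.ship_wal()
pair.write("b")          # primary crashes before the next shipment...
print(pair.secondary)    # -> ['a']  ('b' is lost if we fail over now)
```

Synchronous replication closes this window by not acknowledging the client until the secondary has confirmed the write, at the cost of the extra failure modes described above.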
So we kind of concluded, okay, network databases don't work for us. Let's look at other things. Of course, if we chose any of those solutions and gave this to the client, we would need to have the right mental model of what kind of application or workload is acceptable for the type of availability we want to have. So obviously it's a complex thing to choose between all of those systems, and to make things worse again, there's a complex failure mechanism. If the master fails, it's not like with etcd, where the primary goes away and then the secondary is automatically promoted to master. So again, we weren't really sure we wanted to go with network databases. Then we looked at the last option with Kine, which is SQLite. And we thought, okay, we've heard of SQLite over the network, but looking closely into those network SQLite solutions, they all require a quorum and they use etcd or some kind of Raft under the hood. For example, Dqlite from Canonical uses its own Raft implementation, but that just means that if you take MicroK8s with Dqlite under the hood, you still have to have three nodes at the edge. Then there are other solutions for replicating SQLite over the network. One of them is Litestream, which is active-passive replication over the network. We had almost settled on looking into it, but then by chance we bumped into an interesting project called Marmot, which I want to introduce to you. Marmot is a SQLite replicator over the network using a message broker called NATS. It's extremely simple to set up: all you have to do is start a process and point it to your database, and it takes care of the rest. At the same time, you can do this on another database somewhere on the network and tell it, I want you to subscribe to a NATS publisher, and it will take care of everything else.
What was really attractive about this project for us in the beginning was that the replication is two-way, so we could have, like we wanted, two control planes on two nodes. You can check out the introduction of Marmot; I don't know how many of you are familiar with CDC. CDC stands for Change Data Capture. You can do this with any kind of database by sticking a message broker behind the database, with Kafka for example, but NATS is embedded into Marmot, so it's really lightweight and suitable for the edge use case. How does it work? It works the following way, in theory. You have a SQLite database where you can insert, update, or delete stuff, and when Marmot starts, it creates a collection of triggers on the database, so every action on the database is turned into a message to the Marmot leader. On the other side, we have a Marmot follower which subscribes to those messages and says, okay, there's an insert here on this side, I will write this insert over here. At the same time, the client can also publish those messages to the broker, and if there is an update on the secondary node, then it will write it to the primary. So this is what we thought in the beginning. Eventually we realized that Marmot cannot do two control planes, and only after playing a lot with Kine did we realize that the problem is not Marmot itself; it's the transaction mechanism, which is incompatible with this synchronization mechanism on both sides, because if you start a transaction here and it's published there, while at the same time a transaction starts on the other side, you get a kind of deadlock between the two nodes. So as a compromise, we started looking at Marmot for replicating SQLite between the nodes while we go with a single control plane, because of this locking problem, which we don't have a good solution for yet. With that, I think I hand back over to you, Justin. Yes, thanks Oz, I appreciate that.
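The trigger-based change capture Oz describes can be sketched in miniature with Python's stdlib sqlite3 (the table, trigger, and changelog names here are invented for illustration; Marmot's actual schema and trigger set differ):

```python
# Marmot-style CDC sketch: a trigger copies every INSERT into a
# change-log table, which a replicator process could then read and
# publish to a message broker such as NATS.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT);
CREATE TABLE changelog (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, k TEXT, v TEXT
);
-- Every write to kv also records a row in changelog.
CREATE TRIGGER kv_ins AFTER INSERT ON kv BEGIN
    INSERT INTO changelog (op, k, v) VALUES ('insert', NEW.k, NEW.v);
END;
""")

db.execute("INSERT INTO kv VALUES ('color', 'blue')")
print(db.execute("SELECT op, k, v FROM changelog").fetchall())
# -> [('insert', 'color', 'blue')]
```

A follower replaying that changelog on its own copy of the database is, in spirit, what the Marmot subscriber does with the messages it receives.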
So I wanna go through what our solution looks like as it is today, and then where we're going with it. And again, as Oz mentioned, our goal is to focus on how we can achieve high availability of the control plane and our applications. However, at the current moment we're looking at an evolution of that: can we just do this with a single-node control plane and have some type of failover mechanism leveraging Marmot? Now, there are some implications that we have to consider, and one of those is, we're focused again on the application, but the application should be designed in a manner that survives losing one half of it. For example, having all the replicas on one node, and that node fails, doesn't really help if the control plane's down, as the application can't fail over; certainly Kubernetes timers need to be tweaked. There's a lot that goes into this, but fundamentally the goal is to say we have an application that can run on any one node at any one time, with no dependency on the other node. The control plane, as I mentioned, can be down for some period of time; we look at it as between five and ten minutes being okay. And stateful applications would need very careful design consideration, and we really weren't targeting those types of applications, at least initially. So the architecture looks something like this. I'll talk through it and go through the demo in just a moment, but fundamentally we have two nodes. One of them we label as primary, the other one we label as secondary. The primary node runs our agent, which we call Stylus, and it's doing state checks and things like that. And it's fundamentally responsible for determining when it needs to fail over. So in the case of the secondary node in this particular slide, the secondary's state checks kick in when it can't reach the primary node for some period of time, or some of those state checks fail. For example, we ping the alternate host.
We look for the default gateway. We try to attach, can we connect to the API server? Those types of things. When a certain sequence of those takes place and the state has failed, then we initiate a failover, which in turn tells K3s to restart, pointing the database at your local copy, which is what's on node two, and then resume any replication that would have been happening back to node one. So we kind of flip the roles of how this takes place. So node two subscribes to node one's Marmot instance, and Marmot is responsible for doing this replication. All right, and then, going into the next step, what happens during a failover? Node one dies. You'll see node two continues to run; the application, as you'll see in the demo, continues to function, but there's no control plane. At some point during that timeline of state checks, the control plane will switch over. Node two will promote itself, become the leader in this particular instance, and begin that replication as I mentioned. When node one comes back online, you'll see that it will realize that it's in a state it shouldn't be in, realize that node two is the leader, and that state engine, the agent that we have, initiates a demotion. So it tells it, hey, you're no longer gonna be the leader, you need to become the secondary node, and go forth, right? So with that, let me go through a quick demo. I wanna set up a couple of things here. One is we have our platform, which is called Palette. I have a cluster deployed that I've called two-nodes. So it's just a two-node cluster. And inside of here you can see we have two different node pools. One is a master pool, or control plane pool, and the other one is a worker pool. And this is just running on a couple of virtual machines in our environment, but the concepts are the same. One's labeled primary and one is labeled secondary.
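The secondary's failover decision loop that Justin describes, run a sequence of health checks every interval and promote only once failures cross a threshold, can be sketched like this (the check names and threshold are illustrative, not the actual Stylus agent code):

```python
# Failover decision sketch: all checks must pass to reset the failure
# counter; consecutive failures accumulate until the threshold trips.
def should_fail_over(checks, failures_so_far, threshold=3):
    """checks: zero-arg callables returning True when healthy.
    Returns (promote_now, updated_failure_count)."""
    healthy = all(check() for check in checks)
    failures = 0 if healthy else failures_so_far + 1
    return failures >= threshold, failures

# Simulate the demo: primary unreachable, default gateway still fine,
# so the outage is the primary's fault rather than a network partition.
ping_primary = lambda: False
ping_gateway = lambda: True

failures = 0
for _ in range(3):  # three consecutive 30-second check intervals
    promote, failures = should_fail_over([ping_primary, ping_gateway], failures)
print(promote)  # -> True after the third consecutive failure
```

This also shows why the demo takes a minute and a half or so before anything happens: with a 30-second interval and a threshold of three, the failover cannot trigger faster than the checks accumulate.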
What I'm gonna do, because of the time that it takes, and the timers and stuff aren't tweaked 100%, is go ahead and kill node two, which, as you see, is labeled as the primary here. I'm gonna go ahead and stop node two in my VMware environment. So, just stopping the node. We'll let the process of it failing go through. I should be able to attach to my application, which is a very basic app that's just showing a representation of both nodes, and showing which one we're hitting; it has 10 replicas, and we'll see this in the demo. I have a recorded version of this, so I'm gonna let this take place right now, and we'll move to the recorded version that shows in more depth what we're doing. So again, going back to our architecture, we have this primary and secondary concept. I'll hit play and we'll go through. You'll see a couple of things, and I'll pause and highlight them. In the bottom right-hand corner you see the logs from node two. So in this case, from node two you can see the state of which one is the control plane in this particular instance, which is node one. And on the left-hand side you can see I have 10 replicas of a deployment. Again, these are just representative of being spread across both nodes; they just bounce back and forth depending on what's happening. So I initiate that shutdown that just happened, and you'll see the liveness checks and all of those things. When the shutdown takes place, node two initially says, oh, my liveness check fails. And you can see the health check failed; it's failed one time. We'll do this three more times, and we're still below the threshold of three health checks, so we'll keep checking. These timers are tunable in the settings here. We have them set to run every 30 seconds. So every 30 seconds the health check takes place. It fails once, okay, it fails twice.
So basically there's about a minute and a half or two minutes before an actual failover event, which is why I recorded this and am just talking through it. But we'll cut ahead and see, okay, the failover. As you notice, the control plane is down, which is the expectation. Again, as I mentioned before, high availability of the control plane itself is not something that we're focused on at this moment. We're focused on high availability of our application, which you can see at the top is still running. And we'll see that only node two is popping up. Now, on the bottom we notice some other error messages happening as additional liveness probes come in. The failover took place. So node two realized, after four failures, I need to become the leader, I need to promote myself, and it promoted itself. And you can see, we're showing those deployments. The timers within Kubernetes haven't timed out yet and flagged the pods on node one as gone. But you'll notice the control plane is now node two. Node one isn't ready, because node one is still shut down, right? So, powering node one back on at this point, what we'll see is kind of a conflict moment, where you can see I'm pinging node one. At some point node one comes back, and now it's been demoted. So what's happening during this process is: I was pinging it, it's coming back online, being rebooted. And when it comes up, it notices, oh, I'm not supposed to be a leader. It demotes itself, but it's not quite ready yet. I'm just gonna go back and look at the logs of node one, and we see, okay, it pops up and it's ready, but all of our replicas are still on node two. We can use the Kubernetes descheduler and timings to reset this.
So based on values within there, I can tell it, hey, I want to reschedule these nodes, I mean these pods, I'm sorry, and distribute them more evenly, because by nature Kubernetes won't reschedule after the failover as long as it has capacity available. So we're basically just rescheduling those pods so that we go back to our previous good, load-balanced state, with the deployments split across both nodes. So this is our current iteration of where we are. If I go back to my instance inside Palette: as I mentioned, one of the challenges is the timers within Kubernetes, how fast things fail over. I'm using MetalLB as a high-availability load-balancing mechanism. If it doesn't fail over fast enough, and this screen refreshes every 10 seconds, so if it's in between a refresh, then we get that "can't display" error, which again comes back to application design: how can we make this more available? Timers being tweaked, application availability, those types of things certainly need to be considered. But this whole process will go through, and you'll see inside of here, once my node one, it's going to switch over right now, node one will become the leader and I'm back in a good, known, desired state, all right? So again, this is the evolution. This is what we're driving forward. Now, we started this out by saying that we want to have high availability with two control planes. And if I think about what that might look like, it's a combination of everything we've put together, and the next phase of this looks something like this. Recently, PostgreSQL had a new release that allows for bidirectional replication between two or more databases. And so we've come up with this idea: could we take PostgreSQL natively, tell it to replicate amongst itself, and have two highly available control planes?
And so Oz is going to show a demo of this in just a moment. But I just want to set the stage and say this is what we're driving towards: using these technologies to enable our customers at the edge to have high availability for both the control plane and the application. So I'll turn it back over to Oz to show a quick demo of our HA control plane model, which consists of PostgreSQL and looks very similar to what I have on the screen here. Oz? Thanks. Okay, thank you, Justin. So what we have here is a Vagrant lab I set up with two virtual machines running PostgreSQL 16, which was released a couple of weeks ago. It's very early, kind of preview technology for us, with two databases on two nodes. And you can see in the log messages that I created a publication on one node, and there's a subscription on the other node. And what you see here is that I actually have two control planes: kubectl get nodes. So here I can talk to the control plane to get nodes. And if I show you the pods, you'll see that I have, sorry, I meant to show only these columns. So I have the VM for the control plane, and I have nodes, and something's wrong here. I don't have, let's do ps, the K3s server. Oh, it is running here, but there's only one control plane. That's surprising to me. But what I want to show you is that I can fail this node here, VM1, and the cluster is still available, because what happens under the hood is that both control planes talk to the same database, and there's an active standby I can switch to. So there should be a config here, let's grep for the datastore. And here I have the same, there was an obsolete value; let me grep this file for the database again. Oh, that's why there's no control plane here. I really apologize for that. Let me check why it's not started here. Sorry, let me start K3s. Let's see if that works now without breaking anything. Okay, I have no K3s. Well, we promised at the top of the episode we'd break things.
So I'm glad you're making me true to my word. Yeah, I apologize, I thought the Vagrant setup was working. While you're troubleshooting, I have a question. I don't know much about this space, so I fear that a lot of my questions today will be remedial. But why do you want to have a control plane on every node? That's a good question, actually. Well, the reason is that, as Justin showed, the failover mechanism is quite complicated. In the case of Marmot, when we fail over the control plane, we also need to change the mode of Marmot. We need to stop Marmot and switch it from subscriber to publisher. When the failed node starts again, it cannot automatically start as a Marmot leader, because you'd get conflicts, right? So you need to say, no, no, you can't start like this. You have to change your configuration before starting Marmot again, and demote yourself. So this whole kerfuffle of promotion and demotion was quite complicated to implement, to be honest. We broke our heads for a couple of good weeks on that. And if we could completely avoid that, that would be awesome. If we could push this whole logic upstream to PostgreSQL, it would be awesome, because PostgreSQL has, I don't know, millions of users out there. And if this works, then it's battle-tested. It's not just something we implemented over a couple of weeks. So, it seems that K3s is now working. Let me see. There is K3s. So the config file should be here. Mention K3s, there you go. So I have both K3s instances talking to the same member of the cluster, okay. And I have PostgreSQL running here on VM2, ps aux. Okay, PostgreSQL, you can see it. So there is PostgreSQL here, and it's there. So now let's see that we also have two control planes; let me get the pods. So this is just an imaginary workload I have. Let's see if I have, where is the command, the one with the pod names?
There are still not two control planes, but it actually doesn't really matter, because I can fail the control plane. This one on VM2 should start. But ps aux, it's there on this one. Grep K3s, there is a K3s here too, but it's not running as a server. That is weird. Look at the systemd unit. Let's see if it's running as a server. K3s service, it runs as a server. Let me see. I have another question while you're troubleshooting. Do you have to use K3s, or can you use another Kubernetes distribution? Well, because we use Kine, we actually have a prototype that also works with kubeadm. kubeadm usually expects an external etcd, so we can just start Kine and point kubeadm at Kine; that should work, and it actually works in our implementation. Cool. Yeah, yeah. Just to add to that, Whitney, one of the goals is to be agnostic. We started out with K3s because that's kind of where we see most edge deployments, but there are certainly use cases where K3s doesn't fit, for example, FIPS-compliant areas, things like that. So our goal is to try to leverage where we can, not having to do this logic as Oz mentioned, but also not having hard dependencies on things like Kine as an example, or K3s, or other distributions; just being able to say, hey, today I'm going to use this one because I have this requirement, and not having to worry, does that work? Okay, I think the problem is actually myself. I just can't see the message from curl, yeah, but I think I'm actually talking to the control plane locally. Let me just check if I'm right here. Justin, can you give me a hint on how I can see the actual curl call with grep? I know there's a way to get the actual curl command out of kubectl. Let's see if, no, it's somewhere there. It's grepping it, but I don't see it.
Okay, so let me just, I hope I don't embarrass myself, but I think if I stop VM1 here, the control plane will still work. So what I did to simulate the, I'm not in there, what, VM1, sorry, okay. Do you want to ask us another question while this takes time? So, this is all brand new to me, so I'm just going to restate the problem. Basically, to use Kubernetes on the edge with all of the present solutions, you have to run a three-node cluster, and that's often overkill for a lot of different workloads. Which actually leads me to a question right there: I remember you saying stateful workloads are probably not best for this, at least not yet, but what kind of workloads are ideal for a two-node HA solution? That's a good question. So just to clarify, yes, you can run single-node Kubernetes at the edge, but you don't have high availability. So if you have a hardware failure or something like that, that site is out of service. And there are use cases where customers are comfortable with that: it's not mission critical, it's just providing some type of service. The use cases that we're targeting are: I need high availability for my applications, they're designed to run across multiple nodes, and I have built in the capabilities for that, either through their own internal replication or, if they're using databases, data replication, but I don't have a need for transactional state between them. We're not sharing state; my application's handling it. What we're seeing a lot of is, I have IoT aggregation points. In fast food, as an example, I have a lot of fryers and fridges and things that have sensors and information. I don't want all of those talking to the internet. I just wanna use my local device to aggregate, maybe do some pre-processing, and then punt it to a cloud to be processed further.
Those things turn into additional kinds of use cases: well, now maybe I want to do some intelligence locally. I need to have that there, because if the internet's down, my thing needs to continue to provide information. But I don't necessarily need super high availability all the time; I also want to have some survivability. So I don't want to do one node, and I don't want to do three nodes because it costs a lot. The medium is: I want to do two nodes, and I still want to make changes and still have some of those same functions that I get when I have a three-node high availability cluster. So it's important to me that I keep all this information I'm gathering, but it's not important that I have all of it every second. Correct. And that's when a two-node solution is good. Yeah, so while Justin was explaining, I figured out what's going on, so let me show you. I stopped the control plane and the database on node one, simulating that it's down, okay? Now, if I try to get pods here, I get a timeout, right? So it's trying to connect to PostgreSQL, the database on this node, which is stopped, so kubectl is failing, and this is the RPC error coming from Kine. But if I look at Kine, it's still running, so I should be able to talk to it. And what I did in the meanwhile is I updated the configuration of Kine to actually talk to the database running locally. So if I'm correct, restarting Kine should reconfigure it to talk to the database locally. Now, remember what I said: we don't want to have this complex mechanism of failing over. If we had, from the beginning, the local Kine talking to the local database, I wouldn't have to do these actions that I did now. And there's actually an interesting story about why we can't do that; I mentioned it briefly. And let's see if it's working. It seems that it's not, or it's taking a while to start.
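The reconfiguration step in the demo amounts to swapping which database endpoint Kine talks to. As a purely hypothetical sketch (the endpoint URLs, names, and selection logic here are our own illustration, not the actual demo setup):

```python
# Hypothetical endpoints; in the demo this is done by hand-editing
# Kine's configuration and restarting it, not by code like this.
PRIMARY_DB = "postgres://node1.local:5432/kine"  # assumed primary endpoint
LOCAL_DB = "postgres://127.0.0.1:5432/kine"      # assumed local replica

def pick_datastore(primary_reachable: bool) -> str:
    """Prefer the primary database; fall back to the local replica."""
    return PRIMARY_DB if primary_reachable else LOCAL_DB

# Node one's database is stopped, so Kine should now use the local copy.
print(pick_datastore(False))  # postgres://127.0.0.1:5432/kine
```

As the speakers note, they would rather avoid this failover step entirely by having each node's Kine talk to its local database from the start.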
But the idea is that you can't do active-active replication for all kinds of SQL queries. If I may share a tab of my browser with you, when you look at the local... We can't see your browser right now, but when you share... is this, let me... Yeah, I will share my tab, so... Okay, great. There we go. So the documentation for active-active Postgres comes with a warning. It essentially says these don't go together, and they explain here that you can't do transactions; there's actually no good way to make publish-and-subscribe replication work with transactions. And we discovered that, under the hood, Kine emulates etcd by doing transactions. So if we start two Kines, or two control planes, on active-active replication with PostgreSQL, it seems like it's working for a couple of seconds, but after that there's a compaction mechanism for old records in the database, and that's done with a transaction. So the database goes into a lock. And we really hope that we find a way to work around that. And I think right now what happened is that I locked my database; this is why Kine doesn't start. So yeah, I break things, but at least I can explain why it's breaking. And we don't know yet; we are contemplating whether we want to discuss with the Kine developers doing this compaction without a transaction and commit. We are still not sure how, but eventually we will get this right, and I'm quite optimistic that we will have the two control planes running on two nodes. I have another question that might be silly at this point in the conversation, but why not use an external database for this? Yeah.
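To illustrate the compaction problem Oz describes, here is a simplified sketch of a revision-store compaction done inside a single transaction. This uses SQLite rather than PostgreSQL and is not Kine's actual schema or code; it only shows the shape of the operation, a multi-statement, all-or-nothing delete of superseded revisions, which is exactly the kind of transaction that active-active replication struggles with:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (id INTEGER PRIMARY KEY, name TEXT, value TEXT)")
# Three revisions of the same key; only the latest should survive compaction.
for rev, val in [(1, "a"), (2, "b"), (3, "c")]:
    conn.execute("INSERT INTO kv (id, name, value) VALUES (?, ?, ?)",
                 (rev, "/registry/foo", val))
conn.commit()

def compact(conn: sqlite3.Connection, up_to_rev: int) -> None:
    # All-or-nothing: delete superseded revisions inside one transaction
    # ("with conn" commits on success, rolls back on error).
    with conn:
        conn.execute(
            "DELETE FROM kv WHERE id < ? AND id NOT IN "
            "(SELECT MAX(id) FROM kv GROUP BY name)",
            (up_to_rev,),
        )

compact(conn, 3)
rows = conn.execute("SELECT id, value FROM kv").fetchall()
print(rows)  # [(3, 'c')]
```

On a single database this is routine; with two actively written replicas, the same transactional delete is where the lock-up described in the demo occurs.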
So if you think about it, a goal is to keep costs down, and having infrastructure that runs an external database at a fast food restaurant or a retail location doesn't really solve for that. There's certainly logic where you could centralize and put it in a cloud or something, but then that doesn't solve the survivability element: if my internet goes down, I've lost my cluster and I can't do what I wanted to do anyway. So we initially toyed with the idea: could we throw in a Raspberry Pi running some type of database? But again, how do you achieve high availability there? Now I've got to have some number of Raspberry Pis, so it just didn't solve the problem we were trying to solve. And then there's a lot of other inherent complexity, as Oz mentioned earlier, with networked databases: latency, availability, and those types of things. It just wasn't going to fit what we were trying for. Okay. And Ferdinand in the chat has an answer too, which is that latency should be considered for an external database as well. So we do have a question in the chat: what if you have an active-passive mode cluster? I guess I don't quite understand it, maybe because I don't know a lot in the space. Can you... Let's see. Can I answer that? Yeah, yeah, go ahead. So it's the same as what I said in the beginning, when I showed the models of high availability. You can have primary and secondary, active-passive, with whatever replication mechanism you choose. But then again, as I showed you, you can't have two control planes, because the secondary is passive and you're not allowed to write into it. So in case of the primary failing, you'll have to switch over: promote not just the database, you also have to start a control plane on the secondary. So while we can do that, we don't want to do it.
We want to have a solution which is simple, and there are a lot of active-passive replication mechanisms; we pointed, at the beginning of this presentation, I think, to the PostgreSQL wiki, with many, many options for how to do that. And it's the same for MySQL. There's really no gold standard for doing active-passive replication with PostgreSQL. So, yeah. Excellent, thank you. So I have another question. Say I own a successful chain of fast food restaurants, and I'm sold: I want all my french fry data to be on two-node clusters. What should I do to start implementing this instead of the three-node system that I'm using for my HA workloads? Well, I would say come talk to us. We're going to be present at KubeCon, of course, so we'll have a booth there where you can schedule one-on-one demos; we have a room as well for meetings. So if you're looking to learn more or see more or hear more, as this is an evolving space, feel free to come out there. We also have a series of blogs coming out, and one of them was just published around two-node in general, focusing on what the challenge really is and what we're solving for: a lot of the things we talked about today. But certainly, as a next step, I would encourage you to come see us at KubeCon next week. Is this something that's ready to be implemented now, or does it still need some work? Yeah, so we're targeting what we're calling tech preview within our platform by the end of this year. It should be available in the mid-December timeframe, with GA happening sometime in Q1 of calendar year '24. So our edge with three nodes, that's all available now, but the two-node solution, again, is focused on a very specific use case, and as you see, we're still actively looking and experimenting and trying different ways to do it.
I love that you're learning out loud with us and showing us your journey; I think it's so fascinating. I think you had a last slide to show off. Is that true? Should I put it up? Yep, I have it. Yep, ready to go. Excellent. Oh, hey, this is familiar information. As I mentioned, yeah, next steps: reach out to us at KubeCon, schedule a demo. You have the QR codes here, and I think the links will also be posted in the show notes. And with that, if there are any other additional questions anybody has, take a moment. Doesn't seem like it; so far we don't have any. Will you please restate for me? I understand why a two-node cluster and what problem it's solving, but I think what you said was that the difficulty of getting a three-node cluster down to a two-node cluster is that all of the key-value data stores that back the cluster, all the ones that exist currently, require three nodes. So what is the challenge of getting that key-value data store down to two nodes? Well, most of them center around the Raft consensus algorithm, which requires quorum. And quorum, as I mentioned before, is a majority of the nodes being present; so in this case, a three-node cluster needs two nodes to be available. But in a two-node cluster, if you just deployed it with two nodes and you never lost a node, it would work fine: the quorum is there, the majority is fine. But if you lose one of those nodes and you need to replace it, that's where it becomes a problem, because now I can't make any changes, so I have to rebuild the cluster anyway. And so how do we get down to two nodes if we don't use Raft? I think that's kind of the gist: it's Raft. And then we have a question, and I have another one: how would this compare to a worker node at the edge and the control plane in the data center? So I think some of the same challenges that we talked about before apply, in terms of: my control plane is in the data center, and one of our goals is that I must survive an outage.
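The quorum arithmetic Justin describes can be made concrete with a few lines of code. This is a minimal sketch of Raft's majority rule, our own illustration rather than anything from the talk:

```python
def quorum(cluster_size: int) -> int:
    """Raft requires a strict majority of nodes to agree on any change."""
    return cluster_size // 2 + 1

def has_quorum(cluster_size: int, nodes_up: int) -> bool:
    return nodes_up >= quorum(cluster_size)

# A 3-node cluster needs 2 nodes, so it tolerates one failure:
print(has_quorum(3, 2))  # True
# A 2-node cluster ALSO needs 2 nodes, so losing either node loses quorum:
print(has_quorum(2, 1))  # False
```

This is why a two-node cluster runs fine while both nodes are healthy, but becomes read-only (no membership changes possible) the moment one node fails.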
If I lose the internet, I need to make sure that my node continues to run. You talked about fast food joints; imagine a fast food restaurant in an area of the world that is frequented by natural disasters. Their internet or their infrastructure is impacted, but they happen to be the only restaurant available, because they have power but they don't have internet. In a situation where my control plane is in the cloud and I lost power for a period of time, when that node comes back up, it's not going to be able to really do anything. So those are the types of survivability concerns that we have, making sure that the entire cluster is available at any one time with no dependencies on the outside world. That's great. So to further recap: now we have the two nodes, and the challenge is that etcd and those related things are all based on the Raft algorithm, which requires at least three nodes. So now we need a new solution. And you looked at lots of different ways to replicate data between two nodes, and what you came up with as the best choice was Marmot. Will you recap for me why that was the best choice? I want to caveat that: I don't know if it's the best. It was the first choice that we found. Yeah, it was the first thing that worked, really. We could have gone with Litestream, because it's very simple, and it's a bit older than Marmot and has bigger usage, I think. But when I bumped into Marmot, it had a very attractive tagline, and since we really, really wanted to have two control planes on two nodes, it seemed possible to do with Marmot. So that was the first thing that worked, and that was kind of an initial solution. And PostgreSQL active-active, just like I said, got released a couple of weeks ago, so it's a very early technology; we hope we can find a way to get it working with Postgres. So, and this is again a very new question, but is Postgres instead of Marmot, or in addition to Marmot?
That would be instead of Marmot, because there's no CDC; it's active-active replication, which is supposed to be more consistent. Also, Marmot is not a bad thing, considering that we are on the edge: Marmot is very lightweight and Postgres is a very heavy process, so it might not be suitable for all edge cases. If we're talking about really small devices at the edge, we might not want to have Postgres. So having choice is good. And so, in figuring out a way to duplicate the data across the two nodes in a way where one could go down, you did manage to solve that, but it's just more complex than you would like it to be, and maybe takes a little longer than you would like to resolve. The one that actually worked, was that with Marmot or with Postgres? Marmot, okay. Aligning with what our design goals are and trying to keep it simple: Marmot is very simple. You can turn it on, as Oz mentioned; you put a couple of flags and it's done. The challenge becomes the logic of how you fail over. You don't want to have two active nodes; you need to make sure that everything is in sync before you fail over. What happens when it's abrupt, oh, I pulled the power, and what happens when it comes back online? That orchestration, it turns out, is quite complex. If we could just leverage what Kubernetes does already, and we had a backend that supported that, this becomes a very, very easy problem to solve. But as we saw today, there's a lot of choice, but not very many of them are easy to implement. So now the challenge is to get the control plane on both nodes, and that you're hoping to solve with PostgreSQL, and that's what you're currently working on right now. Yeah, we have a couple of options that we're looking at; Postgres is one. There are a couple of other design choices that we've made and are testing internally, and that's the one that's most promising right now.
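The failover orchestration described above boils down to a guard condition: only promote the standby once the peer is confirmed down and replication has caught up. This is our own simplified illustration of that logic, not Spectro Cloud's implementation; the parameter names and thresholds are assumptions:

```python
def should_promote(peer_reachable: bool,
                   replication_lag_bytes: int,
                   grace_elapsed: bool) -> bool:
    """Promote the standby only when it is safe to do so:
    - the peer must be unreachable AND a grace period must have passed,
      to avoid flapping (or split-brain) on a brief network blip;
    - the standby must be fully in sync, or committed writes are lost."""
    return (not peer_reachable) and grace_elapsed and replication_lag_bytes == 0

# Peer down but the standby is still lagging: do NOT promote yet.
print(should_promote(False, 128, True))  # False
# Peer down, standby in sync, grace period passed: safe to promote.
print(should_promote(False, 0, True))    # True
```

Even this toy version hints at the hard parts the speakers mention: detecting "abrupt" failures like a pulled power cord, and deciding what happens when the old node comes back online.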
So, is there anything you'd like to add to my simplified recap of your very wonderful and detailed presentation? No, I mean, I think ultimately our goal is to try and keep it simple, but we're solving for a problem at mass scale, which is what we're talking about: thousands and thousands of locations. You know, we feel like we've done a very good job with Edge in general; it's a platform that provides consistency and provides our customers easy deployment. But there is an ask: I want to chop off a node. A thousand dollars a node times a thousand locations is a big number, especially when organizations are looking at maybe leveraging other technologies. Oh, maybe I want to do something with, you know, AI, but I don't have the budget to do that, because I have to spend a million dollars more every time we roll out a new thing. So being able to divert those funds is something customers have asked for. And I think in the next few months you'll see we're going to have a very elegant solution to solve that problem. This is super cool; I've learned so, so much. Is there anything you'd like to say in closing before I read the ending script? No, just that I look forward to seeing you at KubeCon; stop by our booth. And there's a question that popped in. Oh, okay: a question not related to this presentation, but have you tried running Kine in RKE2 and using PostgreSQL? Not RKE2, but we have with kubeadm, so we shim Kine with that, and that works. We think it will work with RKE2; that is one of our target distributions, and as I mentioned before, the big reason for that is FIPS. But yes, we are looking at that. Cool. Thank you for your question, Ferdinand, and thank you both so much. I'll be at KubeCon too, so I look forward to meeting you both in person, and for everyone watching this, please do say hi if you see any of us around; we'd love to talk. All right, so thanks everyone. Thank you so much for joining today's episode of Cloud Native Live.
It was great to have Justin Barksdale and Oz Tiram here teaching us about using two-node HA for edge Kubernetes. It's a new approach; it's so exciting. As always, I love the interaction and questions from the chat. Thanks for being amazing, and thank you so much to those of you who watch the recording. Here at Cloud Native Live we bring you the latest in Cloud Native code at noon US Eastern. We're going to be taking a break for KubeCon and then Thanksgiving and the holidays, but we'll be back in December. Thanks so much for joining today. Thank you so much, Justin and Oz, for sharing your expertise and your journey; it's been so fun. I'll see you soon. Bye.