Hi, KubeCon, and welcome to this talk on building a multi-cluster data layer. My name is Chirag Narang. I'm a product leader at Yugabyte. In today's session, I'm going to show you how to build a multi-cluster data layer the easy way. We'll go over some key requirements for deploying a database across multiple Kubernetes clusters running in different regions, and how to use a service mesh like Istio to solve the networking challenges. Then we'll have a product demo, and at the end, I'm going to give you an interesting use case for improving your application performance that you can try on your own. So with that, let's start with the YugabyteDB project. YugabyteDB is a fully open source distributed database built for the cloud native world. It can be deployed across private and public clouds, including Kubernetes. It reuses the query layer of PostgreSQL, so it offers advanced features like triggers, stored procedures, and partial indexes. It allows easy migration from other databases like Postgres, MySQL, MongoDB, and Cassandra. The database offers high resiliency and high availability. The multi-node architecture allows you to survive different failures: at the node level, zone level, region level, or even the loss of an entire data center. It allows you to do zero-downtime upgrades and security patching. Unlike traditional databases, you can scale YugabyteDB horizontally to serve high throughput, like billions of operations a day, and store hundreds of terabytes of data. You can reliably scale out and scale in on demand with a large data set without really impacting application performance. This ensures that you can handle peak traffic during Black Friday and Cyber Monday, and once the holiday season is over, you can always scale back in. You don't need a maintenance window for these operations anymore; they can all happen while your database is online.
You can distribute data across zones, regions, or clouds with ACID consistency, so you can move data closer to your customers and comply with regulations like GDPR. So very quickly, let's look at the database design. The guiding principle for Yugabyte is a layered approach. We built a pluggable query engine which preserves the top half, the query layer, of Postgres and Cassandra, but the storage layer is changed to use DocDB, which is common across both APIs. It is built using a custom integration of Raft replication, distributed ACID transactions, and the RocksDB storage engine, all inspired by the Google Spanner design. So next, let's see a few benefits of deploying your database on Kubernetes. One of the key values of containers over deploying your application simply on VMs is that you get to package all of the system dependencies that are required for the application to work in your container. That includes your database, right? Everything can travel together. So that makes it much easier for you to ensure that your application, alongside the database, moves from development to test all the way into production. Nothing really changes except maybe some of the secrets you're using to communicate with the database. This really gives you operational efficiency. Next, it eliminates single points of failure. Since databases are running in containers, all single points of failure can be eliminated completely. Database containers can be as highly available as any other container in your ecosystem. You can set up multiple replicas, use load balancing across containers, and maintain performance at all times. It also gives you better resource utilization. When databases are deployed as containers, they automatically become on-demand, like the rest of the application. There is no need to maintain a monolithic database instance to hold your data. Instead, applications can use their own databases as needed, and only when they are needed.
The next one is declarative config. You can use a declarative config to specify your database resources, so if the content of that file changes, Kubernetes can automatically reconfigure the database to match it. This really allows easy scaling on demand during peak traffic. The last one is that it gives you the freedom to run anywhere: easy portability between different clouds and on-premises. With that, let's see how we modeled YugabyteDB as a workload on Kubernetes. The database has two distributed services. The first one is YB-Master, which is responsible for keeping your system metadata, coordinating system-wide operations like create, alter, and drop table commands, and running background operations such as load balancing. The second one is YB-TServer. This is the actual data node, which is responsible for hosting and serving your user data. Both YB-Master and YB-TServer are modeled as independent StatefulSets. The YB-Master deployment needs one StatefulSet and two services. One of these services is a headless service that enables discovery of the underlying StatefulSet pods, and the other one is a load balancer which is needed for accessing the admin UI. The YB-TServer needs just one StatefulSet and one headless service. The StatefulSet, as I mentioned before, runs the data nodes, so you can easily scale your cluster up and down by just changing the replica count, and it will trigger a rolling update without any maintenance downtime. The headless service lets your client applications connect to the database. Very quickly, let me show you the basics of installing the database on Kubernetes. Then we'll switch over to the requirements and challenges in a multi-cluster deployment model. Everything on Kubernetes is pretty straightforward. We have a Helm repo for you to get started.
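The Helm steps look roughly like this. This is a sketch: the repo URL, release name, and value keys follow Yugabyte's public chart, but they may differ across chart versions, so check the chart's values.yaml for your release.

```shell
# Add the YugabyteDB Helm repository and refresh the local index.
helm repo add yugabytedb https://charts.yugabyte.com
helm repo update

# Install into a dedicated namespace, overriding the replica counts
# for the masters and data nodes, plus the storage class.
helm install yb-demo yugabytedb/yugabyte \
  --namespace yb-demo --create-namespace \
  --set replicas.master=3 \
  --set replicas.tserver=3 \
  --set storage.master.storageClass=standard \
  --set storage.tserver.storageClass=standard
```

Scaling the data nodes later is just a change to the T-Server replica count on the StatefulSet, which triggers a rolling update with no maintenance window.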
You can come up with your own set of values: what kind of database you want, resource allocation, the number of replicas for the master and the data nodes, and whether you want to change the storage class. So based on your use case and throughput, you can really tailor the configuration of the database. And then you can go ahead and simply install it. Deploying a Helm chart is pretty straightforward in a single-cluster architecture. So let's see behind the scenes what's happening in the cluster and how the networking is set up, using Kiali. At this point, let me switch over to the terminal. All right, so let's first check the StatefulSets and services that are running inside a single-cluster deployment of YugabyteDB. As you can see here, I have both the T-Server and master StatefulSets deployed. There is one headless service for the YB-Masters and one headless service for the YB-TServers. Now let me launch the Kiali dashboard and look at the networking setup behind the scenes. Okay, let me switch to the Graph view. Here it shows how the master and T-Server headless services are communicating with each other. Then we have the YB-TServer service, which is the load balancer for client applications to connect to the database, and the YB-Master UI service for any admin operations. As you can see, in a single-cluster deployment the setup is fairly straightforward and easy. We'll see how this setup really changes with a multi-cluster setup. So let me go back to my presentation. Modeling a stateful workload for multi-cluster has its own challenges, especially around networking. For deploying Yugabyte in a multi-cluster setup, we have to satisfy three high-level requirements. The master and T-Server pods will now be distributed across multiple clusters, so they should be able to reach and communicate with each other. That's the first requirement.
The second one is that we need consistent global identity across and within the clusters, which requires setting a fully qualified domain name for each master and T-Server within the cluster. The last one is setting up load balancers for client apps connecting to the database (the data node, which is the T-Server) and for the admin UI for any operational activities. So here is, at the end of the day, a very simplified view of what you would expect with a three-region setup of YugabyteDB. The T-Servers and masters are reachable from within the cluster and outside the cluster. As you can see, the central T-Server is connecting to its own master and also to west and east as well. This is a really simplified view; in a few moments, we'll look at the real view using Kiali to see what's happening under the hood. So next, let's see what steps you need to follow to achieve this deployment topology. The first step is to set up service discovery, then you have to allow cross-cluster access, and finally you have to expose services to other regions and clusters. For the first two steps, service discovery and cross-cluster access, I'm going to use the Istio service mesh, and then finally, for exposing services, I'll need to make a few changes on the database side. So let's look at setting up service discovery with Istio. For service discovery, I'm using Istio's multi-primary, multi-network setup. You can see the instructions are available at the bottom of the screen. The steps require you to install the Istio control plane on all the clusters that are going to participate in the multi-cluster setup, and you have to mark each cluster as a primary cluster. With the Istio 1.8 release, they introduced the concept of a DNS proxy. Kubernetes provides DNS resolution for Kubernetes services out of the box, but custom service entries are not recognized.
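The primary-cluster and DNS proxy setup can be sketched as follows, based on Istio's multi-primary multi-network install guide; the mesh, cluster, and network names here are placeholders for your own topology.

```shell
# Tell Istio which network this cluster sits on.
kubectl label namespace istio-system topology.istio.io/network=network1

# Install the control plane as a primary for this cluster, and turn on
# sidecar DNS capture so ServiceEntry hosts resolve without a custom
# DNS server (available from Istio 1.8).
cat <<EOF | istioctl install -y -f -
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        ISTIO_META_DNS_CAPTURE: "true"
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      network: network1
EOF
```

Repeat the same two commands on the second cluster with cluster2/network2, and so on for every participating cluster.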
With the 1.8 release, service entry addresses can now be resolved by the Istio sidecar itself, so you don't require any custom configuration of a DNS server. You will need to repeat these two steps of setting up primary clusters and the DNS proxy for each cluster that is going to participate in this configuration. And as you can see on the right, cluster 1 is on network 1, while cluster 2 is on network 2. So this means that at this point there is no direct connectivity between pods across cluster boundaries yet, which brings us to the next step. The next step is to set up connectivity between pods, so service workloads across cluster boundaries can communicate indirectly via dedicated gateways for east-west traffic. The gateway in each cluster must be reachable from the other cluster. So again, you will need to repeat this step of setting up an east-west gateway for every cluster that will be participating in this config. Another important point to keep in mind here is that the gateway will be public on the internet by default. So if you're setting up anything in production, you may need to make some additional adjustments to firewall rules to not allow public access. The last one is exposing services outside the cluster. Since these clusters are on two separate networks, we need to expose all the services from cluster 1 into cluster 2. While this gateway is public on the internet, the services behind it can only be accessed by services with a trusted mutual TLS certificate and a workload ID, which really ensures that your data is secured at any given point in time. And you don't really have to worry about all this, because Istio will handle it seamlessly behind the scenes. With this, we are done with service discovery using Istio. The next step is setting up cross-cluster access in Istio. Service discovery alone won't work without endpoint discovery, so you need to allow API server access across clusters.
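The east-west gateway and service-exposure steps described above can be sketched with the helper files that ship in the Istio release bundle; the paths assume you run from an unpacked Istio 1.8+ distribution, and the mesh/cluster/network names are placeholders.

```shell
# Install a dedicated gateway for cross-network (east-west) traffic.
samples/multicluster/gen-eastwest-gateway.sh \
  --mesh mesh1 --cluster cluster1 --network network1 | \
  istioctl install -y -f -

# Expose services in this cluster through that gateway so the other
# clusters can reach them. The gateway is public by default, so lock
# it down with firewall rules before doing this in production.
kubectl apply -n istio-system -f samples/multicluster/expose-services.yaml
```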
You have to install remote secrets, say in cluster 2, that provide access to cluster 1's API server, and vice versa. So if you have three clusters participating in this configuration, you will have to repeat this step for every pair. With this, I have Istio installed in all my clusters, I have set up my service discovery, and I have also set up remote secrets. So I would really expect cross-cluster access for my master and T-Server workloads to work, but there is a slight problem, which brings us to our next step: exposing services. Let's consider that I'm doing a replication factor 3 deployment across three different clusters: an east cluster, a west cluster, and a central cluster. For simplicity's sake, I'm just showing east and west right now. What this really means is that there will be one master deployment per cluster, one in each cluster. The master pod in the east cluster is connecting to the master pod in the west cluster, and both of them are running in different networks. In this case, everything is working fine, because the master service resolves to a single pod IP, since there is just one pod. But what happens in the case of a T-Server, which has multiple pods scheduled on the same cluster? The first problem that we faced with this setup was the single T-Server headless service, which is what we use in a single-cluster setup. If you recall, when we were talking about the single-cluster setup, I showed you it was doing round robin between multiple T-Server pods. So there was no consistent IP address to reach all the T-Server pods. If the master from west was trying to connect to a T-Server in east, it was not able to get the IPs of all the T-Servers that were actually running in that cluster. The second problem was that when this headless service was exposed in other clusters using Istio, it didn't resolve to the individual pod FQDNs. So again, we had communication challenges between all the master and T-Server pods across clusters.
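The remote-secret step described above can be sketched like this; the kubectl context names are placeholders for your own clusters.

```shell
# Give cluster2's control plane endpoint-discovery access to
# cluster1's API server...
istioctl x create-remote-secret --context=cluster1 --name=cluster1 | \
  kubectl apply -f - --context=cluster2

# ...and vice versa. Repeat for every pair of clusters in the mesh.
istioctl x create-remote-secret --context=cluster2 --name=cluster2 | \
  kubectl apply -f - --context=cluster1
```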
To solve these issues, we ended up creating separate ClusterIP services for every T-Server and master pod in each cluster. As you can see, now we have multiple ClusterIP services for T-Servers, which resolve to individual pod IPs, and they are reachable from the master in that cluster and also from other clusters. I had to repeat this step for all the participating clusters to make sure the T-Servers and masters are reachable everywhere. So this was it. This was the only step that I had to do on top of the Istio setup to get this entire architecture working. So now that we have established what steps we need to follow, let's look at the actual networking setup behind the scenes using Kiali, and I also want to show you how the database is set up on my terminal. Let me switch the window. Here I have three terminal windows. The one on the top left is my east cluster, on the right is my west, and at the bottom is the central cluster. First let's check the workloads and services running in all of them. As you can notice, I have a lot more services running in each cluster compared to the single-cluster setup that we just talked about. We just touched on the fact that we had to expose extra T-Server and master services for setting up cross-cluster communication, and that's why you're seeing these extra services. First, let me access the admin UI for this database to verify the cluster config, and then we'll switch over to Kiali to look at the networking setup. For accessing the admin UI, I need to access the master UI load balancer IP on port 7000. So let me bring it up here. As you can see, the replication factor is 3 and the number of nodes is 6, and here are the master servers, which are evenly distributed across the clusters: us-central1 has one, then us-west2, and us-east1. Now let's verify the T-Servers: in total you can see there are 6 T-Servers, which are also evenly distributed across these clusters.
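The per-pod ClusterIP services described above can be sketched like this. The trick is selecting on the `statefulset.kubernetes.io/pod-name` label that the StatefulSet controller stamps on each pod; the service name and port here are illustrative.

```shell
# A ClusterIP service that resolves to exactly one T-Server pod.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: yb-tserver-0
spec:
  selector:
    # This label is unique per StatefulSet pod, so the service
    # always resolves to a single, stable pod IP.
    statefulset.kubernetes.io/pod-name: yb-tserver-0
  ports:
    - name: tserver-rpc
      port: 9100
EOF
# Repeat for yb-tserver-1, yb-tserver-2, and each yb-master pod,
# in every participating cluster.
```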
So my cluster seems to be set up fine. Now let's dive into the networking setup behind the scenes using Kiali. I'm just going to go back to my terminal and launch Kiali from the central cluster. Let's first look at the mesh. As you can see, it shows a 3-cluster setup; all the clusters are shown here, which looks fine, as we talked about. Now let's go to the Graph view and look at the database setup. Let me just clean up a little bit by hiding a few unnecessary nodes. So this is better. As you can see, the networking is a lot more complicated compared to the single-cluster setup. The master and T-Server services within the central cluster are now connecting across to the west and the east clusters. And all this is set up using the same steps that we talked about: using Istio for cross-cluster access, and then adding those extra services for allowing access across clusters. So yeah, this is pretty much it in terms of setting up your database. Let me now go back to my presentation. Okay, so next is the most interesting section of this talk, where we're going to have a product demo of fault tolerance. I have deployed a YugabyteDB cluster across three different regions, and we're going to demonstrate fault tolerance by taking down an entire region. This is my cluster spec. I have the database deployed across three GKE clusters in US east, west, and central. I'm using the default synchronous replication between nodes across the three regions, and data is secured with both encryption at rest and in flight. For the purpose of the demo, I'm using a classic REST microservices application, which is also running in Kubernetes; the UI app is actually outside of the cluster. So let me switch my screen and show you the UI application first. Here is my pet service UI app; it's a Spring Boot app.
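The database checks in the demo use ysqlsh, YugabyteDB's PostgreSQL-compatible shell. A rough sketch follows; the pod name, database name, and table name match this demo and are assumptions.

```shell
# Open a SQL shell on a data node in the east cluster.
kubectl exec -it yb-tserver-0 --context=east -- ysqlsh -h yb-tserver-0

# Then, inside the shell, plain PostgreSQL meta-commands work:
#   \c petclinic            -- connect to the app's database
#   \dt                     -- list the tables
#   SELECT * FROM owners;   -- confirm the record created from the UI
```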
What I'm going to do is use this app to create some owner and vet records from the UI, and then verify if I'm able to access the data across different clusters. So let's first create a new record here. Okay, everything checks out fine on the UI, and now let's create one more for the owner, with some dummy phone number, and add the owner. Okay, now that this is done, let me switch to my terminal to verify that the records have been created. From the east cluster, I'm going to use the ysqlsh shell utility to access my database, so let me launch that here. This allows me to use PostgreSQL meta-commands against the database. I'll connect to the PetClinic database, display all the tables, and look at the owners table. Let's clear the screen. All right, here, as you can see, this is the record that I created from the UI, so everything checks out fine in the east cluster for the owners table. Let's repeat the same exercise on the west cluster. So the record is available even in the west. Let's check the records for the vet that we created. Oops, I think the name of the table is vets. So "kubecon test", that's the one I created. Okay, so the data seems to be persisted across different clusters. This verifies that a single database stretched across multiple Kubernetes clusters is working fine with data consistency. I don't have to do the test for central; we'll look at it next, in the failover, by accessing the data through central. At this point, let me go back to my presentation. Now let's actually simulate a region-level outage by bringing down both the stateless app and my database. What I want is strong high availability: even when the entire region goes down, I want to see what the impact on my client application will be. Let me go back to the terminal. To simulate an east region-level failure, I'm going to scale down the deployments for both the stateless and the stateful apps in my east cluster, so
let me get out of this shell utility, and back here, first let's scale down our master StatefulSet to zero. Let's bring down the T-Server as well, and finally the stateless application. All right, we can verify the database is scaled down. Okay, you can see it's done, which is good. Now let's go back to our UI app, try to use the app and create some records, and see what happens on the database side. So let me just force-refresh this app and try to create a new record for an owner. All right, I've now added a record. Let's go back to the terminal and try to access this record from the west and the central clusters. The "kubecon region failure" record is available here. Let's try to do the same thing from the central cluster: connect to the PetClinic database and try to access the record here. So as you can see, even when my east cluster is not available, I'm able to access the records, I'm able to access the database, from the central and the west clusters. This really proves that my cluster is resilient to region-level failures as well. So this is it. I hope you enjoyed this demo, and you noticed how easy it was to build a cloud native, fault-tolerant, and geo-distributed data layer using a service mesh and YugabyteDB. But at this point, a few of you might be thinking about the app latency introduced by regional boundaries and how to improve that performance. So for that, let me switch back to my presentation, and we'll cover how to improve performance for geo-distributed applications. All right, so this is an interesting use case that you can try on your own to improve your application performance. By default, all nodes of a YugabyteDB cluster are eligible to host shard leaders. But what if you want to localize your shard leaders closer to your client application, which might be deployed in a particular region? For example, if the client app is deployed in US West, what if you also wanted to have the shard leaders in that particular region? Or imagine another
use case where your application reads, including multi-row joins, all come from a single region. Then a three-region distributed cluster can be configured to have shard leaders pinned to that single region. In that case, you can set a preferred zone, which will reduce the number of network hops for your database transactions, and it will give you an immediate performance improvement. As you can see here, you can use the yb-admin command, which takes all the master addresses and the preferred zone as input. So try this use case out and let us know how it improves your application performance. So now we have gone through getting a multi-cluster setup the easy way, and that's pretty much it. We have shown what you need to do to get service discovery, so that you can access services from one cluster in another, and we have outlined what it takes to get cross-cluster access. This really opens up new possibilities for how you can deploy a distributed SQL database to build fault-tolerant cloud native applications. As you can see, the first one is single region, multi zone. This is the traditional use case that probably most of you are running today. It has the lowest latency, because everything is localized within the region, but it does not give you any region-level failover resiliency. The second one is what we talked about today: within a single cloud, you're now stretching your single database across multiple regions. This gives you region-level failover resiliency. And the third one is multi cloud, multi region, where you can now stretch your database across different clouds. The same constructs and primitives that we talked about in the single cloud, multi region case can be applied to this setup as well, so now you get cloud-level failover resiliency for your application too. Finally, some parting thoughts. We are a fast-growing project, and we really like to get the community involved. We would love to know more about your use cases, so
jump onto those links if you are interested in becoming a cloud native database expert, and if there is anything that you need our help with, come join our Slack community. With that, thank you so much, and have a great rest of your KubeCon. Thank you.
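For reference, the yb-admin preferred-zone command mentioned earlier looks roughly like this; the master addresses and the cloud.region.zone placement string are placeholders for your own deployment.

```shell
# Pin shard leaders to a single zone so clients in that region avoid
# cross-region hops; yb-admin takes the master addresses and one or
# more preferred placements in cloud.region.zone form.
yb-admin \
  --master_addresses yb-master-0:7100,yb-master-1:7100,yb-master-2:7100 \
  set_preferred_zones gcp.us-west1.us-west1-b
```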