Okay, hello, my name is David, and today I will present how we solved an inter-cluster communication challenge in Kubernetes. So again, my name is David Gerchikov and I am a software developer at Aerospike. I've been working at Aerospike for two years; previously I worked at companies such as ironSource and Huawei. I am a proud father of one, with a second on the way. Thank you. I've been using Kubernetes for the last five years. What we're going to talk about today: first I will introduce Aerospike to give you a little bit of context, afterwards I will talk about the different networking models in Kubernetes, which is which and when each is useful, and after that I will describe the issue we faced and the way we solved it.

So Aerospike is a NoSQL distributed database. It specializes in low latency and high throughput, and it's useful for applications that need sub-millisecond data access. It uses a shared-nothing data model, meaning that each node of the cluster holds a unique subset of the data. On Kubernetes we run as a StatefulSet application, and using the operator is the preferred way to install an Aerospike cluster. We also use affinity and anti-affinity rules to make sure that our Aerospike pods run on separate nodes.

Aerospike implements a very interesting communication model between the server and its client libraries; we call them smart clients. These clients have business logic that, given a data key, figures out the specific node where that data resides in the cluster. For that, the client needs to maintain a direct connection to each node of the database cluster. This doesn't mean the library user needs to provide all those IPs: you just provide one IP, our library knows how to interact with that specific node, and that node figures out the whole network topology and sends it back in its response to the client library. So that's how our clients work.
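The key-to-node logic the smart clients use can be sketched roughly as follows. This is an illustrative sketch, not Aerospike's actual client code: Aerospike does partition keys into 4096 logical partitions, but the hash function here (SHA-256) is a stand-in, and the partition map is a made-up example of what a client might fetch from a seed node.

```python
import hashlib

N_PARTITIONS = 4096  # Aerospike uses 4096 logical partitions

def partition_id(key: bytes) -> int:
    # Hash the key and reduce it to a partition number.
    # (SHA-256 here is an illustrative stand-in for the real digest.)
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "little") % N_PARTITIONS

# Hypothetical partition map, as fetched from any single seed node:
# partition number -> address of the node that owns it.
partition_map = {p: f"10.0.0.{p % 3 + 1}:3000" for p in range(N_PARTITIONS)}

def node_for_key(key: bytes) -> str:
    # The client routes each request directly to the owning node,
    # which is why it keeps a connection open to every node.
    return partition_map[partition_id(key)]
```

The point is that routing is computed client-side from the key alone, with no extra network hop per request.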
So let's talk a little bit about Kubernetes and what kinds of networking models we have there. The simplest one is pod-to-pod communication. If you have an application running on Kubernetes that wants to interact with other applications running on the same Kubernetes cluster, there are not many restrictions there, although you can apply network policies that restrict it. In that case you can use a ClusterIP service if you need Kubernetes service discovery, or you can use headless services, as we in Aerospike do, because we implement service discovery on our side.

The next case is NodePort. A NodePort basically allows us to bind a pod's port to a port on the node's IP, so you can interact with the specific pod running on that node through the node's port. The downside of this approach is that there is no service discovery, but in Aerospike we use it quite widely because service discovery is done on our side.

Another approach is a LoadBalancer. There is no default load balancer implementation in Kubernetes, so you need to install one separately: you can use a cloud provider's solution if you run in the cloud, or, if you run on-prem, you can use MetalLB, for example. There are a few kinds of load balancers: some forward the traffic via NodePorts, while others interact with the pods directly. Basically, it exposes a single access point to your cluster and gives external applications the ability to interact with an application that runs inside the Kubernetes cluster. In Aerospike we don't use load balancers, because we are a stateful application, and in a stateful application each node is different, in contrast to a stateless application where each pod is identical to the others.
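To make the NodePort idea concrete, here is a minimal sketch of a per-pod NodePort Service, expressed as a Python dict for readability. The service name, ports, and nodePort value are assumptions for illustration; the `statefulset.kubernetes.io/pod-name` label is one Kubernetes really attaches to StatefulSet pods, which is what lets a Service target exactly one pod.

```python
import json

# Hypothetical NodePort Service for a single database pod: traffic hitting
# <any-node-IP>:31000 is forwarded to port 3000 of the pod named "aerospike-0".
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "aerospike-0-nodeport"},  # name is an assumption
    "spec": {
        "type": "NodePort",
        # StatefulSet pods carry this label automatically, so the selector
        # matches exactly one pod.
        "selector": {"statefulset.kubernetes.io/pod-name": "aerospike-0"},
        "ports": [{"port": 3000, "targetPort": 3000, "nodePort": 31000}],
    },
}

print(json.dumps(service, indent=2))
```

With one such Service per pod, a smart client that already knows the cluster topology can reach each node individually, which is why the lack of built-in service discovery is not a problem here.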
There, a load balancer works perfectly. Take an NGINX web server, for example: it doesn't matter how many instances you have, and it doesn't matter which of the instances, which of the pods, processes your request. In a stateful application it does matter. Why? Because it depends on what data the specific pod you interact with holds.

Another component I would like to talk about, which is part of our database, is XDR. XDR stands for Cross-Datacenter Replication. It's one of our database features, and what it allows you to do is replicate some of the data on one cluster to another cluster that may reside in a different physical location. You might want to do this in several cases. For example, it can be done for disaster recovery: you want to protect your data against the possibility that a specific data center becomes unavailable in the future, so you replicate the data to another geographical location or a different data center. Another use case of XDR is decreasing the load on a cluster: for example, you can create an architecture where one cluster serves write and update queries and the other is read-only, used, say, for machine learning purposes.

We also use the XDR functionality to connect our connectors. Connectors are another feature, another piece of software that we provide; they work as translators. For example, if you want to ship the data that is in Aerospike to another technology like Kafka or Elasticsearch, you just connect this translator to the XDR functionality. The XDR feature mimics our smart clients, meaning it also needs a pool of sticky TCP connections.
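The stateless-versus-stateful distinction above can be shown in a few lines. This is a toy sketch with made-up pod names and a made-up ownership map, not anything from the Aerospike codebase: for a stateless app any pod will do, so round-robin works; for a stateful database the request must reach the pod that owns the key.

```python
from itertools import cycle

pods = ["pod-a", "pod-b", "pod-c"]

# Stateless app (e.g. an NGINX deployment): every pod is identical,
# so a simple round-robin load balancer is fine.
_rr = cycle(pods)

def route_stateless(_request) -> str:
    return next(_rr)

# Stateful database: each pod holds a unique subset of the data,
# so routing must follow data ownership, not a rotation.
ownership = {"key1": "pod-a", "key2": "pod-c"}  # toy ownership map

def route_stateful(key: str) -> str:
    return ownership[key]
```

A generic load balancer implements the first function; a smart client (or XDR) effectively implements the second, which is why it needs a direct, sticky connection per node.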
So the issue we faced was that we had two Kubernetes clusters running in different geographical locations, or maybe on different clouds, and we wanted to make them interact with each other. We could have exposed a NodePort and defined XDR against it, but we wanted to stick with standard Kubernetes methodologies, and it's also not good practice to expose a big portion of your Kubernetes cluster publicly, with public IPs. So we wanted to fit the standard approach that Kubernetes takes, which is using a load balancer.

Our solution consists of two parts. The first part is what we call ESP, the Event Stream Processor: a stateless application that runs as a stateless pod on the source Kubernetes cluster. On the other side we have the XDR Proxy, also a stateless application, which runs as a separate pod on the target cluster. What they do is this: the ESP takes the XDR messages our cluster produces, batches them, and wraps them into HTTP requests. It turns the sticky connections of our XDR feature into stateless ones; it's like a converter from sticky to stateless. It then sends them over the network using HTTP or gRPC, and on the other side the XDR Proxy, another stateless application pod, knows how to unwrap these messages, convert them into database operations, and put the data into the cluster. The XDR Proxy maintains sticky connections with our target Aerospike cluster, but on its other side it listens for the stateless HTTP or gRPC requests that come from, for example, a load balancer.
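The batch-and-wrap step the ESP performs can be sketched as follows. This is a minimal illustration under assumptions: the record shape, the batch size, and the JSON envelope are all made up; the real ESP's wire format is not shown here.

```python
import json

def batch_xdr_records(records, batch_size=100):
    """Group individual XDR change records into fixed-size batches
    (a sketch of an ESP-style converter; batch_size is an assumption)."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

def to_http_body(batch) -> str:
    # Serialize one batch as a single self-contained JSON request body.
    # Each request carries everything the receiver needs, which is what
    # turns the per-record sticky stream into stateless requests that
    # any XDR Proxy replica behind a load balancer can handle.
    return json.dumps({"records": batch})
```

Because each request is independent, the load balancer is free to send successive batches to different XDR Proxy pods.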
So, to sum up what we saw here: I introduced Aerospike a little bit, then I explained the LoadBalancer and NodePort models, then we saw the issue we had, which was that we could not use sticky communication between clusters and wanted to convert it to be stateless, the way Kubernetes does it, and then we saw the solution. If you want to read more about Aerospike, about the issue we had, and about the solution we found, you can find it in these links. And if anybody has a question, go ahead. Thank you, and good luck.