Hello, everyone. Thanks for coming. Unfortunately, today we'll have to put up with this quality of the slides; we couldn't fix the hardware issues. But just pay attention to what I'm saying, and you will get some useful information by the end of this talk. A couple of words about myself: my name is Denis Magda. Presently I'm Director of Product Management at GridGain, and at the same time I'm the Apache Ignite PMC Chair. But don't be afraid of my current position: you're not going to hear marketing material throughout this talk, because I'm a software engineer at heart. I joined the company as a software engineer, and I'm still an active contributor to the Apache Ignite community. We're going to talk about Apache Ignite in relation to Kubernetes during this presentation. Apache Ignite is a distributed database and computational platform that we are developing at the Apache Software Foundation, and I'm going to share my personal experience of how you can put Kubernetes together with a distributed database. Even more, throughout this talk I want to give you an understanding of where the responsibility of a distributed database lies if you plan to deploy one in Kubernetes, where the responsibility of Kubernetes itself lies, and what the roles of DevOps and application developers are. With this in mind, our agenda for today looks like this. First, as I said, I'm going to use Apache Ignite as the example for this talk, but the concepts and the solution I'm going to cover today can be, and in fact are, adopted by the rest of the distributed databases, such as Cassandra, MongoDB, and Redis.
There is not much difference, because the main distinction between a distributed database and, let's call them, regular databases is that with a distributed database you have a cluster of machines deployed somewhere. And if we deploy this cluster in Kubernetes, we somehow have to tie the database's clustering component to the Kubernetes environment. In our experience, that's the only essential thing to care about if you're going to run a distributed database in Kubernetes, at least from the database engineer's standpoint. This is why we are going to look briefly at the clustering and deployment component of Apache Ignite. Then I'm going to talk about orchestration specifics: how these deployments are different and what has to be considered for them. After that, I'll quickly explain what exactly we did for Apache Ignite and how you can deploy Apache Ignite in Kubernetes. The same concepts, again, can be, and in fact are, reused in other distributed databases: Cassandra, Redis, MongoDB. Finally, the last five minutes will be dedicated to a short demo. On my laptop, I'm going to quickly run an Apache Ignite cluster in a minikube environment, and we will try to scale it out and scale it down to see how we can use Kubernetes facilities to maintain a cluster that is already deployed there. But before we get to these details, just three minutes of a high-level overview of Apache Ignite. How many of you have heard about Apache Ignite? Okay, good, a few folks, so it's the right slide to show. In short, Apache Ignite is a distributed database and computational platform. At the core of this database we have distributed storage, and this storage is called memory-centric.
It's called memory-centric because it treats RAM as the primary storage for your data and your indexes. RAM is not just a caching layer, and you are not required to use a disk layer in Ignite at all if you're fine with memory only. But when companies and users go to production, they still think about durability, because if for some reason the cluster goes down, the data will be lost. To avoid this, you can enable persistence. Ignite provides its own native persistence, which is also distributed across the cluster of your machines, or you can use third-party persistence, such as a relational database or a NoSQL database, for instance. On top of this distributed storage, the community built a variety of different components, such as key-value APIs, SQL queries, computations, machine learning algorithms, and many more. But our goal for today's conversation is to take a look at the core of the system, at the clustering component that is responsible for taking separate cluster nodes and uniting them into a single unit called a cluster. In Ignite, we have two types of cluster nodes. The first type we call server nodes. Server nodes are actually your data storage: they store the data, they store the indexes, and they process all the queries and computations coming from your applications. Applications are usually connected to the cluster by means of so-called client nodes, which you can see on the slide to the right. These client nodes provide you with all the APIs available in Apache Ignite. In addition to the client nodes, you can use JDBC or ODBC drivers, the REST protocol, or the binary protocol if you want to connect to the cluster from management and monitoring tools, or from tools you use for data analysis.
Now that we know about server nodes and client nodes, we want to deploy them somehow. Ignite can be deployed literally anywhere you like. For instance, as a developer starting on your next application, you can deploy your first cluster on your local laptop. You don't need to go to your IT department and negotiate for an environment; all the development and initial testing can happen on your personal machine. However, when it's time for advanced testing or for production, you can deploy your cluster on premises or in the cloud. And what's interesting is that the cluster deployment and lifecycle can be orchestrated and provisioned by solutions such as Kubernetes, which is what we're going to talk about. Deployment in Kubernetes is not as difficult as you might think. The main job of the clustering component is to let all the cluster nodes discover each other. In this picture you already see a connected cluster: all the nodes know about each other, they can communicate, exchange messages, and accept queries and computations. But before this cluster is formed, every single node needs to discover the others and join the cluster. To achieve this, the networking component has a special thing called an IP finder. The responsibility of the IP finder is to provide a new cluster node with the list of IP addresses of the nodes that are already in the cluster, or that might or should be in the cluster. When a new node starts the process of joining the cluster, it picks one of the addresses provided by the IP finder and connects to the cluster via that IP address. It's pretty simple, and this is actually how it works. There is a variety of different IP finders available in Apache Ignite.
For instance, the two basic ones are the multicast IP finder and the static IP finder. If you start your cluster on a local laptop or in your private network, you can use the multicast IP finder, because a new node will broadcast multicast packets into the network, and if there is any other node in this network that is already part of the cluster, it will reply to the new node with its own IP address, and the new node will be able to join the cluster using this information. But probably the simplest way to let all the nodes find each other's IP addresses is the static IP finder. The static IP finder simply includes the already-known IP addresses of the machines where you're going to run cluster nodes. This type of configuration works well for on-premises deployments, where you can know the IP addresses in advance: you just prepare the configuration for your cluster nodes, give this configuration to the machines, and that's it. But when you decide to deploy your distributed database, let's say, in the cloud or in a Kubernetes environment, neither the static IP finder nor the multicast IP finder will work for you. With Kubernetes, as we all know, it's unlikely that your Ignite cluster nodes, your Ignite pods, will know even their own IP addresses in advance. The IP addresses are simply unknown. And here we come to the orchestration specifics of distributed databases such as Apache Ignite in Kubernetes. Again, there are two things related to the networking component of the database. The first one is cluster discovery, which is what I've just been talking about: cluster nodes do not know IP addresses in advance, so they will not be able to join the cluster if those addresses are unknown.
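As a point of reference, a static IP finder configuration in Ignite's Spring XML format looks roughly like this; the addresses and port range below are illustrative placeholders:

```xml
<!-- A minimal sketch of a static IP finder configuration;
     the addresses below are placeholders for your machines. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
      <property name="ipFinder">
        <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
          <property name="addresses">
            <list>
              <value>10.0.0.1:47500..47509</value>
              <value>10.0.0.2:47500..47509</value>
            </list>
          </property>
        </bean>
      </property>
    </bean>
  </property>
</bean>
```

This is exactly the kind of configuration that stops working in Kubernetes, because the pod addresses cannot be listed in advance.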
Also, the IP addresses of the nodes can change over time if you restart your cluster machines. It's not under our control; it's up to Kubernetes to decide what the next IP address will be if I restart one of my cluster nodes. However, once you find a way to share IP addresses among all the cluster nodes that make up your distributed database inside Kubernetes, you will get a workable, self-contained, and self-healing cluster. Specifically, speaking about Apache Ignite, once all the nodes are interconnected, they take care of the rest of the functionality and tasks. For instance, if the cluster topology changes, if you add new cluster nodes or remove cluster nodes, the existing cluster updates this information internally; it never has to communicate with Kubernetes for that. A node communicates with Kubernetes only at the time it joins the cluster. Also, once we have this cluster, the nodes, at least in the Apache Ignite database, can open peer-to-peer connections whenever needed: when you execute a new SQL query or send a computation to a cluster node, one node can open a direct connection to another. Here again, we don't need to talk to the Kubernetes environment. Now let's dive into the first point, cluster discovery, because this is the actual integration point between a distributed database such as Apache Ignite and the Kubernetes environment; the rest is usually handled by the database vendor. So, what are the specifics of an Ignite deployment in Kubernetes, and how did we solve this issue? How did we provide the IP finder implementation for Apache Ignite? We all know that in Kubernetes we have the concept of services.
We decided to reuse this concept for node auto-discovery inside the Kubernetes environment. If you take a look at this diagram, I'm trying to explain the process where a new node, shown over here, is trying to join a cluster of machines. These machines are already there; they are communicating, and they can accept queries and process workloads coming from your application, but this one is new. When this node, we can call it an Ignite pod in Kubernetes terms, is launched, it doesn't know the IP addresses of any of these nodes, but it will be using a special IP finder implementation. That IP finder connects to a previously started Ignite Kubernetes service, and that service keeps a list of the IP addresses of all the Ignite pods already deployed in your Kubernetes environment. The new node does the simplest operation possible: it opens a socket connection to the service, retrieves the IP addresses of all the Ignite pods deployed in Kubernetes, and connects to the cluster via one of them, picked at random. That's it. It's that simple. This lookup service works and looks exactly this way. This is the only configuration you need to start the Ignite service in your Kubernetes environment. Basically, the only mandatory option is the name of your Ignite cluster, because every Ignite pod that is about to be started in Kubernetes will be labeled with some name, and here I use the name "ignite" to specify that this Ignite service should track the IP addresses of the pods labeled with this name. Then we just use the well-known commands to start this service. What goes next?
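Since the slide is hard to read, here is roughly what such a lookup service definition looks like; the names, port, and label are illustrative, and the essential part is that the selector matches the label on the Ignite pods:

```yaml
# A sketch of the lookup Service; names and port are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: ignite
spec:
  ports:
    - port: 9042          # placeholder; the service is only used for pod lookup
  selector:
    app: ignite           # the IP finder asks Kubernetes for pods behind this label
```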
Once our service in Kubernetes is up and running, we are ready to deploy our distributed database, because now we know that all the database nodes will be able to look each other up by connecting to the service that is already running. Unfortunately, the slide quality is poor, but I'll show this to you during the demo section. Here is what I am doing: again, I specify that the pods will be labeled "ignite", and then I provide an Ignite configuration. I'll show this configuration in a minute; it's the Apache Ignite configuration that is used by every Apache Ignite node. Inside this configuration, the nodes use the Kubernetes IP finder, which connects to the service and gets all the IP addresses from it. This is what allows all the nodes to interconnect with each other. And that's it. That really is how simple it is to deploy a distributed database in Kubernetes: you just need a special networking component that supports auto-discovery of your database nodes, and you just need to provide the configuration for it. Next, many databases provide their own facilities to handle situations where you scale out, scale down, or provision more machines for your cluster. But in my opinion, Kubernetes has already built a pretty advanced and mature abstraction for this in its own framework. For instance, with an Apache Ignite deployment, you can use the standard Kubernetes APIs if you want to scale out your cluster or shrink its size, depending on your workloads, your memory consumption, and other characteristics. In a couple of minutes, we are going to start our first Apache Ignite cluster in Kubernetes using minikube. That cluster will consist of just two cluster nodes.
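The deployment described above can be sketched roughly like this; the deployment name and configuration URL are illustrative placeholders, while `apacheignite/ignite` is the official Ignite Docker image, which reads its XML configuration from the `CONFIG_URI` environment variable:

```yaml
# A sketch of the Deployment for the Ignite pods; the name and
# configuration URL are illustrative placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ignite-cluster
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ignite
  template:
    metadata:
      labels:
        app: ignite            # must match the selector of the lookup Service
    spec:
      containers:
        - name: ignite-node
          image: apacheignite/ignite:latest
          env:
            - name: CONFIG_URI   # the Ignite image downloads its XML config from here
              value: https://example.com/ignite-k8s-config.xml
```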
Then we can use the well-known kubectl scale command to scale out our cluster. You don't need to know anything about Ignite internals, and you don't need to use any management facilities provided by Ignite: if your goal is just to scale out the cluster, you can use all the Kubernetes knowledge you already have. That's all for the theoretical part, so now let me quickly show you how we can deploy our cluster. Okay, here is my window. I've started the minikube environment. First, as we already know, we need to start the Ignite service, the Kubernetes service that is going to provide us with all the IP addresses. Presently, we don't have the service deployed, but I have a configuration for it. You have already seen it on the previous slides, but let me show it to you one more time. It's just a simple Kubernetes service, and the essential thing about it is that this service is going to track the IP addresses of Ignite pods only; we just need to provide this label to the service. So let's quickly start it. Okay, the service is up and running, which is good. The next step is to deploy our first cluster in Kubernetes. For this purpose, we prepared a deployment configuration. We are going to start two cluster nodes; that's enough, at least for my local laptop. I'm going to say that all the pods will be labeled with the name "ignite", so that they are related to that Ignite service. I'm going to use the latest Apache Ignite Docker image, and here is the interesting thing: the Ignite configuration. This configuration will be passed to the Apache Ignite process when it's about to start, and inside it we have the one thing that is relevant to our conversation. The configuration is available on GitHub, and it contains nothing except for this Kubernetes IP finder.
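The node configuration in question boils down to roughly the following Spring XML; the service name must match the lookup service created earlier (here assumed to be `ignite`):

```xml
<!-- A sketch of the Ignite node configuration with the Kubernetes IP finder;
     the serviceName must match the lookup Service deployed earlier. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
      <property name="ipFinder">
        <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder">
          <property name="serviceName" value="ignite"/>
        </bean>
      </property>
    </bean>
  </property>
</bean>
```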
That's the only thing we need to provide in the configuration of our database nodes if we want to run them in a Kubernetes cluster. And that's it. When Kubernetes starts the first two nodes using our deployment configuration, the nodes will use this IP finder, the IP finder will connect to our Ignite service that is already deployed in Kubernetes, the service will know the IP addresses assigned to these nodes, and the nodes will be able to interconnect and form a single cluster. So let's do it. We are starting our first cluster. Now let's double-check: we have the Ignite cluster deployment over here, and let me check that the pods are up and running. Okay, here we have two Ignite cluster nodes, or two Kubernetes pods. Let's see the logs from one of them; I just want to be sure that they actually belong to the same cluster. Okay, these are the Apache Ignite logs. We see that the node has started, and this topology snapshot says that we have two cluster machines that can use two gigabytes of Java heap and much more memory for the rest of the data. The cluster is empty, so it's pretty simple. That's it. This is how you can deploy your first distributed database just by providing a way for your cluster machines to discover each other. Now, when you want to change the cluster size, again, you can use facilities provided by the database vendor, or you can use the functionality available in Kubernetes. Let me just quickly go to the slide and copy and paste this command. I'm going to scale out my cluster to five nodes just by using this command: kubectl scale with the number of replicas. Again, we are using the same deployment. Now if we check the number of pods, we have five cluster nodes running in Kubernetes. And let's check the log file of the previous machine.
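For reference, the demo boils down to roughly this command sequence; the file names and the deployment name are illustrative placeholders matching the sketches above:

```shell
# Roughly the demo sequence; file and deployment names are placeholders.
kubectl create -f ignite-service.yaml        # start the lookup service
kubectl create -f ignite-deployment.yaml     # start a two-node Ignite cluster
kubectl get pods                             # verify the pods are running
kubectl logs <ignite-pod-name>               # inspect a node's logs
kubectl scale --replicas=5 deployment/ignite-cluster   # scale out to five nodes
```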
It should be updated here. We connected to the log file of the previous machine, and now we can see that there are five nodes running in our cluster. So what's the upshot of this presentation? Basically, the deployment and maintenance of distributed databases is not that different from regular databases. You, as an application developer or DevOps engineer, just need to provide a valid configuration for your distributed database; the rest has to be supported by the database vendor. Today I explained what we did specifically in the Apache Ignite community: we had a pluggable and extensible networking component that allowed us to provide the simplest possible implementation of a Kubernetes IP finder, and now Apache Ignite users and customers can easily deploy an Ignite cluster without limitations. The same technique is usually applied in other distributed databases; I know that pretty much the same techniques are used for Cassandra, for MongoDB, for Redis, with no real difference. Today we're not covering the applications. With this type of deployment, when all the nodes are deployed inside your Kubernetes environment, this type of IP finder implies that your applications will be deployed in Kubernetes too, so that they can connect to the Ignite pods, to the cluster, using internal IP addresses. But if you want to deploy your applications outside of Kubernetes, there are also tips and tricks for how you can do this with distributed databases like Ignite: you just need to use a slightly different version of the IP finder, or you need to assign an Ignite service to every Ignite pod you have in the cluster, so that every individual Ignite cluster node is visible outside of your Kubernetes environment. That's all, and we have five minutes for your questions. Thanks. Any questions? So the question is: are there any volumes used in this presentation?
Today I only showed how to deploy Apache Ignite with in-memory storage. But if you want to enable persistence, you just need to provide a volumes configuration; it's not that different. The specific thing about distributed database deployments is that you usually want to assign a specific volume to a specific cluster node, so that if the node is restarted, it will own the same volume. This just requires a bit more configuration in your Ignite deployment file. It's certainly possible; I just decided to explain this in the simplest way, to show what a distributed database vendor has to do, and the simplest thing is to implement and provide functionality for the networking component. But yes, you can rely on a StatefulSet to achieve the same, if that suits your needs better. Yeah, I see. So the question is basically: what's the reason for running several cluster nodes on a single machine? I would say that in 90% of deployments, Apache Ignite users dedicate an individual physical machine, or a VM in the cloud, to each Apache Ignite node process. But I know there are some deployments with dozens of CPUs running compute-intensive computations, where companies might run several Ignite nodes on the same hardware just to be able to utilize all the resources. At least that was true a couple of years ago. Nowadays, speaking about Apache Ignite, all the data and all the indexes are stored in an off-heap memory region, which is not part of the Java heap, so we will not get any stop-the-world pauses, and the platform itself tries to utilize all the resources you have.
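The StatefulSet approach mentioned in the persistence answer can be sketched roughly like this; the names, image tag, mount path, and storage size are illustrative placeholders, and the key point is the `volumeClaimTemplates` section, which gives each pod its own persistent volume that it reattaches to after a restart:

```yaml
# A rough sketch of Ignite with persistence via a StatefulSet;
# names, mount path, and storage size are illustrative placeholders.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ignite-cluster
spec:
  serviceName: ignite
  replicas: 2
  selector:
    matchLabels:
      app: ignite
  template:
    metadata:
      labels:
        app: ignite
    spec:
      containers:
        - name: ignite-node
          image: apacheignite/ignite:latest
          volumeMounts:
            - name: ignite-work
              mountPath: /persistence   # illustrative path for Ignite's persistent files
  volumeClaimTemplates:
    - metadata:
        name: ignite-work
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```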
These days, I can't say I've talked to any user or customer who tries to launch several Ignite nodes on a single machine. But it is useful for development: as a developer, I just start my whole cluster locally. And it's good for quality assurance, because you can have a single Java process and start several Apache Ignite nodes inside that same process, and this truly simplifies the testing cycles for your functionality. For instance, in the Apache Ignite community, we have thousands of different tests, and probably the majority of them start the cluster in a single JVM process just to be able to check some functionality: that queries return consistent results, that computations are executed in the same way. If we were doing it differently, if we dedicated a separate machine to every node process, the testing cycles would be really, really long. If you still have any questions, you can catch me over here. And again, thanks for coming, thanks for choosing this talk, and enjoy the rest of the day. Thank you.