I'm a little bit impressed myself with this introduction. I'm going to talk about centralizing Kubernetes management and orchestration, and I'll start with a little bit of history, although there isn't much need to repeat the introduction. EastBanc Technologies, mentioned in the introduction, is a custom software development company that has been on the market for 20 years, and every year we take on dozens of new projects and new clients in different environments: different clouds, sometimes on premises, sometimes hybrid.

At some point quite long ago, maybe from the very beginning, we realized that we needed to unify our processes: delivery processes and, most importantly, operations processes. That includes not only deployment but day-two operations, because in many cases we also maintain the software we deliver. So it was great that, when we started researching technologies that could help with that, Docker came out, and we quickly realized that it helps significantly with application delivery and packaging. Unfortunately, we also quickly realized that it's quite difficult to manage distributed applications with Docker alone; I think the previous presenter talked a bit about that as well. So the next thing we had to find was a way to manage distributed container applications.

There are a number of solutions for that: Docker Swarm, Kubernetes, Nomad. After several rounds of comparison we focused mainly on Kubernetes, and quickly realized that it's great when it's up and running. But questions arise: who sets it up? It's not that easy. Who operates it after it's up? Who takes care of operations at scale? And there is always the question of how to provide governance and compliance. In most mid-sized and large organizations, what often happens is that a development team quickly finds that Kubernetes and container orchestration significantly simplify their life and improve the deployment pipeline, and then they start asking for Kubernetes support. That task usually goes to the operations team, and not many organizations have a DevOps culture mature enough for those teams to work together efficiently. So the operations team gets tasked with establishing a container orchestration and container management practice in the organization, and this is where things often stall.

What we quickly realized is that we needed some way to provide our clients with a Kubernetes management platform. The high-level requirements I put on this slide include portability, because we need something that helps us set up Kubernetes in different clouds, in hybrid environments, in air-gapped environments where internet access is limited, and so on. It needs to be a centralized multi-cluster management platform, because operations teams need a single place to manage the different Kubernetes clusters we set up in an organization: a single place to establish policies, ensure compliance, and provide governance. Still, self-service is very important, because if every development team has to come to operations and ask for a cluster for every test run they want to do, that's not going to work. The platform needs to give dev teams the ability to create clusters, while operations teams still control at least some aspects of that: resource usage, for example, and cost controls.
Reliability is clearly very important, because it's not enough to just set up a Kubernetes cluster using the kops quick start guide. You need to make sure that across your organization even development clusters are reliable, secure, and sometimes highly available. Because we are looking for portability, for the ability to manage clusters across different clouds and platforms, it's important that the platform has a limited management footprint in how it manages a cluster. It needs to be compatible with pretty much any Kubernetes cluster we can think of, which means it should probably limit itself to the cloud APIs and the open Kubernetes API when it works with the clusters it manages. Of course, our target was an open architecture, because we want compliant Kubernetes clusters, not a custom-built extension or a fork. And all the usual non-functional requirements are important as well: security, scalability, high availability, and disaster recovery.

Eventually we came up with a framework, which gradually turned into a practice and, eventually, into a product. So I'm going to talk about the architecture and the architectural decisions we made along the way while developing this framework. Because we tried to stay open, to use as much open source and as open an approach as possible, I hope this will be useful not only in reference to the product we are developing, Kublr, but in many other contexts. If you are building your own Kubernetes practice, you may find some of the solutions and considerations we went through useful.

OK, so going back to the requirements, the central element of this operations platform is the central control plane, or you can call it the operations center: a component which itself runs in a Kubernetes cluster to ensure portability and takes care of a number of operational aspects. I'll start with operations for Kubernetes itself. It includes an API, a user interface that works on top of that API, and components that implement the lifecycle of clusters in different clouds and environments, including AWS and Azure. From a portability standpoint, it may be as limited or as complete as necessary in any specific circumstances: in a minimal implementation this may be an automation job in Jenkins that you run to create a new cluster in an AWS environment; in our case, it's a full-blown product with a user interface and an API. This slide shows how the user interface looks.

The second important component is the architecture of the clusters we deploy, and here it becomes much less Kublr-specific. We analyzed a number of solutions available out there for cluster deployment and the architectures they use, and we ended up implementing a Kublr agent that manages every instance that is part of a Kubernetes cluster, whether it's a master or a node. Depending on your specific implementation, it may be as small or as fully featured as needed, but the main functions of this agent are making sure that all required packages are set up on the instance — the most important one, of course, being the container runtime, Docker in this case — and configuring the components of the Kubernetes cluster that need to run on that instance. In the case of Kublr, this is a single, relatively independent binary that usually gets its configuration from the centralized control plane, although the configuration can also be provided manually by an administrator.
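To make that pattern a bit more concrete, here is a minimal sketch in Go of an agent that pulls its desired configuration from a central control plane and falls back to a locally provided file. This is not the actual Kublr agent; the URL, endpoint path, and field names are purely hypothetical, and a real agent would of course do much more.

```go
// A hypothetical per-instance agent: fetch configuration from a central control
// plane if reachable, otherwise use a file placed manually by an administrator.
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"os"
	"time"
)

// NodeConfig is an illustrative shape for what a control plane might hand out.
type NodeConfig struct {
	Role        string   `json:"role"`        // "master" or "node"
	KubeletArgs []string `json:"kubeletArgs"` // flags to render into the kubelet unit
	Packages    []string `json:"packages"`    // packages the agent must ensure are installed
}

func fetchConfig(controlPlaneURL, localFallback string) (*NodeConfig, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(controlPlaneURL + "/api/node-config") // hypothetical endpoint
	if err == nil {
		defer resp.Body.Close()
		var cfg NodeConfig
		if decodeErr := json.NewDecoder(resp.Body).Decode(&cfg); decodeErr == nil {
			return &cfg, nil
		}
	}
	// Control plane unreachable or response unusable: use the manually provided file.
	data, err := os.ReadFile(localFallback)
	if err != nil {
		return nil, err
	}
	var cfg NodeConfig
	return &cfg, json.Unmarshal(data, &cfg)
}

func main() {
	cfg, err := fetchConfig("https://ops-center.example.com", "/etc/agent/config.json")
	if err != nil {
		log.Fatalf("no configuration available: %v", err)
	}
	log.Printf("configuring instance as %q with %d packages", cfg.Role, len(cfg.Packages))
	// From here a real agent would install packages, render systemd units and
	// static pod manifests, and start the container runtime and kubelet.
}
```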
The other part of its responsibility is coordination of cluster setup, because an important property of these clusters is self-sufficiency: being able not only to come up, but also to recover from hardware failures, for example, even without access to the central control plane or any involvement from it. So this agent includes a number of orchestration features.

Another decision we made is that those orchestration features should require as little as possible from the underlying infrastructure. In fact, the only thing the agents need in order to orchestrate the cluster is a shared file store. It may be S3, it may be an Azure storage account, or it may even be an rsync'd local directory. As long as it is eventually consistent, the agents can use it to share secrets, to announce themselves to the cluster, to let nodes know where the masters are, and to recover the cluster if nodes or masters go down (I'll show a small sketch of this idea in a moment).

To ensure portability, several decisions were made that turned out to make sense. First of all, every component of Kubernetes runs as a container — well, except the kubelet, because one of our findings was that running the kubelet as a container creates more problems than it solves. But everything else can run as a container very efficiently. Second, a simple agent: we started with a bunch of shell scripts, and then it transformed into a binary written in Go, but for a limited use case it could still be a shell script that uses kubeadm, for example, to set up a specific instance. Then, minimal storage requirements: because we only need eventual consistency and the ability to share those files in a secure way, this works across different clouds, environments, and architectures. And minimal infrastructure automation requirements, which means the agent doesn't depend on how the infrastructure is set up: it can run on any Linux machine, whether the instance was created by CloudFormation, by an Azure ARM template, by BOSH, or by anything else.

Another thing we found is that some environments don't provide load balancers or don't let you create them easily, while a reliable Kubernetes cluster with several masters requires a single entry point for nodes to connect to — which normally means a load balancer. Unfortunately, Kubernetes doesn't include node-to-master client failover out of the box, so we had to implement it ourselves. Fortunately, it's not that complicated: in addition to the system Kubernetes containers, every node can run a very lightweight HAProxy container which switches between masters on the node side. So we are able to run even in environments where load balancers cannot easily be created.

As for reliability, many of the characteristics I mentioned on the previous slide also contribute to cluster reliability. Rely on the underlying platform as much as possible: if you are running on AWS, make sure you are not just creating instances for your cluster, but creating them in auto-scaling groups, even if it's an auto-scaling group with a single instance, which is what we do for masters, because each master needs to keep its identity. Assume only a minimal SLA from the infrastructure: make sure that the Kublr agent and its orchestration algorithms can survive hardware failures.
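Here is the promised sketch of coordination over shared file storage. It is not Kublr's actual protocol — the record format, key naming, and directory path are all assumptions — but it shows why an eventually consistent store is enough: masters write small records announcing themselves, and any node can list those records to find the current masters.

```go
// Hypothetical master registration/discovery over a shared, eventually consistent
// file store. DirStore works on a plain directory (for example an rsync'd mount);
// the same interface could be backed by S3 or an Azure storage account.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// MasterRecord is an illustrative record format.
type MasterRecord struct {
	Name     string    `json:"name"`
	Endpoint string    `json:"endpoint"` // e.g. "10.0.1.15:6443"
	Updated  time.Time `json:"updated"`
}

// SharedStore is the only thing the orchestration needs from the infrastructure.
type SharedStore interface {
	Put(key string, data []byte) error
	List(prefix string) (map[string][]byte, error)
}

// DirStore implements SharedStore on top of a local directory.
type DirStore struct{ Root string }

func (s DirStore) Put(key string, data []byte) error {
	return os.WriteFile(filepath.Join(s.Root, key), data, 0o600)
}

func (s DirStore) List(prefix string) (map[string][]byte, error) {
	matches, err := filepath.Glob(filepath.Join(s.Root, prefix+"*"))
	if err != nil {
		return nil, err
	}
	out := map[string][]byte{}
	for _, m := range matches {
		data, err := os.ReadFile(m)
		if err != nil {
			return nil, err
		}
		out[filepath.Base(m)] = data
	}
	return out, nil
}

func registerMaster(store SharedStore, name, endpoint string) error {
	rec, _ := json.Marshal(MasterRecord{Name: name, Endpoint: endpoint, Updated: time.Now()})
	return store.Put("masters-"+name+".json", rec)
}

func discoverMasters(store SharedStore) ([]MasterRecord, error) {
	raw, err := store.List("masters-")
	if err != nil {
		return nil, err
	}
	var masters []MasterRecord
	for _, data := range raw {
		var r MasterRecord
		if err := json.Unmarshal(data, &r); err == nil {
			masters = append(masters, r)
		}
	}
	return masters, nil
}

func main() {
	store := DirStore{Root: "/var/lib/agent/shared"} // hypothetical shared mount
	_ = registerMaster(store, "master-1", "10.0.1.15:6443")
	masters, _ := discoverMasters(store)
	fmt.Println("known masters:", masters)
}
```

The same discovered list of masters is what a node-side component could use to render the local HAProxy configuration mentioned above.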
For Kubernetes, surviving hardware failures means, for example, that masters should store their data on attachable disks in environments where that is possible, and reuse those disks when a master fails and gets restarted. In the case of Kublr, the Kublr agent is responsible for that, and it's easy enough to automate even if you are building it yourself. I've already talked about multi-master API failover.

A very interesting point here is the last one: resource management. Make sure you look into how resource usage is declared for every component in your system. Out of the box, Kubernetes only provides resource requests for some components and resource limits for some components, which means that certain elements of Kubernetes are not limited in memory usage, for example, so they can use as much memory as they like. Another property of Kubernetes is that it doesn't like swap: by default it requires swap to be disabled, which means that when too much memory is allocated on a node, the node will most probably fail, or some arbitrary component or process on it will be killed. So it makes sense to spend some time profiling, and to declare and limit how much memory and CPU each container and process uses, so that Kubernetes knows how many applications or containers can be scheduled on a node.

Another aspect of centralized operations is log collection and monitoring. We realized quite quickly that this is a concern, because at first sight it looks like Kubernetes has everything you need there: it includes add-ons with Elasticsearch, Prometheus, InfluxDB, and Grafana. Unfortunately, we found several aspects that prevent using that simplistic approach when you are managing multiple Kubernetes clusters at scale. First of all, those Prometheus, InfluxDB, and Elasticsearch components are quite heavy and not that easy to operate. When you have five or ten clusters, you are probably using upwards of 16 GB of RAM per cluster just for this, and each of your clusters has a monitoring and log collection subsystem that someone needs to take care of — not something that is easy to organize when you have a lot of clusters. A very convenient way of implementing monitoring and log collection is using SaaS, like Logz.io or Datadog; some of them are present here at this conference. Unfortunately, sometimes that's not possible for policy reasons: some organizations don't want to ship logs outside or have concerns about that, and others can use such services but also want to duplicate and replicate this information in their own storage. Sometimes the systems an organization already uses are not container-aware — some organizations use Zabbix, for example, and Zabbix only recently started providing good integrations for container-based software. And if you deploy monitoring per cluster, you most probably won't have aggregated analytics, cross-cluster and cross-environment analysis, or centralized governance for alerting. That can be genuinely useful: if you are running multiple clusters in your data center, you may want to do cross-environment analysis, including development, QA, and production environments, to find a hardware-related or networking-related problem that spans the organization.
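Before moving on to how the monitoring piece fits together, here is a small illustration of the resource management point from a moment ago: declaring requests and limits for a system component so the scheduler knows how much of a node it consumes. This is a minimal sketch using the Kubernetes Go API types; the component name, image, and numbers are made-up examples, not recommendations — real values should come from profiling.

```go
// Declaring resource requests and limits for a system container using the
// Kubernetes core/v1 API types.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	c := corev1.Container{
		Name:  "kube-proxy", // example system component
		Image: "k8s.gcr.io/kube-proxy:v1.21.0",
		Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("100m"),
				corev1.ResourceMemory: resource.MustParse("128Mi"),
			},
			// With swap disabled, an unbounded component can take the whole node
			// down, so a memory limit matters even for "trusted" system containers.
			Limits: corev1.ResourceList{
				corev1.ResourceMemory: resource.MustParse("256Mi"),
			},
		},
	}
	fmt.Printf("%s requests %s memory, limited to %s\n",
		c.Name,
		c.Resources.Requests.Memory().String(),
		c.Resources.Limits.Memory().String())
}
```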
This slide shows how monitoring works in Kubernetes out of the box if you are using Prometheus. When you deploy Prometheus into a cluster, you configure it to use the Kubernetes API for discovery: Prometheus connects to the Kubernetes API, looks up where all the nodes, services, and pods are, and uses metadata such as annotations to find out which endpoints to use to pull metrics from your containers.

Fortunately, it's easy to extend this when you want centralized monitoring for multiple clusters. First, the Kubernetes API is accessible from outside, so you can configure Prometheus to connect to each and every cluster under your control. The second element that makes it possible is that Kubernetes provides access to internal processes through the proxy API: Prometheus can not only discover all nodes, services, and pods, it can also send HTTP requests to them through the API server, and therefore collect the metrics exposed by those containers from outside the Kubernetes cluster. The only missing piece is something that takes the list of your clusters, gathers all the required credentials — certificates, tokens, passwords, whatever — and assembles a single configuration file that Prometheus can use to run scrape jobs against all of your Kubernetes clusters (there's a small sketch of this below).

There are several considerations to take into account when you do that. First of all, you need to be aware of Prometheus resource usage, because it now depends on the number of clusters and applications you run. The configuration file can grow significantly when you have many clusters, and at some point it becomes difficult for Prometheus to handle, especially if there are many clusters and they change frequently; at some point you will need to shard your deployment, that is, run separate Prometheus instances that collect information from different clusters. There are some limitations to the Kubernetes proxy API: from inside the cluster Prometheus can actually reach more than it can from outside. Fortunately, this is just something to be aware of, not something that blocks this kind of deployment, because in most cases access to just pods, nodes, and services is quite enough. Metrics labeling becomes a bit more complicated, because you also need to introduce labels that let you distinguish between clusters, and make sure they don't clash with your existing labels. Another consideration is additional load on the API servers, because all of this metrics traffic goes through the API server; based on our experience it's not a blocker and not too significant. The end result is that you have all of that information in one database for cross-environment analysis.

For centralized log collection we used a similar approach, with one difference: whereas Prometheus metrics are pull-based — Prometheus pulls the data from your containers — logs in most cases are pushed, especially if you use the standard way of shipping logs on Kubernetes, Fluentd. So instead of just working through the Kubernetes API, we had to implement a lightweight broker between Fluentd, which pushes the logs, and Elasticsearch, which consumes them; we used RabbitMQ for that. Again, it's possible to do this quite efficiently.
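Here is the promised sketch of that "missing piece" on the metrics side: a small Go program that takes a list of clusters plus their credentials and emits a single Prometheus scrape configuration, discovering nodes through each cluster's externally reachable API server and scraping their metrics through the proxy API. This is not Kublr's implementation; the cluster names, hostnames, file paths, and the choice to scrape only nodes are illustrative assumptions.

```go
// Generate a Prometheus scrape config covering several external Kubernetes clusters.
package main

import "fmt"

type Cluster struct {
	Name      string // label added to every metric to distinguish clusters
	APIServer string // externally reachable API server URL
	CAFile    string // CA certificate for that API server
	TokenFile string // bearer token of an account allowed to proxy to nodes
}

// One scrape job per cluster: discovery and scraping both go through the API server,
// and the proxy path /api/v1/nodes/<node>/proxy/metrics reaches the node itself.
const jobTemplate = `  - job_name: '%[1]s-nodes'
    scheme: https
    tls_config:
      ca_file: %[3]s
    bearer_token_file: %[4]s
    kubernetes_sd_configs:
      - role: node
        api_server: %[2]s
        tls_config:
          ca_file: %[3]s
        bearer_token_file: %[4]s
    relabel_configs:
      - target_label: __address__
        replacement: %[5]s
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
      - target_label: cluster
        replacement: %[1]s
`

func main() {
	clusters := []Cluster{
		{"dev", "https://k8s-dev.example.com:6443", "/etc/prom/dev-ca.crt", "/etc/prom/dev.token"},
		{"prod", "https://k8s-prod.example.com:6443", "/etc/prom/prod-ca.crt", "/etc/prom/prod.token"},
	}
	fmt.Println("scrape_configs:")
	for _, c := range clusters {
		host := c.APIServer[len("https://"):] // __address__ expects host:port, not a URL
		fmt.Printf(jobTemplate, c.Name, c.APIServer, c.CAFile, c.TokenFile, host)
	}
}
```

In a real setup this generator would read the cluster list and credentials from the central control plane and be re-run (or reload Prometheus) whenever clusters are added or removed.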
Some considerations for centralized logging: Elasticsearch resource usage and tuning, the same as for Prometheus, and making sure that your log index structure is unified. What this means is that if you are using your log collection system for just one to three applications, it's relatively easy to either distribute them between different indexes with different structures or to unify their output. But if your log system serves an arbitrary set of applications, you need to make sure you transform their output efficiently so that it can be stored in a single index. There are ways of doing that — Elasticsearch has good blog posts on the topic, and this picture shows an example of how it can be done. And again, take into account the additional load on the API servers.

Then there are other aspects of centralized Kubernetes management that I'll quickly go through on one slide. Identity management may or may not be required in any specific case. In our case, we use Keycloak as an identity broker inside the Kublr control plane: on one hand it provides an OpenID Connect identity provider API to the Kubernetes clusters we create; on the other hand it can be used as an identity management system by itself or integrated with LDAP or other identity management software. Keycloak is open source software and can be used in any project; we found that it's quite efficient and integrates with Kubernetes very well.

For backup and disaster recovery, what's important to take into account is that not only Kubernetes metadata needs to be backed up — the etcd data and so on — but also application data. What's good is that Kubernetes has all the information required to let a centralized manager do that, because it keeps track of all the persistent volumes. If you are running in AWS, for example, you can simply take synchronized snapshots of all your persistent volumes, including etcd.

Docker image management may also be an important part of the system, especially if you are working with an air-gapped or isolated environment whose internet access is limited, whether by policy, by traffic costs, or otherwise. It makes sense to use some form of Docker image registry to provide images both for system components — Kubernetes itself, network overlay providers — and for the applications running inside Kubernetes. It also lets you establish an image scanning practice, and it lets you cache images and optimize traffic.

So, I hope I'm in time for... Alright. Thank you very much.