Welcome to this talk. My name is Lukaj Hocsek. I've been working at Red Hat for a couple of years, and right now I'm helping the OpenShift team to better integrate Elasticsearch into the logging component. This is a short presentation, we have only about 20 minutes, so we will go fast. There will be no demos because there is a lot to talk about, and I will try to make sure we have some time left at the end for questions.

First I will give a very high-level view of what logging is and how it fits into OpenShift. Then I will talk about the challenges of operating and running Elasticsearch inside OpenShift. We are using, or starting to use, Prometheus and Grafana even for Elasticsearch, and since this conference is about OpenShift 4 you have probably heard the term operator pattern; we also have an operator for Elasticsearch, and I will mention that as well. One note in advance: most of these charts are very simplified.

So, these days we run applications in containers. They produce logs, and at some point those logs end up on the file system of the host, the node. From there they are picked up by a collector and shipped to the logging stack, in this case Elasticsearch. There are some abstractions here that some of you may be familiar with. Containers run inside a pod, and one pod can run more than one container. A collection of pods that somehow belong together runs in a namespace; in OpenShift we use the term project, which I believe is an extension of the Kubernetes namespace and is mainly about security. That's important, and we will get to it in a minute. We also have a special namespace for logging, called openshift-logging, which is home to the Elasticsearch cluster and all the collectors running on the individual hosts. There are other components like Kibana and Curator, but those are not important for this talk.

So what does it take to run an Elasticsearch cluster inside OpenShift? When I was thinking about it, I found that every challenge or problem falls into one of three categories. The first is the data model, the model of your log data; it can also be a scalability challenge for Elasticsearch, and we will talk about it. The second is the environment where Elasticsearch is running: the physical hardware and the topology of your cluster. And last but not least, the people. They are very important, and we will talk about them too.

So let's talk about the data model. What can go wrong with it, and why is it so important? From the beginning, OpenShift logging started with a very simple model. We have already mentioned that namespaces group together pods that somehow belong together, and that individual users in OpenShift have privileges to see or work with specific namespaces. That means that when you collect logs, you really want to make sure that users with privileges for one namespace cannot see logs from namespaces they do not have privileges for. So what we do is create an index in Elasticsearch per project, per namespace, and every day we cut a new index. That is a very simplified description, and it works well, except that it has major challenges: it can lead to a high number of indices and a high number of shards. Those are real scaling issues.
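To make the index-per-project model concrete, here is a minimal sketch of how a collector might write a log record into a per-namespace, per-day index using the Elasticsearch low-level Java REST client. The index-naming scheme, field names, and connection details are assumptions for illustration, not necessarily what OpenShift logging actually does.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class PerProjectIndexing {
    public static void main(String[] args) throws Exception {
        String namespace = "myapp"; // the OpenShift project the log line came from
        String day = LocalDate.now().format(DateTimeFormatter.ofPattern("yyyy.MM.dd"));

        // one index per project per day, e.g. "project.myapp.2019.05.20" --
        // with many projects this multiplies indices (and shards) very quickly
        String index = "project." + namespace + "." + day;

        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request req = new Request("POST", "/" + index + "/_doc");
            req.setJsonEntity("{\"message\":\"container log line\","
                    + "\"kubernetes\":{\"namespace_name\":\"" + namespace + "\"}}");
            client.performRequest(req);
        }
    }
}
```

Access control then works at the index level: a user with privileges for one project can only query that project's indices, but every project and every day adds more indices and shards.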
If you do not have a large Elasticsearch cluster, these things will get you. And if you look into the Elastic forums you will see that this topic is discussed frequently, so I think it is fair to say these problems are well known. If you stopped an average person on the street and said, hey, I have this many shards in this small cluster, they should tell you, hey, this is wrong, it will not work.

The other problem is something called index mapping explosion. I'm not sure if you are familiar with that. It basically means that if you allow users to introduce their own field names into the model, bad things can happen: the mapping can grow too large and Elasticsearch will suffer again.

So what can we do about that? There are a few things. We can try to reduce the number of shards and indices; Elasticsearch has the rollover API that can help with that, and we are looking at how we can benefit from it. There are also commercial solutions built around document-level security, which means you could store logs from many namespaces in a common index and enforce security not at the index level but at a lower level. But those solutions are commercial; their licenses do not permit us, as Red Hat, to redistribute the code, so we cannot use them at the moment. One other solution I have seen is: if one cluster cannot handle your data well, then start another cluster, and another, and share the load between those clusters. This scenario will be possible, and easier to use, with the Elasticsearch operator, so maybe we will try that.
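Just to make the rollover idea mentioned above concrete, here is a minimal sketch using the low-level Java REST client; the alias name, index name, and thresholds are made-up examples. Writes go through a single alias, and Elasticsearch only cuts a new backing index when one of the conditions is met, which keeps the number of indices and shards under control.

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class RolloverSketch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // create the initial backing index with a write alias
            Request create = new Request("PUT", "/logs-000001");
            create.setJsonEntity("{\"aliases\":{\"logs-write\":{}}}");
            client.performRequest(create);

            // periodically ask Elasticsearch to roll over; a new backing index
            // is created only if one of the conditions is met
            Request rollover = new Request("POST", "/logs-write/_rollover");
            rollover.setJsonEntity("{\"conditions\":{"
                    + "\"max_age\":\"1d\",\"max_size\":\"25gb\",\"max_docs\":50000000}}");
            client.performRequest(rollover);
        }
    }
}
```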
Another category of problems is performance and operational tuning. The hardware your Elasticsearch runs on really matters: you need fast local storage, and if you do not have it, at some point you will probably suffer. The other important thing is the topology of your cluster. So far we have been using a quite simple model where every node can do anything. If you are familiar with Elasticsearch terminology, you probably know that a node can run as a master, as a data node, and so on. If you want better performance and stability and you have a lot of data, it is really recommended to have dedicated master nodes, for example. That is well known. But of course such a topology can be more complicated to operate and manage; we will get to that in a minute. In this area, what is really important is to have monitoring and alerting. Again, there are commercial solutions, and we cannot distribute them as part of OpenShift, so we either have to build our own or use other third-party solutions, and that is exactly where Prometheus and Grafana come in.

Last but not least, the cluster will be managed by real people. Those people usually know their job very well, but they may not be experienced with Elasticsearch. So if you ask them to do upgrades, maintenance, or troubleshooting for you, you should make sure they have good tools for that and that they know how to use them. So let's jump into Prometheus.

We like Prometheus; I like it as well. I do not want to introduce what Prometheus is, you probably know it, and it has many nice features. Now, when you want to allow Prometheus to scrape Elasticsearch metrics, how do you do that? Elasticsearch does not support this scenario out of the box, because it does not export metrics in the Prometheus format. In the Prometheus world there are two patterns for dealing with this situation. The first is to use something called a Prometheus exporter: basically another process, like a proxy, that knows how to pull metrics from Elasticsearch, transform them into Prometheus metrics, and expose them to Prometheus. The other option is to extend Elasticsearch itself. Elasticsearch has a plug-in model, so it is possible to implement a plug-in that exposes the metrics in Prometheus format.

I wanted to see what kind of audience I have here. I know there is a Prometheus exporter implemented in Go, and there is an Elasticsearch plug-in for Prometheus that is, of course, implemented in Java. What would be your preference? Who would like to go with Go, with the Prometheus exporter? Okay. And who would prefer the Java solution? Yeah, okay. Maybe you are JBoss guys, and not just because of Java. Right, I prefer the Java solution as well. I'm a Java developer, I've been using Java for many years, but that is not the point. I think using a native Java plug-in has several advantages, so let's briefly talk about them.

When you implement a plug-in for Elasticsearch, especially if you want to expose metrics, you really have to talk to internal and quite low-level APIs. Is this an advantage? Well, it is definitely hard. But the advantage is that Java is a strongly typed language, so when any change is introduced, even in the internals of Elasticsearch, you will know about it, because when you download a new version of Elasticsearch and try to compile your plug-in, you will see what has changed. That is very important.

The other thing that is very important is the implementation of integration tests. I have been looking at the Prometheus exporter implemented in Go and I have not found any tests at all. Maybe there are and I was just blind. But Elasticsearch developed its own framework for testing plug-ins, and it is really great. It has the disadvantage that I would like to see more documentation around it, but once you have gone through that and learned how to use it, it can do really great things for you. What the Java Elasticsearch plug-in currently does is that it has a set of integration tests that you can run directly from Gradle. The tests will really instantiate Elasticsearch processes, you can say, okay, start two or three nodes for me, they will deploy the plug-in into those Elasticsearch nodes, and there is great support for running REST tests. I really like it; I started using this plug-in and implemented a lot of tests at that level. It is really great because, again, whenever anything changes, you will learn about it. I do not know how you can do the same with the Go exporter; you would have to implement it yourself, which of course you can.
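To give a flavor of that test framework, here is a minimal sketch of what such an integration test can look like; the plug-in class name and the test itself are hypothetical, not the actual plug-in code.

```java
import java.util.Collection;
import java.util.Collections;

import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.test.ESIntegTestCase;

// ask the framework to start a small multi-node cluster for this test suite
@ESIntegTestCase.ClusterScope(numDataNodes = 2)
public class PrometheusExporterPluginIT extends ESIntegTestCase {

    @Override
    protected Collection<Class<? extends Plugin>> nodePlugins() {
        // deploy the (hypothetical) plug-in class into every node the framework starts
        return Collections.singletonList(PrometheusExporterPlugin.class);
    }

    public void testClusterIsHealthyWithPlugin() {
        // if an internal API the plug-in depends on changes, this either
        // fails to compile or fails here at runtime
        ensureGreen();
    }
}
```

Run from Gradle, a test like this spins up the requested nodes with the plug-in installed, which is exactly the kind of safety net described above.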
The last thing I would like to mention is that exposing Elasticsearch metrics in Prometheus format may not always be straightforward. I learned that when I needed to expose one particular metric. Elasticsearch has a feature where, as a node's disk fills up, specific thresholds you have configured kick in: first it stops allocating new shards to that node, and if disk consumption keeps growing, it starts relocating existing shards off the node. The cluster still works perfectly, and the operator may not even notice that something is happening, but I think you really want to know that this is happening, because it can bite you later. With Prometheus alerting in OpenShift we are trying to be proactive: we do not want to wait until something breaks, we want to let you know in advance. However, if you look at how this setting is exposed, you will learn that users can change it on the fly and that it can be expressed in two different units, either as a percentage or in bytes. If you think about it, that is not so easy to export directly to Prometheus; Prometheus will not like it, so you need to do something extra with those values. It is quite easy to do in Java: again, you have to dive into low-level APIs and handle it, and it is good that you then have that control.
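To give an idea of what that extra handling can look like, here is a minimal sketch that normalizes such a disk-watermark value into a single unit before it is exposed as a Prometheus gauge. The method name and the "required free bytes" convention are my own illustration, not the plug-in's actual code.

```java
import org.elasticsearch.common.unit.ByteSizeValue;

public final class WatermarkNormalizer {

    /**
     * Convert a watermark setting such as "85%" or "20gb" into the number of
     * free bytes required on the node, so Prometheus always sees one unit.
     */
    static long watermarkAsRequiredFreeBytes(String watermark, long totalBytes) {
        String value = watermark.trim();
        if (value.endsWith("%")) {
            // "85%" means: act once 85% of the disk is used, i.e. 15% must stay free
            double usedPercent = Double.parseDouble(value.substring(0, value.length() - 1));
            return (long) (totalBytes * (100.0 - usedPercent) / 100.0);
        }
        // otherwise the setting is an absolute amount of free space, e.g. "20gb"
        return ByteSizeValue.parseBytesSizeValue(value, "watermark").getBytes();
    }
}
```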
Okay, Grafana. We have a Grafana dashboard for Elasticsearch. It consumes a lot of the metrics exposed by the Prometheus plug-in, though not all of them; we are adding things as we see fit. It is open source, so we will be glad if you join in and contribute. How is it done? There is a concept of mixins, I am not sure if you have heard about it: there are Kubernetes mixins, and we built an Elasticsearch mixin on top of that. It is basically a bundle of Prometheus recording rules, alerting rules, and Grafana dashboards. It is built on jsonnet, which is a templating language, and it uses some other libraries. This is quite low-level, but maintaining and building Grafana dashboards, which are huge JSON documents, directly in JSON is not a good idea, and jsonnet helps with that a lot. Currently you can find this bundle at this URL; the location will likely change in the future.

Okay, last but not least, the Elasticsearch operator. Very quickly, I do not want to go into details; I have not implemented the operator myself, it has been done by my colleagues. In OpenShift there is a namespace where all the operators live, and the Elasticsearch operator is responsible for setting up and starting Elasticsearch clusters. You have an Elasticsearch custom resource in which you describe how your cluster should look, and the operator will start the cluster for you. The advantage is that it is now much easier to start more clusters, so we can possibly share the load or start clusters with different topologies. Again, thanks to the Elasticsearch operator, this will be much easier, and it will also be much easier for the people who operate the cluster to maintain it. I have been talking about the Elasticsearch mixin; in OpenShift there is also the openshift-monitoring namespace, where Grafana, Prometheus, and Alertmanager live. And thanks to some magic, which I do not fully understand yet, if your project, like openshift-logging, contains specific objects with the dashboards and the rules, they are picked up and injected into Grafana and Prometheus. So the integration should be quite easy. And it opens a couple of new opportunities going forward: running different Elasticsearch topologies, or shipping the logs to completely different targets. Some customers or users just want to ship the logs into their own in-house logging system, or not into Elasticsearch but into a queue, or maybe not ship the logs anywhere at all, I do not know.

So that is all I had for today. If you have any questions, now is your time. Yes. So the question was whether Elasticsearch can close all the indices matching a pattern for you, so that you do not have to delete them but they are still available. Well, Elasticsearch can close indices, so it should not be hard to implement a job like that: find all the indices matching a pattern and close them. Yes, that would be possible. But still, the goal is not to have a huge number of indices in Elasticsearch in the first place. If you need auditing and things like that, that is a different topic. I personally do not look at Elasticsearch as a system for long-term storage of the records you need for auditors; I would probably export them to some different storage. I am not sure it would be used for logging in our use case, because if OpenShift provides you the logs for the last two days, it is probably enough; if you want to archive them, maybe use a different system. You are welcome. Any other questions, please? That is great. So thank you.
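For reference, here is a minimal sketch of the close-by-pattern idea from that question, using the low-level Java REST client; the index pattern and connection details are just examples.

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class CloseOldIndices {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // close every index matching the pattern: the data stays on disk
            // but stops consuming heap, and the indices can be reopened later
            client.performRequest(new Request("POST", "/project.*.2019.05.*/_close"));
        }
    }
}
```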