Hello everybody, and welcome again to another OpenShift Commons briefing. This time we're going to get a deep dive on the OpenShift logging stack from someone who's been doing a lot of work with it, Gabrielle Forrest-Stein, one of the technical account managers for OpenShift. I'm going to let him introduce himself and his background and take it away. We'll have live Q&A at the end, and you can stage your questions in the chat. So Gabrielle, please take it away.

Okay, hi Diane, hi everyone. I'm Gabrielle Stein, and I'm a technical account manager for OpenShift. I have a lot of customers who have problems with the logging stack: how it works, how it could perform better, and how to fix issues with it. So I decided to do a presentation about the logging stack, first introducing it and then covering how to fix some common problems. I'll start with an introduction, and then we'll go through some of the experience I've had with my customers, so that you can better understand how the logging stack works and how to fix problems. There are also some hints on how we can work together if you are a Red Hat customer and would like to use our support, so that you already know what data you should send us about your logging stack; that way we can work together much better and get to these problems faster.

So let's start with the basic definitions. What is the logging stack? When you install OpenShift and then do the setup for logging, you get mainly three components: Elasticsearch, Fluentd, and Kibana. Those three components form the acronym EFK. Fluentd tails the logs from every node running OpenShift: there is a Fluentd pod on every node, and those pods tail the logs from that node as well as information from the namespaces and applications running on top of OpenShift.
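To make that concrete, here is a quick way to see the three components on a running cluster. This is a minimal sketch, assuming the stack is installed in the openshift-logging namespace (older 3.x installs used a project simply called logging):

```bash
# List the EFK components: Elasticsearch, Fluentd, and Kibana pods.
oc get pods -n openshift-logging

# Fluentd runs as a daemonset, so there is one pod per node,
# tailing that node's container and journal logs.
oc get daemonset -n openshift-logging

# Kibana is exposed through a route; this prints its URL.
oc get route -n openshift-logging
```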
After Fluentd tails these logs, they are sent to Elasticsearch. Elasticsearch is the store for these logs: the logs live there, along with indices for them, so they can be accessed easily and viewed in Kibana. Kibana is mainly the tool, a web UI, that I use to look at the logs stored in Elasticsearch. So that's EFK: Elasticsearch is the store, it delivers the logs to Kibana, and in Kibana I can visualize them and create indices.

One thing about Elasticsearch that we should pay attention to is that it's a Java application. I won't get into a discussion about whether Java is good or not, but it consumes a lot of resources. So we should plan for plenty of RAM and CPU, and also for good I/O: you need fast disks so you can store these logs and access them quickly from Kibana. Now, the first thing you'll ask me is: "Hey Gabriel, how much RAM and how much CPU should I have for this?" It's hard to come to a customer and say "put in this amount of RAM and it will work for you." This is something that should be planned: first check how many logs you have in your OpenShift cluster, and from that amount of logs work out how much RAM you need. In our documentation there is a kind of formula where you can estimate what you need from how many lines of logs will be stored and how many indices you'll have. From my experience, I would assign a lot of RAM and CPU to the logging stack at first; if I then see that the logging stack is running well and needs fewer resources, I can take those resources back, and that works much better.

One thing that happens a lot: if you set up Elasticsearch without a plan and it doesn't have the resources it needs, you get problems ingesting and storing the logs in Elasticsearch and producing indices. And if that happens, you cannot see the logs in Kibana, because Elasticsearch cannot prepare them for Kibana to display.

To recap: Fluentd is a pod that runs on each of my OpenShift nodes and sends the logs to Elasticsearch; Elasticsearch creates indices and organizes these logs so I can view them; and Kibana is purely a web UI for Elasticsearch which I use to check the logs produced on my OpenShift nodes.
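Before reaching for the sizing formula in the docs, a few quick checks show what the stack is requesting and actually consuming today. A sketch, with a hypothetical Elasticsearch pod name:

```bash
oc project openshift-logging

# What is currently requested/limited per pod?
oc get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources}{"\n"}{end}'

# What is actually being consumed? (requires cluster metrics to be available)
oc adm top pods

# How full are the Elasticsearch data volumes?
# (pod name is hypothetical; the data mount path may differ per version)
oc exec logging-es-data-master-abc123 -c elasticsearch -- df -h
```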
I also have some deployment considerations: things we should do before we deploy the logging stack in our OpenShift cluster. The most important one, as I already said, is to plan how big the amount of logs will be and how many resources I will need for it. If you have split clusters, one OpenShift cluster for development and another for production, you should check both. For development you probably need quite a lot of resources, because you have developers running lots of tests, CI/CD, and many deployments on that cluster, and this produces a lot of logs. On the production side you have your applications running and your customers using them; if an application you've deployed on top of OpenShift produces a lot of logs, you need a lot of resources there too. As I said, there is a basic calculation, which I'll show you later; there is also a solution article from Red Hat that you can use for troubleshooting the logging stack, with some basic information on this.

It's also important to check the Fluentd configuration. On OpenShift, certain plugins ship already deployed together with the Fluentd pods, but, fortunately or unfortunately, we don't support additional plugins. If you start using different plugins with the Fluentd pods on top of OpenShift, this will bring complications. So my advice is to take care here: don't install extra plugins into the Fluentd pods, because you will probably have problems, and if you then reach our support and we see a plugin that is not supported by us, that's a problem.

Another consideration is for Elasticsearch: if you would like an HA setup, please have three OpenShift nodes running Elasticsearch pods, so that you have HA in your cluster. And another is storage: do take care of how much of your storage the Elasticsearch indices consume, because it can bring complications. Keep usage between 50 and 70 percent, no more than that; above that you start getting complications, and you probably won't be able to consume the logs in Kibana anymore.

Another consideration is Docker: when doing the Docker configuration, please set json-file as the log driver. This is the standard in newer versions of OpenShift. journald caused a lot of complications for customers in the past, so we advise our customers to use json-file instead of journald; that way we can work better together.
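For reference, this is roughly what that looks like on a 3.x node. A sketch, assuming Docker is configured through /etc/sysconfig/docker as in the aggregated logging docs; the exact file and option values on your nodes may differ:

```bash
# Check which log driver the node's Docker daemon is using right now.
docker info --format '{{.LoggingDriver}}'

# On OpenShift 3.x nodes the driver is typically set in /etc/sysconfig/docker;
# the max-size/max-file values below are illustrative, not a recommendation.
grep OPTIONS /etc/sysconfig/docker
# OPTIONS='--selinux-enabled --log-driver=json-file --log-opt max-size=1M --log-opt max-file=3'

# Restart Docker after changing the driver (drain the node first in production).
systemctl restart docker
```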
And replicas: if I have replicas of my Elasticsearch data, please pay attention that these replicas are different from the number of Elasticsearch pods. I can have many Elasticsearch pods, but what I'm talking about here are the data replicas, the copies of the data stored in Elasticsearch. If you have one replica, you have two copies of the data; with three replicas you have more copies. With replicas your data is better protected: if you have a problem on one Elasticsearch node, you can use another replica of that data.

Another recommendation concerns a configuration you can set on the Fluentd daemonset: merging the JSON log (MERGE_JSON_LOG). This causes a lot of problems in our customers' setups, because the many applications running on these clusters map their data differently, and Elasticsearch sometimes doesn't understand that. Why doesn't it understand? Because a field in an index has a data type, say string; if an application then tries to store binary data or an integer in that string field, you have a problem. With MERGE_JSON_LOG set to true, this brings you problems in the logs: you cannot see some of the logs, or part of them will not appear in Kibana. Setting it to false means you effectively ignore those per-application mappings, and they will not break your logs from appearing in Kibana.

About performance, I have some tips to improve it. The first one I'll just say again: please don't use NFS. This is in our documentation: don't use NFS in production. Deploying the logging stack with the Elasticsearch data on top of NFS is the worst error you can make. As I already said, fast disks are a must; you really need good I/O. And you need enough RAM so Elasticsearch can keep up with log processing. Here is why: if you don't have enough RAM and Elasticsearch consumes too slowly, the logs from Fluentd start to pile up on the nodes, and then you cannot consume all the logs from Fluentd. If you have enough RAM, Elasticsearch can keep up with Fluentd's output and you will have no problems.

Another component of our logging stack is the Curator. The Curator is the cron job we have in the OpenShift logging project, and its main function is maintenance of the logging stack. In particular, it deletes indices after a number of days that I can set: 10, 11, 30 days, and so on; it really works like a cron job. The recommendation, which I discussed with our team, is seven days. If you want to keep more than seven days, it gets complicated, because then you have many more indices, and this affects performance and can even break your setup; I had customers who set it to 60 or 45 days and it completely broke their setup. So keep it at seven days. If you need to retain logs for more days than that, with good performance, I would recommend you go to another product, like Splunk or another logging product; with our logging stack it's not recommended to go beyond that.

I also told you about the replicas, so you have copies of the Elasticsearch data on several Elasticsearch nodes; the indices themselves are also split across the nodes running Elasticsearch, and that splitting is done by shards.
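Before moving on, two of the settings just mentioned, MERGE_JSON_LOG and the Curator retention, can be checked and changed from the command line. A sketch, assuming the 3.x object names (logging-fluentd daemonset, logging-curator ConfigMap) in the openshift-logging namespace; on 4.x the operator manages these objects, so don't edit them by hand there:

```bash
# Check the current value on the Fluentd daemonset.
oc set env daemonset/logging-fluentd --list -n openshift-logging | grep MERGE_JSON_LOG

# Turn JSON merging off if mixed application field types are breaking indices.
oc set env daemonset/logging-fluentd MERGE_JSON_LOG=false -n openshift-logging

# Curator retention lives in the logging-curator ConfigMap; the talk's
# recommendation is to keep it at 7 days, roughly:
#   .defaults:
#     delete:
#       days: 7
oc edit configmap/logging-curator -n openshift-logging
```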
The thing is, if you have a lot of shards, more than usual, this will affect performance. So use the shard configuration with caution: don't just split the shards across a lot of nodes or create a lot of shards, because it can affect performance and even break your logging stack setup. I have customers who just use the default shard configuration and take care of deleting old indices, and this works really, really well. If you have a really big setup, then you may need to tune it, but do it with caution; and if you have support from Red Hat, contact your support team or your technical account manager and ask about it first.

Then we have some common errors that occur in the logging stack; this is what happens with a lot of customers. First, Fluentd stops working or stops sending logs to Elasticsearch, and customers usually notice this when they cannot see the logs in Kibana. This is, unfortunately, usually a resource problem: you need to check whether you have enough resources. Another error is that Fluentd cannot send the logs to Elasticsearch quickly enough, and Elasticsearch cannot consume them as desired.

One hint I give my customers: I will talk more about the logging dump shortly, but the logging dump takes all the information from the running logging stack and puts it into files with descriptions. If someone asks you to produce a logging dump and you get errors like exit code 60 or 28, that can mean you are having resource problems: even the dump, even gathering all the information from your logging stack, fails because there aren't enough resources to run it. There is also a URL which I put here, and Diane will probably share this presentation with you; it's the main Red Hat article with a lot of troubleshooting hints for the OpenShift logging stack. There is also a link there to the logging dump, how to use the script, and how to debug Kibana, Elasticsearch, and Fluentd; there are different topics in this article, and I think it's really important to use it.

Now, so far I was speaking about OpenShift 3, or at least the logging stack mainly as it is on OpenShift 3. I was talking with colleagues about OpenShift 4 and what we should expect from the logging stack there. Comparing the setup on OpenShift 3.x and 4.x: on OpenShift 3 you deploy it using Ansible scripts, while on OpenShift 4 you deploy the logging stack using an operator, the Cluster Logging Operator. You can go to the OpenShift web interface, go to OperatorHub, search for "cluster logging", and deploy the operator; then you have Kibana, Fluentd, and everything you need. I will probably show you this in the OpenShift 4 web interface at the end of the presentation.
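For reference, after installing the operator you create a ClusterLogging custom resource that describes the whole stack. This is a minimal sketch based on the 4.x docs; the sizing values are illustrative and the storage class name is hypothetical:

```bash
# The operator acts on a ClusterLogging resource that must be named
# "instance" and live in the openshift-logging namespace.
oc apply -f - <<'EOF'
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  logStore:
    type: elasticsearch
    elasticsearch:
      nodeCount: 3                      # three ES nodes for HA, as discussed above
      redundancyPolicy: SingleRedundancy
      storage:
        storageClassName: fast-ssd     # hypothetical; use a fast, non-NFS class
        size: 200G
  visualization:
    type: kibana
    kibana:
      replicas: 1
  curation:
    type: curator
    curator:
      schedule: "30 3 * * *"           # Curator runs as a cron job
  collection:
    logs:
      type: fluentd
      fluentd: {}
EOF
```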
One thing that will come in the next versions of the logging stack is log forwarding. That means you can forward the logs you are collecting with Fluentd to other destinations: you can export to Splunk, Graylog, or Kafka, and then you don't need the whole infrastructure of the logging stack. You can also use TLS between the collector, which is Fluentd, collecting the logs, and the destination you are forwarding them to. On OpenShift 4.3 this is tech preview, so you can already start using and evaluating it, and it is planned to be GA, generally available, in version 4.5. With log forwarding you can also forward the audit logs to external systems, so you can check there, as in the example here, how the cluster is being accessed, with the audit logs available in the external tool as well.

There will also be an upgraded version of Elasticsearch and Kibana: we'll have version 6 of both, and we'll move from Search Guard to Open Distro. The new data model will improve not just scalability but also performance. And there will be a better separation between the cluster logging operator and the Elasticsearch operator, and the Elastic product Kibana as well. This is planned for version 4.5.

Now, the logging dump; this applies to both OpenShift 3.x and 4.x. When you have problems with the logging stack, there are different ways to debug or troubleshoot them. One is to go to the logging project with the oc command line, check which pods are running, and look at the logs and the describe output of the pods and the other components in the project. Or you can do a logging dump. The logging dump is just a script that you run against your OpenShift cluster; it gathers the information from the openshift-logging project namespace and creates a directory with all that information, so you can use it to debug the logging stack: whether you have enough resources, whether you have problems with indices, and so on. And if you are working actively with Red Hat support, you can also send it to them: here I have a logging dump, compressed into a small file with all the information from a logging stack, which I could send over if I'd like some help. The operation is basically: you fetch the script, a shell script, and run it as cluster-admin, because it needs to gather all this information; then you get a directory with everything in it. I will show you a little more about the logging dump next.
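In practice the run looks something like this. A sketch, assuming the script from the Red Hat troubleshooting article is saved locally as logging-dump.sh; check the article for the current script name and options:

```bash
# You need cluster-admin to read everything in the logging namespace.
oc whoami   # confirm which user you are logged in as

# Run the dump script fetched from the troubleshooting article
# (file name assumed here; see the article for the current one).
bash logging-dump.sh

# The result is a directory with one subdirectory per component
# (curator/, elasticsearch/, fluentd/, kibana/, project/).
# Compress it before attaching it to a support case:
tar czf logging-dump-$(date +%F).tar.gz <dump-directory>/
```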
So, the logging dump: when I create a logging dump from my logging stack, I get different directories, and every directory has a function. At the first level there is a directory for every component of the logging stack: curator, fluentd, elasticsearch, and kibana. Inside each of those I have the information about the pods running that component, and also their logs. And the last directory, project, is really a scrape of the whole openshift-logging project: inside I have the daemonsets, the deployment configs, the pod configurations, and so on; all the components of my project are there in files, so I can read them and see how the deployment of this logging stack was made.

To do something practical, I have here an example of a logging dump. I downloaded the logging dump script; it's just a shell script which goes through the logging stack and saves the details and logs of my logging stack in different directories. After running it, I have a directory like this, and going into it, as I said, there are these directories: curator, elasticsearch, fluentd, kibana, and project.

Going into curator, I mainly have the description of the pod running the Curator. If you remember, the Curator is just a cron job which runs from time to time, and here I have some errors: it didn't run well, it has some problems; this is an error I'm having on my logging stack here. If I go to the logs directory, I have some logs; the files are already compressed, and I can use less, for example, to see the content of a file and what is happening with my Curator.

The same approach applies to the elasticsearch directory, and here, for example, we already see that there is nothing there. Why is there nothing? I'll show you: I already have an error running on my cluster here. So if a customer sends us this logging dump and I see that the elasticsearch directory is empty, that is really something that is not okay, and the first thing I'll ask the customer is to show me whether the Elasticsearch pods are running or not. We then need to investigate why those Elasticsearch pods are not running, and I can use a lot of tools for that: go into the logs, or do a describe on the pod, and then I can see what is going on.

Another directory I have is the main fluentd directory. My Fluentd pods are running there, and I have a file here for every node running a Fluentd pod; that means I have, let's see, six... no, seven nodes running Fluentd, sorry. If I open a file, I have the details of my Fluentd pod; this is exactly what I'd get running oc describe on that Fluentd pod. And here we also see MERGE_JSON_LOG set to true; that's something we should probably change if you are having complications with the logs Fluentd sends to Elasticsearch and Elasticsearch cannot understand them.

Going to the logs directory, I have the logs from Fluentd, so I can just open a file and see what is going on. The first file I opened has the logs of Fluentd's communication with Elasticsearch, sending logs to Elasticsearch; it's the log of the Fluentd pod running on the node. I have some errors, some problems here, or at least some warnings I should check. And we saw a few minutes ago that my Elasticsearch pod is not running: so here Fluentd cannot send the logs to Elasticsearch, and if I check, I probably also have some stale buffer files here which cannot be sent to my Elasticsearch.
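When a dump shows an empty elasticsearch directory like that, these are the kinds of commands I'd run on the live cluster to find out why the pods are down. A sketch, with hypothetical pod names:

```bash
oc project openshift-logging

# Are the Elasticsearch pods scheduled and ready at all?
oc get pods | grep elasticsearch

# Describe a failing pod: the Events section shows scheduling
# and volume-mount errors (pod name is hypothetical).
oc describe pod logging-es-data-master-abc123

# Storage problems are a common cause: check the PVCs are Bound.
oc get pvc

# And the pod logs themselves, if the container started at all.
oc logs logging-es-data-master-abc123 -c elasticsearch
```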
It's also important to mention that inside logs I will have a log file for every Fluentd pod running on my OpenShift nodes, so I can go through all these log files from the Fluentd pods and check what is going on.

Going into kibana, it's the same: I have the description of the pod running Kibana, so this is the data I have on my running Kibana pod, and in logs I have the logs from that Kibana pod. It's also complaining that Elasticsearch is not running; in my cluster I have a problem with the storage, the storage is not mounting, and that's why this error happened.

If I go to project, I see my config maps, daemonsets, deployment configs, persistent volumes, routes, services, and so on; even the secrets can be checked. I have the information for the nodes running the logging stack here. One thing I can also see from this is whether an OpenShift node is overcommitted: if the limits are really high, my node cannot serve the whole demand that my logging stack has. I can also see the events from the project. And, as I said, you can change the MERGE_JSON_LOG configuration to false: under daemonsets, if I go to the logging-fluentd daemonset, I can see this configuration here set to true. In this case I would go to my customer and say: hey, please change the daemonset and set it to false. Of course, the customer needs to do it on their own cluster; I just see that it's wrong and can advise them that it's better to change it, and how to fix the problem.

So that is mainly the logging dump; it's really helpful and will help you a lot to check what is going on with the logging stack. I also recommend you check the link to the article on troubleshooting the OpenShift logging stack and how to produce the logging dump, so you can start navigating it and checking what is going on; and of course we have our support, which can help you with this.

I would like to thank you for being here to watch this presentation. There is much more than I can cover in one presentation about the logging stack; I hope you enjoyed it, and if you have some questions, please post them. I will check the questions now; let's see.

Yes, there are a number of questions, and I think they're all good ones. The most recent one: Noville is asking about forwarding logs to Splunk. They want to have the same indexing in Splunk, i.e. indexing by namespace, but he's not seeing anything written up about it, and he's wondering whether this is possible.
Okay, indexing in Splunk; good question. To be honest, I haven't used Splunk with log forwarding until now; I don't have a customer doing that yet. So we would probably need to check the documentation and find a way to do it; it's probably possible, and if not, we can also file a request for enhancement and try to get it into the next versions of the OpenShift logging stack.

And then there's another one, from Amadeo; I'm checking it. "The new Elasticsearch features such as index lifecycle management, are they available? Is it possible to use that in OpenShift without using Curator?" The first thing about our logging stack is that we are not using the most up-to-date version of Elasticsearch: we are on version 6, and as I remember, the latest Elasticsearch is now version 7. So we are not on the same version of Elasticsearch that you could just use standalone, outside the OpenShift logging stack. And I have never seen anyone doing it without Curator; I think Curator is, for now, the way to do it. We could probably look at that again in the next versions.

There were a couple of other questions. One asks what you would recommend for further reading, documentation or books on this topic. First, the OpenShift documentation: if you have access to the documentation for OpenShift, or even OKD, read those docs; what we offer there is the first place to start. The documentation from Elastic is also really good, but there is something to pay attention to: the Elasticsearch you find on Elastic's website is some versions ahead of the version we use in OpenShift, so you might try to implement something from Elastic's documentation that is not compatible with our version, because we are a little behind in Elasticsearch version management. The Fluentd documentation is also really good, so I would also go to the Fluentd website; they have really good documentation at docs.fluentd.org. So you have some good places to start.

Gabriel and I will collect some of those links and put them in a blog post on openshift.com, along with his presentation and the video for this. These are all great questions, and we look forward to having more talks on this topic; perhaps we can get the Elastic folks to come on and give a talk about what's going on in their latest versions and what we can anticipate in the future. Yeah, that would be a good follow-on to this as well. So thank you very much; thank you, Gabriel, for your time and for taking the time to walk through all of this. I will try to get all of this up in the next day or so, so look for it, probably on Monday, on the openshift.com blog; it'll probably be on the openshift YouTube channel sooner than that, since it takes a little while to get the blog published. So thank you again, Gabriel, and thank you all for attending.