[Host] A good speaker, so let's give it to Manoj.

[Manoj] We'll wait thirty seconds for people to get settled in, and then we'll start. Hello, everyone. My name is Manoj, and I work for Host Analytics. A very good afternoon, and welcome to day one of the Selenium Conference. Today's topic is smart test failure analysis with Elasticsearch. I think a lot of you spend a lot of time analyzing failures on a day-to-day basis, and that's the last thing you want to be spending time on, right?

Before starting the talk, I want everybody to stand up once. Thank you. We have scribble pads and notepads — just drop them on your chairs — and I want you to clap twice if you agree with what I say. You want to spend your time wisely, writing new tests rather than spending it on failures — agreed? Okay. Now: you are spending less than one hour per day on failure analysis? Really, less than one hour? You're spending more than two hours per day? Oh. Cool, thanks — then this session is for you. We have only twenty minutes; the session is time-boxed, so let's make the best use of these twenty minutes, go back home, and fix some problems.

First, Host Analytics: who we are and what we do. We partner with finance teams to help them, and their organizations, achieve peak performance. We provide a single cloud-based EPM platform that has everything: planning, consolidation, modeling, reporting, analytics, and so on. Looking at the Host Analytics evolution, we were the first cloud EPM solution, the first cloud consolidation and close-management solution provider, and so on. Many of you know Gartner's Magic Quadrant: we are the only company apart from Oracle to be listed as a leader in both the Financial Corporate Performance Management and the Strategic Corporate Performance Management quadrants. And this is the test automation evolution at Host Analytics.
We started test automation somewhere around 2009. Initially we were using Silk Test; then we moved on to a unit-testing tool, which we enhanced to use for functional testing; then in 2014 we started with Selenium, and now we have a full-grown framework to automate and execute our test cases. This is the automation evolution from 2016 to 2018 in terms of the number of test cases we execute on a daily basis: today we execute close to 12,000 test cases per day, and that takes a hell of a lot of hardware, time, and energy.

So we have an ecosystem for this execution. In our test execution ecosystem we use TeamCity for continuous integration and continuous delivery, and Jenkins for distributed execution management; you can see all the automation client agents which execute the test cases. At the end of every test suite execution we write the test results to a database, and we have built a custom dashboard which displays the reports and results on a daily basis. This dashboard is used as a daily health check: what functionality is broken, and how much of the functionality is in fully working condition?

As you can see, this is a sample execution-results dashboard; we call it the daily execution trends, and it shows the last one week. The typical failure rate hovers around 4 to 5 percent, and 4 to 5 percent of roughly 9,000 runs is somewhere around 350 to 400 test cases per day. So imagine the QA team spending time debugging all these 400 test cases day after day. It takes a lot of time, and it's not the effort that management, or even the team, wants to spend its time on, right? So the first thing we did was classify the failures. What are the typical failure types?
We encounter four types on a day-to-day basis. The first is functional issues: some functionality is broken, so the automation test case fails. The second is test script issues — flaky tests, or not using the right identifiers for automation; this is typical of UI automation. The third is test infrastructure issues: it could be a problem with the environment, with Jenkins, with the client, or something or other. The last one is the test script catching up with UI and functionality changes. How many of you face this issue: some functionality changes, some element locators change, but the test cases are not updated to match? That's the last category.

Our typical failures break down into the percentages shown here: about 50 percent are test script issues, 33 percent are functional issues, around 10 percent are test catch-up issues, and 7 to 8 percent are test infrastructure issues. So we classify the issues on this basis, and we wanted to analyze how to automatically detect whatever issue pops up on a day-to-day basis.

That's why we came up with this log-analytics architecture. What we do is parse the logs and search by log level. If you know how logging is done, there are multiple levels of logging: info, debug, warning, and error. We parse the logs, look for the log entries of kind error, and then generate patterns out of them; we'll go deeper into it in the later slides. In the architecture, the first segment is the automation log segment, where the agents generate the logs. Those logs are read by Filebeat, ingested by Logstash, indexed by Elasticsearch, and visualized with Kibana. We do the same thing with the application server logs as well.
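To make the "parse the logs and look for error-level entries" step concrete, here is a minimal Python sketch. The log-line layout and field names here are assumptions for illustration, not our actual log format:

```python
import re

# Assumed log line layout: "<date> <time> <LEVEL> <class>:<line> <message>"
LOG_LINE = re.compile(
    r"(?P<timestamp>\S+ \S+)\s+(?P<level>INFO|DEBUG|WARN|ERROR)\s+"
    r"(?P<classname>\S+):(?P<line>\d+)\s+(?P<message>.*)"
)

def error_entries(lines):
    """Yield the parsed fields of every ERROR-level log entry."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") == "ERROR":
            yield m.groupdict()

sample = [
    "2018-06-24 02:00:01 INFO LoginTest:42 user logged in",
    "2018-06-24 02:00:03 ERROR ReportPage:88 element not found",
]
errors = list(error_entries(sample))
```

In the real pipeline this parsing is done by Logstash's grok filter rather than hand-written code; the sketch only shows the idea of keeping error-level entries and turning each one into structured fields.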
So, the ELK stack with Filebeat, for those who are not aware of it. Elasticsearch is a free, open-source, distributed, RESTful search and analytics engine built on top of Apache Lucene; basically, it's full-text search. Logstash is a server-side data-processing pipeline that ingests data from a multitude of sources simultaneously — you can feed Logstash from a number of clients, not just one. In our case it provides the input stream for Elasticsearch, for storage and search. Kibana, as I said, lets you visualize your Elasticsearch data; it can be used as an exploration tool for logs, time-series analytics, application monitoring, and operational-intelligence use cases, with various capabilities for generating charts and reports, including geospatial charts. Filebeat is a lightweight shipper for forwarding and centralizing log data; it monitors the log files and locations that you specify. Elastic has very good documentation — you can go to their site, download all these components, and install them. At a high level we'll go through the configuration of each piece that comes along with the stack.

For Elasticsearch, once you download it, go to the command prompt and run bin/elasticsearch; it automatically brings up the Elasticsearch server. For configuration we use the elasticsearch.yml file, which has options to name a particular cluster or node. Starting Kibana is similar to Elasticsearch: you execute the corresponding command. A Logstash configuration file consists of three parts: input, filter, and output.
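As a sketch of such a three-part pipeline, a minimal logstash.conf might look like this. The port, pattern, hosts, and time-zone values are illustrative assumptions matched to the fields described in this talk, not our production file:

```conf
# logstash.conf -- illustrative sketch
input {
  beats {
    port => 5044            # default Beats input port
  }
}

filter {
  # timestamp, log level, class name, line number, then the whole message
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} %{JAVACLASS:classname}:%{NUMBER:linenumber} %{GREEDYDATA:logmessage}" }
  }
  # parse the timestamp, treating the source logs as UTC
  date {
    match    => ["timestamp", "yyyy-MM-dd HH:mm:ss"]
    timezone => "UTC"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
```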
I'll explain all three parts. The input section specifies, for example, the port on which logs are shipped from Filebeat to Logstash; we read the log input on port 5044, the default port for the Beats input. Then we have the filter: we read each log entry from the logs and parse it, and for parsing we have a grok expression, following a time-slicing approach. A grok pattern parses unstructured log data into something structured and queryable, and the log timestamps are converted into a specified time zone. We use this feature because my server generates logs in UTC but I want to convert them into IST or PST; Logstash can do that conversion. Finally, the output: the Logstash output is shipped to Elasticsearch, which is where Kibana reads its data from.

This is the grok expression I was talking about. If you look at it, there are roughly five parts. The first part is the timestamp — you can see the section below it showing the timestamp — then a space, and then we map a keyword to the next section: here the keyword is log level, and it is mapped to INFO. The next word is the class name.
That is, which class is writing this log entry into the logs. The next part is the line number in that particular class, and then we read the entire remaining message into one keyword using GREEDYDATA.

For the Kibana config, we run the Kibana command, and the kibana.yml file holds the configuration: we can specify the port (the default is 5601), the host name if it is not localhost, and similarly the server name and the Elasticsearch URL — basically, where Kibana gets its data from.

Now, configuring Filebeat. The ELK components can be installed in any folder on any drive, but Filebeat should be installed under C:\Program Files, because we host Filebeat as a Windows service, so that any time a new log entry is written, Filebeat reads it, parses it, and ships it on to Logstash. To install it as a service you execute the install-service-filebeat.ps1 script, and to start it you run Start-Service filebeat; once it is hosted as a service you can also turn it on and off from Task Manager. This is the example Filebeat configuration: you can see the path where the logs reside — you specify the log file path it should read from — and the multiline pattern. If you look at our logs, whether server logs or Selenium logs, whenever an error is encountered we get stack-trace exceptions spanning multiple lines.
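For reference, the kibana.yml settings mentioned a moment ago might look like this; the values are illustrative (5601 is the default Kibana port), and the exact key names vary a little across Kibana versions:

```yaml
# kibana.yml -- illustrative values
server.port: 5601                           # default Kibana port
server.host: "0.0.0.0"                      # set when not running on localhost
server.name: "qa-dashboard"                 # hypothetical server name
elasticsearch.url: "http://localhost:9200"  # where Kibana reads its data
```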
The multiline pattern describes, or sets, a rule: if a log line starts with a date or timestamp, consider that line the starting point of a log entry, and until another timestamp is encountered, all the following lines are considered part of that single entry. That way you can read whole stack traces out of the logs. We set the logging level to info because we don't only want to read and classify the errors; we also want to generate different kinds of dashboards — say, how many instances of a class, or of a particular method, were executed in the day's test execution. We can generate such meaningful dashboards and reports using Filebeat data. Then there is the Filebeat output path — C:\ProgramData\filebeat\logs — a centralized location for what it stores while sending data onward. The last part is output.logstash, which is where it pushes the data: as I said, Filebeat can send data to Logstash or to Elasticsearch, and in our case we send it to Logstash, mentioning the IP address and port.

These are the charts we have generated from Kibana: you can see the number of errors encountered on a time-sliced basis. We parse the logs and generate the patterns, as we saw in the grok expression.
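The Filebeat settings described a moment ago — the log path, the multiline rule, and the Logstash output — might look roughly like this. The paths, address, and key names are illustrative, and the exact keys vary a little across Filebeat versions:

```yaml
# filebeat.yml -- illustrative sketch, not our exact setup
filebeat.inputs:
  - type: log
    paths:
      - C:\automation\logs\*.log
    # Any line NOT starting with a timestamp is treated as a continuation
    # of the previous entry, so multi-line stack traces stay together
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    multiline.negate: true
    multiline.match: after

output.logstash:
  hosts: ["192.168.1.10:5044"]   # Logstash host:port (example address)
```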
We map each keyword in the grok expression to a part of the log entry, and we generate aggregations on a time-sliced basis to get insights: how many hits there are for a particular class in a given window, or how many errors are encountered for a particular tenant or functionality. How does this help? Pattern matching using the timestamp, logged-in username, tenant, and module or functionality gives you insights into which parts of the functionality the test cases concentrate on, and where the errors are encountered in the application — which part of the application is more buggy, or functionally unstable. We can build analytics for error-pattern analysis and map those patterns to test execution failures.

So here is what we are trying to do. As I said, there are four classifications of errors, and the first is functional issues. The way we detect them: an error is encountered in a server or application log, and within a similar time frame — say, less than two minutes — we encounter an error entry in the Selenium logs.
We match those two errors by timestamp, logged-in user, tenant, and module or class name; if all four of these keys match, there is a 92 to 95 percent chance that it is a functional issue. We monitor and validate the pattern matches for accuracy: even at 92 to 95 percent accuracy, there are a few instances where we have encountered false positives. For example, there might be an unwanted log entry in the server log while a test case failed due to some other, non-functional issue. When such cases are encountered, we analyze them and mark them — we blacklist them, basically. That's how we eliminate the false positives.

These are the custom charts based on hits. One hit means there is a log entry in the application server log and in the Selenium log at the same time; we consider that a hit, and the probability of it being a functional bug is, as I said, 92 to 95 percent.

Now, the challenges we faced and the mitigation strategies. Poor logging practices: as I said, developers or QA people don't always log properly — for example, they swallow the errors and never log the exceptions. Logging errors as warnings or info: even on an exception, someone catches it and writes it with logger.info rather than logger.error. And unwanted log entries make it difficult to segregate.
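The matching heuristic described above — a server-log error and a Selenium-log error within a short window, with the same user, tenant, and module — can be sketched roughly like this in Python. The field names and the two-minute window come from the talk; the function, dictionary shape, and sample values are illustrative assumptions:

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=2)  # errors this close together are candidates

def is_functional_issue(server_err, selenium_err):
    """Heuristic from the talk: the two error entries fall within the
    matching window AND agree on user, tenant, and module."""
    close_in_time = abs(server_err["ts"] - selenium_err["ts"]) <= WINDOW
    same_keys = all(
        server_err[k] == selenium_err[k] for k in ("user", "tenant", "module")
    )
    return close_in_time and same_keys

server_err = {"ts": datetime(2018, 6, 24, 2, 0, 0),
              "user": "qa1", "tenant": "acme", "module": "reports"}
selenium_err = {"ts": datetime(2018, 6, 24, 2, 1, 30),
                "user": "qa1", "tenant": "acme", "module": "reports"}
hit = is_functional_issue(server_err, selenium_err)
```

In the real system this join is done over indexed Elasticsearch documents rather than in-memory dictionaries, and, as noted above, matches are still validated to catch the remaining false positives.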
There are unwanted log entries in both the server log and the Selenium log which are not required. The mitigation strategies we followed were blacklisting unwanted log entries and identifying the false-positive errors on both the server side and the test-automation side. This drastically improved our accuracy in finding functional issues — and not only functional issues but the other kinds too, like flaky-test issues and test infrastructure issues.

What next? To improve the accuracy of finding functional issues and other errors, we want to use X-Pack, another Elastic plug-in with machine-learning capabilities, which can improve the accuracy based on the false positives, and we want to build an engine that identifies functional and other issues automatically. As I said, this concept is not limited to the Elastic Stack or to Selenium: it can be built on any automation tool and any log-mining tool. There are other tools on the market that could replace Elastic here — Sumo Logic is one, Splunk is another — which can also parse logs and generate dashboards and reports.

I just want to show you the Logstash and Kibana server. When I start Filebeat, it starts to ingest data into Logstash. Elasticsearch, Logstash, and Kibana are all installed on this one machine, so I'm parsing a log on my local machine into my own machine; you could have a distributed setup across multiple machines to build the same thing. Let's see how this data looks in Kibana — first I'm creating an index. As I said, these are our custom dashboards. If you look here at the application log, there is an entry at 2 a.m. on June 24th, and we have a Selenium entry two minutes after the application server log entry. We can mark that as a functional issue, because the timestamps, tenant, username, and module all match in this case. And this is another kind of report we have developed: the failures of a particular test case over the last one week — how many times the same test case failed, and what the different reasons were.

I think that's it; we can move on to Q&A.

[Host] I believe lunch is starting shortly outside, but if you have any questions, he's more than welcome to answer some of them. Any questions?

[Audience] So far, a good talk. My question: we haven't seen any group-by-test view of the failures. You said you're executing 3,000 tests in parallel; what happens when, as you said, 300 tests are failing per day and I need to debug those 300? I don't want to open each and every job in Jenkins to see what the failure could be — maybe login is broken, and I don't want to debug 300 tests that failed just because of login. You didn't show anything like "all 300 tests failed because of login." Can you explain where I can see that?

[Manoj] We can see that. You can build your own custom report in Kibana, where you can drag and drop or add filters. For example, if you want to filter on the message, you can add a filter here, and you can see there is only one hit with that message. So you can classify the log entries by particular keywords, make a filter out of them, and generate a dashboard. It could be login or any other issue: suppose a single functional issue results in 500 test-case failures — I can still search for that error, filter on it, and generate a report on top of it.

[Audience] All right, cool. Thank you, Manoj.

[Host] Let's give him a round of applause. And I think lunch is outside, so thank you — hope you're enjoying the Selenium Conference.