Hello everyone, my name is Manoj and I work for Host Analytics. A very good afternoon, and welcome to day one of the Selenium Conference. Today's topic is smart test failure analysis with Elasticsearch. I think a lot of you are spending a lot of time analyzing failures on a day-to-day basis, and that is the last thing you want to spend time on, right?

Before starting the talk, I want everybody to stand up once. Thank you. We have scribble pads and notepads; just drop them on your chairs. And I want you to clap twice if you agree with what I say, okay? You want to spend your time wisely on writing new tests rather than on failures. You agree? Two claps. Okay. Now: you are spending less than one hour per day on failure analysis. Really, less than one hour? You're spending more than two hours per day? Cool, thanks; then this session is for you. We have only 20 minutes, our session is time-boxed, so let's make the best use of these 20 minutes, go back home, and fix some problems.

So, Host Analytics: who we are and what we do. We partner with finance to help them and their organizations achieve peak performance. We provide a single cloud-based EPM platform that has everything you need: planning, consolidation, modeling, reporting, analytics, and so on. If you look at the Host Analytics evolution, we were the first cloud EPM solution, the first cloud consolidation or close-management solution provider, and so on. Many of you know Gartner's Magic Quadrant: we are the only company apart from Oracle to be listed as a leader in both the financial corporate performance management and the strategic corporate performance management quadrants.

This is the test automation evolution at Host Analytics. We started on test automation somewhere around 2009. Initially we were using SilkTest, and then we moved on to a tool that was then used for unit testing.
We enhanced that to utilize it for functional testing. Then in 2014 we started with Selenium, and now we have a full-grown framework to automate and execute the test cases. This is the automation evolution from 2016 to 2018 in terms of the number of test cases we execute on a daily basis: today we execute close to 12,000 test cases per day, and that takes a hell of a lot of hardware, time, and energy.

So we have an ecosystem for this execution. We use TeamCity for continuous integration and continuous delivery, and Jenkins for distributed execution management, and you can see all the automation client agents which are used for executing the test cases. At the end of every test suite execution, we write the test results to a database, and we have built a custom dashboard which displays the reports and results on a daily basis. This dashboard is used as a day-to-day health check: what functionality is broken, and how much of the functionality is in fully working condition?

As you see, this is the sample execution results dashboard. We call it the daily execution trends, and we display it for the last one week. The typical failure rate would be around four to five percent, and if you consider those stats, four to five percent of 9,000 would be somewhere around 350 to 400 test cases per day. So imagine a QA team spending time debugging all of these 400 test cases on a day-to-day basis. It takes a lot of time, and it's not the effort that management, or even the team, wants to spend their time on, right?

So what we did first of all was classify the failures: what are the typical failure types we encounter on a day-to-day basis? The first one is functional issues.
Basically, if any functionality is broken, our automation test case will fail. The second one is test script issues: flaky tests, or not using the right identifiers to automate; this is a case of UI automation. The third one is test infrastructure issues: it could be an issue with the environment, a Jenkins issue, an issue with the client, or something or the other. The last one is the test script catching up with UI and functionality changes. How many of you face this issue: some functionality changes, some element locators change, but the test cases are not updated with the functional changes? That is the last category of issue we face.

Typical failures break down into the percentages shown here: about 50% are test script issues, 33% are functional issues, around 10% are test catch-up issues, and 7 to 8% are test infrastructure issues. So we classified the issues on this basis, and we wanted to automatically detect which kind of issue pops up on a day-to-day basis.

So we came up with this log analytics architecture. What we do is parse the logs and search by log level. If you know how logging is done, there are basically multiple levels of logging: info, debug, warning, and error. We parse the logs, look for log entries of kind error, and then generate patterns out of them; we'll go deeper into this on the further slides. In the architecture, the first segment is the automation log segment, where the agents generate the logs. Those logs are read by Filebeat, ingested by Logstash, and then indexed into Elasticsearch, and we do the visualization with Kibana. Similarly, we do the same thing with the application server logs as well.
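The first processing step described above, scanning raw log lines and keeping only the error-level entries for pattern generation, can be sketched minimally; the log lines here are invented examples, not from the talk:

```python
# Minimal sketch of the first step described above: scan raw log lines and
# keep only the error-level entries, which are then used to generate patterns.
def error_entries(lines):
    """Return only the log lines whose level field is ERROR."""
    return [line for line in lines if " ERROR " in line]

logs = [
    "2018-06-24 02:00:01 INFO  LoginTest: navigating to login page",
    "2018-06-24 02:00:05 ERROR LoginTest: element #loginBtn not found",
]
errors = error_entries(logs)   # only the ERROR entry remains
```

In practice this level-based filtering is done by the pipeline itself rather than by hand-written code, but the idea is the same: everything below error level is noise for failure classification.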
So, the ELK stack with Filebeat, for those who are not aware of it. Elasticsearch is a free, open-source, distributed, RESTful search and analytics engine built on top of Apache Lucene; basically it does full-text search. Logstash is a server-side data-processing pipeline that ingests data from a multitude of sources simultaneously: you can feed Logstash from a number of clients, not only one. In our case we provide it as an input stream to Elasticsearch for storage and search. Kibana lets you visualize your Elasticsearch data; it can be used as an exploration tool for log and time-series analytics, application monitoring, and operational-intelligence use cases, and it has various capabilities like generating charts and reports, including geospatial charts. Filebeat is a lightweight shipper for forwarding and centralizing log data; it monitors the log files and locations that you specify. Elastic has very good documentation; you can go through the link to download all of these components. At a high level, we'll go through the configuration of each plugin that comes along with the ELK stack.

For Elasticsearch, run the following command once you download it: go to the command prompt and run bin/elasticsearch, and it will bring up the Elasticsearch server. For configuration we use the elasticsearch.yml file, where we have options to name a particular cluster or node. Configuring Kibana is similar to Elasticsearch: we just execute the corresponding command. The Logstash configuration file consists of three parts: input, filter, and output. I'll explain all three parts. The input defines, among other things, on which port the logs are shipped from Filebeat to Logstash.
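As a minimal sketch, a three-part Logstash configuration along these lines might look like the following; the port, grok fields, time-zone value, and index name are illustrative assumptions, not the speaker's exact setup:

```conf
# logstash.conf — an illustrative sketch, not the exact file from the talk
input {
  beats {
    port => 5044                      # default port for the Beats input
  }
}

filter {
  grok {
    # split "<timestamp> <level> <class>:<line> <message>" into named fields
    match => {
      "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} %{JAVACLASS:classname}:%{NUMBER:linenumber} %{GREEDYDATA:logmessage}"
    }
  }
  date {
    # parse the timestamp and, if needed, re-base it to another time zone
    match    => ["timestamp", "yyyy-MM-dd HH:mm:ss,SSS"]
    timezone => "UTC"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "automation-logs-%{+YYYY.MM.dd}"
  }
}
```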
Basically, we read the log input on port 5044, which is the default port for the Beats input. Then we have the filter: we read each and every log entry and parse it. For parsing we have a grok expression, and we follow a time-slicing approach; the grok pattern parses unstructured log data into something structured and queryable. Log timestamps are ingested into the time zone specified: we have a case where my server generates logs in UTC, but I want to convert that into IST or PST, and Logstash has this conversion feature. Finally, the output: the Logstash output can be shipped to Elasticsearch, or I can point it directly towards Kibana.

This is the grok expression I was talking about. If you see, there are roughly five parts to it. The first part is the timestamp; the section below shows the timestamp. Then a space, and then we map a keyword to the next section: here the keyword is log level, and it is mapped to INFO. The next word is the class name: which class wrote this entry into the logs. The next one is the line number in that particular class. Then we read the entire remaining message into one keyword, GREEDYDATA. Next, the Kibana config: in Kibana we run this command, and the kibana.yml file holds the configuration.
We can specify the port (the default port is 5601); we can specify the host name if it is not localhost; and similarly we define the server name and the Elasticsearch URL, which is where Kibana gets its data from.

Configuring Filebeat: the ELK stack can be configured in any folder on any drive, but on our Windows machines Filebeat is set up under C:\Program Files. We host Filebeat as a service, so that any time new log entries are written to a log, it reads and parses them and ships them onward to Logstash and Elasticsearch. To install it as a service, you execute the install-service-filebeat.ps1 script, and to start it, you execute the start-service command for Filebeat. Once you execute these commands it is hosted as a service, and you can also turn it on and off from Task Manager.

This is the example Filebeat configuration. You can see the path where the logs reside: you mention the log file path from which it should read. Then there is the multiline pattern: in our server logs and Selenium logs, whenever an error is encountered, we get stack-trace exceptions spanning multiple lines.
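A minimal filebeat.yml along these lines might look like the following; the log path, timestamp pattern, and Logstash address are illustrative assumptions:

```yaml
# filebeat.yml — an illustrative sketch; paths and addresses are assumptions
filebeat.prospectors:                  # "filebeat.inputs" in newer versions
  - type: log
    paths:
      - C:\automation\logs\*.log       # where the Selenium/automation logs reside
    # Any line NOT starting with a timestamp is treated as a continuation of
    # the previous line, so a multi-line stack trace becomes a single entry.
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    multiline.negate: true
    multiline.match: after

logging.level: info                    # ship info-level entries too, not only errors

output.logstash:
  hosts: ["192.168.1.10:5044"]         # Logstash host and Beats port (assumed)
```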
The multiline pattern sets a rule saying: if my log line starts with a date or timestamp, consider that line the starting point of one log entry, and when another timestamp is encountered, a new entry begins. All the in-between lines are folded into a single entry, so that you can read the whole stack trace from the logs. The logging level we keep at info, because we want to not only read and classify the errors, but also generate other kinds of dashboards: for example, how many times a class was executed, or how many times a particular method was executed in a day's test run. We can generate such meaningful dashboards and reports with the help of Filebeat.

Then there is the output path, C:\ProgramData\filebeat\logs, a centralized location where Filebeat keeps its own logs. The last part is output.logstash, which is where it pushes the data. As I said, Filebeat can send data to Logstash or to Elasticsearch; in our case we send it to Logstash, and we mention the IP address and port.

These are the charts we have generated from Kibana. You can see the number of errors encountered on a time-sliced basis. We parse the logs and generate the patterns, and as we saw in the grok expression, we map each keyword to a part of the log entry. We then generate aggregations on a time-sliced basis to get insights, such as how many hits there are for a particular class in a given time window?
And how many errors are encountered for a particular tenant or functionality? All of that can be mapped. So how does this help? Pattern matching using the timestamp, logged-in username, tenant, and module or functionality gives you insights such as which part of the functionality the test cases concentrate on, or where in our application the errors are encountered: which part of the application is more buggy, or functionally unstable. We can build analytics for error-pattern analysis and map these patterns to test execution failures.

As I said, there are four different classifications of errors, the first being functional issues. The way we detect them is this: whenever an error is encountered in a server or application log, and within a similar time frame, say less than two minutes, we encounter an error entry in the Selenium logs, we match those two errors by timestamp, logged-in user, tenant, and module or class name. If all four keywords match, there is a 90 to 95 percent chance that it is a functional issue.

We monitor and validate the pattern matches for accuracy. Even at 90 to 95 percent accuracy, there are a few instances where we have encountered false positives: for example, there could be an unwanted log entry in the server log while the test case failed due to some other, non-functional issue. When such cases are encountered, we analyze them and mark them, or blacklist them, basically; that's how we eliminate the false positives.

These are the custom charts based on hits. Here, one hit means there is a log entry in the application server log as well as in the Selenium log at the same time. We consider that a hit, and the probability of a hit being a functional bug is, as I said, 90 to 95 percent.
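The matching rule described above can be sketched in a few lines; the field names and data shapes here are assumptions for illustration, not the exact schema used in the talk:

```python
from datetime import datetime, timedelta

# Sketch of the rule described above: a Selenium error is flagged as a
# likely functional issue when a server-side error with the same logged-in
# user, tenant, and module occurred within a two-minute window of it.
WINDOW = timedelta(minutes=2)

def is_functional_issue(selenium_error, server_errors):
    """True if some server error matches on user, tenant, and module and
    falls within the two-minute window around the Selenium error."""
    return any(
        srv["user"] == selenium_error["user"]
        and srv["tenant"] == selenium_error["tenant"]
        and srv["module"] == selenium_error["module"]
        and abs(srv["timestamp"] - selenium_error["timestamp"]) <= WINDOW
        for srv in server_errors
    )

sel = {"user": "qa1", "tenant": "acme", "module": "Login",
       "timestamp": datetime(2018, 6, 24, 2, 2, 0)}
srv_logs = [{"user": "qa1", "tenant": "acme", "module": "Login",
             "timestamp": datetime(2018, 6, 24, 2, 0, 30)}]
match = is_functional_issue(sel, srv_logs)   # 90 s apart, all keys match
```

In the actual pipeline this comparison is done over fields extracted by the grok expression and indexed in Elasticsearch, rather than over in-memory dictionaries.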
As for the challenges we have faced and the mitigation strategies: the first is poor logging practices. As I said, developers or QA people do not always log properly; for example, they swallow the errors and don't log the exceptions. Another is logging errors as warnings or info: even though I get an exception, if I catch it I might record it with logger.info rather than logger.error. And unwanted log entries make segregation difficult: there are entries in the server log as well as the Selenium log which are simply not required.

The mitigation strategies we followed were blacklisting the unwanted log entries and identifying the false positives on both the server side and the test automation side. This drastically improved our accuracy in finding functional issues, and not only functional issues but also the other kinds, like test script issues and test infrastructure issues.

So, what next? To improve the accuracy of finding functional issues and other errors, we want to use X-Pack, another Elastic plugin, which has machine-learning capabilities and can improve accuracy based on the false positives. And we want to build an engine that identifies functional and other issues automatically.

As I said, this concept is not limited to the Elastic Stack or Selenium. It can be built on any automation tool and any log-mining tool; there are many other tools in the market which can replace Elastic here. Sumo Logic is one, and Splunk is another, and they can likewise parse logs and generate dashboards and reports.

We have a minute left, so I just want to show a quick demo. This is the Logstash and Kibana server; when I start Filebeat, it starts to ingest data into Logstash. Elasticsearch, Logstash, and Kibana are all installed on this one machine, so I'm parsing a log that lives on my local machine into the same machine; you can have a distributed system replicated across multiple
machines to build the same thing. Now we'll see how this data appears in Kibana, once I create an index. As I said, these are the custom dashboards we have. If you look here at the application log, there is an entry in the application log on 24th June, early morning at 2 o'clock. Similarly, we have a Selenium log entry two minutes after the application server log entry. We can mark that as a functional issue: the timestamp, tenant, username, and module all match in this case.

This is another kind of report we have developed. We can see the failures of a particular test case over the last one week: how many times the same test case has failed, and what the different reasons were.

I think that's it; we can move on to Q&A. Basically, you can build your own custom report in Kibana, where you can drag and drop filters or add filters. For example, if you want to filter on the message, you can do that: you add a filter here, and you can see there is only one hit with that message. So you can classify or retrieve the log entries by particular keywords, make a filter out of them, and generate a dashboard. It could be a login issue or any other issue: suppose a single functional issue results in many test case failures. I can still search for that error, filter on it, and generate a report on top of it.
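That last kind of search, finding every failure caused by one error message over the past week, can be sketched as an Elasticsearch query body POSTed to the index's _search endpoint; the index, field names, and message are invented examples, and the `interval` parameter of `date_histogram` follows the 6.x API of the talk's era:

```json
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "logmessage": "element #loginBtn not found" } },
        { "term": { "loglevel": "ERROR" } }
      ],
      "filter": [
        { "range": { "@timestamp": { "gte": "now-7d/d" } } }
      ]
    }
  },
  "aggs": {
    "failures_per_day": {
      "date_histogram": { "field": "@timestamp", "interval": "day" }
    }
  }
}
```

The same filter can be built interactively in Kibana by adding a filter on the message field, which is what the demo shows.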