Hi, I'm Ravi. Welcome to SeleniumConf 2022. Let me introduce Anand Bagmar, who is talking about tests and retries. We keep rerunning tests, and the question is: should we retry failed tests? We often set up our runs to rerun the tests that failed, but is that a good practice? If it is, when? And if it is not, when? That is what we are going to learn from Anand in this short talk. About Anand: he is joining us from Pune, India, he is a software quality evangelist and the founder of Essence of Testing, which he runs, and he is also a Solution Architect with Applitools. Over to you, Anand. Welcome.

Thank you. Thank you so much for that kind introduction. I hope you have a great time, and I hope you get some great learnings about what to do, what not to do, and what you would like to learn and experiment with based on this new information. Today we will be talking about why you need to stop having retries in your tests and stop rerunning your tests. That is the topic we are going to discuss today. I spoke about this briefly as one of the anti-patterns in the keynote yesterday, but today we get a chance to talk in more depth about what this is, what the typical practices are that teams follow, why some of these practices might be anti-patterns, and what you should be doing instead. So let's dive in.

As mentioned, I'm Anand Bagmar. I've been in the quality space for more than 20 years now. I have worked with product and services organizations across the world in this time, and I was fortunate to get a lot of different experiences. I'm a Selenium contributor, along with that small contribution that I do, and I have a bunch of open-source projects that I've built related to testing and test automation. That is my passion, and that is why I'm here: to share my experiences, and I hope to have interactions with you so I can learn from your experiences as well. You can reach out to me on Twitter and on LinkedIn. I'll be very happy to continue the conversation there, share experiences, share thoughts, and take this discussion forward.

So let's get into our core topic: flaky tests. I'm going to need some interaction from you at this point. The question is: have you ever heard of UI tests being flaky? Can I have a raise of hands if you have heard this? Okay, great, a lot of hands raised, which is good: you understood the question and you understand what it means. I hope the people who have not raised their hands are not saying they have never heard about flaky UI tests. And if they have truly not encountered flaky tests, then we definitely need to talk, because I want to understand what you are doing right to avoid flakiness in your UI tests. So reach out to me; I would love to understand what you are doing right to make sure there are no flaky tests in your automation framework.

So let's go with the assumption right now that, yes, UI tests are flaky; UI automation is flaky. Why is that the case? That is the question we first need to understand. In my opinion, and this list is not in any particular order, a test could be flaky because your test environment itself is flaky: there are issues in the environment, so when you run the test it encounters a problem there and the test fails. There could be network issues that cause this challenge as well; I hope we do not have any network issues in this session.
Network issues are a source of flakiness here as well. There could be synchronization or timing issues: sometimes it just takes longer for things to happen because the load on the network is higher, the load on your back-end servers is higher, or for various other reasons in your application. There could be performance issues in your product in certain contexts, which means the same operation takes longer in some cases and less time in others, and that can lead to flakiness in your tests. There could be intermittent issues, again based on the context of what is happening in your application, that lead to race conditions, which might cause flakiness in your test execution as well.

It is possible, especially with the microservices architecture that everyone is adopting these days, that some components are getting deployed while your tests run, and that might cause a problem too. So a deployment can lead to issues during test execution, because remember, we are talking about your end-to-end UI tests or functional tests, so anything that happens in the back end can affect the front-end usage of the product. If your application uses third-party dependencies, say a third-party payment gateway, and that gateway goes down, which is not in your control, that can have an impact on your test execution as well.

Data dependencies: I can keep talking about data for a long time, but dependencies on data, and the data changing, can be a big problem. It could be related to the way you have implemented date- or time-based logic: end-of-the-month issues, end-of-the-year issues, or a dependency on certain time periods in your test execution. If some dynamic data comes up, or your expected data changes in the product, your test might fail again. This is not something you can control completely, because it is a shared environment, so that can also be a problem.

It could also be poor implementation of your tests: you are not implementing them in a robust fashion, which means other tests, when executing, can have an impact on this test as well. Of course, locator changes can cause your test to fail, but in that case it is not really flaky: if a locator has changed and you do not update the locator in your test, it is going to fail consistently. Unless, of course, you use a fallback locator mechanism, which may or may not pass, and the fallback locators may interact with the same element or a different one, we don't know; that can cause flakiness as well.

There can be challenges when you run against different types of browsers or devices. Some browsers may have plugins loaded into them, some browsers may be inherently faster or slower than others, and likewise for devices, and that can become a problem. The infrastructure where the browsers or devices run also makes a difference: with one browser instance everything is fine, but the minute you have five browser instances, they start eating up your machine resources and things get slower, and that can create a problem. Running tests in parallel, if your framework is not designed correctly and you are sharing state or test data between test executions, can be a problem as well.
Unpredictable behavior of your test execution when tests are not run in a particular sequence essentially means your tests are not designed correctly; again, you get unpredictable behavior in that case. There can also be an issue when a larger team, or more than one person, is running the tests: if the library dependencies you rely on, say the Selenium version, the Java version, or any other dependencies you are using, are not managed consistently for everyone on the team, that can become a problem and lead to unpredictable behavior. So these are some of the reasons your tests can be flaky; I'm sure you can think of many more based on your own past experiences. I see a comment that there was no audio, but I hope everything is fine now, thank you.

So these are the potential reasons why your tests can be flaky. But how do you avoid this flakiness? How do you handle it in your test execution? That is the main aspect we need to focus on. Unfortunately, there is a very easy way out for handling flakiness: ideas that seem good but are not really good. These are anti-patterns. This is something I was referring to yesterday in the keynote. These are very quick and easy things that we think will make our tests more robust, but they actually do not. Not only do they fail to make your tests robust, they also hide a lot of core behaviors and core problems in your product, in your environment, or wherever the issue might really be.

So what are these anti-patterns? Let's look at them a little more deeply. The first is rerunning failing tests automatically. How many of us rerun failed tests automatically? Could you raise your hand? Quite a few folks; thank you for sharing that. This is a feature that can very easily be implemented with TestNG: you implement a retry listener (a minimal sketch of this appears below). And the saddest part about this retry-listener approach is that you can specify how many times you want to rerun the failing test until it passes. I find that a very sad use of a feature. You are rerunning with the hope that the test passes, and if it passes, then you do not need to debug and find the root cause of why it failed in the first place, and you do not need to follow up with different team members, once you find the root cause, to get the issue fixed. That is one of the main reasons people do a rerun, and that is not good.

What a failing test indicates in this case could be network issues, environment instability, a defect in the product, maybe related to race conditions, or a poor test implementation; in many cases it comes down to waits and delays. But because of these reasons, and because you don't want to debug and find the root cause, you just rerun it and hope it passes. If it passes, it becomes someone else's responsibility to fix the problem when it shows up. If someone reports a problem later on, you have a test that says: no, this test passed, this is not my problem, it is your problem or someone else's problem. You don't want to take ownership, and that is not the right approach to take.

The other anti-pattern is not about rerunning a failed test, but retrying automatically during the course of a test's execution, hoping that a particular action or set of actions passes.
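To make the first anti-pattern concrete: in TestNG the retry listener mentioned above is typically an IRetryAnalyzer. Here is a minimal sketch of what such a listener looks like, shown only to illustrate what is being advised against; the checkoutFlow test name is hypothetical.

```java
import org.testng.IRetryAnalyzer;
import org.testng.ITestResult;

// TestNG calls retry() after every failure; returning true silently
// reruns the same test, hiding whatever caused the first failure.
public class RetryAnalyzer implements IRetryAnalyzer {
    private static final int MAX_RETRIES = 3; // "rerun until it passes"
    private int attempts = 0;

    @Override
    public boolean retry(ITestResult result) {
        if (attempts < MAX_RETRIES) {
            attempts++;
            return true;  // rerun the failed test without asking why it failed
        }
        return false;     // give up only after the retries are exhausted
    }
}

// Attached per test method; nothing here investigates the root cause.
// @Test(retryAnalyzer = RetryAnalyzer.class)
// public void checkoutFlow() { ... }
```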
For example, with this second anti-pattern the hope is: if the page did not load in time, let me just try to reload the page and see if it works; if it works, I can proceed. Or, if the elements I need to interact with are not visible or clickable, let me wait longer, or retry refreshing that snippet or that page, or redo a certain action, to see if the element becomes visible. If you do not do this, what happens? The test fails, and if the test fails, you end up in that rerun cycle again and you will need to debug the cause of the failure. So, the thinking goes, might as well do whatever it takes to try and make the test pass, because the responsibility of an automation engineer, an SDET, or whatever title we have, is just to automate the test, get it done, and move on to the next implementation: my responsibility is to write code to implement the test, nothing else. I think that is the wrong attitude, if that is the attitude we are taking.

What kinds of problems might cause these actions to fail? They could be indicating a performance issue on the client side, or on the server side for that matter. And that is why you think: because of these performance issues, if I just retry the action, it might work. This is not going to help us; these are not good ideas. Think about it: if you have to log in to a page, you are not going to wait 30 or 60 seconds for that login to complete. If you as a user cannot wait 30 or 60 seconds, why are we putting in a wait, or retrying for up to 30 or 60 seconds, in our implementation for the login to succeed? The real user is not going to do that. Because of this behavior of increasing the wait time, your test might end up passing, but your user is actually being affected by that functionality. So how much to wait, when to retry, and when to say "no, this is a problem, we need to fix it": you really need to take a step back and think about this.

We don't want to get into the band-aid approach and say: this element was not found, but I know the locator is correct, so let me just increase the wait time and it will pass the next time. That is a band-aid. Why did the element not show up within the desired time in the first place? In production — this is a Google guideline — a standard page load should happen in about three seconds. Of course this depends on the type of product and domain, but for a typical B2C application, a page load should happen in under three seconds, at least for the first viewport. What that does is help keep the user engaged with your product: there is at least something showing up in the application that they can start interacting with before their attention gets diverted, before they do a refresh or move on to another application to do something similar. You don't want the user's focus to move away from your application; you want to keep them engaged. So don't take the band-aid approach. Think about the right maximum delay that is acceptable for certain actions to happen, and if it goes beyond that time, fail the test right there.

Now, in production the typical page-load SLA might be three seconds, but I understand that your test environments are not production scale, are not production quality, and cannot be set up the same way.
But if in production it takes three seconds, can I expect that page to load in about six or seven seconds in my test environment, given its limited data, poorer network infrastructure, or whatever else? If seven seconds is an acceptable time in your test environment, then you should automate based on an SLA of seven seconds (a sketch of encoding this as a wait follows at the end of this passage). If it takes more than seven seconds, then you have to question it: is this a defect or not? Is this a performance issue or not? You don't need focused performance testing to find these obvious kinds of issues. Think about it. Don't just come up with the solution of increasing the wait time because the test, or rather the page, took longer to load. Don't just rerun the test because it will pass the next time; there was an issue that happened, so what have you done to fix that issue and make sure it does not come up again? That is the question you need to ask yourself. Don't take the easy way out; get to the root cause of why the test failed in the first place.

Understand that if you don't have enough information, add more instrumentation to the test, so that it is captured automatically on every subsequent run. Once you add that instrumentation capability, then you rerun the test, and when it fails again you will hopefully have more information about why the problem happened the first time. So if you do need to rerun, don't just rerun without changing any parameters and hope the test passes. Do something; take some additional actions, some additional steps, to figure out how to avoid that root cause in the first place. If you don't have enough information, at least make some instrumentation changes in your test to capture additional information, additional data, so that the next time the problem happens you have more information about why the test failed. And when you have more information, you can get to the root cause and try to fix the problem at the source, not just patch it up and hope it does not happen again.

So let's talk about some of the practices you can apply to remove flaky tests. First, of course, is to reduce the number of UI tests. I know this is a Selenium conference and we are talking about a UI test automation tool, but you have to understand the test pyramid: where you should have what type of tests, and how many tests you should have at the different layers. It is very clear that the fewer tests you have at the top and the more tests you have in the lower layers of the pyramid, the better and more deterministic the feedback from all your tests will be. So reducing the number of UI tests is the first thing. The other set of practices for getting better at your test executions is to build quality in, which again comes down to those other types of tests, which give you better and more deterministic feedback.

Make sure quality is a team responsibility. When you get to the root cause of a failing test, it might mean your network is unstable, so you need to work with the network team to make sure your environment and your network are stable, so that when you run the test it actually gives you deterministic feedback. Or the infrastructure may be poor: the server components, for example, are running out of memory because there are fewer resources on that infrastructure.
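To make the environment-specific SLA idea concrete, here is a minimal sketch assuming Selenium 4's WebDriverWait; the page.sla.seconds property name is a hypothetical way of configuring the agreed SLA per environment.

```java
import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class EnvironmentSlaWaits {
    // Hypothetical per-environment SLA: e.g. 3 seconds against a production-like
    // environment, 7 seconds against a slower shared test environment.
    private static final Duration PAGE_SLA =
            Duration.ofSeconds(Long.parseLong(System.getProperty("page.sla.seconds", "7")));

    // Wait only as long as the agreed SLA for this environment; if the element
    // is not visible within it, the wait times out, the test fails, and a
    // performance/defect question gets raised instead of a bigger timeout.
    public static void waitWithinSla(WebDriver driver, By locator) {
        new WebDriverWait(driver, PAGE_SLA)
                .pollingEvery(Duration.ofSeconds(1))
                .until(ExpectedConditions.visibilityOfElementLocated(locator));
    }
}
```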
When the infrastructure is the problem, work with the environment team who helps set it up and tell them: these are the problems, our tests are not deterministic, and that does not add any value. How can they help make the environment stable? How can they help increase the resources on that infrastructure so your tests execute better? All the roles involved in building the application, releasing it to production, and supporting it in production are involved in making sure your product is of good quality, and you have to learn to collaborate with all of them when you get to the root cause, or take their help to get to the root cause, and fix it at the source.

Choose relevant practices that help. I'm not saying which practices are good or bad; there is no such thing as best practices. There are a ton of practices for the various aspects of your SDLC. Figure out which practices, if you use them in the context of your team, will give you good value in terms of understanding quality and releasing to production faster, and choose those.

Focus on code quality, on the product side as well as the test side. You cannot just say: this is test code, I can do anything I want, I don't care, I'll just add a sleep of whatever duration makes my test pass and move on. No. You have to think about code quality and how it is going to help, and you have to think about the tech debt you have, monitor it constantly, and address it at periodic intervals.

Focus on quick feedback: CI test executions, running your tests in parallel, an intelligent test data strategy — these give you quick feedback. The other aspect of quick feedback, again something I mentioned yesterday, is: do not swallow exceptions. Let the test fail immediately where the problem occurred, whether it is an element not found or anything else. Do not just handle it and retry, unless your business logic genuinely allows a retry; don't add that intelligence to your framework unnecessarily just to make your test pass. Wherever there is a problem, let the test fail exactly at that point, so it becomes easy for you to do root cause analysis of what is happening there. And when the test fails, try to capture and dump out as much information about the test execution at that point as you can, including the browser logs or device logs, screenshots, and whatever else helps build the context that allows better root cause analysis (see the sketch after this passage). Proper root cause analysis is very important: fix the root cause, not the symptom. That is the band-aid approach we want to avoid. Fix the root cause, not the symptom.

You also have to focus on upskilling yourself and your team members, because not everyone has all the skills. If I get onto a new team that is using a different tech stack, I don't have all the skills appropriate for that stack, that type of product, or that domain, and I need to upskill myself to be effective. Yes, I have a lot of concepts and ideas, but unless I know the domain, the tech stack, and the product better, I will not be able to do justice to my presence on that team. Likewise, if any of your team members or colleagues lack certain capabilities, there is nothing wrong with that, but think about how you can help them get better. And choose tools that are fit for purpose.
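Going back to the point above about capturing information when a test fails, here is a minimal sketch of such capture, assuming TestNG's TestListenerAdapter and a hypothetical convention of storing the WebDriver on the test result as a "driver" attribute.

```java
import java.io.File;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.logging.LogEntry;
import org.openqa.selenium.logging.LogType;
import org.testng.ITestResult;
import org.testng.TestListenerAdapter;

// Dumps context at the exact point of failure so root cause analysis
// does not depend on rerunning the test and hoping it fails the same way.
public class FailureDiagnosticsListener extends TestListenerAdapter {
    @Override
    public void onTestFailure(ITestResult result) {
        // Hypothetical convention: the test stored its driver on the result.
        WebDriver driver = (WebDriver) result.getAttribute("driver");
        if (driver == null) {
            return;
        }
        // Screenshot of the state when the failure happened
        File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
        System.out.println("Screenshot captured at: " + screenshot.getAbsolutePath());
        // Browser console logs (supported by Chromium-based browsers)
        for (LogEntry entry : driver.manage().logs().get(LogType.BROWSER)) {
            System.out.println(entry.getTimestamp() + " " + entry.getLevel() + " " + entry.getMessage());
        }
    }
}
```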
Choosing fit-for-purpose tools is extremely important: given the context of your product and of what you are trying to automate, choose the tool sets that will help you get to the end state you truly desire. This is very important.

Think about contract-driven development, because the minute you start doing contract-driven development, a whole different set of types of automation opens up for you. You have contract tests between the provider and the consumer. You can leverage those same contracts to run API tests by mocking out all the other dependencies. You can use the same contracts to do API workflow tests and stub out your external dependencies. You can use the same contracts to stub out those external dependencies in a different environment and run your UI end-to-end tests for your application on top of that. But having a uniform contract, the same contract that all these types of tests refer to, is extremely important. Otherwise you will end up stubbing some APIs in the wrong way; your tests will pass, but the minute the product is integrated with those other services it will start failing, because you tested against the wrong contract. So contract-driven development is very important. One of the tools you can actually use for this is Specmatic: take a look at specmatic.in, and I will put the link in the chat shortly. It is a great open-source tool for doing contract-driven development and having all these other kinds of tests. In fact, there is a workshop on contract-driven development going on right now by Hari and Joel, but I'm very happy you are here instead of there. Thank you.

The other aspect is to think about how you can include visual testing as part of your functional automation. My recommendation here is to use Applitools, with its AI algorithms that can give you very accurate validations and can scale seamlessly across different browsers and devices. But choose whichever visual testing tool meets the criteria of your framework, your type of application, and the type of testing you require. As long as the visual testing tool works with your dynamic data and with all the different form factors, viewports, browsers, and devices that are there, it is fine. It has to give you value, to help you take quick decisions about what is going wrong in your application and take corrective action.

So these are the important aspects you need to think about when you are thinking about flakiness: how can you understand and identify the reason for the flakiness, and how can you get better at that? I hope this gives you some deeper insight into flakiness, what the anti-patterns are, why you should not fall into the trap of using those anti-patterns, and where you need to put in a lot of hard work to make things better. I apologize if I used some strong words about why you might be doing this in the first place, but I really do feel strongly about it: if you are using these anti-patterns, stop, get to the root cause, and fix it at the source. So I am done with this; I hope it gave you some interesting insights. Ravi, I don't know if we have time for questions, or we could just talk at the Hangout table.

We have time, Anand, we can go ahead. Yeah, I can read out the questions myself, that should be okay.
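Before moving to the questions, here is a minimal sketch of the visual-check idea mentioned above, assuming the Applitools Eyes Selenium Java SDK; the application name, test name, and tag used here are hypothetical.

```java
import com.applitools.eyes.selenium.Eyes;
import org.openqa.selenium.WebDriver;

public class VisualCheckExample {
    // Wrap an existing functional step with a visual validation.
    public static void checkLoginPage(WebDriver driver) {
        Eyes eyes = new Eyes();
        eyes.setApiKey(System.getenv("APPLITOOLS_API_KEY"));
        try {
            eyes.open(driver, "MyApp", "Login page renders correctly");
            eyes.checkWindow("Login page"); // visual comparison against the stored baseline
            eyes.close();                   // fails the check if differences are found
        } finally {
            eyes.abortIfNotClosed();        // clean up if the test bailed out early
        }
    }
}
```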
So Charlie Pradeep is asking: the production environment and the lower environments will not have the same bandwidth and performance, so a wait is the only choice; if we could use any other approach, please advise.

Charlie, this is a good question. Yes, the production environment and the lower environments will definitely not have the same bandwidth, and the lower environments will not have had the same performance tuning done on the infrastructure. At the same time, your test environments do not have the huge data set that production has, so the data is also smaller. What you need to figure out is what I mentioned earlier: based on your environment configuration, what is the SLA expected in production? If it is three seconds for a certain action to happen, what is a decent SLA to expect in your lower environment? If in that lower environment you say it is ten seconds, and that is acceptable given the way the infrastructure is set up, the network that is there, and whatever other factors there might be, then you implement your test based on a wait of ten seconds. But again, don't use Thread.sleep; that is a huge anti-pattern for handling waits. Use explicit waits, use fluent waits, where you say: I am going to wait a maximum of ten seconds for this action, because that is the expected SLA in this lower environment; I am not going to wait for 20 or 30 seconds just hoping it will happen in that time. That is what I mean by making it contextual to the environment you are running in. So don't use Thread.sleep; use intelligent waits, like the earlier sketch: wait a maximum of ten seconds, polling every second or so to see if the action has happened, if the element is now visible, but do not wait beyond ten seconds; after that, the test has failed as far as I am concerned. Okay, Charlie, I hope that gives you some thoughts on how you could handle it better.

Vishnu Prakash is asking: is a data-driven approach good to have in an automation framework? Can a data-driven approach avoid flaky tests? Vishnu, I'm not exactly clear about your question. Are you talking about providing your data externally and using that for your execution? Can you clarify, because I'm getting confused about what it means in terms of a data provider.

Yeah, Vishnu, let me know if you want to talk and I can allow you to speak. Okay, let me do that. Is he speaking? Great. Vishnu, you can unmute yourself. Can you hear me now? Yes, Vishnu, great.

So my question was very simple: whenever we develop one test script, instead of developing another nine or ten scripts, we use that one test script and provide the data from an external Excel or CSV kind of data provider, right?

Yeah, that's what I thought it might be referring to: a data provider. Okay, now I get the question, so I can answer it; thanks, Vishnu, for clarifying. I personally do not like the data provider approach, and again, I referred to this yesterday in the keynote. By the way, the video of that is out now, so if you go to the conference portal you can listen to it; we spoke about this aspect there as well. The reason I do not like the data provider approach: take a simple example, let's say I want to fly from Mumbai to San Francisco. I could fly from Mumbai to San Francisco in multiple ways.
If there is a direct flight, I could take that. I could go east from Mumbai and take a connecting flight: for our example, I could go from here to Japan, and from Japan fly to San Francisco. Or I could go west: I could go to Europe, and from Europe fly to San Francisco. There are multiple ways I can fly. The data provider is essentially the aeroplane in this case. It just says: I'm a plane, give me whatever data and I will take you wherever you want to go. And if any of those journeys fails or does not work as expected, you dig through the reports to figure out what the data was and what the reason for the failure was. I don't like that approach, because your test automation needs to be extremely explicit about what you are trying to test. I'm testing for a non-stop flight from Mumbai to San Francisco. I'm testing for a flight from Mumbai to San Francisco going east, or going west: those are another two scenarios. These are very explicit scenarios with different aspects you are trying to validate; you might be trying to figure out which is more optimal, for example, in terms of flying time, wait time, connections, and so on. Each of these journeys is important to know about explicitly. With a data provider approach you are just feeding data to the test, and the test does whatever it does with those three journeys. If one of the journeys is not optimal, it is going to say: the test with dataset number two failed. But what was dataset number two? Now you have to spend more time understanding that, figuring out that dataset two is actually the eastward journey and that it is not optimal. So you spend more time on root cause analysis, and only after that do you get to fixing the problem. I prefer not to use a data provider, but instead to implement an explicit test for each journey (a sketch contrasting the two styles follows below). I hope that answers your question, Vishnu. And no, it will not help avoid flaky tests at all: the data provider might take ten different datasets, and each dataset is actually running a test that might have its own, different cause of flakiness.

We have two more questions coming in, Anand. Yes, yes, just getting to that. So: how to handle retry in time shortage? What do we mean by time shortage? Again, I'm not 100% clear on that, so whoever asked it, can you please update your question? Next question, Anand, from Gaurav.

Gaurav is asking: instead of having too many UI cases, can we have only a few cases related to the UI, and verify the rest of the things through the API, to minimize flaky tests or test failures? Absolutely, that is exactly what I'm saying. Understand the test pyramid and what it means for you, in your context, in your application. Any test that is important should be automated, but don't start doing that from the UI level. Start by asking: can this particular test intent be automated at the unit-test level? Or is it a contract test? Or an API test? A UI component test? An API workflow test? Only if all of those answers are no do you say: okay, I need to do this from the UI layer. So don't automatically say "this test is important, I am going to automate it at the UI level." Gaurav, you are exactly spot on: do not have too many tests at the UI; move as many of them as possible into the lower layers.
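To make the contrast concrete, here is a minimal sketch in TestNG; the searchRoute helper is a hypothetical stand-in for driving the real UI or API flow.

```java
import static org.testng.Assert.assertEquals;

import java.util.List;
import org.testng.annotations.Test;

public class FlightSearchTests {

    // Hypothetical helper standing in for the real flight-search flow;
    // it returns the legs of the journey that the application offers.
    private List<String> searchRoute(String from, String to, String direction) {
        if ("east".equals(direction)) {
            return List.of(from, "NRT", to);   // eastbound, via Japan
        }
        return List.of(from, to);              // non-stop
    }

    // Intent-revealing tests: each journey is its own named test, so a failure
    // points directly at the scenario that broke, with no digging through datasets.
    @Test
    public void nonStopFlightFromMumbaiToSanFrancisco() {
        assertEquals(searchRoute("BOM", "SFO", "non-stop"), List.of("BOM", "SFO"));
    }

    @Test
    public void eastboundFlightFromMumbaiToSanFranciscoViaJapan() {
        assertEquals(searchRoute("BOM", "SFO", "east"), List.of("BOM", "NRT", "SFO"));
    }

    // A @DataProvider feeding all journeys into one generic test would instead
    // report a failure only as "dataset 2 failed", hiding which journey and why.
}
```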
One last question from my side to you, Anand. This is a typical situation that we witness and that I hear about almost every time I talk to people in my circle or in the community: the stakeholders ask the test engineers to retry the failed tests and make them pass. So how do you address this, or how do you communicate with stakeholders who ask you to retry failed tests? How do you approach it?

I think we need to understand the intention behind this, right? The Five Whys is a good technique to try to get to the root cause. Maybe the stakeholders asking you to retry are being measured on something, or are being asked to provide a certain type of report which they need to give. So their purpose, their SLAs, their metrics, their KPIs are about how many tests have been implemented and how many green runs you have. You need to get to the root cause and find the solution accordingly. Regardless of the reasons, which I don't want to get into — they could be political or whatever they might be — I still don't believe that is a good approach, because you are not getting to the root cause and you are not going to be able to fix what the core problem is. And somewhere or other, your users will face that behavior. The cost of discovering that defect so late in the cycle and then fixing it is huge compared to fixing it when you first found it.

So, it's time to end the session. It was a free and relaxed session for me, and for Anand as well, I guess. Yes. It was a very useful and insightful session for all of us. Anand from Pune and Ravi from Bangalore, signing off from this talk. Okay, stay safe. Have a good conference.