Welcome everyone to the session "How I Reduced Appium Test Execution Time by 54%" by Chandan Mishra. We are glad they could join us today. Thank you.

Thank you, Nidhi. Welcome, everyone. My name is Chandan Mishra, and I work as a lead SDET at Phoenix, a fintech company in Indonesia that deals with loans and related products. This session is inspired by an activity we did as a team at Phoenix last year. When I joined Phoenix two years ago, there was no automation at all. We started writing automation scripts little by little, and after nine or ten months the suite had grown so much that execution time was reaching four to five hours. That became a problem. So in July 2020 we did some research into where we could make changes that would reduce execution time. In this talk I will go through what we did and how we cut the time by more than 50%.

That was the problem statement. Our framework at the time used Java as the programming language, Appium for mobile app testing, TestNG to manage and execute our tests, Gradle for build and dependency management, and Jenkins to run our builds in CI/CD. We have three types of apps to deal with: Android, iOS and mobile web. Mobile web matters because our application integrates with multiple e-commerce websites such as Tokopedia: when a user clicks a link to pay with us, a web page opens and the user completes the transaction there. All of this was automated with Appium itself. Initially we used local devices, but when our suite became large, and also because of the pandemic, we moved to a cloud device provider, BrowserStack.
These are the figures from that time. Before this activity we had around 287 test cases, and the total execution time was 240 minutes. There were around 40 failures, so roughly 240 cases were passing and the rest were failing; the execution time covers all of them. I will now go through the activities one by one, pointing out which ones had the biggest impact and which ones had less impact but are still good to have.

The first one: a common beginner mistake in automation is relying heavily on XPath. Sometimes this is because we don't know that IDs are faster. In our case it was also because the application was missing accessibility IDs, so initially we had no option other than XPath. That was fine while we had 50 to 100 cases, but gradually test execution started taking a lot of time, especially on iOS. Android was still fine, but on iOS the difference between an accessibility ID and an XPath is huge. For a normal dashboard element, an XPath lookup took 48 milliseconds, while an accessibility ID lookup took only about half that, roughly a 2x difference. This was not something we could control ourselves, so we worked with our developers and product manager. It took time, because developers and product managers have many other priorities besides helping us, and because the application was already old, many pages had no accessibility IDs.
It took around two to three months, but once all the IDs were in place we saw a difference of around 10 minutes. As I said, this did not make a huge impact, but a 10-minute difference is visible, and the difference gets smaller when there are more failures.

The second thing is a favorite command of many beginner automation engineers: when a script keeps failing and they cannot handle synchronization, they use Thread.sleep. We did too, although not for element synchronization, because we already knew we had to use explicit waits there. Our problem was data sync. We have multiple modules and applications, including our microservices, interacting with each other, so we had to wait for data to propagate. For example, after creating a loan on one side, we had to wait until the loan was available in the lender DB, which took three to five minutes depending on the consumer queue. We also had DB validations, and sometimes race conditions, because we ran everything in parallel. To handle these cases we used Thread.sleep: if the maximum delay was five minutes, we slept for six. That was fine at first, but later we started focusing on DB tests and kept adding more sleeps. Then one day all the DB tests failed because the consumer was not working, and the build took around eight to nine hours to complete. That showed us where we had to spend our effort: we could not keep running our test execution on Thread.sleep. What we did instead was use Awaitility. For example, Awaitility.await().atMost(5, SECONDS).until(...) waits at most five seconds until the status is updated in the DB.
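A minimal Java sketch of the condition polling described here, written with only the standard library rather than Awaitility. The Poll class and retry method are hypothetical names, not the speaker's code: the helper checks a condition up to a fixed number of attempts, sleeps a short interval between checks, and returns as soon as the condition holds instead of sleeping for a fixed worst-case time.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper: re-check a condition instead of sleeping a fixed time. */
final class Poll {
    private Poll() {}

    /**
     * Checks the condition up to maxAttempts times, sleeping intervalMillis between checks.
     * Returns true as soon as the condition holds, false if every attempt fails
     * (or if the thread is interrupted while waiting).
     */
    static boolean retry(int maxAttempts, long intervalMillis, BooleanSupplier condition) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return true; // data is there: stop waiting immediately
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(intervalMillis); // short pause before the next check
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false; // condition never held within the attempt budget
    }
}
```

In a test, a call such as `Poll.retry(50, 1_000, () -> loanExistsInDb(loanId))` would check for the loan up to 50 times, once a second, where `loanExistsInDb` is a hypothetical stand-in for whatever DB query the suite uses. Returning a boolean leaves it to the caller to fail the test with a meaningful message.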
I customized that and built some additional changes on top of it. The final command looks like retry(numberOfAttempts, interval, condition): first the number of attempts, say 50, then the interval between checks, which I can change (for a cron job, for instance, I can check every five seconds), and then the condition, written as a lambda expression. So I say: retry until the loan ID is present in the database, checking up to 50 times, once every second. How long that takes still depends on the consumer, but rather than always waiting exactly 60 seconds, 50 seconds or five minutes, the retry stops as soon as the condition holds. With a five-minute sleep we always waited five minutes; with retry, if the data arrives in one minute, we don't waste the other four. That made a huge difference: around 30 to 40 minutes when we did this activity.

The third thing is not using one common explicit wait time. Most automation engineers already know what an explicit wait is, so I won't explain it; my point is to avoid a single constant explicit wait for everything. Your application has pages and elements. When you open a new page that is dynamic and depends on DB or API calls, it takes time to load, but once its most important element has loaded, the others load very quickly. So don't apply one explicit wait of, say, 30 seconds to every element. The problem is that when your test fails, it waits the full 30 seconds for each element.
If the page has five elements, you wait 150 seconds in case of failure. Everything is fine when tests pass, but on failure you waste 150 seconds, which you can reduce by using multiple explicit wait times. For the most important element, the first one to load on the page, you can use a long wait such as 30 seconds. For elements commonly used across the application you can use a default of 15 seconds. Then there are minimal-wait elements: say your application has logic where selecting a value in a dropdown makes another element appear. The data is already loaded in the back end and in the app, and it is only displayed based on that logic, so a minimal wait of one or two seconds is enough. Finally, once the first element has loaded after its 30-second wait, simple static elements such as text views are loaded anyway, so you don't need to wait for them at all; just act on them and move on.

For example, look at this page (I have blurred out some private elements). The most important element is the Request Loan button, so I use a long wait for it. For other items like these, I can use a minimal wait. There is a dropdown here, and this value depends on that dropdown, so for it I use a short or minimal wait. Other things, like the 5, 10 and 15 million options, are constant, so I don't wait for them at all. The slider is also constant, and so is this text.
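The arithmetic behind tiered waits can be sketched in Java. The tier names are hypothetical and the values follow the examples above; in a real suite each tier's Duration would be passed to an explicit wait such as Selenium 4's `WebDriverWait(driver, Duration)`. The sketch computes the worst-case wait if every element on a page times out, as in the failure scenario described above.

```java
import java.time.Duration;
import java.util.List;

/** Hypothetical wait tiers; values follow the examples in the talk. */
enum WaitTier {
    LONG(Duration.ofSeconds(30)),    // first, data-dependent element on the page
    DEFAULT(Duration.ofSeconds(15)), // commonly used elements
    MINIMAL(Duration.ofSeconds(2)),  // elements shown by UI logic, e.g. after a dropdown choice
    NONE(Duration.ZERO);             // static text, sliders and other constant elements

    final Duration timeout;

    WaitTier(Duration timeout) {
        this.timeout = timeout;
    }
}

final class WaitBudget {
    private WaitBudget() {}

    /** Worst case when every element on the page times out: the tier timeouts add up. */
    static Duration worstCase(List<WaitTier> elements) {
        Duration total = Duration.ZERO;
        for (WaitTier tier : elements) {
            total = total.plus(tier.timeout);
        }
        return total;
    }

    /** The same worst case if every element used one flat timeout. */
    static Duration worstCaseFlat(int elementCount, Duration flatTimeout) {
        return flatTimeout.multipliedBy(elementCount);
    }
}
```

For a page with one long-wait element, one default element, two minimal elements and one static element, `worstCase` gives 49 seconds, compared with 150 seconds for five elements at a flat 30 seconds.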
That's how you can distribute your waits. When tests fail, they no longer wait 30 × 5 or 30 × 7 seconds, but something like 30 + 5 + 1 + 1. With this approach you can reduce the failure wait from 150 seconds to around 50 seconds, and it saved us another 30 to 45 minutes of execution time.

The fourth thing we looked at was retry and priority. Many people use them early on to cope with flaky networks, unstable user interfaces and race conditions. We used them because our application was not very stable at the time, but once it became stable we decided to improve this. The problem with priority is that it helps with sequential execution but costs you parallelism: if five tests run in parallel and you set a priority on all five, they run in sequence even though you configured parallel execution. So keep a single method per test, and make every test self-contained, with no dependency on other tests. With retry, I did something different. I could not simply remove it, because even when your features are stable, the network is not always stable, and you will still see some flaky failures. So I made retry dynamic, based on a database. I stored every test execution result in a test result database, and from there I selected the tests that had failed more than, say, three or four times in the last week; you can customize the count. I then passed this list to my retry method.
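Selecting flaky tests from stored results and deciding whether to retry can be sketched with the Java standard library. TestRun, FlakyRetryPolicy and the in-memory list of runs are hypothetical stand-ins for the results database; in TestNG, `shouldRetry` would typically be called from an `IRetryAnalyzer` implementation's `retry(ITestResult)` method, using the test's method name.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** One stored test outcome; in the talk these rows live in a results database. */
final class TestRun {
    final String testName;
    final boolean passed;
    final Instant finishedAt;

    TestRun(String testName, boolean passed, Instant finishedAt) {
        this.testName = testName;
        this.passed = passed;
        this.finishedAt = finishedAt;
    }
}

/** Hypothetical policy: retry only tests that recently proved flaky. */
final class FlakyRetryPolicy {
    private final Set<String> flakyTests;
    private final int maxRetries;

    FlakyRetryPolicy(Set<String> flakyTests, int maxRetries) {
        this.flakyTests = flakyTests;
        this.maxRetries = maxRetries;
    }

    /** Tests that failed more than failureThreshold times at or after the cutoff. */
    static Set<String> selectFlaky(List<TestRun> runs, Instant since, int failureThreshold) {
        Map<String, Integer> failures = new HashMap<>();
        for (TestRun run : runs) {
            if (!run.passed && !run.finishedAt.isBefore(since)) {
                failures.merge(run.testName, 1, Integer::sum); // count recent failures per test
            }
        }
        Set<String> flaky = new TreeSet<>();
        failures.forEach((name, count) -> {
            if (count > failureThreshold) {
                flaky.add(name);
            }
        });
        return flaky;
    }

    /** Retry only listed tests, and only while the retry budget lasts. */
    boolean shouldRetry(String testName, int retriesSoFar) {
        return flakyTests.contains(testName) && retriesSoFar < maxRetries;
    }
}
```

Keeping the policy separate from TestNG makes the selection logic easy to unit-test; the threshold (more than three failures in the last week) mirrors the example in the talk and is meant to be tuned.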
The retry method checks whether the test is in the list: if it is, the test is retried; otherwise it is not. So rather than making all 287 cases eligible for retry, we retried only the 15 or 20 flaky ones. This helped a lot when there were failures: for an eight-hour build, for example, we saved four hours because we were no longer retrying everything. When tests pass, retry never runs anyway, so this customization mainly pays off when there are many failures.

The fifth thing we focused on was the test case count. We often try to map every automated test one-to-one to the test case management tool; in our case that was TestRail. The problem is that test cases in TestRail contain a lot of duplication: each has its own preconditions, postconditions and so on, and mapping them one-to-one means writing duplicate code. For example, many test cases start by going to one particular section, so they all say: log in, go to the account tab, then go to loans. In automation, if I have five such tests, I don't repeat that; I create one before method (or something similar) for the navigation and change only the steps that differ. That said, you can't merge everything; choose carefully which tests to combine. Good candidates are tests that are not API-dependent, not DB-dependent and not dynamic in nature. They should also be stable: if you keep a database of all failures, you can see which tests are stable enough and decide based on that. Take social media tests as an example.
Rather than four separate tests for Instagram, Twitter, Facebook and YouTube, you can have one social media test: go to Instagram and check that the Instagram web page shows up, use the back button to come back, then do the same for Twitter, and so on. Using the back button means you don't waste time initializing Appium or navigating to this page again; you are already here and the Appium session is still live, so make use of it.

(Host: Heads up, five more minutes in the session.)

These are static tips, so I'll go a little faster now. The sixth thing is doing everything through Appium commands. Why do we make this mistake? Because we don't want any dependency on databases, the API layer or the network layer. But every Appium command is costly and takes a lot of execution time. So whenever possible, test each piece of UI only once. If you have tested a screen in one of those 87 cases, you don't need to test it again in the other 86. For everything else you can use API calls, shell scripts or DB scripts, or connect directly to the Python console on the server and run scripts there, much like developers do in their unit tests. Let me explain with an example. On this page I have to verify fields such as status, order ID, package, admin fee and total payment; say there are 10 elements. If I send 10 separate getText commands to the Appium server, that consumes a lot of time. Instead, I can use one command that gives me all the text views, map them by index into a hash map, and compare that with the expected data I have in JSON or in a hash map.
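Comparing a screen against expected data in one pass can be sketched as follows. The sketch assumes the visible texts have already been fetched in a single round trip (for example by parsing the output of `getPageSource()`) and that the screen alternates label and value; both that layout and the ScreenTexts helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for checking many on-screen fields from one fetched list of texts. */
final class ScreenTexts {
    private ScreenTexts() {}

    /** Pairs consecutive texts as label/value, e.g. ["Status", "Paid", "Order ID", "12345"]. */
    static Map<String, String> toLabelValueMap(List<String> texts) {
        if (texts.size() % 2 != 0) {
            throw new IllegalArgumentException("expected label/value pairs, got " + texts.size() + " texts");
        }
        Map<String, String> fields = new LinkedHashMap<>(); // keeps on-screen order
        for (int i = 0; i < texts.size(); i += 2) {
            fields.put(texts.get(i), texts.get(i + 1));
        }
        return fields;
    }

    /** Lists every expected field whose on-screen value is missing or different. */
    static List<String> mismatches(Map<String, String> expected, Map<String, String> actual) {
        List<String> problems = new ArrayList<>();
        expected.forEach((label, value) -> {
            String shown = actual.get(label);
            if (!value.equals(shown)) {
                problems.add(label + ": expected '" + value + "' but found '" + shown + "'");
            }
        });
        return problems;
    }
}
```

Collecting all mismatches, rather than stopping at the first one, reports every wrong field from a single test run.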
The saving may not look like much, but it makes a difference of at least 15 to 20 minutes.

The seventh thing, using deep links, had the biggest impact. Without deep links, a test flow such as applying for a loan has to log in, go to the account tab, go to the loan section and then apply. With deep links, you launch the app directly on the screen you need, with a link such as app-name/loan/apply. That saves the roughly 5 to 7 minutes spent going through login and other screens. So work with your developers and ask them to create deep links; they already do it for marketing, so ask them to create some for the test automation team as well.

The eighth thing is optimizing timeout values for your cloud vendor. We were using BrowserStack, which has a capability for an idle timeout: if Appium has not done anything for, say, 60 seconds, the session closes automatically. So if a script sits idle for more than 60 seconds, the session is closed, you can act on it, and you can replace the Thread.sleep there with an explicit wait or one of the other wait types I described earlier. The second one is the test timeout. BrowserStack's default is 2 hours, but you can set it so that any test taking more than 20 minutes is closed.

The ninth thing is using analytics to drive your test execution: don't run every test in every build. Base the decision on production analytics and see which features customers use most. In our case customers mostly used loans, billing accounts, the dashboard and payment registration, while features like FAQs and social media were used by less than 1% of users.
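Assigning tests to schedules from production usage might look like this minimal Java sketch. The Schedule names, the 20% and 1% thresholds and the usage figures in the example are illustrative assumptions; only the idea of running high-usage features on every build and rarely used ones (such as FAQs, under 1%) less often comes from the talk.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical execution schedules for test suites. */
enum Schedule { EVERY_BUILD, NIGHTLY, WEEKLY }

final class SuitePlanner {
    private SuitePlanner() {}

    /** Illustrative thresholds on the share of production usage, in percent. */
    static Schedule scheduleFor(double usagePercent) {
        if (usagePercent >= 20.0) {
            return Schedule.EVERY_BUILD; // core flows such as loans or payments
        }
        if (usagePercent >= 1.0) {
            return Schedule.NIGHTLY;
        }
        return Schedule.WEEKLY; // rarely used features such as FAQs
    }

    /** Groups features by schedule based on their production usage. */
    static Map<Schedule, List<String>> plan(Map<String, Double> usageByFeature) {
        Map<Schedule, List<String>> grouped = new EnumMap<>(Schedule.class);
        for (Schedule schedule : Schedule.values()) {
            grouped.put(schedule, new ArrayList<>());
        }
        usageByFeature.forEach((feature, usage) -> grouped.get(scheduleFor(usage)).add(feature));
        return grouped;
    }
}
```

For instance, with loans at 45% usage, billing at 5% and FAQs at 0.4%, `plan` puts loans in EVERY_BUILD, billing in NIGHTLY and FAQs in WEEKLY.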
So we grouped the test cases by importance: the most important ones ran with every build, the regression-related, second-most-important ones ran in the nightly build, and the rest ran weekly or bi-weekly. After all these activities, with nearly the same number of failures, the reduction was around 117 to 127 minutes. It has now been a year, we still follow the same practices, and we have about three times as many test cases, around 800, yet the execution time has not grown much: it is still around 40 to 50 minutes.

These are the references: a link on how to create app links, links to various websites, and a link on worst practices. Before adopting good practices, you should know the worst ones, and you can learn about those from me. OK, I am open to questions if you have any.

I will read out a few of the questions. The first: "We use a common framework for both iOS and Android, and iOS execution is very slow compared to Android. We use hybrid apps. Any help?"

Yes. We are planning this too: we are migrating from Appium to native execution with XCUITest, in Swift. If you see a large difference between Appium and native XCUITest, say 50% or twice as slow, you should write XCUITest tests in Swift together with your developers. We are not moving everything there; we use production metrics and test the important features in the native Swift code itself.

Next: "While using deep links, how can we handle app data? If we skip the login, the app will not load the user data, right? Can we handle this with deep links?"

Yes, with deep links you can provide parameters.
For example, I can provide a parameter such as an app token that logs the user in when the link is opened. Marketing campaigns use deep links all the time, and you use them every day; that is how they identify your data. So you append parameters to the link and the app acts on them. Obviously this should not be public: they use some kind of encryption mechanism so that nobody can tell what the parameter contains, but you can use it.

OK, one last question: "Will the DL links work on iOS?"

Sorry, can you repeat that?

"Will the DL links work on iOS? Deep links."

Yes. It is not as straightforward as on Android, but there is a way; ask your developers and they can create them. iOS involves a lot more security, but once you have the links you can definitely test them on the simulator, and if you are working within the code you can test them very easily. It's just not as straightforward as on Android.

OK, right. "Is iOS class chain not faster than accessibility ID?"

I would say the difference is small, and it depends on what kind of element you have. For example, if the element is inside a list, class chains are faster; otherwise accessibility IDs are mostly faster.

OK. We are a little over time, so we will end the session here. Thanks, Chandan, for sharing your experience with us today.
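Building a deep link with query parameters, such as the login token mentioned in the answer above, can be sketched with `java.net.URI`. The `myapp` scheme, the `loan/apply` path and the `token` parameter are hypothetical; on Android, Appium's UiAutomator2 driver provides a `mobile: deepLink` script command (taking `url` and `package` arguments) to open such a link, while iOS needs a different setup, as the answer notes.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Hypothetical helper for building test deep links such as myapp://loan/apply?token=... */
final class DeepLinks {
    private DeepLinks() {}

    static URI build(String scheme, String hostAndPath, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        // TreeMap gives a stable parameter order, so the same inputs always yield the same link.
        new TreeMap<>(params).forEach((key, value) -> query.add(
                URLEncoder.encode(key, StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(value, StandardCharsets.UTF_8)));
        String link = scheme + "://" + hostAndPath + (params.isEmpty() ? "" : "?" + query);
        return URI.create(link);
    }
}
```

Encoding each key and value keeps characters such as spaces or `&` in a token from breaking the query string; any real token used this way should be a test-only credential.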