Okay, I am going to speak to you about a very important topic, at least in the Appium and UI automation space: how we can tackle test flakiness in Appium. A little bit about me: my name is Gaurav Singh, and I work as a product engineer at Gojek Bangalore. Here are my Twitter handle and my GitHub.io profile, where you can find more info on how to reach out to me.

So let's start with what we really mean by test flakiness. Can we have a show of hands: how many people here have been working with Appium for more than a year? And how many of you are okay with the reliability of your Appium tests, meaning they pass all the time? Okay, the smiles give me an answer. So, does this scenario seem familiar? You code your test in your local environment, it works perfectly, and you push it into your CI environment, Jenkins, GitLab, whatever. And then you start seeing some tests failing sporadically: they pass some of the time, they fail some of the time, and sometimes you can't even make out why it's happening that way. It's a mystery to most people, but it actually points to deeper symptoms, like problems with how you have written your framework, or even issues with the app itself. I'll elaborate in this talk.

But before that, why should we really care? Why is this important? Because as soon as your builds start turning yellow over a period of time, you start losing confidence in your tests. The developers who are monitoring those test results will not look at them the same way; they are not going to trust whether things work fine or not. And in the process, you might actually start missing real bugs in your application.

Also, before we proceed: I know the diagram is a bit fuzzy, but it shows the architecture of Appium, and it's important to have a good understanding of how it's put together. Appium, as you know, is a wrapper over the native mobile frameworks: XCUITest for iOS, and UiAutomator or Espresso for Android. Then you have the Appium server, which runs either on your machine or on some remote VM, and the actual client on your machine, with which you issue commands like findElement, sendKeys, and so on. So there is a whole lot of technology in between when you actually run your test, and sometimes the problems occur at a deeper layer in the stack. It's important to know how it is all set up so that you can debug it faster; we'll walk through that later in the slides.

Okay, so we know it's a first-class problem. But what are the solutions? That's what we're going to talk about today. We'll choose effective wait strategies (I'll walk through what those mean), we'll look into reliable locator strategies, we'll look into how we can make our tests faster and how we can make them more reliable, I'll give you some debugging tips, and lastly we'll go over some best practices you can implement so that flakiness reduces drastically.

For most of the demos and code I'm going to show you, I'm using the API Demos app. This is a fairly common app that you can find in the Appium Java client repo; I have the link right here. Whatever code I'm presenting here is also up on GitHub, and needless to say, these slides are available, so you can always refer back to them after this talk.
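To make that stack concrete, here is a minimal session setup sketch in Java, assuming the Appium Java client 8 against a local Appium server; the device name, server URL, and APK path are placeholders you would adjust:

```java
import io.appium.java_client.android.AndroidDriver;
import io.appium.java_client.android.options.UiAutomator2Options;
import java.net.URL;

public class SessionSetup {
    public static void main(String[] args) throws Exception {
        // Capabilities tell the Appium server what to automate: the platform,
        // the automation backend, and the app under test.
        UiAutomator2Options options = new UiAutomator2Options()
                .setDeviceName("emulator-5554")          // placeholder device
                .setApp("/path/to/ApiDemos-debug.apk");  // placeholder path

        // The client sends WebDriver commands over HTTP to the Appium server,
        // which translates them into native UiAutomator2 calls on the device.
        AndroidDriver driver = new AndroidDriver(
                new URL("http://127.0.0.1:4723/wd/hub"), options);
        System.out.println("Session started: " + driver.getSessionId());
        driver.quit();
    }
}
```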
So let's get started. The first time you ever wrote an Appium test, you would have written something similar to this: you find an element, you perform some action, and then you probably assert something. And sometimes you'll see that your app is not yet in the state you want it to be, so you might put in some sleeps: let's wait for five seconds, six seconds, whatever. It seems very innocent, and it works locally too. But when it's promoted to CI, this can be one of the major causes of your flakiness. Why is that? Because you really cannot ensure how the app is going to behave when it's running in a CI environment; there can be much bigger network latency on a real device compared to when you're running on an emulator. And the biggest disadvantage is this: say you put that sleep in a function, like a login function, and you reuse it across 100 of your scripts. You are already waiting six seconds per script, which adds up to around ten minutes of pure waiting across your suite. So this is not a good idea. Everyone agrees? So let's stop using this.

What can we do better? Selenium provided the concept of implicit waits, and the Appium team inherited the same thing. How that works is, at the driver level, you specify how long you want Appium to wait whenever it tries to find an element. And now, as you see in the code, I've removed all the Thread.sleep calls. This looks better, right? The code appears a bit cleaner, and initially it might seem so as well. But this creates a deeper problem: the timeout has to be high enough to accommodate the slowest element in your app. Let's say you have one slow screen where you need to wait for two minutes, and you set the timeout to two minutes. Now, for a screen that is never going to reach the state you expect, you are still waiting the full two minutes, needlessly. And that is a deal breaker: you don't get fast feedback, and it feeds the whole narrative that UI tests are slow.

So if this is not ideal, what can we do? The best thing is to use explicit waits. It sounds simple enough, but it works great. You create a WebDriverWait object, you specify how long you want Appium to wait, and this is specific to the particular screens or elements you are using. You call the until method on it and pass it an expected condition: wait for the presence of this element, wait for it to be visible, wait for it to be clickable. And the timeout can vary from screen to screen, however you want (there's a sketch of this below). This way you give your script the intelligence to wait only at the points where you know it's worth waiting, and elsewhere you don't need such a high timeout. Appium also gives you the ability to write your own expected conditions; I'll provide a link on how you can do that.

So what's the big takeaway here? Prefer explicit waits. If you do this, your code is obviously going to become much more reliable. So we are done with one, five more to go.
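Roughly, those explicit waits look like this (Selenium 4 style, assuming the Java client 8; the accessibility ids here are hypothetical):

```java
import io.appium.java_client.AppiumBy;
import io.appium.java_client.android.AndroidDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import java.time.Duration;

class WaitExamples {
    static void login(AndroidDriver driver) {
        // Wait up to 10 seconds on this screen only; the wait returns as soon
        // as the condition is met, so a fast screen pays almost no cost.
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        WebElement loginButton = wait.until(ExpectedConditions.elementToBeClickable(
                AppiumBy.accessibilityId("login"))); // hypothetical id
        loginButton.click();

        // A known-slow screen can get its own, longer timeout.
        new WebDriverWait(driver, Duration.ofSeconds(30))
                .until(ExpectedConditions.visibilityOfElementLocated(
                        AppiumBy.accessibilityId("booking_estimate"))); // hypothetical id
    }
}
```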
Next: we need to use reliable locator strategies. Whenever you write a command like driver.findElement(By.xpath(...)) and provide a string there, findElement by XPath is the locator strategy you are choosing, and the actual string is the selector. Generally, whenever people ask what locator strategy they should use, most will think of XPath, coming from a Selenium background. Initially it might make sense, but there are clear disadvantages to using XPath. Why is that? It's especially pronounced on iOS, because iOS does not expose an XML representation of the UI by default. So what Appium has to do is walk through the entire view hierarchy, create an XML document out of it, get the actual element references from that, and essentially construct the entire UI tree. Even though it's not visible to us, this is tedious and has a performance cost, so on iOS your XPath locators will be very slow. The second disadvantage is that any change in your app hierarchy can easily break your tests. And that doesn't stop people from writing some crazy XPaths like this one, which they might get from some tool. This is not readable, and any element that moves around on the screen will cause your test to break here and there.

So now we know that XPath has some disadvantages, for sure. But what's the alternative? The best alternative is the accessibility ID. On iOS it's called the accessibility identifier, and on Android it's the content description, when you're looking in UIAutomatorViewer or similar tools. Appium gives you a convenient way of passing these. Now, why is this better? Because it's actually put in by the devs, and the chances of it changing are very low; even if they change the layout of the screen, your element's unique identifier will not change. So this is probably your best bet.

But let's say you're on a team where you don't have direct access to the developers and you can't reach out to them every time. Then what can we do? Is there any other alternative? Turns out there is. There are native strategies we can use on both the iOS and Android platforms, and they are much faster than XPath. One is predicate strings. It's a query-like syntax you can use to find your iOS elements. In this example, I'm trying to find any element of type button whose value begins with a certain string and which is visible. It looks just like SQL, right? Most of the time it's quite intuitive. And notice I have not specified much hierarchy here; it will find your elements very quickly because it's using native technology underneath.

There is something even better: Appium also has the iOS class chain, which the Appium team themselves wrote as a wrapper over the native XCUITest query functions. It's a hybrid between XPath and predicate strings, making the syntax a bit easier to understand. Here I have a similar example: I'm trying to find any window whose name contains a given string, and you use the associated method on MobileBy. Why is this faster? Because you don't need to construct any UI tree; it's close to a one-to-one mapping with native iOS queries, and it's very fast. There is a caveat, though: this is not cross-platform, you cannot use it on Android. So if you decide to use native strategies, make sure you create appropriate wrappers so that you choose a different strategy on Android.
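For illustration, those three strategies look roughly like this in the Java client (AppiumBy is the Java client 8 entry point; all the ids and strings are hypothetical):

```java
import io.appium.java_client.AppiumBy;
import org.openqa.selenium.By;

class LocatorExamples {
    // Cross-platform and preferred: accessibility id
    // (contentDescription on Android, accessibilityIdentifier on iOS).
    static final By SUBMIT = AppiumBy.accessibilityId("submit_button"); // hypothetical id

    // iOS only: an NSPredicate query, fast because no XML tree has to be built.
    static final By PAY_BUTTON = AppiumBy.iOSNsPredicateString(
            "type == 'XCUIElementTypeButton' AND value BEGINSWITH 'Pay' AND visible == 1");

    // iOS only: class chain, a hybrid between XPath and predicate strings.
    static final By ESTIMATE_WINDOW = AppiumBy.iOSClassChain(
            "**/XCUIElementTypeWindow[`name CONTAINS 'estimate'`]");
}
```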
We also have something called UiSelector on Android, which is provided by Google itself, and Appium gives us a way to use this library to find our elements. Essentially it's valid Java code, starting with a UiSelector object, which exposes a whole library of methods you can use; you pass it as a string to the AndroidUIAutomator locator, and it works. There are some limitations, obviously, because Appium itself has to do reflection and such to figure it out and then hit the API, but in most cases it's still much faster than XPath. So the takeaway here: prefer accessibility IDs first, and if you don't have them, see whether you can use the native strategies over XPath. It will give you a lot of gains. Cool, so we are done with two, four more to go.

If you follow the first two strategies, your tests automatically become much more reliable and much faster. But we can still take this to the next level. Who doesn't like shortcuts? We're all engineers here, right? You hit a problem, you go to Stack Overflow, you find a solution, and you port it somehow. Similarly, any app you have is generally just a sequence of events: taps, key presses; you make calls to your backend services to fetch some data or push some data, and then you show animations. All these things take time, and they can make your tests slow. But what we really want in a functional test is to get straight down to the meat of what we want to verify on a screen. So let's look at some strategies for that.

Let me take a very simple example: any app has login. If you have 200 Appium scripts and you are doing login in all of them, you are unnecessarily wasting time. And what might happen is that the login itself fails, and then you are never able to verify what your test was really about. That sucks, but what can we do to improve?

There are some strategies. At least in the Android space, Android activities are your friend. On Android, any screen you see is basically an activity, and whenever the app needs to hand off to some other flow, it calls a different activity. Appium provides you a way to trigger these activities directly, using either the startActivity method, which is exposed on your driver provided you cast it to an AndroidDriver, or by setting it up directly via desired capabilities. Using this, you can launch your app straight into the activity you want to test; a short sketch of this, and of the UiSelector locator, follows below.
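A minimal sketch of both, assuming the Java client 8 (the package and activity names are from the API Demos app; the UiSelector text is hypothetical):

```java
import io.appium.java_client.AppiumBy;
import io.appium.java_client.android.Activity;
import io.appium.java_client.android.AndroidDriver;
import org.openqa.selenium.WebElement;

class ActivityShortcuts {
    static void jumpToScreen(AndroidDriver driver) {
        // Launch the activity under test directly (appPackage, appActivity)
        // instead of tapping through the whole flow.
        driver.startActivity(new Activity("io.appium.android.apis", ".view.Controls1"));

        // Android-only native locator: UiSelector code, passed as a string
        // and executed by UiAutomator2 on the device.
        WebElement button = driver.findElement(AppiumBy.androidUIAutomator(
                "new UiSelector().className(\"android.widget.Button\").textStartsWith(\"Save\")"));
        button.click();
    }
}
```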
Devs can actually help you here; there is a concept called kitchen sinks. Right now on your screen you see the Gojek ride app that we have. Essentially, making a booking is a sequence of steps: you select where you want to get picked up and where you want to go, and then as a consumer you see certain estimates of how much it's going to cost. Now, say I just want to verify something on the estimates screen. I need not go through the steps of login, pickup, and drop, with tons of data setup, to finally arrive at that screen; as with login, the backend can fail or there can be some issue or other along the way. With a kitchen-sink strategy, a single screen is created with a simple radio button or some other interface that lets you click around and directly launch the activity with some state already set up for you. So in this example, I can directly go to estimates with a pickup and a destination location already pre-selected; I don't need to do all that work. The devs obviously need to help you out a little here, but it saves a ton of time.

One more strategy we can follow is to use deep links. At least in the mobile world we have this concept, and it's cross-platform between Android and iOS. Essentially, a deep link is just like a URL. Here, this deep link means: I'm opening an app called test, and I want to open the home screen with a given username and password. What the dev has to do is write some handler logic in the app that understands what this URL means, parses it, gets the variables out of it, and launches the app in that particular state. If they provide you a deep link like this, then with Appium you can simply call get with that URL, and you are done: you land directly on that screen, skipping all the steps you would do manually. Obviously, you should keep one test that exercises the entire flow, but for the majority of your tests for which login is just a prerequisite, you can skip it.

Okay: disabling animations. When you're running functional tests, you do an action, you wait for something, and you move on to the next screen. Sometimes developers or designers put in fancy animations, and those take time. It can even happen that your script is so fast that it tries to verify something on a page that isn't there yet because an animation is still running. That sucks, because the animation isn't even what we're verifying. So what can we do? On Android we can disable animations at the adb level: there are certain commands you can execute at the start of your session which make sure none of the animations play. On iOS it's a bit more tedious, because we don't have something as awesome as adb there. What you can do is go to the general accessibility settings and turn Reduce Motion on, which means none of the animations will play. Doing this via an Appium script might be counterproductive, because you would waste time navigating all those settings screens; if you're using three or four real devices, it's easier to just do it as a one-time setup. I'll show the deep-link call and the adb commands in a small sketch below.

So what's the key takeaway here? Set up hooks to land directly on the screen under test, which ensures that your tests are faster. Okay, so we are done with three.
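Here is that sketch; the deep-link scheme, path, and credentials are made up, and the handler behind the URL is something your devs would implement:

```java
import io.appium.java_client.android.AndroidDriver;

class DeepLinkShortcut {
    static void openHome(AndroidDriver driver) {
        // With a deep-link handler registered in the app, a single get() call
        // replaces the whole login-and-navigate sequence (URL is hypothetical).
        driver.get("theapp://home/alice/mySecretPassword");
    }
}
// One-time, per-device animation disabling on Android (run from a shell, not Appium):
//   adb shell settings put global window_animation_scale 0
//   adb shell settings put global transition_animation_scale 0
//   adb shell settings put global animator_duration_scale 0
```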
We've already made so many improvements to our Appium scripts, and I'm sure once we apply them, things will be much better. But we can still go a little further: we can make our tests more reliable. So what do I mean by that? Any application in a typical setup has a bunch of backend services running. At least at Gojek, we have a huge microservices-based architecture, so we have tons of different APIs owned by different teams. Ultimately it all gets wrapped together, and when you do something on the app, behind the scenes we might actually be hitting 10 different APIs. Now, what if some dev decides to deploy during the day? Or let's say they have downtime on their environment while you are running your test suite. Everything will fail, right? It's not the test's fault, it's not Appium's fault; it's a problem with your infra. So what can we do to resolve this?

There is a common enough technique called mocking, and I'll try to illustrate how that works here. Here you have a very simple diagram for a popular Java mocking library, MockServer; I have the link there. Essentially, you set up a mock server running either on a machine you control or in a Docker container, and it gives you the ability to set up expectations. Say I want to do login: if I provide the correct username and password, give me the login response back, and based on that I verify my screen. So we can essentially say: I don't want to hit the live API, I want to hit this mock API and get the data into my app so that I can check whether my app looks fine. All these requests can be set up on such a mock server, and with some more intelligence you can configure it so the same URL gives you a success response, a failure response, or even edge conditions, which I know are very difficult to really simulate in a live environment. So you get the benefit of that additional coverage also.

How do you make sure your app knows whom to talk to? Here you need a little bit of dev help. You need a debug build in which you either have a screen where you can enter the mock server's host name and port, or some command-line interface to set that. As soon as you do that, your app talks to your mock server, and a large portion of your tests becomes much more reliable. You might also need the ability to start and stop the server, but that is under your control, because you are setting it up yourself.

There is one more thing we can do to improve reliability even further. Let me take a ride-booking example. If you're making a booking, once you find the driver, there is a whole suite of screens that you see, and let's say I want to check those. In this case, a typical approach might be to use deep links to create a booking directly, then go to the screen and verify it. But you can enhance this even further: simply use the creation APIs you might have for bookings, make the booking via the API for the user you care about, and then, using either deep links or the app itself, land on the screen you want to verify. That way you cut out all the steps needed to set up that data, and you increase your reliability even more. So the biggest takeaway here: try looking into mocking your APIs, and use APIs to set up your data. We are on a roll: we are done with four, just two more to go.
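For illustration, a minimal MockServer expectation for that login case might look like this (the /login path, payloads, and port are all made up for the example):

```java
import org.mockserver.integration.ClientAndServer;
import static org.mockserver.model.HttpRequest.request;
import static org.mockserver.model.HttpResponse.response;

class LoginMock {
    public static void main(String[] args) {
        // Start a MockServer instance that the app's debug build points at.
        ClientAndServer mockServer = ClientAndServer.startClientAndServer(1080);

        // Happy path: the expected credentials get a canned success response.
        mockServer.when(
                request().withMethod("POST").withPath("/login")
                         .withBody("{\"username\":\"alice\",\"password\":\"secret\"}"))
                  .respond(
                response().withStatusCode(200)
                          .withBody("{\"token\":\"fake-token\"}"));

        // An edge case that is hard to hit against a live backend:
        // any other login attempt gets a 503.
        mockServer.when(request().withMethod("POST").withPath("/login"))
                  .respond(response().withStatusCode(503));

        // ...run the UI tests against it, then shut it down.
        mockServer.stop();
    }
}
```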
So I want to ask you a question. What do you do when your Appium tests fail? What's the first thing you do? Well, I think what I used to do was say it's Appium's fault: it's buggy, I don't know what this tool is doing, right? That might be the typical reaction, but let's not end there. As engineers, you owe it to yourselves to dig a little deeper. Appium has really nice support around logging: it captures whatever actions you are doing via the client as proper logs. So I would encourage you, whenever you see a test failure, to read your test's exceptions in detail.

There are certain strategies you can follow. What I would typically do when my tests fail is go to the start of the session log and see what desired capabilities I have passed. Do they match the scenario I'm trying to execute? That gives me a hint whether I have done something wrong in the setup itself. The next thing is to go to the bottom of the log to see the last thing Appium actually did. Like the architecture diagram I mentioned, the log tells you whether it's at the debug layer, at the Mobile JSON Wire Protocol (mjsonwp) layer, or somewhere else, so it already tells you at which layer of the stack the problem is.

Let's take an example. What I was trying to do was run a test on a device which was not even connected. In this case, Appium typically gives you an exception trace like this one. You can see a lot of debug entries where it's trying to kill the adb server, could not find the devices, and is restarting the adb server, and in the end it times out at mjsonwp and tells you it could not find a connected Android device. If we have the visibility to look at this log first, we can find the real problem rather than saying "for some reason Appium is not working." It quite often happens that the library is going through multiple iterations, and you might think the new version of Appium is not stable enough, but these kinds of cases help you dig deep and figure it out. Additionally, you can check the logs for WARN or ERROR entries; that's a good starting point.

Beyond this simple debugging, at least on Android there is a logcat log being written continuously, and Appium provides you a method to fetch those logcat logs. You can take a look at them to figure out whether your app actually crashed, or what happened. This is also good information to share with your devs when a functional test fails. So the biggest takeaway here: when in doubt, go through the Appium logs and the Android logs, and try to figure out what the meat of the problem really is. Okay, we are done with five, and we are doing quite well on time.
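Fetching those logs through the Java client is essentially one call; the "logcat" log type is Android-specific:

```java
import io.appium.java_client.android.AndroidDriver;
import org.openqa.selenium.logging.LogEntries;
import org.openqa.selenium.logging.LogEntry;

class LogcatDump {
    // Dump the device's logcat when a test fails; crashes and stack traces
    // in here are gold to share with your devs.
    static void dumpLogcat(AndroidDriver driver) {
        LogEntries entries = driver.manage().logs().get("logcat");
        for (LogEntry entry : entries) {
            System.out.println(entry.getTimestamp() + " " + entry.getMessage());
        }
    }
}
```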
So there is only one last thing: some best practices and tips that I have seen from my experience. As you know, Appium is just a tool, an API for you to accomplish automation on your app; it doesn't really come with out-of-the-box reporting and all that. But that is something you can build out. If you look on the right-hand side, this is a dashboard we have within Gojek, built by one of the consulting services companies we work with. We follow Cucumber within our company, so it tells us what the actual steps are; if something failed, it gives us a stack trace, equivalent to the Appium log stack trace, and it gives us a history of our test runs. This is quite important, because the whole topic we are talking about here is the reliability of our tests, and it's very important to build a reliability profile of them. How are you going to know, if you're not even storing these results somewhere? If you're just going build by build, you are going to lose all that data.

So the key best practice I've observed is: store your test results in some persistent storage. It can be MongoDB, it can be Postgres, whatever works best for you; keep storing them historically so that you capture this data. The next step is to visualize it in some dashboard, which can either be something custom that you build, or one of the many reporting frameworks available in the market. This is one straightforward thing you can do to make sure you have a way to analyze your data, and you kill flakiness right at the root.

But the bigger, more challenging thing is the cultural change. As testers or QA engineers, what you might typically see is that devs are more focused on building out their features, and they will not give testability sufficient care in the beginning; they might not build testability into the app. That is where you, or we, actually have to take that ownership: get involved right in the process when the features are being made, and make sure they get into the habit of building in testability. By testability I basically mean accessibility IDs, because that's the most stable thing you can do to have headache-free automation. So the key takeaways here: build testability in from the start, and have a way to visualize your results.

Oh, we are done with everything. So I guess we will not have any more flaky tests, no? Okay. So, how did I come up with this talk? I would highly recommend you go through the Appium Pro newsletter by Jonathan Lipps. It's really helpful; it explains all of this in much more detail than I could here, and it will give you a kickstart and increase your Appium knowledge a lot. There is also a webinar on the same topic by Jonathan, which I think he did with HeadSpin, and there is also a talk by Wim about MockServer. So that's all; if you have any questions, we can take them now.

Hello, this is Aldo. I want to ask about mocking the API. In real cases, we also want to know whether the API is returning the true response. So does the backend at Gojek provide some endpoint to, say, roll back the transaction? Like when I want to buy some food, does Gojek provide a dummy driver in real production, and when the transaction is made, we just call an API to roll the transaction back?

Okay, so the strategy that we really follow is to respect the test pyramid. Below the UI tests you have the API tests and the integration tests, and when you ask whether the API is doing its job correctly or returning correct data, that is the purview of API tests. We invest a lot into writing API tests to make sure your backend services are tested thoroughly for all your positive and negative scenarios, and at the UI level we try to minimize the number of tests we are writing. That approach, coupled with using a mock server, automatically makes your tests much more reliable. But what we also do is keep a very small subset of tests which are more end-to-end, so that you do hit the live APIs once in a while and have a suite like that. It just doesn't need to be the majority of your suite. Does that answer your question? Okay, thank you.
I want to ask about mocks as well. You said that we have tons of APIs; how can you manage all the responses, and keep up when the contract changes, so that we can test the real response of the backend itself?

Sure. So again, that is not a problem for UI automation. There is something called contract testing that you can do using a lot of libraries; Pact is a very common one. Let's say your app talks to five services: you can write contract tests for them which hit those APIs and validate whether the response has changed. Those tests give us a hint whether there are contract changes, and if there are, we obviously need to update our mocks. That is an overhead you take on to get the benefit of reliability.

So if the contract tests fail, we should update the mock responses, right?

Right. Let's say the contract tests fail for illegitimate reasons: no changes needed. If they fail for legitimate reasons, we update our automation accordingly. Essentially, it's just an update of what response you spew out for a changed contract.

Okay, one more thing. You said that we could start the activity on Android; for iOS, is it possible as well?

That is something I'm not aware of, and in our company that would require a lot of hooks built into the iOS app; it needs a lot of support from your devs. But I would definitely check this out and get back to you if you can share your details.

And one more thing: on Android, when we're running the test, is it possible to launch a specific activity in the middle of the process?

Yes, you can do that anytime. Even within Gojek, we have multiple apps which talk to each other in conjunction, like the consumer app and the driver app. So if we have to do an end-to-end test across apps, we do launch the other app using that startActivity mechanism.

Okay, great. Thank you.

You're welcome.

But one more: it depends on how the apps work, right? Maybe when we change the activity, the state will be gone. It depends on the developers themselves, right, to manage how the application works?

So what you would need is for the device you're running on to have both apps installed. Let's say you're going through your flow on the consumer app, making some bookings; once you hit that startActivity step for the driver app, it is going to launch that app and bring it to the forefront, and then you can execute whatever remaining tests you have. Though I would recommend not having too many tests like this, because they will be super flaky. What I would rather recommend is: if you want to test across apps, set up the state via APIs, as I described, and verify it directly on the consuming app.

Okay, sure, thanks.

Yeah, hi, my name is Sumit. My question is: you mentioned we should save the reports in some MongoDB server. What if we are running the tests on CI? Won't it be an overhead to have a server running and to save all the reports in a DB?

It is, but then you get a lot of benefits out of it. Even within Gojek, we have 18 different products, and all of us use a single MongoDB server with our own DBs at the product level. So what we do is maintain the results for about a month, and then we have a cron job that purges whatever historical data is older than that.
You might not care about data that is more than a month old, right?

Yeah, okay. Are there any other questions? No more questions, I guess. Okay, thanks, Gaurav.

Yeah, if not, thanks a lot for your time. I'll be around.