Hi everyone. Welcome to the session "Under the Hood of WebDriverAgent" by Mykola. We are glad that he could join us today for this incredible session. Without further delay, I'll hand it over. Mykola, you can start your presentation.

Okay, thanks. So yeah, let's start. Thank you for attending the session. Nice to see you here. I hope everyone can see it. My name is Mykola Mokhnach, or just Nick for short. I am working at Sauce Labs as a senior backend developer, and I have been contributing to Appium since 2014. My areas of expertise are mobile automation, desktop automation, and also Appium architecture.

Today we would like to talk about WebDriverAgent: what it is, where it comes from, and why it exists. It's quite a big piece, because initially it was developed by Facebook. I would also like to say thank you to the people who worked on it initially, for example Marek Cirkos and Lawrence Lomax. Big thanks to you guys. Facebook does not support WebDriverAgent now; they moved it to the archive and replaced it with another project. But we in Appium continue contributing to it and supporting the project, because for what Appium is doing it is super important. Without it, we probably wouldn't be able to automate iOS at all.

Today we would like to talk about something that you would probably not hear anywhere else, because this is low-level, private stuff that is subject to change. When you see presentations from XCTest developers, you usually see something that is public. Here we will talk more about the private stuff, which is hidden under the hood, as the name of this presentation says. 
And we will also talk about how this works, and especially how it will help you to improve your automation scripts: how you can understand what is happening in order to make them better, and what is important in order to implement them in the best possible way. So yeah, let's start.

This is the content of our presentation. The first part will be about accessibility and why it is important for us. As you know, all the XCTest stuff is based on accessibility, and that is not the only reason it matters to us. The second part will be about how WebDriverAgent itself works, its architecture, and how it fits into Appium. Then we will be talking about idling, what it is and why it is needed. And the last part will be about XCUIElement and the stuff behind it.

So yeah, the first one is about accessibility. As you can see here, almost everything that WebDriverAgent does on the device screen is done through the accessibility framework. Why are we talking about this? Because accessibility was not initially created for automation. Its main purpose was to help people with impairments interact with applications, and not only mobile applications but desktop applications as well. For example, there is an application called VoiceOver. If you cannot see something on the screen, you can enable VoiceOver and it will read the content to you. What it reads is what the accessibility framework provides to it. That's why, when you develop your application, it's important to have this properly designed: it helps a lot for those people who cannot use your application the way most people use it. They have some limitations, so they have to rely on accessibility, and this actually helps them. 
And I'm super happy that right now Apple is also very careful with this topic, and they have implemented many things to help with that and to improve that. So if you want to make a good application, pay attention to accessibility, so that it is accessible and all your customers can work with it in the best possible manner.

The first topic about accessibility is that you need to assign proper identifiers to your elements, because identifiers are very important when screen readers or other applications go through your application and read it to people. So when you select identifiers, think about what to put there. For example, identifiers like "BTN Close" or "ID OK" or "dontclickme" as one word probably won't be the best choice if people are hearing them, because they are very cryptic, especially for people who are far from the topic. When the screen reader reads something like "ID OK", it's probably not the best way to understand what your application is doing. So think carefully about what identifiers you select.

The same goes for structure, because accessibility is not only about particular controls but about all of them together. If you want your application to look right to the screen reader and you have, let's say, custom controls there, then think about making those controls accessible. That means you have to design them properly as containers, put the right properties on those container controls, and so on. Then the accessibility manager can distinguish them properly. And when you do this, Appium will be able to distinguish them as well, because Appium gets the whole hierarchy exactly from there, from accessibility. 
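As an editorial illustration of the naming advice above, here is a minimal Python sketch of a check that flags names a screen reader user would find cryptic. The `looks_cryptic` helper and the `JARGON_TOKENS` list are hypothetical and not part of Appium, WebDriverAgent, or any Apple API; they only show the idea of preferring "Close" over "BTN Close".

```python
import re

# Hypothetical list of developer-jargon tokens; extend it to match your own code base.
JARGON_TOKENS = {"btn", "id", "lbl", "txt", "img"}


def looks_cryptic(name: str) -> bool:
    """Return True when a name is empty or contains a jargon token such as 'btn'."""
    # Split on whitespace, underscores, and hyphens, ignoring empty pieces.
    tokens = [t for t in re.split(r"[\s_\-]+", name.strip().lower()) if t]
    if not tokens:
        return True
    return any(token in JARGON_TOKENS for token in tokens)


for candidate in ["BTN Close", "ID OK", "Close"]:
    verdict = "cryptic" if looks_cryptic(candidate) else "fine"
    print(candidate, "->", verdict)
```

A token list like this cannot catch everything (a run-together name such as "dontclickme" passes it), so it is no replacement for reading the names aloud the way a screen reader would.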
So if you see your tree in, let's say, Appium Inspector, and you are wondering why some element does not look the way you expect, then check your accessibility properties. Maybe this is the source of the problem and something there needs to be changed in order to fix it.

And last but not least, guidelines. Apple, like I said, is now very good at supporting application accessibility, and it has many nice guidelines about how to design it properly, with code examples and all the other nice stuff. Just read them and follow them, and it will already be a big improvement for your application.

So now we will talk about how WebDriverAgent works. As you can see, WebDriverAgent is not a standalone application. It's an important part of the Appium infrastructure. There is a whole chain of components that talk to each other in order for your automation script to work properly. Appium itself is not a monolith. There are several important components inside of it that talk to each other. In the end, you can write your script in different languages and you don't see all this complexity. Appium hides it. On the client side, you just have nice APIs that you can use to automate both iOS and Android, and possibly other platforms too, using the W3C API with some extensions.

Here we can see what the WebDriverAgent application itself includes. On the back side there is the XCTest API. This is the main API that it uses in order to talk to the device primitives. In order to click a button or do any other action in your application, it calls exactly this XCTest API, which Apple is actively pushing you to use in your tests. WebDriverAgent is not special from this perspective. It uses exactly the same API. But what makes it special? 
It uses this API and a little bit more that is hidden from you by default in XCTest. And basically what WebDriverAgent is, is this main transformation core. WebDriverAgent makes it possible for you to talk to the device by calling the WebDriver REST API, and internally it transforms those REST API calls into XCTest calls. You might not even know that XCTest exists and what it is doing, but WebDriverAgent does this for you. If you call some action from here, it knows how to transform this action properly so that XCTest then executes it properly.

On this slide, we can see how WebDriverAgent itself fits into the Appium infrastructure. On the front side we have the Appium XCUITest driver, which encapsulates the high-level API. It wraps WebDriverAgent and adds some custom stuff. What exactly the XCUITest driver contains, what comes from WebDriverAgent and what does not, could be the topic of a whole separate conversation. What you should know is that it covers the high-level API.

On the next step, we have the Appium base driver. The base driver is responsible for routing and proxying all this nice stuff that is coming from the XCUITest driver.

On the next stage, we have the appium-ios-device module. This is also something special that exists only for iOS and only for real devices. Why? Because the protocol that is used to talk to a real device is also private stuff. Appium does not talk to the device using its network address. Instead, it proxies all the network requests through the machine where Appium is running. This is what this library is for. It helps you with this so-called port forwarding, which is probably the same thing that ADB does on Android. 
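As an editorial illustration of this "transformation core" idea, here is a minimal Python sketch of a route table that maps WebDriver REST calls to actions. The two paths follow the W3C WebDriver element click and send-keys endpoints. The `tap`, `type_text`, and `dispatch` functions are hypothetical stand-ins that only return strings; the real agent is written in Objective-C and invokes XCTest on the device.

```python
import re


def tap(element_id: str) -> str:
    # Hypothetical stand-in for tapping the resolved XCUIElement.
    return f"XCUIElement[{element_id}].tap()"


def type_text(element_id: str) -> str:
    # Hypothetical stand-in for typing text into the resolved XCUIElement.
    return f"XCUIElement[{element_id}].typeText(...)"


ELEMENT = r"/session/(?P<session>[^/]+)/element/(?P<element>[^/]+)"

# Each entry: HTTP method, compiled path pattern, handler to run.
ROUTES = [
    ("POST", re.compile(ELEMENT + r"/click$"), tap),
    ("POST", re.compile(ELEMENT + r"/value$"), type_text),
]


def dispatch(method: str, path: str) -> str:
    """Find the handler for a REST call and run it with the element id."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if method == route_method and match:
            return handler(match.group("element"))
    raise LookupError(f"unknown command: {method} {path}")


print(dispatch("POST", "/session/abc/element/42/click"))
```

The point of the table is that the client only ever sees the REST side; which XCTest call sits behind each route is a detail the agent owns.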
And then we have the device, where this WebDriverAgent stuff is running. WebDriverAgent itself also talks to the testmanagerd service that is running on the device. This service is responsible for executing all the XCTest primitives. We will talk a bit later about how this interaction between WebDriverAgent and testmanagerd works.

So, just for you to see how this all works and how long the path is: let's say on the client side we call element click. In Appium itself there is this umbrella driver, which finds out which driver should execute this particular call. It then calls the XCUITest driver, and the XCUITest driver calls the base driver. In the base driver there is this proxy, which calls the appropriate API from WebDriverAgent. The appium-ios-device library sits in between and makes it possible to forward your request from the base driver to WebDriverAgent, which is running on the device itself or on the simulator, but either way it's like a separate machine. Then there is WebDriverAgent itself, which transforms this call and executes the XCUIElement tap. Internally, XCUITest passes this call to testmanagerd, and testmanagerd does its magic. Of course, it's not a public API. It's private Apple stuff, which might also be subject to change. That's why there is a question mark here. We don't really know what it is doing. We assume that it just synthesizes some internal system kernel event and thereby simulates the tap.

The WebDriverAgent API has several important categories. There is the WebDriver API itself. There is the management API, with endpoints such as health or shutdown that allow you, of course, to check the health of WebDriverAgent and to shut it down.
There are some custom APIs, the so-called extensions. Let's say hide keyboard: it does not exist in the W3C protocol, but nevertheless it's very useful for devices. And there is also what is called the screen recording API. This API allows you to get an MJPEG stream from the device, which is just a stream of screenshots, so that you can nicely mirror the device screen wherever you want to show it.

The next topic is to be or not to be idle. Why is idle so important for us? Idle is super important for WebDriverAgent, and for XCUITest in particular, because if the application is not idle, then XCUITest has to wait until it is. Here we see, let's say, the timeline of our application, and the application is doing something on the main UI thread. While it is doing something, XCUITest has to wait until this is done. Internally, it has special procedures that verify that the application is not doing anything on the main thread. For XCUITest, not doing anything means not executing any actions and not running any animations, because animations are also done on the main thread in order to draw everything. So it waits until that is done, and only after that does it execute the accessibility actions.

And what happens if there is no space for idle? It does not know how or where to execute. So it waits, waits, waits, until the wait actually expires. This is the main reason for many complaints that XCUITest is slow or does not execute actions in time. Your application probably just does not give it a chance to execute those actions, because those things are taking too much time. And unfortunately XCUITest waits, waits, waits, and then the wait expires and it probably fails. 
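As an editorial illustration of this waiting behavior, here is a minimal Python sketch of a wait-for-idle loop with a timeout. It is not how XCUITest is implemented internally (that part is private). The `wait_for_idle` function, the `FakeClock` class, and the timeout values are all assumptions made for the example, and the fake clock is there only so the sketch runs instantly.

```python
from typing import Callable


def wait_for_idle(
    is_idle: Callable[[], bool],
    timeout: float,
    poll_interval: float,
    now: Callable[[], float],
    sleep: Callable[[float], None],
) -> bool:
    """Poll until the app reports idle; return False if the timeout expires first."""
    deadline = now() + timeout
    while True:
        if is_idle():
            return True
        if now() >= deadline:
            return False
        sleep(poll_interval)


class FakeClock:
    """Simulated time source: sleeping just advances the counter."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


# An app whose animation ends after 3 simulated seconds becomes idle in time.
clock = FakeClock()
finished = wait_for_idle(lambda: clock.t >= 3.0, 15.0, 0.5, clock.now, clock.sleep)
print("animation ends:", finished)

# An app with an endless background animation never becomes idle, so the wait expires.
clock = FakeClock()
finished = wait_for_idle(lambda: False, 15.0, 0.5, clock.now, clock.sleep)
print("endless animation:", finished)
```

In the second case the caller has burned the full timeout before learning anything, which is the slowness people complain about.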
So here we can see why it happens, why your application is not in idle mode. There might be some background animation that never ends, or some internal server polling, or something else that prevents the application from being idle. XCUITest then cannot interact with accessibility. This creates a sad scenario, which is the case on Android as well. Take UiAutomator2, for example: it's exactly the same. It has to wait until the application is in idle mode, because while the application is doing something, it cannot properly execute those actions. It does not know whether this is the proper time to do it or not.

And these are the idle intervals. By default, it's 60 seconds in vanilla XCUITest. For us, it's 15 seconds. If we tweak XCUITest and try to execute our action anyway, we don't know what happens after that. We hope that it will be OK, but all the synchronization and the other things happening under the hood are unknown to us. It's a matter of pure luck whether this is going to succeed or not.

So yeah, I'm not quite sure, do we still have time for the last topic, or should we stop here?

We can quickly go through it if it's possible. We have a minute, or if there are any questions, we can take those.

OK, very quick, one minute. Unfortunately, the time is super limited and the topic is really wide. I could talk about this the whole day, because there are really many things to talk about. So here we quickly show what is under XCUIElement. The idea is that there are also element snapshots. A snapshot is a capture of the application interface at one particular moment in time, and there might be multiple snapshots for a single element. 
So the element itself, the XCUIElement thing that we have and that corresponds to the element in our scripts, at least in WebDriverAgent terminology only includes the query itself, that is, how to find this element. It is resolved when we actually try to access any of its properties or execute any action on it. Then it gets resolved to a snapshot, and the snapshot itself is actually a pointer, a container for the corresponding accessibility element. All those properties are stored in the snapshot. So the label, the value, and other stuff are in the snapshot. And what's more important, in the snapshot we have the hierarchy, like parents and children.

A few words about queries as well. Here you can see how those queries are constructed. All queries contain a set of transformers. And this is about query resolution: when queries are resolved, they have to be bound to the actual element. These are the possible calls that allow you to bind the element to the snapshot. There are many other things about this that I could talk about, but unfortunately right now we don't have time for this. I hope you will be able to ask all your questions later, because I will be in Hangouts afterwards. So thank you for your attention. And yeah, let's switch to questions and answers now.

Okay. So there is one question, or rather sort of a statement, in the chat. You can read that out, Mykola.

I can click on the chat. Sure. Most of us exhibit. Should I read it aloud or?

Yeah, I think it's visible to everyone. So you could just read it or answer it, I guess.

So what I meant is: don't use something like "BTN Close", because if you name things like this, it would be super hard for people using your application, like visually impaired people, to understand what "BTN Close" is. 
So just use the strategy that is described in the Apple guidelines about how to assign those accessibility names. Just as a very simple example for you: give it a name that would be clear to any human, not a robot, who is working with your application. This is the main rule of thumb. Let's say you have a button that closes something: just name it "Close". Then, when VoiceOver reads the name of this button, it's clear to the person using your application that this is the button to close it, not "BTN" or something which is completely unclear for people who are far from all this computer science stuff. This is the main idea here. Okay, I answered the question.

Yeah. Okay. I think that's about all the time we had. Thanks a lot, Mykola, for sharing your experience. It was wonderful.