Thank you all for coming this afternoon. My name is Isaac Murci. I'm an ecosystem developer at Sauce Labs and one of the core developers on Appium, which is what I'm talking about today. My title is "Selenium in the Palm of Your Hand," and I'm going to discuss what Appium is, its architecture, what it's doing under the hood when you send your test commands in, and then a little bit of a discussion about the trials and tribulations, the easy parts and hard parts, of making a system like Selenium, built to drive web browsers, do things that are not driving web browsers; in this case, driving a mobile phone. So, how many here have a cell phone? As expected. How many people have a smartphone that takes apps and allows you to modify it on the fly? So as you see, most people, if not everybody, has one these days, and most people seem to be making some sort of app that does some sort of random thing, like allowing you to say "yo" to your friends. And so there's a need to make sure those things do it right. You don't want to say "oi" to your friends when you mean to say "yo". So, do we need to test our apps? Here's a hint: the answer is yes. And on what devices? All of them, preferably, but at least the ones that people are using most. And that brings us to: how do we test things these days? What do Apple and Android want you to use to test apps that run on those platforms? And unfortunately, that is a problem. You all know that devices are fragmented: you've got your iPhone 4, 5, 5C, whatever they've come up with these days, and soon to be 6, I guess. But the test tools are also fragmented. So for Android, you might be able to test one thing with API level 19 and not on 18, and so on and so forth. You may need a completely different tool if you're using API level 16. So if you're using their systems, you end up with tests which have a whole bunch of switch statements on the build version. You say: is it build level 19? Yes, then I can test all these things.
If not, I've got to test these other things instead, and that sort of thing. So: basically, poorly supported automation options from the vendors of the phones themselves. Apple provides the iOS UI Automation system, which only allows you to test on iOS phones and iPads. It's JavaScript, and it's rather arcane JavaScript; even if you're good at JavaScript, it's not necessarily true that you would understand what you're trying to do with the UI Automation library. And it is ridiculously badly documented. In fact, it's probably the worst documented system I've ever had the misfortune of using. And finally, it's not scalable. It's not scalable as a testing tool, in terms of trying to write tests: it doesn't engender writing extendable tests, or keeping things simple, keeping things unrepeated. It rather makes you repeat things. It makes you copy and paste and do awful things. And it's also not scalable in terms of running it. Apple only allows you to connect one device to one computer and automate it at one time. So if you want to test in parallel with five devices, you need five Macs sitting next to each other, or five virtual machines on one Mac, which is less than ideal. Android has much the same problems. Their system is called UI Automator, just to be confusing, so you have to keep track of UI Automation and UI Automator. It only works for Android; you can't write tests which work on any other system. It's Java, and fairly idiomatic Java: if you know Java, you can write these tests just by looking through their documentation. It's Java that gets pushed to the device, so the device gets dedicated to that test while it's running, and only then can you put another test on there. So it's not scalable in that way, although you can have lots of devices connected to one computer, if you've got lots of USB ports or Wi-Fi connections.
So both Apple and Google provide rudimentary tools which lock you to their systems and only allow you to test in the very limited ways that they have seen fit to think you might want to test. And these things are fragmented and make it a pain to maintain tests and to write tests for the new device as it comes out, that sort of thing. So there is this motivation: we need an open framework. We need something which people can actually understand and delve into, that can abstract away what is going on in both these systems, so you don't have to know the arcane syntax of UI Automation code in order to test your iOS apps. And it would be nice to be able to use languages other than Java and JavaScript. Even though those are popular, they're not the only languages out there. Which brings us to Appium. Appium is essentially Selenium plus phones. That's all it is. Appium is a Selenium server with a few extra commands attached to it. For the most part, and increasingly as the mobile JSON wire protocol spec and the W3C standard get ratified, it becomes more and more standard Selenium. And that was the goal: to use this standard which already existed, and allow people who already know how to test using Selenium to jump in and test mobile apps with as little headache as possible. I know, if you've used Appium, there are quite a few headaches, but still. It's open source. All the code is available, except for the underlying UI Automator and UI Automation code which it relies upon. I did some stats this morning: there are almost 5,000 commits in the project, and the project is only 20 months old, so that's fairly good, with 123 contributors. So there's a lot of activity going on. There are a lot of people making it better, and everybody is welcome to make it better if you want to. The code is there, and routinely people send in pull requests and get them put into the code. So it's under active development.
5,000 commits in two years, basically. Sauce Labs now has four people who are basically dedicated to working on it, and then there are many people outside of Sauce Labs. And there's a very active community: lots of people asking questions, answering questions, writing blog posts and telling other people how they've done what they've done. Which is good, because the people who build the system are not themselves testers. They use the system to make sure it works, but as testers, you guys know what you're trying to do better than we know what you're trying to do. So the community involvement is very important. As a system, it is almost all JavaScript. The server is in Node, so it's easy to play around with. And then there's a Java portion which is pushed onto the Android phones to allow tests to be run there. The architecture of the system should be pretty familiar to anybody with a rudimentary understanding of Selenium. There is a client side and a server side, sending things through the JSON wire protocol, and the server side interacts with the devices in order to drive them and make them do what you want them to do. The actual client-side code is tiny. It's built on top of the Selenium clients, so you can do most things with the Selenium clients by themselves; until a few months ago, that was all there was. Everything that Appium did was through the Selenium clients. The Appium clients are basically little wrappers with extra methods. And as the spec gets ratified and Selenium moves towards versions 3 and 4, those methods will be pushed back into the official clients, and the Appium clients will get smaller and smaller, and just include little extra methods that make it easier to do certain things. Because the spec was trying to be generalizable, you can do gestures, but you have to say "I want to go from here to here and then to here" rather than just doing a swipe.
So the Appium clients will include methods that just abstract away the math behind all that. On the remote side, it depends upon UI Automator and UI Automation for the most part. For certain versions of Android, it requires Selendroid and just kind of proxies to Selendroid. But most of what the server does is take what gets passed in from the Selenium calls, which is this kind of abstract data, and massage it into what one of the underlying frameworks expects. So when you do a gesture, Apple always expects absolute coordinates, but the spec says that you're passing in an element and relative coordinates. So Appium has to figure out where the element is, work out the absolute coordinate that you actually want, and then pass it to Apple. And Apple does the heavy lifting of actually driving the device. This slide demonstrates that I have no idea how to use a computer to draw, but it is the overview of the architecture of the whole system. So what is going on when you send in a command? One of your Selenium commands gets sent as a JSON object over the JSON wire protocol to the Appium server, and the Appium server turns the data into what is expected. First, on the iOS side, there are certain things to do with setting up the device that you want. There's a lot of writing plist files and making the environment right, and then launching your device or launching your simulator. Those get done without actually hitting the device. Then, when it first launches, the system loads a test onto the device. The test is just standard UI Automation code, but it has a loop in it. It sits there looping, reading a file, trying to read a queue. Then, when a command comes in from your test, all that Appium does is push that command onto the queue.
And then the test, which sits there infinitely cycling, gets the command and runs it as UI Automation code, as if it were just going through a file the way Apple would like it to. So it's kind of a hack on what Apple built the system to do. Similarly, on the Android side, there's a test, just a bunch of Java classes written the way Google would like you to write them, that sits on the device and opens a TCP connection. It basically becomes a server, and the Appium server sends commands through TCP to that device. So it sits there as a server on your phone or on your emulator, grabs the instructions as they come in, and executes them on the device. So again, it's taking what Android wants you to do as a tester and extending it to allow infinite commands to come in, and commands to be changed, rather than just whatever you put into your class and deployed there. And then there are also a lot of things which manipulate the phone that are not part of UI Automator. We use Telnet and ADB to get into the phone and do things like changing network connections and changing the location, that sort of thing, which for some reason neither Google nor Apple figured you'd ever want to do as a tester: mock your location, that sort of thing. So that is the overall architecture of the system, and a general overview of what is going on when you write a test. Hopefully it makes more sense now than the black box that Appium can appear to be. It can be kind of complex when you're using it, but if you understand the parts and the workflow, then the errors that you get, the kinds of problems that might exist as you run your tests on Appium, begin to make more sense. So at this point, we're moving topics into the development of Appium, and the problems, the good things and the bad things, that have happened along the way, and how that lets us envision what Selenium can do now that it is not tied only to driving web browsers.
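To make that loop concrete, here is a toy sketch of the queue-polling idea in plain JavaScript. This is not Appium's actual bootstrap code: in the real system the queue is backed by files and sockets and the commands are UI Automation calls. Every name here (`CommandQueue`, `runLoop`, the handlers) is made up for illustration.

```javascript
// Toy sketch of the Appium iOS bootstrap idea: the server pushes
// commands onto a queue, and a test running on the device loops
// forever, popping and executing whatever arrives.

class CommandQueue {
  constructor() { this.pending = []; }
  push(cmd) { this.pending.push(cmd); }    // the Appium server side
  pop() { return this.pending.shift(); }   // the on-device loop side
}

// The "test" that the device runs: poll the queue, execute commands,
// collect results. `handlers` stands in for real UI Automation calls.
function runLoop(queue, handlers, maxTicks) {
  const results = [];
  for (let tick = 0; tick < maxTicks; tick++) {
    const cmd = queue.pop();
    if (!cmd) continue;                    // nothing queued yet: keep polling
    const handler = handlers[cmd.action];
    results.push(handler ? handler(cmd.params) : { error: 'unknown command' });
  }
  return results;
}

// Hypothetical handlers standing in for actions on the device.
const handlers = {
  tap: (p) => ({ status: 0, tapped: p.elementId }),
  getSource: () => ({ status: 0, source: '<AppWindow/>' }),
};

const q = new CommandQueue();
q.push({ action: 'tap', params: { elementId: 'loginButton' } });
q.push({ action: 'getSource', params: {} });
const out = runLoop(q, handlers, 10);
```

The Android side is the same shape, except the queue is replaced by a TCP socket that the on-device server reads from.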
So when Appium was conceived and the first versions of it were built, all there was was WebDriver. Selenium didn't have any notion of there being mobile testing out there at all. And that standard, though it's a great standard, and a lot of thought was put into it, and it is very logical for driving web browsers, is ill-fitting for other things, like driving a mobile device. There are lots of things that you do on your mobile device that you don't do with your web browser, and vice versa. So they were an unhappy marriage, basically. The first thing that had to happen was to look at the things which are logically the same and map those to commands in Appium. There are lots of things that you can do on your phone that you can also do on a web page, logically. Finding elements, for instance: finding an element in an app is basically the same as finding an element on a web page, that sort of thing. Then there are things which, while not entirely fitting, are logically of the same order and can be repurposed. So, for instance, mobile apps can be web apps that are just running in Safari or Chrome or one of the other browsers; they can be native apps, just straight-up apps for your phone; or they can be hybrid apps, a native app that has a web view inside of it which loads a web page and uses that. In the last case, hybrid apps, you have a context in which you are dealing with the mobile world, but once you're inside the web view, all you have is a web page. So what you're doing is driving the web. You are doing what Selenium does well. So there needs to be a way to switch between them. And in the early versions of Appium, this was done by repurposing the window commands. What you do in WebDriver for switching windows in your web browser became what you do to switch from a native app to a web view, which kind of fits, but it's also semantically confusing. It's not particularly good. And then finally, there are commands that just don't make any sense.
So outside of a web view, in a native app, getting a URL doesn't make any sense whatsoever, and it can't really be repurposed to get something else; try to imagine what that something else would be that you'd want to get in a generalized way. So those became not-implemented methods. If you were to call them, you'd be told: you're not in the right context for this to happen, so rewrite your test and figure something out. So in the first versions, before there was work on bringing mobile into Selenium officially, all the commands we could use were the commands that Selenium had, and some were just a little bit messy. Moving on to who-knows-what Selenium (we were assuming it was Selenium 3 until Friday; Selenium 4, Selenium 5, whatever version has the W3C standard in it and the mobile JSON wire protocol implemented in it): that is what Appium implements, basically. Appium implements everything, as it stands today, in those specs that makes sense for mobile devices. There are lots of things which WebDriver does that don't make sense, and those are not implemented. I'm not saying it's a complete thing, but as it stands, it implements the new standards that are coming out. So mobile-specific commands have been, as much as possible, made into actual endpoints. They are official now, and part of this mobile JSON wire protocol, which is slowly, slowly being worked on and ratified. And then also, the W3C standard has a provision in it for vendor-specific endpoints, so the official Selenium endpoints don't get muddied, don't get all filled up with things that a particular server might want to do. Since now, a particular server is no longer just the Selenium server jar, but ChromeDriver and Selendroid and Appium. So the various things which all of the people who are writing the spec could not agree upon will go in there.
So Appium supports things like unlocking your phone, simulating a shake, that sort of thing, which are just never going to make it into the specification for one reason or another, and they will reside in the vendor-specific section of the protocol. So, taking it apart a little bit: we've discussed these mappings already, but just to go into a little more detail, there are recognizable WebDriver actions when you're dealing with automating a mobile device. As I said, element location. Although, if you think about it logically, there are lots of strategies that don't make any sense. There's no CSS. There are no tags. There's no link text or anything like that. So those strategies don't work anymore in a mobile context. But there are things like accessibility ID, which is an immutable part of the mobile app that is useful. And also, the spec specifies that you can pass in commands for the underlying automation frameworks themselves: a string which represents, say, JavaScript defining an element using Apple's iOS UI Automation code, which will be passed straight through and executed as code on the device. Same with Android: you can write a string which represents some random Java, as long as that random Java is UI Automator code, in order to find the particular element that you want. So there are these new element location strategies, and some are just gone because they don't make sense, but logically, the idea of locating an element remains the same. Same with the kind of test infrastructure, for lack of a better word: things like timeouts, taking screenshots, that sort of thing. And finally, navigation. Navigation is navigation wherever you are. So in a mobile context, clicking, going back and forward, all of these sorts of things ought to work in logically the same way, if not actually the same way, as in a web browser. So, I think I have... I don't know how to get on to something else here. So maybe I will not have a demo.
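As a rough illustration of that split between native and web locator strategies, here is a sketch of the kind of vetting a server might do. The strategy names below follow Appium's conventions loosely, but treat the exact lists, and the `isStrategyValid` function itself, as assumptions made for illustration, not the authoritative set.

```javascript
// Sketch: which locator strategies make sense in which context.
// Native passthrough strategies hand a code string straight to the
// underlying framework (UI Automation JS, or UiSelector Java).

const NATIVE_STRATEGIES = new Set([
  'accessibility id',        // stable, cross-platform handle on an element
  'class name',              // e.g. an android.widget class, or UIAButton
  'id',
  'xpath',                   // evaluated against the native element tree
  '-ios uiautomation',       // raw UI Automation JavaScript, passed through
  '-android uiautomator',    // raw UiSelector Java, passed through
]);

const WEB_ONLY_STRATEGIES = new Set([
  'css selector', 'link text', 'partial link text', 'tag name',
]);

function isStrategyValid(strategy, context) {
  if (context === 'NATIVE_APP') return NATIVE_STRATEGIES.has(strategy);
  // Inside a webview it is just a web page again, so the usual
  // WebDriver strategies apply, and the native passthroughs do not.
  return WEB_ONLY_STRATEGIES.has(strategy) ||
         ['id', 'xpath', 'class name'].includes(strategy);
}
```

So `isStrategyValid('css selector', 'NATIVE_APP')` is false, while `isStrategyValid('accessibility id', 'NATIVE_APP')` is true: the idea of locating an element survives, but the strategies change with the context.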
Anyway, watching somebody else's test run is one of the more boring things in the world. But you can see from this at least the sort of code that goes into doing WebDriver-ish stuff on a mobile device. This is Ruby code, which I hope you can read. And if you've done Selenium, it should look completely normal. All it is is finding elements, sending keys to them, and asserting things about particular properties of them. You could run this code using the normal Ruby driver for Selenium, and it would work perfectly fine, because it's generating JSON objects that are passed to the server that are exactly the same as the JSON objects which would be passed to the normal Selenium server. And this works perfectly fine, if I knew how to get it to run. Then there are things which don't make sense in a web context, which WebDriver would never do, should never do, will never do. We've mentioned the special locator strategies, things like class name, to get every object in the tree which is of a particular android.widget class, a text field, say. Contexts, which are the new way of doing what used to be repurposed window switching. Since that was semantically a bit problematic, "context" makes more sense, since you're not switching windows. If you were reading the API and you came across window switching and found it was actually switching contexts, you'd get confused, and you'd not like the API writer. So contexts do exactly the same thing as Appium had done with windows, but now it makes sense doing it. There are a lot of things to do with the state of a device which you might want to tweak, twiddle, whatever, while your test is going on. Network connection is a big one. When you're driving a web page on the desktop, in the browser window, it doesn't matter so much if the Wi-Fi goes down, that sort of thing. But on a phone, what happens with your app if you suddenly have no data connection?
If you're suddenly in India and your phone can't do anything except connect to a Wi-Fi connection which cuts out every 10 minutes, what happens to your app? So in the middle of your test, you can use the network connection API and drop everything, or you can turn off the location services, so all of a sudden your app can no longer get where you are from the phone. These sorts of things. What happens if, in the middle of using your app, the screen locks, and then you've got to unlock it? Does your app handle that sort of thing well? All of these things to do with it being a phone rather than a computer are part of the spec, or are added as things which you might want to do. And then finally, there are gestures. On a phone, the way you interact with it most of the time is through touching it, moving it around, doing things to it, which doesn't happen on a web page for the most part. I guess it will increasingly, as touch screens come into play more and more, but for the moment, it's mostly a mobile thing. Both iOS and Android have ways to simulate gestures in your tests (and presumably other vendors, Windows Phone, say, will have their own ways of automating gestures too). They are entirely different, and they are dependent upon different data. So it becomes difficult. You'll see in the next slide some actual code of the sort of thing that you've got to do in order to make a gesture using pure UI Automator or UI Automation code. But it would be nice to have a logical layer: a bunch of things that you would actually do. You touch and you move. You don't do an action from X to Y and then move it to another X and Y, and that sort of thing. Which brings us to what Apple and Google want us to do in order to test these sorts of things. On the left is fairly typical UI Automation code. You get a target, which is basically the view you're interacting with, and then you pass in an array of hashes, basically.
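That array of hashes looks roughly like the following sketch. The field names and coordinates here are illustrative, not the exact UI Automation event format, but the shape is the point: a timed list of absolute touch points per finger.

```javascript
// Sketch of the data iOS UI Automation wants for the gesture described
// below: touch down, drag up, hold for five seconds, drag back, release.
// Field names and coordinates are made up for illustration.

const gesture = [
  { time: 0.2, touch: { x: 160, y: 400 } },  // finger down
  { time: 0.4, touch: { x: 160, y: 200 } },  // swipe up, finger still pressed
  { time: 5.4, touch: { x: 160, y: 200 } },  // same spot: a five second hold
  { time: 5.6, touch: { x: 160, y: 400 } },  // swipe back down
  { time: 5.6, release: true },              // lift the finger
];

// Total wall-clock time is just the largest offset. For a second finger
// you would pass a second array like this one alongside it.
function gestureDuration(events) {
  return Math.max(...events.map(e => e.time || 0));
}
```

Every point is absolute, every moment is explicit, and a circle would need every little point along it spelled out the same way.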
An array of different touch events, basically. So this is saying: first you touch in a particular place, at a particular time, 0.2 (I don't remember what the units of that are; I think seconds). Then you move and you touch another place without lifting up, so you're swiping up here, at 0.4 seconds. Then, at 5.4 seconds, having waited there with your finger held on the screen for five seconds, you move back down for another 0.2 seconds. And then you release. So basically this is saying: whoop, whoop, with a five second pause in the middle. And as you can see, if you wanted a more complex gesture, this thing would get longer and longer. And it accepts multiple arrays, so if you wanted to make two fingers do different things, you just pass in another array with another set of hashes that tell it another set of actions to do. So things can get pages and pages long in order to do complex gestures. If you wanted to interact and draw a circle, you'd have to have every single little point along it timed out to draw that circle. A pain in the ass. On the Android side, the UI Automator code has so much reflection in it that they have an entirely separate library just to deal with reflection, if that tells you anything about the way they've designed this thing. Every method, for some reason, is hidden, is private, so you've got to reflect to get at it. But at least it is logical: there's a touch down event, a touch move, a wait (which is external to it), another touch move, and a touch up. So at least looking at the Android code, you know what gesture it is. Looking at the iOS code, who knows what's going on there. So while it's difficult to write, because you keep having to reflect methods and then call them through indirection, at least you can glance at it and tell. But have a look at what you do for that same gesture using what Selenium has proposed as the standard. And this is part of the W3C standard, not the mobile standard.
So, looking at the W3C version of that gesture: there's a touch action that you instantiate, and that varies from language to language. This is the implementation in WD, which is a JavaScript library. And then there are a number of different things that you can do; in this case, pressing on an element, moving to another element, waiting, moving back to the first element. So, exactly what both the iOS code and the Android code were doing, only you don't have to do the math to figure out what the absolute coordinates are. So if suddenly you're testing on an iPhone 5S instead of an iPhone 4, and suddenly you have an extra three quarters of an inch of screen, you don't have to accommodate for that in your code. Your tests can become much simpler by having this logical structure rather than what Apple and Google would like you to use. The other thing to notice is that there's a perform at the end. The way this actually works is that without the perform, it would do nothing. All it would do is populate a data structure that sits there on your test side. Then you say perform, and it all gets sent at once to the server, where the server manipulates it, creates the iOS or Android code, and sends that off to the device. But this means that you could do this twice. You could call perform, and then ten steps later say action.perform again; as long as you haven't done anything to action, it will do the same action again. So you can store them, you can pass them around. It becomes much easier than the underlying code allows. I do have a demo of that, but we will skip it. And I don't want to say that the gesture API is perfect or works perfectly, but it is moving in a direction that makes sense, is basically what I'd like to say. And the way that it is coming into the Selenium standard is making it make sense on as many systems as possible. It was initially proposed, I think, by the Marionette team.
So they had thought of the approach, and then we figured out that it works also for iOS and Android; it is not completely ridiculous. And so hopefully, as things move forward, it will also be able to be logically mapped onto Windows Phone, onto whatever other sorts of devices might be out there. Ideally, the Selenium standard will be as generalizable as possible, to as many devices as possible, as devices become more and more part of our lives. Which leads us to the obstacles that have come into play as Selenium WebDriver has been extended to allow it to drive things which are not web, which is mostly what is interesting about this. Reading this slide, I realize the first and last points are exactly the same, basically: the underlying systems are flaky. They are sketchy. They don't always work. They are not designed particularly well, and neither Apple nor Google particularly seems to care about them. So there are the normal sorts of issues with building a system that works reliably out of parts which are not reliable. There's a need on the server side to do a lot of timing out, of retrying things, of reconnecting, these sorts of things. And it can slow things down; as anybody who's used any of these systems can tell you, it can get slow. And then there's a variety of disconnections between parties, basically. Different systems can do different things, and within a particular system, different versions can do different things. So, as Simon mentioned yesterday about the early days of Selenium, there's a lot of code that has to go in there which is accommodating different systems. And as people try out new devices, there are new branches that have to be made in this code in order to accommodate the fact that your Galaxy S5 no longer has that page, or does X and Y rather than A and B.
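The lazy-builder pattern behind that TouchAction API can be sketched in a few lines. This is a toy, not the real wd client: `perform()` here just resolves element-relative positions to absolute centres (the kind of math the server does so tests survive a change of screen size) and hands the recorded steps to a `send` function standing in for the POST to the server.

```javascript
// Toy TouchAction builder: method calls only record steps; perform()
// resolves elements to absolute coordinates and sends everything at once.

class TouchAction {
  constructor(send) {
    this.send = send;   // function that would post the sequence to the server
    this.steps = [];
  }
  press(el)  { this.steps.push({ action: 'press',  el }); return this; }
  moveTo(el) { this.steps.push({ action: 'moveTo', el }); return this; }
  wait(ms)   { this.steps.push({ action: 'wait',   ms }); return this; }
  release()  { this.steps.push({ action: 'release' });    return this; }

  perform() {
    // Resolve each element-relative step to the absolute centre of its
    // rect; steps without an element (wait, release) pass through as-is.
    const resolved = this.steps.map(s => s.el
      ? { action: s.action, x: s.el.x + s.el.width / 2, y: s.el.y + s.el.height / 2 }
      : s);
    return this.send(resolved);  // steps are kept, so perform() can run again
  }
}

// Usage: the same press-hold-return gesture, expressed logically.
const sent = [];
const a = { x: 100, y: 350, width: 120, height: 100 };  // fake element rects
const b = { x: 100, y: 150, width: 120, height: 100 };
const action = new TouchAction(seq => { sent.push(seq); return seq; });
action.press(a).moveTo(b).wait(5000).moveTo(a).release();
action.perform();
action.perform();  // nothing was consumed: the same action runs again
```

Nothing happens until `perform()`, and because the recorded steps survive the call, the same action object can be stored, passed around, and performed again later, which is exactly the behaviour described above.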
So hopefully, as time goes on and Selenium becomes a standard and is accepted more and more, the vendors will provide more stable things, will do as the web browser vendors are doing now, taking over part of the Selenium team's job and allowing the Selenium server to get smaller and more reliable. There's also a disconnect between what people want and what is done, or doable. There are things which just can't be done. And it's a pain. It's hard to accept that something can't be done with the tools that we have, but we just have to keep thinking of ways to get around the limitations of the underlying systems, and to get around the security, often, particularly with Apple. There's a lot of locking down of devices which has to be gotten around. And then there's also the obstacle of the slowness of change in the standards world. It's hard to get everybody to agree upon things. And we are moving forward, which is the last point that I want to make: as we move forward and there are other devices, what happens if you have Google Glass? Thank goodness I haven't seen any around here. What happens when you make an app for Google Glass and you want to use your normal Selenium testing tools? What happens when you work for a company and you've made a music app for your refrigerator or something, and you want to test that? I think that Selendroid's and Appium's example of taking WebDriver, taking Selenium, and repurposing it, of making it work in a new domain, has potential for this world that's coming up, this Internet of Things that everybody's talking about. And Selenium is positioned, hopefully, and hopefully with less and less pain as it goes forward, to be able to capitalize on that, to be able to move into that domain and be useful for testing. Because we've seen that the people who make these devices are not particularly reliable at making things which allow you to build reliable things for them.
So hopefully Appium's example can kind of guide how Selenium can move forward into the future. The future right now is mobile devices, but tomorrow is who knows what: pants that can play apps, something. So that's it. Thank you very much. I guess I could take questions if you want; there's five minutes left in my time. But yeah. Pardon me? Yes. Yes. They're actively developing on it. But beyond that, it's hard to see any way to do it. There's no way to hack the simulator. For Android, there are ways: you can get Genymotion or some other emulator that might work. But for iOS, Apple kind of has a lockdown on the tools. So it's a sticky situation. Luckily the real devices are there and work. Right. So the hope is that in the same way Selenium has convinced the browser vendors to take it seriously, to actually devote resources to working on it, because they can just plug into an existing thing (they don't have to work on all of the auxiliary stuff which allows you to actually run the tests, they just need hooks), hopefully with mobile that same sort of thing can happen. Particularly if everybody makes enough noise, and makes these bugs known, and perhaps shames Apple into doing something about it, because it makes sense for Apple to allow you to make good apps. It looks bad for Apple when your apps are always crapping out. So, anything else? A common road map? It might be possible. It's debatable whether you'd want the same thing in everything. Ruby has idioms, has things that are expected of a tool when you use it, versus Java, versus Python, whatever; these all have different ways of going about using them. So having them all the same would probably be counterproductive. But generally speaking, if there's something that you find, like something super useful in the Ruby client, and you wanted it in the Java client, if you were to propose it and everybody agreed, then it could go in.
Yeah, so this is also part of what I mentioned earlier, the fact that the people working on Appium are developers. They don't necessarily know exactly what would be desirable. So the basics are there, and some things which we could imagine would be useful. But as testers, propose things, particularly since it's an open source project: it's open, and if you propose it, it will at least be considered. Beyond that, there's not really much that can be done in order to standardize them. But it's the same in the official Selenium clients: there are things that are done in Python that are not done in Ruby, and vice versa, that sort of thing. But making noise, that seems to be the theme of the question and answer here: just tell people that these things should happen. And most likely, they will either happen, or it will be decided that there are ways to do it that make more sense, or whatever. [Audience question, partly inaudible.] Yes. But right now, it's in that limbo state that the spec is also in, where certain parties object to it happening, at least in a generalized way. So we're trying to do development that fits in with the spec, which will be as standard as possible. On that issue, right now there is an issue on GitHub that has an arm's length of comments from a variety of the Selenium committers, mostly objecting to what we want to do. So once they officially object, then we can build it as something which Appium does, which will not be part of the spec. We don't want to build it and then have to rebuild it as something else. But it is on the radar. All right, I think that's the time that we have. Thank you all very much for listening.