Hi, I'm Scott Reeve, the chair for this session. The first talk is from Philip Breschler. He's an iOS professional here in Berlin, and he's also the creator of OwnTube and Cover.li. And his talk is Don't Fear Our New Robot Overlords.

Hi there. My name is Philip. I'm an iPhone developer here from Berlin, and I want to talk to you about a new way to test on mobile devices. First of all, thank you for coming, because I don't think this is a very typical EuroPython talk. It's not about some back-end where you put in JSON data and get something back. It's very physical: it's a robot, plus computer vision and things like that. You can see the robot later. I'm just setting up some video, and you'll see everything just fine, I hope, at least the part where I do a live demo. A live demo is probably a bad idea anyway.

So, let me first show you my little agenda. First of all, I'm going to talk about problems when testing on mobile devices, whether you're testing apps or websites that are mobile-ready (I don't like the word "app" for those; it's a website). Then the idea I came up with, and why. Then the concept of how we do this, then, hopefully, a working live demo and some code. And then some conclusions and learnings, because there were a lot of learnings, actually.

So, talking about problems in general when testing on mobile devices: an app or mobile website is usually a front end for some data service, so you need to check how it looks, whether the text is right, whether the localization is right, things like that. You can, of course, write tests that just check coordinates, font names, font sizes, colors, and so on. But in the end you still never know whether it actually looks good; maybe the coordinate system was flipped (that's possible), or something else is just awkward. So in the end, you need a human eye to check it. And that's okay if you're doing an app with four or five views, or a small website. But if you have a large web project with a lot of pages, you're probably going to miss something, for instance if you want to test after each deployment, and you need a lot of manpower to do it as well.

Of course, you can do some testing on non-real devices like simulators. Who of you is a mobile developer, or has had some contact with mobile development? Okay, a few. A simulator is what iOS uses, for instance. You run it on your Mac, and it only runs on a Mac; that's the first problem. The other problem is that simulators, because they are simulators and not emulators, have far too much computation power. They use that Core i7 or whatever is in your notebook or iMac, so you have no memory issues. And dealing with memory is one of the biggest things you have to keep in mind when developing for mobile: you don't have enough memory to store that image, so you need to find another way. My notebook has eight gigabytes of RAM; that's ten times more than my iPhone, or even more, so no memory issues ever show up. And of course, you don't have a touch interface.
And if you look at apps developed by beginners who only use the simulator to test their interfaces, you always see that the buttons are tiny. In a simulator you just tap with the mouse, and that works, because the mouse pointer is very precise, but a finger, or two fingers, is way larger than a mouse pointer, so that's an issue. Another issue, especially on iOS, is that the browser engines are slightly different. The simulator uses the WebKit that comes with macOS; the real device uses the iOS WebKit. Both are nearly the same version, but only nearly, and they have different bugs. And some APIs are not available: WebGL, for instance, and OpenGL is slightly different or missing. You can't do Bluetooth in the simulator, and some other things as well.

Then you say, hey, I have an Android phone, so I have an emulator. That's great, because emulators actually emulate the hardware, but they have other issues. For instance, they are incredibly slow. If you have ever used the Android emulator, you've probably noticed it takes about two minutes to start up. And it's hard to interact with, especially for CI testing, because you never know whether the app, or even the emulator itself, has started; it's quite hard to write a script around it. Then, of course, every manufacturer of an Android phone ships their own browser and interface, and you never get all the system images you'd need in an emulator to test all of those. Samsung uses a different web engine than, say, HTC; they use different versions. And even if you say, okay, that's an Android KitKat phone, yes, it's KitKat, but only the Nexus will have the stock web engine; the next device running KitKat has another web engine or another framework. And a smaller issue, which I heard can be fixed, is that the Android emulator doesn't have the Google Play services, so you can't use Google Maps, for instance, or the other Google services. So that's an issue as well.

So the best solution, of course, is to test on real devices using real inputs, meaning fingers, and a checking eye, because you always need to know how it looks. And the best solution right now is me, a human. And that's, well, that's okay, but I need some sleep. So I set off to new shores and tried to fix that issue with an idea. The idea came to me when I saw a talk about Sikuli for automated testing of websites, and the idea was simply: build Sikuli for mobile devices. Now you're asking, okay, what's Sikuli? Sikuli is, as you see, an app that you run on your PC, Mac, or Linux. It's based on Java and scripted in Jython, so Python for Java. Sikuli was made as a tool for automating computer tasks with screenshots. As you can see, there's a scripting language around it, and it's very Python-like: click this image, type this string, click that image, and so on. And that's very easy. Of course, you can also do things like find this part inside that part. So: great. It was invented at MIT, and it's now developed at the University of Colorado Boulder, at the Sikuli Lab.
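To give you a flavor, a Sikuli script reads roughly like this (a sketch of the style, not code from the talk; the .png names are invented):

```python
# Sikuli script (Jython): every action is driven by an image snippet.
click("search_field.png")       # find this image on screen, click its center
type("europython 2014")         # type into whatever now has focus
click("search_button.png")
if exists("results_list.png"):  # check that some image has appeared
    click("first_result.png")
```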
It's still maintained, but it's made for desktops, not for CI. To get it running in continuous testing you need to combine it with Robot Framework and do some packaging, and it's incredibly hard to run on Linux; it's not made for running headless and things like that. But it's possible, and if you want to know more about it, I know a guy who knows a guy. They did that for a web shop, and it works.

So the idea was to do the same for mobile devices. In short: take a screenshot, use computer vision to detect an icon on it, and tap that icon. And for the tapping part, we use a robot, because robots never get tired.

First things first: we need a screenshot. Taking a screenshot is quite easy if the device is connected over USB. On Android, you just use the adb command, the tool that comes with the Android SDK for interacting with the device, plus some bash magic, a longer one-liner you put in your terminal, and then you have a screenshot. It works in 90% of the cases, on most devices; some manufacturers do things differently, I don't know why. On iOS, you can take a screenshot using Xcode, and also through iTunes. If you have an iPhone, you probably know there are screenshots in the iTunes interface, and they come from the device. So there is a channel for taking screenshots, but officially there is no way to do it from the terminal. However, there are also iPhone users who use Linux (I don't know why, actually), and they reverse engineered that protocol and called the result libimobiledevice. With it, you can talk to an iPhone from the terminal, and you can take screenshots as long as the device is in developer mode. Well, if you're an iPhone developer, your device is probably in developer mode anyway. So, quite easy: we have a screenshot.
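Wrapped in Python, that screenshot step might look something like this (a sketch: `screencap` and `idevicescreenshot` are the real tools, but the exact invocation and error handling are simplified):

```python
import subprocess

def screenshot_android(out_path="screen.png"):
    # screencap writes a PNG on the device; adb pull fetches it over USB
    subprocess.check_call(["adb", "shell", "screencap", "-p", "/sdcard/screen.png"])
    subprocess.check_call(["adb", "pull", "/sdcard/screen.png", out_path])
    return out_path

def screenshot_ios(out_path="screen.png"):
    # idevicescreenshot ships with libimobiledevice; the device has to be
    # in developer mode for this channel to be available
    subprocess.check_call(["idevicescreenshot", out_path])
    return out_path
```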
Now we need to detect an icon: finding that needle in the haystack. For this, you can just use OpenCV. OpenCV is great: you do template matching, which is the fancy term for finding an image inside an image, and it's totally easy, I have to say. When I started this project, I thought this would be the hardest part, getting the computer vision to work. It isn't. For those who have never heard of OpenCV: it's a computer vision library written in C++, and it has Python wrappers that are so good that probably everyone who uses OpenCV uses it from Python. It's quite a simple API for such a complex task; it's actually about four lines to find that needle in the haystack. In the end, you get the coordinate and the size of the thing you wanted to find in the screenshot, and from those you can calculate the middle point where you need to tap.

So, now we're talking about robots. Finding a robot for this was not that easy, actually. I needed a robot that's quite fast and not too expensive; I didn't want to spend 2,000 or 20,000 euros on an industrial robot. Those are great, but I don't have the money, and my boss said, well, you need to find something cheaper. So I found this baby here; you will see it later. It's called the Tapster bot, or Tapster for short. As you can see, it's a delta robot: a robot with three arms, where the actuator, the thing that interacts with the physical world, hangs in the middle. It's made with an Arduino, three servos, just standard servos like the ones you get from the RC hobby store around the corner, and 3D-printed parts. So if you own a 3D printer, this thing, with the bolts and nuts and everything, costs you about 120 euros. If you don't own a 3D printer, it's around 250 euros including the printing. A friend of mine owned one, so the printing was quite cheap, and we used laser cutting for the large parts so they're a little stronger, but if you have a normal 3D printer like an Ultimaker, you can just print everything. And now we come to the downside: the driver is written in Node.js, and we'll come back to that later. Not good.

So, talking to the robot. Of course, we need to interact with the robot a little, to tell it where to tap and so on, and for that I decided to use WebSockets. Why WebSockets? First of all, the robot came with a driver that already had some WebSocket support, so I only needed to extend it. And WebSockets are a good fit because you can, in theory, swap out the robot without changing the Python code. As I said, the driver is a Node.js app, and that's not so good; it's largely undocumented, and I'll come to that later as well. But theoretically the robot can be exchanged quite easily, because a replacement just has to implement the same WebSocket interface. So: more robots. If you have another one, or find a better delta robot, or a better X-Y-axis robot, you can use it.
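The talk doesn't spell out the wire format, so here is only the shape of the idea (assuming the `websocket-client` package; the JSON message and port are made up, since the real Tapster driver defines its own protocol):

```python
import json
import websocket  # pip install websocket-client

def send_tap(x, y, url="ws://localhost:4242"):
    # one command in, one acknowledgement out; swap the robot behind the
    # socket and this Python side stays untouched
    ws = websocket.create_connection(url)
    try:
        ws.send(json.dumps({"command": "tap", "x": x, "y": y}))
        return ws.recv()  # e.g. a "tapped" reply from the driver
    finally:
        ws.close()
```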
So, now we come to the part where we put it all together and bring it to work, hopefully. I called it Project GoldenEye. I don't know, it's not a fancy name, but that's what I called it. The idea was to test a mobile website or an app with a standard Python unittest test, because if you've ever used unittest, you know it's quite easy to write such tests. And one good thing is that the usual CI servers like Jenkins or TeamCity (we're using TeamCity, because I live in a Java environment without really wanting to) understand unittest output as a standard, so you don't have to write your own parsers for it. All the heavy lifting, like taking screenshots and detecting images inside an image, is abstracted away; you just write the test. In the end you write a test that says: tap there, find this, find this in this region, and so on. I'll show you in a minute. And writing such a test is easy enough that even my QA guy can do it, so if you want regular testing of a large web project or a large app, you can hand a tool like this to your QA people and they can write the tests themselves. Great as well. And they don't even need a Mac for it, which is usually a problem when you're doing iOS apps, because QA usually doesn't have Macs.

So, we come to the part where I hopefully don't fail at showing you a demo. Okay, I just need to do some setup. Okay. So now we have a video feed from my belly. There's the robot, over on the right side; I'll correct the camera in a minute. You can see the phone. I'll show you the test real quick, and then we'll do the part where the robot moves. Okay, let me try to do that while holding the microphone.

As you can see here, this is our little example test. It's quite cheap, because I didn't have much time to write a really good test (because of that robot, but more on that later), but I can show it to you. We have a little setup where we instantiate the tester, that's the class we interact with, and set its debug flag to true, just to show you some images of what the robot saw. We give it a start command, and there's this little helper to find the path where the screenshots live. And this is the test; of course it's not a great test, it's just a demo. The first thing: we want to assert that if I give it the screenshot snippet of the Settings icon (I can show it to you later; it's just a crop of a screenshot I made), the robot finds it and taps it. If it found and tapped it, I get a region object back, so I know: okay, it tapped that thing. Then I do a find. Find just looks at the screen, finds the object, and returns the region where it was. So that's what I did here: get the title of the Settings, in German "Einstellungen", and tap that title to scroll the settings table view back to the top, so we can find the airplane mode setting; you'll see it in a minute. So we check whether the airplane mode row is in the settings; maybe the developer forgot to put it in there. And then we use the find_in command. You give find_in the screenshot snippet you want to find, of course, plus a region object; as you can see, that's the airplane mode row I got back from that find. That way I can search within a certain region, because there may be more than one switch on screen; the switch is how a boolean value shows up in that view, and if I just did a plain find, I'd get multiple matches and never know whether one of them is in that row. And then I just tap the Bluetooth row and check that "Bluetooth" appears in the title. A very simple test, of course. And then there's a tearDown that just tells the robot to stop what it's doing.

So, that's the test. Now let's run it. On the right-hand side, you see the screen of the phone that's sitting here; that's not used for testing, just for showing you that there is a screen and that it's working. On the left-hand side, you see the robot and the terminal. So, I just run the test.py... ah, and it failed, because flight mode is on and we're searching for the other icon. Let me just quickly fix that; never run your demo in a different environment. Okay, starting again. You see the standard unittest output. It scrolls up, taps on Bluetooth, searches for the title... so, yay, the test ran. Thank you. I know it's not a great test, but it works as a demo.
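Reconstructed from that walkthrough, the demo test would look something like this. The commands (`tap`, `find`, `find_in`, the debug flag, start/stop) are the ones described in the talk; the exact module, class, and file names are guesses:

```python
import unittest
from goldeneye import Tester  # hypothetical module/class name

class SettingsTest(unittest.TestCase):
    def setUp(self):
        self.tester = Tester(debug=True)  # debug=True dumps the match images
        self.tester.start()

    def test_bluetooth_row_exists(self):
        # tap the Settings icon; a region object back means found-and-tapped
        self.assertIsNotNone(self.tester.tap("settings_icon.png"))
        # find the "Einstellungen" title, then tap it to scroll the table up
        self.assertIsNotNone(self.tester.find("settings_title.png"))
        self.tester.tap("settings_title.png")
        # the airplane mode row must exist; then search only inside that
        # row's region, since several switches can be on screen at once
        row = self.tester.find("airplane_mode_row.png")
        self.assertIsNotNone(row)
        self.assertIsNotNone(self.tester.find_in("switch.png", row))
        # tap the Bluetooth row and check that the Bluetooth title shows up
        self.assertIsNotNone(self.tester.tap("bluetooth_row.png"))
        self.assertIsNotNone(self.tester.find("bluetooth_title.png"))

    def tearDown(self):
        self.tester.stop()  # tell the robot to stop whatever it's doing

if __name__ == "__main__":
    unittest.main()
```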
What the robot gives you, if you set that debug flag to true as I just showed you, is output images: those fancy black-and-white images that give you an idea of what the robot saw. Let me show them to you. So, this is just a screenshot it took, the screen of the phone; there are several of those, of course. And then you have these files starting with "debug", and you can see the square around the Settings icon, then the square on the Settings title, which we use again. Then we search for the airplane mode row: found it. Then we searched for the Bluetooth row, but first of all we did that find_in, and... wait, there's another part in the debug file name, so you can see which page it belongs to. And then we went to Bluetooth, saw that the row I'm searching for is there, and then the title of the Bluetooth settings.

So, that's how it looks, and of course I can quickly show you the code of the library behind that unit test. So, it's... no, that's not the library, that's my test. There, that's the library. It's probably the worst Python code you've ever seen, because I'm not a Python developer; I'm an Objective-C guy, I know Objective-C quite well, and Python is just, I don't know, my hobby, I'd say. I'm not that good; it's probably not what you'd hire me for, but that's okay. So, the thing you interact with, as I just showed you, is that tester class, which gives you those commands: tap, find, find_in. Some are missing, of course; swiping and double tapping are things we still need to implement. But let's see what happens when I do a tap. It calls a finder class (a very fancy name, I know), which tells the device to take a screenshot using idevicescreenshot, that's the iOS part, or adb for the Android part. iOS is the default, but you can pass a magic number of one to say, okay, it's an Android, and then it takes the screenshot, finds the template in it, and so on. And if we follow that... ah right, no, that's the debug part. Okay, that's the part I never actually tested, so that should say "screenshot", not "find on screen", here. Okay, now the CV part: we do an image read on the template and on the screenshot, get the needle's height and width, and so on. And you see there's this method, cv2.TM_CCOEFF_NORMED. I actually don't know what the name means, but there are several match types you can use, and this is the one that worked best; I just played around with them. Then you run the template match and get locations back, and because you usually get more than one point back from OpenCV, we take the average of those, to smooth out some of the noise. And then we tell the robot to tap that point; the robot has its own thread where the communication happens, so that part is handled.
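Put together, the finder boils down to roughly this (a sketch: `cv2.matchTemplate` and `TM_CCOEFF_NORMED` are the calls named in the talk, but the threshold value is an assumption):

```python
import cv2
import numpy as np

def find_center(screenshot_path, needle_path, threshold=0.8):
    haystack = cv2.imread(screenshot_path, cv2.IMREAD_GRAYSCALE)
    needle = cv2.imread(needle_path, cv2.IMREAD_GRAYSCALE)
    h, w = needle.shape
    # slide the needle over the haystack; each position gets a match score
    scores = cv2.matchTemplate(haystack, needle, cv2.TM_CCOEFF_NORMED)
    ys, xs = np.where(scores >= threshold)
    if len(xs) == 0:
        return None  # needle not on screen
    # one icon usually produces a cluster of near-identical hits,
    # so average them to smooth out noise, then take the middle point
    x, y = xs.mean(), ys.mean()
    return int(x + w / 2), int(y + h / 2)
```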
Okay, coming to my conclusions. This could actually be useful: if you have a large app or a large website that you want to test on a mobile device, it could be useful, so I'll probably keep at it and take it much further. The problem was that I didn't have enough time to implement everything I needed, and that's because of the robot; I'll come to that in a second. Testing existing projects is especially where I want to use it. Of course, test-driven development is important, as you probably heard in the keynote yesterday, but if you have a large project, you always want a test running over all of your stuff, to check that everything is right, especially if you're getting paid for certain content and want to make sure it's in there. Robots never get tired, so they test every time, every night, on every check-in, and they have about the same sleep cycle as a developer: probably none. Our QA has a slightly different sleep cycle. And the robot's hardware, as you can see, is working; it's fine, as long as you look away from those little cables. But the software just isn't, and that brings me to the learnings. A function in a function in a function isn't normal, but in Node.js it is. I don't know who came up with that language, and I don't know who wrote this code... well, I do know who wrote this code, anyway. It's just terribly hard to debug, and it isn't documented; not just that specific code, but that whole layout of functions inside functions inside functions, all non-blocking. Not good. Also, it's a project with one maintainer, who will probably never merge your pull requests, never answer your issues, and probably never answer an email either. But it's all open source. Well, yeah, kind of. OpenCV, on the other hand, is fun, and you should probably use it far more than you do right now. It's not that hard to learn, actually; the documentation is very nice, with good examples. So: OpenCV, yay. And that's it. Thanks for coming. Any questions? If you want to see the demo again, we can also do that.

No, a comment: we do something quite similar with a library we've written called Geist. It does visual automation; at the moment its strong points are around Windows, but we have wondered whether we could apply it to testing mobile devices, so I'm really interested in coming and talking to you about your robot. I just thought it might be nice if you knew about Geist, because Geist is quite cool.

Yeah, sure, we can talk, of course. I have a sprint to finish and need to set off quite soon, actually; I'm only here for the day, but you can write me a mail, tweet at me, or catch me at the door. And sure: the robot isn't mine, it's from someone else, but I've started writing my own Python driver, because I'm getting tired of that Node.js. The problem there is that the inverse kinematics (that's the fancy term for computing how the robot has to move) are so buggy in the Node.js code that porting it straight to Python doesn't help; I probably need to do some math first, but then it should work better than it does now. Okay, thanks. Other questions?

You probably expect this question: did you try to use it to get a really high score in Farmville?

I don't play Farmville, but the idea with 2048 and Threes, yes, that's actually quite a good idea. The guy who built this robot originally had the idea of playing Angry Birds with it: you can set the angle of the shot very precisely, and that works. The Angry Birds Node.js code actually works better than the rest, and it's fun. But no, I never tried that; I only use it for work purposes, of course, because my boss paid for the parts.

I have a suggestion for an upgrade to the robot: a tilting table, so you can also test tilting.

Tilting would be nice; I've already built some things in that direction for iOS. On Android, you can inject quite a lot over USB: you can even inject taps on most phones, though not all, which is part of why we use the robot, and you can inject location data and rotations, so the screen rotates, things like that. But you can't do that on iOS, so I already wrote a library that fakes location on the phone via a socket (that's on GitHub), and there's also a library to fake notifications there. So some things can be faked, but tilting can't, so that's something we'd need: a tilting table, or a tilting robot, or, I don't know, just two servos. Let's see. Any more questions?

Actually, I have a few questions. The first one: when you're transitioning from one window to the other, do you use hard-coded delays, or is it actually inferred by the robot?
It's kind of both. The robot doesn't know when it taps. There is a small stylus in there; you can probably see it, it's pink because that's the only color you can get. It's connected to ground, so it just needs electrical grounding against the screen to register as a touch. The problem is that there's so little current involved that you can't detect the tap on the robot's side; it's just a tiny amount of current, so that doesn't work. What I do instead is run a background thread in Python that tells the robot over the WebSocket: okay, tap there. The robot's driver knows roughly how long the tap is probably going to take, adds 10%, and then sends back a "tapped" signal. One way to fix this properly: there's a little rubber tip on the stylus, and you could probably build a small switch into it with some foil or, I don't know, some metal, to get real tap feedback from it. Right now that isn't in there, because you'd have to replace the stylus with something you built yourself, but it could be done, and the protocol could work with it.
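Sketched on the driver side, that handshake amounts to this (hypothetical names and constant; the real driver is the Node.js app, and the actual servo command is omitted):

```python
import time

SECONDS_PER_STEP = 0.002  # made-up motion constant for the sketch

def acknowledge_tap(ws, steps):
    # the stylus gives no usable electrical feedback, so the driver waits
    # its own time estimate plus 10% slack before reporting back
    time.sleep(steps * SECONDS_PER_STEP * 1.1)
    ws.send('{"event": "tapped"}')
```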
I have a small suggestion of how this could be used for performance testing. You mentioned that one limitation of mobile testing is that when you emulate on a laptop, you don't have the same resources. You externalize all the manipulation of the app, but not the vision part, which is based on a screenshot taken by the OS of the device itself. So if you went with a webcam mounted above the mobile device, trained with a deep learning algorithm to recognize not single snapshots but sequences of them, learning the transitions as you click from one view to the other, you would actually be measuring the performance of the device.

That could be a good solution. The problem, of course, is that in this robot setup, if you put a webcam on top, the robot is right in front of the webcam, so that wouldn't work. There are two solutions for this. First, as you can see, I'm already transferring the live image of the phone via AirPlay. We tried using that for the recognition, and of course you could then also see whether it's fast enough and so on, but it's too buggy, too flaky; it keeps losing the wireless connection, so I only use it here for display. The other thing: in iOS 8, coming in September, and already on some Android devices, you can use the screen as a webcam for your PC or Mac, so you get a live video feed coming from the device, fairly close to the hardware. That could be a better solution than a webcam with a robot in front of it. But of course, with screenshots it's not really performance testing; it's more "it doesn't look right" testing. Any more questions? I know I speak quite fast, so I left some time for questions.

Hello. Did you experiment with the speed of the robot?

The speed? Of course it's quite limited, because those servos are not that fast; they're made for flying an RC plane. But the robot could be faster. The problem is not the robot, the problem is the image recognition and the transferring of the image. An iPhone 5 like this one produces a screenshot of 1136 by 640 pixels or so; it's quite large, with a very high resolution, so taking the screenshot is slow, about a second, and the image recognition isn't that fast either, because it works on still images, not on a live video feed. It could probably be faster, but the good thing about this setup is that you don't need much computation power to do something like this. You could probably run it on a Raspberry Pi; it's a little flaky, though, and I don't want to do a demo on a device where I know it could break. So you could probably set up, say, ten of these quite cheaply, because you don't need a powerful computer, or drive ten robots from one computer; that's possible as well. Any more questions? No? Okay, thank you very much for coming.