This is how many of us test mobile software. Beautiful, clean, elegant, and if we're rich we have lots of devices. This is how I actually use software. Like the man at the bottom, that's what happens when our software isn't quite as good as we'd like it to be.

A fairly infamous quote from Donald Rumsfeld looks at this concept of known knowns. There are things that we know that we know. There are things that we know that we don't know, and in software there are many things that we know that we don't know. And then the scary things are the things we don't know that we don't know. As Rumsfeld said, it's these that tend to cause the more interesting problems. Before we worry about the unknowns, let's think about the knowns. One of the challenges, and the dangers, of knowns is when we believe something is true but actually it isn't. In many software development teams we believe we have an idea of what will work in the marketplace. We believe that when we write the software it will work. But sadly life isn't like that. So it's really important to challenge and correct ourselves, so we're not fooled by our own little brains into believing we must be right.

The next challenge is that much of the information we'd like isn't necessarily visible to us. On the left there are people hidden in the pictures. On the right is an example of an X-ray, where using tools allows us to see something important that we couldn't see without the tools, and the same holds true for our software.

Here we have some examples, on a sort of sliding scale between 0, I know nothing, and 10, I know everything that matters, for different aspects of the software we're working with. Changes to the software: we can go and look at the Git logs or the commit logs of our source control system, and maybe we've got some idea of what the changes are; at least we know what happened, if not the details. We may be able to use software tools to look at the complexity, things like McCabe's cyclomatic complexity calculations. If this seems to be a messy part of the code, we care more about it. (A small sketch of mining the commit log for hotspots follows below.) Crashes: many of us with mobile apps now integrate something that captures the crashes. Google Play brings the crashes back for apps that are released through Google Play, etc. So we can go and plough through crash logs, and many, many people are now researching using crash logs to automatically reproduce what caused the crash. Any of you work with mobile apps? So what percentage of the bugs you find are reported because of a crash? Is it sort of 80%? 50%? 10%? 2%? 1%? You must have some idea. Does your software ever crash? Yeah, the rest of the software never crashes, it just sadly never works. Okay, don't look, don't know. Typically crashes are a couple of percent for most projects, so all the money we're spending on crash analytics and the rest of it is good, but we're missing the majority of the potential problems we're facing. Devices: I'll show you later on that you may have 10,000 different device models using your app, particularly for Android. And just to help my good friend Naresh, this is a phone. This is an Android phone, and someone's software doesn't work very well on it. Yes, terrible. The man who uses an iPhone. Emotions: what are the emotions of our users? Are they pissed off with our software? Are they happy, are they enjoying using it? Are they going to recommend it to their friends? No, sorry Naresh, I'm not even going to recommend your software.
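As an aside on that "changes" point, and not something from the talk itself: a rough sketch of mining the commit log for change hotspots might look like this in Kotlin, on the assumption that files which change most often are often the messy parts we should care more about.

```kotlin
import java.io.File

// Count how often each file appears in the commit history of a local git repository.
// "git log --name-only --pretty=format:" prints only the changed file paths per commit.
fun changeHotspots(repoDir: File, top: Int = 20): List<Pair<String, Int>> {
    val process = ProcessBuilder("git", "log", "--name-only", "--pretty=format:")
        .directory(repoDir)
        .redirectErrorStream(true)
        .start()
    val countsPerFile = process.inputStream.bufferedReader().readLines()
        .filter { it.isNotBlank() }
        .groupingBy { it }
        .eachCount()
    // The most frequently changed files are candidate hotspots worth a closer look.
    return countsPerFile.entries
        .sortedByDescending { it.value }
        .take(top)
        .map { it.key to it.value }
}
```

Change counts on their own prove nothing; they are just a cheap pointer to where complexity measures and review effort might be best spent.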
Anyway, feelings and flaws and settings, because we may have our device in night mode. What does that mean for our software? Does it affect the usage? Et cetera, et cetera.

Microsoft Research in about 2010 started looking at software development, and they realised that with software development virtually everything we do is digitised. Huh? How did that happen? I have no idea, do you have remote control of my computer? Anyway, right. So this work in 2010 was essentially looking at past, present and future, and most of us, if we do any reporting at all, are in this little quadrant here. We look backwards and we do some reporting, maybe we do a retrospective, but we don't look much further than that. For those of us who have live systems, who have live mobile apps, who release at say six in the morning, whatever time zone that is, and we start to see something's happening, we start to see alerts coming back, perhaps we see crashes, we then need to do something about it, perhaps we need to make a decision. So for those poor sods who have development teams working 24 hours a day to try and get the conference app working, and who've just gone to sleep at 3 a.m.: do you wake them back up again when you realise the software's not working, or let them sleep? That's the next decision, and for some of us with 100 million users this really matters, because you can see a rollout over the first few hours, maybe the first million users, the first 2 million users, you start to see some behaviours and you need to decide: do I pull the app? Do I try and get the team to fix it? Do I staff up customer service because we know we'll get more complaints about something not working? These are all important factors, but we're not necessarily paying attention to them.

All of us think we live in the future. All of us think we can predict what's going to happen, the growth rates of our applications, and that's kind of useful, but we often assume a sort of linear behaviour. That means that as the user base goes up 15%, we need 15% more servers or 15% more bandwidth, but occasionally we'll cross a boundary. A simple example for you: when we turn on a tap in the house, in the kitchen or wherever, the water comes out first in little drips, and we turn it a little bit more and we get bigger drips and bigger drips, and then at some point it changes, it becomes a stream. It's normally a fairly smooth stream, and we keep turning it up and get more and more water, and eventually it becomes chaotic behaviour and starts spraying everywhere. So imagine that will also happen to our computer systems and their behaviours. If we look at computers, typically we work in an area of RAM, in memory, and that's fine as usage grows, but at some point we cross the boundary of the available RAM and then it swaps to some other storage. That swapping doesn't make a 10% difference, it can make a 100% or a 1,000% difference in the performance of that characteristic. So their work is essentially looking at the holistic software development practice, everything from static analysis to reviewing logs to putting analytics into the software.

The next thing is that there are roughly two forms of feedback. We've got digital feedback and analogue feedback. Digital feedback is generated by software and essentially it's for machines. So the logs that we generate from our web servers, the analytics that I'm talking about in the mobile apps, etc., is all software for software.
The nice thing is it's consistent, it's easy to parse, it's ubiquitous, it comes from the entire population of users, but it doesn't say much about the feelings of the users. The analogue side is generated by people and it's aimed at people. Has anyone provided a review in the App Store? What motivates you to provide a review? An experience, which is down to your emotions. There's something that says, OK, I've had enough of this, I'm going to go do something about it. So it's people-generated, but it's sparse, because most of you didn't put your hands up, and as we'll see in a moment, it's a very small percentage of people who bother to provide feedback at all. It's rich, and by that I mean it's rich in emotions; you get a lot of information out of a good review. And again, it's human-oriented.

Most of us have got a mobile phone, a smartphone, and most of us have installed applications. So this is the iOS view of an application. I think it's a conference app; not this one, because this one has got three stars. I don't think... I haven't provided feedback yet. And this one is the Android side. Now this is a very typical sort of shape, a sort of semi-horseshoe shape of feedback. We'll get quite a lot of five stars, quite a lot of one stars, and then some sort of distribution in the middle. And here we've got quite a lot of four stars as well. We can see the release note, what's new. Now it's cool to admit you've got bug fixes. Again, you've got the same sort of thing here, a little bit of information about the downloads, and we can see in small print the various bug fixes in this application. You're probably fairly familiar with these.

Again, this is from iOS and this is from Android. There are some subtle differences in the feedback. Here we get the Google Plus profile. We get the name of the person who provided it; of course, that's assuming it's a real name. There's someone called CD World or something, not very real. And we see some quite rich comments from someone: so, I use this application to attend online conferences; as other reviewers have noted, the application disrupts your session when the phone displays a notification, and the application then requires you to sign in again, blah, blah, blah. So they're talking about the application. If we were responsible for this application, we can look at this and say, oh yeah, what should we do when a notification appears? Can we test this behaviour? Can we improve the behaviour? Do we really need the user to log in again? If we turn the phone from portrait to landscape mode, particularly on Android, what the heck does the software do? Does it lose our information, et cetera?

This example is from a bank, who should know better. You can see that the current rating is under two stars; their overall rating is two and a half stars. And the user is essentially complaining that when they're using it at home on Wi-Fi it sort of worked, but the moment they move off Wi-Fi the application falls apart. Have we even thought of testing that, sitting in our pretty little office testing on the high-speed Wi-Fi that the hotel provides?

This is a Google application, so hopefully it comes from competent engineers. 3.8 score. And it essentially allows you to connect automatically on your networks. Is something weird going on? Never mind, probably the electricity. Anyway, we can see here someone is actually switching between 5 GHz and 2.4 GHz, and with the update to the application they can no longer swap. So was this by design?
Was it by accident? Because it's an Android application, it's available on the Nexus devices, which tend to be used by developers, so we're getting much more technical feedback here. Very rich, and perhaps this can help us with what we do.

Most of you will have heard of Kindle and the Kindle iOS application. This was around the 30th, 31st of March last year. What happened is they did an update. Now, even if you can't read the full print here, because it's quite small, the first user says: perfect and keeps getting better. If you're the Amazon development team, we're cool. Oh, hang on. How to break an app, one star. The response of the app is now slow. Awful. App update ruined my Kindle. Someone who still likes it: fab app. Good, but dreadful after recent update. Awful after update. Latest update, one star. And what I think is really cute: what did you do, Amazon? Because something's broken. Now, we can see that Amazon was paying attention, because roughly six days later they updated the application and they admitted to performance and stability improvements. What we notice is that even a couple of days later, that's the 5th and the 6th, the ratings started to pull back up. The same app, not updated again, had actually surpassed the old ratings another month later. So we can see the importance of ratings and good customer feedback. With the Apple App Store, you can't as a developer respond directly back to the users; with Android, you can. So on iOS all they've really got to communicate with is this, and good PR, whereas with Android you can have a discussion and a conversation with the user and say, hey, Fred, could you tell me more about what was going on when you saw this problem? Would you mind letting me do this sort of test, or whatever?

A couple of things worth knowing about reviews. This is from a company called Apptentive. What they're essentially noting is that if you go from a one-star to a two-star rating, it significantly increases the number of downloads and the number of conversions we get. And they basically have lots of little numbers: it's around five times the volume if you go from two stars to five stars. And numerous companies have noticed that as the App Store ratings improve, it materially improves everything about the app, including the revenues. I'll give you an example of that later on. So it influences people. We can use it to measure and engage with people, particularly when we provide feedback, and ultimately we can use reviews to help us with testing and monitoring. There's a lot of talk about DevOps at the moment. Well, part of DevOps is understanding what happens when our software is being operated, and why wouldn't we call our end-users operators? Part of operations is supporting them and working with them.

This is a real application. It had several million active users at the time. We'd just done an update, and we noticed that the app's rating dropped from 4.4 to 4.3. That was the pattern: we started getting more low reviews. So what difference do you think that made to our business? I mean, does this even matter? You wouldn't have noticed this, would you? It's millions and millions of users. Yes, it makes a difference. It makes a difference, first of all, with revenue. I think our revenue from in-app purchases dropped, because this is an app that has lots of in-app purchases; it was free to download. It probably dropped about 10%, give or take 5%, so between 5% and 15%. The drop was fairly immediate. What was the next thing it affected?
Yes, we had fewer new users, so fewer people downloading. We dropped in the search results. And many people, when they search for an app, don't type, I want the Agile India app, and by the way, you don't find the app that way; you have to type Conf Engine to find it. But if you're looking for conference apps, you'll find many results, and you're just one of many of them. So if you're not in the first view, then you're not found. So that affected things.

We managed to track down the cause of this, and the cause seemed to us quite a small thing; this is what we believe it was, and when we fixed it, the reviews went back up again. We had two threads. One thread drew a box on the screen. The other thread went to a server and got the text to display. English is a fairly short language, and it's the default language. This isn't the real message, but I've given you an equivalent one: tap on the colour you prefer to continue the game. In German, you can see it's quite a bit longer. This is French, and this, I think, is Sinhala, which Google Translate kind of gave me. So what would happen is that sometimes the dialogue box would be drawn first and shown on the screen, and sometimes we'd get the text back first. If we got the text back first and it was bigger, like German, we'd draw the box a bit bigger. But what was sometimes happening is that we drew the box, and then we got the text, and it was bigger, and the box resized, down and to the right. Although it's a very subtle thing, and it took a while for us to notice it, what seemed to be happening is that the users perceived that something was wrong in the app, and it somehow affected them emotionally, and it stopped them using the app so much. So we fixed that, and thankfully it helped improve things. But it's an example of how paying attention to the reviews, paying attention to the ratings, can help us find problems.
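This isn't the app's actual code, but one way to avoid the race described above can be sketched in a few lines of Kotlin: fetch the localized text first, then lay the dialog out once at its final size. Here `fetchPromptTextFromServer()` is a hypothetical stand-in for the app's own network call, and a coroutine scope is assumed to be available.

```kotlin
import android.app.Activity
import android.app.AlertDialog
import kotlinx.coroutines.*

// Hypothetical stand-in for the app's own call that returns the localized prompt text.
suspend fun fetchPromptTextFromServer(): String = TODO("app-specific network call")

fun showPrompt(activity: Activity, scope: CoroutineScope) {
    scope.launch {
        // Get the localized text first; German or Sinhala can be far longer than the English default.
        val text = withContext(Dispatchers.IO) { fetchPromptTextFromServer() }
        // Only now build and show the dialog, so it is laid out once at the right size
        // rather than appearing small and visibly resizing when the longer text arrives.
        withContext(Dispatchers.Main) {
            AlertDialog.Builder(activity)
                .setMessage(text)
                .setPositiveButton(android.R.string.ok, null)
                .show()
        }
    }
}
```

The same effect could be achieved by sizing the box from the measured text; the essential point is that the user never sees the box change size.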
Unless you've got an Android app and you're actively involved in the distribution, you won't see this console. But if you've got an Android app, then you have this available to you, and you have lots of different information. Here we're focusing on the review analysis. All the reviews we get, it tries to categorise based on the text in the reviews. It tells us the average ratings based on the groupings. It shows us the number and the sort of arrival rate. It tells us how it affected our ratings compared to our peers. Then finally it shows us the effect on the ratings. So we can see here a very, very strong negative effect from what's called stability: what Google classifies as stability problems in the reviews is dragging us down. Now, one of the things to be careful of is that Google isn't very good at categorising; the current algorithms seem to have lots of flaws in them. So you then need to pay attention to individual reviews to see whether this is relevant for you, and whether what it says is an install problem is actually an install problem for you.

So far so good. I've given you some examples of reviews, and I hope I've explained a little bit about how they can help us. Here's the real problem. This is thanks to Joy for these little cartoon pictures. Roughly 1% of users write reviews. And why do they write them? Because they're pissed off, or because they're happy. What about the other 99% of the users, the ones this figure represents, who didn't provide feedback at all?

And we need to think about that and say, is there anything else we can use that might help us understand the 99%? So I'll move on now to this concept of analytics and heat maps, because the two complement each other. On most typical development projects, we have some sort of app that we build from code. We hope for some testing: maybe human testing, maybe automation, maybe a mix. And we have a very strong, fast feedback loop: we update the app, we get feedback from our tests in the next minute, the next hour, the next day. Then we get the app store feedback, and that arrives days, two weeks, a month later. And as I mentioned, roughly 1% of users, and it can be even less than that for some apps, provide feedback, and then sometimes people go onto Facebook or Twitter and complain about the app or say good things about it. This concept of what I'm calling in-app analytics, which means we modify the app and add an analytics library, like Google Analytics or Flurry or Mixpanel, also gives us a very rich feedback cycle, and typically the latency is between about 15 seconds and about an hour, depending on which product we're using and a few other details.

The slides will all be available; I'll make sure they're on the site. And they're all Creative Commons, which means that you can take them and modify them, provided you don't claim that you wrote them. I think that's a fair compromise.

Anyway, this is conceptually how it all works. We have the mobile apps. They send analytics data, and they typically use a network connection to do this. The better libraries will store the messages and either batch them up or wait, say, if we're in flight mode because we're flying somewhere. It then gets sent, typically over the internet, to a data collector server, put in some sort of database, typically filtered so we're not looking at raw records, and then here we're looking at a pretty graph from the Google Analytics console or whatever it is that we're using. Any of these steps can be delayed, as I mentioned. With Google, provided the events arrive within four hours, it'll process them the same day. It doesn't actually say what happens if they arrive after the four-hour window, and it doesn't necessarily talk about time zones. So if we're looking at an event that happens in Indian Standard Time, how does it get reported in logs that are in UTC or Mountain Time or whatever it is? We need to think a little bit about these details.

So how does all this clever analytics data help? Well, broadly, it means we're now getting information from the field. We're no longer limited to guessing how we think the application is being used, or to our hypotheses, to quote Jez Humble. Hopefully it reduces our cost of automation. To be blunt, the users pay for the data: the network traffic that sends the analytics back to us. We may be paying someone like Google if we're on the high-end tiers for the analytics library, but for most of us, at small volumes, it's free of charge. And perhaps we can use this usage data to help improve our testing and bring realism to what we're doing. It also helps with development practice, but typically it's giving us feedback that helps improve the testing practices.
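To make that pipeline concrete, here's a minimal sketch, not any particular vendor's SDK, of what the client side typically does: events are queued locally and flushed in batches when a connection is available, which is exactly why the reports lag real usage by seconds to hours.

```kotlin
// A simplified, hypothetical in-app analytics client: real libraries add persistence,
// retries, sampling and privacy controls on top of the same basic idea.
data class AnalyticsEvent(
    val name: String,
    val timestampMs: Long,
    val properties: Map<String, String>
)

class AnalyticsQueue(private val upload: (List<AnalyticsEvent>) -> Boolean) {
    private val pending = ArrayDeque<AnalyticsEvent>()

    @Synchronized
    fun track(name: String, properties: Map<String, String> = emptyMap()) {
        // Recording an event is cheap and works offline; nothing is sent yet.
        pending.add(AnalyticsEvent(name, System.currentTimeMillis(), properties))
    }

    @Synchronized
    fun flush(batchSize: Int = 20) {
        // Called when connectivity returns or the app goes to the background:
        // send events in batches, and keep them for a later retry if the upload fails.
        while (pending.isNotEmpty()) {
            val batch = pending.take(batchSize)
            if (!upload(batch)) return
            repeat(batch.size) { pending.removeFirst() }
        }
    }
}
```

Seen end to end, that client-side batching plus the provider's own processing windows is where the 15-second-to-an-hour latency comes from.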
So we've now inverted the model. With the previous model we had 1% feedback; now we've got everything except the people who are not available in terms of data. It could be they're in an aeroplane right now. It could be they're behind a corporate firewall that blocks the traffic. It could be they've turned off data, or whatever. Relatively few apps give the users the choice of opting in or opting out, so typically it automatically gets the data from the entire user population. Again, one of the questions is, do we need to care about the missing 1%? Is it important to us? And one of the harder things to find out is the part of that 1% which is a device that's not reporting in at all. It could be that your software doesn't even install on a device. Have you thought about that? How will we ever know? Well, we can go and look at industry data, say the top 100 most popular devices, and extrapolate from that: if these are the top devices, we'd expect to see our software used on those top 100 devices unless there's some reason not to. So we can see gaps we may want to investigate, and perhaps we'd borrow one of those devices and test our app on it. So again, we're using data and analytics to help feed what we're doing.

I mentioned heat maps as well as analytics. Heat maps focus on the user interface, and they capture the touches we make. They're called heat maps because you get an overlay of the screen, and where we have lots of touches the colours tend towards red; where we have few touches or contacts, they tend towards blue. So we get a colour map, and I've got an example slide for this next. So we can capture the GUI events. We can get, effectively, screenshots, and many of these tools provide the equivalent of a slow-speed video, where they record every half a second what's on the screen, and if the application, say, crashes, it gives you a replay alongside the crash log so you can see what was happening in the user interface before the crash occurred. Some crashes are caused by what the user did; some may be caused by something else, like a bad message coming back from the network, but it helps us to diagnose the problems. In terms of the application logic, that's where the sweet spot is for mobile analytics: it's looking at what we do, not how it's presented to the user. And we can get some data back from the operating system. Mobile analytics can do a passable job of telling us something about the GUI, and a passable job of telling us things like the error codes we get back from the operating system if we've asked it to make a network request or store data for us, etc.

So here's an example of heat maps. This comes from an example by Appsee, one of the better companies in this marketplace. And this is one of those really ugly terms and conditions that, thankfully, we haven't got in the conference app. All caps, goes on for pages, and guess what we've got to do? We've got to agree to giving away our firstborn child. But we can't read all that, so we just tick the little box. Now, what they actually wanted the user to do is to tick a little tick box here and then press agree and continue. But the people who developed the application didn't really think through the usability, and the users just skimmed through this, saw the big button and hit the big button. The trouble is, when they hit the big button, the application did nothing at all. So they lost something like 27% of the users at this point. Using the heat mapping, we can see that there's a lot of presses here and a lot of presses here, but there are also a few presses here. And I guess people are thinking, that looks like some sort of menu or active item, so there are a few presses up there, and no one's scrolling the text.

So from this information, they realised what they could do. This is what's known as the active area: the area that actually responds when you touch it. They increased the active area, which meant that if someone showed willing and touched this, it would be accepted as ticking the box. The second thing is that if someone pressed this without ticking that, a little dialogue box popped up and told them they needed to agree to this first. And this significantly improved the take-up rate of this application, I think by about a third, thanks to the information gleaned by using the heat mapping software.
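For a sense of how that kind of touch heat-mapping is captured on Android, here's a minimal sketch, not any vendor's actual SDK: every touch is bucketed into a coarse grid per screen, and those counts are what end up rendered as the red-to-blue overlay.

```kotlin
import android.app.Activity
import android.view.MotionEvent

// Hypothetical base Activity that counts where users touch the screen.
open class HeatmapActivity : Activity() {
    private val touchCounts = HashMap<Pair<Int, Int>, Int>()  // (gridX, gridY) -> number of touches
    private val cellSizePx = 40                               // size of each heat-map cell in pixels

    override fun dispatchTouchEvent(ev: MotionEvent): Boolean {
        if (ev.action == MotionEvent.ACTION_DOWN) {
            val cell = Pair(ev.x.toInt() / cellSizePx, ev.y.toInt() / cellSizePx)
            touchCounts[cell] = (touchCounts[cell] ?: 0) + 1
            // A real library would queue these counts, plus the screen name,
            // for upload with the rest of the analytics data.
        }
        return super.dispatchTouchEvent(ev)
    }
}
```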
This is an example from HP. I'll be fairly brief on this, except for a couple of details about how they do analytics. The first is that when we add analytics to a mobile app, typically the developers have to modify the source code and add little calls to send events out through the analytics libraries. That's quite a lot of code to remember. It's also important to know what we record, because if we don't record something, the data doesn't arrive. So we need to think clearly: have I covered enough of the system to get the data? The HP one does all that magically for you, so it sort of auto-instruments. Some of the newer libraries these days, like Google Analytics, will also auto-label your screens if you don't. So it'll say, this is the login screen, this is the welcome screen. It does that by reading the name of the screen and the text displayed in the screen title and creating analytics events for us. HP were one of the first to do that. And they have this concept of FunDex, which is some magic score they calculate based on the number of crashes and the usage of the application. We'll also typically see the flows, the user journeys through the application, in most of these tools, and that can help inform how the software is being used.

This next example is a typical Anglo-centric view of the world. Our development team was in central London. The app was in use in about 50 countries, I think, and multiple languages. We'd just released a new version, and we happened to go and look at the analytics. And guess what? Most of our users were actually in Paris. Now, one thing that's key about Paris is most people speak French. So most of our users were using French, but we were just testing in English, even though our team probably between us spoke 20 languages. So even the simplest information can help us do our job a little bit better, because we now realise, oh yeah, we've got to remember we've got users all over the world; let's make sure we test the software in those languages and, if we need to, in those locales.

I mentioned crash analytics. This is for one of the apps I work on, which is part of Wikipedia: the offline Wikipedia project called Kiwix. This is from Google, and we can see detailed information about the crash down to the method call. It tells us it's new, so it's the first time Google's noticed it, and we can see there's one of them. This was last year, so it's October. Again, we can see a class-not-found error. We shouldn't see these very often with Java software, but for various reasons an app may be doing what's called dynamic class loading. Again, we can see it's new, and there's one of them. This one's quite popular: we've got 70 occurrences of an index-out-of-bounds exception, so we're trying to read past the end of an index.
Maybe we're counting a number of files and we've gone one too many. This is typically a programmer's bug, and these other ones are some low-level Android bugs that we don't know much about.

Something worth knowing is that for Android applications, Google now tests your application for free, automatically, provided you opt in to the service, and they reckon it's soon going to be completely commonplace. They have what seems to be a consistent set of 11 devices, with a mix of older versions of Android (KitKat, which is still very popular, about a quarter of users are still on it) and then Android 5 and 5.1. Testing in Hindi, English and Arabic, I think those were the main languages on this occasion, and I think I've seen German as well. We can see in this case that version 30 was clean, version 31 was clean, and 32 had a problem: we've got seven failures, four seemed to work, and version 33 seems like it's okay again. All this does is automatically act like a robot and tap on the screen, for about five minutes. We can see here, and I think I've got more details... no, I've got it all on the one screen here. So here we can see all seven failures, and one factor is the version of Android. It turned out the bug was caused by the way that Android takes our screen design, which is stored in sort of fancy XML, and draws it on the screen; we were using a particular Android library that had a bug in it on those Android 4.x versions. Once we realised that, and saw this error message saying error inflating class Button, we were able to fix it fairly quickly. As ever, software isn't perfect, and version 37 failed with exactly the same error, but the report only showed us three of the seven; when you looked into the details, they were all the same error. They actually give you a video of the five minutes of testing that they do automatically for you, so you can replay the video, which is a bunch of screenshots, and get some idea of what's going on. And this is all free. Tomorrow, if you're here, I'm going to be talking about automated tools and how they can help.

This next piece is academic research. It was three years following two applications, both with multi-million user bases. One was a multiplayer game called Magic Kingdom; the other was a learning and education app called StudyBlue. What they noticed is that the battery drain varied by three to one between what seemed to be similar devices, similar hardware, similar specs; for whatever reason, perhaps the firmware wasn't as well written or had a bug in it, the battery drained a lot faster. Why does that matter? Well, with the Kindle Fire, which wasn't one of those similar devices, they noticed the average session length was 17 minutes instead of 22 minutes. They weren't sure why, but what they realised is that it was using more battery, and the users probably weren't consciously paying attention to the battery, but somehow they knew something was going wrong, and so they cut their game sessions short. They changed the code in the app to essentially turn the screen down just a little bit if it was on a Kindle Fire, and when they did that, they found the session duration went back up again. So we can see, again, cause and effect in how measuring the information helps us.
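The researchers' own code isn't shown in the talk, but the kind of device-specific tweak described, dimming the window slightly on Kindle Fire devices, can be sketched in a few lines of Kotlin; the brightness value here is purely illustrative.

```kotlin
import android.app.Activity
import android.os.Build

// Illustrative sketch: slightly dim the app's window on Kindle Fire (Amazon) devices,
// which were observed to drain battery faster, so that sessions stop being cut short.
fun applyDeviceSpecificBrightness(activity: Activity) {
    val isKindleFire = Build.MANUFACTURER.equals("Amazon", ignoreCase = true)
    if (isKindleFire) {
        val params = activity.window.attributes
        params.screenBrightness = 0.85f      // 0.0f..1.0f; a little below full brightness
        activity.window.attributes = params  // reassign so the window picks up the change
    }
}
```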
What might be sort of obvious to the people who know about it: network latency, particularly for software that uses the network a lot, has a massive impact on usage, in this case 40%, and the users preferred Wi-Fi. The game used the network more, so users seemed to prefer Wi-Fi more for it; for the study app, which didn't need the network so much, the effect wasn't as big. Tablets had twice the usage, but so did devices that had a physical keyboard, so we could see how the form factor affected things.

How many of you have tested a mobile application? Anyone? Not many of you. One of the big challenges in software and in testing is how many devices are enough. If I test on this phone, will the software work on all phones? If I test on this one, is that enough? How many do I need to test with? Is it 2? Is it 10? Is it 50? Is it 1,000? Well, this is one of the many questions we have. The data at the top is from a project called OpenSignal. OpenSignal is a bit like the ADSL speed checker that you use for testing your broadband or cable connection, where you ask, how fast is my network at home or in the hotel, and it comes back and says 52 megabits per second or whatever it is, if you're lucky. They have a little mobile app; it runs on Android and iOS, you can run it and it collects data and gives you comparisons. What they do is mine the data. This was a report from the summer of, I think, 2015. The most popular device was one of the Samsung Galaxy range. The second was also a Samsung, and then we have a bunch of other devices. How many devices are in this picture, roughly? 1,000? 25,000. And that was just people who opted in to use this application, so it's not the full population of Android devices, but 25,000 distinct devices. If you sort the data slightly differently, by manufacturer, this is what you get. This, again, is just over a year old. We've noticed that Samsung is declining now, but Samsung was about 40, 45% of the entire volume. So, again, if you're working on your software, would it be sensible to test it on a Samsung device? If you were to buy a second device, would you buy another Samsung, or would you want to test on an LG device? Do we know whether it's better to test across different manufacturers, or should we keep focusing on the biggest sellers? Should we be testing on Lenovo, et cetera, et cetera? This is just by device, but again, all the data is available.

One of the biggest causes of bugs in existing Android applications is when the operating system version updates. Now, as you probably know, if you've got Android devices, unless you've got something like a Nexus or a top-end Samsung device, we don't get many operating system updates. What we do find is that our existing software, when it goes onto a new version of the platform, tends to have bugs related to the API: something we assumed was stable has changed subtly, and there's a lot of academic research that says that's one of the biggest factors. If you remember the example I showed you from the Google pre-launch report, all related to Android 4.4, we actually found the bug also affected 4.3. Now, to be fair to Google, we hadn't tested the app at all; we didn't know there was a bug until they told us there was one in our application, but once we knew about it, we were able to test a bit more around it, work out what the problem was and find a fix.

For those of you working on iOS devices, gone are the days when you simply tested on your iPad and your iPhone and were done. This is an overview, done about a year ago, of all the different devices, the operating system versions, and the functionality in each device.
With the newer iPhones we've got a fingerprint reader, which came in with the 5s model. The iPhone 5 did not have the fingerprint reader, it simply had a home button, but when you got to the 5s you did. If you don't know that, and you're not thinking about testing it, it may make a difference.

The next piece of research asked how many devices are enough to cover 80% of reviews. It looked at games, both free games and paid games, and this, I think, was really cunning, smart research. Remember that the ratings and reviews we get really affect things like our in-app revenues and the rest of it, but only roughly 1% of our users provide feedback. So what they did is look at the distinct device models, which are available to you, and say, ah, this user provided feedback on a Samsung Galaxy S3, whatever; this one provided feedback on a Lenovo, blah, blah, blah. So they said, if we test on those devices, hopefully we'll find the bugs, and it means these people won't complain. So let's not worry about the people who don't complain, let's just focus on the ones who do. And what they found is that on the order of 10 to 15 devices is enough to give you 80% coverage for most applications, with some slight differences between paid and free.

So we can get all this wonderful data, but what does it mean to us? This is an example from another academic paper. The red circles indicate a key press, and the little blue lines are network events. What you may notice, though it's hard to read at this resolution, is that there's a network event that seems to be happening roughly in line with each key press. So you're the engineering team now. Is this a bug? Every key press causes a network message. So we've seemingly got a correlation. Come on, be brave. No? Why isn't it a bug? Thank you, that'll do for me: it depends. Let's say this is a search that goes off to a server, and each time we type something it gets a more refined search. So if I type B, it starts to get some B results; B-A, some B-A results; B-A-N, some B-A-N results, and so on, until eventually I find what I'm looking for, which is Bangalore, or whatever. That may be exactly what we want. But it could also be being really wasteful, sending a network message for every key press for no good reason. So we need to go and analyse this and make some sense of it, and typically humans are better at this.
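If the analysis did show that the per-keystroke traffic was waste rather than deliberate incremental search, one common fix is to debounce the queries. This is a generic sketch, not the studied app's code, assuming a coroutine scope and a hypothetical app-specific doSearch function.

```kotlin
import kotlinx.coroutines.*

// Debounce search-as-you-type: only the last query in a burst of key presses
// is sent to the network, instead of one request per key press.
class DebouncedSearch(
    private val scope: CoroutineScope,
    private val quietPeriodMs: Long = 300,
    private val doSearch: suspend (String) -> Unit   // hypothetical app-specific search call
) {
    private var pending: Job? = null

    fun onQueryChanged(query: String) {
        pending?.cancel()            // a new key press cancels the previously scheduled request
        pending = scope.launch {
            delay(quietPeriodMs)     // wait for the user to pause typing
            doSearch(query)          // only now does a request go out
        }
    }
}
```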
There's another key challenge for us, and that is that as human beings we tend to over-trust automation. For those of you with lots of continuous builds and thousands of automated tests running and giving you lots of green bliss: well, we've got 10,000, 15,000 automated tests, they're passing, let's push to production. Where could the problem be? There was a study done about 10 to 15 years ago looking at pilots and autopilots, and what they found is that when they presented the pilots with information that was clearly wrong, like a strange route around the world, going from Bangalore to Chennai via New York, 40% of the pilots would still press the OK button for the autopilot to take control, rather than correcting it. It's a very, very common human behaviour. There's a very nice book about the impact of humans trusting automation, in medicine and so on, called The Glass Cage; I'll bring a copy tomorrow if you're around and want to have a look at it.

So what can we do with all this information? Well, hopefully we can improve what we do, the software we write, and the experience for our users. From a testing perspective, I have two broad categories. The first is testing based on information: things like popularity and volumes, the different devices being used, the flows, the crashes. What we're really focusing on there is whether we can reproduce something quickly, in our own environment, from something that's reported in the field. The second is looking for insights, which are things like rates of change. If we notice a drop-off happening, perhaps there's something else that's the real problem and all we're seeing is a symptom of it. If we're crossing the threshold of running out of memory, maybe that's important. And sometimes we see anomalies we didn't expect: we never predicted that users would do this, or that the application would behave this way. So now we're focusing on maximising the insights, and on agility, so that we can respond better.

Most of you will know about this company. What they did is introduce Tuesdays where the office mobile network behaves like many Indian networks, a 2.5G GPRS network. The idea was that if you're sitting in Silicon Valley, it's very, very easy to believe that having a 100-megabit network connection is normal. Oh, your 4G is going to be so much better, isn't it? It is changing, but the idea is to show the developers that the world isn't all the perfection you have in a fancy office in Silicon Valley. That's the key message. And so you have a choice, when you go into their offices on Tuesdays, to switch the network over and see the behaviour. The next thing, and again this caused quite a furore at Facebook, is that virtually everyone there had an iPhone, but in the key markets they were still trying to address, the majority of users had Android. So they said to people, right, use the devices our customers use. Why not? It doesn't seem that hard to think about. And people are a lot more empathetic when they've used the Android app, seen the bugs, and see them as a chance to fix them. Even better. Seven times, that's a lot of money, though.

Academic research again. This came, I think, from Spotify plus a bunch of academics. What they were doing is mining the feedback for their apps, and they categorised it into things like functional errors, which they found was the top-ranking category. They also looked at the most important bugs coming back from the reviews. What they ultimately noticed is that the two-star feedback had the most important information in it, because it's very easy to give one star or five stars, and a lot of us do: hate the app, uninstall; love the app, ace. Someone who gives two stars has thought about it, and typically the comment they give is actually insightful: I did this, this happened, this is what happened next, I wish you'd fix this. So it's quite powerful stuff to look at. The next thing, which is kind of surprising, and this is for Android in particular, is that you have the option to respond to users. We know that responding to users improves the communication, and yet only 4% of developers bother to respond. So if you're not yet providing feedback to your users, why not try it? A little tip for you: don't do copy-paste. Thank you, user, for your excellent feedback. Because the users see through that and go, screw you, one star. You need to respond to them as if it's a personal response. Dear John, thank you for your feedback. We've raised a bug for it. Here's a link to our repository.
Please try our app again in the next release, or something that gives them a feeling that you've actually paid attention and that it matters.

I've got a couple of minutes and a couple of slides left. This is work from Apptentive. What they allow app developers to do is add a communications channel inside the application, a way of directly communicating with users. This is important for various reasons, but what I'll focus on is that they've seen around 15 times the rate of feedback: we go from roughly 1% of people providing feedback to roughly one in six. And there's lots of other clever malarkey you can read about on their website. I and some academics have got a paper on ways to improve this, which we're presenting in two months' time. Part of that research work is looking at providing recommendations for tests, and generating automated tests, from the feedback we get from analytics and from reviews.

I'll end on this little book here. You can download it online from, essentially, the book's title dot com. I think I've got this print copy and one more for tomorrow, if someone has a good reason to justify getting it. You can ask questions until Naresh says stop, which is probably about 30 seconds, and there's an open source app that I've been developing to enable you to review reviews and score them. It's very much a work in progress, but you're very welcome to get involved. With that, I'll finish and hand over to whoever's in charge.

Audience: if the heat mapping works at an individual level, isn't that a problem? Yes, so I'll repeat the question because you're not on the mic. The question is, does the heat mapping work at an individual level or at an aggregate level? If it's working at an individual level, it may have the potential to capture sensitive information such as PINs, or the girlfriend you're looking at when you shouldn't be because you're married, et cetera, et cetera. So, in short, it records the individual actions. So absolutely it has a challenge in terms of privacy, and typically what will happen is that, as the app developer, you will mark out areas and say, this is sensitive information, don't record this. But it's a choice of the app developer. Now, one of the challenges we have as users of apps is that we're very seldom asked: do we want it to send data back? It's something that app developers typically add and say, this is great for us, this is really important for us, we love the data we get back, isn't it great being able to reproduce the crash in seconds watching the video. But we've also got to think through the privacy aspects and the impact on the users. I've stripped those slides out of this presentation because I only had the 45 minutes, but it's something I think is really important: dealing with the privacy implications, minimising the information we collect, and giving users what's known as informed consent. Informed consent isn't the tick-through box after 126 pages of the Apple terms and conditions; it's actually helping users understand what we're trying to do and getting them to choose to give the information. So it's a major rethink. The more data we collect, the more we have to think about our responsibilities for it. So thanks for the question.

And I think it's time for me to finish, because they're going to reorder the rooms for the evening. I'll be around for about another 10 minutes, I'm around tomorrow, and I'll be back after a half-hour phone call I've got to do at 6 o'clock. Thank you.