Welcome to the Wikimedia Foundation. We're giving a talk this evening about failures and Selenium tests. We've been a little cheeky and called it "Epic Fail," because it is epic. When you run a unit test, it's very encapsulated, a very small thing: when something breaks, you know just why it broke. When you're doing browser testing, things can fail for any number of reasons.

We're going to explain the system we use here at the Foundation to do browser testing, with a number of tools, a number of hosts, a number of interrelated systems. At the core of our system is a continuous integration tool called Jenkins, and using only the evidence that is presented to us in Jenkins, we're going to reason through the causes of some very common browser test failures, failures that I know you here in the room have encountered. You've seen these sorts of things; we're going to look at what Jenkins presents us and take it from there.

We should also mention that it's possible to assert things about software in a number of different ways, from the very simple to the very complex. In this particular case we are using the assertion library in RSpec, a very commonly used Ruby assertion library, and we're using only the very bare-bones aspects of RSpec. If you're interested in the particular syntax of anything we see this evening, that's where it comes from. But our main concern is the system itself and how things fail.

Before we jump into the actual demonstrations of actual failures, I want to talk a little about the systems in which these tests are running. I've made a little diagram here, and I know it's difficult to see online, but I'm hoping I can explain it well enough to get my point across. At the core here we have Jenkins. Jenkins is a continuous integration server; it used to be called Hudson, and it's been around a long time. Jenkins does anything you give it: you give it a set of commands, it issues those commands, and it tells you what happens.

In our particular system, when we start a build of Selenium tests in Jenkins, the first thing the Jenkins server does is reach out to another server where we have our source code repository. This is in git, managed by a code review system called Gerrit. It could be anything: it could be CVS, it could be SVN, it could be Perforce, it could be whatever you want, but you want to keep your tests in a source code control system of some sort. We happen to use git, and the interface to git is Gerrit.

When our tests kick off, the Jenkins host has particular versions of Ruby, particular versions of Selenium, a particular version of a wrapper for Selenium that we call Watir WebDriver, and a number of other tools. This is all completely open, by the way: you can find all of our configuration live online, open to your perusal, in the source code system on both Gerrit and also on GitHub. When the tests kick off, we pull over our tests, and our tests run according to both browser and target test environment.
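As an aside for those curious about the syntax: here is a minimal sketch of what "bare-bones RSpec expectations plus Watir WebDriver, parameterized by browser and target environment" can look like. This is illustrative only, not the Foundation's actual code; the BROWSER and TEST_TARGET environment variable names are my assumptions.

```ruby
# Minimal, illustrative sketch; not the Foundation's actual setup.
# BROWSER and TEST_TARGET are hypothetical environment variables.
require "watir-webdriver"
require "rspec/expectations"
include RSpec::Matchers

# Pick the browser and the target wiki from the environment, the way
# a CI job might parameterize a run.
browser = Watir::Browser.new((ENV["BROWSER"] || "firefox").to_sym)
browser.goto(ENV["TEST_TARGET"] || "http://en.wikipedia.beta.wmflabs.org")

# The bare-bones RSpec style: a plain expectation about the page.
expect(browser.title).to match(/Wikipedia/)

browser.close
```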
We'll see more of this in detail later on. We send that information off to our friends at Sauce Labs, and we tell it to use a particular browser and a particular test target. Sauce Labs spins up a virtual machine on command that is running Firefox, Chrome, or some version of Internet Explorer, and we point our tests at one of several test environments.

We have a test environment that we call the beta labs cluster. This is our completely fresh beta labs: every few minutes it is updated to the latest version of the head of master. We also target the tests at test2wiki, as we call it. It's a peer to the English Wikipedia and the German Wikipedia; it is another node on the production Wikipedia cluster, and we treat it as our staging area. And we have a number of tests that run against production, just sanity checks. These are our main test environments right here.

Are there any questions so far about what we're doing here? Anything unfamiliar? Anyone want to clarify anything? Yes, sir, thank you. "Quick question: when you call Sauce Labs, you're using that essentially as the Jenkins slave at that point, correct?" No, sir. "Jenkins slave" is actually a technical term: you can have multiple instances of the Jenkins server, and one Jenkins will report back to the other Jenkins. We have only one instance of the Jenkins tool running right now. What this means is that Sauce Labs provides an API, and you call that API with a number of different parameters: your user, your password, and so on. Željko can cover more of the details than I can; you can follow up with him afterwards. But there are a number of parameters that the Sauce Labs API accepts, and when you send it a well-formed request, it just operates on that request. So it is not in that sense a slave to Jenkins; it is two-way communication over the API that Sauce Labs provides. "A virtual environment?" Yes. Upon issuing this command over the Sauce Labs API, Sauce Labs creates on the spot, from scratch, an entire virtual machine with an operating system; I don't even know what else is on there. So let me turn it over to my colleague.

I'll try to be quick. Yes, as Chris said, "Jenkins slave" is a Jenkins term. The way we run tests: Selenium provides a way to run the tests on your machine or on a remote machine, and we just use that option to run on a remote machine. In that case you need to provide credentials for the remote machine, and that's it. So it's not technically a Jenkins slave, but yes, it behaves like that. The tests are not running on the Jenkins machine, but somewhere else.
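To make the Sauce Labs API point a little more concrete: below is a rough sketch of how a test can ask Sauce Labs for a particular browser on a particular platform over the standard WebDriver remote protocol. The SAUCE_USER and SAUCE_KEY variables, the platform and version values, and the job name are illustrative assumptions, not our production configuration.

```ruby
# Rough sketch of driving a Sauce Labs VM over the WebDriver protocol.
# Credentials, platform, version, and job name are assumptions here.
require "selenium-webdriver"

caps = Selenium::WebDriver::Remote::Capabilities.firefox
caps.platform = "Windows 7"    # which OS image Sauce should boot
caps.version  = "25"           # which Firefox version
caps[:name]   = "example job"  # hypothetical label for the Sauce dashboard

driver = Selenium::WebDriver.for(
  :remote,
  url: "http://#{ENV['SAUCE_USER']}:#{ENV['SAUCE_KEY']}@ondemand.saucelabs.com:80/wd/hub",
  desired_capabilities: caps
)

driver.get "http://test2.wikipedia.org"
puts driver.title
driver.quit
```

When the request is well formed, Sauce Labs spins up the virtual machine, runs the session, and records it; Jenkins never has to know the VM exists.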
So, the format for this evening: in a moment I'll switch the screen share from me to Jenkins itself. The problem with analyzing browser test failures is that you see a common set of symptoms, and it's often very difficult to tell the root cause, the underlying cause, behind that common set of symptoms. So what we're going to do is use the tools available to us to look at some really common sorts of issues.

We're going to look at system problems, that is, issues somewhere within this entire system where something breaks down. After that we're going to look at timeouts, when your application simply freezes or doesn't do the right thing. Then we're going to take a break; breaks are good. After that we're going to talk about some real, actual, down-in-the-dirt problems that the tests turn up. I'll point out that a number of people have tried over the years to rename automated tests, to call them "change detectors." We're going to look at a couple of examples where the application being tested changed but it wasn't a bug, and then we're going to look at a couple of examples where we found some bugs; we have found some really important bugs in recent times.

I will make a note that one thing I would have really liked to show you tonight is a browser-specific bug. In the past we've had one application that fights with Internet Explorer version 9: IE9 and this thing just don't get along. Unfortunately, all of my examples age off after 60 days, so all of the examples that you're going to see tonight occurred within the last 60 days. So, without further ado, if there are no more questions, I'm going to switch over to Jenkins.

I will also reiterate that we have a mailing list, with something close to a hundred people on it now. It's really high signal and really low noise. If you're interested in these sorts of issues, there are some really, really smart people on it: look at lists.wikimedia.org, go down the list, and find QA. It might be worth your time.

So let me switch over to the Jenkins view, and we'll start talking about system errors. I'm going to start you off with all-out thermonuclear war, as they said in Dr. Strangelove: all of our red builds that you saw earlier. If you have your laptop, I encourage you to follow along at wmf.ci.cloudbees.com. This is maybe the worst we've ever seen these red builds. They should have one, maybe five failing tests, right? I've got 30 failures in a single build. Thirty failures in a single build! And you can see from the name of this thing that this is our Firefox build. The Firefox build is almost always our most reliable build. It's running against our beta labs test environment, which is our least reliable test environment, but 30 failures? That's just terrible, terrible.
Let's see what's happening. Jenkins gives us all the clues: all the links to all the things that we need to see, and what Sauce Labs tells us. You were asking about Cucumber earlier; Cucumber tells us exactly what's going on in the test. We can actually see that I'm on the new pages feed, and I should see this link, and I should see this link, and I should see an icon, and I should see a button, and we can see that it timed out looking for a really simple link. So of course I was a little worried when I saw this right offhand.

So we follow a link, and that takes us to Sauce Labs (Sauce actually has a really sweet new user interface), and you can see... anybody know what it means when you see the blue page that says "Wikimedia Foundation Error"? The host is down. Beta labs was completely down. And again, without this visual reinforcement, it's really easy to say, you know, "This is terrible, this is terrible, what do we do, 30 test failures in a single build," but it was simply that beta labs was down. So we actually went through this entire rigmarole, all the way out to here; this entire system functioned flawlessly until we got to the test environment, but the test environment was hosed.

As a result of this, and certain other issues, we have actually begun a project to do far better monitoring of our test environments. We should really find out that our test environment is down before our browser tests run. Make sense? So this is coming. It's not here today, but it's coming, and this is exactly the reason: we can't afford to fail 30 tests in a row.

Question? Yes, sir. A question comes in online: "How often do you run sanity tests against production, and how do you configure tests to be run at different levels, test versus production?" It doesn't have to be answered right at this moment. That's actually fine; I plan to move very slowly tonight. I welcome questions, I welcome interruptions, feel free. The times at which tests are run are highly configurable in Jenkins, and just in general. As of right now, today, with a few exceptions, we run our tests twice a day, and we run all of them at essentially the same time. One run goes overnight, and we look at it first thing in the morning; another run happens at around lunchtime Pacific time, so that we can look at it before we leave for the day.

And I have an ulterior motive in giving this talk, actually: I am very much hoping that some of you will look at our failed builds. I hope that some of you will join the mailing list. I hope that some of you will ask, "Why was this build failing?" I would really like people to ask me these questions. I would really like people from the community to look at our failed builds, think very carefully, and ask good questions of me, of Željko, and of the rest of the Foundation: why are these builds failing, what can we do about it, and what can we learn from it?
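To make that monitoring idea concrete, here is a minimal sketch of the kind of pre-flight check we mean: probe the target wiki before burning thirty browser tests against it. The base URL and the api.php probe are assumptions for illustration; this is not the actual monitoring project.

```ruby
# Minimal sketch of a pre-flight environment check, assuming a
# MediaWiki target that answers on /w/api.php. Illustrative only.
require "net/http"
require "uri"

def environment_up?(base_url)
  uri = URI.parse("#{base_url}/w/api.php?action=query&meta=siteinfo&format=json")
  response = Net::HTTP.get_response(uri)
  response.is_a?(Net::HTTPSuccess)
rescue SocketError, Errno::ECONNREFUSED, Timeout::Error
  false
end

# Bail out early instead of failing 30 browser tests in a row.
unless environment_up?("http://en.wikipedia.beta.wmflabs.org")
  abort "Test environment is down; skipping browser tests"
end
```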
In the very near future, without going into too much detail, what we'd like to do very soon is this: we have begun moving these browser tests into the git repos of the features that are being tested. So we are actually right now analyzing the possibility of running a browser test suite after each commit to that particular git repo. We have a few technical issues to solve. One of the things that we have done here at this all-hands meeting, Željko and myself and some other people from France and from the UK and from other places, is talk about how to do this and do it well. But as of right now they run, as I said, twice a day; whether it's test environments or production, they all run twice a day, they're all analyzed twice a day, and when we're really lucky, we have something to report. Any other questions?

Let me take another example here. Mark Holmquist is going to like this one. Oh, I'll also point out that the root cause of that failure, of the entire test environment being down, was the file system: the file system itself was NFS, and we did a really poor job of managing NFS.

So, I take it back: before I get to some of these, I'm going to look at some other really intriguing system errors, and we're going to examine some of the things that you might see out of a Jenkins job. Some of these are mysterious, but we have explanations.

Here we have our tests for mobile applications; there was a gentleman talking about mobile testing earlier, right? These are some of our mobile tests, and again we have our link to Sauce Labs and we have our Cucumber page steps. This is a very mysterious error: "unable to pick a platform." To the best of our knowledge, this is actually a glitch in Sauce Labs: Sauce Labs is overloaded. It may actually tell you that it can't find a VM or it can't launch one. Strangely enough, according to what we have, it seems to have actually done this, but it's reporting that it did not. This is terribly frustrating. It comes in waves: we'll get this for an hour on, you know, a Friday afternoon or something.

I'll show you one more mysterious one, also frustrating. See, we let it all hang out here at the Foundation. We've got no secrets; we'll show you our mysteries, because we like all the help we can get. This one is perplexing: the build is red, but the test result has no failures. So what do you do here, right? You've got no Cucumber steps. What have you done? You've got no Cucumber steps, you've got no error message.

Jenkins gives you a thing; Jenkins actually is pretty smart. As I noted here, Jenkins runs on its own host, and it runs Ruby and Selenium on that host, and Jenkins conveniently logs the console output for every run of every build. And you can see: we correctly find all of the tests to be run, we load them up, we attempt to run them... and then Jenkins is telling me that the host it runs on shelled out to execute these tests, and whatever happened on that host returned a non-zero result. This is again one of the pitfalls of working with hosted services: our Jenkins host runs on a service called CloudBees, and from time to time the CloudBees host will in fact give us a non-zero result for a shell command. Again, it's terribly frustrating, but also again, with a system this complex, it's something you have to expect from time to time.

This is my favorite. Chad's not around, is he? Probably good; probably wise. Isn't this nice? It's red. No Cucumber, no status,
no nothing. All it says is "Failed to determine." Failed to determine! Okay, anybody want to guess where in the system this one went down? Željko doesn't count. Anybody want to guess? Hang on a second... come back... Console output. Console output is your friend: "The requested URL returned error: 503 Service Temporarily Unavailable," from Gerrit. Gerrit was down for a while, and if you can't pull your tests, you can't run your tests. You can't crank up a VM, because you've got nothing to feed your API. So yeah, Gerrit was down at a really inconvenient time, but as you can see, a cursory look at the test results doesn't tell you that much, right?

Okay. So these are my grand examples from the last 60 days of system problems. We've had a problem here; we've had problems here; we've had problems here. They're all different problems, and they all manifest themselves in different ways. What I wanted to demonstrate, what I really want to get across, is that they can all be analyzed. All of the information you need to understand what's going on in these systems is available in Jenkins, in your continuous integration server; whether it's Jenkins, could be Travis, could be, you know, whatever you use, all of the information to analyze what's happening should be available to you. And if it's not, you're probably using the wrong CI server, and if you've rolled your own, you probably should be using a real CI server. Michael might be living with some of my legacy test runners; I don't even know if you guys are still running that thing... oh, I'm so glad.

Okay, any questions so far? I'm about to switch to a different subject, but I wanted to make sure that you understood that when we run these tests, it is an entire system that we're exercising. Are there any questions about what we've seen so far? Awesome. I'll move on to the next one.

Here we go... it's dropped... okay, here we go. We have an application called UploadWizard. Mark Holmquist has worked on UploadWizard; he says funny things about it. And we have developers in the room: we have two of the UploadWizard developers. You can see we have a typical thing, a timeout, right? And we know from experience at this point that timeouts... who knows what a timeout means, right? Anything could happen; anything could be broken.

This is where Sauce comes in enormously handy: the first thing you see when you click on the link to Sauce Labs is the last page that the test saw. So here you can see the very last page. It's kind of hard to see here, but the very last page the test saw is a spinner. Here's UploadWizard, just spinning, spinning away, spinning away. This is a real failure, but it's not a real bug. This application uploads files to Commons. If it's a large file, it'll take a while; if it's a slow network, it'll take a while; and again, the internet itself plays a factor here, because you're doing all these hops among all these hosts on the internet. Timeouts are a fact of life, but timeouts in UploadWizard are a big fact of life. So we do have tests for UploadWizard; I'm not that fond of them, because they fail a lot. This is something that Željko and I talk about, but we haven't actually done anything about yet. It's on our list, and I have a similar example from another suite.

I should also mention that if you're interested in the source code for these tests, the UploadWizard test is rather a masterpiece, if I say so myself. It does some really remarkable things; more on that in a moment.
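Since "timed out" was the symptom in both of these examples, it may help to see the mechanism. Here is a hedged sketch of the kind of explicit wait a test for a slow feature leans on, with a widened timeout budget for slow uploads. The element selector and the 120-second figure are my illustrative assumptions, not the real UploadWizard test's values.

```ruby
# Hedged sketch of widening an explicit wait for a slow feature.
# The selector and the 120-second budget are illustrative only.
require "watir-webdriver"

browser = Watir::Browser.new :firefox
browser.goto "http://commons.wikimedia.org/wiki/Special:UploadWizard"

# Default waits are far too short for a big file on a slow link;
# give the upload-complete indicator a couple of minutes instead.
Watir::Wait.until(120) do
  browser.div(class: "mwe-upwiz-file-indicator").present?
end
```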
To come back to the UploadWizard test: it was one of the first tests that Željko worked on. It logs in with a secret password; it generates an image file on the fly, of a random nature; it uploads this file; it checks that the upload worked. It's really a rather remarkable browser test, because it involves a file system, and it runs consistently on Linux, Mac, and Windows. So it's pretty slick. If you're interested in these sorts of things, the source code for the UploadWizard test is really remarkable, and Željko is the architect of that. It does some fancy DOM manipulation, as I recall? Yes, it does.

We have another test, because this is another feature that breaks from time to time: if you bring up a page in Wikipedia, you can download that page as a PDF, and we have in the past broken this feature pretty often. So we have an automated test for it, and it runs pretty well most of the time. But we go to Sauce Labs, and again we find that the very last page we see says "Rendering." As I recall, this test goes to a random page; if we pick a page that's particularly large, the test will time out. We probably should not send it to a random page; there's so much refactoring to do. As you've seen, a lot of our builds are red today, and again, if you have your laptops with you: I was hoping at the end of this session that people who have a particular interest would surf around, find red builds, look at the problems, look at the histories. We have the architects of those tests here in the room, and many of the developers of these applications here in the room as well.

So with that, I'm going to take a short break. Let people get up, get a beer, move around; ask any questions off to the side if you want, surf the CloudBees site if you want. And when we come back in, say, ten minutes... okay, we'll take a very short break, five minutes... we'll come back and we'll talk about detecting actual change in the application, and we'll talk about detecting actual bugs in real applications over the last 60 days. So, in just a few minutes. For those following online: there's no way to pause the video, as far as I know, so we'll turn down the volume, and in five minutes sharp we will start again. Thank you.