Got a response! Hi everyone, this is awesome. Okay, thanks for having me here; I'm going to get started. So today I'm talking about implementing a visual CSS testing framework, and we're going to be using automatic screenshot comparison to catch style regressions.

Just to introduce myself, my name is Jessica, and I'm Jessica on most of the internet. I work at a company called Bugsnag over in San Francisco. Bugsnag is an exception monitoring tool, and I'm a software engineer there, working primarily in Ruby and JavaScript, though our stack includes lots of languages. We also have error notifiers for lots of languages and frameworks, with support for everything from .NET to Unity to Objective-C and Angular, so that people can monitor their errors and crashes from all their different applications in the same place. We're also currently hiring, so please get in touch, either with me or at our booth here at RailsConf, if you're interested in working on developer tools at a pretty small company. The booth also has mugs and stickers, if you like those sorts of things. I also wanted to let you know that there's a written version of this talk, because I might talk fast, or some of the slides might go by quickly; if you're interested, you can find it on the Bugsnag blog.

So back to the whole implementing a visual CSS testing framework thing. What am I even talking about? Well, as we all know, writing, reading, and code reviewing CSS can be really intense, and it's even more intense to refactor. Generally, this is what my face looks like when I'm working with CSS. At Bugsnag, we decided to take on a huge multi-week project: an entire organizational and CSS refactor. We wanted a way to test that our whole site looked the same despite changing all of the code. And unfortunately, as you can tell, that didn't always work out for us. We went through many iterations of refactoring and realized we needed a tool to help us test the pages automatically. Otherwise, our process looked a lot like: oh, have you visited all the pages? Have you visited that? Have you clicked on that? What about that border? That didn't really work out for us, because it was wasting a lot of developer time. We needed a better way.

So we went on the hunt for a way to test our CSS. We wanted to know if there was a tool already built that did what we wanted, and we didn't exactly know what it was that we wanted yet, so it was kind of an interesting journey. We dug around on the internet and found quite a few libraries. It definitely took a bit of digging to compile a list like this, but there were quite a few libraries doing something similar. From the list of what we found online, we decided we'd try a few and see what happened.

One of the first frameworks we stumbled upon was one of Facebook's open source libraries, Huxley. Huxley's README says it watches you browse, takes screenshots, and tells you when they change, which sounds amazing. That sounds like something we might be looking for. But I noticed it hadn't been updated in over a year, and I was like, okay, that's not promising. But maybe it's completely done. Maybe there are no bugs in it, and it's awesome. No, that's not what happened.
After a good solid day of fiddling around with Huxley and trying to implement it, it did work sometimes, but it ended up being a little too buggy for us. And with something we were using to find bugs, we didn't want a whole lot more bugs happening. It would have random failures, and it would randomly not take screenshots, and we started realizing this was not the tool we were looking for. So we moved on.

Next we tried another library, called Quixote. It lets you make assertions about your page's elements, their relationships, and how they're actually styled in the browser. That sounded pretty good to us, so we gave it a shot. But I went to the example code to see how it actually worked in action, and I quickly decided I didn't want to use a library where I'd have to manually check how many pixels away something is in order to test it. Like on line 11 over here: I don't want to check if something is 10 pixels away, because stuff changes. Designs iterate quickly, and we don't want to manually check all of those different heights. I wanted a framework that was smarter than that, one that didn't need to be told all the different heights. So this didn't work either. We tried a few more after this and kept hitting the same failures: oh, this one checks things manually; this one, the screenshots aren't quite working. We couldn't quite find what we wanted in this list. But like I said, there are a lot of frameworks out there, so before you run home and say, oh, I want to write my own, maybe check through these frameworks and see if one does what you're looking for.

I decided I needed to figure out what I actually wanted, so I started really thinking about what would fit the way Bugsnag is built best. I wanted it to be visual, meaning it would take screenshots rather than sitting there manually measuring everything. And we wanted a way to compare two points in our app. Say we had production on the left, and we wanted to compare it with something we have locally, like on the right, our testing version. Thinking about this in Git terms: say we had a feature branch that we just pushed a commit to, and we wanted to look at what our homepage looks like on that branch versus how our homepage looks on production, which is what's currently running in master. With those two different points in time, we'd want to spot the differences between them. Here we have the normal header on the left, from our master branch, and the testing header on the right, from our feature branch. And from those, we'd want a way to automatically produce some sort of image highlighting where those differences are.

At Bugsnag, our web dashboard is written in Rails, and that, mixed with the fact that I wanted the tests to take screenshots, really affected my decision to write my own framework. A lot of the existing CSS testing frameworks were JavaScript and node-based, but I wanted to take advantage of the fact that I was using Rails, like DHH. At Bugsnag we also use Git for our source control, and we use it the feature branch way. This is worth explaining in case you don't use it this way: we have our master branch, which is ideally always deployable and production ready. That's not always the case, but generally it is.
Whenever we want to create a feature, we branch off of master until that feature is ready, and then we merge it in.

So, considering the tools we had at our disposal, and after taking a look at some of those screenshotting libraries I showed you, I realized there wasn't actually that much code to a lot of them. So, as one does, I decided I would just write one myself, which always goes well. Just as a disclaimer, this talk is not "here's my cool gem, you should use it." This talk walks you through the process I went through to create this framework. In fact, this isn't actually a gem, or even open source, I'm sorry, but there's a blog post if you're interested in the code.

Okay, so I first decided I should come up with the process for how I wanted my tests to work, so that I had an idea of what I wanted to accomplish when building this tool.

Number one, I needed a way to automatically visit the pages of our site. The test would, in an actual browser, hit each page of our site on a local server. Once the test visited a page, I wanted it to take a screenshot of that page. The important bit here is that I wanted a screenshot of the entire page, not just the current viewport of the browser. For example, if a change happened below the fold of the site, like we changed or removed the footer, we would need the full page, otherwise we wouldn't be able to capture a diff down there.

Next, I needed somewhere to store all these screenshots, and a way to upload and download screenshots from that storage area. Using Git, every time I pushed to a branch, including master, I'd upload a screenshot of the current state of each page. And if I already had a screenshot uploaded to our storage area from our master branch, I needed a way to upload my current feature branch's screenshots to that storage area, and to download my already-uploaded master screenshots from there.

After uploading my screenshots, I'd then need a way to make a diff of them: a diff between the screenshot taken on master, downloaded from our storage area, and the newest screenshot taken on my feature branch. So given the previous screenshot, from the current commit on master, versus my branch's screenshot, I'd need a way to visually mark the differences between them and produce some sort of diff like we've seen before, showing the differences between the images.

And finally, after I had diffs for my screenshots, I needed a way to view them. Even though I'd have a place to upload them, I needed an accessible way for everyone on the project to view the diff screenshots for a given commit. So we'd need some sort of page like this one, to view the different branches and commits, and when you clicked through, it would show you all of the screenshots and diffs.

So we have a plan. Now we can start to build our framework around these things that we need. First off, we need a way to write tests that will run automatically after each push, and we decided to use RSpec. RSpec, if you're not familiar, is a testing tool for the Ruby programming language. We already use RSpec for the tests in our Rails app, so we decided to continue that trend.
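Since the slides aren't reproduced here, here is a minimal sketch of the kind of spec being described; the driver and save_shot helper names are assumptions based on the talk's description, not Bugsnag's actual code:

    # spec/visual/marketing_spec.rb -- hypothetical visual spec
    require "rails_helper"

    RSpec.describe "Marketing pages", visual: true do
      it "screenshots the homepage" do
        # Visit the locally running app in a real browser
        driver.navigate.to "http://localhost:3000/"
        # Save a full-page screenshot, named by site area and page
        save_shot(area: "marketing", page: "index")
      end
    end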
So we wanted to be able to write specs along those lines, where we would just navigate to a local URL and save a screenshot of that page. We wanted to keep them simple like this, and, for now, not have any assertions that would cause a pass or fail, other than the test failing for some technical reason, like it didn't end up running.

We also wanted these tests to be separate from our main tests, so we needed to mark them out in some way. Luckily, RSpec has that feature available, so we pulled these visual specs out into their own RSpec tag that we called visual. That way, the specs wouldn't run with our main specs when we were running the suite locally, unless we explicitly wanted them to. This also made it so we could break out our specs on our CI.

We wanted to break out our visual specs for a few reasons. Number one was our local build speed. If our local tests were bogged down waiting for our visual specs, that would become a huge issue for us. By having them broken out, we could iterate on our main app specs faster and be able to push more often. And number two was speed with our CI tests. We wanted our main specs to still be fast on our CI so that we could merge non-visual pull requests without waiting for our visual specs to finish. CI, or continuous integration, if you're not familiar, is a way for us to automatically run our test suite when we push things to GitHub. After our specs run on our CI, we can see whether our build is passing on GitHub, which is pretty handy. GitHub also has a feature where you can split up your build statuses, so we can separate our visual specs from our main specs and quickly see whether it's the visual build or the main specs that are failing. With the builds separated, if we push a non-visual change and our main specs are passing, we can merge it. At Bugsnag, we use Buildkite for our CI. Buildkite allows us to add steps to our tests, so we can run our main specs first, apart from our visual specs. When they're separate, our visual specs don't slow down our main specs, and we can merge our non-visual pull requests.

Next, we needed a way to visit web pages and take screenshots from our specs, and for that we decided to use Selenium. Selenium is a tool for automating browsers for testing purposes. We specifically needed their WebDriver API, which would allow us to drive a browser natively on a local or remote machine. More specifically, it provides an API between us and the browser. To use Selenium, we needed to use a service like Sauce Labs or BrowserStack. We needed a service like this because we needed access to an actual browser: since we're using a CI, it doesn't just have browsers built in on the server. We'd either have to set up our own virtual machines for that, or just use one of these services. We ended up trying BrowserStack.

Before our visual tests, we'd need to start up our proxy to BrowserStack and a forked Rails server, and then make an instance of our Selenium WebDriver. And of course, after all the tests, we would terminate those services. We also had to allow real web requests, via WebMock, only in our visual tests. WebMock is a library for stubbing and setting expectations on HTTP requests in Ruby, and we use it to disable outside web requests during our tests. We needed to enable real web requests, so that we could hit our local server and upload our screenshots, but only in our visual specs.
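As a rough sketch of how that can be wired up, assuming the visual tag is plain RSpec metadata (this is illustrative, not Bugsnag's actual configuration):

    # spec/rails_helper.rb -- sketch of scoping WebMock to visual specs
    require "webmock/rspec"

    RSpec.configure do |config|
      # Keep visual specs out of the default run; include them
      # explicitly with: VISUAL=1 rspec --tag visual
      config.filter_run_excluding visual: true unless ENV["VISUAL"]

      config.around(:each, visual: true) do |example|
        # Visual specs need real HTTP: the local Rails server,
        # BrowserStack, and the bucket we upload screenshots to
        WebMock.allow_net_connect!
        example.run
        WebMock.disable_net_connect!(allow_localhost: true)
      end
    end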
To get our BrowserStack proxy running, we would just spawn a new BrowserStack process, and we'd need to be able to terminate that process afterwards. To get our Rails server running, we'd spin up a new Rails process, unless one was currently running, and terminate it the same way as our BrowserStack process. To set up our Selenium WebDriver, we just had to pass it the capabilities we wanted, like our browser and browser version, which are Firefox and 31 in this example. We'd also have to pass a URL to hit, pointed at BrowserStack, which would provide our browser.

Setting up our Selenium driver was easy, but while setting it up, we did learn some interesting things about taking screenshots with different browsers. As a reminder, with our WebDriver we wanted to hit web pages in a real browser and be able to take screenshots of the full page, not just the current viewport. Unfortunately, I could only get this feature working in Firefox, which is not ideal. Since Internet Explorer and Chrome didn't work for me, we couldn't really turn this refactoring tool into a browser compatibility tool or anything like that. Although that wasn't ideal, it works for our purposes right now as a refactoring tool. However, that was when I implemented it a few months ago, and I've heard of people who have gotten it working since then, so there is hope. We can change this.

After we set up our Selenium driver and our forked Rails server, we could start saving screenshots in our tests. We'd use our Selenium WebDriver to navigate to our localhost URL, and then we could save our screenshot after navigating successfully. When saving screenshots, we set up a local screenshot directory so that we had a nice, clean place to dump all the screenshots locally, and so we'd only have one thing to remove once we were done successfully uploading them. Once we have our folder, we use our driver to save the screenshot, passing it the path we want it saved to, along with some other things so that we can name it properly.

After writing tests for static pages, such as our homepage, we quickly realized, going into the dashboard, that we'd have an issue with dynamic data: it would just produce diffs all over the place. With dynamic data, you can get false positive diffs, because the data can change between viewing times. To combat this, we set up fixture data for our RSpec tests and manually adjusted any other data not covered by fixtures, using Selenium's JavaScript support, so that we don't get false positive diffs.
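Pulling the setup described above together, here is a sketch of how it might look; the binary path, port, and credentials are placeholders, and this uses the selenium-webdriver gem's older remote-driver interface from around the time of the talk:

    require "selenium-webdriver"

    # Spawn the BrowserStack tunnel and a local Rails server
    tunnel_pid = spawn("./BrowserStackLocal", ENV.fetch("BROWSERSTACK_KEY"))
    rails_pid  = spawn("bin/rails server -p 3000") # unless one is already up

    # Ask BrowserStack for Firefox 31, with local tunnelling enabled
    caps = Selenium::WebDriver::Remote::Capabilities.firefox
    caps["browser_version"]    = "31"
    caps["browserstack.local"] = "true"

    driver = Selenium::WebDriver.for(
      :remote,
      url: "https://#{ENV["BS_USER"]}:#{ENV["BS_KEY"]}@hub.browserstack.com/wd/hub",
      desired_capabilities: caps
    )

    # Tear everything down once the suite finishes
    at_exit do
      driver.quit
      [tunnel_pid, rails_pid].each { |pid| Process.kill("TERM", pid) }
    end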
So now we have our tests taking screenshots. This is the fun part: we needed to figure out a way to make a diff between two of our screenshots, and ImageMagick worked perfectly for this. Despite having literally one of the worst sites I've ever seen, ImageMagick did exactly what we needed. ImageMagick is a tool to convert, edit, and compose images. It comes with a set of command line tools, and one of them is compare. Using compare, with various options enabled, allowed us to shell out and produce diff screenshots based off of two other screenshots.

For example, when we'd make a simple change like to the header, ImageMagick would spot those differences for us and produce the diff you've seen. That's from ImageMagick. Thanks, ImageMagick. ImageMagick has a lot of options you can pass to the compare tool, and we take advantage of a few of them to make this work. So let's go through each of these options, and I can explain how we're using them.

ImageMagick's compare tool, from their website, will "mathematically and visually annotate the difference between an image and its reconstruction." Or, in my terms, it'll make a diff. Compare lets you provide a metric, which outputs to standard error a measure of the differences between the two images, according to the given metric type. Here we're using PAE, where PAE stands for peak absolute error. We can use the peak absolute error to find the size of the fuzz factor needed to make all the pixels similar. So if we had a screenshot one and a screenshot two that were pretty different, compare would produce a diff like this, and the peak absolute error output would tell us we'd need a huge fuzz factor to make all the pixels similar, or exactly the same. The fuzz factor can be important in case we want to ignore pixels that only changed by a small amount. You might actually want to use this to avoid false positives; for example, sometimes gradients render slightly differently between two screenshots, depending on the browser, and you might not want to count that as an issue. We don't use this output right now, but it would be important if you wanted to make the assertions in your tests meaningful, like actually failing if the diff exceeded a certain amount. We didn't end up doing that, because a diff being produced doesn't necessarily mean something is wrong.

On to the next issue: a few times when we were running our specs, we noticed diffs weren't even being produced, and we were trying to figure out why. We took a look at the screenshots and realized they were two different heights or sizes for some reason, like when a change accidentally removed the footer, and ImageMagick wouldn't let us do a default compare on those images. So we had to use sub-image search. Sub-image searching has compare search for the best match location of a small image within a larger image. This option produces two images, or two frames, as ImageMagick calls them: the first is the difference image, which is the image we're interested in, and the second is the match score image. The match score image is a smaller image containing a pixel for every possible position of the top left corner of the given sub-image. The search tries to compare the sub-image at every possible location in the larger image, and because of this, sub-image searching can be very slow. As you might guess, the smaller the sub-image, the faster the search. That said, this option doesn't actually take effect unless the two images are different sizes, which doesn't happen that often for us; we're not normally chopping our screens in half. So the amount it slows down our visual specs on the CI isn't meaningful, especially since those specs aren't tied to our main specs.
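To make that concrete, here is roughly what shelling out to compare could look like; the transcript doesn't show the exact command Bugsnag used, so the flags here are just the ones discussed:

    # Sketch: produce a diff image from a master and current screenshot
    def diff_shots(master_path, current_path, diff_path)
      # -metric PAE prints the peak absolute error on standard error;
      # -subimage-search copes with screenshots of different sizes and
      # writes two frames, e.g. diff-0.png (diff) and diff-1.png (score)
      `compare -metric PAE -subimage-search #{master_path} #{current_path} #{diff_path} 2>&1`
    end

    # The captured output (e.g. "13107 (0.2)") is the metric we currently ignore
    puts diff_shots("master.png", "current.png", "diff.png")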
Another fun thing we ran into was that sometimes our screenshots were completely different, though only while testing the tool itself, not from real changes on our website. And ImageMagick was not having that. In fact, it wouldn't even give us a diff, because the images were so different. So we found an option called the dissimilarity threshold. This threshold determines how different two images can be and still be diffed, and it defaults to 0.2, or 20%. We set it to 1, so the images could be completely different. Generally you won't need to do that, because your pages aren't going to change that much; this was just for testing our tool. The only caveat, as you might have guessed, is that doing diffs on completely different images can slow your tests down by a lot. But like our previous issue with sub-image searching, this doesn't happen to us much, if at all, and since these tests are separate from our main specs, it's not a huge issue. The last three arguments aren't exciting: they're just where our current screenshot is, where our master screenshot is, and where we want to save the diff to. So okay, we're done with ImageMagick. Cool.

Now that we have our screenshots and diffs, we needed somewhere to put the screenshots online, and we needed to be able to grab them back out with our Rails app. We decided to use AWS. AWS, or Amazon Web Services, offers cloud storage and has a Ruby API, so we store and retrieve our screenshots from one of their buckets. We set up our bucket via the Ruby API, using our access key ID and our secret access key, and we made a bucket called bugsnag-shots where we'd be hosting our screenshots.

Then, in our specs, we would call save shot, and that would take care of setting up our screenshots directory, taking the screenshots, and then uploading them to AWS. Inside of save shot, after we successfully got our screenshots into our local screenshot folder, we'd call our save shots to AWS method. That method is responsible for getting our current screenshot, our master screenshot, and our diff screenshot up to AWS. It also downloads our master screenshot in order to produce a diff, so that we can then upload that diff to our bucket. So first, we find the correct area of our AWS bucket and upload our current screenshot. After that, we download the master screenshot that we need to produce the diff with, finding it using our current Git SHA. (And, oh boy, hold please: Java wants to update. I don't think I'm going to do that right now, if that's okay. Okay, where was I?) So, yeah, we pull the master screenshot down into a local temporary folder we made for this. Then we check whether we have our master screenshot, and if we do, we do the fun ImageMagick compare we talked about earlier to produce our diff image, and save it to the same local storage area. After that, we upload our master screenshot to the storage area, and then upload our newly made diff, and then we've finished our whole round of saving our shots to AWS.

In our AWS bucket, we ended up using a naming pattern of commit SHA, area of site, page name, and image type. So, for example, we could have a commit SHA of a1a1a1a, where we'd be on the marketing part of our site, on the index page, uploading the diff for that page. The image types could be the current screenshot we took, the master screenshot we downloaded from AWS, and/or the diff we made from the two screenshots.
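Here is a sketch of what that upload/download round trip could look like, written against the current aws-sdk-s3 gem (which postdates the talk); the bucket name, key layout, and the diff_shots helper from the earlier sketch are carried over as assumptions:

    require "aws-sdk-s3"

    s3 = Aws::S3::Resource.new(
      region:            "us-east-1",
      access_key_id:     ENV.fetch("AWS_ACCESS_KEY_ID"),
      secret_access_key: ENV.fetch("AWS_SECRET_ACCESS_KEY")
    )
    bucket = s3.bucket("bugsnag-shots")

    # Naming pattern: commit SHA / site area / page / image type
    def key_for(sha, area, page, type)
      "#{sha}/#{area}/#{page}/#{type}.png" # e.g. "a1a1a1a/marketing/index/diff.png"
    end

    current_sha = `git rev-parse --short HEAD`.strip
    master_sha  = `git rev-parse --short origin/master`.strip

    # Upload the screenshot we just took on this branch
    bucket.object(key_for(current_sha, "marketing", "index", "current"))
          .upload_file("screenshots/current.png")

    # Pull down master's screenshot of the same page, if there is one,
    # diff the two, and upload the result
    master = bucket.object(key_for(master_sha, "marketing", "index", "current"))
    if master.exists?
      master.download_file("tmp/master.png")
      diff_shots("tmp/master.png", "screenshots/current.png", "screenshots/diff.png")
      bucket.object(key_for(current_sha, "marketing", "index", "diff"))
            .upload_file("screenshots/diff-0.png") # compare's difference frame
    end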
So, now that we have our images uploaded to AWS, we needed a way to view them, and viewing the screenshots straight from a bucket was far less than ideal. I don't know if you've done it, but you don't want your whole team going in there and trying to poke through the screenshots. So we decided to set up our own custom viewing page in our admin dashboard. We created a page, which looks familiar because I showed it to you earlier, that lists our current branches with their last three commits. The controller action takes care of grabbing our remote branches and making sure they're in AWS, and then it formats our branch names correctly for the view. Then, in our view, we loop through and show all of our branches. We also provide a way to prune our remote branches, so we can keep this area clean without forcing a prune every time you load the page; pruning our origin is as simple as running git remote prune origin. When you click through a SHA in our view, you see all of your screenshots and diffs, organized by area. The controller action for our show page just grabs the areas and pages out of our bucket, based on the current SHA, so that we can display them in the view (there's a rough sketch of that action below). And in the view, we iterate through each area and each page to show the diff image, as well as the current and master images, for clarity.

So cool, we made it. Our tool is done, but it's not perfect, and I think there are some improvements we can make in the future. For example, right now all of our tests pass whether or not there's a diff; they'll only fail if there's an issue executing the test. It could be interesting to make our assertions mean something, like failing if there's a diff, but we need to think more about that, because, like I said, a diff doesn't necessarily mean a failure to us. Another thing we want to consider is accounting for 0% diffs. Maybe we shouldn't upload a diff image at all if there's no difference between the images. This could save us space in our AWS bucket, speed up our tests because we'd be uploading fewer screenshots, and make our admin dashboard less messy. We also think it could be nice to automatically link these to a GitHub pull request: when a diff is created, maybe we automatically attach it to the relevant pull request. This one's a little tricky, because maybe we don't want to create that much noise on every push, but it's an idea. Another thing is that we currently only diff our branch's current commit against the most recent commit on master. So when we push a new commit to our branch, it diffs against the last thing that was pushed to master, and that way we know what's changed between the feature I'm working on and what's currently running in production. It would be nice if we could also diff, on master, the most recent commit versus its previous commit; that way, if you push a visual change directly to master, you can still see a diff of it. It would also be nice to see a diff against the previous commit on the current branch, so if you push a commit to your branch that changes some stuff visually, you can see a diff between those two regardless of what's happening on master. And I would also really love to get this hooked up to more browsers; that would enable us to make this into a browser compatibility tool as well, with automatic browser comparison, so we can make sure things aren't messed up in Internet Explorer.
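Circling back to that viewing page for a moment, here is a sketch of what the show action might look like, listing everything under a commit SHA's prefix and grouping it by area and page; the controller and variable names are made up for illustration:

    class Admin::ShotsController < ApplicationController
      def show
        bucket = Aws::S3::Resource.new.bucket("bugsnag-shots")
        sha    = params[:id]

        # Keys look like "a1a1a1a/marketing/index/diff.png"; group the
        # objects so the view can render each page's current/master/diff
        @shots = bucket.objects(prefix: "#{sha}/").group_by do |object|
          _sha, area, page, _type = object.key.split("/")
          [area, page]
        end
      end
    end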
Anyway, that's all I have. Oops, I shouldn't punch my mic. Feel free to find me online or during the conference if you want to talk more about this stuff, or just want to say hi. I also wanted to mention that I'm working on a book with Just Enough Media, based on a command line tutorial series I wrote as a blog post series a long time ago. It's in its very early stages right now, but feel free to visit the page, which is leanpub.com slash Just Enough Unix Command Line, and subscribe for announcements. Anyway, thanks for having me. I appreciate it. Anybody have any questions?

Okay, so his question was: in my current tool, am I telling the tool where to go for each page, or is it traversing the site automatically? Right now I'm telling it which page to go to, and we also use Selenium to click around, in case you want to check modals and different states. So there's no automatic traversal for us right now. I'm sure it's something you could implement, but we find it very valuable to be able to click around and change states in order to capture those.

His question is: am I getting back any metrics? ImageMagick gives us back some metrics, but we aren't capturing any of that right now. It would be very easy to capture, because it's just output on standard error, so we could grab that and use it somewhere, but currently we're not.

Ah yes, so his question is: did we explore committing our images, so that we could use Git to check diffs? Until very recently, Git was generally not recommended for images, because every time you push, the repo gets exponentially larger, I believe, since it's continually saving all those images, so our repo would get quite large. However, GitHub just released Git LFS, I believe, which takes care of that problem, but we haven't looked into it yet. That would be an interesting idea, though.

Okay, thank you.