This is the continuous visual integration talk. Thank you for being here. We've got to go, go, go, we have a lot of things to get through. So just a little bit about me to start. My name is Mike Fotinakis. I'm currently the founder of Percy, percy.io, which is a tool for visual testing. So I'm really excited to share with you some of the things I've learned over the last year about how to test apps pixel by pixel. I'm also the author of two Ruby gems, jsonapi-serializers and swagger-blocks. If you use either of those, I'd love to talk to you after, or if you have any questions. OK, so let's jump right in. This will come in three parts: the problem, a general solution, and how it works in architectures and methodologies, and all the problems that come along with that. So let's start with the problem. The problem is basically that unit testing itself is kind of a solved problem. We have a lot of different strategies and techniques and technologies for testing the data of our systems, the behavior of our systems, the functionality of our systems, the integration of our systems with other systems, and end-to-end testing our systems, and smoke testing our deployments. We have a lot of tools and technologies for this, right? But how do you test something like this? So I guess the color of the text has become the color of the button, or the text now has zero opacity, or something's happened, right? And this was filed as an issue. Or another example: here's the 404 page of an app I used to work on. This is just what it's supposed to look like. It's pretty simple, pretty straightforward. We launched a feature, and then four weeks later, we were told that our 404 page looked like this. Right, you've all seen this, right? And of course, nobody caught this in QA because no one QAs the 404 page. And, you know, this was a simple change, somebody just moved a CSS file.
Everything else worked, but the 404 page was the one that was broken. So then it got fixed, and then the fix looked like this. So, you know, the CTA is totally covered up, and nobody QA'd the fix on mobile. So, you know, you're still continuing to fix. And then I went and pulled slides for this a while back and looked at the 404 page, and it was broken again, right? So I reported this to my old team. So, in the business, this is what we call a regression, right? And specifically, this is a type of visual regression. So, how do product teams fix this today? Shout out the answers. Hire more people, okay. How do you fix these kinds of problems? What? Interns. Interns, okay. What are the interns doing? They're clicking a lot. They're what? They're clicking around a lot, right? What's that called? Behavioral? Exploring. I'm looking for a specific word here. QA, thank you, QA. So QA is the big one, right? And this can be developer QA, this can be you doing QA on your apps, this can be you having QA engineers. QA can mean many things, but part of the job is to find these kinds of things before they hit production, right? Or, you know, you get issues from your customers and you fix them after. QA is very necessary, but it's also very slow and manual and complicated, right? And it's also pretty much impossible to catch everything. Even in a medium-sized app with just tens of models, you can have hundreds of flows and thousands of possible page states and permutations, and constant feature churn, right? There's a lot of development happening in these apps and you can't catch all of this stuff all the time. So it's also very expensive, right? With QA, you're spending manual, human, often engineering hours paying for fixing these kinds of visual regressions. So let's go back to this button problem, and, you know, my standard fix to this would be: can I write a regression test for this?
Right, I'm a big TDD person, I love testing, I write tests for basically everything. So let's go try to write a test for this, right? So here's an RSpec feature test that tests this part of the app. It does simple things: it visits the home page, and then it fills in some text box with a title, and it clicks a button, right? And then you expect that the page has the new content on it. So there's a problem here, right? This test didn't fail. The button still technically works, it's just visually wrong, right? And this manifests in tons of different ways. So what am I supposed to do here? Am I supposed to assert on some CSS computed style of the color of the thing, or maybe that it has a CSS class applied? But that's not really testing the right thing, so I'm just not gonna do this, right? And no one's gonna do this, because no one wants to write a test that's this fragile and inflexible, especially in a developing product. So my normal approach is pretty useless here. So the problem, fundamentally, is that pixels are changing, right? But we're often only testing what's underneath; we're testing all of our abstractions on top of those pixels. And this is an important problem, because the pixels are the things that your users are actually touching and seeing and interacting with all the time. And to go further than that, even with all of our current testing strategies and methodologies, we still lack confidence in deploys, right? You can have a million unit tests for all the different data changes in the world, but if you move a CSS file or change your CSS, you're gonna have to go look at it, right? You're gonna have to go check that and test it. So let's move on to the solution to this problem. And I don't like to say that this is the solution, I like to frame this as a solution. This is not the be-all end-all of all testing strategies that will make your life perfect, but it's a new tool in the toolbox.
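A sketch of the kind of feature spec being described here; the field label, button text, and expected content are placeholders, not from the real app:

```ruby
# Hypothetical RSpec/Capybara feature spec of the kind described above.
# The field label, button text, and expected content are made up.
require "rails_helper"

RSpec.feature "Creating a post", type: :feature do
  scenario "user submits a post from the home page" do
    visit "/"
    fill_in "Title", with: "Hello world"
    click_button "Submit"

    # This assertion passes even when the button is visually broken:
    # we're asserting on content and behavior, not on pixels.
    expect(page).to have_content("Hello world")
  end
end
```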
So the question I like framing is: what if we could see every pixel changed, in any UI state, in every PR that we make, right? So basically, what could we do if we could test our apps pixel by pixel? In order to do that, I'm gonna introduce a new concept. You may or may not be familiar with these. They're called perceptual diffs, they're called pdiffs, they're called visual diffs. This has been pioneered many times. Brett Slatkin at Google has done quite a bit of work on this on the Google Consumer Surveys team. You should watch his talk; it's about how he accidentally launched a pink dancing pony to production, and then they ended up having to do this style of testing in order to prevent that from happening. So what is a perceptual diff? A perceptual diff is relatively straightforward, right? Given two images, what's the difference between them? Like, compute the delta between these two images, and that can be this, right? So all the red pixels are the pixels that have changed between these two images, without any context about what the image is about, right? So you can compute this for basically any kind of image. So how do we compute these? Let's try another example. So shout out the differences in these two side by side, and then we'll show the pdiff and see if you're right. Background color on the top. Lost the link. Capitalization, and thumbnails. The danger button's gone, right? You got all of them. So this is the pdiff, right? And you can immediately see all of the changes in that image without having to sift through it. All these pixels that have changed, these are the things that have changed on this page. So, pdiffs in 30 seconds: let's go make a pdiff. Pdiffs are pretty straightforward, right? Okay, so I have these two images, just new and old. So let's open new and old. Okay, so here are the two images, right? And this is just from the skeleton demo site.
So you can see there are some differences in them, but let's go make a pdiff and see what that actually is. So I have ImageMagick, the library, installed, and I can just use the ImageMagick compare tool and compare old and new, and I'll store the image in diff.png, and I'll open diff.png. So cool, we have our first pdiff, right? Those are all the pixels that have changed, and by default it overlays them on the original image, made translucent underneath. You can turn those things off; you can pass a bunch of different flags to this command, like a -fuzz factor if you don't care that pixels have changed within a certain range of colors, those kinds of things, right? So computing pdiffs themselves is actually relatively straightforward. So here are a couple of pdiffs in real life. If you try to figure out the difference between these two, it might take you a second, but with the pdiff, you can kind of immediately see that the "do you agree to the terms of use" section of this page is gone, right? It just no longer exists. And I kind of love this one, because it's a test for an error condition, but it's basically a back end change manifesting as a front end failure, right? This is a Rails form object that has somehow gotten into a weird state that is manifesting as this sort of front end failure, and you might have a test for this, but this form probably doesn't submit now, right? You probably can't actually submit this form. So here's another example. Here's a normal visual change, a visual change you might want. A new person got added to this page, so the visual diff is, okay, a bunch of things shifted around and got reflowed, and you can go back and forth and be like, okay, I understand that this page has a new thing added to it. So you sort of have to learn how to read pdiffs, because they can be a little bit noisy, right?
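The core of what the compare tool is doing can be sketched in a few lines of plain Ruby, treating an image as a 2D array of `[r, g, b]` pixels; the `fuzz` parameter loosely mimics ImageMagick's `-fuzz` flag. This is a toy illustration of the idea, not how ImageMagick is actually implemented:

```ruby
# Toy perceptual diff: given two same-sized "images" (2D arrays of
# [r, g, b] pixels), return the coordinates of every changed pixel.
# fuzz is a per-channel tolerance, loosely mimicking ImageMagick's -fuzz.
def pdiff(old_img, new_img, fuzz: 0)
  changed = []
  old_img.each_with_index do |row, y|
    row.each_with_index do |old_px, x|
      new_px = new_img[y][x]
      # A pixel counts as changed if any channel differs by more than fuzz.
      diff = old_px.zip(new_px).map { |a, b| (a - b).abs }.max
      changed << [x, y] if diff > fuzz
    end
  end
  changed
end

old = [[[255, 255, 255], [255, 255, 255]],
       [[255, 255, 255], [255, 255, 255]]]
new = [[[255, 255, 255], [250, 250, 250]],
       [[0, 0, 0],       [255, 255, 255]]]

pdiff(old, new)           # => [[1, 0], [0, 1]]  (the near-white and black pixels)
pdiff(old, new, fuzz: 10) # => [[0, 1]]          (the 5-unit change is within tolerance)
```

A real pdiff then paints the changed coordinates red over a translucent copy of the original, which is exactly what compare's default output looks like.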
So for example, this one: they look the same, but in the footer, and you probably can't read that, it says something like "if typeof jQuery not equal undefined" slash something. So this one was: somebody added a gem which happens to inject some scripts into the page, and the gem was in a broken state, right? So all of their tests are probably passing, everything else is passing, but their footer has some junk in it, right? And you often can't catch these kinds of things without visual tests, or looking at it. Here's just some pdiff art; I found this in some diffs that I've done, and an image got shifted over just perfectly to create this nice pdiff art. Totally useless, but kind of cool. And also, a pretty strong signal in pdiffs is if there are zero pixels changed; that's really important for you, right? In a classic refactor of your app, in a pure refactor, you're not changing anything that somebody's interacting with. All the plumbing's shifting around, you're changing architectures, you're upgrading something, but the actual thing that people are touching, or the API that you're touching, is not actually changing, right? So having a zero-pixel-change pdiff can be a really strong signal, because you get visibility into knowing that nothing has changed on this page. I can safely upgrade this thing because everything is remaining the same. And as your app gets bigger and bigger, you wanna be able to do those kinds of refactors for your code health, right? So let's go write a visual regression testing system in two minutes. Ready, go. Okay, so I have this app. This is Giffindor, which, if you went to Brandon Hays' talk at RailsConf two years ago, is his app. And Brandon, I don't know if you're in the room, but you probably didn't expect that anybody was gonna go back and write tests for your demo app from two years ago, but we're gonna do it. So here we go. So here are some feature specs I've written for this app.
And they do simple things, like: you visit a page, you expect that it has some content; you click a dialog, you expect that the new thing is up. This app has just basic behaviors. It's just a stream of posts, right? And you can upload GIFs. And you can do simple things like click "submit a GIF" and it does a jQuery animation that pulls that down, and you can type stuff, and there's a validation state; a bunch of things that we all do all the time, right? So the tests for this are relatively straightforward. So let's just go save a screenshot at the end of this. All I'm doing here is using Capybara's save_screenshot capability. And this works with basically every web driver that you have, except for Rack::Test. But most web drivers support this. So let's save that and let's go run the tests. "r" is just my bash alias for "bundle exec rspec", so don't let that throw you. And you should all have that, by the way, because you type it all the time. So wait, great, we've run the tests. Let's see, there's a change here, right? You can open old.png. Great, so we have a screenshot of what our app looked like in that test state, right? And this is what I call a complex UI state: you've clicked a button and some jQuery animation has fired in order to open up that top dialog. This is not just a static page that you visited, right? But you'll also notice it doesn't quite look exactly like the page we were looking at. This border image is all messed up, and there are some other things going on here. We'll talk a little bit later about why that's actually not the same. Okay, so great, we've saved our old image. Let's change it to new for the new one. Now let's go change the background color of this app. So here's the CSS; let's just change the background color by one value, right? And we'll make sure that this other one is saved.
We'll go run our tests. Great, so we have an old and a new. Great, let's compare them and store the result in diff.png. Open up diff.png. Cool, here's a pdiff, right? All the background pixels of this page have changed. And you might think of this as just noise, right? Why would anybody care about a background color change that you can't see? But I guarantee that there's a designer in this room who actually would probably want to know if this changes, right? They want to guarantee that there's a consistent color palette being used, and that we developers aren't arbitrarily changing the background shades when we think that there's a new color we should use, right? There needs to be some consistency there. So I kind of don't discount these kinds of changes; just because you can't see them with the eye doesn't mean they're not important. So great, right? Awesome, let's all do this. So the simple use here is catching visual regressions, right? That's the obvious one. But then if you start thinking about this more, there are a lot of advanced uses for this kind of stuff. CSS refactors and deletions is a big one, right? You're all terrified to delete CSS. Yes or no, right? Because it's scary; you don't know where that CSS is used. You don't know what legacy parts of your app are using that CSS. So what if you go add a visual regression test to your top 50 pages, then go delete your CSS and see what happens, right? And if you've deleted it and nothing changes on the pages you care about, great, you can probably delete that CSS. Testing style guides, especially testing living style guides, is a pretty cool use of this. Safe dependency upgrades: so often your libraries are backwards compatible, but they're adding new features, so you wanna be able to upgrade them. But upgrading libraries and dependencies is also kind of scary sometimes.
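That CSS-deletion workflow can be sketched as: snapshot every page you care about before the change, snapshot again after, and flag any page whose rendering differs at all. In this sketch the `before` and `after` hashes hold fake "rendered pixels" strings standing in for real screenshots, and the page paths are illustrative; a real setup would use actual browser screenshots:

```ruby
require "digest"

# Toy "is it safe to delete this CSS?" check: given a map of page path
# to rendered output captured before and after the change, flag every
# page whose output changed at all. The string values stand in for
# real screenshot data; this only illustrates the workflow.
def changed_pages(before, after)
  before.keys.select do |page|
    Digest::SHA256.hexdigest(before[page]) != Digest::SHA256.hexdigest(after[page])
  end
end

before = { "/" => "pixels-v1", "/pricing" => "pixels-v1", "/404" => "pixels-v1" }
after  = { "/" => "pixels-v1", "/pricing" => "pixels-v1", "/404" => "pixels-v2" }

changed_pages(before, after) # => ["/404"]
```

An empty result is the "zero pixels changed" signal from earlier: the deletion touched nothing you care about.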
And you wanna be able to, especially if those libraries are providing front end dependencies of any sort, if they're providing JavaScript behaviors, or if your style guide is in its own gem and you're importing that and upgrading style guide versions; upgrading dependencies safely, and having these kinds of visual checks, can be really useful. Visual regression testing for emails is an awesome advanced use case I've seen. Testing D3 visualizations is something I've started experimenting with recently, because testing D3 is actually kind of hard, right? I'm not a D3 expert by any means, but you can sort of test the data transformations you're making, how your inputs transform to your outputs, and how you expect D3 is going to visualize that. But wouldn't it be nice to just be able to say: this is what it looks like, I know that that's right. Oh look, it's changed; is it still right? That's what you really wanna test with those visualizations. And then going further, what I really want here is a visual review process alongside code review. And we're gonna talk a little bit more about that. So if this is all so easy, why aren't we all doing it right now? Somebody has definitely said that if it were easy, it wouldn't be hard. It gets really complicated, right? There are a bunch of problems, and I'm gonna hand-wave over a bunch of them, but I bucket them into three categories: tooling and workflows, performance, and non-deterministic rendering. So on the tooling front, it's kind of hard, right? There are some open source projects that do this right now. PhantomCSS is a great example, right? But it sort of presents all of your visual changes as a ton of individual test failures, right?
And that's a lot of information and a lot of failures, and it blurs the line between something being flaky, a change that you actually wanted, and a real test failure, right? Or, for example, you probably shouldn't be required to manually store these baseline images in your Git repo, right? That's a big workflow and tooling burden that most of us are probably just not going to take on. That's a lot of work, right? The performance one, I think, is the big one across the spectrum of all the open source tools, all the proprietary tools, everything. This is the big one that probably prevents us from doing this right now. The examples I showed are somewhat contrived, right? They're pretty simple pages. But in the real world, I have some pages that, when you render a full page screenshot of them, are 30,000 pixels high, 40,000 pixels high, and that's not crazy, right? So rendering and screenshotting that kind of page, uploading it, storing it, can take 15 seconds just to render and another five to diff. So if you have 100 of these tests and they all run serially, that's over 30 minutes you're adding to your test suite, right? And none of us want to do that. Your feature specs, if you're writing feature specs already, are already too slow, right? And they're already too flaky. So that's a hard one, and I think performance is actually the biggest problem here. And then there's non-deterministic rendering, which we'll talk about. So I'm hand-waving over the other problems; if you want to talk about them more, I would love to. So on the non-deterministic rendering front, there are simply a bunch of things that change in browsers, right? We're not just doing static pages. Animations are the big obvious one, right? So take this pure CSS animation. If you visually diff this a bunch of times, what diff are you going to get?
You know, you might get this diff, you might get this diff, you might get this diff, right? These aren't useful to you, they're just noise. So for example, in Percy, what we do is actually freeze animations by injecting a particular CSS style into the page that tries to stop all of these animations from happening, so that you can just say nothing has changed, right? And if you want to know more about that, I have a post on blog.percy.io about how we actually do it. Another big problem is dynamic data, right? If you have anything on your page that changes between test runs, you're going to see a visual diff from it. A date picker is a good example, right? And you can sort of fix these with fixture data instead of faker data; you can move in a direction where you're using more static, deterministic things in your tests, which I think is a relatively good fix. But this is still a big problem, and I have some ideas about how to address this kind of thing. Then there are old test browsers. Like we talked about before, what you see on the right here is what was rendered by capybara-webkit, and what you see on the left is what's rendered by Firefox, right? And these are not the same thing. The border image doesn't work, and the web font here is not a web font in this one. The problem is that the old test browsers we're using underneath are often not modern in any fashion, right? capybara-webkit is an old fork of WebKit that doesn't support these things. PhantomJS, all the way up until the new 2.0 version, doesn't support these things either; it was built on a WebKit fork comparable to Chrome 14, from five years ago, right? It doesn't render the modern web. It also has 1,700 open GitHub issues that are basically untriaged, so, go for it. So that's a really hard problem, right?
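For reference, animation-freezing CSS of the kind described is roughly this shape; this is a sketch of the technique, not Percy's actual injected stylesheet:

```css
/* Sketch of animation-freezing CSS injected into the page under test.
   Not Percy's exact stylesheet, just the general idea: neutralize
   transitions and pause animations so every render looks the same. */
*, *::before, *::after {
  transition: none !important;
  animation-play-state: paused !important;
}
```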
And then there are some other problems you can't really control for, like sub-pixel anti-aliasing. The way text is rendered on a page is not totally deterministic, right? Things might shift by one pixel. GPUs don't actually guarantee that floating point operations will always come out the same, right? So if you have a gradient that's rendered on one machine and you try to render that same gradient on another machine, they may not be pixel-perfect. They probably won't be. If you compile some code with different optimization flags, GPU floating point operations can be non-deterministic, right? So we look at pages as if they are the same all the time, but actually getting them to be pixel-perfect is a big problem. Some tools attempt to solve this with OpenCV-style computer vision research-y things, where you try to say, oh, is this a button? Has the button moved? You try to derive the page back from the image, right? So that's hard. So pdiffs are only half the battle here, right? So back to our main goal: what if we could see every pixel changed, in any UI state, in every PR, right? And this is really what I think is the difference between visual regression testing sometimes, and what I frame as continuous visual integration. Your unit tests are not the entire thing you're doing to test your system; you need processes for doing continuous integration: you need to be merging changes with all of your other developers all the time, you need to be testing them instantly in CI, as fast and as parallelized as possible. In the same way, there's a difference between doing visual regression testing sometimes and continuous visual integration. And these are the big problem spaces that create that difference. So being able to do this would require that things are really fast, right?
Basically as fast as your test suite. You need to be able to handle complex UI states; you can't just test a static page. We're not here just to look at all of our static pages; we need to be able to test components in all their different component states. And it needs to be continuously integrated into your workflow on basically every commit, right? In my mind, this can't wait until you're in production, and even staging is a little bit too late for me, right? I want this to happen basically all the time. So I'm gonna talk a little bit, in the last part of this talk, about how we architected Percy to try to address these problems. So here's how Percy integrates into an RSpec feature spec. It's basically the same thing we just created, right? You have a feature spec, it visits some page, it does some action on the page. And then what you do is you just drop in a Percy Capybara call to snapshot the page, and give it a name, say, this is the home page, right? So what's that actually doing underneath? When these things get pushed up to Percy, are we pushing up images? And I say that with a question mark, because that would come along with all the problems that we noticed before, right? So we don't wanna do that. What we actually do here is push up DOM snapshots. And if you think about it, this makes a lot of sense, because the most efficient, most lossless encoding format for a website is not an image of the website or a video of the website; it's the website, right? It's your assets; it's the DOM state that you've created. So we actually push up the DOM and HTML snapshots, and technically we push up SHA-256-fingerprinted versions of those assets, so we never upload things twice.
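The fingerprinting idea is simple to sketch in Ruby: content-address every asset by its SHA-256 digest, and skip anything whose digest has already been stored. The `AssetUploader` class and its method names here are illustrative, not Percy's actual API:

```ruby
require "digest"

# Toy sketch of content-addressed asset upload: each asset is identified
# by the SHA-256 of its contents, so identical content is never stored
# (or sent over the wire) twice. The @store hash stands in for remote
# storage; all names are illustrative.
class AssetUploader
  def initialize
    @store = {}
  end

  # Returns the fingerprint, storing the content only if it is new.
  def upload(content)
    sha = Digest::SHA256.hexdigest(content)
    @store[sha] ||= content
    sha
  end

  def upload_count
    @store.size
  end
end

uploader = AssetUploader.new
a = uploader.upload("body { background: #fff; }")
b = uploader.upload("body { background: #fff; }") # same content, no re-upload
a == b                 # => true
uploader.upload_count  # => 1
```

This is why only the first run pays the full upload cost: on later runs almost every fingerprint is already known to the server.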
So the first run might be slow, but after that, my goal is basically that zero time is spent in your test suite; that's not totally true, but close. Then we do a bunch of hand-wavy magic underneath: we push that stuff into storage, we talk to GitHub and commit statuses, and we coordinate work with this Percy Hub. And this is the big part that actually addresses most of the performance issues: we can parallelize this, right? You've pushed us a bunch of DOM snapshots as fast as your test suite can go, and what we actually do underneath is run them as fast as your concurrency limit allows. So we can, totally out of band of your test suite, be parallelizing and running and rendering these DOM snapshots in a deterministic rendering environment, and then show those to you in a nice way. So this was the main innovation that I think helps this thing come to fruition. So as of yesterday, actually, I wanted to mention this: we have hit a million visual diffs rendered in Percy, so I was really proud of that milestone. So here are a couple of quick Percy examples. I talked to some of our customers and got permission to show you just a couple of pages, to see what Percy the product looks like and how I've been trying to address this problem. So here's charity: water, who built charitywater.org; they're a very design-centric team with a big Rails app. They've pushed 162 total snapshots on basically every build, and this particular build, which is called "footer", updating the new footer markup, had 96 visual diffs, right? And you can go through each one of these pages and just be like, oh look, look at all these footer changes. And then this is the diff, and I can click that and say, oh great.
So I noticed that this footer is different on all of these pages. And this is a lot to go through, right? So I just recently added this overview mode where I can see all of my pages at once, right, and really quickly do a visual review and confirm that all of these changes are the ones that I want. These are the visual changes that we actually want to make as part of this PR. So here's another example. This PR is a new press page; they're trying to update their press page, right? And this is just the first iteration of that PR, where they've removed some CSS styles, and, oh look, this page totally breaks, and it is totally broken, right? They would never want to launch this page, but it gives them this iterative review process where they can go here and say, oh look, this is what our page looks like currently in this PR. And then there's also the important signal that none of the other pages have changed based on this CSS change, right? And then you can go through and see what the other pages in this app look like. So on the workflow and tooling problem, this is the last thing I'll show you. We just provide this as a CI service, right? As your tests run, they push information up to Percy, and then Percy marks the PR with another CI status right here: "Percy: visual diffs found", right? And we can just click details, jump right to the page, and be like, oh look, this is the state of this PR. This is that background change that we made. And I can go through and decide, yes, this is the right visual change, it was intentional for this PR; let's go ahead and mark it. So I'll do a little, this is the one HTTP request that my talk requires, so let's hope that this works.
So, okay, I go here and I'm like, great: I'm doing a visual review right from GitHub. I looked at these things; this is what I want. Click approve. And then GitHub will mark that status as green, right? So now this gives you a lightweight visual review process for all of the different UI states of your app, at the PR level, right? Not at some later stage. So yeah, that's basically how this DOM snapshotting mechanism has helped us tackle a bunch of those different problems. So that's it. I just want you to take away from this talk that visual testing is possible. It's a thing; we should be doing it. It's a new way to think about testing, and it can help give you deployment confidence, right? I think of this as the last stage of the CD pipeline, where in your acceptance phase you need to make sure that all of this stuff is looking correct, right? And you need to be able to approve it. This is still a very manual step, but we can probably automate quite a bit of it. And there's a lot more work to be done to make this a mainstream engineering practice. One last thing. Because of this DOM snapshotting model, I want to give you a sneak peek of something I'm working on over the next couple of months: I want to be able to do this for Ember tests. So if you are an Ember user, I would love to talk to you. Just email me, mike@percy.io, and let's talk about getting you in as a beta tester of this for Ember tests, because I actually think that this is probably the world where this makes the most sense, right? Not everyone is writing Rails feature specs, in a lot of ways because they're sometimes really hard to write, but we are writing a lot of JavaScript tests nowadays, right?
And as we further separate our worlds into "this is just an API backend" and "this is a single page app front end", those lines become clearer and clearer. We're gonna have a lot more of these tests, and to get this kind of power, all we need to do is send up those DOM snapshots and render them. So if you're interested in that, please let me know; I'd love to get the beta into your hands. So thanks so much. Yeah, the question is, what is the baseline? How is the baseline created? So basically I think you can do that a bunch of different ways. I usually just pick master: whatever master has last created, that is our baseline, right? And then we provide a mechanism in Percy where you can say, I want a more manual version, where I actually approve a master build and that becomes the baseline. So I think you've gotta have both, but basically, if you're really doing "master is always green and always deployable", then you should always be testing against master. Yeah, so we don't right now. The question was, do you do cross-browser testing? I think that would be a great evolution of this kind of testing, right? Doing more cross-browser testing. But it comes with all of those problems I mentioned too, and you'd be surprised to learn that most browsers don't provide a full-page screenshot API; Firefox is the main one that does. So I think you can get 90% of the benefits of visual testing with one good modern browser, but cross-browser testing would be a great evolution of this idea. Yeah, that's a good question. The question is, what tech stack are we using? It's all custom built on Google Cloud Platform. I've dockerized all of the environments. It's basically a Rails API, a strictly-Ember front end, and the workers run Xvfb, which is a virtual frame buffer. It's all on Linux.
It runs Firefox. Yeah, it's running Firefox. Oh, so you're asking about Percy access control, like who can access that Percy page? Right now I just tie it to GitHub auth, so if you can see the repo in GitHub, if you have team or collaborator access to the GitHub repo, you can see it in Percy, and anybody who can see that can hit approve. I haven't built any complex role-based authentication kinds of things yet. Yeah, I totally missed that part, so let me just do that quickly for the people who remain. Okay, so part of this thing is that we have all of these screenshots at a particular width, right? But we have the original DOM of these, so we can just resize the browser to a smaller width and actually show it. So here's responsive testing. Here they have a 320px version of this, so now I can see the footer change at all of those different widths, right? And I can play with this, and this is what this page looks like, you know, "on mobile", basically, at this breakpoint size, right? So the DOM snapshotting model also takes care of that, in that you can just render it at different widths. This is not testing on the actual device, right? But it is at least giving you the responsive side of it. The question was, can you disable the local test run and only have it on CI? That's actually the default behavior. And then I've had some people ask, I wanna disable it for only this specific branch, so there's an environment variable we provide called PERCY_ENABLE, which you can set to zero or one, and it will force that environment to be on or off. Cool, thanks so much.