Hi, thank you. Thanks for having me. My name is Mike Fotinakis. This is me. I'm the founder of Percy.io. This has been a multi-year passion and endeavor of mine. And Percy is actually an Ember app underneath, so it's kind of fun to be able to show you what I've been building and that it's actually very applicable to our world. Until recently, we didn't actually have an Ember integration, and that was the thing I was showing Nathan at EmberConf. So now we actually test Percy in Percy itself. We won't get that meta in this talk, but I hope you'll enjoy this.

By the end of this, my goal is to have you be excited about visual testing and know that it exists. We'll talk about a lot of the fundamentals of it, we'll do a demo that might go horribly wrong, and then we'll get even deeper into all the problems in this space. So let's go for it.

How many of you saw my blog post? Last time I did this, nobody raised their hands. Two people, yes. OK, so this is all going to be new to you. If you want to check it out, it's at blog.percy.io — that's the short version; you get the long version tonight. This comes in about three parts. I break it down into the problem in general, a solution — not like the solution — and then a Percy demo, with some sub-parts in there too.

I also have a couple of stickers tonight. These are limited edition, only 200 ever printed in the history of Percy, and these are the last six or so. You have to answer some questions deeper in the talk — just shout them out, and you can get some stickers. The last of the stickers.

So the problem is basically this: unit testing is kind of a solved problem. We have a lot of different tools and techniques to be able to do this.
We have lots of tools for testing the behavior of our systems, for testing the data of our systems, for testing end-to-end deployments, smoke testing the deployments, doing integration tests, and testing our components. We have a lot of strategies for this, right? But how do you test something like this?

I don't actually even know what happened in this particular one. The color of the text became the color of the background — or maybe there's no text on it anymore, I'm not actually sure. This was of course fixed like this: file an issue, get your developer to fix it.

This was an app I used to work on. This is the 404 page — what the 404 page was supposed to look like. We launched a feature, and four weeks later we were told that our 404 page looked like this. You've all seen this, right? A CSS file got moved somewhere, it "didn't affect anything," so they launched it, and the 404 page was broken. And of course nobody caught this, because nobody QAs the 404 page, ever, unless you're working on it, right?

So then we deployed a fix, and the fix looked like this. It wasn't QA'd on mobile, and the CTA was covered up by some weird z-index problem. And then I went to pull slides for this talk a while back, and the page looked like this. So I reported this to my old team.

In the business, this is what we like to call a regression, right? And specifically, this is a visual regression. So how do product teams fix this today? Shout out the answers. "You don't" — yeah, that's a good answer. That's the best answer. Like, how do you prevent these things from going out in the first place? QA, right? You're actually testing this. Actually, you get the first sticker. Boom.

So yeah, QA. And QA could be you doing QA, your developer team doing QA, or maybe you're actually big enough to have a QA team.
As your company gets bigger and the risk of launching these kinds of things gets bigger, you will probably have more and more QA resources, more and more QA time. This is often out of band of your actual development processes — it's something you do before releases. So maybe it's stopping you from having a nice continuous deployment or continuous delivery workflow. And QA is very useful and very necessary, but it's also really slow, really manual, and really complicated.

And it's really hard to catch everything, right? Even the app I was showing you before was just a medium-sized app with tens of models. It wasn't what I would consider a large engineering project, but it has hundreds of flows and thousands of possible permutations of how you get into a particular UI state. And you're also dealing with constant feature churn, because we're developing these all the time. So it's really important, but hard to catch everything.

It's also very expensive. You're often spending human hours — often engineering hours — on this. If you spend a couple of engineering hours fixing visual regressions per month on your team, you've already spent thousands of dollars. And this actually gets more and more expensive as these kinds of bugs increase.

So let's go back to that example. I'm a big testing advocate — everything's basically TDD with me. If this were a normal bug, I would want to write a regression test for it. A regression test is the test where, when you find a bug like this, you go write the test, the test is failing, you fix the bug, now the test is passing, and all of a sudden you have coverage — that bug will never regress. It'll never come back into your app ever again. So let's go into our acceptance test for this and try to write a regression test.
So here is the acceptance test for this particular flow — posting a project. It creates a new project, visits some page, fills in some information, clicks a button, and then has some standard expectations, like the header now says "project posted" or whatever.

So there's a problem here, right? That test doesn't fail. The button still works. It's not technically broken, it's just visually wrong, so this test doesn't cover it. What can we add to this acceptance test to make it a regression test? Am I supposed to assert that a certain class is on that element, or that its computed style is correct, or that the text in it is right — but maybe it wasn't actually a text bug, maybe it was a color bug? Usually I see things like this, where you expect that a class is applied to the element. But that's not really testing the visual side of it; all you're testing is that the class is bound correctly. What if that class is overridden in that particular situation?

So I'm just not going to do this, right? I'm just going to go fix the bug, launch it to production, and never pretend that we had to write a test for that thing — and then it'll break again, we'll fix it again, and we'll just keep doing that for a while. Nobody wants to write a test that flaky and that hard to write in this kind of environment, in a constantly developing app.

So the problem, fundamentally, is that pixels are changing, but we're only testing our abstractions underneath those pixels. We're not actually testing the pixels themselves.
And to go a little bit further than that: even with all of our current testing strategies and methodologies, we still lack confidence in deploys. You can have a million unit tests for all the data changes you want in the world, but if you move a CSS file, you're going to have to go boot up the app, click through to that page, get to that modal dialog, and look at it. We still lack confidence in the visual side of our deploys. So that boils down to: how can we test apps pixel by pixel?

And for that we move on to the solution. I don't like to say that this is the solution, the absolute solution for all of your testing needs ever — it's more like a new tool in the toolbox. To do that, I'm going to introduce a concept you may or may not have heard of: perceptual diffs, also called pdiffs, visual diffs, or CSS diffs. This has been pioneered many times before, and there's a lot of tooling to create these kinds of things that I'll show you. It was pioneered a lot by Brett Slatkin at Google on the Google Consumer Surveys team. He has an awesome talk online you can watch about how they added this to their test suite after he accidentally launched a debug pink dancing pony to production in a big Google app and then had to do a big rollback.

So what is a visual diff? A visual diff is actually pretty straightforward: given two images, what's the difference between them? Computationally, what is the delta between these two images? And that could be this: all the red pixels you see there are the pixels that have changed between these two images, without any context about what the images are actually about. Could be a website, could be a photo, could be whatever. Those are the pixels that have changed between these two images. And actually, let's do a quick example.
I'll give out a sticker for this one too. Name the differences between these two things, I'll tell you if you're right, and we'll see the pdiff in just a second. Yeah — the background color. Background color on the big one. The thumbnails have changed — yep, the "n" in the thumbnails. A link is gone. And the large red button is missing. All right, you got all of them. I'll put a couple of stickers here; you can come up afterward and grab them. I'm not going to run back and forth every time.

So you can immediately see with the pdiff that those things have changed between the two images. You don't have to sift through everything in this table to check whether any of those characters have changed — you can immediately see the pixels that changed.

So let's go make a pdiff. Pdiffs are actually pretty straightforward. I have these two files, new and old, so let's just open new and old. Here are the two images, and they've got some differences in them. Instead of sifting through those images, let's compute a pdiff. I have the ImageMagick library installed, and it comes with the ImageMagick compare tool, so I can compare new and old and store the result in diff.png. Okay, it's done, and I open up diff.png. Great — there's our first pdiff. Those are the pixels that have changed between these two images. By default it overlays the images with this translucency. You can pass a bunch of different flags to this: a fuzz factor if you don't care about changes within a certain threshold, getting rid of backgrounds, a bunch of different things. So pdiffs themselves are actually pretty straightforward to make.

So let's go through some pdiffs in real life. Here's a pdiff in an app I used to work on.
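If you're curious what that compare step is doing under the hood, here's a toy version in plain JavaScript — hypothetical code, not from the talk, just a sketch of the per-pixel delta idea with a fuzz threshold:

```javascript
// Toy perceptual diff: given two same-sized RGBA pixel buffers, mark every
// pixel that differs (beyond a "fuzz" tolerance) in red, and count them.
// Roughly what ImageMagick's compare does, minus the image decoding/encoding.
function pdiff(oldPixels, newPixels, width, height, fuzz = 0) {
  const diff = new Uint8ClampedArray(oldPixels.length);
  let changed = 0;
  for (let i = 0; i < width * height; i++) {
    const o = i * 4; // 4 bytes per pixel: R, G, B, A
    const delta = Math.max(
      Math.abs(oldPixels[o]     - newPixels[o]),     // R
      Math.abs(oldPixels[o + 1] - newPixels[o + 1]), // G
      Math.abs(oldPixels[o + 2] - newPixels[o + 2])  // B
    );
    if (delta > fuzz) {
      changed++;
      diff[o] = 255;     // changed pixel: opaque red
      diff[o + 3] = 255;
    } else {
      // unchanged pixel: copy through translucently, like compare's overlay
      diff[o]     = oldPixels[o];
      diff[o + 1] = oldPixels[o + 1];
      diff[o + 2] = oldPixels[o + 2];
      diff[o + 3] = 64;
    }
  }
  return { diff, changed };
}
```

A real tool layers image I/O, thresholds, and smarter perceptual metrics on top, but the core really is this simple.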
You might not be able to tell immediately what's different here, but the pdiff tells you it's something at the bottom, and you can see: oh look, the "do you agree to the terms of use" section of this page is just gone. It no longer exists. And I kind of love this one, because this is actually a Rails app, and this is a backend change manifesting as a frontend failure — a Rails form object got into some weird state that is manifesting as this visual frontend failure. So you can catch those things pretty quickly. And this is also an error state, you'll notice — in our Rails feature spec, we're actually testing the error state of this page.

This one is an example of a pdiff that might actually be a correct visual change: somebody got added to this page and the stuff below got reflowed down, so you can look through the pdiff and say, oh look, somebody got added and things moved down. You kind of have to learn how to read pdiffs the same way you read code diffs — they carry a lot of information, but they're a little noisy sometimes.

You might not be able to read this one, but at the bottom it says something like "if typeof jQuery not equal undefined, jQuery effects off, true" — some script junk down in the footer. And you can immediately see with the pdiff that something changed in your footer. This was also a Rails app that had a gem in a broken state, and the gem happened to be injecting a broken script into the bottom of the page. If you ran the test suite for this app, all the tests would probably pass, but your footer has junk in it.

And this is just some pdiff art that I found. It's totally not useful to you at all, but an image got shifted over by a certain number of pixels to create this nice pdiff art.
So, totally useless, but kind of cool to see. The final thing I'd like to show about these kinds of pdiffs is that a zero-pixel pdiff is a really strong signal for you. Say you're in a classic refactor of your app: you're changing all the plumbing around, all the internals are changing, but the actual interface — the UI or the API, the thing somebody is interacting with — is not changing. A zero-pixel pdiff gives you the confidence to know that nothing changed on that page. I moved all that CSS around, but everything still looks visually correct. So a zero-pixel pdiff can be a really strong signal in a lot of different cases.

Some simple uses of this kind of thing: catching visual regressions before they actually go out to production. Another advantage I've seen is in CSS refactors or deletions. So, question: are you all terrified to delete CSS? Yes or no? Yeah, I see a lot of nodding heads. It's scary, right? You don't know what legacy parts of your app are using that CSS. You don't know where it is. So we just leave our CSS getting bigger and bigger and bigger, until the point where you might hit the IE8 truncation limit on the number of selectors that can exist in a CSS file, and then you finally have to go split your CSS file. Has anybody hit that bug? I definitely did. Okay, a couple, yeah.

So: go add visual regression tests to the top 50 pages in your app, then go delete a bunch of CSS and see what happens. If nothing changes, and you have tests for the 50 pages you care about, great — you can probably delete that CSS. It gives you that confidence.

Then there are safe dependency upgrades. This is actually one of my favorites, because you want to be constantly advancing and upgrading your dependencies.
So for example, combine this with Greenkeeper.io, which automatically upgrades dependencies and, if your tests pass, merges the PR for you. If the upgrade of a dependency causes visual changes but no test breakages, this could actually catch it — Greenkeeper would stop and show you: oh, this is actually broken. So safe dependency upgrades are a big thing this can give you confidence in.

I've seen people do visual regression testing for emails, which is kind of awesome, and testing D3 visualizations is also pretty cool. I'm not a D3 expert by any means, but usually you can write tests for the inputs and outputs — you can say, if I give it this data and it gets transformed, it should look like this — and then you just trust that D3 is going to take that and do the right thing with it. But what you really care about is what the visualization looks like, right? It's a visualization library. So why not do a visual diff test for your visualizations? Then if your visualizations change, or the way you're manipulating your data changes, you can say: yeah, that's now the correct way to visualize that thing, or no, that's incorrect now. So it gives you that kind of information. Actually, I will get into that a little bit later.

Okay, so that's visual diffs. We are done with that. So what's the big deal? Why aren't we all doing this right now? Why is this not happening? Why isn't it easy? And the answer is that all solutions, of course, come with way more problems than you anticipate. So now we're going to dive into a bunch of those and talk through them. I like to bucket the problems with this kind of architecture into three categories — tooling and workflow, performance, and non-deterministic rendering — and we'll go through each of these. So, on the tooling and workflow front, there are a bunch of open source tools that do this.
There's PhantomCSS, for example. You can try to incorporate those, but because they're not services, they require that you, say, version the baseline images into your git repo. That's a big workflow change everyone has to get used to, and it's pretty hard to get past. Where do you store the baselines? How do you deal with baseline changes and updates? Where are you storing these files? How do you present them to your developers so they actually see them day to day? So that's one.

Then there's the performance front, and this is the one I actually think is the biggest reason — across the spectrum, all the open source tools, all the proprietary tools, everywhere — performance is the big reason why we're not doing this right now, and I'll go into a little detail about why. Some of the pages I showed you, some of the pdiffs, are kind of contrived: kind of short, kind of small pages, not very complex. I have some pages in Percy, for example, that are 30,000 or 40,000 pixels high when you render them. Rendering that kind of page — just rendering it, just saying "hey browser, write me a screenshot to disk" — can take 15 or 20 seconds. And then computing the pdiff can take another two, three, five seconds on top of that.

So let's say we have 100 UI states that we want to test, and it takes 20 seconds to render, diff, and store each one of them. How much time is that going to add to your test suite? That's 2,000 seconds — over 30 minutes you're adding to your test suite immediately, if you run all of these serially. That's a huge amount of added time, and I think that's actually a big reason why we don't do this right now.
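That back-of-the-envelope math looks like this (the state count and per-state time are the numbers from the talk; the worker count is just an illustrative assumption):

```javascript
// Serial cost of visual testing, using the numbers above.
const states = 100;         // UI states to snapshot
const secondsPerState = 20; // render + diff + store, per state

const serialSeconds = states * secondsPerState;
console.log(serialSeconds / 60, 'minutes added if run serially');

// But the renders are independent, so fanning out across parallel workers
// divides the wall-clock time (assuming, say, 20 concurrent browsers):
const workers = 20;
const parallelSeconds = Math.ceil(states / workers) * secondsPerState;
console.log(parallelSeconds, 'seconds with', workers, 'workers');
```

Which is a big part of why a hosted service can make this tractable: it can throw parallel rendering capacity at the problem instead of blocking your test suite.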
And then non-deterministic rendering is actually quite a big problem. In browsers, things are always changing — things are dynamic. The big obvious one is animations. Take this pure CSS animation here on the left. If you do a visual diff of this, what diff are you going to get? You might get this diff. You might get this diff. You might get this diff. These aren't useful to you; they're just noise. As a human looking at that, you know it's the same thing, working correctly. So what we do in Percy, for example — and you can see this on our blog — is inject some styles into the page to freeze animations in a bunch of different ways, so that we can then say: look, no visual diffs detected. So we can freeze animations in these visual regression tests.

Dynamic data is also a big problem. If anything on your page relies on a date, for example, or shows the current date, those things are obviously going to change pixel by pixel, so you have to deal with that noise. I have some ideas about how to handle this stuff. We could let you give us a list of selectors and just ignore those parts of the page. There are some tools that do OpenCV analysis to say: look, that was a background color change, and it only changed by so much, so don't show it. But by default in Percy, for example, we actually show you all the diffs, all the time, a hundred percent, with no thresholds. Because if you're changing a background color — if your developer comes in saying "we need a new shade of blue there" and adds it — that's an important thing you might actually want to know. Your designers might want to ensure that you're using a consistent color palette. So these things have a bunch of different ways we can address them.
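As a rough sketch of that style-injection trick — this is not Percy's actual stylesheet (that's on their blog), just the general idea of halting animation before snapshotting:

```javascript
// Build a stylesheet that halts CSS animations and transitions so that
// every snapshot of an animated page renders the same frame.
function freezeAnimationsCSS() {
  return [
    '*, *::before, *::after {',
    '  animation-play-state: paused !important;',
    '  transition: none !important;',
    '}'
  ].join('\n');
}

// In the browser, you would inject it right before taking the snapshot:
// const style = document.createElement('style');
// style.textContent = freezeAnimationsCSS();
// document.head.appendChild(style);
```

JavaScript-driven animations (setInterval, requestAnimationFrame) need separate handling, which is part of why "freeze animations" turns out to be a bunch of different ways, not one.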
Or, for example, things you might not actually anticipate. Often the browsers you're using in tests are not modern browsers in any sense of the word. In Ember we're a little bit lucky in this way — a lot of you are probably testing in Chrome. How many of you are testing in Chrome in your Ember apps? How many of you are testing in PhantomJS? How many of you are testing in something else? Shout it out. "We're using headless Chrome." Headless Chrome, okay. "No tests." "Testing in production." Testing in production, okay.

All right. So, for example, in the Rails world, this is what capybara-webkit might show you. What you see on the right is what the page is supposed to look like, but this is what capybara-webkit would render if you asked it for a screenshot. You can see the border image isn't working, the web font doesn't work, and there's a bunch of other stuff going on. That's because it's an old fork of WebKit from many, many years ago, and it doesn't support those newer behaviors. Or PhantomJS 1.9, before their new 2.1 release — this is what it would show you, too: no background images, no web font support, it doesn't know how to float things correctly. That's because it was also a fork of WebKit from six years ago, around Chrome 14. It doesn't render the modern web. We often assume these test browsers underneath are going to render the right thing, but they're often a bit behind.

Or things you would never think would be a problem. For example, I dealt with a bug where a customer said: hey, this button sometimes renders with the hover state on, and sometimes it doesn't. And I was like, what could that possibly be?
It turns out that in our Xvfb virtual framebuffer, where we run the browser, the cursor happens to sit in the middle of the screen. The windows were overlaying each other, and if a page happened to render in the top window, the mouse cursor could be over that part of the page — which happened to be where their button was positioned, which happened to trigger the hover state of that button and cause these flaky visual diffs. So things you would never anticipate being problems become problems when you try to do pixel-by-pixel-perfect testing of browser renders.

There are also things you can't really control. Sub-pixel anti-aliasing is a big problem — things might shift by one pixel. GPUs don't actually guarantee that floating-point operations are deterministic, so if you have a GPU-rendered gradient on one machine and you move it to another machine, they might not be pixel-perfect. They might look okay to your eyes, but they might not be pixel-by-pixel the same. Unless you have a really consistent rendering environment — which is not each of your dev laptops rendering these things — these can cause lots of pixel-by-pixel diffs. (And I used the same GIF as the previous talk — thank you, we've all converged on the same thing now.)

So this is my point: pdiffs are only half the battle here. There's a whole class of other problems that come up in this space. So let's go back to our goal from the beginning — how can we test apps pixel by pixel? And here's where I like to start reframing this, in the way I've recently started to think about it: how can we make visual reviews as easy as code reviews? How can we do this at the same time that we're doing our code review processes?
I sort of hit upon this recently, because I actually think these kinds of visual diffs are very similar to the code diffs we look at in our code reviews. With a code diff — yes, a linter is an important tool for looking through it, but it's not everything. It doesn't understand the architecture of the code. It doesn't understand the thought process of the developer behind it. It doesn't understand that you might be able to remove half of that thing because we have a new system that does it.

And you're doing code review by choice, right? You could all just push to master directly with no code review. You're doing code review because it gives you value. It gives you coordination across your team, and it gives you this sense of continuous integration, where you're pulling in the changes from all of your developers all the time. You're integrating them continuously. You're getting immediate feedback from your test suite that the thing is okay. And then you're looking at it to make sure you actually agree with the change, and then you move forward — and you're doing this continuously, all the time.

Visual reviews, though, are still stuck being this manual QA process that's disconnected from all of that. So how could we make visual reviews closer to code reviews, and just as iterative, given all the problems I mentioned before? Okay, so this is where this becomes "continuous visual integration" in my book. It's not just "I'm running a set of tests sometimes" — that's the difference between unit testing and CI, right? Continuous visual integration asks: how do you do this all the time, as part of your development workflow?
And at the PR level — not at some later stage, not on a staging server, and definitely not in production. Visual reviews like code reviews require that everything is fast. Really fast. You have to figure out the performance problems; it has to be as fast as your test suite, effectively. It has to be done at the PR level. And you have to be able to handle complex UI states — you can't just render all the static marketing pages and say "oh, nothing's changed." We want to test all of our components. We want to test all of our component states. We want to know that deep UI — that dialog, that modal, that animation when I clicked into it — is the same. So you have to be able to test complex UI states.

So this is where we move into the demo. The demo may go completely awry, but we'll see what happens. Okay. I was thinking for this demo: what can I show you that's an Ember app you might resonate with? And it turns out that the Travis CI UI is a big Ember app — it's open source, you can download it and play with it. So that's what we're going to do. I forked travis-web, and I have it running here, and we can go look at it.

Here's my local install of just the UI of Travis, and I can basically do everything on this page that I would normally do with Travis. I can visit different pages. I can see their getting-started page, and you can see all the finicky stuff — the title thing is rendering. And this thing actually talks to their production APIs, so I can go to my actual account on Travis CI from my local instance, dig around in my repos, and look at my builds from my local install here. Or I can go to a repo that doesn't have any builds and dig through this. Okay, great. So let's go look at their test suite.
It turns out they actually have really awesome tests. I met the guy who wrote these recently and told him: these are awesome tests, I'm going to learn a lot from them. They have nice page abstractions, and the test suite runs in about 30 seconds. It's a great reference for a well-built Ember app. For example, they have a bunch of acceptance tests here — this is the acceptance home sidebar tabs test, and there's one for the dashboard with repositories and one for the dashboard without repositories.

Let's go find out what one of these does. I don't even know where, so we'll just go to the bottom of this one and — if you know this Ember helper — we can just return pauseTest() in this acceptance test, and then we can go run these. You can see down here that the little Ember testing window is rendering, and it's actually paused now. Or — we haven't hit it yet. Maybe it's this one. Okay, so we've — nope, it's not. I guess I could just filter to that one test. I think it's this one. So we've paused; we're at the pauseTest() part of this test, and we can look at it. There's a little animation here — this thing spinning around — and there's no information yet. This is what this page is supposed to look like at that point in the test, right?

Okay, let's look through a couple of others. This is the with-repositories, without-repositories. Let's go down to one of their component tests. Here's the integration components owner-repo-tile test. That sounds important. Because this is a component test, there are no acceptance test helpers available, but because this is QUnit, we can just use the stop command, the tests will refresh, and then we can pick the owner-repo-tile component test.
And now we're stopped, paused on that test, and this is what that component looks like in the little Ember testing dialog. That's important, right? That's the row of every repo in the build repos list in Travis — this is the row component. Okay, so that one's important.

So let's go add Percy and integrate it into these tests. I've done a little bit of the work here so you don't have to read the docs: I've run "ember install ember-percy", and I've done the setup of importing it into moduleForAcceptance at the top so that it registers the async helpers, and I've also dropped something into the config. That's all in the docs, so I won't walk you through it.

So how do we get Percy into these tests? Let's go back to our sidebar tabs test, go to the bottom, and just say: percySnapshot('sidebar tabs test'). This is the name you give it to identify that UI state. In QUnit you can just pass the assert object and it'll build the title out of the names of the trail of tests along the way, or in Mocha you can pass the this.test object — but I'll just write the name manually. So here's the sidebar tabs test; let's save that.

Let's go to the owner-repo-tile test and add something here too: percySnapshot('owner-repo-tile test'). I'm also adding these after the assertions have happened. Because I'm now in a component test, not in the acceptance test suite, I actually have to go import the helper — there isn't a registration system in Ember for async helpers in these tests. So I do: import percySnapshot from 'ember-percy'. Now that's registered there.
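Put together, the two additions look roughly like this — a sketch from memory of the ember-percy API at the time, so check the addon's docs for the exact setup:

```javascript
// tests/acceptance/home-sidebar-tabs-test.js
// percySnapshot is registered as an async test helper by the
// moduleForAcceptance setup, so it can be called directly:
test('sidebar tabs', function(assert) {
  // ...existing visit(), click(), and assertion steps...
  percySnapshot('sidebar tabs test');
});

// tests/integration/components/owner-repo-tile-test.js
// Component tests have no helper registry, so import it explicitly:
import percySnapshot from 'ember-percy';

test('owner repo tile renders', function(assert) {
  // ...render the component and make assertions first...
  percySnapshot('owner-repo-tile test');
});
```

The snapshot calls go after the assertions so the page is in its settled, verified state when it's captured.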
And let's go back to the acceptance tests, up to the two "with repositories" tests, and we'll add one here: `percySnapshot('dashboard with repos')`. We'll copy that and do it in the "without repos" one also. Oops, I don't know why that's not copying. Without, okay, great. So we've added a base set of visual tests for this. Now we'll commit that to master: "initial visual integration". Great, `git push`. And I'm going to pretend that I pushed it and just show you the result. So I've pushed it, Percy ran, and here's the Percy interface with the list of builds. The one we pushed is actually this one down here. We'll click on that, and this is what it shows you in Percy. Here's the sidebar tabs test and what that page rendered; here's the dashboard with repos and what that page rendered, et cetera. That's the getting started page, and you can see this shows as a new snapshot. We consider that a 100% visual diff, because you added something — this is a new test. In the next one I'll show you where there's a real diff; sometimes this will just say nothing's changed. I'll also show you that config I added. In config/environment.js, under `percy`, I copied and pasted the base breakpoint config from our docs and said, by default, render all the snapshots at the mobile and desktop breakpoints. So that took all of these pages and also rendered them for mobile. You can see here's what that page looked like at the mobile breakpoint — this isn't on an actual device, of course, it's just rendered at those breakpoint widths. So you can compare, and you can go to this overview mode with all the different things at the same time. And I'm going to work on making this UI a little bit better too. So, cool.
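The config fragment mentioned above looks roughly like this (a sketch of the `percy` section of config/environment.js; the breakpoint names and widths here are illustrative, copied in spirit from the docs):

```javascript
// Excerpt from config/environment.js (sketch — widths are illustrative).
var ENV = {
  percy: {
    // Named breakpoint widths available to snapshots.
    breakpointsConfig: {
      mobile: 375,
      desktop: 1280
    },
    // Render every snapshot at these widths by default.
    defaultBreakpoints: ['mobile', 'desktop']
  }
};
```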
All right, let's go do what we're actually here to do and make a visual diff. Say I'm a new, naive developer on the Travis web team, and my task is: on this page, they want to change the "read the docs on getting started" button — they want it to be blue. So I'm like, okay, I can do that. Let's go find it. Read the docs on — okay, here's the button. It's just button, button green. Okay, great, let's go find that. Here's button green; the background color of that button is turf green. So let's go find turf green. Here's the declaration for turf green. All right, I'm just going to change that to blue. So we'll make a new branch — let's see, git branch — sorry. What are we going to call this? Button color. And look at the diff. Okay, it's blue, great. "Change button color", and I'll push that up on this branch. And I'll just pretend that I'm pushing that up. And look, my local version of this thing refreshed with live reload. So, oh, looks great. And there's a hover state that's different, but I didn't QA that. So let's go make a PR like I normally would — sorry, I'm jumping around a little bit here, ignore that one. And we'll make the PR against my fork of the repo, not against the main Travis web master, so they don't get this and be very confused. Okay, so here's the diff: turf green to blue, great. All right, I'll create this pull request — "change button color" — good. Then my tests run in the background, and let's advance 30 seconds into the future where all of my tests have already run. And look, the tests have passed — all the tests passed — but Percy found some visual diffs.
This is how we integrate into your development workflow: we just drop a GitHub commit status onto that line, and it says "visual diffs found". Then I can click on Details and go see what that change actually did. And of course, you know what's coming. The button changed, so great — I accomplished my goal — but turf green is used everywhere. It changed all the headers on the main getting started page, and it also changed all of the links on the main component. Notice that this is the difference between an acceptance test here and a component test here, but they're both being visually tested. So you can see, oh look, the pixels have changed. And there's also a smart signal for you to know when nothing has changed on a page: you can say with confidence that turf green is not used in the sidebar tabs test state, right? And you can also see that these things have changed at the mobile breakpoints too. All the pixels that have changed are the red ones, and you can see it changed the header there, et cetera. So I'm reviewing this, I get to that PR, and I'm like, hmm, that's a little bit weird. But then we sit there, look it over, talk it through in code review, and we're like, actually, that looks nice. Let's go with blue instead of turf green — we're going to change our branding to that. So here — I'll try to get this to work with GitHub in real time. The review process I was talking about is basically this: you've looked through these diffs, you're basically happy with them, so you approve. It's a lightweight review process: you click approve, and the build goes green. So you don't have to merge past a red build.
And then this gets merged into master — I click merge — the master tests run and push a new build to Percy, and Percy uses that new master build as the baseline. So it's constantly advancing with you automatically. We have a mode where you can switch to a more manual workflow and explicitly approve master baselines — this is the approved master, this is the approved master — but by default we just use whatever the last master build was as your baseline. So yeah, that's basically the whole demo, and it didn't break down. All right. Okay, cool. So that's the demo, and in the last section of this talk, the last five minutes, I'll show you basically the entire architecture of how this works and how it addresses the problems I mentioned before. So, here's a random acceptance test. It does some normal things: it shows the dropdown, it clicks the dropdown toggle. We drop a `percySnapshot` in there and give it a name. What is that actually doing underneath? Is that pushing images to Percy? I forgot to say — if you don't care about how it works and just want to use it, you can gloss over this and stop paying attention now; this is going to be an in-depth look at how this works underneath. So, are we pushing images from your test suite? I ask that with lots of question marks, because that would come along with all of the problems we mentioned before: the performance issues, the non-deterministic rendering, the workflow problems. So of course we can't do that — we can't push up images. What we actually do here is push up DOM snapshots. And if you think about it, this makes a lot of sense, because the most lossless, most efficient encoding format for your website is not an image of your website, or a video of your website — it's the website, right? It's the DOM state that you've created.
And the DOM state reflects basically everything you've done in that acceptance test: all the network requests that have happened, all the manipulations of the DOM. There are some things — as some smart people like Nathan have pointed out to me — that it doesn't capture: there are properties that aren't attributes, like certain hover states, and certain input field values, that don't actually get reflected in the DOM even though they're visually represented. But this solves basically most of the problems we were talking about, because you get a capture of the state of that DOM, and then we can just render it on our side. A little more detail: we don't actually push up the full DOM snapshots every time. We do a handshake where we say, here are SHA-256 fingerprinted versions of all of the assets I'm about to send you. And the response from the API is, oh, you only need to send me these few. And then the reply is, okay, here is the actual content for those. So that basically amounts to: we never upload anything twice. The first time you run this in your test suite it might be kind of slow, but the second and third and fourth time it's basically as fast as your test suite. I like to say that it adds essentially zero overhead to your test suite. So we run this in your CI environment as part of your normal CI service when we push stuff up to Percy. We grab some things from the environment variables of your CI service — like the current commit SHA, so we can talk to GitHub and get more info about it: the description, who made this thing, a couple of other things, like whether there are parallelization things going on in your CI suite. And then we set those GitHub commit statuses.
Underneath, we push that stuff into storage, of course, and then into a Redis-backed queue that enforces some limits and schedules these things onto workers. The workers actually run Firefox 45 ESR — we run the Extended Support Release versions of Firefox, pinned, in Docker containers with all pinned dependencies, so we can keep the environment as stable as possible and it doesn't change and upgrade, say, the fonts on the system, which might actually result in rendering changes. And then the key — the absolute key insight that I want to leave you with from this in-depth look — is that we can parallelize it. If you think about it, you've just given us a bunch of DOM snapshots, and that's really cheap. The computationally expensive thing to do here is to actually render them. So we can parallelize that as much as you would like, right? Effectively — I thought of this yesterday — you're taking each line of your test suite and shoving it into a service, and the service can parallelize them, a hundred at a time if you wanted. They still take 20 or 30 seconds each to render, which I'll show you some details about, but you can parallelize them. So, completely out of band of your test suite, you get this other service that is completely dedicated to visual rendering and diffing. The DOM snapshotting model basically allows us to solve most of the workflow problems: it's integrated directly into your GitHub PR workflows, the performance is basically as fast as you want it to be, and the non-deterministic rendering is offloaded to me as a concern — I have to make this environment be as good as possible. You may need to make some test changes, right?
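The scheduling idea — cheap snapshots in, expensive renders fanned out — can be sketched as a concurrency-limited worker pool (a toy stand-in for the real Redis queue and Firefox workers described above):

```javascript
// Run `render` over every snapshot, at most `limit` at a time —
// a toy model of fanning DOM snapshots out to render workers.
async function renderAll(snapshots, render, limit) {
  const results = new Array(snapshots.length);
  let next = 0; // shared cursor; safe because JS is single-threaded
  async function worker() {
    while (next < snapshots.length) {
      const i = next++;
      results[i] = await render(snapshots[i]);
    }
  }
  const pool = Array.from(
    { length: Math.min(limit, snapshots.length) },
    () => worker()
  );
  await Promise.all(pool);
  return results;
}
```

With 100 snapshots at roughly 20 seconds each, a pool of 100 workers finishes in about the time of one render instead of about half an hour sequentially — which is the whole point of doing this out of band.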
You may need to use fixture data, or deterministic data, rather than fake or random data. You might need to freeze time in your acceptance test suite, which I'm actually going to write a blog post about soon — and hopefully create a little Ember helper that helps you, if you're using moment.js, freeze moment.js across your entire test suite so that all of your times are deterministic. It's actually pretty straightforward. I have a couple of customers who push up 200 snapshots every build, and they build hundreds of times a day, and it adds basically zero overhead to their test suite and we can keep up with all of the rendering. So, does this work? I'm happy to say that yes — this is something I've been really passionate about for a while and working on for a long time. As of this morning, or yesterday night, we've rendered three million visual diffs in Percy over all time. So take three million visual diffs at a 20-second average render time — and I have pretty good numbers to show you that that's actually how long it takes. Here's our back-end dashboard monitoring for the rendering pipeline: the purple bottom section is the amount of time it takes to say, "hey browser, write me a full-page screenshot", and the top blue section is computing the visual diff. So computing the visual diff is basically nothing; asking the browser to write you a full-page screenshot is the expensive part. It takes around 20 seconds on average. So if you take three million times 20 seconds, we've actually done about 17,000 CPU-core-hours of rendering, which is equivalent to one CPU running nonstop at 100% for about two years, in Percy, in this parallelized world. So that's it.
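Freezing time can be as simple as pinning the clock for the test run. This is a sketch of the idea (the Ember helper mentioned above doesn't exist yet); it works for moment.js because moment defaults to reading the current time from `Date.now`:

```javascript
// Pin Date.now to a fixed timestamp so anything that reads the clock
// through it (including moment.js by default) becomes deterministic.
// Returns a function that restores the real clock.
function freezeTime(timestampMs) {
  const original = Date.now;
  Date.now = () => timestampMs;
  return function unfreeze() {
    Date.now = original;
  };
}
```

In a test suite you'd call `freezeTime(...)` in setup and the returned `unfreeze()` in teardown, so every snapshot renders "now" identically on every build. Note this doesn't cover `new Date()` with no arguments, which reads the system clock directly; a fuller solution would stub the `Date` constructor too.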
I basically just want to leave you with this: continuous visual integration is possible. It's a thing we should be doing. It should feel like an iterative code review process, just part of your normal development workflow, and it should give you more deployment confidence. I think of this as the last stage of the CD pipeline that we haven't quite automated yet — the part where I have to go look at things. Now you can look at hundreds of different UI states really quickly, approve them in a really lightweight way, and then let your continuous deployment process just continue. And just a couple of hints or suggestions. Start with breadth of visual coverage over depth of visual coverage. The tendency is to want to get in there and be like, oh, I'm going to take this component, render it in 19 different states, and snapshot all of those states. And then all of a sudden you look at your test suite and you're like, oh, I only covered this one small section really in depth, right? So start — the way I've done it when integrating into big Ember apps — by going to each file that you have a test in and adding just one snapshot to each one. Then tune it over time. If a visual regression comes up in your world and you fix it, go add a snapshot for it, and all of a sudden you have a regression test for that kind of visual regression, right? Also, you may need to do some work here — this is not totally free. You have to update your test suite to prefer fixed data over random data, and you might need to freeze time, which I'm happy to talk to you about and send you gists and things. And I'm here to help — this is what I'm doing now.
So if you want to talk to me at all, I'm available online on Twitter and in the little support widget on Percy; I'll get back to you as quickly as I can. And I'm happy to chat with you more about this. Thank you very much. That's it. You might have questions. Yeah — so, the DOM snapshotting model also gives us a full abstraction, right? Basically we just have to write a client library in your framework or language to talk to us. We have Rails support right now, we have a basic Python client, and we have the Ember client with a basic JavaScript layer underneath. We eventually want to take that common JavaScript library and expand it to Angular test suites, React test suites, et cetera. It does take a little bit of work to finesse it to work as seamlessly as the Ember integration does, but that's definitely the direction I want to take it. It does need to integrate with the test suite — that's how I'm approaching it now. We'll probably also do a live testing thing where you can just say, here's a bunch of URLs, go snapshot these on some basis, right? That could be an easy stepping stone, but I actually like the test suite integration, because your test suite is the thing you're looking at in CI, so integrating with it gives us immediate integration into your PR development workflow. So. What's the roadmap — will there be an on-premise version? Yeah, absolutely. We will have an enterprise packaged version of this soon, so please come talk to me. Does it support GitLab CI? Yeah, I'm working on a project right now to decouple a bunch of this stuff from GitHub. So yeah, we'll support GitLab — Percy actually doesn't have too many direct ties to GitHub right now, except for our authentication. We'll decouple that a little bit and just let it work for everything. Yeah, yeah — when there is a visual change,
is there a way to automate handling it? Yeah, I think there are some partial solutions there. Okay, so I guess the question is, how far can you automate code review? Is that kind of the equivalent question? And I think the answer is you can't do it 100%, right? Maybe there's some deep learning algorithm we can invent that will let us do a lot of that at some point, but code reviews are a valuable process because you have humans looking at the code, and I think visual reviews are the same thing. For example, if none of the snapshots change visually — if all 100 of them are zero-pixel diffs — we actually mark the build green automatically, so you don't even have to go to Percy if nothing has changed. But no, not really right now; it's basically a visual review process. Yeah? So, Jest has snapshot testing, right, where you can serialize the DOM — they do DOM diffing, asserting certain classes are there, "I know this is the good DOM", and on any regression, making sure the DOM is still the same. Yeah, totally. And I was wondering if Percy is, in a way, a stepping stone for Ember, for example, to introduce snapshot testing that way, because you already have to extract that DOM — and after the DOM extraction is one concern, sending it off to Percy is another concern. Also, locally, when you have `percySnapshot` running, there are no assertions it gives off, so you can't actually fail the CI from that test run itself; you fail later, on Percy. Right — part of that is a performance concern. You can't synchronously block the snapshot call on whether it's changed or not, because then you force all the performance concerns into the library, into that line. But yeah, I actually really want to add Percy testing to Jest, because Jest already pulls DOM snapshots, so we might as well push those DOM snapshots up to Percy and get the actual visual side of it too.
I think that would be a very natural progression. Yeah. Yeah — that's a good question. I'm not sure if it could actually run offline. It definitely handles a bunch of error cases: like, if you don't have Percy set up with the tokens, it just disables itself. I think in that case it'll just disable itself safely and your tests will still pass green. I have a bunch of checks in there — and it's an open source library, so you can go check it out — that say, if this thing isn't enabled or isn't working, just disable Percy for this run. So yeah, it should just not do anything. For example, in your local test suite, Percy doesn't enable itself at all if you don't set the project token environment variables that you saw in the setup. So if you run your local test suite it won't do anything — it just ignores those lines, basically. Okay, cool. Thank you very much. I have three stickers up here for the first three people who want them.