Okay, so let's resume. Hi again, and thank you for coming to this session. My name is Diego. I am a software engineer and I really like testing. A while ago I switched from only developing to doing both things. I am Colombian and I live in the city in the background, Berlin. A bit more about me: I work in a company called Element 34. We do everything related to Selenium: consulting, training, et cetera. We also push a bit for a product we sell, which is an on-premise Selenium Grid. It makes me very happy to work there because I can work with Selenium all the time, which gives me expertise and experience that I can bring back to the open source world. I like to make a parenthesis here because I like to contribute a lot to open source. I help to maintain the Docker Selenium images, I help in the Grid workshop, and I help to organize meetups in Berlin and Hamburg. This is not to brag, but to give you an invitation, because the Selenium community needs more people: more people doing documentation, more people doing meetups, more people actively helping each other. That's why I invite you to join the link at the bottom, which is the IRC/Slack channel for Selenium. Everybody is there, all the committers are there. I always take 30 seconds or a minute in my talks to promote the open source community, so please join us and help us with more information, with testing, with documentation and so on, because this makes us grow and have a better community. Okay, so let's get into it. First I want to cover the reasons why I decided to talk about this topic, then a bit more about functional, layout and visual testing. Then I'm going to jump into the criteria we used to decide which one to use, or whether to use them all or combine them. Then some final words and Q&A.
So the content of this talk was developed while I was working in a previous company called Zalando in Germany. And I was part of a team that was in charge of helping other teams to test in a more simple way. So there were like, I don't know currently how many teams there are, but around 200 teams developing software actively. And we had the role to help other people get better in their testing. So in this team, regarding UI testing, we always thought the main components for UI testing are functional, layout and visual testing. So slightly covering each topic when we talk about visual test, functional testing, which is, we still don't need so much introduction. It's the one that we are used to, the one that we use commonly every single day. And it's the one that help us to actually verify that our software is working in the way we expect. So it's our safety net or the end-to-end part. The layout testing is the one that is helping us to validate that our responsive application behaves nicely in different environments. Even if it's a tablet or a mobile laptop, et cetera, et cetera. So the way it displays, depending on the scenario and the viewport, it actually works. So I just made this a small recording of how the GitHub page works when you are like switching different viewports. And that's what we expect that the application behaves in a proper way when the conditions change. Visual testing mainly takes care about the situation where we want to check, verify and test that the application is showed properly to the users based on what we expect. So if they're using a phone that the elements are shown with the right color, right distribution, et cetera, et cetera. So mostly the visual aspect that the content is not like completely unorganized. And also that the company has a good image reflected on different devices. This has a few challenges. 
In general, one of the most complicated ones I have seen or we saw in the team is that when you have dynamic content, it's very hard to handle these things. For example, the homepage of the company where I was, it had a floating banner. So then how do you test that? How do you check that it actually works in the proper way of the time? And there are many more like when you scale images, when you have anti-aliasing and so on. I'm not going to go deep into that because there's extensive amount of material in the internet or in those things. So the journey that I want to describe today is that there was a team that had decent UI tests, functional tests in the company. And they approached us asking for help or asking for a road that they could take because they wanted to improve their testing in a way that they wanted to have more coverage. But they already had some tests. So when they came to us, they wanted basically these things. They wanted to run tests more often, ideally on every commit. They wanted to have more test coverage, but this didn't mean that they wanted to have a duplicate amount of tests. So if they had already 50 tests or 100 tests, they didn't want to implement 100 more just to check things in the UI. They wanted to find a way where they could actually do more testing without writing many, many, many more tests. And one key thing that is very important when we see new tools is they wanted to reuse the current things, the current setup as much as possible. So then as a team, we were thinking like what we should do, like we could like actually propose solutions out of the way. But then we thought maybe we can take one step back and think, okay, we have these three types of testing that we propose always. But does it make sense that they use all the three of them right away? Or maybe we should think first how to mix them or how to not mix them. And to decide that what we did was to define two main criterias. 
The first one was the application characteristics, and the second one was the stage in the development process. Those two things were key for us to decide what type of testing we could use. So let's talk about the application characteristics. This means, for example, the type of application I have to test. It could be fairly static, or one of these dynamic CMS-type applications that are full of forms, like government sites where you have to fill in forms and click, click, click. Or you have an e-commerce website, or an internal site in your company that is used maybe for reporting expenses or something like that. Or maybe you have an application that is entirely graphical, like plotting graphs or drawing things. Also, does the application need to work in all browsers? Does it need to support all devices or viewports? And more than that, does the application work in different languages, that is, is it internationalized and localized? Does it work differently with different currencies, or with different laws that apply wherever it is used? The common case of that is YouTube: in some countries you can see a video and in other countries you cannot, because there are laws about copyright and so on. That's why we came up with this decision table. First I want to highlight that this isn't a silver bullet. This is what we came up with as a guideline that teams could use to decide what to use. We have the four types of applications that I was talking about, and what we strongly think is that functional testing is always a must. It is what gives us this sense of security. Then we came up with the keywords that you see there: maybe, nice to have and not needed.
Not needed is clear, and I'm going to start with that one. In the case of an internal tool, let's imagine that you are a company of 200 or 300 people and the tool is for reporting expenses when you travel. Not all 300 people in the company travel, maybe only 50 or 60. So the user base is very small, and perhaps you don't need to spend time writing visual tests or layout tests for such a small audience. When something doesn't work, you will get feedback really fast. So perhaps on every release you just do a manual check and that's it; there is no more action for that item. In the case of maybe and nice to have, this just means that you need more information to say yes or no. If you have a huge application with a lot of users in different countries, et cetera, you can see from your analytics how important the feature is. In general the key question is: how bad is it if you screw it up? Depending on that, you can decide, okay, we should do this testing because it will give us more security. Nice to have in general means that if you have time and resources, you should do it, but also weigh how critical the feature you're implementing is. Now let's talk about the stage in the development process. This was the main input to build the previous table. So let's talk about three types of application to which visual or layout testing could apply, according to what we are working on. Let's say we're working in an agile world and we have our features clear. We start the sprint and we want to implement a feature. We open a feature branch and we start working, doing commits and so on. One of the first things we wanted to achieve was to run tests when we commit. But this is a bit tricky. It's quite simple for unit tests and for API tests because they're usually very fast.
But with UI tests this takes longer and longer, because you have to spin up a browser, do several things and so on. So we were playing around with the Selenium capabilities to find a way to make tests a bit faster, and what we did was to play with the page load strategy. Before playing the video: on the left side we have the German version of the website and on the right side the Spanish version. The upper part is using the normal page load strategy, which I will explain in a moment, and the bottom part is using the none page load strategy. You will see that it's just a simple test. It goes to the application, it does a search, it clicks on the first item, it goes to the detail page and picks the first size. It goes pretty fast on the bottom part because it's using page load strategy none. I'm just going to escape and present again. So what does this mean? Has anyone ever played with the page load strategy? No one? This is a standard thing defined in one of the W3C standards, I forgot which one. What it means is that the default is page load strategy normal; you can see it in line number five. And this means that when a page is loading, when you say driver.get, Selenium will wait until the page loads. Well, there are tricky things with progressive apps, but in general this is the rule. When you say page load strategy none, then after driver.get your test will try to interact with the page right away. This could lead to situations where you don't find the element. That's why you need to start using explicit waits in a better way, so you wait and check until the element is present. But imagine you had a hundred tests and you wanted to run maybe the 20 or 30 most important ones, and you can drop each test from 15 seconds to eight or nine seconds.
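To make the explicit-wait idea concrete, here is a minimal sketch in plain Python of what such a wait does: it polls a condition until it succeeds or a timeout expires. In real Selenium tests you would set the `pageLoadStrategy` capability (its standard values are `normal`, `eager` and `none`) and use Selenium's own `WebDriverWait`; the `wait_until` helper and the `FakePage` class below are hypothetical stand-ins so the sketch runs without a browser.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class WaitTimeout(Exception):
    """Raised when the condition never produced a value in time."""


def wait_until(condition: Callable[[], Optional[T]],
               timeout: float = 5.0, poll: float = 0.05) -> T:
    """Call `condition` repeatedly until it returns something truthy."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll)


class FakePage:
    """Hypothetical page that keeps loading after navigation has returned,
    which is the situation a test is in with page load strategy none."""

    def __init__(self, ready_after: float) -> None:
        self._ready_at = time.monotonic() + ready_after

    def find(self, locator: str) -> Optional[str]:
        # The element only "exists" once the simulated load has finished.
        if time.monotonic() >= self._ready_at:
            return f"<element {locator}>"
        return None


page = FakePage(ready_after=0.2)
first_try = page.find("#search")                    # too early, nothing there yet
element = wait_until(lambda: page.find("#search"))  # polls until it shows up
```

The immediate lookup comes back empty, while the polled lookup succeeds once the page is ready. That is the trade-off of the none strategy: navigation returns sooner, but every interaction has to be guarded by a wait.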
You sum up those seconds and then you will see how you can run as many tests as you can per commit. Okay, so let's keep moving on. Let's say that our team was working well and we came to the point where we want to open a pull request and merge to master. That's when we think we should run the other types of tests. For a dynamic application, mostly with web forms and so on, we thought layout testing was a good option. We found on the internet that many people have used the Galen Framework, which is a framework that lets you write a spec file. Conceptually this works in a pretty simple way: you specify the objects you want to use in your test, you define them with a CSS locator, and then you specify relations between those elements. For example, on the line with the star, the star means that these rules apply to all elements, always. What you're saying there is that the header should always be as wide as the screen, the logo should always be inside the header, and the search text should be inside the header. And for example, on line 14, when you are using a desktop viewport, the header should be about 102 pixels and the logo should always be centered and inside the header. I apologize for the code because it's a bit too much, but the things to focus on are: on line number two there is a resize when the test starts, and then lines number four, eight and 13. This is where, inside my test code, I use the Galen API to call the spec I wrote in the separate file. Galen applies all these rules, checks, checks, checks, and then reports something back. At the end, on lines 16 and 17, I just do an assertion that there were no errors. It is quite simple, I would say. Here is a small video showing how this works. It's the same test that we saw before, and you can see how the screen is behaving in a responsive way. In general, we can say that the Galen Framework is super well maintained.
There is an active community, there is a lot of movement, features are implemented very often, and there is very good documentation; that was the last thing I wanted to say. One part that we didn't like so much is that it has its own DSL. By this, I mean that, sorry, just one moment. So it has its own DSL, and you have to write this spec file to declare what you want to check. Therefore you can only partially reuse your tests, and reusing them was one of the things we wanted to do. When you declare this specification, you decide what you want to check there. And so far, although I haven't checked in the last two weeks, there are only bindings to use this API for Java and JavaScript. The reason why we suggest that it works well for applications that are fairly static in content, without aggressively changing layouts, is what we saw on an e-commerce website: we were checking a few things, but we never checked the things that I'm highlighting now. We never checked for the payment methods on the bottom part, which are very important for an e-commerce website. We never checked for the suggestions to be there. We never checked for the detail to be there. What that means is that if I want to check for those things, I have to write a spec of this size, and the test that is calling that spec is maybe this size. So when you have an error reported, you have the initial test of this size and then the spec of this size, and you are maybe duplicating the locators on both sides. So it was a bit too much for applications that have a lot of important elements. That's why we think that it should be used in a different context. Now let's talk about the e-commerce and the animated graphics types of application. We open the pull request and that's when we think the tests should be executed. Because there were so many tools out there, we took the approach of trying three different things.
One was doing our own solution, which basically meant checking what open source libraries for image comparison were out there and then incorporating one into our tests. We also had tests in JavaScript, so that was okay, and we were using WebdriverIO. So this is the normal test, like what we saw in the video. WebdriverIO, like many other frameworks, has hooks to do something after the test, after every check. You don't need to read the code. What happens there is that we take the screenshot, we take the file that is used as the baseline, and we pass both as parameters to another function, which is basically the JavaScript library found on GitHub, and we ask it to give us the difference. The method here is doing an image comparison, a plain raw image comparison. This image library is pretty good because there are a bunch of options that you can specify; I omitted them because the code was getting quite big. In general, what happens is that when the comparison is done, you get a JSON payload back with some description, and the important field is the one called mismatch percentage. If it's bigger than zero, there is a difference somewhere and I need to decide what to do: I need to put this into the report, I need to trigger an email, I need to do something. But this was a proof of concept, so at that point we didn't need to decide what to do. Then an example of this: this is the detail page of the article. We ran the test again and we had some differences. The colors of the article were in a different order, so we got a visual difference. In the final part of the checkout, the basket in this case, there was a visual difference because the recommendations loaded faster the second time we ran the test. So these are things that we have to think about, adjusting our existing tests to get them out of the way.
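As a rough illustration of what a "plain raw image comparison" reports, the sketch below computes a mismatch percentage between a baseline and a new screenshot. It is not the JavaScript library used in the talk; images are modelled as hypothetical nested lists of RGB tuples so the example runs without any imaging dependency.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def mismatch_percentage(baseline: Image, actual: Image) -> float:
    """Percentage of pixels that differ between two same-sized images."""
    same_shape = len(baseline) == len(actual) and all(
        len(row_b) == len(row_a) for row_b, row_a in zip(baseline, actual)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0
    different = sum(
        1
        for row_b, row_a in zip(baseline, actual)
        for pixel_b, pixel_a in zip(row_b, row_a)
        if pixel_b != pixel_a
    )
    return 100.0 * different / total


WHITE: Pixel = (255, 255, 255)
BLACK: Pixel = (0, 0, 0)

baseline_image = [[WHITE, WHITE], [WHITE, WHITE]]
new_screenshot = [[WHITE, BLACK], [WHITE, WHITE]]  # one of four pixels changed

result = mismatch_percentage(baseline_image, new_screenshot)
needs_review = result > 0  # any difference means someone has to decide
```

The `needs_review` flag is where the decision from the talk plugs in: add the difference to the report, send an email, or otherwise trigger an action. The dimension check also hints at the scroll bar problem, because a longer page produces a screenshot that no longer lines up with the baseline.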
So we wait until the recommendations are there and then we take the screenshot. The part that was really tricky, and where we haven't figured out yet how to proceed other than ignoring it, is the scroll bar. Because the page was now longer, the scroll bar was different and so there was a visual difference there. From doing your own solution, the pros are that you own the code and you can do everything you want; it's your own solution. Many libraries have tons of different configuration options. If you have tests in JavaScript, you can reuse them because it's just a call to the library, and many languages also offer the option to call JavaScript methods. Here I'm highlighting some libraries I found that are active and maintained and where people respond to issues. That's what I found, but there are many, many more. The cons are that you have to maintain more code now. You have to handle the comparison by yourself and you have to document it properly, because when you leave the company and there is an error there, who's going to solve it? Somebody has to read what is happening there. You also have to handle the false positives by yourself, and especially you need to establish some way to handle the baselines, because when there is a difference, you need to say either this is a valid difference and this is a new baseline, or this is a bug. The next one we checked is a tool called Gemini, which is an open source tool as well. I'm just going to show the video briefly. This tool has a configuration file where you specify things like where the grid is, the base URL, the size of the viewport and what browser you want to use, and you can add some plugins that are provided by the tool. The not-so-nice thing, I think, is that you have to rewrite the whole test. I mean, the DSL they have is their own one and there is no way around it.
But if your tests are simple and concise, maybe this is not an impediment for you, because in general it's like: drive to the place where you have to take the screenshot, take it, and that's it. They have this really cool, we can call it an HTML runner, where you have a local server from which you can trigger your tests and see what is happening. The single check is highlighted here on the left and you will see what happens when the screenshot is taken and compared to the baseline. In a few moments this will pop up the screenshots: the baseline, what was taken in the test, and the difference on the right. We didn't worry so much about the results at that point because we were trying to verify what information the tool was giving us and how we could act on that information. So maybe the order we can manage by talking with the developer, and for the anti-aliasing we just need to adjust the parameters of the tool, but in general it was a better approach for us in the results part. Not in the test part, because there we needed to write things again. So it was about weighing things up. One thing the tool has that is pretty cool is that it already has a flow where we can accept or retry the test, so we can say whether this is a valid baseline or not. So this is already solving things that I didn't think about when I was getting into this journey of UI testing and visual testing. Okay, so the pros and cons of this tool. The pro is that this tool was built with this in mind: they wanted to do visual testing. This is already pretty cool because it has features that solve things that we didn't think about yet. The con basically is that it has its own DSL, so we need to rewrite things, and depending on the amount of tests you have, you have to weigh this. And here are the links, which are very useful.
The next one: we decided to try a commercial solution, Applitools, and I also have a video of the tests we were doing. We found that this was the least intrusive solution; by this I mean that our tests didn't need to change that much. Basically, we had to add the Applitools dependency, declare the object in the upper part, and then whenever we reached the point where we wanted to do a check, we just did something like eyes.checkWindow and put in quotes the name of the check, and that was it. At the same time, while the test is running, we go to the Applitools UI in the cloud and we see that there is a test running. The first time you run the test, it all passes because it will assume that that's the baseline, and the second time you see that something failed. It's not exactly that it failed; it's that something is unresolved, so somebody has to go there and check what happened. In this case the home page failed because there is a carousel and the image is changing. So what do you do? These guys have a cool option where you can select a region and then ignore it. We hadn't thought about it, but this is a feature that we could use. You could also manipulate things, like saying the carousel must not move when I'm testing, but they have cool options there. We could accept the difference right away. It was okay for us, and in this case we could say I can ignore it or I can accept it. Since we were doing a proof of concept just to see what information was out there, I accepted it right away, and the last one was completely fine. Okay, so that was Applitools, and this is the last time I'm going to do this. When I executed the test again, it passed because I was ignoring the region. So that was very helpful for me. About Applitools, again the same as Gemini, the previous tool I was checking: the tool was made with that purpose.
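The ignore-region idea can be sketched independently of any vendor: differences that fall inside a marked rectangle (for example the carousel) are skipped, while differences elsewhere still count. This is a hypothetical Python illustration of the concept, not the Applitools API; images are nested lists of RGB tuples and are assumed to have the same dimensions.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Region = Tuple[int, int, int, int]  # left, top, width, height


def _contains(region: Region, x: int, y: int) -> bool:
    left, top, width, height = region
    return left <= x < left + width and top <= y < top + height


def differing_pixels(baseline: Image, actual: Image,
                     ignore: Sequence[Region] = ()) -> List[Tuple[int, int]]:
    """Coordinates (x, y) that differ and lie outside every ignored region."""
    diffs = []
    for y, (row_b, row_a) in enumerate(zip(baseline, actual)):
        for x, (pixel_b, pixel_a) in enumerate(zip(row_b, row_a)):
            if pixel_b != pixel_a and not any(_contains(r, x, y) for r in ignore):
                diffs.append((x, y))
    return diffs


WHITE: Pixel = (255, 255, 255)
RED: Pixel = (255, 0, 0)
BLUE: Pixel = (0, 0, 255)

# The left 2x2 block plays the role of the carousel: its image changed.
baseline = [[RED, RED, WHITE, WHITE],
            [RED, RED, WHITE, WHITE]]
actual = [[BLUE, BLUE, WHITE, WHITE],
          [BLUE, BLUE, WHITE, WHITE]]
carousel: Region = (0, 0, 2, 2)

without_ignore = differing_pixels(baseline, actual)
with_ignore = differing_pixels(baseline, actual, ignore=[carousel])
```

Without the ignored region the rotating carousel flags four pixels; with it the check passes, while a change anywhere else on the page would still be reported.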
So that was extremely helpful, because even though we didn't explore all the features, we were confident that for some problems we were going to face afterwards there were going to be solutions already there, like this ignore-region part. We were able to reuse tests, many languages were supported, and there were different validation levels. Gemini can also do this: you can say I only want to verify the layout, or I want to be very strict, pixel by pixel, et cetera. The cons that I found when we were doing this proof of concept: the test will take a bit longer because the screenshot you are taking is going to be uploaded to the cloud. You have to wait a bit, not that much, a few seconds more, but if you are on a very critical path, then consider this. Paid: I'm putting this as a con because as an open source community we engage a lot with open source software. I put it here for that reason, but personally I don't think it is a big issue. It is the same situation as with a Selenium Grid, for example. In that case you can decide either to build it yourself, like two engineers working five months building something for your teams, and then you have to multiply their salary times five months, or you just, I don't know, buy a contract with Sauce Labs or you contact our company. Anyway, the point is that you have to balance the effort that your team has to make versus buying something that is out there and working right away. And that's the important thing about doing proofs of concept: seeing what fits in your environment. Maybe it's enough to have a local grid, maybe it's enough to have your own library, or maybe you have so many tests and so many developers that it is just easier to buy a solution. And the validation limits: in this case the validation limit is an issue because if you have many, many tests, they're going to reach the limit and you have to go to a bigger plan.
So let's talk about our learnings after this journey of evaluating different things. In general, for layout testing, we found it very useful when the website had, let's call it, a stable structure, in the sense that between different pages the content was not changing in a crazy way. The layout was fairly consistent, and it also allowed us to have more control, in the sense that the spec files were not going to be this size. The creator of the framework mentions that it's very useful mostly when you are doing redesign changes, when you are doing CSS changes. In these cases, many times the locators will change, along with some other small things that perhaps functional tests will not catch. But if you have relations between the elements in your website, you can run it when you are doing a redesign. He described that as very helpful and I completely agree with him. And lastly, I think it could be a good option for smoke testing in live environments. For example, if you have a huge website that is serving not only one country but the whole of Europe, or the US and South America, it's very probable that your website is not going to be served by one single server. Your pages are going to be replicated across different services like Akamai or something like that, and probably some CSS or something is missing somewhere. It's very helpful to check that North America is being served well, that Europe is being served well, that India is being served well, that Japan is, that everything is working in different countries. So it's a good way to do small tests in production. For visual testing, in general, we had a good impression, and what we found is that with less code we could do more validations. That was a bit too tempting, to the point that we said, okay, then we can remove a few tests and replace them with visual testing.
But then this is a decision that we said we're only doing a proof of concept. We will leave things like that and as soon as things evolve and depending on the info we get back, then we decide what to do. And yeah, in general, we were reaching the point of that this was going to make us feel more secure. We're covering more things with just a visual test. And we also think that they give more value and more valuable information when you combine them with your functional tests, because that's how you can reach the points of your normal user flows. And then that's when you can get the screenshots in the critical parts that should always work for your users. The last part was extremely important because when we were plugging a new framework or tool to the existing tests, this was going to produce more output, more information. And in this case, we were going to get more screenshots in every time, in every time the test was executed. And in this case, something that is very important is to define a criteria to decide what is a new baseline or what is a real difference that could cause a bug. Because if you just say, if I don't define this criteria, then you see tests failing and you are just going to end up having someone clicking on just accept, accept, accept. And when something really bad happens, you end up blaming the tool and not the process. Very often with frameworks like Selenium, Appium and etc., etc., you don't think about how to use them first. And then when the information you get back, you think, oh, this is too much, but it's better just to plan it first and then see how you can use it. And overall, what we found is that there is too much information out there, too many blog posts, too many open source libraries, too many things that if we spend only time doing that, we would have spent like two months only checking documentation. So what was helpful for that was one of the first slides I showed, like what we want. And we wanted three things basically. 
So if a tool was making us rewrite everything, this was a no-go. If the tool was not being maintained, a no-go, and so on. That was a good filter to decide what we could use and what we couldn't. And maybe this is often overlooked, but we reaffirmed to ourselves the great value of a proof of concept. It's not enough to read the documentation and try the example. You have to take maybe a small portion of your tests and use it with the new tool that you want to evaluate, and then you decide if this makes sense or not. Whether it's a commercial tool or an open source tool, ask for a proof of concept of one, two or three weeks in which you can try it and see if it works in your environment. Because the Hello World example works everywhere, right? Also, invest time. I mean, don't think that you plug this in and it's working the next day. This is something that you have to learn bit by bit. And even if the tool is very simple to integrate, no matter how easy it is, you will get a lot of information back. The reports will contain images, logs, things to do. And the most important thing that we found is that whenever this new tool finds a problem, you need to have a link to an action. If there is an image that doesn't match, then you have to have a link that says approve, reject, send an email, call someone, do something, not just plain information. And the last one I mentioned already: this is a learning curve. If you want to add visual testing, just add visual testing for the moment; don't add cross-browser testing or mobile at the same time. Start one by one, because imagine that you add visual testing and now you get tons of reports, and then you also add Firefox, Edge, et cetera, and you get that four times over. Then you're not going to like the tool at all, because you just got an information overload.
I just want to give some credit to some people that helped me to build the content and also to develop it: to Leo, from when we were working at Zalando, I'm happy that I spoke with him several times about this, and also to these links that I found online. From all the links you can find online, these are, in my opinion, some of the best ones, because you have clear information on how things were applied to a specific context. And with that, I just want to thank you for attending my talk. Thank you for having us here in India and for being such excellent hosts. I'm open to questions. Does anybody want a beer afterwards? The one who asks the first question gets a beer. Ah, cool. In visual testing, creating the baseline is the most problematic part. If the first run is bad, that becomes the baseline, and you are then comparing the good runs against it. So you need to go and scrap it and run it again, which adds more time. Whenever you do visual testing for a different screen size or viewport size, you need to create the baseline, then compare against it and exclude some of the portions. So with the amount of time you are spending, are we getting enough ROI on it? What was the last thing? Return on investment, ROI. Ah, the return on investment. That really depends on how you define how you want to proceed. The first time, you are perhaps defining only one browser and one viewport. But you should try to build solutions in a way that if you have to scale, you don't have to rewrite everything. In that case, it could be as simple as having different directories for the different browsers and different viewports. One extra thing that has to be decided, and usually we have this issue, is where you put the images. Do you store them with the code in Git or SVN or whatever you have? Or do you have storage somewhere else in your company?
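One way to read the "different directories for different browsers and viewports" suggestion is a small helper that derives the baseline location from the browser, the viewport and the name of the check. The directory layout and the `baseline_path` function below are hypothetical, a sketch of the idea and not a convention of any particular tool.

```python
from pathlib import PurePosixPath
from typing import Tuple


def baseline_path(root: str, browser: str,
                  viewport: Tuple[int, int], check_name: str) -> PurePosixPath:
    """Build <root>/<browser>/<width>x<height>/<check-name>.png."""
    width, height = viewport
    # Normalise the check name so it is safe to use as a file name.
    safe_name = check_name.strip().lower().replace(" ", "-")
    return (PurePosixPath(root) / browser.lower()
            / f"{width}x{height}" / f"{safe_name}.png")


desktop = baseline_path("baselines", "Chrome", (1280, 800), "Home page")
mobile = baseline_path("baselines", "Firefox", (375, 667), "Home page")
print(desktop)
print(mobile)
```

With a layout like this, adding a browser or a viewport only adds a directory and the tests themselves do not change. Whether the root lives in the repository or in separate storage remains the open decision raised in the answer.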
Those are things where, if you decide them early in the process, you will see that the return on investment from the tool gets really high. But if you just do the proof of concept and want to continue in that same state? No. The proof of concept is going to give you information back about what was good, what has to be changed, and what can be improved. You need to apply that before you continue. Then you can see an appropriate return on investment, I would say.
Are you suggesting any visual testing tools, like Applitools or Gemini? Among all the things you discussed, which one do you think is the best?
What we started using at the time was our own solution, plugging in the JavaScript libraries, because it was a very early stage. I have spoken with my friends over there, and they're now considering evolving to some other solution, because this early stage has given them so much information that they can already see what they would have to implement if they continued with their own solution. So they are now weighing whether they should buy something or change the tool.
Thank you.
Oh, hey. One of the areas that I test on an everyday basis is how emails are displayed. How do they look? What do you think about binding the functional tests written in Selenium with tests in Galen, which I actually use? I have some scenarios written in Cucumber, where some of the steps test functionality, whether everything is correct, like the content and so on, and one step runs Galen to check everything against the spec file.
I think that's a really great approach because, from my point of view, unless you have, I don't know, 1000 email templates that are incredibly different from each other, the layout test is a perfect option there. You just gave me an idea that I will include the next time I mention what could be an application of layout testing. I completely agree.
And since you use Galen, you know that you can also check small parts in a visual way, like elements of a form. I don't know what else to add. I completely agree with you. I think it's an excellent use of layout testing.
Okay. Basically, I wanted to ask whether it's better to separate the functional tests from the visual tests, or whether combining both is fine.
I am pretty sure that there are scenarios where it makes sense to separate them. What we saw when we were doing that proof of concept is that it gave more value to put them together. To take the screenshot, we have to reach the user area anyway, for example, and for that you have to log in, and that's already done by the functional test. So I am sure that there are scenarios where it could be done completely separately, but we didn't get to that part, to be honest.
I basically have a pretty simple case with this email testing. Okay. Thanks.
Thank you. Thanks. Do we have anyone else? I'm going to be around anyway, paying for the beer that I promised.
Thanks, Diego. That was really helpful, sharing your experience and insights into the different UI tools. We can definitely learn something from that. Thanks a lot, Diego, for being here.
Thank you.