So, welcome to the last talk tonight here in the ZKM Open Hub. I'm happy that so many people are here even though it is quite late and we have drinks at the bar. I guess most of you know what this talk might be about. I was quite surprised myself, because I had no idea that you can do automatic tests on websites and web-based software. Talking with Jeremiah made some of the issues clear: if the web designer replaces a button and the automatic software wants to press that button, it isn't there anymore at that point. So quite some difficulties. Jeremiah works in a startup in Karlsruhe that addresses these problems. They want to harden and improve automatic test systems so that the designers can replace elements and buttons and the automatic test doesn't fail or break, as he explained to me. I'm very sure it is going to be a very interesting talk. So welcome, Jeremiah, and enjoy.

Thank you for the introduction, and thanks for coming. The talk, as introduced, is about recheck and the sorcerer's stone. In case you don't know, the sorcerer's stone is a mythical instrument that can turn ordinary stuff into gold, and we want to use it to turn Selenium into adamantium, which means that the tests won't break anymore.

If you do traditional test automation, not even on UIs but on any interface, you might, for instance, get some XML as input and want to remove specific parts of it. The "remove" part should be removed and the "keep" part should be kept, as is obvious, and you check that traditionally with what is called an assert statement, which means you say: please make sure that this is true. In this case: make sure that the "remove" part is actually not there anymore and that the "keep" part is still there. So this test should turn the above XML into the below XML and just remove that one tag.
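The fragile, substring-based check described here can be sketched in a few lines. This is a minimal illustration, not code from the talk; the name `removeTag` and the particular bug are invented:

```java
// Minimal sketch of the fragile, substring-based assertion style described above.
// The method name and the deliberate bug are illustrative, not from the talk.
public class SubstringAssertExample {

    // A deliberately buggy transformation: it strips the <remove/> tag,
    // but also mangles the surrounding XML by dropping the closing tag.
    static String removeTag(String xml) {
        return xml.replace("<remove/>", "").replace("</root>", ""); // bug!
    }

    public static void main(String[] args) {
        String input = "<root><keep/><remove/></root>";
        String result = removeTag(input);

        // The traditional assertions: "keep" must survive, "remove" must be gone.
        assert result.contains("keep") : "keep part is missing";
        assert !result.contains("remove") : "remove part is still there";

        // Both assertions pass, yet the result is no longer well-formed XML:
        System.out.println(result); // <root><keep/>
    }
}
```

Run with `java -ea` so the `assert` statements are enabled; both pass despite the broken output, which is exactly the problem the next paragraph describes.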
Unfortunately, if the software is buggy and creates output that looks like that and is not even valid XML anymore, the test will still pass, because we just make sure that it contains the string "keep", which is still the case, and doesn't contain the string "remove", which is not the case anymore. So that test passes, and this is obviously a problem. Of course, we can add another assertion, and another assertion, and quite some more assertions, but at some point, what we really want to make sure is that the result roughly looks like a certain whole. We have a notion, an understanding, of what we expect. So at some point it would just be easier to say: make sure that this is the result, right?

And this is called golden master testing: the idea that you store the result once, the complete thing, and compare it against the actual output of the software. If you do it on the unit level, you can use, for instance, a library called ApprovalTests, which does exactly that. The first time you run the test, it will fail, because that is when the golden master is created. The second time you run it, it will pass, as long as the output is exactly the same as the previous output. And it will make sure that it really is exactly the same; it will even check that it contains the same whitespace and so on.

If you think about it in that notion, testing with the traditional approach, assertion-based testing, is like blacklisting changes. At some point the XML might legitimately change: maybe you change the functionality or the input, and then the result should also change. With assertion-based testing, you blacklist those changes. With golden master testing, you instead have to whitelist changes: maybe you say, I don't care about whitespace characters, so I will ignore them. So one approach is blacklisting, the other is whitelisting.
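The golden master mechanism itself is simple enough to sketch by hand. This is an illustration of the idea only; ApprovalTests and recheck do this with far more comfort, and the file handling here is my own:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hand-rolled golden master check, illustrating the mechanism described above.
public class GoldenMaster {

    /**
     * Returns true if the actual output matches the stored golden master exactly.
     * On the first run there is no golden master yet, so the output is stored
     * and the check "fails" (returns false), exactly as in the demo.
     */
    static boolean check(Path goldenMaster, String actual) {
        try {
            if (!Files.exists(goldenMaster)) {
                Files.writeString(goldenMaster, actual); // first run: record the output
                return false; // nothing to compare against yet
            }
            // Exact comparison, whitespace included.
            return Files.readString(goldenMaster).equals(actual);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path gm = Path.of(System.getProperty("java.io.tmpdir"),
                "demo-" + System.nanoTime() + ".golden.xml");
        String output = "<root><keep/></root>";

        System.out.println(check(gm, output)); // false: golden master just created
        System.out.println(check(gm, output)); // true: output unchanged
        System.out.println(check(gm, "<root><keep/>changed</root>")); // false: a difference
    }
}
```

The three calls mirror the demo later in the talk: first run fails and records, second run passes, and any change is reported.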
And if you think about a test where you don't put any assert at all, then you don't check for any change, right? You don't get notified about any problem. The golden master approach you just saw is the opposite extreme: complete whitelisting without ignoring any change, which means you get notified about every change whatsoever, even ones you say you're not interested in. Usually, what you want is somewhere in the middle: you want to ignore some volatile stuff like whitespace characters. The ideal amount of checking is somewhere in the middle. But the problem with the traditional approach, assertion-based testing, is that you approach that ideal from the left, which means that anything you don't explicitly check for, you will not be notified about. And that is a problem.

There's an example from Google. Google has this funny little thing, a unicorn. When they change a website, they put the unicorn in as a marker for the manual testers. So if you are a manual tester at Google and you see such a unicorn, you know that this functionality has changed, so you should test it very thoroughly, really scrutinize it. Unfortunately, at some point someone forgot to remove that marker before the change went into production, which means that the actual users had unicorns on their screen, because there was no automated test that checked for unicorns, which obviously is the first thing you should check for, right? And this is exactly the problem: you can't write an assertion for something that you don't expect. If you blacklist changes, if you write assertions, you won't get notified about the unicorns. If you whitelist changes, if you only ignore the stuff you explicitly say you don't care about, then you will get notified when something very unexpected happens.
And this is also the reason why, for instance, when you do firewall configuration, you're well advised to do whitelisting rather than blacklisting. And typically, what we observe is that you want to check more rather than less. In a typical scenario, you only have to whitelist a few things that you don't care about and check everything else. In the XML example, you maybe whitelist whitespace characters, but you want to check the actual content of the XML, the text.

So now, what about websites? If you use Selenium, it's just the same, right? You probably want to check the whole website and only ignore the volatile stuff. And I can showcase that. Let's see if that works... Yeah, okay. I can make that even bigger, I think. No, I can't. Okay, can you see it anyway? Okay. So here we have a typical Selenium test case, and now we turn it into a recheck test case. We can get rid of the assert here, and we just need to wrap the original ChromeDriver in a recheck driver. At the beginning of the test we say startTest, at the end of the test we say capTest. And this is all you need. Now, if you execute that test, as you can see, we are entering stuff above, and the test case fails, as expected, because it was the first time we executed it. We didn't have a golden master to compare against yet, so we couldn't tell about changes. So we just execute it again, exactly the same test with no changes whatsoever, and now we will see everything that has changed compared to the previous run. Because it's static HTML and we didn't change anything, the test now passes. And the first time we executed the test, that is when the golden master was created.
In the golden master, we have a screenshot of the website, just for manual reference, and a representation containing all the attributes, and now we can compare against that. This is the website's HTML. If we now change it, for instance change that text to something different, and execute the test again... so, can we see that? I don't even know what I changed. Now the test fails and says: okay, we changed something here. So let's undo that. Okay. So this is golden master based software testing.

Now it becomes interesting, because not only can we track changes very easily, we can now make the test unbreakable. Usually, if you have such a test and you reference, for instance, a button or a text field by name or by ID, and then that name or ID changes, the test doesn't know where to click or where to enter the text, right? And this is when the dreaded NoSuchElementException occurs. But if we have created a golden master before, then our recheck driver can go to that golden master and find the button we are looking for. Then we can make a one-on-one comparison of all the attributes and find the element with the highest match. We can use all those redundant attributes: the XPath, the name, the ID and everything else. And it's even better, because not only do we have all the information about the element we are looking for, we have the information about all the other elements as well. So we can compare all elements with all other elements and find the highest match. Even if you want to click the accept button and a lot of its information has changed, it's still the highest match, so you will still find the old element, or the closest match to it, on the new website. And I can give you an example here. Previously we wanted to click on this ID and on that ID. Now we can go here, change that ID, or even remove it completely.
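The one-on-one attribute comparison can be sketched roughly as follows. This is a deliberately simplified illustration of the idea: the attribute names are invented, and recheck's real similarity metric weights individual attributes (XPath, id, text, and so on) differently rather than counting them all equally:

```java
import java.util.List;
import java.util.Map;

// Simplified sketch of "find the element with the highest attribute match".
public class BestMatch {

    static final double THRESHOLD = 0.3; // the default mentioned later in the talk

    /** Fraction of golden-master attributes that are unchanged on the candidate. */
    static double similarity(Map<String, String> golden, Map<String, String> candidate) {
        long matching = golden.entrySet().stream()
                .filter(e -> e.getValue().equals(candidate.get(e.getKey())))
                .count();
        return (double) matching / golden.size();
    }

    /** Returns the best-matching candidate, or null if nothing clears the threshold. */
    static Map<String, String> findBestMatch(Map<String, String> golden,
                                             List<Map<String, String>> candidates) {
        Map<String, String> best = null;
        double bestScore = THRESHOLD;
        for (Map<String, String> c : candidates) {
            double score = similarity(golden, c);
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The golden master knows the old button by several redundant attributes.
        Map<String, String> golden = Map.of(
                "id", "accept", "tag", "button", "text", "Accept", "class", "btn");
        // On the new page the id changed, but most attributes still match.
        Map<String, String> renamed = Map.of(
                "id", "btn-accept-2", "tag", "button", "text", "Accept", "class", "btn");
        Map<String, String> other = Map.of(
                "id", "cancel", "tag", "button", "text", "Cancel", "class", "btn");

        // 3 of 4 attributes of "renamed" still match, well above 30%.
        System.out.println(findBestMatch(golden, List.of(renamed, other)).get("text"));
    }
}
```

Because every candidate on the page is scored, the renamed button still wins even though its ID is gone, which is precisely the "highest match" behavior described above.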
And if we re-execute the test, it will still run. Although the ID changed, it can still find the input element, and it can still find the age element, although that one actually doesn't have an ID anymore; it can still find it and use it. And here it just reports what changed: it says the email changed and the age changed, but the test still executes. Now, even if I'm not interested in those changes, I can completely ignore them. We implemented a Git-like ignore syntax, so you can just specify which attributes you want to ignore. Here I want to ignore all attribute changes. And now, if I re-execute the test, it still runs, and because I ignore all changes, the test is green; it passes.

Now I can do something very interesting, because I have a problem when I update the golden master, for which we have a CLI, by the way. I can use the CLI here to update the golden master. But when I do that, the test breaks, because it can't find the old information anymore; when I update the golden master, the old information is lost, right? But I can use a virtual ID, what we call the retest ID. Let me see... for the email field, I now use the retest ID. That retest ID is not in the actual HTML; it only exists in the golden master, and therefore it's not affected by any changes to the HTML whatsoever. Now I can re-execute the test; obviously it still fails. But now, if I update the golden master, the test will still execute. So this virtual ID, as I can show you again, only lives in the golden master. It's not contained in the actual HTML and is therefore not affected by changes. I can change the HTML however I want, the XPath, the label, whatever, and the test will still execute. And with that, I just made my test unbreakable. And maybe some of you know that it's sometimes a problem if there are no IDs or no good XPath expressions or whatever.
So if you want to reference a certain element within Selenium, you have to choose a selector, a criterion for that specific element. For instance, here it's a CSS selector that is rather ugly, but apparently it works for that specific element. Now we don't need to do that anymore. Now you can just use the retest ID, which is a virtual ID that's not contained in the HTML, to reference your element. And you can give it any value you want. You can change it to any value, even a sensible one, as long as it's the same value that you're using here; those two have to match. But otherwise, it doesn't matter what string it is, right? You can give it any semantic ID.

And this kind of means the following. The problem with testing is, as I mentioned earlier, that you check changes, which means that test automation, or regression testing, isn't really testing, because you're not looking for bugs that are in the software already. If you have software that you gave to the customer and the customer is using it, your test automation won't find the bugs that are already in there. What your test automation is doing is guarding you against introducing new bugs. Your test automation guards you against breaking existing functionality, against regressions. That's why it's called regression testing. And in that regard, it more resembles a version control system than actual testing. And what we did with recheck is perfect that version control system. A traditional version control system is a paradigm we know very well, right? It's been invented in the 90s and been used since then; the latest and greatest is Git. And what it gives you is control over changes to static artifacts. If you change a code file or a configuration file, then Git, or SVN or whatever you use, will pick that change up and notify you about it.
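The retest ID described above amounts to an indirection through the golden master: the test references a virtual ID, the golden master maps it to the recorded element, and only then is the live page searched. Here is a rough sketch of that lookup; the data layout and names are invented for illustration (the real golden master is an XML file):

```java
import java.util.Map;

// Sketch of the indirection behind the retest ID: the virtual ID maps to a
// stored element description in the golden master, never to the live HTML.
public class RetestIdLookup {

    // Golden master: retest ID -> attributes recorded at the last accepted run.
    static final Map<String, Map<String, String>> GOLDEN_MASTER = Map.of(
            "email-field", Map.of("tag", "input", "name", "email", "type", "text"));

    /**
     * Resolving a retest ID never touches the current HTML: we fetch the
     * recorded attributes from the golden master, and only afterwards would
     * the current page be searched for the closest match (matching omitted
     * here; see the best-match sketch earlier).
     */
    static Map<String, String> resolve(String retestId) {
        Map<String, String> recorded = GOLDEN_MASTER.get(retestId);
        if (recorded == null) {
            throw new IllegalArgumentException("Unknown retest ID: " + retestId);
        }
        return recorded;
    }

    public static void main(String[] args) {
        // The test references the element by its virtual ID only, so renaming
        // the HTML id, name or class attributes cannot break this lookup.
        System.out.println(resolve("email-field").get("name")); // prints "email"
    }
}
```

The point of the design is that the string `email-field` exists only in the test and the golden master, so no change to the page itself can invalidate it.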
However, usually we are not so much interested in the static artifacts as in the actual software, the actual program. And that is something entirely different: the software that executes is more than just the code. It's also the data, the configuration, the runtime system. And this is the reason why a single, small change can have a huge impact. Unfortunately, traditional version control systems don't allow us to manage changes to the actual software; they only allow us to manage changes to static artifacts. And we use test automation to close that gap. With test automation, we take the dynamic execution of the software and turn it into code, and then we can govern those static artifacts again with a version control system.

And now, with recheck, we bring that to perfection by doing it directly. With a Git-like syntax, as I showed you earlier, you can ignore changes that you are not interested in. If you don't care about JS data, or, I don't know, the ping attribute, you can just ignore that. You can even do so for specific elements, which you can reference either by XPath, by the retest ID or by any other element reference. So these attributes are ignored globally, and these attributes are ignored for those specific elements. And recheck will then notify you about any other changes that occur to these elements. The artifacts that are created, like the XML, can again be governed by Git and GitHub, and therefore even merging and such works very well. And you can use the CLI that comes with recheck to apply those changes. For instance, here we created a report by executing the test; it says the report was created at that location. Now you can use the recheck CLI to apply those changes, and what happened is that the golden master XML was updated with my changes.
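The global-versus-per-element ignore rules just described can be modeled as a small rule matcher. This is a toy version to make the two rule scopes concrete; the rule format is invented, and recheck's actual `recheck.ignore` syntax differs in detail:

```java
import java.util.List;

// Toy version of a Git-like ignore mechanism for attribute changes.
public class IgnoreRules {

    record Rule(String element, String attribute) {
        // element == null means the attribute is ignored globally.
        boolean matches(String elementRef, String attr) {
            return (element == null || element.equals(elementRef))
                    && attribute.equals(attr);
        }
    }

    // Example rule set: one global rule, one per-element rule.
    static final List<Rule> RULES = List.of(
            new Rule(null, "data-js"),        // ignore this attribute everywhere
            new Rule("signup-form", "class")  // ignore "class" only on one element
    );

    static boolean isIgnored(String elementRef, String attribute) {
        return RULES.stream().anyMatch(r -> r.matches(elementRef, attribute));
    }

    public static void main(String[] args) {
        System.out.println(isIgnored("signup-form", "class"));  // true: per-element rule
        System.out.println(isIgnored("login-form", "class"));   // false: not covered
        System.out.println(isIgnored("login-form", "data-js")); // true: global rule
    }
}
```

Filtering the diff through such rules before reporting is what lets a change be "green" even though the page is not byte-identical to the golden master.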
And now I can re-execute the test and... sorry, where is it? Of course, I have to apply the other retest IDs first. Sorry, that's the wrong test. So, now I can execute that. What did I do? I screwed up. Okay, let me undo that. So now I can apply those changes again, and now the test is green again. Right.

So, I finished a bit earlier than I expected, because I thought the live coding would take more time. So I'm already open for questions and answers.

That was quick. Hi, thanks for the talk. I suppose these unbreakable test cases come at the price of losing confidence in the tests, because you maybe are not sure if the guessing was right, right? If I understood this correctly, it's sort of a trade-off for testers who are too lazy to update their test code, which broke because it didn't find the right ID, versus losing some confidence in the guessing of what the new ID should be. Do you have any empirical data or experience on how well this works in practice? Are there often cases where it suggests that the regression test is still working while in fact it took the wrong field, or does experience show that it really works out nicely?

Thank you for the question. To recap it: what we do is, at some point, a bit of guesswork. We create a one-on-one assignment of all the elements, and if something changes drastically, that assignment might be wrong. So what we do is have a threshold for how well we think the assignment worked out. If you change everything, then we have low confidence in that assignment, and then the test actually breaks. We put that number at 30%: if there's still a 30% match, then we can find the element, and we show you afterwards what changed in the test.
So the thing is that the test still executes; you get to the end of the test, and then we show you all the changes. We show you what label changed, what ID changed, what text changed, what style changed, whatever. So you actually get more confidence in the overall test, because you get shown all the changes. And if you then apply those changes, those were not regressions, those were actual intended changes, and the test still works; it doesn't break. But of course, if there are too many changes, the assignment gets below the threshold and we don't do it. It's only critical if you test live systems. If you have a button "delete database", you don't want that button to be pressed during test execution, right? In that case, you probably shouldn't do this; you would want the test to stop, or you would want a very high threshold for that confidence. But otherwise, the test executes and you get shown all the differences, and if you say these are correct, you can apply them. Does that answer the question?

Yeah, basically. So you have an additional manual step where the tester confirms that these changes made to the test case were actually right? Some human has to look at the changed test case?

Well, let me show you... get back to that... sorry, I can't execute. What's happening here? I'm sorry, I can't go back to the coding anymore; I wanted to show you. So yeah, after executing the test, you are shown all the differences. The manual step is after the test execution, not during it. At the end of the test execution, you get to see all the differences that you didn't ignore. There's an ignore file where you can say: I ignore ID changes, or whatever. But everything that wasn't ignored, you can see.
And if those are intended changes, you can use, for instance, the recheck CLI to apply them and say: okay, update the golden master; these are intended changes. Does that answer the question? Okay.

So these retest IDs, they are generated while building the golden master. But if you rebuild the golden master, how are they generated? Are they the same?

So the golden master is not entirely regenerated. We have a golden master that we apply changes to. We compare the golden master that we load from the last execution against the current execution, and only the things that changed, like a style attribute, an ID or a class, get logged. But the original golden master is not regenerated as such.

Okay. So the ID stays the same if you don't go above the threshold?

Right. Any more questions?

Hi. I was wondering, you talked about the threshold for checking whether an element changed, which is 30%. Is it something that can be changed if we want to be more or less permissive, like putting it to 50% or something?

Yes, you can configure it. The 30% is a default, and we just realized that it would actually make sense to have this threshold dynamically calculated. Because, for instance, if a lot on the website changed, if the website changed completely, then you have to be very sure that your element didn't change, right? But if the website stayed almost the same, if there were almost no changes except for your single button, then your threshold can be lower, right? So this should really be a dynamic threshold. But right now you can configure it: the default is 30% and you can set it to any value you want.

Any more questions? Is it an open source project, or is it still... I mean, is there already a community behind it? How old is it?

It's very new. We started this at the end of last year, essentially.
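The dynamic threshold the speaker sketches in that answer can be written down as a tiny function: the more of the page that changed overall, the stricter the per-element match has to be. The linear interpolation here is my own guess at the idea, not recheck's actual formula:

```java
// Sketch of the dynamic threshold idea: the more the overall page changed,
// the stricter the required per-element match. The interpolation is an
// invented illustration, not recheck's actual calculation.
public class DynamicThreshold {

    static final double BASE = 0.3; // the configurable default from the talk

    /**
     * pageChangeRatio is the fraction of the page that changed (0 = nothing,
     * 1 = everything). A calm page keeps the base threshold; a heavily
     * changed page demands near-certainty before re-identifying an element.
     */
    static double threshold(double pageChangeRatio) {
        return BASE + (0.9 - BASE) * pageChangeRatio;
    }

    public static void main(String[] args) {
        System.out.println(threshold(0.0)); // low threshold: page barely changed
        System.out.println(threshold(1.0)); // high threshold: page changed completely
    }
}
```

Plugging this in where the fixed 30% was used would give exactly the behavior described: a lone renamed button is matched leniently, while a complete redesign makes the matcher refuse to guess.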
And what you saw is all usable, all live, but we're still working on it. It consists of mainly three components. There is recheck, the main library, because you can implement it for the web, but you can also implement it for XML, or for log files, or for every other interface that you want to use golden master testing on in general. Then there is the implementation for the web that you just saw. And there is the CLI, which allows you to update the golden master. So these are the three components, and our role model in that situation is Git. We are a startup, and we give this away for free. It's open source; you can just use it. And, like Git and GitHub, we are creating a service where you can store the golden master files if you want to. But they are also created locally, so you don't need to use that service if you don't want to. So it's completely open source.

Do you have some experience with very dynamic websites as well? It often happens that you click on a button and stuff changes with some timing and so on. Do you have solutions for that too, or is it at the moment more or less about getting a golden master that is the complete page, with no fragments?

Right now we concentrate on that part, but some people who are using it have come up with good ideas, because we know roughly how many elements should be on the page at the point of checking. So we could also use it very well for the timing problem: if you know there should be roughly 300 elements and there are only 100, then you know that you should probably wait longer. And you can allow, like, 10% more or fewer elements; if there are 20 elements more or less, that's not that bad. So there are ideas, but right now, as it is, it's only for golden master style checking.

Any more questions? Okay, so thank you very much for coming, and we would be very grateful if you want to try this out and give us feedback, or even contribute to it. Thank you very much.