Thank you very much for coming. My name is Łukasz Wczisław, and I'm here to talk about correlation analysis in automated testing, as you can read. At the beginning I want to introduce myself, tell you something about who I am and how the history of my education led me to the topic of this presentation; originally I had something slightly different in mind, but it evolved into this. Then I will tell you what the problem was and what the purpose of this research is, show you the math that stands behind all of this, and move on to use cases, some conclusions, and maybe some anecdotes.

So, first of all, what was my main purpose? I am a former computational biologist, a bioinformatician. I was a person who used very powerful supercomputers to simulate biological processes, protein folding, for example. The most problematic issue with such research is that it is very sensitive to any performance problem, because there are billions upon billions of particles that can be connected to each other in many ways and can influence one another. Performance was crucial.

When I moved to IT, I became a tester at 3mdeb in Poland. I started to work with big test suites: not huge, but big enough that the speed of the tests, the time they take to complete, and the resources they consume are important. So I thought about making this more reasonable, about doing some optimization of these tests. Machine learning is a big theme nowadays, so why shouldn't the tests learn from one another? From the history, one test would know that another test had already run: if, whenever that other test failed, this test also failed, or if the other test ran several iterations and never reached its expected outcome, then this test could be either shortened or skipped. Then I realized it was better to go back to the beginning, because if tests are correlated in such a way, the test suite is badly constructed.
It has some wrong assumptions, and the test conditions are badly formulated. So this is my assumption; I won't read it all again. I want to make test suites more elegant and to save time and resources, because in a perfect test suite there should be no correlation between the tests. If there are correlations, some tests could perhaps be skipped, because they don't need to be run, and there might even be a measurable value that can benchmark test suites: the correlation coefficient.

Let's look at an example. Tests are in rows and software versions in columns; green means passed and red means failed. It seems obvious. I treated tests as Boolean functions, which are either true or false. Normally, when you compute a correlation coefficient, you need to do a regression first, but for a Boolean function that only takes the values zero and one that wouldn't make any sense. So what I took instead was the historical probability of the test passing; the value I measured was whether a given outcome is below or above that probability.

Next, it's obvious that tests which pass 100% of the time don't really matter to us, just like a software version that fails every test: if there was a power issue at the moment of testing, that tells us nothing about the tests themselves. So I kept only the meaningful tests, the ones that have failed at least sometimes, and created a covariance matrix from them. This may look suspicious, but it is really very easy math; it can be done in Google Sheets. On the diagonal of the covariance matrix there is simply the variance of each meaningful test, and the other fields are the covariances between pairs of tests. Next, I computed the Pearson correlation coefficient, which may look scary but isn't, trust me, and created a correlation matrix.
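A minimal sketch of this pipeline, assuming NumPy; the outcome matrix and the variable names here are made up for illustration, not taken from the talk:

```python
import numpy as np

# Illustrative data, not from the talk: rows are tests, columns are
# software versions; 1 = passed, 0 = failed.
outcomes = np.array([
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0, 1],  # behaves exactly like the first test
    [1, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1],  # always passes -> carries no information
])

# Drop versions where everything failed (e.g. a power issue on the rig).
outcomes = outcomes[:, outcomes.max(axis=0) == 1]

# Keep only meaningful tests: those that both passed and failed at least once.
meaningful = outcomes[(outcomes.min(axis=1) == 0) & (outcomes.max(axis=1) == 1)]

# np.corrcoef centers each row on its mean (the historical pass probability),
# builds the covariance matrix, and normalizes it by the standard deviations:
# exactly the covariance-then-Pearson step described above.
corr = np.corrcoef(meaningful)
print(np.round(corr, 2))
```

Printing `np.cov(meaningful)` instead shows the intermediate form with the per-test variances on the diagonal, before normalization.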
What you can see above is just the table of test outcomes that was our input from the beginning, and below it is the correlation matrix I computed. Only half of the matrix is shown, for clarity: every field on the diagonal is one, because the correlation between a test and itself is one. The coefficient is always between minus one and one. Minus one is negative correlation: whenever one test passes, the other fails. One is positive correlation: whatever one test does, pass or fail, the other does the same. Zero means no correlation, and in a big test suite, or in many test suites aggregated into a regression, the real values sit near zero, plus or minus nearly zero. If something comes out like in this example, not a value of one, which would obviously be an error, but 0.5 or so, it is significant.

We could use this just to validate the test suite, but that alone wouldn't have much value for us. What can have value? With each regression in continuous integration and continuous delivery, we can test each version and watch the dynamics of this coefficient. If we see that tests which shouldn't be connected appear correlated starting from some version in the past, and the correlation grows with each iteration, we should see a yellow light and check whether the conditions of those tests are prepared correctly.

This is a real test outcome from our work: we are the maintainers of the firmware for the Swiss PC Engines firewall boards. Maybe some of you have used these routers; these are the test results for the firmware of one of the router lines. We have been maintaining them for over two and a half years, but these tests are not a good example, because our tests are evolving too.
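The yellow-light monitoring described above can be sketched like this; the data and the `correlation_history` helper are purely illustrative assumptions of mine, not something from the talk:

```python
import numpy as np

def correlation_history(outcomes, pair, start=4):
    """Pearson correlation of one test pair, recomputed after each new version.

    outcomes: tests x versions matrix of 0/1 results;
    pair:     indices (i, j) of the two tests to watch;
    start:    minimum number of versions before the coefficient means anything.
    """
    i, j = pair
    history = []
    for n in range(start, outcomes.shape[1] + 1):
        window = outcomes[:, :n]
        # Guard against constant rows, where the coefficient is undefined.
        if window[i].std() == 0 or window[j].std() == 0:
            history.append(0.0)
        else:
            history.append(np.corrcoef(window[i], window[j])[0, 1])
    return history

# Hypothetical data: two tests that start out uncorrelated but begin to
# pass and fail together in later versions.
outcomes = np.array([
    [1, 0, 1, 1, 0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 1, 0, 0, 1, 0],
])

trend = correlation_history(outcomes, pair=(0, 1))
# A coefficient that keeps climbing away from zero is the "yellow light".
if trend[-1] > 0.5 and trend[-1] > trend[0]:
    print("yellow light: tests 0 and 1 look increasingly correlated")
```

In a real CI/CD pipeline the same loop would simply gain one column per regression run, and an alert would fire for any pair whose coefficient trends away from zero over several iterations.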
Some tests were created and, I don't know, only started to be used half a year ago, so we haven't yet built a large enough base of tests with a long enough history to make this really useful. But for large silicon vendors or hardware producers I suppose it could be useful, because it is a very common situation that one person or one team is responsible for one suite in a regression, and all in all they might not be aware of correlations between the regression suites.

So, in conclusion: the dynamics can be useful for large test sets with a long history, and this is a proof of concept; do not consider it a white paper or anything like that. The idea is quite new, and we have started working on it more seriously with our own firmware.

And here is an anecdote; some of you may know it, may have heard of it: Anscombe's quartet. These four sets of data have the same statistics, as listed below, the same mean of x and y, variance of x and y, correlation, and so on, although they are obviously different. So we have to be careful. I don't consider this statistic a red light, an automatic failure, when something changes or grows rapidly, because that may happen for some other reason; but a yellow light should be visible when something like this occurs.

This is the bibliography, and thank you for your attention.

[Audience question, inaudible] All the computations were done in Google Sheets. Thank you.

[Audience question, inaudible] Yes, I have heard about it, but this is a new concept; we are just looking for ways that could improve our test suites, and I think it should be considered. Thank you. I beg your pardon, I can't hear you, excuse me.

[From the audience] People who are watching the stream have not heard that, so you should probably mention that there is a thing called mutation testing.

Have I heard about mutation testing? Yes, because it could be used to simplify all the statistical computations we did here. We have heard about it, but that work is in progress.
We are looking for new ways to improve our test suites, and I think it should be considered. Thank you. Thank you very much for your attention.
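As a footnote to the Anscombe's quartet anecdote from the talk, the identical summary statistics are easy to verify with plain Python; these are the standard published values of the four data sets:

```python
# Anscombe's quartet: four data sets with near-identical mean, variance
# and Pearson correlation, yet obviously different shapes when plotted.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

sets = [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]
for x, y in sets:
    n = len(x)
    my = sum(y) / n
    var_y = sum((b - my) ** 2 for b in y) / (n - 1)
    # Every set gives mean_x = 9.0, mean_y ~ 7.50, var_y ~ 4.12, r ~ 0.816.
    print(sum(x) / n, round(my, 2), round(var_y, 2), round(pearson(x, y), 3))
```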