I'm not going to give you a quiz at the end, so don't worry. But let's get into it. I see quite a few faces here that I haven't threatened into attending, so that's good news for me. There is a big section of the audience that I went around and threatened into coming. So, people who learned something in the last two days: I'm going to take pictures of whoever doesn't raise their hand. There you go. OK, I hope you will learn something more from my session. This is who I am. My name is Bajoia Tatiti, and I work for Sony PlayStation. Another simple poll: who knows about the Sony PS4? There you go, and I didn't even have to threaten you. See? Love you guys. Love all of our customers. I have been working at Sony PlayStation for four years now, or so LinkedIn tells me. I live in San Francisco, not in the city itself but in the suburbs, with my family and my innumerable pets. And I have been passionate about UI automation for a long time. True story: I wanted to take out the number of years, and I forgot about it in the excitement of attending the conference. I did not want to date myself with the number of years I've been working, so let's just gloss over it, OK? I am not going to get into the topic of manual versus automation. That is a debate that is very near and dear to most of our hearts; I think everybody in the QA community has a strong opinion about whether manual testing or automation is better. I am only going to talk about what worked in our company, in our environment. About four or four and a half years ago, we were trying to bring our software release cadence down from once every three months to maybe once every two weeks. We wanted to be able to keep our promise of quality to our customers, release after release, without fatigue. We really didn't have the resources for the army that would have been needed to grind through the entirety of a manual regression. At that point, I think we had 10 applications running on the PS4, each of them with thousands of test cases to work through. That's why it took us three months to release one version of the software. We really wanted to get into the habit of releasing faster, getting good-quality code out to our customers faster. That was the value we wanted to add, and we had no way to do it with manual testing. We had to go the automation route; we realized that manual was not working for us. This is my favorite statement about automation; this is where my beliefs are rooted. When we started doing automation, in the beginning we used to think it's automagic. You write a script, you check it in, you start running it on Jenkins, boom: everything happens, it should pass, everything is good to go. What we realized through the journey of maturing in UI automation is that it is not automagic. It's simply automation. It's simply software code, just like any other software code, and it needs the same attention to quality and detail that any other software code would need. This next part (the little label in the left-hand corner says testability) is our personal experience with test automation stability. The tests are very stable if you have a very small suite. If you have five or ten test cases in your smoke suite and you're running them, they're very stable. You have complete control.
Very soon, with our team of testers, we were churning out thousands of test cases per application, and the stability of those test cases took a nosedive. It was pretty much unusable. Everything went red. There was no predictability about what would pass and what would fail, but we had a lot of automation. We had put in a lot of effort and created a huge suite. In reality, we were doing a lot of work, but it just wasn't giving us a return on investment. Speaking of return on investment, this is what our regression analysis showed us. The return on investment is that small green slice of the pizza, which is 3%. That is the portion of regression failures that were actual product defects. The rest of it, the big red part, is where testers had made wrong or bad choices during test development. I am guilty of using sleep sometimes in my code; I'm pretty sure most of us have done it at some point in our careers. There are many other things: we hard-code values just to make things work, we put in incorrect waits. There is a whole range of test script issues that can happen, and that was the biggest chunk of what was happening in our regression suite. The other big chunks were environment issues and network issues, things we said we didn't have control over. We start running a test suite that depends on a piece of content or a product showing up on a page, and it doesn't show up because somebody has gone in and changed that product or the content of the page. Somebody has kicked off a performance run and the network is completely clogged, so our tests basically don't load anything. That is what was happening with the environment and network issues. The result, of course, was ignored results. How many times have developers come and told you: forget about automation, just do a quick manual run through the build and tell me whether it's good or not? I got that request pretty much every time we were sending out a release. Every time I would come in and say, I have a regression run, wait, give me three days to analyze the results, the release manager would tell me: forget about it, just ask your team to run through it and tell me whether it's good or bad. That's what was happening. The entirety of the effort we had put into creating automation was going down the tubes. It was just getting ignored; nobody cared about it. So we went in and analyzed what was wrong with our automation, why it wasn't adding value to our software release cycle, and we realized there were three primary problems with our regression suite. It needed a lot of effort for stabilization: the amount of effort we were putting into writing test code, we were putting the same amount, if not more, into stabilizing it. It was obviously a high-maintenance project: the number of people we had writing test code was equal to the number of people trying to maintain it. And it took one day to run regression and three days to analyze results, so it wasn't really helping anyone. We dug into why we were seeing such bad results; remember that pizza I showed you? We realized that the three biggest problems we had were badly written scripts, environment issues, and network issues.
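To make that point about sleeps and incorrect waits concrete, here is a minimal sketch in Python with Selenium. The URL, locator, and function names are made up for illustration; they are not from the talk.

```python
# Fragile vs. stable waiting in a UI test. The URL and locator are hypothetical.
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def fragile_add_to_cart(driver: webdriver.Chrome) -> None:
    # Hard-coded sleep: too short and the test flakes, too long and the suite crawls.
    time.sleep(5)
    driver.find_element(By.ID, "add-to-cart").click()


def stable_add_to_cart(driver: webdriver.Chrome) -> None:
    # Explicit wait: polls until the element is actually clickable, up to a timeout.
    button = WebDriverWait(driver, timeout=10).until(
        EC.element_to_be_clickable((By.ID, "add-to-cart"))
    )
    button.click()


if __name__ == "__main__":
    driver = webdriver.Chrome()
    driver.get("https://store.example.com/product/123")  # hypothetical page
    stable_add_to_cart(driver)
    driver.quit()
```

Replacing hard-coded sleeps with explicit, condition-based waits is one of the cheapest ways to move a test out of the "badly written script" slice of that pizza.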
And I'm going to keep coming back to these. Sorry, I got a popup; I knew something would go wrong during my presentation, and it has. So let's talk about these three buckets. They are going to keep coming up throughout my presentation, as a recurring visual, because these were my three biggest enemies during my automation journey. So what was the solution? We would start a test run, make small changes, check in the changes; it would run for some time, then start failing again; we would pull it out again and make small changes again. Our process started looking like that, or very close to that. That's where TSVS comes in. It's a little hard to get off your tongue, TSVS; I have colleagues who still say TVSS or TVS, something like that. It's basically a service that validates the stability of your test suite, nothing more than that. The primary things TSVS relies on are, first, the fact that it is self-contained: whatever it needs to successfully validate testability, it has in its own environment. It does not need to rely on anything external. Second, it increases automation reliability (I apologize for the slide bleeding off the screen); that is its primary purpose. And it helped us achieve 100% automation. Now, 100% automation is itself a very debate-worthy topic, because as soon as I say 100% automation, there are quite a few people who ask: what happens to those corner cases that never get automated? Well, those are non-automatable test cases. Anything that can be automated should be automated, and that's what I mean by 100% automation. Just reiterating what TSVS does: it is a service that integrates into your CI. Our TSVS service runs as part of our Jenkins jobs, and I'll show you how it runs. It checks the stability of the test suite and it continues to monitor the health of the test scripts. It doesn't just check once; it continues to monitor the health of the test scripts every time they run. And if it finds that something is wrong, it takes automated corrective actions. These are the two pillars of TSVS, and my stand is that these are the two pillars any software should be standing on: testing and maintenance. We would not dream of sending dev code into production without testing; as a QA community, we wouldn't have a job if that were acceptable. So why do we send test code into production, which for us means regression runs, without testing it? And why don't we factor in the maintenance that is required to keep the test code healthy? It's software; it's the same thing. It needs testing and it needs maintenance. So let's look at the testing part first. What happens during testing? This is where I need you to stay with me; this is what I meant when I said stay with me. There is a lot of information packed in here, and this module of the service runs before the test gets merged in. We write all our test code in Python, so every time a PR comes in that has .py changes in it, this module gets kicked off in our CI. The first step is static code analysis. Most development code goes through static code analysis.
So it's obvious that test code should go through static code analysis too, right? Static code analysis, for those of you not very familiar with the term, is analyzing the correctness of code without executing it. So we lint our code and make sure there are no basic, fundamental programming errors. That's the first step. The next step is runtime. This checks whether the test case you've written is granular enough. We don't want test cases as small as unit tests, but we also don't want test cases that run for 15 minutes and go through eight different screens and 50 different assertions; that's bound to be unstable. By checking that the runtime of the test case hits the sweet spot that is stable for your application, it verifies that your test case is granular enough: not too small, not too big. By default, in our environment, that limit is set to two minutes. The next one is failure rate. This is an obvious one: you don't want to check in test cases that are not passing even during testing. So before you check it in, run it a few times. By default we run it five times, but run it as many times as you want and make sure it isn't failing while you're testing it. Parallel runs execute the test case on multiple platforms. Remember those times when a test case failed because you only wrote it for Chrome, and then after you checked it in, it was supposed to run on IE, Firefox, and Chrome, and the Chrome run passed but IE and Firefox failed? This makes sure that isn't happening, that the tester hasn't inadvertently written a test case for only one particular platform. It takes the test case, runs it across all platforms the application supports, and makes sure everything passes. And then there is the sequential run. While you may have written a terrific test case that is perfectly granular, runs and passes every time, and runs on all platforms, you may have inadvertently done something that messes up the environment when your test case cleans up. During the teardown of your test case, you could have deleted an account that other test cases rely on. So this step inserts your test case into a small set of other test cases and runs that set, to make sure that not only does your test case pass, but everything that runs after it passes too. It's basically verifying that the setup and teardown of your test case have been implemented correctly. All of these checks need to pass. Every time a test PR is raised, all of these modules run; some of them can run in parallel, and everything needs to pass before the merge button is enabled. If any of these modules fail, the merge button is not enabled. You cannot check in if the testing of your test has failed. It raises an error, and the tester who wrote the test case needs to go in and fix it. So we're done, right? What have we gained from this exercise of testing our test cases? We've obviously ended up with well-written test cases; we've been forced to go back and correct the small mistakes we may have made. We've gained a well-written test case. It's granular; it has hit that sweet spot of granularity that your application needs. It has stability; we've run it multiple times. And we've eliminated the local-environment excuse: "it passes on my local machine, I don't know why it doesn't pass on Jenkins."
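Here is a minimal sketch, in Python, of what such a pre-merge gate could look like: lint the changed files, check the runtime sweet spot, repeat the run to measure failure rate, and run the new test inside a small set of existing tests to catch setup and teardown side effects. The two-minute ceiling and five repeat runs echo the defaults mentioned in the talk; the linter choice (flake8), the helper names, and the neighbour selection are assumptions, not the actual TSVS implementation.

```python
# Sketch of a pre-merge validation gate for new test code.
import random
import subprocess
import sys
import time
from typing import Callable

MAX_RUNTIME_SECONDS = 120   # granularity check: not a 15-minute monster
REPEAT_RUNS = 5             # failure-rate check before merge


def lint(changed_files: list[str]) -> bool:
    """Static code analysis: fail fast on basic programming errors."""
    py_files = [f for f in changed_files if f.endswith(".py")]
    if not py_files:
        return True
    return subprocess.run([sys.executable, "-m", "flake8", *py_files]).returncode == 0


def runtime_ok(test: Callable[[], bool]) -> bool:
    """The test must pass, and its runtime must stay inside the stable sweet spot."""
    start = time.monotonic()
    passed = test()
    return passed and (time.monotonic() - start) <= MAX_RUNTIME_SECONDS


def failure_rate_ok(test: Callable[[], bool]) -> bool:
    """Run the test several times; it must pass on every attempt."""
    return all(test() for _ in range(REPEAT_RUNS))


def sequential_ok(test: Callable[[], bool], suite: list[Callable[[], bool]]) -> bool:
    """Insert the new test among a few existing ones; everything must still pass,
    which catches teardown side effects such as deleting a shared account."""
    neighbours = random.sample(suite, k=min(5, len(suite)))
    ordered = neighbours[:2] + [test] + neighbours[2:]
    return all(t() for t in ordered)


def merge_allowed(changed_files, test, suite) -> bool:
    # Parallel runs across all supported platforms would be added here as well.
    return (lint(changed_files) and runtime_ok(test)
            and failure_rate_ok(test) and sequential_ok(test, suite))
```

In a CI setup like the one described, a gate of this shape would be wired into the PR job so the merge button only lights up when every check returns true.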
All of those things are taken care of. We've all gone through those emotions. Remember those three buckets; I'm going to keep coming back to them, because they were my biggest enemies throughout my automation journey: badly written scripts, environment issues, and network issues. By the end of the testing phase, we had taken care of the badly written scripts. Most of the common errors of a badly written script were eradicated. We were done, right? We've written a good test case, it has passed the testing phase, the merge button is enabled, so we're happy. We get to go home. Again, I didn't look quite like that, but this is the best picture of relief I could find; it's how all of us felt when we could finally merge in our test cases. But this is the reality. Even with all of that testing, while the number of test cases passing in our regression suite increased dramatically (if out of 1,000 there were 500 random failures earlier, the random failures had come down quite a bit), it still wasn't generating green builds for us. We still couldn't get to 100% confidence, and we were still wasting time analyzing regression results. The reality was that it still didn't do the entire job for us. It had tested our test scripts, but it hadn't done everything we needed it to do. Now comes the maintenance phase. We've tested our test case and merged it in, but we need to keep maintaining that piece of software. We can't just let it run wild and never look at it again. And we need to automate the maintenance of that test suite as much as we can, because if we are burning human resources on maintenance, we might as well have gone back to that shampoo loop of making small changes and merging them in. So what does the maintenance phase do? It runs as part of the nightly regression; again, it's a CI-integrated service that runs as part of the nightly regression runs. It identifies why a test case is failing and then decides whether it needs to be retried or not. Once it has analyzed the failure, it makes a call on whether the test case is failing because it's flaky, and then takes corrective action on flaky test cases. Going into the details of test maintenance: again, a lot is packed into this slide, sorry about that. The first part is test execution. In our environment, at least, what we realized is that since we had such a large suite, 6,000 test cases, if we let the entire suite run in one go, it took about 13 hours for all 6,000 end-to-end test cases to run, and we would definitely start to see environment degradation at some point. What we realized is that if we break up the test suite into smaller chunks (we call it chunking, very innovative) and take corrective action between those chunks, say reboot the machine, check that all the accounts the test suite depends on are present, basically refresh the environment, then there is a much better chance that the rest of the chunks will run and pass.
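A minimal sketch of that chunked execution, assuming a simple in-process runner with hypothetical test objects that expose a name and a run() method; the chunk size, timeout, and refresh steps are placeholders rather than the team's real values.

```python
# Chunked regression execution with an environment refresh between chunks.
import time

CHUNK_SIZE = 200                    # tests per chunk (placeholder)
CHUNK_TIMEOUT_SECONDS = 30 * 60     # a stuck chunk eventually times out


def refresh_environment() -> None:
    """Reboot test machines, verify required accounts and content exist, clear caches."""
    ...


def run_suite_in_chunks(suite: list, results: dict) -> None:
    chunks = [suite[i:i + CHUNK_SIZE] for i in range(0, len(suite), CHUNK_SIZE)]
    for chunk in chunks:
        deadline = time.monotonic() + CHUNK_TIMEOUT_SECONDS
        for test in chunk:
            if time.monotonic() > deadline:
                # Only this chunk is affected; mark its leftovers and move on.
                results[test.name] = "timed_out"
                continue
            results[test.name] = "passed" if test.run() else "failed"
        refresh_environment()   # clean slate before the next chunk starts
```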
Chunking also makes sure that if something went wrong while a particular chunk was running, only the test cases in that chunk fail. The chunk will eventually time out, and because we refresh the environment between chunks, the rest of the chunks will pass. One bad chunk doesn't have a degrading effect on the rest of the test suite. The next part is log analysis. This is important enough that I'm going to dedicate a separate slide to it. What we have done is create a library of all known errors, and obviously this library is a work in progress. We keep seeing new errors in our environment: errors coming back from a server, an API that is unresponsive or not sending back the right results, something that's gone bad in the environment, some credit card that we use in tests hitting a rate limit, things like that. We identify those errors and keep adding them to the library, and then on a failure, we check the error logs to see whether it is hitting a known error or not. If it is, that test case is marked as an error as opposed to a failure. We know why it's erroring out, and it's not really a failure; it's not a test case failure or an application failure, it's an error that is happening. Next is the retry. Once we've identified that a test case is erroring out, we've basically proven to ourselves that this is an environment issue and has more or less nothing to do with the application or the test case. So we run it against mock services; we know we can get over this hump by running against a mock server. Any time there is an error, the retry will automatically run it against mock services. It will also update the test summary to note which test was an error, whether it ran and passed against a mock, and so on. Then it retries the test scripts that are marked as failures rather than errors (those could be application failures or test failures), but not against mock services. It also generates a heat map. The heat map shows the areas of the application where test cases are more prone to be flaky or failing. While we go through the retry logic, retrying each test case, we collect all that information to see which parts of the application have a higher propensity for failing test cases. We obviously need to put in some work to understand why that is happening; the heat map shows us those heated areas of our application. Then there is the flaky test case, one where there is no way of predicting whether it will pass or fail. A failing test case, if it's an application defect, a regression introduced by development, will always fail. But if it's not an environment issue, not an error case, and it fails sometimes and passes sometimes, that's a flaky test case. We can almost say we don't know why it is failing. So every time we identify a flaky test case, we auto-disable that test case and auto-log a defect against it. We write a bug in JIRA with the same priority as the priority of the test case. So if a P1 test case got disabled, you get a critical defect against your application, and it gets assigned to the author of the test case. The next day, when the application team comes in and triages their defects, they have a critical defect waiting for them that needs attention.
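Here is a minimal sketch of that failure-handling flow: match logs against a known-error library, retry errors against mock services, retry genuine failures as they are, and auto-disable a flaky test while filing a JIRA defect at the test's own priority, assigned to its author. The error signatures, helper stubs, and the precomputed flaky flag are hypothetical stand-ins, not the actual TSVS code.

```python
# Sketch of nightly failure classification and corrective action.
from dataclasses import dataclass

# Library of known environment/server errors, grown as new ones show up.
KNOWN_ERRORS = {
    "HTTP 503": "backend API unresponsive",
    "RATE_LIMIT_EXCEEDED": "test credit card hit its rate limit",
}


@dataclass
class TestResult:
    name: str
    author: str
    priority: str      # e.g. "P1"
    passed: bool
    log: str
    is_flaky: bool     # e.g. passed on some retries and failed on others


def classify(result: TestResult) -> str:
    if result.passed:
        return "passed"
    if any(signature in result.log for signature in KNOWN_ERRORS):
        return "error"          # environment problem, not a test or app failure
    return "failure"


# Placeholder helpers; real implementations would drive the test runner and JIRA.
def rerun_against_mock_services(result: TestResult) -> None: ...
def retry(result: TestResult) -> None: ...
def disable_test(name: str) -> None: ...


def file_jira_defect(summary: str, priority: str, assignee: str) -> None:
    """Would create an issue via the JIRA REST API; omitted in this sketch."""


def handle(result: TestResult) -> None:
    verdict = classify(result)
    if verdict == "error":
        rerun_against_mock_services(result)     # we know why it errored out
    elif verdict == "failure":
        if result.is_flaky:
            disable_test(result.name)           # stop it from polluting the suite
            file_jira_defect(
                summary=f"Flaky test auto-disabled: {result.name}",
                priority=result.priority,       # a P1 test yields a critical defect
                assignee=result.author,
            )
        else:
            retry(result)                       # could be a genuine app regression
```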
Auto-logging that defect also forces the application team to make sure it is addressing each flaky test case. Most of the time, when we don't force this, application teams know there are flaky test cases and keep on ignoring them. This forces the application team to take action. The last part is reporting, and this is very important. You've gone through a lot of very complicated logic in trying to maintain your test suite. You've done a lot: you've identified errors versus failures versus flaky versus disabled test cases. There's a lot going on. You need to somehow summarize all of that information, along with the heat map you generated for your application, and put it in a dashboard, a visualization that is easily consumable by the application team, where they can come in, understand what is going on, take action where they can, and then do a root cause analysis of the things that are not easily understandable. My favorite three buckets again: the badly written scripts were already mostly taken care of when we were testing our test cases. The maintenance part takes care of the environment and network issues. So my three buckets just got emptied. Summarizing TSVS, the Testability Validation Service: it is an amalgamation of testing and maintaining your test code. Again, you would never skip testing or maintenance on your development code. Test automation is just like any other software; it needs testing and it needs maintenance, and you need to do the same thing with your automation code. That's what TSVS does for us. Remember that pizza where 3% was my ROI on all the automation we had written? After we put in TSVS and went through the motions of fine-tuning it, the pizza looked much better. The ROI looked much better. Now, what did we realize, what is the main value we got out of TSVS? Obviously, the primary value is stability. More than 85% of our automation runs were now turning green, so we were actually seeing green builds. When there was a regression failure, it was taken seriously; it was all hands on deck, and developers would get involved with the regression failure from the night before. Automation results were no longer getting ignored. We were also able to do more automation in less time, because we weren't wasting as much time analyzing results and doing reactive, knee-jerk maintenance of test code. We were getting more done in less time. We had successfully brought our release window down from three months to one week; we had actually outdone the ask. The ask was two weeks, and most of our applications were now going out in one week. That was simply because we had reliable automation that ran overnight, when none of us were at work, and gave us reliable results the next day. Very little manual intervention was needed. And obviously, easier defect identification. We were no longer waiting until the last minute; when most of your automation fails and you ignore the results, you also tend to miss the critical regression defects that were introduced. By making our automation stable, we had created a way where any regression that was introduced got identified more easily, faster, and earlier in the cycle. It was definitely not leaking into the customer environment.
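As a rough illustration of that reporting step, here is a minimal sketch that buckets nightly results by verdict and by application area so the heated areas stand out. The dotted test-naming convention and the plain-text output are assumptions for the example; the talk does not describe the real dashboard internals.

```python
# Summarize nightly results for a dashboard: verdict counts plus a crude heat map.
from collections import Counter, defaultdict


def summarize(results: dict[str, str]) -> None:
    """results maps a test name (e.g. 'store.checkout.test_add_to_cart') to a
    verdict such as 'passed', 'failed', 'error', 'flaky', or 'disabled'."""
    by_verdict = Counter(results.values())
    by_area = defaultdict(Counter)
    for name, verdict in results.items():
        area = name.split(".")[0]          # crude "area of the application" bucket
        by_area[area][verdict] += 1

    print("Run summary:", dict(by_verdict))
    print("Heat map (areas with the most non-passing tests first):")
    ranked = sorted(by_area.items(),
                    key=lambda kv: kv[1]["passed"] - sum(kv[1].values()))
    for area, counts in ranked:
        trouble = sum(counts.values()) - counts["passed"]
        print(f"  {area}: {trouble} failing/erroring/flaky out of {sum(counts.values())}")
```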
We also ended up with maintainable code. I no longer looked like that guy with the purple eyes, just wasting my time maintaining code. Now, there are quite a few things that are still work in progress with TSVS. It's not done; there's always room for improvement. Remember the part where we check that the test case you're trying to merge, even though it runs and passes on its own, is not messing up the environment, by inserting it into a set of other test cases? Right now, that set is a random choice: we basically go in and randomly choose five or ten test cases out of our regression suite. What we want to do is choose that set intelligently, identify the test cases that would be most impacted by this test case getting merged in, and run it with those. More intelligent selection of the test cases that run as part of that sequential execution. Also, when we identify a flaky test case, we disable it and log a JIRA defect, but after that defect is resolved, enabling the test case again is a manual process: the tester needs to come in, re-enable the test case, and merge it in. We want to do the reverse integration of that: every time a defect is resolved, we want the test case to get re-enabled by default. We also want to eliminate chunking. While it was good and it works, it also takes up a lot of time: breaking up the suite and refreshing environments between smaller chunks increases the execution time of our run. So we want to eliminate that and only refresh the environment when we have identified a problem that an environment refresh can solve. We want to put in logic that identifies whether the environment needs a refresh, and then refreshes it. And of course, we want to get to the point where we don't need retries to identify whether a test case is flaky or not; any time there is a failure, we want to already know whether it's an error or a failure, and whether it's a failure because of an introduced regression or because the test case is flaky. These are the improvements we want in our service. Coming back to the mantra I started with: it's not automagic, it is automation. It's just like any other software development cycle. It needs testing, it needs maintenance, it needs a lot of attention from you, a lot of intelligent attention, not just to maintain the test code but to be able to rely on it. OK, that's all I had. Questions? Thank you. You guys not only showed up without being threatened, you also clapped at the end, so that's very good. Hi. How long can it take for a pull request like the one we saw to be approved, to run all the tests? So we try to make sure that the testing phase does not exceed 15 minutes. We parallelize some of the modules that need to run and make sure it does not exceed 15 minutes. So the maximum amount of time a pull request waits for this module to finish is 15 minutes. But in reality, a pull request waits much longer than that to get code reviewed. No, we haven't; that's a good point. We don't triage defects automatically yet. Triaging of defects is still a manual process that is done team by team.
However, the defects logged out of this system are already grouped according to the severity of the test case that raises them. So if a P1 defect gets raised, it will get attention. That is the error log analysis library; that is different. Basically, what we are doing is building a library of all the known error codes that the environment can throw while our tests are executing, and every time there is a failure, we capture the logs and compare the error codes against the known error codes we have. No, but normally, if you raise a bug, you might be seeing the same errors repeating in different builds, right? For example, a particular failure like a button not being clicked: you might have been seeing that failure across different features, but the failure would be the same, the function is not working, so the error code on the API side will be the same. In that case, why can't you triage it automatically, since you know the error code is happening again and again? So we do: we add that error code to the library, and if we see that error code again, we know that it either already has a defect associated with it or is an environment issue that the UI test, the client side, really doesn't have control over. So we will mark the test case as an error. Also, adding to your framework, which was very good actually: have you heard about change-based testing? No, I haven't. It's like this: with change-based testing, every time a build is being released (you mentioned wanting to reduce the chunks, right?), the priority test cases are picked up based on the components that changed. Understood. I was not very sure about the term, but we do have something similar in our environment. We don't have it implemented yet; we are trying to implement it, where we basically use tools like a code analyzer to check what files got changed and then pick test cases based on which test cases are affected by those file changes. Actually, we are using that framework with machine learning implemented, so we were able to achieve it very easily. If you want, you can use that kind of approach to reduce these chunks. That's very interesting. Absolutely, very interesting. Let's sync up after, OK? Thank you. What about management on these flaky tests? Because even with everything you are doing, there will still be at least some small percentage of issues from these flaky tests. Sure. I would say the convincing was actually very easy, because of the amount of resources we ended up wasting just analyzing automation. We were at a point where the entirety of our effort towards automation was just throwaway; we had a huge automation suite that nobody really paid any attention to. So it was very easy for us to convince people that we needed to take a deep breath, stop writing new automation, and make sure we had a service, an automated way of maintaining the code we already had. And it was bringing back a much better return on investment for us. OK, so I can understand that if the product is new and you are implementing this from the initial stages, it would be much easier. But how can you make the call for an existing project if you want to bring in these things? See, the biggest problem we are facing is with the test cases that already exist.
Existing projects have 16,000 test cases, and every time you run into these issues, you need to revamp the complete architecture or rework everything that exists. So, actually, this wasn't a new application for us; this was an existing application. You're right, it wasn't always easy to take a step back and revamp all of the 16,000 test cases you already have. What we did was make sure we were implementing a service like this and using it for all test cases going forward. Whatever damage was done in those 16,000 is done; going forward, we are obviously writing more test cases, and we were making sure we were taking care of those new test cases. In the meantime, we were also picking out smaller chunks of the existing test cases, based on priority, and revamping them. So at some point those two lines will merge and you will have a test suite that is stable. Great, and one last question. Sure. In the flow, I didn't see how you are deriving the coverage of the automation, whether everything has been covered. Was it coming from the manual test cases being converted into automation? That particular piece I missed; you may have shown it, but I came in late, so I might have missed that part. Sorry, I'm not sure I get your point. So are you talking about code coverage? Automation coverage. Automation coverage, test coverage with automation coverage. Right, so code coverage is typically measured through unit testing. These were all the end-to-end tests that we had in our environment, the UI functional tests. OK, I will take off. Thank you. Great stuff, first of all. So this framework, are you only using it for the functional tests and Selenium tests, or also for the unit tests? Are the unit tests not following the same kind of process? No, the unit tests are not. The primary hit on stability was for the end-to-end functional tests, especially UI automation tests; something like this is hugely beneficial in that area. Unit tests, even if they are badly written, may not be that unstable; the entirety of a unit test suite is not as prone to instability as UI automation suites are. Any other questions? Thank you. All right, thanks, Vijaya.