So one night I was reading one of my favorite science fiction books, The Hitchhiker's Guide to the Galaxy. As you can imagine, it is a story about aliens and space travel, and in parts of it they use what is, in their words, the fastest spaceship ever built, one that can go anywhere in a fraction of a second. It was driven by the so-called Infinite Improbability Drive. The name came about because it was so impossible to invent that the chances were zero to infinity, and this was how I felt about parallel testing and the lack of tooling. That is how the title came about. Of course, today I'm not going to talk about interstellar travel but about parallel testing, and the main question I'll try to answer is how we can deliver high-quality software faster.

My name is Anton Angelov. I'm the CTO of Automate The Planet. I have several years of experience in the field of test automation, and for six years I worked as a quality assurance architect in two big Bulgarian companies. Part of my job was to design and write scalable test automation solutions. At the same time I consulted for a couple of companies on their test automation and led several trainings.

So here is our game plan. Even if your goal is not to speed up your tests through parallel testing, I think you will find lots of insights about distributed testing and about how to optimize your continuous integration or continuous delivery, and you'll definitely see many examples from our own company.

Now I will quote the State of Testing report for 2017 by SmartBear: 68% of people want to deploy weekly, daily, or multiple times a day. By the end of the presentation you will see how our company made it possible to release our product a couple of times a day.

How do people run their heavy tests? By heavy tests I mean tests that take more than one second to execute. Usually these are database, UI, API, or system integration tests.
They can take from one second to a couple of minutes, but on average we can assume they take around 30 seconds. Across all of its channels, Automate The Planet has over 25,000 followers, so we asked them to participate in a short survey. I will use the data from the survey to illustrate some points; it summarizes the practices of more than 50 companies.

The first question we asked our followers was: how many tests do you have? I summarized the results in five groups based on the count of these heavy tests, and as you can see on the slide, just a second, almost half of the respondents said that they have between 500 and 1,000 heavy tests. So we can conclude that most companies have around 750 heavy tests on average. If you make a simple calculation based on this count and the 30 seconds per test (750 times 30 seconds is 22,500 seconds), you need more than six hours to execute all of these tests sequentially on a single machine, which means you cannot use them for continuous integration or continuous delivery. Why is that? Because if you want to deliver your product multiple times a day, you need to be able to execute all of these tests at least two to three times a day, or in under three hours. Why is that? Because sometimes there is a bug in the tests, or some random environmental issue occurs during the run, and you need to repeat the whole process again and again.

Our second question was: when do you execute your tests? Their answers were identical to what I read in the State of Testing report. As you can see, 40% of them already execute their tests in continuous integration, but if we look closely at the numbers, we can conclude that 60% of people struggle with the speed of their tests. There were even some edge cases in the answers: some people execute their tests only during the weekends, only at night, or even monthly.

So I can define three types of parallel testing.
The first one is when you rely on your unit testing framework. For example, if you have 100 tests and you execute them on a single machine with five CPU cores, 20 tests will be executed on each core. This is what Dan was showing in the last talk. The second option is distributed testing: running your tests simultaneously on multiple machines. To do that you need complex tooling such as Microsoft test agents. The third is to mix both approaches, running the tests in parallel on multiple machines while also utilizing the power of your unit testing framework.

In our observation, the optimal number of simultaneous test processes is 1.5 to 2 times the number of CPU cores, which means that if your virtual machine has two CPU cores, the optimal number is 4. In my experience, in the companies where I worked in the past, most of the virtual machines we used had two CPU cores, so they were not very powerful.

One big machine or many smaller ones? To answer this question I need to explain the difference between horizontal and vertical scaling. Horizontal scaling means adding more machines to an existing pool of resources, whereas vertical scaling means adding more power to an existing machine. Usually many smaller machines are cheaper than one big one.

What are the advantages of distributed testing? The most obvious one is speed: instead of executing for 20 hours, your tests execute for three. Then, since the tests execute faster, you can release your product more often. Even if your goal is not to release daily, you can optimize your continuous integration process by including and executing all of your tests in it.
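
The estimates quoted so far (750 tests at roughly 30 seconds each, and 1.5 to 2 test processes per CPU core) can be checked with a few lines of Python. This is only a back-of-the-envelope sketch and not part of the talk: the function names are made up for illustration, and the best-case figure assumes the tests are spread perfectly evenly with no startup or network overhead, which a real run will not achieve.

```python
import math


def sequential_hours(test_count, avg_seconds):
    """Hours needed to run every test one after another on one machine."""
    return test_count * avg_seconds / 3600


def process_range(cpu_cores):
    """Rule of thumb from the talk: 1.5x to 2x the number of CPU cores."""
    return math.floor(1.5 * cpu_cores), 2 * cpu_cores


def ideal_parallel_hours(test_count, avg_seconds, machines, processes_per_machine):
    """Best-case duration: perfectly even split, zero overhead (an assumption)."""
    slots = machines * processes_per_machine
    return sequential_hours(test_count, avg_seconds) / slots


if __name__ == "__main__":
    print(sequential_hours(750, 30))                # 6.25 hours on one machine
    print(process_range(2))                         # (3, 4) processes for 2 cores
    print(ideal_parallel_hours(750, 30, 4, 1))      # 1.5625 hours on 4 machines
```

The same functions can be pointed at other numbers from the talk, for example `process_range(8)` for an eight-core machine, to see where the speaker's thread counts come from.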
This way you'll have higher coverage in a shorter throughput time. As you know, the more often you execute all of your tests, the more their return on investment increases, and the more easily you will be able to catch the flaky, poorly written tests.

To understand why and how we solved the problem, you need to understand what we needed to test. Our company specializes in building test frameworks and tools for testing web, mobile, API, and desktop. It is not a visual application; it is a set of code libraries. For example, for the web part we use WebDriver under the hood and add some features on top of it. So we had to test that: we had to test the different browsers, all of the integrations with cloud providers, and so on. In the end, for the web part alone, we ended up with 4,500 UI tests, and I'm not counting all of the unit tests that we have. If you run all of these tests sequentially on a single machine, you need more than 16 hours, as you can see here; this is an image from our Visual Studio Team Services build. This was unacceptable, since we wanted to be able to release our product, to ship updates, multiple times every day.

So we created Meissa. What is Meissa? It is a free, open-source distributed test runner. It is written with the latest .NET technologies such as .NET Core, ASP.NET Core, and many more. It is cross-platform, meaning that you can run tests on Windows, macOS, and Linux, and it is designed to be programming-language agnostic, which means that you can run tests written in different technologies such as C#, Java, Python, and so on.

Some of its most prominent features: it can distribute your tests across multiple machines and even run them in parallel on each machine without modifying the source code. It can smartly balance the tests across the remote machines based on the previous execution times. Then you can retry the failing tests a couple of times to see whether there is a bug in the tests or a real problem in your system
under test, and so on. This is really handy. It is delivered as a single command-line interface, only a zip: you unzip it, and you don't even need to install Java, .NET, or anything else. We have some extensions, for example for UI testing: for cleaning up the previous WebDriver sessions, killing browsers, and so on, so that the state of the test agents is kept clean. And as mentioned, it is cross-platform.

So before Meissa we had 4,500 tests, and they ran for 16 hours. After that we created four virtual machines with two CPU cores each, and we decided to execute one test at a time on each machine, so no parallel testing on each machine. The tests were executed, as you can expect, four times faster, in a little over four hours.

Before showing you a short demo, I want to show you a bit of how we use it. First we need to start Meissa in server mode; the server is the thing that controls all agents and runners. Then on each machine we have an agent; the more agents you have, the faster your tests will be executed. We use the keyword test agent, we specify an agent tag, which you can later use to filter on which machines your tests will be executed, and we give the IP address of the server. The last part is where you execute your tests. Usually this is started from a continuous integration job, as you'll see in a minute. We use the keyword runner and pass a couple of other parameters, such as where your test files are, where the test results will be saved, and so on.

Now it's time for the demo. I made a video. Should I move it? It is shifted here. So I created a simple web page, written with Bootstrap, that we will write a test against. As you can see, with all of the fields, it is like a simulation of a checkout page, nothing fancy. One of these tests, written with WebDriver, takes three to five seconds depending on the machine where it is executed. This is what our test will do, and now I'll show you one of our WebDriver tests. Here it is. So
first we create the HTML file in the test initialize method. This part is really important: it is an attribute in MSTest that we will use later to show you a comparison between Meissa and the native test runner, where our tests will be executed simultaneously in 16 threads. Here we take a new instance of WebDriver every time. We get a free port using this method here, which is thread-safe, so each test has a unique port and a unique driver, and we start Chrome in headless mode. Next we go to the URL, we wait for the first element, the first name field, and then we just start to fill in all of the fields. That's it. Finally we quit the driver.

This is one of our tests. My goal for the comparison was to have something like 1,000 tests, so, as you can see here in the other project, I created a simple console application for generating all of them. This is the application that we use to generate the tests. It is nothing fancy: it just writes all of this text to a text file, that's it. The tests are located in the test project below, here in the checkout page tests, and as you will see, the test file is quite enormous, over 40,000 lines.

The goal now is to show you the difference between running the tests distributed and in parallel and running them with the native test runner of MSTest. So I created 11 virtual machines in Azure. On one of them I will start the tests in 16 threads, executing them with MSTest, and on the other 10 I'm using Meissa to execute them.

Here is how we use Meissa in our continuous integration job to test Bellatrix. This is the typical workflow. First we get the source code. Then we restore the .NET packages, which just means downloading third-party libraries. Then we build the project, all of the projects actually, because we have 90 of them. Then we copy the build output files to a special folder, delete the old test results files, and at the end run and execute the tests with Meissa, as
you can see, it is downloaded here in this folder. As you can see, there are many parameters here because we use all of the features of Meissa, but the most important ones are the runner keyword, the server, where we need to give the IP and the port, and the path to the DLL, the file where our tests are. At the end, when the test results are produced, we publish them to Visual Studio Team Services.

Here is how the results look in Visual Studio Team Services. These are our continuous integration tests, which execute quite fast: as you can see, they run for less than three minutes using Meissa, and most of them execute in less than a second, which is quite fast for UI tests, actually.

And this is our demo continuous integration job. We start six test threads on each machine. We have ten machines, so multiplied by six this means that with Meissa we will start 60 simultaneous test processes.

This is the machine where our Meissa server is started. How did we start it? On each machine I have already downloaded and unzipped the Meissa binaries. So first we go to the folder where Meissa is downloaded, then we open the command line, and the only thing we need to type is Meissa.exe init server. That's it; this will start Meissa in server mode. Then, once the server is started, we need to start Meissa in agent mode on all the other machines. How do we do that? I have already done it, but I will show you. This is how Meissa looks when the agent is started. I have even created a batch file to make it easier, so I don't have to type this every time. We just navigate to the folder again and write test agent, the tag, and the address where the server is. That's it.

So in the example we use our continuous integration job to execute the tests in 60 threads on these ten virtual machines. We start the job, and on the eleventh machine, as you will see in the demo, we start the same project but using
the native test runner to execute the tests in 16 test processes, which, if you recall the formula, is not the optimal number for a machine with two CPU cores. All of these machines have two CPU cores, which means the optimal number here should be four. If you execute all of these tests sequentially on a single machine, you need more than one hour, around 70 minutes. I tried executing them with the parallel runner with the optimal number of four, and they took more than half an hour. But anyway, I want to make the comparison and show you that even if you increase the test thread count, the test time won't improve, and there will even be some problems, as you will see here.

What I'm showing you is that in the file, which does not look quite okay in Notepad, we have this attribute that says we will start the tests in 16 threads. I made the video a little shorter so that you can see the results and not wait 40 minutes for the native runner to finish. Here, as you can see, is the build with Meissa: the tests executed in six minutes, which means more than 10 times faster than running them sequentially. And this is how the native test runner looks after 40 minutes: it completely crashed. As I told you, up to six or seven threads it's fine. Maybe this is still a problem in MSTest, I don't know, but first Chrome started to crash, and then the runner, as you can see here, threw a System.OutOfMemoryException because of all the threads. Anyway, up to four threads it's safe, but after that I don't know. So usually when we execute our tests, we just distribute them across many machines; that is our way.

So let's continue. In order to solve the problem I read many things for a couple of days, because I wanted to catch up and not rely only on my previous experience from the companies where I worked in the past. The funny thing is that for
eight years the ways people execute their tests in parallel have not really changed. In addition to that, we asked our followers: how do you run your tests in parallel? As you will see, their answers were almost identical to what I read. Some of them, a small number, use Microsoft test agents. Others prefer custom tools or continuous integration tools, as you maybe saw yesterday, to distribute their tests. However, the majority rely on their unit testing framework, such as TestNG or JUnit for Java, in combination with Selenium Grid if we are talking about UI tests. Of course, as you saw, not all unit testing frameworks are excellent at running your tests in parallel. For example, MSTest didn't support it until a couple of months ago, when they released version 2 with the parallel option, and as you saw, it still has some problems. However, if your unit testing framework behaves well, you can use it side by side with Meissa, and you will be able to distribute your tests across remote machines and use some of the other cool features, like retrying the failing tests.

Now I will compare Meissa with some other popular solutions. One of my favorites is Selenoid; yesterday we had a presentation about it. It is like, I don't know, Selenium Grid on steroids. It gives you clean installations of popular browsers in Docker containers. It is excellent for running your tests in Opera, Chrome, and Firefox, and now, I had to add a bullet here, as we saw yesterday it even supports Appium and mobile testing. How does it differ from Meissa? Well, it is not a test runner, so if you want to run your tests simultaneously, you still need a way to run them in parallel. The same is valid for a standard Selenium Grid and its tuned versions, such as Selenograph, JSON Wire Grid, and Grid Router. You can run tests in parallel with them, but they still need a runner: if your runner doesn't support parallel testing
execution, this means that even if you have 100 hub nodes, your tests will still be executed on only one of them.

Some people, as I told you, because of the lack of tooling, use continuous integration tools to distribute their tests. The usual practice is to mark your tests with attributes or annotations such as machine one, machine two, and then use separate runs to execute these tests across the machines. For me this is a bad practice, because the balancing of the tests is manual: every time you add or remove tests, you need to recalculate the time in your head. And there is one more problem: since you have multiple runs, at the end you need to find a way to merge the test results created by all of these simultaneous runs.

I love Microsoft tools, and actually for a couple of years in my past companies we used Microsoft test agents to execute all of our tests. However, they have a couple of drawbacks. They are hard to set up and troubleshoot. The documentation is missing almost entirely. How the tests are distributed and balanced across the machines is a black box. And of course they run only on Windows.

As you saw from the example, Meissa has many moving parts. Now I'll tell you how it does its magic. First we have the server. The first time we start the server, a portable SQLite database is created and a web service is self-hosted on that machine. We use the server to coordinate the work between the agents and the runners, so they don't communicate with each other directly; they communicate through the server using HTTP, and they don't have separate databases. In this database we save things like logs, messages sent from the agents, test execution times, and so on. This means that if the server is down, the whole thing won't work.

The first time you start an agent, it registers itself as active. Then it continuously checks whether there are new test runs scheduled to be executed on that machine. Then the so-called extensibility
point plugins are loaded; they offer you a way to execute logic at various points of the test execution pipeline, such as before the run, after the run, on abort, and so on. After that we load the so-called test technology plugins, which help you to execute your tests, to merge the test results, to read the tests from the DLLs, and so on, so they are really important. After that, based on the parallel options, the agent creates multiple test batches, starts all of the processes, and waits for them to finish. Then it merges all of the results and sends them back to the server.

When you start the runner, it again first loads all of the extensions, the extensibility point plugins and the test technology plugins. Then it filters all of the tests that it loaded from your files and distributes them across all of the agents. Then it zips the test output files and sends them to the server, where each agent can download them. After that comes the second part, where it waits for all agents to finish their execution. In one more process it continuously checks whether new messages have been sent by the agents, to print them on the console; this is useful when you want to see what is happening on the agents in your continuous integration job. In one more process it continuously checks the health of the agents, so if one of the agents is down, the whole test run is aborted. And in one more process it verifies its own health, so if you, for example, I don't know, stop the continuous integration job while the runner is running, one of the agents will notice that the runner is down and the test run will be aborted.

At the end the runner merges the results from all test agents. Based on the retry failing tests option, if it is turned on and there are failed tests, the whole process is repeated for them, and if they succeed, the test results file is updated. At the end we again load the so-called extensibility point plugins and execute logic from
them.

How to get started? First you can go to our website, download the zip, unzip it, and, as you saw, start executing. For a free tool we have really great documentation, but more importantly you will find a link to our GitHub page, where you can report bugs and suggest new features. We are looking for people to help us write the extensions for different technologies such as Java, Python, and Ruby. Even if you are not comfortable writing one yourself, you can just send me a simple test project and tell me how the tests are executed with your runner, and I will write the plugin in half a day. You can go to GitHub, we have a channel there, and you can write to me.

So in summary, we talked about parallel testing: the different types, the various benefits, and the different ways people execute their tests in parallel. I showed you the project, a short demo, and why we built it. I wanted to end with some kind of statistics, but I decided to go even further. So I again used a tool similar to the one I showed you, the command-line one that generated the tests, but this time I generated 100,000 tests, each of which executes for one second, which means that if you execute them sequentially on a single machine they need almost 28 hours to finish. The next goal was to create 10 virtual machines in Azure again, but this time, instead of two CPU cores, they had eight CPU cores and 14 gigabytes of RAM. If you recall the formula, the goal here was to start Meissa in agent mode on each machine, but this time with 16 threads per machine, which means that for 10 machines we have 160 simultaneous test threads. So, 100,000 tests: this is the time if you execute them sequentially, and with Meissa the tests were executed in less than 20 minutes using this setup, as you can see here. This is our Visual Studio Team Services build for that.

I will end with a quote from the CEO of the most famous programming
website, Stack Overflow. I will summarize it: to be great QAs, we need to improve the happiness of our fellow developers by giving them frequent releases, and this way giving them negative and positive feedback faster, and this way improving the quality of our product. Thank you.

It helps to turn the microphone on. Hello. Can you go back to your biggest-experiment slide and talk about which browsers were actually initialized on those nodes? The biggest-experiment slide, where you ran the tests and showed some stats, a couple of slides back. Can you talk about which browsers you executed, some of the test result reports, or anything like that?

Which one? This one?

Yeah, yes.

Okay. Here there were no browsers; that is why I made the demo with browsers. This is only a test like, for example, a database test, so these are actually unit tests. In the second example I wanted to show you that this is not a tool only for UI tests: you can run API tests, database tests, system integration tests, it doesn't matter. That is why the first demo was with browsers. In the first demo we had Chrome in headless mode, but you can also run it in non-headless mode; it works as well.

Hi, and we have a question here in front. Okay, sorry.

Hi Anton. Hi, this is Vimal. The real problem with Selenium Grid is the single point of failure: if the hub goes for a toss, our entire test execution also goes for a toss, right? So when you talked about balancing, is it something like a load-balancing structure, where you bring multiple servers, not the test agents but the servers, under the balancing structure, so that even if one of the servers goes for a toss we have others to support us?

No, actually. The balancing feature is about balancing the tests. For example, Microsoft test agents usually distribute the tests based on the test count: if you have five nodes and 100 tests, they will just split the tests 20 by 20
by 20, you know. But if, for example, your first 20 tests take much longer than the other batches, your whole test execution will depend on the execution on this one agent. So our balancing works like this: we execute all of the tests once, then we record all of the test execution times, and then we split the tests not based on the count but into equal batches of time. But it is not like a load balancer, where we would start a new instance of Meissa or something like that.

Question back here. Hi, this is very, very interesting stuff, and I think it's great that you can execute so many test cases. But my question is: when you have so many test cases executing on so many machines and you have a bug, how easy is it to detect this bug, to identify how it happens, to get that feedback and quickly fix it, and then quickly release twice a day?

You know, usually we just keep a green build in our company. Right now we have over 7,000 tests, and I'm not counting the unit tests, and it is really rare for more than 10 tests to fail per build. Since the framework is stable, I don't need to check the results all the time.

I think the question is: when an actual bug is introduced into the product and your test catches it, how easy is it to isolate and analyze that failure?

It depends on what framework you use, you know. For example, in our product, the testware that we run, I strive to have meaningful exception messages when tests fail. But this is a product-specific thing; we strove to do the same in the companies where I worked in the past.

Meissa includes the reporting, though, correct?

Yeah, it's not so much about the runner itself.

Okay. We've got about five minutes, so a couple more.

So my question is: is the .NET Framework required for Meissa?

Sorry?

Without the .NET Framework, can we run Meissa? You don't need to install .NET?

Not required. No, it's not required, because with .NET Core, on which we build
the tool, when you create the binaries they can be built in a special mode, where .NET itself is included in the binaries, so you don't need to install it.

What are all the languages supported by this?

Right now, since our company designs frameworks for .NET, it primarily runs tests for .NET Core, .NET Framework, and all of the frameworks around them. But a guy is now helping us write the plugins for Java, and as I told you, if you need to run tests written in a different technology, just drop me a line and I will write the plugin in half a day.

In continuation of the previous question about isolating the bug: we capture images and do video recording during execution. The major issue when we do parallel execution is the consolidation of the logs. It loses the images, basically, because the image size is huge and we can't consolidate them in a single page, and transferring from server to server again consumes time. So how is that scenario handled in Meissa? If my scripts capture the video or the images, how does the consolidation happen? The second major problem is that when you trigger the browser in headless mode, most browsers fail to capture the image; the images come out blank. So is it right to trigger the browser in headless mode?

Okay. Actually I was going to talk about that, but I skipped it because I wanted to make a real demo. About the video recording: this is the reason why in our company we don't use the parallel option. We just distribute the tests and run them in single mode on each machine. We have integrated the video recording in our framework, and if, for example, you start two browsers, you cannot see the second one, you know. This is why we use many machines instead of running the tests in parallel on each machine. That is the first question. As for the second one, the way we implemented the screenshots, we don't use the native WebDriver
screenshot feature because of this problem, and not only that: some of the WebDriver implementations just take a screenshot of the visible part of the desktop. So we use a JavaScript library and execute JavaScript that, for example, takes the DOM and creates an image from it. We create full-page screenshots based on JavaScript, and this way it doesn't matter whether the browser is in headless mode or not.

Okay, maybe one more. All right, one more.

So what is the fault tolerance design for either the agents or the server that you have?

Sorry?

Fault tolerance, in the sense that if something goes down, you are obviously still able to grab all the results and everything.

I cannot hear the question.

So essentially the question is: we have n agents running, and there is one particular server that is essentially distributing the work, right? What happens if one of the agents goes down? Is the main server being updated, or is there some kind of monitoring happening on those agents? The reason I'm asking is that if something of that nature happens, it might go into an infinite wait, right? One of the machines that was given work is not able to complete it.

Can you repeat?

I think I got it. If you were running 20 tests on an agent, test number four goes and somehow the agent dies, what happens to the rest of the tests and how do you