A very good afternoon to everyone who is present here and to those who are watching us on YouTube. Hey mom and dad, this will make them a little happy. In today's breakout session, I will be talking about an advancement in the field of browser automation testing, something that has been a part of my work for the past year. Before we begin, I would like to give you a short introduction about myself. My name is Thamsil Sajid Amani and I work as a software engineer at BrowserStack. The main project I have worked on is Selenium WebDriver, and some of my notable work includes implementing the Virtual Authenticator and WebDriver BiDi in Selenium. I am also a member of Selenium's Committers team, which means that I have access to review and merge the community's contributions.

So let's start with the big question. If you have ever worked in browser automation testing, or if you have met someone who does, it is very likely that there came a point where the tests were passing in some browsers and failing in others, or where the tests were not able to handle events in real time. This is a very common problem which many of us face, and in today's talk we will try to find a solution to it. I will take you through the history of browser automation testing, then we will check out the present tools and frameworks which are available to us, and lastly we will take a leap into the future and see what WebDriver BiDi has got to offer us.

The web came along in 1993, all right? But it wasn't until 2004 that people thought about automation testing, and with that came the Selenium project. Then in 2006, a person named Simon Stewart came up with a different approach to automation testing, and it was called WebDriver.
These two projects merged in 2011, and the result was called Selenium WebDriver. Due to its popularity, Selenium WebDriver was adopted in 2018 as a standard by the W3C, that is the World Wide Web Consortium, and it came to be known as the WebDriver Classic protocol.

While this advancement was happening, other advancements were taking place simultaneously. In 2009 we got Node.js, and with that there was an increase in the demand for automation tools based on Node.js. With time, we got many automation tools such as WebdriverIO, Appium, Nightwatch, Cypress, TestCafe and Puppeteer. These testing tools can be categorized into three sets based on the underlying technology they use. Tools such as WebdriverIO, Appium and Nightwatch use the WebDriver Classic protocol, which I just told you about. Tools such as Cypress and TestCafe use Web APIs and Node.js to do the same thing, and Puppeteer uses the Chrome DevTools Protocol.

When we talk about browser automation, there are two approaches: one is high level and the other is low level. In the high-level approach, the automation tool injects JavaScript into the browser and performs the tests from that script. Because of this, the automation tool is bound by the JavaScript sandbox. This approach has a drawback. What is the drawback? Security. We know that browsers are very picky and very cautious, so they do not trust any event which comes from a script. Hence, if you want to perform certain complex actions during your testing, like opening a new tab or a new window, or testing within iframes, the browser will not let you do that, because it does not trust you enough and it won't let you move out of the sandbox. So how are you going to overcome this problem?
For that, we come to the low-level approach. In this approach, we execute remote commands from outside the browser. This adds the events directly to the event loop of the browser, and the browser allows it. Why? Because the remote commands are sent from outside the browser, the browser trusts them: it treats these events as if they were generated by a user. But now you will ask, how does the browser differentiate which events it should trust and which it should not? For that, every event has a property, documented by the developers at Mozilla, called isTrusted. The isTrusted property is true when the event is generated by a user action and false when it is created by a script. By reading this property, the browser knows which events to trust.

As I said earlier, Cypress and TestCafe leverage Web APIs and Node.js to do automation testing with the high-level approach. In the low-level approach, we have two subsets: one set of tools which use the WebDriver Classic protocol, and another set which uses CDP, the Chrome DevTools Protocol. I will turn your focus to these two protocols, as they might help us find a solution to the original big problem which I asked you about.

So how does WebDriver Classic work? We have our automation tools and we have our browser drivers. The automation tools issue commands to the browser drivers via HTTP requests. This is an important point of focus: the connection between them is through HTTP requests. The browser drivers in turn control the browsers. And what are these browser drivers? They are binaries which the browser vendors maintain for their respective browsers. For example, Google maintains ChromeDriver for Google Chrome, Mozilla maintains GeckoDriver for Mozilla Firefox, and so on.
Okay, time to see WebDriver Classic in action. In this demo, we will navigate to the page bstackdemo.com and click the add to cart button for our product, the iPhone 12. Let's see how we can achieve this with Selenium. So we navigate to our website and click the add to cart button for the iPhone 12.

The Selenium code for the previous demo is shown here. In the first block, I am setting up my browser. First I set some options, such as the window size I want when the browser fires up. Then I create an instance of the driver, which will use the Firefox browser along with the options I have just created. Next, I perform a set of three actions. First, I go to my URL using the driver.get function. Second, I find my add to cart button. In the first line, where iPhone12 is assigned the result of driver.findElement by ID, I am finding the element using its ID, 1. That element is the product card, which contains several child elements. In the next line, searching within that element, I find my add to cart button by its class name, shelf-item__buy-btn. And in the third step, using the click function, I click my add to cart button. So this is the simple Selenium script for the previous demo.

These actions can be translated into HTTP requests, as shown here. To navigate to the page, we use a POST request carrying the URL, and we get a successful reply. Second, we find our iPhone 12: again using a POST request, passing the CSS selector strategy and the ID value 1, we find the iPhone 12 and get a successful reply. Third, we click the element, again using a POST request with the element ID, and we get a successful response.
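The three HTTP requests just described can be sketched roughly as follows. This is a minimal illustration that only builds the requests and does not send them. The endpoint paths follow the W3C WebDriver specification, while the function name `classic_requests`, the placeholder session and element IDs, the `https://` scheme on the URL, and the attribute selector used to match ID 1 are assumptions made for the example.

```python
import json
from typing import Any, Dict, List, Tuple

# (HTTP method, path, JSON body) for one WebDriver Classic command.
Request = Tuple[str, str, Dict[str, Any]]


def classic_requests(session_id: str, element_id: str) -> List[Request]:
    """Build the three requests from the demo: navigate, find, click.

    session_id and element_id are placeholders; a real driver returns them
    in the responses to "new session" and "find element".
    """
    base = f"/session/{session_id}"
    return [
        # 1. Navigate to the page.
        ("POST", f"{base}/url", {"url": "https://bstackdemo.com"}),
        # 2. Find the product by ID 1. An attribute selector is used here
        #    because "#1" is not a valid CSS ID selector (assumption).
        ("POST", f"{base}/element", {"using": "css selector", "value": '[id="1"]'}),
        # 3. Click the element returned by the previous request.
        ("POST", f"{base}/element/{element_id}/click", {}),
    ]


if __name__ == "__main__":
    for method, path, body in classic_requests("SESSION_ID", "ELEMENT_ID"):
        print(method, path, json.dumps(body))
```

Each request is answered by exactly one response, and that one-to-one pattern is the property the rest of the talk contrasts with event-driven protocols.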
Now this was the best-case scenario. But what if I tell you that the iPhone 12 has not loaded yet, because there may be some prerequisites or background checks still to complete? What will happen when you send this HTTP request? You will not get a positive response. So this is a problem, and we have to see what solution HTTP gives us to counter it. HTTP gives us polling. In this case we establish a session normally. Then, to find the product, we send a request: is our iPhone 12 loaded? No, we get a negative response. Is our iPhone 12 loaded? No, we get a negative response. Doing this is not very practical in today's modern web, that is Web 3.0. In earlier times, when the web was 2.0, this solution would have sufficed, but not today. We need better solutions, and HTTP polling does not cater to that.

So let's see the advantages and disadvantages of the WebDriver Classic protocol. The pros: it has the best cross-browser support. Why? Because all the browser vendors have implemented it for their respective browsers. It is also a W3C standard, as I told you. It is built for testing, and it is built for scalability. By scalability, I mean that the Selenium project has a component called Selenium Grid, and you can use Selenium Grid with WebDriver Classic to run tests in parallel, so you can run a single test on multiple browsers at the same time.

The first disadvantage is that it is not event driven. What I mean is that in our previous case, where the iPhone 12 was still loading, there will come a point when it has loaded fully and generates an event. But there is no way to send that event back to our automation tool. Why? Because HTTP is request-response: the browser cannot push a message to the client on its own. The second disadvantage is that it provides only high-level control. What is this high-level control? I will tell you as we progress.
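The polling pattern described above can be sketched as a small loop. This is an illustration only: `poll_until` and `make_slow_element` are hypothetical names, and the "element" is simulated in memory rather than fetched from a real browser over HTTP.

```python
import time
from typing import Callable, Optional, Tuple


def poll_until(
    find: Callable[[], Optional[str]],
    timeout: float = 2.0,
    interval: float = 0.01,
) -> Tuple[Optional[str], int]:
    """Call find() repeatedly until it returns a value or the timeout passes.

    Returns the result (or None on timeout) and the number of attempts made.
    Each attempt stands in for one HTTP round trip to the browser driver.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        result = find()
        if result is not None:
            return result, attempts
        if time.monotonic() >= deadline:
            return None, attempts
        time.sleep(interval)


def make_slow_element(ready_after: int) -> Callable[[], Optional[str]]:
    """Simulate an element that is missing for the first ready_after lookups."""
    calls = 0

    def find() -> Optional[str]:
        nonlocal calls
        calls += 1
        return "element-1" if calls > ready_after else None

    return find


if __name__ == "__main__":
    element, attempts = poll_until(make_slow_element(ready_after=3))
    print(element, attempts)  # element-1 4
```

The client pays for every negative answer with a full request, and it still learns about the element only on its next poll, not at the moment the element appears.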
Our next protocol is the Chrome DevTools Protocol, CDP in short. As the name suggests, this DevTools protocol is designed especially for Chromium-based browsers, and some tools such as Puppeteer use it for browser automation testing. With CDP, the automation tools do not need drivers to communicate with the browsers; they communicate with the browsers directly. The automation tools issue commands through CDP, and the connection is made over a WebSocket. So the focus point here is WebSocket, whereas earlier, in WebDriver Classic, we had communication through HTTP requests.

Now take the same previous example from the CDP point of view. We have the following set of commands, which the automation tool sends directly to the browser. To navigate to the page, we have the command Page.navigate with the URL. To find the iPhone 12, we have the command Runtime.evaluate, passing an expression that looks up the element with ID 1. And finally, to click the add to cart button, we perform two commands: one to press the mouse button and the other to release it.

The advantage here shows up with the previous problem, where our element, the iPhone 12, was not loaded. After some time, when it has loaded and the event is generated, that event is able to go back to the automation tool. Why? Because in this case the connection is through WebSockets, and we know that WebSockets are bidirectional. So in short, we can say that CDP is event driven and bidirectional.

Now I will compare CDP with WebDriver Classic on one point, the level of control. What is this level of control? Okay, imagine that while building a website and testing it, there are two aspects. The first one is the developer's aspect.
A developer wants to know in depth what is going on with his website: what console logs or errors the website is generating, or what network requests are being sent. So some browsers, such as Chrome, have built developer tools to help developers debug their work. CDP, which gets its name from those developer tools, keeps its focus mainly on developers and helps them with debugging.

Now we come to the second aspect of building and testing a website, this time from the user's perspective. What we want is that when a user goes to a website, he is able to click a button properly or enter text in a text field properly. He does not want to know the underlying functionality: he does not need to know what console logs are being generated or what network requests are being sent. WebDriver Classic was designed for this case. It was designed to emulate the user first, not the developer. But as we have seen, the web has moved on, developers have advanced, and now they need more out of their testing.

So in short, the pros and cons of CDP are these. The pros: it is event driven, it has bidirectional messaging because of WebSockets, and it provides low-level control for developers. The cons: it is only supported by Chromium-based browsers, and it is not a W3C standard, hence it changes with every version of Chrome. So if your tests use Puppeteer, you need to make sure that you are using a version of Chrome which is compatible with that specific version of Puppeteer. And lastly, as we have seen, CDP is a bit complex. It is built for debugging and not for testing.

So at this point you have this information: the advantages and disadvantages of CDP and of WebDriver Classic.
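For reference, the CDP commands from the add to cart example can be sketched as the JSON messages a client would write to the WebSocket. This is a rough illustration that only builds the messages. `Page.navigate`, `Runtime.evaluate` and `Input.dispatchMouseEvent` are real CDP methods, while the helper names, the click coordinates and the lookup expression are assumptions made for the example.

```python
import json
from itertools import count
from typing import Any, Dict, List


def cdp_commands(url: str, x: float, y: float) -> List[Dict[str, Any]]:
    """Build the four CDP commands from the demo as JSON-serializable dicts.

    x and y are hypothetical viewport coordinates of the add to cart button.
    """
    ids = count(1)  # every command carries a unique, client-chosen id

    def command(method: str, params: Dict[str, Any]) -> Dict[str, Any]:
        return {"id": next(ids), "method": method, "params": params}

    mouse = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        command("Page.navigate", {"url": url}),
        command("Runtime.evaluate", {"expression": "document.getElementById('1')"}),
        # A click is a press followed by a release.
        command("Input.dispatchMouseEvent", {"type": "mousePressed", **mouse}),
        command("Input.dispatchMouseEvent", {"type": "mouseReleased", **mouse}),
    ]


def is_event(message: Dict[str, Any]) -> bool:
    """Responses echo the command id; events pushed by the browser have none."""
    return "id" not in message and "method" in message


if __name__ == "__main__":
    for message in cdp_commands("https://bstackdemo.com", 100, 200):
        print(json.dumps(message))
```

Because the browser can send a message without an `id` at any time over the same socket, the client can be told about a page load or a log entry without asking for it first.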
Can you think of a solution to our original big problem? Anyone? Okay. Fine. What if I tell you that I take the best of both worlds? I take the advantages of CDP and the advantages of WebDriver Classic, and I merge them into a single protocol, which I will call WebDriver BiDi. WebDriver BiDi works in such a way that our automation tool can talk to any browser and any driver. The commands are issued through the WebDriver BiDi protocol, and the connection is made via WebSockets.

WebDriver BiDi is still a work in progress. It was started in 2020 by the W3C Browser Testing and Tools Working Group. This work is not being done by a single party. It is a collaboration between many partners, which include browser vendors, open source browser automation projects, and companies which offer browser automation solutions. For the browser vendors, we have Google Chrome, Mozilla Firefox, Microsoft Edge and Apple Safari. The automation projects include Selenium, Nightwatch and WebdriverIO, and the companies offering browser automation solutions include BrowserStack and Sauce Labs. All these stakeholders work together in harmony to bring out a simple and unified WebDriver solution for testers that is easy to implement in their respective tools.

So, question time: what does it take to implement the new protocol? There are three stages. The first is specification, the second is verification, and the third is implementation. For specification, the W3C has a dedicated web page for WebDriver BiDi, where it gives the definitions of the modules, commands, events and errors for any tool or party which wants to check out or implement WebDriver BiDi in their project. For verification, we have this GitHub repository of Web Platform Tests.
These tests are written in Python, and any partner or tool which wants to implement BiDi has to pass them in order to say that it has successfully implemented WebDriver BiDi. The implementation progress of WebDriver BiDi for the respective browsers can be checked on this dashboard. To give you a fair idea: for example, the browsing context module is passing 444 tests out of 482 in the Chrome browser, and the network module is passing only 2 tests out of 16 in Edge on Windows. So this dashboard gives you an overview of the current progress of WebDriver BiDi's implementation in the browsers. You can check it out using the link.

So in short, I can say that the journey from WebDriver Classic to WebDriver BiDi is not in the future. It is happening now, as there are tools which have given initial support for WebDriver BiDi, for example Selenium, WebdriverIO and Puppeteer. You can check out the links on how to use WebDriver BiDi in the respective tools.

Okay. Best part: time for the demo. In this demo, I will navigate to a page which I have created locally. I will click on a button to raise a JavaScript exception, and then I will try to catch the console error that comes up. Doing this manually, let's see what happens. When I click the button, I get a JavaScript exception, and the text says "Error: Not working".

Let's try to catch this console error using Selenium with WebDriver BiDi. In the first step, I am initializing an inspector using the LogInspector module. The LogInspector module in Selenium helps us monitor the console logs and error messages which are generated in real time. In the next step, using the inspector, I listen for all the JavaScript exceptions that are raised. Once an exception comes along, it is passed to the callback function. And in the callback function, I am asserting for equality.
That is, I am checking the text and type of the error which I got. If you remember, earlier we got the text as "Error: Not working", and the type of the error was javascript. In the third step, as we did before in WebDriver Classic, I use driver.get to go to my page. Using findElement, I find the button, and using the click function, I click the button. Let's run this test and see the results. Oh, our test passed. But there was an error which was generated, right? What we did was catch that error in real time and assert for equality. This was not possible in WebDriver Classic. If you ran this test using WebDriver Classic, the exception would have been raised and the test would have failed, saying that there is an uncaught exception, and you can't catch it in WebDriver Classic because HTTP does not let the browser push that event to you.

Another important thing which we can do with WebDriver BiDi is network interception. So what is network interception? Let me tell you. For example, you want to know the status of your pizza order. Your automation tool sends a GET request to the URL to find out the status. But at the browser end, you can tell the browser: hey Firefox, listen, if you get an HTTP request matching a certain pattern, in this case a request with the method GET and the URL defined here, then don't wait for the actual response, but instead send a mocked response. So you define the mocked response which will be sent. And I'm sorry, you are not getting any dinner tonight. Network interception is useful when your tests require a response which has many prerequisites, or which depends on many background tasks that are irrelevant to your test. Instead, you can send a mocked response and your test can resume.

Now this is my personal favorite: accessibility.
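Going back to the log demo for a moment, what happens at the protocol level can be sketched roughly like this. The sketch does not use Selenium or a real browser: `LogListener` and `subscribe_command` are hypothetical names that only imitate the idea of the log inspector. The `session.subscribe` command and the `log.entryAdded` event come from the WebDriver BiDi specification, and the sample event payload is hand-written for illustration.

```python
import json
from typing import Any, Callable, Dict, List

Entry = Dict[str, Any]


def subscribe_command(command_id: int) -> Dict[str, Any]:
    """Command asking the browser to push log entries as they are added."""
    return {
        "id": command_id,
        "method": "session.subscribe",
        "params": {"events": ["log.entryAdded"]},
    }


class LogListener:
    """Hypothetical stand-in for a log inspector: routes pushed log events."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[Entry], None]] = []

    def on_javascript_exception(self, callback: Callable[[Entry], None]) -> None:
        self._callbacks.append(callback)

    def handle(self, raw_message: str) -> None:
        """Process one message received from the WebSocket."""
        message = json.loads(raw_message)
        if message.get("method") != "log.entryAdded":
            return
        entry = message["params"]
        # JavaScript exceptions arrive as entries of type "javascript".
        if entry.get("type") == "javascript" and entry.get("level") == "error":
            for callback in self._callbacks:
                callback(entry)


if __name__ == "__main__":
    listener = LogListener()
    listener.on_javascript_exception(lambda entry: print("caught:", entry["text"]))
    # Hand-written sample of what the browser might push after the click.
    listener.handle(json.dumps({
        "type": "event",
        "method": "log.entryAdded",
        "params": {"type": "javascript", "level": "error", "text": "Error: Not working"},
    }))
```

The callback runs when the browser pushes the event, so the test can assert on the text and type of the error without polling for it.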
Oh, by the way, it is to be noted that network interception is yet to be implemented in BiDi as of 15 September. Now we come to accessibility. I'll explain it with a simple example. Suppose you are a builder constructing a building of 10 stories. Being a good builder, you want every individual, including those with disabilities, to be able to access all 10 floors of your building easily. So along with providing a staircase, you will also provide elevators and ramps, so that people with disabilities can use them to access all the floors.

The same thing happens when a developer builds a website or a web app. He wants any person who visits his website to be able to go through it and interact with it easily, including people with certain impairments. For example, for a person with weak eyesight, the text on the website should be able to increase or decrease in size based on his preference. Some people prefer a screen reader, a tool which converts text to speech. And there is the use of alternative text for images, to give the images a more meaningful description. For example, if you have the screen reader enabled and it comes to a picture without alternative text, the screen reader will blurt out something like IMG_584792.png, which is meaningless for the user, right? But if you use alternative text such as "cute cat", it will be much more meaningful for the user. So these are some examples of accessibility in building a website.

So the question arises: how is WebDriver BiDi going to help with accessibility? To support accessibility on your website, assistive tools such as the screen reader need access to something called the accessibility tree.
The accessibility tree is derived from the DOM itself, but it differs from it in that the DOM focuses more on the structure and layout of the elements, while the accessibility tree focuses more on their semantic meaning. Knowing that, how are you going to obtain this accessibility tree from the browser? Any guess? That was an easy one. Of course, your new friend WebDriver BiDi will help you with that. The good news is that talks have already started on implementing the accessibility module in BiDi. This issue was created on June 6. You can visit the link to get more insights, and please comment your thoughts on it.

So that's how low-level control can be performed with WebDriver BiDi. I showed you how we verified uncaught exceptions, how we could perform network interception, and how we can use accessibility in our tests. There are also other things, such as mocking geolocation, which you can check out on the roadmap.

A small but important point which I want to tell you is that WebDriver BiDi interoperates with WebDriver Classic. What this means is that in the earlier example, where I used the LogInspector, there wasn't much code change. The driver was set up in the same way as in WebDriver Classic. We navigated to the web page in the same way, and we clicked using the same click function which was there in WebDriver Classic. The only addition was the LogInspector module. So this is how easy it is for developers at the present time to use WebDriver BiDi. There is no complex code change required, and you can start using it in your tests from today.

Another thing is that WebDriver Classic is going to be reworked to sit on top of BiDi. What I mean by this relates to the examples I showed you earlier, where we were navigating to the web page or clicking using the click function.
Under the hood, these functions were using WebDriver Classic. But now we will use BiDi under the hood. One thing remains constant: on the front end, the code will not change for the user. He does not have to change his code to use WebDriver BiDi. On our part, we will change it under the hood.

So that was WebDriver BiDi for you: an open, event-driven, cross-browser protocol built for testing. For the latest progress, you can check out the link. And I like to say it's about improving the developer experience, because that will improve the user experience, which will make the web a much better place for all humanity.

Now, if you have any questions, please don't hesitate to ask. No questions? You can ask later also. No problem. Okay. Thank you. Thank you very much.