So, as Ankita mentioned, the topic of my presentation is continuous test data generators. I hope a lot of you are acquainted with test data, first of all, and maybe with test data generators as well. For those who are not, I'll walk you through the what, how, and why of it, and the part where we run them continuously: every day, every run, even many times a day.

Well, hello. I am Jatin Makhija. I'm currently working with Wingify, and I've previously worked with startups and with service-based companies. If you like the talk, you can definitely connect with me on LinkedIn. I would say definitely follow me on Instagram. As for Twitter, I'm really passive over there; you will only see my tweets from back in 2016, again from the Selenium Conference.

So I'll walk you through what continuous test data generators are, when to use them, and how to build them. I have a case study and a demo, which I have recorded. Then the pros and cons, and we'll definitely come to the Q&A part as well.

Well, how many of you have been using automation tools to automate web applications? Do these kinds of credentials look familiar when you're running your automation tests? Say test.user@gmail, @companyname, @amazon, @Xego, and whatnot? You have been running test cases with them for months, if not years. In my case, I ran for two and a half years with the same set of credentials, not finding any new bugs, just maintaining the code.

Now, I know that generating test data for every run is tedious. You have to go to a signup form. You have to enter your name, your password, your address. You have to fill in some marketing questions, like: where are you coming from? What is your market size?
Would you be interested in receiving emails? Because GDPR is in place, and there's a right to be forgotten as well. But you don't have to go through all that. You know your application. You know the internal calls. Why don't you get a magical POST call out of it, like this, and generate your data in a fraction of a second, or maybe two seconds? You just have to figure out that magical call, hit it with a tool like Postman or another API testing tool, and generate fresh test data.

And voila, you can run your existing test set on the new test data, and it will have the potential to find new bugs. It will open up areas you have not been touching. It will increase your test coverage. And it's not just me; even Andy tried it too. They hooked up test data generators with their existing tests and found that it had the potential to find new bugs in every single run. It all depends on how articulately you are making your test data. As James Thurber said, if there's a probability, just go for it. Just run your tests with a new set of data.

Going by the formal definition, test data generators are nothing but programs or functions capable of producing test data for a random or a specific attribute. When I say specific, you need to know your application in order to get more coverage and more depth; only then will you get the maximum out of it. You can run them for functional testing or for database testing: say you populate a new database, or there's a new table where, instead of usernames, there are now address fields altogether. You can use generators for those.
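As a rough illustration of that definition, here is a minimal Python sketch of a generator that fills every attribute randomly unless a specific value is requested, and then wraps the record in a POST request. The endpoint URL, the field names, and the market-size buckets are all assumptions made for the example; the real internal signup call is specific to each application and is not shown in the talk.

```python
import json
import random
import string
import urllib.request

# Hypothetical endpoint: the real internal signup call depends on your application.
SIGNUP_URL = "https://app.example.test/api/signup"


def random_word(length=8):
    """Return a random lowercase string of the given length."""
    return "".join(random.choices(string.ascii_lowercase, k=length))


def generate_user(**specific):
    """Build one signup record; fields passed in `specific` override the random values."""
    user = {
        "name": random_word().capitalize(),
        "email": f"test.user+{random_word()}@example.test",
        "password": random_word(12),
        "market_size": random.choice(["1-10", "11-50", "51-200", "200+"]),
    }
    user.update(specific)
    return user


def build_signup_request(user):
    """Wrap the record in a POST request object; actually sending it is left to the caller."""
    return urllib.request.Request(
        SIGNUP_URL,
        data=json.dumps(user).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One random user with a specific market size pinned.
request = build_signup_request(generate_user(market_size="200+"))
print(request.get_method(), request.full_url)
```

The request is only built here, not sent, so the sketch runs without a server. In a real setup the same payload could be pasted into Postman or sent with any HTTP client.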
When these test data generators are sustained over a period of time, we call them continuous test data generators. Consider applications like Google Analytics, Host Analytics, others like VWO (Visual Website Optimizer) and Optimizely, even banking applications, where you have to feed in data day in, day out. You have to have data in order to test them. You cannot make a single visit to Google Analytics and test whether that single visitor is being recorded or not. You have to scale it up.

Well, we did a similar exercise. As I said, we did not make any changes in our code, but we ran with a new set of test data. And voila, we found 14 bugs. They were bugs of the sort where labels were not getting added to a freshly created campaign. When we experimented and modified our test data, we were also able to find XSS errors, because our test data generators were advanced enough that we populated them with XSS attacks. In newly introduced fields, which were getting skipped by human eyes and by manual tests, these test data generators were able to catch them.

There is a very common principle about running your tests repeatedly, over and over again. Say you have written 4,000 test cases, and maybe they take eight hours to run. But are they finding any new defects every time you run them, or are you just maintaining them? It's the pesticide paradox. You give them the same set of inputs and get the same set of outputs, but no net new bugs are found. But when you play around with real data, not just mock data (let's leave that to the tests where we mock everything up, the front end and the back end), the probability of finding defects increases.

Well, at Wingify, we have a couple of SaaS applications.
For SaaS applications, every marketing lead really counts. If you are in the same genre, I think you would agree that we spend a lot of dollars even to get single clicks on our ads. And when that money is spent, we want the maximum out of it. But what we observed, while running campaigns on Google Ads and Facebook Ads, was that we were not getting enough signups from New Jersey and New Orleans. We did all the experiments, we tried all the data inputs. We used VPNs, thinking maybe it was a different thing altogether, that we were getting banned in New Orleans and New Jersey. We tried different test data, like John Jogger Street, Carlton Hallway, New Orleans. But when we dug into our analytics systems, say Segment.io, we figured out that this is what the users were actually entering when they were signing up. There is a huge variety of data that they enter, but when we do it ourselves, we limit ourselves to certain options. Do you see anything fishy over here? Why were New Jersey and New Orleans users not able to sign up with us?

Well, it was all about the regex. When we went deep into our code and checked, our assumption was that all addresses would start with a letter. But in New Jersey, New Orleans, and maybe a lot of other places, addresses start with numerals or maybe some special characters, so those users were not able to sign up. How did we come to know? We analyzed our data and prepared our test data generators. Then we updated our regex and ran the tests. In this case we used Faker.js, an open source library you could use for your test data, because this kind of test data cannot be created by humans. If you're living in India or in Europe, you won't know what the data looks like over there, what the addresses are like over there.
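The regex problem described above can be sketched in a few lines of Python. Both patterns are assumed reconstructions for illustration only, since the talk does not show the actual expression, and the sample addresses other than "John Jogger Street" are made up.

```python
import re

# Assumed form of the faulty rule: the first character must be a letter.
OLD_ADDRESS = re.compile(r"^[A-Za-z][A-Za-z0-9 .,'#/-]*$")
# Relaxed rule: the first character may also be a digit or '#'.
NEW_ADDRESS = re.compile(r"^[A-Za-z0-9#][A-Za-z0-9 .,'#/-]*$")

samples = [
    "John Jogger Street",              # letter first: accepted by both rules
    "1600 Canal Street, New Orleans",  # digit first: rejected by the old rule
    "#12 Carlton Hallway",             # special character first: rejected by the old rule
]

for address in samples:
    old_ok = bool(OLD_ADDRESS.match(address))
    new_ok = bool(NEW_ADDRESS.match(address))
    print(f"{address!r}: old={old_ok} new={new_ok}")
```

Hand-written test data tends to look like the first sample, which is why the old rule never failed in regression runs. Generated addresses with varied leading characters are what expose it.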
Libraries like Faker.js are built to produce exactly this kind of locale-specific data. Then the question comes: how do you build these generators? The concept is simple. You have to have your application knowledge, analyze all the inputs, create certain control flows, select a path, create test data, and most importantly, run them regularly. Not just create your test data once and then run it for three years, again using that test.user@gmail.com every day, every run. That won't help you. I'm not saying discard your old tests, because regression is definitely important. We all care about regression, right? We do not want any bugs in our existing systems or for our existing users. But you might have a new user worth $4,800 whom you simply lose out on because you were only testing with your old test data.

To develop a test data generator, I would say functions like Math.random, or ones that fetch a new address altogether, are very handy. Consider an example. Say you have a username, test.user. What if you coupled it with Math.random, as in test.user plus a random number? Voila, you have new test data. You can run your test cases on that user, and it won't cost you a thing. In our case also, we used Math.random and similar functions a lot.

As I said, you need application knowledge, and at times that takes a lot of pain. You might say, I have to refactor my code; I wrote these test cases and they are taking too much time. No problem. You can start small, with the online resources available: Mockaroo, generatedata, Faker.js. They are all handy. They won't cost you a penny, just some effort at your end.

So, as I said, there are a lot of pros. To begin with, some applications, such as analytics applications, need data all the time.
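The username trick mentioned above is easy to sketch. This is a Python equivalent of appending a `Math.random` value to a fixed base name; the function names, the six-digit suffix, and the `example.test` domain are assumptions for illustration.

```python
import random


def fresh_username(base="test.user", rng=random):
    """Append a random six-digit number to the base name."""
    return f"{base}{rng.randrange(100000, 1000000)}"


def fresh_email(domain="example.test", rng=random):
    """Build an email address from a fresh username and a placeholder domain."""
    return f"{fresh_username(rng=rng)}@{domain}"


# A different user on every run:
print(fresh_email())

# A seeded generator replays the same user, which helps when a failure must be reproduced.
first = fresh_email(rng=random.Random(42))
second = fresh_email(rng=random.Random(42))
print(first == second)
```

Logging the seed (or the generated value) with each run is a cheap way to keep randomized data debuggable.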
There are also banking applications where a continuous flow of data is required. In such cases, you can run these test data generators continuously and get the best out of them. We can also couple them with attack strings and find vulnerabilities or security defects. It also increases your test coverage. It might be the case that your existing test data was only exercising certain endpoints, limited to what you had coded previously. Once you update your test data, you will be able to cover much more of your application.

Coming to the cons: yes, it will take time. You have to develop it. You have to work on it.

It also helps in another way. Consider an example. You have a SaaS application or an e-commerce application. You ship a feature, say, whenever a user buys something, you give them points. Eventually that feature is no longer required and it is discarded. But when it was introduced, you invested a lot of time in developing that code, and your test cases would still be running on accounts where that feature is available, while in reality 99% of the users are not using it now. So it's time to remove that code and reduce our maintenance effort.

Well, as I said, at Wingify we have SaaS applications. One of them is Visual Website Optimizer, an all-in-one platform for A/B testing, visitor recordings, heat maps, form analytics, surveys, and whatnot. My demo focuses on one such application, where a lot of users are registered and a lot of data is being captured, just like Google Analytics. We have been tracking that data, and we have to ensure that such applications work, every day, every night, and every second of the hour. No single data point should get missed.

So, with VWO at hand, I'll briefly tell you about VWO. As I said, it's an all-in-one optimization suite. It's process-driven.
You can create A/B tests. You can create visitor heat maps. You can play a lot of recordings to see how your visitors are behaving. Those kinds of features are already available. So how did we create the test data generators for it? I have a very short demo to walk you through it.

So this is the application. If I have to sign up, I have to create a free trial account and fill in all these forms again, say email, password, full name, phone number, and do it through Selenium or some other UI tool, just to create one test data point. But why? I know my application. I know there exists a path I could take to register a small user. Once that is done, I have a fresh data point with me. I have a unique ID, just like Google Analytics has a unique ID that you pass along when you send information to it. We have that account ID for VWO as well.

The moment it loads, it shows you that your account setup is pending. You do not have anything set up right now. You have an empty account to test: no test data, no campaigns, no goals, no forms, nothing.

Have you heard about A/B tests? There are market players for A/B testing. One of them is VWO; others are Google Optimize and Optimizely. So, what we can do is this: we have certain generators, and we can simply key in our account ID there, along with certain usernames that we know are capable of generating basic input data points. So that is being done. I am at my bay, sitting in my chair, relaxing with a cup of coffee, and these data generators give me a single set of points in which all the data needed just for creating the campaigns is captured. So my campaigns are ready.
I don't need to run all of my UI automation tests to generate them. I don't have to click here and there and set it up. I can simply use certain APIs to do that.

So now the campaign has been created, but it does not have any data input to it yet. So what now? We have test data generators developed for that. I need some application knowledge to run them. I know that the first campaign my script created also has a unique ID, which is one in this scenario. So I key in that value, I key in the account ID, and I can set what I want from it.

An A/B test is basically a comparison of two variations. For example, you want to experiment with your homepage: say, have a green signup button or maybe an orange signup button. This you can do with A/B testing. You direct all your visitors to the campaign, and when visitors come in, they are basically nothing but data points. Some data points land on the control, the already existing green button, and if I'm running an A/B test, 50% of the users go to, say, the orange button. I can simply configure that here. I can mention how many visitors I want to make part of it, say 2,000, and for the orange button I want, say, 1,800 visitors. I want 40 of them to click in the first variation and 30 in the second. As soon as I run them (I have created a lot more campaigns as well), those test data generators are running in the background.

I would definitely be firing up qualitative data too. You have heard about WebEngage, right? WebEngage is there, Qualaroo is there, which are responsible for qualitative data. We have a similar product for that, called on-page surveys, so we would be entering some data points for that as well.
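To make the visitor and conversion inputs from the A/B test demo concrete, here is a minimal Python sketch of a generator driven by those numbers. The event fields, the `VariationPlan` type, and the account ID value are hypothetical; the actual generators in the demo are not shown in the talk, and the mapping of 40 clicks to the green button and 30 to the orange one is an assumed reading of the numbers.

```python
import random
from dataclasses import dataclass


@dataclass
class VariationPlan:
    name: str          # variation label, e.g. "green" or "orange"
    visitors: int      # how many visitor data points to generate
    conversions: int   # how many of those visitors should click


def generate_events(account_id, campaign_id, plans, rng=random):
    """Return one event per visitor; exactly `conversions` visitors per variation convert."""
    events = []
    for plan in plans:
        # Pick which visitor indexes convert, without repetition.
        converting = set(rng.sample(range(plan.visitors), plan.conversions))
        for i in range(plan.visitors):
            events.append({
                "account_id": account_id,
                "campaign_id": campaign_id,
                "variation": plan.name,
                "visitor_id": f"{plan.name}-{i}",
                "converted": i in converting,
            })
    rng.shuffle(events)  # interleave variations, as real traffic would
    return events


plans = [VariationPlan("green", 2000, 40), VariationPlan("orange", 1800, 30)]
events = generate_events(account_id=12345, campaign_id=1, plans=plans)
print(len(events))  # 3800 events in total
```

Because the totals per variation are fixed by the plan, the report shown by the application can be checked against the exact numbers that were keyed in, which is what makes a missing data point detectable.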
Because the generators are application-specific in this scenario, I have to key in the credentials as well, just to fill in the data points. I'm sure that if you work on a single application, you know the ins and outs of it by now. When you're doing automation, you definitely know how it works.

So I've created my test data generators and simply run them. If you look, the data has started populating. In three minutes, the data points are there. I had created an A/B test that did not have any data. Now it has data, created in just a couple of minutes. I have about 4,000 data points in a minute and a half, which I could not do by mimicking an actual user. I could not ask a single user to go over there and click on everything. We have the data ready with us.

Well, that's the number of data points I could generate in a day using a very simple machine. I could invest $40 in a cloud machine, maybe on AWS, maybe on Google Cloud, maybe on DigitalOcean, and generate this many data points. I could work with four machines, costing me $160, and I would be able to generate millions and millions of data points in just a day, imagine. And everything with 100% success.

When data is involved, you want to be sure that every data point was captured. In such scenarios, you have to run your generators day in, day out. You have to run them continuously. Only then can you go and grab your developers if even a single data point is missing.

Well, build and bugs. You build them, you run them, and you will have the probability of finding new bugs. I think that's it from my end. You can find me here, and thank you. Do you have any questions?

Thank you, Jatin. We have time for one question. Any questions?

So is your solution like a plugin for Jenkins?

No, we use Jenkins as a wrapper over these test cases.
I encapsulated everything through Jenkins. I haven't shown you the code, but there is code, and there are different technologies involved. Here we have used JMeter, Python, Java, and Postman to do all this. Jenkins was just the layman's tool on top. Not everyone has the knowledge of how these test data generators are coded and developed, but anyone can simply key in certain inputs and get all the data out.

Another question is about combinations: on what basis will the test data be generated, with respect to what kind of combinations, and is there any possibility of masking as well?

Well, I'm not too sure about the masking part, but the generator part is all up to you: how you code your test data generators and what possibilities you build in. In the example I showed you, I was keying in values for two fields, one for the orange button and another for the green button. You could even have a red button. You could enter values for it, and all the data points would be filled in.