How many of you use Clojure, or know what Clojure is? And how many of you are testers, or love doing testing? So what is the problem? Testing real-world stateful business logic is very hard. There are a lot of cases to worry about and a lot of combinations. For example, here is the state transition diagram of a microwave oven. As you can see, there are a lot of transitions that can happen, and there is inherent state built into it which needs to be tested. Even if you use a functional programming language, you still need to test against certain states, and it becomes difficult to enumerate all possible combinations. How many times should you set the timer back? You may not encounter a bug if you set the timer once, but the third time there might be a bug. And if you try to enumerate all possible combinations, the tests become really unreadable; they become hard to use, and it's hard to see what a test is really about.

So what I am proposing is that a DSL is a way of writing tests that can solve this problem. What is a DSL? A DSL is a domain-specific language: a programming language designed specifically to express solutions to problems in a specific domain. It is not a general-purpose language; it is targeted at the domain.

Let's take a real example. At HelpShift, we make embeddable support desks for native apps. As a developer, you can take our SDK and put it inside your app. Each domain can have several apps; for example, Gaana has a Gaana app. Each app can have several sections, each section can have several FAQs, and each FAQ can have translations in several languages. So here is a Gaana.com FAQ page. How many of you know Gaana.com? Yeah, a few. So that's the domain, this is the app, that's the FAQ section, and within that section these are the available FAQs, which customers can search. And here is how you create an FAQ on our dashboard: you add a title, a body, and some metadata, like whether the FAQ is published or not. There are two published flags: one says whether this FAQ is published globally, and each translation can separately be turned on and off. I mention this because it's important later.

So there was a problem in our system that we were trying to solve: we could not have one FAQ shared across several apps. If you have, say, a privacy policy, you need to create that privacy policy again for a second or a third app. We needed to solve this, but there was a constraint: customers wanted only the content to be shared, not the metadata. They didn't want to share whether the FAQ is published or not. So we introduced a feature called linked FAQs. Here is an example. There is one app which has a section called General, and within that there is an FAQ which has a translation with a title, a body, and some metadata. Then there is another app with another section, and we want to link FAQ one into app two. This creates another FAQ with the same translations, and if I update one body, the backend automatically updates the other. This was the feature we were supposed to test, and it quickly became really hard to try different kinds of combinations. What happens if I delete one FAQ? Does the other get deleted or not? Should it? What happens if the app is deleted?
What happens if the section is deleted? What happens if I update this one; does that get updated? What happens if I update that one? What if there are five different apps, and so on and so forth? How many combinations would you try? It's really hard to know.

So here is an example test case. I would like to add one app with only the English language. Then add another app. Then add the first FAQ under the first app. Then link the first FAQ to the second app, creating a second FAQ. Then I would like to test whether the translations of both FAQs match. Then I would like to update the English title, and check again whether the content is actually synced and updated to the second FAQ as well. Then I would like to delete the first FAQ, check whether it is actually deleted in the database, and make sure the second FAQ remains intact.

Can anyone see a pattern here? Anyone? There are two different things happening here which are complected, and they should not be: one is the simulation part, where I do something, and one is the verification part. Normally in a test they are complected together. I am proposing that they can be decoupled into two different things, and a DSL can allow you to do that. Let's see how.

So what are the parts of each action? Say this is an action I would like to do in my test: add app one with only the English language. The first part is the type of action: adding an app. Then there is a name I am referring to it by: app one. Internally the system will create some ID, like 12345, but that's a runtime value I don't know in advance, so I refer to the app in a way I can understand in my test. Then there are arguments to the action: I would like only English in this app; maybe I'd also like Spanish, or Hindi. Then there is an expected result, as per my spec, my understanding, my model. And there is an actual result, which comes from the database, from the real state. So there are five parts to each action that need to be understood.

What if we could express this action as data? Here we are going to use something called a defrecord. It's basically like a class, but without inherently having functionality built into it; just think of it as a hash map, where this is the constructor and those are the arguments to the constructor. So I'm saying there is an AddApp which has app one and a configuration hash map. Does this make sense? Okay.

So here is a similar set of test cases, but in terms of data. I'm adding an app, then I'm adding one FAQ, but I refer to the name: app one is referred to later. I'm saying: when you add that app, call it app one; and when you add an FAQ, call it FAQ one, but add it to app one; and so on. This allows me to state the test exactly the way I would express it to a developer: add one app, then within that app create an FAQ, then create another app. It reads the same, but this is executable.
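As a rough sketch of this actions-as-data idea (the record and field names here are illustrative guesses, not the exact ones from the talk), the test case above might look like this:

```clojure
;; Illustrative sketch; record names and fields are assumptions.
(defrecord AddApp    [app-var lang-config])
(defrecord AddFaq    [faq-var app-var faq-config])
(defrecord LinkFaq   [new-faq-var src-faq-var target-app-var])
(defrecord UpdateFaq [faq-var changes])
(defrecord DeleteFaq [faq-var])

;; The whole test case is just a vector of actions. :app-1, :faq-1, etc.
;; are symbolic names; the real runtime IDs are only known after execution.
(def test-actions
  [(->AddApp    :app-1 {:languages ["en"]})
   (->AddApp    :app-2 {:languages ["en"]})
   (->AddFaq    :faq-1 :app-1 {:title "Privacy policy" :body "..."})
   (->LinkFaq   :faq-2 :faq-1 :app-2)
   (->UpdateFaq :faq-1 {:title "Privacy policy v2"})
   (->DeleteFaq :faq-1)])
```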
But to run this series of actions, you need a certain context to maintain, because it's kind of like a programming language, right? To have that context, you need three parts. One is the environment: when I say app one, internally that means the ID of that app, say 970. So the context is one big hash map with three parts. First, the environment: within the environment there are apps, within apps there is app one, app two, whatever it may be depending on what actions you take, and then the actual app ID that I need in order to do anything. Second is the latest simulated state: after each action, what is the current expected state of the system, according to my understanding? And third is a log: after each action is taken, I log what happened for that action. If I added an FAQ, what was the simulated state at that point in time, and what was actually the state in Elasticsearch, Mongo, or whatever? You keep appending to that log history after each action. Makes sense?

So let's take a high-level overview of the DSL. We start with an initial context. We take the first action and pass it to run-dsl. We get back an updated context, with an updated environment, updated log, and updated simulated state. We pass that back into run-dsl with the second action, get another updated context, and so on until all the actions are exhausted.

Let's take a more concrete example. Here is the initial context: the environment is an empty hash map, the latest simulated state is empty because there is no state yet, and the log is an empty list. You start with one action: add an app, call it app one, with language configuration nil; it's nil because by default we always have English. You give both to the run-dsl function and you get an updated context back, which has app one in the environment with whatever ID it was given, plus some log and some simulated state. You pass it back again, recursing, with the second action, which this time is add app two, and run-dsl again updates the environment. And you keep going.

But what happens inside the run-dsl function? You dispatch on each action. First you need to resolve the variables in the action to their bindings: when I say app one in my arguments, I need to understand, when I try to add an FAQ, what app one really means, what its ID is, and put that back into the action. I'll give a real example of that shortly. Then you actually call the function that performs the side effect, which is part of your actual implementation; that is done by the run-action function. Then you take the result and bind it back into the environment: once the app is created, whatever I'm calling app one gets added to the environment, so that in future, when you say app one, it means this ID. Then you update the latest simulated state: after doing this action, this is what I expect the state to be. And you also record what the state actually was in the DB. There is no verification at that point; you just collect history.
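To recap the shape of that context as data (the key names here are my assumptions based on the description):

```clojure
;; Sketch of the threaded context; key names are assumptions.
(def initial-context
  {:environment            {}   ; symbolic names -> runtime IDs,
                                ; e.g. {:apps {:app-1 {:app-id 970}}}
   :latest-simulated-state {}   ; what my model expects the system to look like now
   :log                    []}) ; per-action history of simulated vs. actual state

;; Conceptually the driver is just a fold of actions over this context:
;; (reduce run-dsl-step initial-context test-actions)
```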
Normally in unit testing, or any kind of testing, you do something and then you immediately assert. Here the idea is that you don't assert immediately: you store what happened and what you expected, and later all you need to do is go through the log and verify it. Then you continue with the next action and keep going.

Each defrecord action implements a certain protocol, a DSL protocol, to be able to do this. The five functions we saw are each part of that protocol. This allows me to attach behavior to plain data. We'll see an example.

So here is the defrecord AddApp. I'm adding the app, I want to call it app one, and the argument is a lang config. The dispatch goes to AddApp's implementation of resolve-action-vars. Here there is no var: there is no variable I need to look up, because I'm creating app one, not using any previously bound variable, so it just returns the action as is. Then I run the action: to add an app I call some internal implementation of add-app with the lang config argument. That returns an ID, and using that I update the environment: in the environment I add apps, app one, and that ID, 970. The ID 970 came from run-action!, which performed a side effect; the bang in run-action! is there to signify that a side effect happens. Then I update the latest simulated state; in this test I don't care about apps, I care about FAQs, so I say nothing about it. Similarly for update-log: I don't care, I just return it as is.

Next I add another app. I dispatch again on AddApp, call resolve-action-vars, there is nothing to resolve, run the action to add app two, add it to the environment with its new ID, update the latest simulated state, nothing to do, and continue.

Now here's the interesting part. I would like to add an FAQ, call it FAQ one, under app one, which was already created previously. I dispatch on AddFaq's implementation of resolve-action-vars. Here the var is app one. What is app one? I don't know, so I look into the environment, see what app one refers to, which is app ID 970, and put that back inside the action map, because that is what the actual implementation will use. Am I making sense? Okay. Then I call run-action! with add-faq for app ID 970. I cannot call it with app one, because app one is meaningless at runtime. I get an FAQ ID back and update the environment: from now on, FAQ one means that FAQ ID, 112. Then I update the latest simulated state: there is some model of what I expect the state to be after this action, and I record it. Then I fetch whatever is actually in Elasticsearch, or Mongo, or whatever database you care about, whatever the actual implementation has done, and I store that as well, alongside the snapshot of the latest simulated state. That gives you a history of snapshots: after each action, what was taken, what was expected, what was actual. Later I can run through the whole log and figure out at what point in time something failed. There is no verification happening yet.
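That var-resolution step for AddFaq might look roughly like this (a sketch; the key names are assumptions):

```clojure
;; Sketch of var resolution for AddFaq. :app-var holds a symbolic name
;; like :app-1; we swap in the runtime ID that an earlier AddApp action
;; bound into the environment, so run-action! can use a real ID.
(defn resolve-faq-vars [{:keys [app-var] :as action} context]
  (let [app-id (get-in context [:environment :apps app-var :app-id])]
    (assoc action :app-id app-id)))
```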
Once you have run through all the actions, you do the verification, and verification is straightforward. You compare, for the first action, what my model generated, what I expected, against what was actually in the DB. If they match, the first action passes. But say for the second action I expected something like this, and actually the title did not get updated: I can immediately say the second action failed, this is what you expected, this is what you got, and this is what you tried. And this can be done later; I can store this whole log history in a database and analyze it however I like.

So let's look at a real example. run-dsl is the function; it takes a domain, which is the site I want to add an app to, and I'm saying add an app, call it app one, with this language configuration. Let's see what it returns. It returns a context where apps has app one, with this app ID and a section ID. The latest simulated state is empty and the log is empty, because I don't care about them for this action.

Let's take a slightly bigger example. Here I'm adding an app and then adding an FAQ to that app. It returns a huge output; let's see what it has. It has the environment, the latest simulated state, and a log, like we discussed. Looking only at the environment: it has apps with an app ID, FAQ one refers to that FAQ ID, and it also records that app one has this FAQ ID; it's a set. Now I'm saying: add an app, then add an FAQ to that app, then add another app, link FAQ one into app two and call it FAQ two. Then update FAQ two with a new title and a new body. And let's not delete the FAQ for now, just to see. We get some output back: here is the new title, the new body, and you can see linked apps here; this FAQ is linked to another app.

So how is this being done? Let's first look at the DSL. Here is the protocol. All it says is: these are the five functions you need to implement. These are abstractions; there is no implementation in it, like interfaces in Java. Any data, any defrecord, that implements this protocol has to implement these five functions. So here is AddApp: this defrecord takes an app var and app args, and it implements the DSL protocol. In resolve-action-vars I just return the action back; I don't do anything with it, which is what we saw, because there are no vars to resolve. In run-action I actually call the real implementation of adding an app, with whatever app arguments I need to pass. Am I making sense? Okay. And then similarly you need update-environment, update-latest-simulated-state, and update-log. Similarly for AddFaq: in resolve-action-vars I take the action and the context, get the app ID from the environment, get the section ID, put them back into the FAQ config, and return a new action. A new action, sorry, not a new context. And there is a defrecord for the context as well: environment, latest simulated state, and a log. That's it; it's just a simple hash map.
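Here is roughly what that protocol and the AddApp implementation could look like (a sketch under assumed signatures; impl/add-app! stands in for the real side-effecting implementation):

```clojure
;; Sketch; signatures are assumptions. The protocol only names the five
;; operations; each action record supplies its own implementations.
(defprotocol DSLProtocol
  (resolve-action-vars [action context])
  (run-action! [action])
  (update-environment [action result context])
  (update-latest-simulated-state [action result context])
  (update-log [action result context]))

(defrecord AddApp [app-var lang-config]
  DSLProtocol
  (resolve-action-vars [action _context]
    action)                          ; nothing to resolve for a brand-new app
  (run-action! [_action]
    (impl/add-app! lang-config))     ; hypothetical call; returns the new app-id
  (update-environment [_action app-id context]
    (assoc-in context [:environment :apps app-var :app-id] app-id))
  (update-latest-simulated-state [_action _result context]
    context)                         ; this test models FAQs, not apps
  (update-log [_action _result context]
    context))
```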
But how is this all glued together? There is the run-dsl function. You pass it a domain and the actions, and it recursively calls itself, starting with an empty context: an empty environment, an empty log, and an empty latest simulated state. It first makes sure that whatever you passed is the right thing, by doing some pre-checks: that every action satisfies the DSL protocol, that the context is an instance of Context, that it has the keys latest-simulated-state and environment, that the log is a vector, and so on. Then, if there is a sequence of actions, it takes the first action, resolves it to get a resolved action, and runs the resolved action to get an action result. Then it threads the context through: the threading macro passes the context as the last argument to each function in turn, so update-environment, called with the resolved action and the action result, returns an updated context, which gets passed to update-latest-simulated-state, which gets passed to update-log. Finally you have a new context after running the action, and you call run-dsl recursively with the rest of the actions and the new context. That's the key: you just recurse, pass the context around, keep adding whatever you want, and keep updating the environment. That's it.

So I'll just run a test. For readability, I've also been printing what actually happened. Here it says: the action taken is this, and then I print the diff of what I expected against what I actually got. There are three parts to that: what is only in the actual, what is only in the expected, and what is common to both. If the first two are nil, that particular action has passed, because there is no difference. So here it's saying: common to both is this, for the Elasticsearch data, and so on.

This, in my opinion, is a very important step, because sometimes there is a bug and you have also simulated it the wrong way. Your model returns nil, the actual is also nil, you get nil equals nil, and the test says passed. But it hasn't really passed. So you need some kind of inspection of what is being generated dynamically, to confirm it's actually what you wanted. I have been bitten by that, which is why I added these logs. And then you can see it says: one test passed, no errors, no failures.
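Putting the pieces together, the recursion and the later verification pass might look roughly like this. This is my reading of the description, not the exact code; the Context record, the pre-checks, and the use of clojure.data/diff (which returns exactly that three-part split) are assumptions:

```clojure
(require 'clojure.data)

(defrecord Context [environment latest-simulated-state log])

(defn run-dsl
  ([domain actions]
   (run-dsl domain actions (->Context {} {} [])))
  ([domain [action & more] context]
   {:pre [(or (nil? action) (satisfies? DSLProtocol action))
          (vector? (:log context))]}
   (if (nil? action)
     context
     (let [resolved (resolve-action-vars action context)
           result   (run-action! resolved)]       ; the only side effect
       (recur domain
              more
              (->> context                        ; context threaded as last arg
                   (update-environment resolved result)
                   (update-latest-simulated-state resolved result)
                   (update-log resolved result)))))))

;; Verification afterwards: walk the log, diff simulated vs. actual.
;; Assumes update-log stored maps of {:action :simulated :actual}.
(defn verify [context]
  (for [{:keys [action simulated actual]} (:log context)]
    (let [[only-expected only-actual _common] (clojure.data/diff simulated actual)]
      {:action  action
       :passed? (and (nil? only-expected) (nil? only-actual))})))
```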
Any questions so far? So what approaches can we take to verify something like this? Normally, in our ordinary unit tests, we have hard-coded actions and hard-coded expected output: this is what I'll do, this is what I expect. But that limits the number of combinations you can try; you can have maybe five or six of them, and it becomes too cumbersome if you really want to try everything. Then there is the option where the actions are simulated but the expected output is hard-coded. Say I do these five actions and I know the number of FAQs will always be five: I don't care about the full result, I just know the count, so I can hard-code that. Then there is simulated actions with generated expected output, which is the approach I just showed you: there is a simulated set of actions that I take, and there is a model that tells me what I expect to happen, and I use that to verify. And there is one more level we can go to, which is generating everything: we generate the actions, and we generate the expected output as well.

So what are the advantages and disadvantages of each? If you have hard-coded actions and hard-coded expected output, it requires no special knowledge to understand. Any newcomer, tester or developer, can come and just read your code. And it's very hard to get false negatives, because you can read exactly what is happening. But it becomes very cumbersome to enumerate the cases, and modification is hard. So this is a good fit when the system under test is not so critical and not so big or complicated, so that the cases can be enumerated easily.

Then there is the version with simulated actions and a hard-coded final expected output. It's slightly better, and more readable; it reads like documentation. One other great thing about it: I can share it with a developer. I don't have to create a Jira ticket describing what needs to be done. I've done this; I used to just share the code: run this, you will get the expected failure. They would keep running it until they fixed it, then run it once more and know it's fixed. That was a great win, in my opinion. But the problem is that you need to maintain the DSL and teach people how to write it, which can become a burden. And the expected output may contain runtime values, like FAQ IDs, which you cannot hard-code.

That last problem is solved by generating the expected output, because then you can work even with runtime values. But there can be false negatives if your model is incorrect and the implementation is incorrect in the exact same way; you may both miss certain bugs.

And then there is generated actions with generated expected output. One great advantage of this is that you can generate a large number of tests and a large number of combinations. One problem you can run into is: if I generate, say, a 100-action test case and it fails, I don't know what exactly failed. Do I really need 100 actions to trigger this, or is there a smaller, shrunken case which still reproduces the bug? There is a way out: there is a library called test.check which will shrink a failing case into a smaller test case, and that is really great. But again, it requires the knowledge of how to write all this.
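For concreteness, here is a hedged sketch of how a DSL like this could plug into test.check for generation and shrinking. The generator and property below are illustrative, not from the talk; they reuse the sketched action records, run-dsl, and verify from above, and the domain string is made up:

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; A toy action generator. A real one would be driven by the domain's
;; state machine (shown in the next section) so only valid sequences occur.
(def gen-action
  (gen/elements [(->AddApp :app-1 {:languages ["en"]})
                 (->AddFaq :faq-1 :app-1 {:title "Hi" :body "..."})]))

(def dsl-property
  (prop/for-all [actions (gen/vector gen-action 1 20)]
    (every? :passed? (verify (run-dsl "example-domain" actions)))))

;; (tc/quick-check 100 dsl-property) ; 100 random cases, shrinks any failure
```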
So let's look at an example of this. We had a feature called the issue audit trail. We have another feature called issues, and we wanted to log exactly what is done by the agents and the customers who reply on an issue: a log of who took an action and what action they took. For that we need to understand what kinds of users we have. At HelpShift we have app users, the customers using the app, and then there are support users, of which there are two types: agents and admins. App users can create an issue and reply to an issue, while admins and agents can create an issue, reply to an issue, resolve issues, edit tags, and so on; there are more actions they can take.

Here is an example of such an issue. An app user says, hey, I have an issue; here is an admin replying to it, and here is another agent replying. There are two kinds of entries here. These two are the issue audit logs that we are adding, and these others are just normal messages being sent; we interleave the audit logs with the messages. For instance, this admin added a tag called "game", and when the issue was resolved, we added a log saying it was resolved by this agent. This is useful in debugging. It's useful when, say, you have a customer like Supercell with something like 500 agents, and they ask: who resolved this issue? I want to know. Instead of us going back to the database and figuring it out for them, they can just look at this.

So this was a critical system. We needed to get it right. They wanted a specific order of audit logs; they didn't want the logs separated from the issue, they wanted them interleaved; and we can't have logs being dropped, because dropped logs are a problem. And there can be any number of messages; I don't know how many to test. So this is a hard problem, and you can't do it with just unit testing. So we went this route. I'll first demo the feature.

To generate actions, you need a kind of finite state machine of your domain itself: from this state, what can I do next? In this case, when I start, only the user can create an issue, with these arguments. This thing here is just a vector of three things: what kind of user it is, in this case the app user; what kind of action they are taking; and the arguments for it. This part is slightly different from before, using multimethods instead of protocols, but the idea is pretty much the same.

Let me explain the transition map first. All the keys on the left are source states. From each one, there can be several transitions, and for each transition there can be several actions that cause it. For example, to go from state one back to state one, any of actions one through n will do. The algorithm starts at some state, randomly picks a transition out of it, and then randomly picks one of the actions for that transition. This need not be uniformly random: you can attach weights to the actions and say, going from state one to state two, action two is taken 80% of the time. That depends on your model; it can be done. So here I'm saying: when I start, I can only go to new issue, and the only action for that is the user creating an issue. Once I'm in new issue, there are three states I can be in: I can still be in new issue, I can go to in progress, or the issue can be resolved. To be resolved, either the agent replies with a resolving message, or support changes the status directly to resolved, and so on.
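The transition map and the random walk over it might look roughly like this (state and action names are my guesses from the talk's examples):

```clojure
;; Sketch of the domain's finite state machine. Each entry reads:
;; from-state -> {to-state [actions that cause that transition]},
;; where an action is [user-kind action-kind args].
(def transitions
  {:start       {:new-issue   [[:app-user :create-issue {}]]}
   :new-issue   {:new-issue   [[:app-user :reply {}]]
                 :in-progress [[:agent :reply {}] [:admin :add-tag {:tag "game"}]]
                 :resolved    [[:agent :resolve {}]
                               [:admin :change-status {:status :resolved}]]}
   :in-progress {:in-progress [[:agent :reply {}] [:app-user :reply {}]]
                 :resolved    [[:agent :resolve {}]]}
   :resolved    {:new-issue   [[:app-user :reply {}]]}}) ; a reply reopens

;; Random walk: pick a reachable state, then one of the actions that
;; gets us there. Weighted choices could replace rand-nth.
(defn gen-action-seq [n]
  (loop [state :start, actions []]
    (if (= n (count actions))
      actions
      (let [[next-state choices] (rand-nth (seq (transitions state)))]
        (recur next-state (conj actions (rand-nth choices)))))))
```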
Say, for example, I start at new issue: the walk picks that, then from new issue it randomly picks one of those three states, and whichever it picks, it randomly picks one of the actions for it, and so on. This lets me randomly combine actions and generate the expected output without knowing in advance what I'll be generating.

Let's run it and see. I'm saying: on this domain, create one issue with five actions. I'll add a sleep so it's easy to see. Here you can see it randomly added some actions, and when I look at the activity log of the issue, it says this person reopened the issue, these actions were taken. This is random. Let's look at a bigger example. Here it added more actions; I don't even know everything this issue has been through. But that is just the simulation; where is the test? Here I'm saying: create one issue with five actions, then check whether the audit trails, and the messages being rendered on the screen, are correct. When I run this, I print two parts. One says: this is what I expected the trails to be and this is what I got; this has passed. The other is what is being rendered: I expected seven messages to be there, and I actually got seven. Here is the table of what I expected and what I got. This count is important, as I'll show you.

Even if I run this again, no matter how many times, it always passes. But what if I increase the number of actions? From five, I'll make it ten. It passed; all the messages matched. Let me remove that sleep and make it 20. Something has failed. It passed up until now, but now it failed, and I don't know why. It's saying that except for the first message, everything else is not matching. If we really look at it, it's too big, but if we really dig in, it's an ordering problem, and it happens only in a very specific case. Let me reduce it to, say, 15. As you can see, this lets me change the size dynamically, and again it fails. Here you can see true, true, true, and then false. And here it failed again, at 15. Interestingly, we paginate older messages after 15 messages. That could be related, and it turns out it is.

So this is the original generated input that exposed the edge case. It's doing a lot of stuff, and it's hard for me to understand which part actually matters. With the DSL, I can either write a function that simplifies the failing test case, or do it by hand. After doing it by hand, I realized the simplified case is quite small. But the bug only happens once there are at least 15 messages; below that, it will never fail. That is why with a count of 5 or 10 actions everything just passed. And past 15 messages it may fail only if an audit trail happens to land exactly on the pagination boundary; before or after that, it's still fine. Why? Because there are some audit trails which combine two messages.
And if such a message gets cut at the pagination boundary, it appears again on the second page, and that is why you see the same audit trail twice. This bug was really hard to figure out, and it took us quite a while, even after a lot of exhaustive testing, to find it. Does this make sense?

So what are the advantages of writing a DSL? It helps in finding really, really hard-to-find bugs. For example, we found duplicate issue messages, where the order needs to be just right to trigger the bug; finding ordering bugs is something this approach is good at. And it increased the productivity of developers and testers, because I could share the code and say directly: this is how you reproduce the bug. The disadvantage is that it adds a high cost in maintaining the DSL. Right now I am the only person who understands this DSL, and that's a problem: if you have a team of 5 or 10 people, you need to teach them how to use it. Another problem I have faced is that if your DSL is just data and you make a mistake, you won't get a compilation error, because it's just data; unless it's actually evaluated by a function, you won't get any error. That can be a problem. And you have to educate your team about how to use the DSL.

So, in conclusion: a DSL can be used for testing workflows. Separation, I think, is the key: separating simulation from verification. Store your results; don't verify them immediately. You can store them in a DB, store them in memory, whatever, but don't verify immediately, and then you can do all kinds of things with them afterwards. And it's a step towards generative testing: if your system is really critical, if you really, really need to test all possible combinations, this can be really useful.

Here are some resources. There is Martin Fowler, who is a very well-known person in the DSL community, and there is DSLs in Action by Debasish Ghosh, which is an excellent book, and some other links. Thank you. Any questions?

Yeah. Yes, yes, but QuickCheck is a step ahead, because it's property-based testing, and sometimes it's really hard to define a property. This sits somewhere between unit testing and generative testing: I may not know a general property, but I can model the system myself and express, step by step, what needs to be done. This can also be useful when, say, a bug is found by QuickCheck and I fix it: now I want to make sure it never comes back, and QuickCheck may or may not hit that case again. So I express that case in my DSL and make sure it's always checked. Any other questions? Okay. Thank you.