Good morning everyone. My name is Shama Ogle and I work at ThoughtWorks as a senior QA consultant. In today's session I will be talking about testing conversational AI. This comes from my personal experience, where I had to strategize testing for one of the projects I was working on as a consultant in my previous role.
To begin with, there is a small context I want to set about what kind of problem statement it was. There is a US brand, a major brand in the beauty industry, and they wanted to set up a chatbot and go live with it so that the bot could help customers during non-business hours: order management, cancellation of orders, checking the availability of products, reordering previous orders, changing the pickup address, or anything else they wanted to do with an order. That was the small functionality they wanted to expose to their customers through the chatbot.
Now this came to me and I started strategizing. I created a test strategy, I automated it, and I put it on build pipelines so that we could give faster feedback to the customers and the developers and also do release management early. The next part was monitoring.
While I was working on this project, one fine afternoon one of my colleagues reached out to me and said she wanted to understand what exactly I was doing on this project. I said okay, fine, we can have a conversation. I was very excited, because I got a little opportunity to use the fancy words like AI and show off a little, to say that I have worked on AI and I have tested AI. So I started the conversation with a simple line: this project is all about testing a chatbot. She immediately stopped me. She said, what?
You mean you are going to test a bot? Are you going to tell me that you are going to test that simple tiny window that pops up, where you can chat with a sales representative? I said no, no, that is not a live person sitting behind it and actually chatting with you. That is a bot, built using machine learning, deep learning, AI, NLP, all the fancy words you have heard in the industry today. So she said, then why do you want to test it? It is an AI, you don't have to test it. I think you should only test the user interface: see if it is working or not, see if it is on the UI or not, check its presence, and you're done. You need only Selenium, you don't have to do anything else. So what are you doing?
I said, let me explain what actually happens if you don't test an AI. There are certain examples I want to share with you, and they will tell you why I chose to test the AI itself that is working behind the scenes.
What happens when the bot goes bad? Yes, it is just a conversation, but it may go bad. In the recent past Microsoft released a bot called Tay on Twitter. It was a chatterbox teenager that was supposed to have conversations with people and see how humans interact so that it could learn. That was the only intention. It started with a very simple statement, "hello world". And if any of you are using Twitter, I'm sure you know how Twitter works and how people start attacking you, right.
So the same happened with this poor bot. Within a few minutes of starting it began supporting Hitler and became a Hitler fan. In a few more minutes it had learned from the conversations it had with humans, and it wanted to build a wall and teach Mexico a lesson. In some more time it became racist and posted all kinds of racist comments. It was so bad, and it had already damaged the intention behind it and the reputation that Microsoft had, that they had to shut down the bot within 16 hours of releasing it.
That is just one kind of conversation going wrong. What if your bot does not even do what it is designed to do? There is a simple bot that is designed to give you weather information. All you need to do is tell it the name of the city and it will tell you what the weather is like. Suppose you are planning a trip and you want to know the weather so that you can pack accordingly: do you carry your sunglasses or an umbrella? This bot could not understand a simple word like "weekend". In a conversation it is very natural to say, I am planning a trip this weekend to such-and-such place, what should I carry with me? It could not understand the word "weekend", so it wasn't trained and it wasn't tested for a few of the scenarios it should handle.
Then there is the CNN news bot, which is designed to give you news updates. The simple task was to unsubscribe from the news notifications that the user was getting, and it could not do it. These are very straightforward examples where the bot could not do what it was designed for. This is the functionality that you have to test. You cannot ignore the functionality of the bot either. So you have to test it.
So finally she agreed. She said, yes, I understand, the damage can be too much, so you will have to test it. But I see there is still a challenge: how can you test a conversation? It is nearly impossible to cover all the conversations that any human may have with anybody, and all you are trying to do is test a conversation. I said, right, and beyond that there are some more challenges with all bots.
First, they are self-learning. In testing we have a typical test case with test data, steps, and what I need to expect as an output. Think of a scenario where you do not have a fixed output. Every now and then the output keeps changing, because the AI learns by itself. It may do better or it may do worse, but if you have to test it, how do you know what you should expect? That is the challenge: these systems keep learning and improving continuously, and that makes them very challenging to test.
Second, the input is nonlinear. Say you want to book tickets from Bangalore to London. If you do that through any UI application or mobile application, you know that this is the screen where I put the from and to cities, I choose the date from this field, I check the options and choose one of them, I make the payment, and I'm done. Now think about the bot: you have to have a conversation to do it. There is no fixed set of steps. You may start with the dates, and the date formats may change. If I'm from India I may follow a date format like dd/mm/yy; people from another part of the world may follow some other date format.
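To make the date-format problem concrete, here is a minimal sketch in Python. It is not from the project described in this talk; the helper names `possible_dates` and `is_ambiguous` are made up for illustration, and it only assumes the two formats dd/mm/yy and mm/dd/yy. It shows the kind of check a test could use to pick inputs that a bot must either clarify or interpret consistently.

```python
from datetime import datetime

# Hypothetical helpers for illustration only; not part of any bot framework.
CANDIDATE_FORMATS = (("dd/mm/yy", "%d/%m/%y"), ("mm/dd/yy", "%m/%d/%y"))


def possible_dates(text):
    """Return every calendar date the text could mean, keyed by format label."""
    readings = {}
    for label, fmt in CANDIDATE_FORMATS:
        try:
            readings[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            # The text is not a valid date under this format (e.g. month 25).
            pass
    return readings


def is_ambiguous(text):
    """True when the two formats give two different calendar dates."""
    return len(set(possible_dates(text).values())) > 1


if __name__ == "__main__":
    for user_input in ("03/04/21", "25/12/21", "04/04/21"):
        print(user_input, possible_dates(user_input), is_ambiguous(user_input))
```

Inputs for which `is_ambiguous` is true are good candidates for test cases: the expected behaviour (ask the user to confirm, or apply a documented default) is a design decision that the test then pins down.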
What if they come here, they want to book a ticket from Bangalore to London, and they give the input as mm/dd/yy? The chatbot you are talking to has to understand and process that. If it is not designed to handle these scenarios, it will fail. It may book a different ticket altogether, and the day you turn up at the airport they will not allow you to board the flight. Think about that.
The next challenge is non-deterministic user interactions. Users may start with one conversation, have another conversation in between, and there may be no flow to it at all. You also have to remember the conversations you have had and give appropriate responses. There is also no barrier for the users: you cannot stop a user from asking you any other question. You have to make sure that if you don't understand, you fail gracefully and bring the user back to the conversation within your own capabilities. How do you do that, and that too through conversation?
Before you work out how to test it, it is better to understand how exactly it works. Once you understand how an application works, that is when you can strategize your testing, and that is what I did. I started understanding how the chatbot works. The main engine here is the NLP engine, which takes the text the human has entered, understands what the human is trying to do, and processes it accordingly. Then, as part of your implementation, you might want to go to your databases, get some information, and provide it. For example, if you want to search flights from a source to a destination, you might talk to an API or to your databases, fetch that information, give it to the user, and later on take the conversation ahead.
So you might want to handle all of these as well, but today we will only be talking about how you test the understanding of the language, how it is processed, and how you validate whether the bot is processing it and responding to you correctly or not.
In the next part of the conversation I explained to her: this is how it works, and what I need to do is understand how the sentences I enter as a conversation are understood by my bot. What the NLP engine does is tokenize all the words you have given and try to identify the intent. The intent is the goal the user is trying to achieve. Booking flights: I'm trying to book a flight. Checking the order status: I'm trying to check the status of my order. Those are examples.
Then what is an utterance? It is a different way I can ask for the same thing. I can say I want to fly from Bombay to Pune, I want to go from Bombay to Pune, I want tickets, give me tickets, show me tickets. These are all different ways I can ask a bot to do the same thing, so the bot has to understand the utterances.
What is an entity? The bot has to extract the variables. In "I want to fly from Mumbai to Pune", Mumbai and Pune are the source and the destination. It will extract these entities and then process accordingly: call the APIs, talk to the databases, and give you the output.
The next is the channel. The channel is nothing but where and how the user is actually interacting with your bot. You might have deployed your bot to any of the chat platforms, like WhatsApp, Skype, Kik, or Telegram, or it might be part of your website or part of a mobile application. That is the channel, and through the channel you connect to your engine, which is the NLP engine, and that in turn interacts with the APIs and the databases. So this is the entire setup.
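The three terms above (utterance, intent, entity) can be illustrated with a deliberately naive sketch in Python. This is not how a real NLP engine works internally: a real engine uses a trained model, while this toy version assumes simple word overlap for intents and a "from X to Y" pattern for entities. The intent names and the sample utterances are made up for illustration.

```python
import re

# Hypothetical training data: intent name -> example utterances.
INTENT_UTTERANCES = {
    "book_flight": ["i want to fly", "i want tickets", "give me tickets", "show me tickets"],
    "order_status": ["where is my order", "check my order status"],
}
MIN_OVERLAP = 2  # Below this many shared words, the bot falls back.


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def detect_intent(text):
    """Pick the intent whose example utterance shares the most words with the text."""
    tokens = set(tokenize(text))
    best_intent, best_score = "fallback", 0
    for intent, utterances in INTENT_UTTERANCES.items():
        for utterance in utterances:
            score = len(tokens & set(tokenize(utterance)))
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= MIN_OVERLAP else "fallback"


def extract_entities(text):
    """Extract source and destination from a 'from X to Y' phrase, if present."""
    match = re.search(r"from\s+(\w+)\s+to\s+(\w+)", text, re.IGNORECASE)
    if not match:
        return {}
    return {"source": match.group(1), "destination": match.group(2)}


def understand(text):
    """Combine both steps into the structure a test can assert on."""
    return {"intent": detect_intent(text), "entities": extract_entities(text)}


if __name__ == "__main__":
    print(understand("I want to fly from Mumbai to Pune"))
    print(understand("show me tickets from Bombay to Pune"))
    print(understand("hello there"))
```

The point for testing is the shape of the result: a test can assert on the detected intent and the extracted entities separately, instead of only on the final sentence the bot replies with.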
So this is how your typical request would look. The first thing that happens is that it takes the entire utterance; all the sentences are utterances. It breaks all the words into tokens and tries to find the intent. Next it extracts all the entities, and then it processes the request.
Once I understand how it works, the next question is how I start testing it. The first thing is definitely that I have to start designing my test cases. Second, I will definitely have to automate them. I cannot be running all my test cases manually across all the platforms where I have made my bot available and across all the builds. The next part, which is very important for any bot testing, is crowd testing.
As the first thing, the question still remains the same: how do I cover all the kinds of conversations any human would have? She started questioning me on the same thing again. I understood everything you told me. I understand how the bot works. I understand how NLP works. But still, how are you going to cover all the conversations?
What you need to do is split your bot testing into different categories, and for each category you will have to write tests. The first category is personality. Think about this: if I ask you to get a task done through a bot, like booking a flight, would you like to do that?
How many of you will trust that bot completely and blindly say, I am okay going ahead and booking my ticket? Do you have that trust? Why don't you have that trust? Because it is a bot. You don't want to talk to a bot, and you especially don't want to do any transaction with it. You don't trust it, right?
What if I tell you, as she thought, that there is a salesperson and you can do it through chatting? You will still be comfortable, right? Oh, I don't need to go to a UI, fill in everything, wait, and do all that. If I give you a chat application where you just open it and say, I want to go from this place to this place, this is the date, and they give you the tickets, wow. And if I tell you it is a sales representative, fine, you will be okay. So what does that mean? It should give me a human touch. The moment you understand that this is a bot, you will go away, and I'm not sure anybody would still want to go ahead with it. So first you have to give it a human touch.
Think about this: if you go to a counter, talk to a sales representative, and say, I want to book tickets, the first thing is that the person will have a name. They will tell you, okay, I can help you with booking the tickets, or, what do you want to do, these are the options I have. That is how the conversation goes. So the personality of the bot is very important. You have to design test cases for it and make sure that if your bot does not have a name, you give it one. The bot introduces itself to the person it is talking to: I am so-and-so, I can do this for you. Then the person knows what your capabilities are and what you can do for them, and then the conversation starts, rather than just saying hi. That is the first kind of test you will cover.
The second set of test cases you will design is for onboarding. You have to tell the user what your capabilities are, unless you are Siri or Google Assistant.
The next is intelligence. So in this intelligence
think of this: if I have to order food through a bot and I take the home delivery option, I have to give my address. There are some people who will give the whole address together, nicely formatted: the flat number, the road, the area, and everything. Then think about people like me, who give a hundred lines of sentences. I'll first give my flat number, on the next line I'll give something else, then I'll directly give my area, then maybe my building name, and then maybe the city. So the bot should understand how to process multi-step conversations. It also has to understand how long to wait for the user to complete the input, when to ask the user for more details, and when to start processing. These are the kinds of tests you'll have to cover as well. You'll also have to cover many utterances, asking the same thing differently.
Then comes the most important thing: error management. Think about this: if I ask you something and you don't know the answer, all you'll say is "I don't know". If every time I ask you differently you say "I don't know", how long will you keep having the conversation with me? Not long, right? So you'll have to fail gracefully. There are multiple ways to do it: okay, I don't know that, but maybe I can do this for you; if not, maybe you can contact this person, or you can contact us here. Fail gracefully.
Next is understanding. You'll have to understand small talk. I cannot directly jump to the task; I'll have a little conversation first. If someone is asking me how I am doing today, I should respond to it. If you are greeting me, I should greet you back. So the bot should do small talk. It should also understand abbreviations, especially with these teenagers. I, at least, don't understand what they say, but understanding teenagers is very important. They'll use all the shortcuts, they'll make a lot of spelling mistakes,
they'll use a lot of, I don't know, bad words, love words, and they'll send you a lot of emojis, pictures, I don't know what all. So you'll have to understand what that means. Your bot will have to be intelligent enough to understand this and respond appropriately. Cover these test cases as well.
The next is very important: speed and accuracy. Accuracy matters most. If I'm asking for something and you're giving me something else, a response alone is not what matters; the right response is what matters.
The next is navigation. This is again very important when it comes to conversational AI. In an application, if you want to go back to a previous screen, you can easily use the back button or say cancel. But think of bots. You are having a conversation and you want to go back to an earlier point in it. Suppose you are placing an order and then you realize you won't be at home at that time to take the delivery, so you say, oh no, wait, I want to do a pickup. What if the bot does not understand that it has to go back and change the request? You'll have to include these cases in your test cases.
Once you have designed all your test cases, the next part is test automation. You want to automate and you don't want to do all of this repeatedly. Think of all the utterances you have: would you like to sit and type all of these sentences and validate them every time? You simply cannot do that. So how do I automate? I looked at a lot of tools, and at some point I even decided to write my own script and put in certain assertions. But finally I found this tool, and I was very happy, because I have been a Selenium user for a decade and this tool has a similar syntax. I don't have to struggle a lot or put in a lot of effort to learn the tool before using it. So it is simple and Selenium-like. I wrote code, but you can even write your test cases without it: the test cases that you
wrote for having the conversation, you can use the same test cases, put them in a plain text file, and automate them. It's that simple.
So I started exploring this more, and this tool has a lot to offer. It has a command line, and it has integrations with Jasmine, Mocha, or anything else you want to connect with. It also has a beautiful UI-based tool, so if you do not want to do coding, or you are not a coder and you still want to automate everything, you can still do that with this tool. You can integrate it in your CI/CD build pipelines and just make yourself free.
I can show you how to do this; I'll be doing a demo. To install: this is a Node application, so the first requirement is that you have Node installed. Then for the CLI you do npm install botium-cli. Then, if you want to connect to any of your bots: there are a lot of bot engines, it may be Microsoft LUIS, it may be Google Dialogflow, it may be IBM Watson, and for any engine that you're using there are connectors available. All you do is connect to your bot, start writing your test cases, automate them, and put them on CI/CD. If you want to use Botium Box, which is the UI-based tool, you can do that as well. You can run it on your local machine, use any of the cloud instances, or install it on your own cloud.
This is what a typical setup looks like for Dialogflow; I'll just take you through it. This is my Dialogflow, and this is my bot agent. I have all my intents and everything mapped here. All I need to do is create a service account here and extract the information that Botium needs to connect to your bot. The information Botium looks for is the project name (you can give any project name; this is for your reference and for reporting purposes),
container mode has to be Dialogflow, and then you'll have to give the Google project ID, the service credentials, and the private key. You can get all of that from your service accounts: go to service accounts and take it from there. I have also given a reference link in my slides, so you can go through that documentation. It is a very simple step. Once you have that information you can connect to your bot and start testing your conversations.
What are the various ways you can write test cases? As I said, whatever conversation test cases you have written in plain text or Excel sheets (mostly people prefer to write them in Excel sheets), you can use the same sheets and put them on your CI/CD. All you need is a little effort to format them a bit, which I'll take you through. You can also put them in CSV files, or if you like writing code you can write test cases using JavaScript. It is very simple; it is just like your Selenium syntax.
I'll quickly give you a demo of how exactly we do this. I have created a small agent on Dialogflow, which is a coffee shop. All I need to do is order a coffee, any drink, or snacks; I can choose whether I want delivery or pickup, and then I can use the card and go ahead. I have not given card details or anything of that sort, but that is the simple flow I have used.
I'll show you through the Botium Box that I've already set up. So this is Botium Box, running on my local machine. Can you see this, or do you want me to increase the size? Is this visible? Okay. It's running locally, and if I want to register a bot, I can go to bots, register a new bot, and say what kind of bot it is. You have all the options here for whichever technology you're using. I'm using Google Dialogflow, so I have selected Dialogflow, and here is the information you need to give. You will get a
JSON file from the service accounts that you have on your Dialogflow. All you do is download that file and drop it here; it will automatically extract all the information it needs, and you can connect. I've already set this up, so here is my bot.
You also get a window here to do a live chat if you wish, so you can run your manual test cases here. Let's say connect, and I can start having a conversation. I can say, I want a tea, and then I say, I'll need a pickup. Simple conversations. Once you have had the conversation you might want to save it as a test case. You can click save as test case, give the test case a name, and say okay. Now you're ready to use this test case and put it up for automation. It is as simple as that.
What I've done is I've already created test cases, and these are called your test sets. Let us have a look at what kind of test cases I have already created and how you write them. You can write your test cases in this fashion. You have a text file where you record the whole conversation. The first line is the name of the test case. The second line is #me, and whatever you write after this is the text you want to send to the bot; this is your request. Then comes #bot, and whatever sentences you record here are the responses you want to assert on. So this is a simple scenario, a very straightforward test case where I ask for something and I'm expecting something. The conversation can go on like this.
Now, can you think of something here that may change and may make this fail? Yes: it says good morning, right? After an hour it will say good afternoon, and your test cases will fail. Is that a failure? It's not a failure. So what do you do? You want to parameterize it. In the automation world you would say, I want to parameterize this; I
don't want to use this fixed text, right? So what you want to do is extract all of that, put it in an utterances file, and use those utterances here. Utterances may be in the request you are sending: I may say hi, someone else may say hello, someone else may say hey. There are different ways to have the conversation, with the utterances we just spoke about. So I'll create utterances files: hello utterances, thank-you utterances, sorry utterances. In each one I'll list all the phrases I can use to say the same thing. For the bot section, I can list all the sentences a bot can respond with and put them in another utterances file. In the me and bot sections I will now not use plain text; I'll use these utterances files.
What will happen is that the utterances you have as part of me are treated as data-driven input. Every single utterance is sent to the bot, and the response is checked against your response utterances file. This is my first utterance: I send that request, look in the bot utterances file, and see whether the bot is responding with any of those sentences. If it is, that one has passed, and it moves on to the next utterance. So it is data-driven: it picks up all the utterances and validates them against all the responses you have recorded in your bot utterances. This way you can have a lot of combinations in a single file, and you need not write all your test cases separately again and again. This is the second step.
Then there is another way as well. If you talk to some bots, they will give you images, pictures, buttons, or links, and you want to validate those too. How do you validate them? Again the same format. All you need to use is buttons: in the bot section, if you say buttons, that is what the bot is responding with. And if you want to click on a button as a user, in the me
section you will say button and then give the name of the button that the bot is offering. That will actually trigger the click. It is that simple.
At the end of the day, after I have tested all the test cases I designed for the bot and the NLP, I also want to test the user interface. If the bot is integrated with an app or with your website, you want to check whether it is working there or not. You want to go to that website and actually do it. That is the next step. So this is how you can cover all the different cases and start automating.
I'll show some of the demo I've already created, these test sets. If you see, I have created this test set. I have two utterances files, one is hello and one is thank you, and I have this small talk test case. This is how I have written it: I have used utterances files wherever I want to match against any of the responses the bot gives and the response is not fixed. Once I have set all of this up, all I need to do is go to the test projects and run, or I can simply start running from here.
It failed, right? Let's see why it failed. In the utterances file I have not put "bye"; I have put "till next time", so "bye" is not there in my utterances file. It says, I was expecting this, but this is the response I got, and this is a failure.
Now what do I do here? Think of the scenario where you have trained your bot and added some thousands of utterances. You trained your bot, and now it starts using some other responses as well. What do you do? You cannot go to each utterances file every time you train and add that many responses. That will become tedious. So what I do instead is say: if this is the intent, make sure you respond with something that matches this intent. In your Dialogflow you will have certain intents, and you will have all the training phrases that
you have trained your bot for. As you keep on training, I cannot keep updating my test cases. Instead I'll say the response should match whatever responses you have trained for this intent. So I now put the assertion at the intent level and not at the sentence level. This way I can also validate that the bot is actually identifying the intent correctly and that it is picking up any new training data I have given it. I don't have to care about all the utterances I have recorded so far. That is the other way you can do it. You can cover each and every thing like this.
Once this is there, I also need to run my test cases through an interface. I will be running this on Sauce Labs. I have my browsers and all my devices set up. If you want to add more, or add your own Sauce Labs account, or a local Selenium grid, you can connect that and run against it as well. What you need to do is go here to device providers, set one up, and give the credentials for your Sauce Labs account, or for your local Selenium grid if you want, or your Appium for mobile devices. Next you register all the devices you want to test against; you list them here. In your tests all you do is label them: you label your test cases to say, I want to run these test cases on this device and this particular platform. Then I can start running these test cases.
I'll quickly show you on my Sauce Labs account. Hold on a second. Oh, okay, sorry. This one was actually for the buttons, so this is how you can capture the buttons and the images, using the buttons, links, and images assertions that we have. Sorry for pulling that up. I can just show you the test cases that I just
ran a few hours ago. This is the test case I ran, which I integrated with my Sauce Labs account. This one is on the browser, so it is integrated with your web browser. The test cases for mobile were on Appium; here it is. This was on the Android device I had selected. It picks the Android emulator, or whatever device you have set up, and runs on it. I have done it on Sauce Labs because I didn't want to set it up on my local machine and get into trouble. So it will connect to your Appium. You have connectors for Selenium, Appium, and all the NLP engines, so you can choose to run through your user interfaces on the apps, or you can choose to test only the connectors for whichever technology you have used in the backend.
So this is all about the automation. But it's still not ready to go live, right? Can you think why it is not ready to go live? You have done test case design, you have automated it, it is on the CI/CD build pipeline, you are getting faster feedback, everything is in place, but still you cannot go live and face the world. That's right, exactly. So how do you get that data?
As a tester I have my limitations in thinking of utterances and words. It depends on my personality, my background, where I come from, my language, everything. So I have a limited set of utterances that I can use. What I have to do is open it up to a different set of people who can give me more data. As I told you, I just don't understand teenagers, and I cannot behave like a teenager and collect all that data. To do that I have to go to the teenagers and ask them to give me this data. So what I'll do is hand-pick people and open it up for crowd testing. You'll have to hand-pick people from different backgrounds: people who are really tech-savvy, people who do not understand technology and are very simple users, people who are from different professions,
and from different age groups and geographies, so that you can collect as many utterances and as much data as you can. Then you can train your bot and say, okay, I have at least covered a little more ground and now I can go live. Next, once you have opened it for crowd testing and trained your bot on that data, what I want to do is monitor it and see how it is performing, right? I cannot just say that I'm done, my bot is there, it has been trained and it is performing. I also have to constantly see how I measure the quality. So you'll have to see the goal completion rate. Tell me this: how many times have you interacted with a bot and got what you wanted easily, without any frustration or irritation, the way you would in a human interaction? No, right? So you'll have to see how many people could complete what they wanted to do. Say you wanted to book tickets: if the bot starts asking you many questions and takes you here and there, you might not want to continue and you just quit. I have done that. So you'll have to see the goal completion rate, and that is where you can say that, yes, my bot is successful. The next is the self-service rate: how many times the bot did it all on its own and did not involve the human sales representative, versus how many times it asked you to talk to a representative, gave you the email addresses and phone numbers, and just shut down. So you'll also have to see how many times it was successful on its own and did not take any help. Error rates: how many times it did not understand the user and fell back to the fallbacks, as we call them, like when it did not understand and kept asking the same question. How many times it hit all the intents, and what the AI/ML success rates were: how many times it learned. User retention rate is how many times the same
user kept coming back to you; that is what you have to collect. And the most important thing is, after you have your conversation, get feedback from the user you have interacted with and see where you can improve. This is the most important thing, I feel, so that you can improve your quality and serve your customers better and better. So that is it, and now you can say go, put your bot live, and the bot can say hello. Do we have any questions? Yeah, so I was having a doubt. Say there is an Indian company which is implementing AI, and their customers are from outside the country, like the US and UK, or Japan and China. How are they going to hand-pick the audience and ask them to analyze or use the bot? Basically, how will we be able to collect the utterances? That was the doubt. Yeah, so the utterances that you have to collect have to come from that geographical location, because the user base is there. Most of the common utterances you may come up with yourself, but anything that is regional, the jargon or anything that they use, has to come from there. So crowd testing is the best way you can do it. Also, you have a lot of data sets available. Botium also has a lot of data sets available to you, based on the domain you are working on: if it is a banking app, or e-commerce, or beauty, or anything of that sort, they have sets, and they have them by geography as well; they have them in Dutch, they have them in English. I have gone through those and you can use them. Still, I believe that crowd testing is the most important thing when you want to collect test data for bots, and when you do this, hand-picking the right set of people is more important. You cannot do that in India, right? You have to do it there, you have to pick people there, and you have to cover as many people there as you can, with
the different mindsets, professions, and age groups, so that you can get relevant data. That is one answer. I guess we need to take the other questions offline with Shama. Sure. Thank you, Shama. Thank you.
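As a rough illustration of the monitoring metrics mentioned in the talk (goal completion rate, self-service rate, error rate, and user retention rate), here is a minimal Python sketch that computes them as simple ratios over conversation logs. The `Conversation` record, its field names, and the exact definitions are assumptions made for illustration; the talk does not say how the logs were structured or how each rate was calculated.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    """Hypothetical log record for one chat session (field names are assumed)."""
    user_id: str
    goal_completed: bool   # the user got what they came for
    handed_to_human: bool  # the bot escalated to a human representative
    turns: int             # total bot turns in the session
    fallback_turns: int    # turns where the bot did not understand the user


def rate(numerator, denominator):
    """Safe ratio that returns 0.0 when there is no data."""
    return numerator / denominator if denominator else 0.0


def bot_quality_metrics(conversations):
    """Compute the four rates; the definitions here are illustrative assumptions."""
    total = len(conversations)
    total_turns = sum(c.turns for c in conversations)

    # Count sessions per user to find the users who came back.
    visits = {}
    for c in conversations:
        visits[c.user_id] = visits.get(c.user_id, 0) + 1
    returning_users = sum(1 for n in visits.values() if n > 1)

    return {
        "goal_completion_rate": rate(sum(c.goal_completed for c in conversations), total),
        "self_service_rate": rate(sum(not c.handed_to_human for c in conversations), total),
        "error_rate": rate(sum(c.fallback_turns for c in conversations), total_turns),
        "user_retention_rate": rate(returning_users, len(visits)),
    }


# Made-up sample data, only to show the shape of the input.
sample = [
    Conversation("u1", True, False, 4, 0),
    Conversation("u1", True, False, 6, 1),
    Conversation("u2", False, True, 5, 3),
    Conversation("u3", True, True, 5, 1),
]
metrics = bot_quality_metrics(sample)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this sketch the error rate is counted per bot turn while the other rates are counted per conversation or per user; a real project might pick different denominators depending on what the logging captures.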