Welcome, everyone, to the "Testing Conversational AI: Strategy to Automation" session by Shama O'Gaulay. We are glad you could join us today. So, without further delay, over to you, Shama.

Thank you so much, Devesh. So let's start, people. I just want to ask one question before I begin: have you ever heard of Tay? If any of you have heard of Tay, you can give a thumbs up. Tay was a Microsoft bot released on Twitter, and I love this example whenever I talk about conversational AI. Back in March 2016, Microsoft launched this bot. It was designed to have conversations with Twitter users and to learn and mimic human behavior by copying their speech patterns. That was the intent, and it was supposed to engage with people between 18 and 24. Let's see what happened as part of that experiment. As usual, she started with a very sweet "hello world" message. Within a few minutes we could see that she had become a fan of Hitler, and a few minutes later that she wanted to teach Mexicans a lesson by building a wall. A little later she also became racist. That was enough damage to the brand, and they had to shut it down within 24 hours.

Let's see another example, which is the CNN news bot. This bot was designed to deliver news updates. The user tries to unsubscribe, and the bot says yes, you are unsubscribed, but the next day it continues to send alerts. So something went wrong, but what failed? I don't think the bot failed to understand. It understood and did its job, but I think the underlying integration failed here, right?

Another example is Poncho. This is a weather bot.
It is supposed to give you the weather information for a given location. In this conversation, I could clearly see that when the user asks about an umbrella, or asks for the weather on the weekend, it fails to understand. That is pretty straightforward language that we use on a day-to-day basis, like "weekend".

Now that we have seen enough examples of how bots can fail and what can go wrong, we need to understand what makes conversational AI so different from traditional software, and what the challenges are when we test it.

The first thing is that these are self-learning systems. What do I mean by self-learning systems? Most of them are built using natural language processing with machine learning and deep learning algorithms, and they are under constant training and improvement on a day-to-day basis. So for the tests that you write today, the expected outcome may change in the next run, because the model has been retrained. Having such a non-deterministic component in your system under test can make traditional software testing almost useless, because the behavior keeps changing and you cannot tell whether a changed result is a failure, a defect or an improvement. That becomes a challenge.

Secondly, when using a chatbot or a voice bot, there are no interaction barriers for the users. Compare that with a traditional web or mobile application, where the UI allows only predefined means of interaction: it has pages, buttons and links to navigate and to give information. In bots, you have to handle all kinds of unexpected user input in a decent way. On top of that, user interactions are non-deterministic. Human language, ways of speaking and texting, phrases, jargon and short forms give you tons of different possibilities, and that makes it really challenging to cover everything in your tests.

Also, and this is specifically for voice bots,
if you look at it, we have roughly seven and a half billion humans out there today, right? That means I have seven and a half billion voice samples with different textures. For a UI it does not matter who is clicking the button, who is calling the API or who is filling in the form, but for a bot it does matter, because the voice is in action.

Before we jump into how to strategize testing for this kind of application, let us see how it works in a nutshell, so that it makes our life easier and we can strategize better. This is how it actually works. The user input may come as voice or as chat text, from any platform: Skype, Twitter, WhatsApp or Telegram, or assistants like Alexa and Google Assistant. Next, the input from these platforms is read by your NLP layer. The NLP layer processes the input and may direct it to third-party or internal APIs, or to databases.

It tries to convert the user's speech or text into structured data. First it tokenizes the input, that is, it breaks the user's input down into a series of words called tokens, and these tokens are represented with different values in the application. Next it does sentiment analysis, where it tries to study what the user's experience is like, so that it can transfer the inquiry to a human whenever necessary. It also does normalization, where it tries to correct your typos and understand your phrases. So this is what the NLP layer does in a nutshell. Then it also does dependency parsing, which checks what the subject is, what the user is trying to do, and which entities it has to extract. Think of these
entities as being just like your variables. So this is what the NLP layer does for us.

Let us see an example before we go on, so that we understand it completely. Let us say that I am trying to book tickets, and this is the user utterance, that is, what the user gives as input: "I want to book tickets from Mumbai to London". What my NLP will do is try to understand the user's intent, which here is booking tickets. Next, it tries to extract the values, which we call entities. The entities here are the source and the destination, Mumbai and London. So now we have a clear idea of what an utterance looks like, what the intent of the user is, and what the entities are.

Now let us see how I can strategize testing this. In this talk we are going to explore strategies to design test cases, then we will move on to how to automate the tests that I have designed, and we will also see what other things I have to consider while testing such applications. My name is Shamal, and I have been working on testing bots, AI and NLP so far. I wanted to share my experience with you so that it helps you strategize better if you come across any such applications.

So let's begin. I'll talk about the test strategy first. We start with designing the test cases, that is, the manual tests. Then we move to automation testing and see how we automate. Once I have the results, I have to analyze the performance metrics to see how the bot is performing, define the KPIs, and then take a decision on whether I can productionize it or not. And last but not least, we do crowd testing, which is very important.
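
To make the booking example from a moment ago concrete, here is a minimal Python sketch of what "utterance in, intent and entities out" could look like. It is only an illustration: the keyword table, the function names and the regular expression are assumptions made for this sketch, and a real NLP engine uses trained models rather than hand-written rules.

```python
import re

# Hypothetical keyword table: an intent matches when every keyword
# is the prefix of at least one token in the utterance.
INTENT_KEYWORDS = {
    "book_tickets": ("book", "ticket"),
    "greeting": ("hello",),
}


def tokenize(utterance):
    """Lowercase the utterance and split it into word tokens."""
    return re.findall(r"[a-z']+", utterance.lower())


def detect_intent(tokens):
    """Return the first intent whose keywords all appear, else None."""
    for intent, keywords in INTENT_KEYWORDS.items():
        if all(any(tok.startswith(kw) for tok in tokens) for kw in keywords):
            return intent
    return None


def extract_entities(utterance):
    """Pull source and destination out of a 'from X to Y' phrase."""
    match = re.search(r"\bfrom\s+(\w+)\s+to\s+(\w+)", utterance, re.IGNORECASE)
    if not match:
        return {}
    return {"source": match.group(1), "destination": match.group(2)}


def understand(utterance):
    """Toy stand-in for the NLP layer: structured data from free text."""
    tokens = tokenize(utterance)
    return {
        "intent": detect_intent(tokens),
        "entities": extract_entities(utterance),
    }


if __name__ == "__main__":
    print(understand("I want to book tickets from Mumbai to London"))
```

For the utterance from the talk, this returns the intent `book_tickets` with `Mumbai` as the source and `London` as the destination. That structured result, rather than the raw text, is what intent-level test cases end up asserting on.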
We'll discuss crowd testing later. So let's begin.

In test case design, we are going to work with seven different categories, and we will design test cases for each of them.

The first category is personality and onboarding. Make sure your bot has a personality and a name associated with it. If I say Alexa, what comes to your mind? You know what Alexa is, right? It's a bot. If I say Google Assistant, you know exactly what it means. They have their own personalities and names, and I can relate to them. If I go and talk to a person whose name I don't know, the first thing I do as an ice breaker is ask the name, then greet them, and then start my conversation. The same should go for your bots as well. Give them a name. The personality matters too: a customer service bot has to be polite, while a sales bot has to be a little aggressive, right? Onboarding is where the bot tells you what it is capable of doing for you: it introduces itself and gives you options for what you can do. That way it minimizes the conversation, and you can get straight to the point because you know its abilities. So write all kinds of tests around that. If you see the example here, the bot jumps directly into asking for a zip code, and when the user tries to engage with it through small talk, it does not understand. All it wants is a zip code; it does not understand anything else. So keep in mind that the bot also has to engage in small talk, introduce itself, and then move on to the topic when the user is comfortable going ahead.

Next is intelligence. Under intelligence, you might want to test multi-step conversations. How does the bot remember things, for example?
If you are giving an address for a delivery, there are different kinds of people, right? One person might give every component of the address on a separate line, and there are people who type everything at once and then send it. The bot has to understand when to stop collecting and when to start processing. That is its intelligence, and you might want to test whether it handles this well. You might also want to test whether it remembers what was said earlier in the conversation. In the example here, the bot needed to go back through the conversation, see what information the user had already given, remember it and process it, instead of asking for the same information again at every single step. So it should remember what the user has already given as input. Remember, we don't have a UI here. I have to collect all the information I have, remember it, and then process all of it at once. So have those test cases as well.

Navigation is very important. With a web or mobile application, I can go back to the previous step by clicking the back button, and I can change the inputs I have given. With bots you cannot do that, because it is a conversation flow. Let us see this example. The user is placing an order, and when the user sees the store's opening times, he wants to change it to pickup. The bot failed to understand that and could not make the change from delivery to pickup, so the user could not go back.

Another example: the user is trying to do some shopping.
The user wanted to buy four bananas but by mistake says four apples, and then corrects it immediately. What the bot does is not correct the order but add four bananas and four apples. It did not understand that the user was going back in the conversation to make a change. So these are the kinds of test cases that you have to include as well.

Error management is again very important. In a UI you get an error message saying that this went wrong, please enter this, please enter that. But a bot is again a conversation flow. It has to understand what you are trying to say, and if it is not designed to do what you are asking, it has to give you options. That is failing gracefully. Think about talking to a person who, for every question you ask, simply says "Sorry, I can't help. Sorry, I can't understand." It is very annoying, right? The same happens with bots. If every response from the bot is "Sorry, I don't understand" or "I am afraid I don't understand", that is not the way you want to handle it. You might want to handle it by giving the user options: this is what I am designed to do, why don't you ask me these questions, which I understand? Handle the failures gracefully and try to move forward by giving the user options to pick from.

Understanding is again very important. Every user has a different way of texting. People use emojis, and nowadays you can even create your own emoji. So think of sending such emojis and checking whether the bot can understand them and respond. Also think of sending links and media such as videos and pictures, and of short forms, phrases, local language, and also context. For example, what do you understand by Selenium, if I ask you?
If I ask a person who works in the IT industry what Selenium is, their understanding is that it is an automation tool, correct? What about a chemist? If I go and ask a chemist what selenium is, his or her understanding is completely different: it is a chemical element, right? So understanding a particular word in its context is very important, and you have to test whether the bot understands the context. Have those test cases as well. If you see here, the user is expressing an emotion, and the bot correctly identifies it and responds.

Next is speed and accuracy. With a UI, if you are loading huge data you get a loader, and the user knows that the request is being processed. But with a bot, you ask something and the bot is processing it. You might want to see how your bot responds while it is processing the request, or when it is taking time. Why not tell your user, "Hey, I'm processing, just hold on"? I just love this example, where the bot tells you that the three people before you waited approximately two minutes to get this response, and then it engages the user in small talk and gives the response afterwards. That was beautifully handled.

Secondly, I don't want to throw just any response at the user without thinking; I want to answer accurately. For example, here it is fine that the bot engages the user with small talk. The bot asks, "How would you describe the term bot to your grandma?", and the user says, "My grandma is dead." Without understanding the sentiment behind it, the bot gives a normal, generic response: "All right, thank you for your feedback." That is completely annoying, right?
So you might want to test such scenarios. It is not important merely that the bot responds; what it responds with, and whether that is accurate in the context, is more important. Understanding the user's intent and responding to that intent is what matters, so have test cases around all of this as well. There is another example here: the bot just does not understand what the user is trying to say and continues sending messages.

So, that said, suppose we now have test cases for all the categories I can think of for a bot. The next step is to automate them. To automate, I am going to use Botium as part of this talk, and we will see how we can do it. In a nutshell, Botium is a platform: it is a suite of open source tools, and it also has an enterprise edition where you get even more features. It has a core that lets you run your test cases on the platforms where you are deploying the bot, like Slack, Telegram or WhatsApp. It also has a lot of connectors for NLP engines, in case your back-end NLP is Dialogflow, IBM Watson, Microsoft LUIS and so on. There are tons of them, and all are supported as part of Botium. You can use these connectors through the CLI, or you can have a beautiful UI, which is Botium Box. I'll show you that in the demo. You need not have any programming background or write any code as such when you are using Botium Box. But if you are writing code, you can use the CLI and put it under CI/CD as well, and through Botium Box you can also integrate with your CI/CD.

Let's quickly jump into a demo without wasting time.
So I have my files here, and what I am going to do is simply run them using the Botium CLI, with Mocha. While these test cases run, let me quickly show you a run from some time ago, and let us see the report. It executed all the test cases. I had around 15 different tests and two of them failed. The report says why each one failed: the bot response was this, the expected response was this. The same here: I was trying to test with a picture, and the bot failed to respond to a picture.

I'm sure you are confused about what I just did, right? I had a bunch of files here, I started running them, and I showed you some reports, but we don't know what happened. So let us go through it step by step.

First, I am going to set up Botium. It is Node based, so you have to have Node installed on your system. You can install it using npm, which is a package manager, with a simple command: npm install -g botium-cli, which installs botium-cli globally. Then I run init, which initializes the project and adds two files: one is the botium.json file and the other is the convo file. Then I can run the tests using Mocha.

Now, what are these files? The botium.json file is the file that Botium uses to configure and connect to your bot. I'm sure people are aware of Selenium and Appium, and if you are using them you know what capabilities are. Like the desired capabilities there, these are what you use to connect to your bot. We use exactly the same idea in Botium, and we put the capabilities in the botium.json file. I will show you a sample botium.json file. Here it is. I am giving the name of the project, and I am going to give the source,
that is, where my bot is, plus certain arguments. This is simple JSON. If I want to run locally, I might say that I want to run in a browser and connect to WebdriverIO. This is one of the connectors, with which you can run your test cases in a browser. You could also run your test cases on different devices and platforms by connecting to a tool such as Sauce Labs. You define all your capabilities here. If you look at this section, it is exactly like Selenium and Appium: you define all the capabilities with which it connects to your devices and runs the test cases.

Next is the conversation file. The convo files are nothing but your test cases. The test cases can be written in a simple, straightforward text format in Notepad, in an Excel sheet, in a CSV file, or whatever you are comfortable with. If you like coding, you can write JavaScript code as well.

So let us see how we write the test cases. This is the hello conversation file, where I am just trying to do a greeting. The first line is the name of the test case. Then there is a section called #me, followed by "hi", and then there is #bot, followed by the response. Whatever you give under #me is the user input that you send to the bot, and what follows #bot is the response that you expect from your bot. Think of #bot as an assertion and #me as simple input.

Now, if you think about a bot, do you think this test case is always going to pass? What do you think? Not always, right? Think of "good morning". If I ping my bot in the afternoon, it should not say good morning, but it might say good afternoon, and that is what I need to test. But how do I maintain all of these validations?
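
The #me / #bot structure just described can be pictured with a small Python sketch. This is not Botium's implementation: `parse_convo`, `run_convo` and the stand-in bots are hypothetical names used only for illustration, and the matching here is a plain equality check. It shows why a single hard-coded expected response breaks as soon as the bot answers "Good afternoon".

```python
# A convo script in the shape described in the talk:
# first line = test case name, then alternating #me and #bot sections.
HELLO_CONVO = """
greeting
#me
hi
#bot
Good morning! How can I help you?
"""


def parse_convo(text):
    """Split a convo script into its name and a list of (sender, message)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    name, steps = lines[0], []
    for line in lines[1:]:
        if line in ("#me", "#bot"):
            steps.append((line[1:], []))      # start a new section
        elif line and steps:
            steps[-1][1].append(line)         # message text of that section
    return name, [(sender, "\n".join(body)) for sender, body in steps]


def run_convo(text, bot):
    """Send each #me message to `bot`; compare each #bot line to the reply."""
    name, steps = parse_convo(text)
    reply, failures = None, []
    for sender, message in steps:
        if sender == "me":
            reply = bot(message)
        elif reply != message:                # the #bot section is the assertion
            failures.append((message, reply))
    return name, failures


def morning_bot(message):
    """Stand-in bot that always greets as if it were morning."""
    return "Good morning! How can I help you?"


def afternoon_bot(message):
    """Stand-in bot that greets as if it were afternoon."""
    return "Good afternoon! How can I help you?"


if __name__ == "__main__":
    print(run_convo(HELLO_CONVO, morning_bot))
    print(run_convo(HELLO_CONVO, afternoon_bot))
```

With the morning bot the failure list is empty; with the afternoon bot the same script reports one mismatch, even though the bot behaved correctly. That is the maintenance problem the question above is pointing at.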
Do I need to write a separate test case for every variation? No, not at all. Just the way we parameterize our automation test cases with other tools, I can do it here as well. I can create an utterance file and put in it all the possible ways that I can talk to my bot: I can say hi, I can say hello, I can say hey. I put in all of those possibilities and then use the name of that file in the test case. Botium then parameterizes the test case: it tests the scenario for every single input from the hello utterance file.

And how do I put assertions and validations here? I can use another utterance file and reference it in the #bot section. It will send each of the examples that you have put in the hello utterance file and then look for any of these occurrences in the response: good morning, good afternoon or good evening. If the bot responds with any of them, your test case passes, because the right answer might depend on the time as well.

Next, how can I send emojis, pictures or links? I can do that by giving the input using keywords. If you have seen, bots sometimes give you buttons or links to click on, and you can interact with them using the keyword button. You can also send a picture, an emoji, a link or a video using the keyword media. During validation you can also check what the bot is sending.

Next is where you want to run the test cases. You can run them in a browser, because bots can be embedded into web pages. You may have seen that when you are searching for something, you get a pop-up where a bot asks whether you need any help, right?
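
Going back to the utterance files described a moment ago, the parameterization idea can be sketched in a few lines of Python. The names `UTTERANCES`, `expand`, `check_greeting` and `greeting_bot` are hypothetical, and the case-insensitive substring match is an assumption of this sketch rather than a statement about how Botium compares responses.

```python
# Hypothetical utterance lists: one for the inputs, one for accepted replies.
UTTERANCES = {
    "HELLO_INPUTS": ["hi", "hello", "hey"],
    "GREETING_REPLIES": ["good morning", "good afternoon", "good evening"],
}


def expand(step_text):
    """Return all samples if the step names an utterance list, else the text."""
    return UTTERANCES.get(step_text, [step_text])


def check_greeting(bot, me_step, bot_step):
    """Try every input sample; a sample passes if the reply contains
    any one of the accepted phrases."""
    results = {}
    for sample in expand(me_step):
        reply = bot(sample).lower()
        results[sample] = any(phrase in reply for phrase in expand(bot_step))
    return results


def greeting_bot(message, hour=15):
    """Stand-in bot whose greeting depends on the hour of the day."""
    part = "morning" if hour < 12 else "afternoon" if hour < 17 else "evening"
    return f"Good {part}! How can I help?"


if __name__ == "__main__":
    print(check_greeting(greeting_bot, "HELLO_INPUTS", "GREETING_REPLIES"))
```

One scenario now covers three inputs and three acceptable replies, so the test passes at any time of day, while a bot that answers "Sorry, I don't understand" still fails for every sample.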
So a bot can be integrated and embedded in your web application, or it can be a separate bot hosted on one of the platforms like Slack or Messenger to help you with your orders, or it can be embedded inside your mobile application. For example, Swiggy is one of the food delivery applications in India. If you want to raise a return request or you want a refund, you can use the bot to connect to customer care, where behind the scenes a bot is responding. And if you have your bot on multiple platforms, you will want to test it across all of them. As I was showing you in the botium.json, I can mention all the different platforms and define the capabilities, and the test cases can then be run on the different platforms.

So what next? I have tested it, I have automated it, and I can put it on CI/CD as well. As the next step, I want to see how my application is behaving, based on the responses that I get. For that I have to remember that my models are self-learning, so not all my test cases will pass every time. I need to validate, do the analysis, and then decide, based on my analysis, whether to make this a production candidate or not. The underlying thing remains constant: I need to train my bot based on the kind of responses I see from the users. So I also have to strategize how I train my bot and how frequently I do it. And based on my analysis, I have to decide whether it is a bot failure or whether I need to improve the training of my bot. So let us see an example.
Let us say that I want to teach my toddler the first two letters of the alphabet, A and B. I train my toddler with these pictures. I show them continuously and repeat, and I want my toddler to memorize: A for apple, and this is the picture of an apple; B for ball, and this is the picture of a ball. After a week's time, I want to test whether my toddler has learned or not. So it is test time. I call my toddler and ask, "Hey, what is this?" And my toddler says, "This is an apple." Oh, my toddler has confused the ball with an apple. Okay, let me try the next one. "What's this?" And my toddler is now confused and says, "I'm sorry, I don't understand this." So what happened? I taught my toddler for a week and I felt that my toddler had learned, but when I test, the result is completely different. My toddler recognizes a ball as an apple and does not recognize the apple at all. Is it something to do with the learning capability, or do I have to improve my teaching skills? What do you think I need to improve? Yes, I might want to think about showing different samples of apples and different samples of balls, so that my toddler sees everything. So let's fix that and train again. Once I train with all the samples, my toddler will be in a position to identify an apple and a ball and to differentiate between them: it is not just about a red, circular object, and apples are not always red, right?

With this we understand that training data is very important. Your bot is only as intelligent as your data. So make sure you keep analyzing your data and keep training every now and then, and see how the bot is performing. But how do I know how it is performing? For that, you need to define the KPIs and evaluate those metrics.
What kind of KPIs can I use? In our regular testing we have defect metrics to see how the product is performing and what kinds of defects are coming in, a traceability matrix to see the coverage, and defect density. In the same way, for NLP I can track something called a confusion matrix. It gives me the false positives and the false negatives. In the previous example, a false positive is your ball being identified as an apple. Based on the confusion matrix I can derive multiple parameters, such as accuracy and precision, and these are the formulae that we can use. This can all be automated; we need not calculate it by hand for every run. And we have to do it on a regular basis. So how do I do that? I will show you on Botium Box, which I was talking about before.

The same thing that we did previously, we can do with Botium Box. I can have my chatbots registered here, and I can have all my test cases here. I can keep them in Git, connect to my Git repo, and then run the test cases by giving the name of the bot and the test set that I want to run. As part of the analysis there is something called Botium Coach, and with Botium Coach I can see how the bot is performing. This is the confusion matrix that I was talking about. It gives you the list of all the intents your bot identified for the utterances against the intents it should have identified. If everything is going fine, you will see the diagonal all in green. What this is telling you is that for the intent "how are you doing", whenever you asked, the bot responded with "how are you doing".
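
The diagonal idea and the derived metrics mentioned above can be sketched in plain Python. The intent labels and counts below are made up for illustration and are not the data from the demo; the talk names accuracy and precision, and recall is added here only because it is the usual companion of precision. Accuracy is the share of predictions on the diagonal, precision for an intent is the correct predictions divided by everything predicted as that intent, and recall is the correct predictions divided by everything that should have been that intent.

```python
from collections import Counter


def confusion_matrix(pairs):
    """Count (expected_intent, predicted_intent) pairs."""
    return Counter(pairs)


def accuracy(matrix):
    """Share of all predictions that sit on the diagonal."""
    total = sum(matrix.values())
    correct = sum(n for (exp, pred), n in matrix.items() if exp == pred)
    return correct / total if total else 0.0


def precision(matrix, intent):
    """Of everything predicted as `intent`, how much really was it."""
    predicted = sum(n for (exp, pred), n in matrix.items() if pred == intent)
    return matrix[(intent, intent)] / predicted if predicted else 0.0


def recall(matrix, intent):
    """Of everything that should have been `intent`, how much was found."""
    expected = sum(n for (exp, pred), n in matrix.items() if exp == intent)
    return matrix[(intent, intent)] / expected if expected else 0.0


# Illustrative results: one utterance that should have been
# check_earnings was predicted as check_balance (an off-diagonal cell).
RESULTS = (
    [("greeting", "greeting")] * 4
    + [("check_balance", "check_balance")] * 3
    + [("check_earnings", "check_balance")] * 1
    + [("check_earnings", "check_earnings")] * 2
)

if __name__ == "__main__":
    matrix = confusion_matrix(RESULTS)
    print("accuracy:", accuracy(matrix))
    print("precision(check_balance):", precision(matrix, "check_balance"))
    print("recall(check_earnings):", recall(matrix, "check_earnings"))
```

With these made-up counts, 9 of 10 predictions are on the diagonal, so accuracy is 0.9, and the single misrouted utterance pulls the precision of `check_balance` down to 0.75. Computing this after every run is what makes it possible to tell a regression from an improvement.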
But then, if you look at these two outliers: you were trying to check the balance, and for one utterance the bot mistakenly identified the intent as account balance check when it was supposed to be earning check. These are the outliers, and you have to find out why the bot did not understand and why there is this ambiguity. If you click on this, it gives you the prediction confidence, and it also tells you what the predicted intent was and what the expected intent was. If you click further, you see the utterances themselves, that is, the message contents. So there are small ambiguities that you can see here, and you might want to improve the training. Remember, when you are having a conversation with a bot, the user is always right, because the user knows what he or she wants; it is the bot that has failed to understand. This is the basic assumption that we always go by while doing the analysis. Try to analyze which utterances were not understood by the bot, and how we can train on them or how I can improve my NLP itself.

Let me also show you one of the runs which was pretty fair. I ran one test yesterday which had a couple of good samples, I would say. I can go to Botium Coach and analyze it, and I see a good number that are passing.
Typically, this is what you will see. But there is one utterance that the bot failed to understand: it was understood as banking view activity, but it was supposed to be banking account opening. And there was one instance where it mistakenly identified the utterance as "language bot", when it was supposed to be identified as a different intent. So this is how I can analyze and get into the intents and utterances to see how the bot is performing.

As the next step, I also have to make sure that I am not only testing the NLP layer. Everything that we discussed so far was how to test things specific to your NLP layer. But remember, all the other components of the application are still up and running. My NLP only understands the user's intents, extracts the entities and processes them. What it does next might be calling an endpoint, an API, which may be internal or external, then reading the response and giving it to the end user. The business logic still remains on the API side, so you want to test your APIs thoroughly as well. You want to do thorough performance testing to see how the bot handles load. You have to do security testing, because the user is entering data, including personal information, which is not supposed to be stored at your back end or used for anything else. So keep the data security aspect in mind as well. Also do database testing, because a lot of data is involved, and every time you fetch data from the databases you might want to test that thoroughly. And do all the other kinds of tests that we usually perform.
These are just a few things that I have picked up.

Last but not least, we also have to do crowd testing. As part of crowd testing, I select people of different genders, age groups, professions, languages, local languages and localities. I identify this bunch of people and ask them to test my bot, that is, just interact with it. With this I am able to check my chatbot's performance again, and I am able to collect as many different styles of utterances as possible from different people, because two different users might give two different inputs, and each has their own texting style. I want to capture that and see whether my bot understands it correctly and responds accordingly. At the end of the day, I collect all the utterances and do the analysis just the way we did here with Botium Coach: analyze the matrix, see what was correctly identified and what was incorrectly identified, and see whether I need to train my bot more with the different kinds of samples that I have received from the users, or whether I need to improve the algorithm because it failed to understand.

So these are the things that I do as part of automation and crowd testing, and that is it. I'm open to questions, people, if you have any.

We have a couple of questions so far. This one is from Satshi Teku: what are the security or privacy features by AI?
Especially when...

So, while implementing any application today, whether it is web or mobile, we have to take care of data security and the consent of the user as well, because a lot of data protection laws have come up. You have to keep the same thing in mind when you are having a conversation with a user in which you need to take personal information. You should not store it at the back end, and that security aspect is what you have to implement. So that is one aspect that you might want to test thoroughly: is the bot using that data anywhere? Is it publishing that data? You have to test these aspects, and you also have to make sure that it is not breaching any of the data policies of your country, and that you take consent from the user as well.

Okay, great. The other question is: are bots platform dependent, just like mobile apps are driven by what kind of phone and what kind of OS you have?

Bots are actually implemented using one of these NLP frameworks, like Dialogflow, IBM Watson or Microsoft LUIS. There are a lot of such platforms, and they each have their own way of implementing things. So you might be using one of these technologies to implement your bot, but they also support a lot of different platforms for you to deploy on. Just like a web application supports multiple browsers and is responsive, so that you can use it on mobile as well, in the same way you develop the bot using one of these frameworks and can then deploy it on multiple platforms. It can be inside one of your applications, it can be on the web, or it can be deployed independently.

Thanks, Shama, for sharing your experience with us today.