I'm Swapan Rajdev. I'm the CTO and co-founder of Haptic, and today I'm going to talk to you about conversational agents: how we build them, how we deploy them in production, and how we can really make them useful in certain use cases. So to start off, conversational agents, popularly called chatbots right now, are basically computer programs which can interpret and respond to statements made by users in natural language. What that really means is that the way we communicate with other humans, in a human-to-human conversation, is the same manner in which we should be able to interact with this computer program. And the way the computer program does this is by using natural language processing, and it communicates with the user over the internet or other protocols. The reason I say other protocols is because these can be over the phone. It can be over SMS. So anywhere you have communication with another person, you should be able to deploy a chatbot there. It could be voice. It could be text. Tomorrow there could be some other form of communicating; hopefully you can put it in there as well. There are plenty of examples of chatbots. Today we've seen Amazon's Alexa. We've seen Google Assistant. We've seen Siri on the iPhone. And we at Haptic also build chatbots for various different enterprises. So you've probably interacted with a chatbot somewhere, and I'm going to dig a little deeper into how we go about building them and what it takes to build them. Today there are a lot of companies and a lot of SaaS platforms which allow you to build a chatbot rather easily. You go on the platform, they tell you to follow a few steps, and you're able to build a chatbot. The idea of my talk is to dig a little deeper: not to talk about these platforms, but to talk about the technology behind them. How do you really build those chatbots? What are the different ways of doing it?
And what does it take for you to build your own chatbot without using one of these platforms? So there are essentially two approaches that a lot of people use. One is what we call the retrieval-based approach, which is also what in machine learning we call a rule-based system. The idea here is you build a dialogue management system, and if you go to look at it, it's like a state machine. You define a structure of the conversation. So you say, hey, I want this conversation to go in this format, and then the bot tries to follow that as much as possible. The main thing to realize here is that the replies the bot gives out are predefined. So someone has to go and define these replies, and then the bot will make sure it sends the right predefined reply back to the user. The other approach is a generative approach, where the bot really does not follow a certain structure. There is nothing for it to follow; it basically learns, and it is able to generate new replies. So it's like creating new content based on what the bot has learned over a period of time. Both of these approaches are commonplace in chatbots today, and the retrieval-based approach is actually the most common approach that a lot of these companies use. The platforms you go on are dialogue management systems; they tell you how you can build out these different flows. It's more popular than the generative approach. So let's dig a little deeper into both of these, and I'll help you understand a little bit more along the way. To start off, let's just imagine how a conversation goes and what the technicalities behind that are. When a user sends a message, the first thing that you really have to realize is that this is a conversation. What a conversation means is there is a back-and-forth dialogue going on. So when you say, hi, how are you? The other person replies, I'm good. How are you?
So it's a back and forth between two parties, and all the information that you need to understand does not come in one interaction. Over the course of the interaction, every single statement has a little bit more information that you have to keep gathering. So it's very important to realize that context management, which means you're able to use everything that's happened in the dialogue so far, is a very important piece of building great conversations. Once you've done your context management, then you go into pre-processing. Pre-processing is like in any other machine learning problem: you're supposed to massage the data a little bit, to prepare it in a way that your algorithms are able to understand it. So you do your pre-processing, then you go into intent detection. Intent is literally what the user really wants to get done. In this talk, I'm going to use the example of booking flights a lot. So the user says, hey, can you help me book a flight? And you're able to detect that this user wants to book a flight. So you try to detect the intent. Then you move on to entities. Entities are basically metadata around the intent that really help you get more information. So if you're booking a flight, you need data like, what's the origin city? What's the destination city? These cities, the dates, names of countries, names of fruits: these are all entities, which can vary within an intent. The intent remains constant; the entities keep changing. So you have to be able to detect the entities. Once you've done all of this, then you come into the dialogue management system. You've figured out, hey, I have my intent, these are my entities, this is the state. If you were to picture a flow chart, it's like, this is my state. That state really tells the bot, this is the action you need to do, this is the reply you need to send back. And then you send that back to the user.
So essentially, these are the four or five steps that are really required to be able to generate a reply back for the user. Digging a little bit deeper into this, the dialogue management system is at the core of how you build out the retrieval model. So just as I explained, the user asks, hey, I want to book a flight. Then you basically draw your flow chart: what's your origin city? What's your destination city? What dates do you want to travel? Then there's a question, is it a round trip? Because if it's a round trip, you need the return date; if it's not, then you proceed forward and go ahead and book the flight. So the dialogue management system really helps you turn the picture you have in your mind of a conversation into something a bot can interpret. Now, when you look at a flow chart, it's a very linear conversation, right? You first ask for the origin city, then you ask for the destination city. But when you talk to other people, when you're having a conversation, it's never that linear. The user could come and say, hey, I want to go from Bangalore to Delhi. So it's not, hey, what's your origin? What's your destination? The user has already skipped two steps. That's where the power of a dialogue management system comes in, because it's not really a simple flow chart. What it does is tell you to define your intent. So the user can say, I want to book a flight. The user can say, I want to buy cheap tickets. But the user could also say, I want to go from this city to this city on this date. And if you look at it, I've actually defined the cities in a different way, because these are what you really call slots. It's like filling in the blanks; you just have to figure out the blanks. At this point, the user could have given you only one blank.
The user could have given you all three in one statement. So you really have to figure out: how much information do I have, what do I not have, and what do I need? Your dialogue management system helps you first say, okay, what's the intent? What are the different ways the user could actually start this conversation? Then you go and define your entities, saying, for me to book a flight, these are the five different entities that I need. So you define your origin city, destination city, and so on, and for each one of them, you now define the reply. As I said, in the retrieval model the replies have to be inputted. So you put in your reply saying, hey, where are you flying from? If you don't have an origin city, the bot will reply with this. If you don't have a return date, it will ask, do you really want to fly back? These are the different ways you can define the replies that a bot will give. And once the bot has all this information, then it can go ahead and perform the action. So then it's like, now that you want to book a flight, I can go ahead and do this. To sum it all up, a dialogue management system really helps you define the flow of the conversation. It helps you define entities and intents and how the replies go back to the user. Apart from this, one very important part about a conversation is that a lot of natural conversations don't really stick to one intent throughout. So let's say you say, hey, I want to book a flight from Bangalore to Delhi. And then you realize, hey, I have a check-in bag, so now I want to know the check-in policy. So after asking, hey, can you help me book flights, you suddenly, in the middle of the conversation, move to, hey, can you also tell me what's the check-in policy for this airline? So now your intent has changed.
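The slot-filling behaviour just described, where the bot only asks for what is still missing, can be sketched like this. The slot names, prompts, and reply templates are all made up for illustration:

```python
# Toy slot-filling: for each required slot, a predefined prompt is sent
# only when that slot is still missing. Slot names/prompts are illustrative.

REQUIRED_SLOTS = [
    ("origin", "Where are you flying from?"),
    ("destination", "Where do you want to fly to?"),
    ("date", "What date do you want to travel?"),
]

def next_reply(filled_slots):
    """Return the prompt for the first missing slot, or confirm the booking."""
    for slot, prompt in REQUIRED_SLOTS:
        if slot not in filled_slots:
            return prompt
    return "Booking your flight from {origin} to {destination} on {date}.".format(**filled_slots)

# The user may fill zero, one, or several slots in a single message:
print(next_reply({}))
print(next_reply({"origin": "Bangalore", "destination": "Delhi"}))
print(next_reply({"origin": "Bangalore", "destination": "Delhi", "date": "Friday"}))
```

Because the prompts are keyed to missing slots rather than to a fixed question order, a user who says "Bangalore to Delhi" up front is only asked for the date, which is exactly the non-linear behaviour the talk describes.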
But if your dialogue management system cannot handle that, the bot will break. It'll say, I don't understand what you're saying. But that's not what is supposed to happen. The dialogue management system should allow you to move between intents within a conversation. So once you've got the check-in information, you can come back and say, okay, let's go ahead and book the flight, and the bot should be able to continue on. And that's what really makes a conversation natural, because this is really how we communicate. To give you an example, this is our dialogue management system. The text might not be very readable, but every single one of these nodes is essentially a finite state machine; all the intents and entities are defined under it. Each node is a separate intent. The arrows actually add a little bit more power, because let's say when you're booking a flight, after you've selected the flight, you now want to get the person's name, email, phone number, maybe some other information. So now you can say, after you've finished this part, move on to data collection. And you really can't move to data collection without first selecting a flight. So these systems can really help you build complex flows, which might need 50 or 100 different steps to complete a conversation, while at the same time helping you move around intents. You can move from one color to the other; these arrows help the bot move around. This is a simple one. We have complicated bots with almost 200 or 300 intents, arrows going all around, but that's really where the power comes in. So how does this all work? Getting a little bit more technical: the crux of NLP, the basic underlying piece, is what we call word embeddings. What this means is basically a numerical representation of the text that is coming in.
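A dialogue graph like the one shown, where intents are nodes and the arrows constrain which intent the conversation may move to next, might be represented as a plain adjacency map. The graph below is invented for illustration:

```python
# Toy dialogue graph: each intent is a node; edges say which intents the
# conversation may move to from there. The graph itself is invented.

GRAPH = {
    "start":           {"book_flight", "checkin_policy"},
    "book_flight":     {"checkin_policy", "collect_details"},
    "checkin_policy":  {"book_flight"},       # can come back and keep booking
    "collect_details": {"confirm_booking"},   # only reachable after flight selection
    "confirm_booking": set(),
}

def can_move(current_intent, new_intent):
    return new_intent in GRAPH.get(current_intent, set())

# Mid-booking, the user asks about check-in policy, then returns to booking:
print(can_move("book_flight", "checkin_policy"))   # allowed
print(can_move("checkin_policy", "book_flight"))   # allowed, conversation continues
# But data collection cannot be reached before a flight is selected:
print(can_move("start", "collect_details"))        # not allowed
```

The edges are what let the bot switch intents mid-conversation while still forbidding jumps that make no sense, like collecting passenger details before a flight exists.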
So whenever you're trying to do anything in natural language processing, just like how computers understand binary 0s and 1s, your algorithms need a mathematical representation; they need a way to understand the text in math. So the whole point of word embeddings is: how do you take a word, how do you take a sentence, and how do you convert them into a mathematical representation that your algorithms can understand? There are multiple different ways to do this, and I'm going to talk about a few of them. I'll start very simple, I'll go a little deeper, and then I'll also show you the power of word embeddings. The most simplistic approach is what we call frequency-based embeddings. One kind of those is a count vector, which is the example I'm showing here. Let's say you have two sentences: I want to book a flight, and I want to order pizza. You build a matrix with your columns being all the unique words that you have: I, want, to, book, a, flight, order, pizza. These are all your unique words. And then each row represents the number of times each word exists in that sentence. So you have 1s and 0s. It's only 1s and 0s in this example, but the counts could also be 2 or 3 and so on. So you can build this, and now you literally have a mathematical representation, a matrix, of the sentences that you use. You can start getting a lot more advanced from there. Obviously you can imagine that when you have a lot of sentences and a lot of unique words, it gets a lot more complicated, and you can narrow this down. Another very popular approach is what you call term frequency, inverse document frequency. What this means is, literally, words like 'a' and 'the' exist in a lot of statements and a lot of documents.
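Building that count-vector matrix for the two example sentences takes only a few lines of plain Python. This is a sketch of the idea, not a production vectorizer:

```python
# Build a count-vector matrix for the two example sentences above.
sentences = ["I want to book a flight", "I want to order pizza"]

# Columns: every unique word across the corpus, in a fixed order.
vocab = sorted({w.lower() for s in sentences for w in s.split()})

# Rows: how many times each vocabulary word occurs in each sentence.
def count_vector(sentence):
    words = sentence.lower().split()
    return [words.count(v) for v in vocab]

matrix = [count_vector(s) for s in sentences]
print(vocab)
for row in matrix:
    print(row)
```

With only two short sentences every count is 0 or 1, exactly as the talk notes, but the same code produces counts of 2 or 3 as soon as a word repeats within a sentence.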
The power of those words is not much, because they don't really add value, versus unique words such as flight, book, order, pizza, which add a lot more value to the sentence. So what this algorithm does is penalize the words which appear in a lot of documents, so the unique words end up getting a lot more weight than the common words. Those are a few examples of frequency-based embeddings; they obviously have advantages and disadvantages. Then you get to prediction-based embeddings, which are more probabilistic. What these actually do is try to look at the context of the sentence. In every single language, the meaning of a word can really change depending on where it appears in a sentence, and there are a lot of examples like that in English and in all languages. So this approach gives you a probability of what the next word in the sentence will be, and you start getting a lot more context into your word embeddings; now you're also taking the structure of the sentence into consideration. You can also go a little more advanced, which is what is a lot more popular today: deep learning based embeddings. You have libraries like Word2Vec, which give you a lot of different options along with deep learning. You also get into sentence embeddings, where not just words but your whole sentence can be represented in a certain way. And sometimes it also comes down to character embeddings, where every single letter has a mathematical representation that you can use. So it can get a lot more advanced, and there's a lot of material online that you can read to get deeper into it. But that being said, what's really the power of these word embeddings? That's something I wanted to highlight today.
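The penalty for common words can be made concrete with a bare-bones TF-IDF computation. This uses one common weighting scheme (raw term frequency times log inverse document frequency); real libraries offer several variants:

```python
import math

# Bare-bones TF-IDF over a tiny corpus: words that appear in every
# document (like "i", "want", "to") get weight 0; rarer words score higher.
docs = [
    ["i", "want", "to", "book", "a", "flight"],
    ["i", "want", "to", "order", "pizza"],
]

def tf_idf(word, doc, docs):
    tf = doc.count(word) / len(doc)                      # term frequency in this doc
    n_containing = sum(1 for d in docs if word in d)     # document frequency
    idf = math.log(len(docs) / n_containing)             # penalty for common words
    return tf * idf

# "to" appears in both documents, so its idf is log(2/2) = 0:
print(tf_idf("to", docs[0], docs))      # 0.0
# "flight" appears only in the first document, so it gets real weight:
print(tf_idf("flight", docs[0], docs))
```

This is exactly the penalization described above: the word that occurs everywhere contributes nothing, while the distinguishing word carries the signal.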
So as I said, every single word in every language has a particular meaning, and a word in a particular context could mean different things. There's a relationship in the structure of the sentence. So the power of word embeddings is really: how do you build the mathematical representation while still not losing the meanings of that word? Let's say we were to take words and boil them down into three-dimensional space, x, y, z axes, very simplified. If you were to do that, there are a few examples here: man comes at a certain point, woman comes at a certain point, king comes at a certain point, queen comes at a certain point. Now, what really matters here is that if you know how to get from the man point to the woman point, you can use the same thing to go from king to queen, because it's just male to female. So if you literally take king, subtract man, and add woman, you'll come very close to where the queen point is. That's the power of building really good word embeddings, because you can then start using these mathematical operations on top of them. Same thing with the other example: walking to walk, swimming to swim. To get the base form or past tense of a word, you know exactly what mathematical operation will take a word there. So you can really use the power of this while generating different replies for your user. The other very interesting thing is that words that mean similar things are very close to each other. So when you have countries, most of the countries will be around the same area; adding and subtracting a little bit here and there will get you from one country to the other. The capitals of those countries will be similar, and you know exactly how to go from one place to the other.
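The king/queen arithmetic can be demonstrated with made-up 3-d vectors. Real embeddings have hundreds of dimensions and are learned from data rather than hand-set; only the arithmetic is the same:

```python
# Toy 3-d "embeddings", hand-picked so the male->female direction is consistent.
# Real embeddings are learned; these values are invented for illustration.
vectors = {
    "man":   [0.9, 0.1, 0.2],
    "woman": [0.9, 0.9, 0.2],
    "king":  [0.1, 0.1, 0.9],
    "queen": [0.1, 0.9, 0.9],
}

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

def nearest(v, vocab):
    # Euclidean nearest neighbour within the toy vocabulary.
    def dist(u):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return min(vocab, key=lambda w: dist(vocab[w]))

# king - man + woman lands closest to queen:
result = add(sub(vectors["king"], vectors["man"]), vectors["woman"])
print(nearest(result, vectors))  # queen
```

Here the second axis plays the role of the gender direction, so subtracting man and adding woman moves any point along exactly that axis, which is why the same offset works for both pairs.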
So that's really where all of this ties in: these mathematical representations can really define how well your algorithms will work. The more time you spend on this, the better you make it, the less effort you have to put into complicated algorithms. As I said, once you've done your word embeddings, you've done a lot of the work that is required in building a retrieval pipeline. So you've preprocessed the text, you've built your word embeddings; now we're going to talk about how you get intents and entities. Getting an intent actually becomes a lot simpler: we basically just run a simple classification algorithm, or simple similarity-based approaches. To give you an example, let's say you have a bunch of data like I want to book a flight, I want to buy cheap tickets, and so on, spread across three different intents. You can build a KNN classifier, a very common classification algorithm, and you've now defined in a space where these three intents sit. When a new sentence comes in, if it's in the red area, it's intent one; if it's in green, it's intent two; if it's in purple, it's intent three. So you can use simple algorithms to start detecting intents without anything fancy. A lot of them you can use out of the box, though they obviously require tweaking. And if you've done your word embeddings well, it all adds up together. Entity recognition is again very similar; it doesn't have to be too complicated. You can, in fact, build simple named entity recognizers just based on regexes. So you can use regexes and start building entities, then move to probabilistic approaches and get a little more complicated. We get into conditional random fields and RNNs, because when you're building an entity, you don't always have all the possible values.
There are a lot of times when you can't really get all the possible values of a certain entity. So you start building on the structure of the sentence, which is, as I explained before: I want to go from city to city. So the minute you get to 'from', you know that what follows will be the departure city, and what follows 'to' will be the arrival city. So based on the structure of the sentence, you're able to recognize your entities and build those out. At Haptic, we've actually open sourced our chatbot NER, so if anyone is interested and really wants to use it, you can go to our GitHub page, host it, and use it anywhere. There's a nice cartoon by Dramitri Malkov, and I really love this joke: a person is saying, hey, this is espresso, but I ordered cappuccino. And the bot replies, don't worry, the cosine distance between them is so small that they're almost the same thing. So, as we spoke about, this is how you get the retrieval approach built out. The advantage of this is that the precision and the accuracy that you get, even without a lot of data, is really good. A person can go and enter 15 different variations of how an intent could be started, and that should be enough to start getting really good precision and accuracy. So you don't need too much data to start building these out. And you can keep training and retraining them, and they're really fast to retrain again and again. At Haptic, in one day, our models get retrained almost 150 times, and a person can do it at the click of a button. Within a minute, it's retrained and can get deployed into production. So it's really performant in that sense; it can help you make your chatbot better and better over the course of a day. And it gives you great performance: we're able to generate replies in less than two seconds for every single message that comes into our system.
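The two NLU pieces just described, similarity-based intent detection and structure-based entity recognition with regexes, can be sketched together. The training phrases, patterns, and fallbacks are invented for illustration; Haptic's open-sourced chatbot NER is far more complete:

```python
import re
from collections import Counter

# Training phrases per intent; invented for illustration.
TRAINING = {
    "book_flight": ["i want to book a flight", "book flight tickets"],
    "order_food":  ["i want to order pizza", "order some food"],
}

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    common = set(a) & set(b)
    num = sum(a[w] * b[w] for w in common)
    den = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
    return num / den if den else 0.0

def detect_intent(text):
    # Nearest-neighbour over training phrases, like the KNN idea above.
    bag = Counter(text.lower().split())
    best_intent, best_score = None, 0.0
    for intent, phrases in TRAINING.items():
        for phrase in phrases:
            score = cosine(bag, Counter(phrase.split()))
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent

def extract_cities(text):
    # Regex on sentence structure: "from <city> to <city>".
    m = re.search(r"from (\w+) to (\w+)", text.lower())
    return {"origin": m.group(1), "destination": m.group(2)} if m else {}

print(detect_intent("can i book a flight"))
print(extract_cities("I want to go from Bangalore to Delhi"))
```

The regex relies on sentence structure ("from ... to ..."), not on a list of city names, which is exactly why this style works when you can't enumerate every possible entity value.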
And the biggest advantage, which some people really don't look at as an advantage, is that when you're building useful bots, you can control the narrative of the bot. When you're building good bots, what we really focus on is that a bot should have a personality. So when you're writing your replies, you can make them witty. Instead of saying, hey, can you tell me a destination city, you can say, hey, Goa sounds like a great option, so let me know if you want to go there. It makes it a lot more exciting, and the user starts getting engaged. You can really build the personality of a bot because you control the narrative. Apart from that, when you're building for enterprises, for big companies, say you're building a chatbot and you feed it data to learn from, all from news articles and Wikipedia, and then you realize that of late the news articles have been talking about your competitors a lot. The bot will now start learning about your competitors and start replying about your competitors. And that's not right. In a company, that's actually very, very important. So when we build for companies, if the bot by mistake starts replying about something completely wrong, it can cause us a lot of trouble. So controlling the narrative matters because it's business-case driven. Even though it doesn't sound very fancy, it works really well and gets the problem solved. Obviously, the disadvantages are that it requires a lot of manual data entry: someone has to go enter the data, someone has to go enter the replies. It also fails on unseen data. If the user asks something that the bot has never seen before, it will end up failing, saying, sorry, I don't really know what you're talking about. And another obvious one is that it doesn't really work in an open domain.
So you can't just come and ask it about anything and everything, because the bot is built around a certain domain and a certain set of conversations, so you can't expect it to work for everything. So that's the retrieval model. To take away the disadvantages of the retrieval model, you then get into the generative approach, which is: how can you start generating replies? How can the bot really learn and start building new replies without you having to enter so much data? The basic premise behind generative replies is a language model, which is a piece of software or an algorithm that learns how sentences are built in a body of text and then uses that to create new content. So it learns, okay, this is how this language works, this is how things happen, and then it's able to start building new sentences which no one has ever told it about. The way it works for chatbots is you take a lot of your historical conversations, you feed them into your generative model, and it's supposed to learn from those conversations. And when it learns, it can really come back, create its own replies, and it doesn't need the guidance of the dialogue management system we saw earlier. So if you go to look at it, this is what people call true learning, because it learns from previous data, and that's how you build out the generative approach. One of the most popular and useful models, specifically for chatbots, is an algorithm called Sequence to Sequence. There are these three fine people whose names I can't pronounce, and I don't want to mispronounce them. They wrote a paper called Sequence to Sequence Learning with Neural Networks.
The paper was actually written for translation: how do you take sentences in a particular language and convert them into a different language? But this algorithm works really well for a lot of different NLP tasks, chatbots being one of them. So if you're able to learn from your previous data using this algorithm, it really starts performing and giving you new replies. It's a deep learning based algorithm; it basically uses recurrent neural networks, which could be LSTMs or GRUs. The basic premise, to get a little more technical, of the way this algorithm works is that you take a sequence of words. If you go to look at it, any sentence is actually a sequence of words. You take that sequence of words and you give it to an encoder. What this encoder does is take those words, run the algorithm, and produce a thought vector. The thought vector is then fed into a decoder, which then produces the reply. Now, the reason this algorithm is very good for chatbots, and the way it really works, is that if you look at the diagram, there's an example of 'are you free tomorrow', and the reply is 'yes, what's up'. First you take the word 'are' and put it into a neuron. The output of this neuron, whose input was 'are', comes out, and you feed that output into the next neuron along with 'you'. So now the output of 'are' along with 'you' is fed into this new neuron, which then outputs another thing, and then you take the combination of those two and put it in with 'free', and then with 'tomorrow'. So the thought vector that comes out, the output after the encoder, has taken into consideration each and every single word in that particular sequence and the order in which they came. So if you, let's say, change the order of the words a little bit, say to 'tomorrow, are you free'.
The thought vector will be very different, because your first words were different, so everything downstream will be different too. That's how the encoder works. Then on the decoder side, it does similar things. Now you have this representation of your input sequence; it's fed in, and then one by one the decoder starts giving you the reply. It gives you every word of the reply, but every word of the reply affects what the next word of the reply is going to be. That's why it's really called sequence to sequence: it takes a sequence and generates a sequence. The encoder really encapsulates every single thing about the input: the information in it, the order of it, and how moving words around affects it. And the decoder just takes that and tries to produce a sequence of outputs. As I said, these are recurrent neural networks; you can use LSTMs, you can use GRUs. To give you a little bit of an example, at Haptic we tried a lot of them. We're actually using GRUs now instead of LSTMs, because that just works better for us and is a little more performant. So, to get into how we really use it for chatbots: obviously, as I said, a conversation is a back-and-forth dialogue, so you really have to take into consideration the whole sequence of sentences that have come in a conversation. To give you an example of how we do this, let's say you have a conversation where someone is asking someone else to grab a coffee tomorrow. It starts off with: are you free tomorrow? That's one training example. We'll take 'are you free tomorrow', put it into our algorithm, and expect the output 'yes, what's up'. But then there's the next sentence, which is the continuation of the conversation.
Now the next training example will be: are you free tomorrow? Yes, what's up? Want to grab coffee? So you've taken all three sentences, made them into one sequence, and put them in, because your reply should factor in all of these sentences. It's not a command system where you take one sentence and one reply and just train on that. You actually have to build these permutations and combinations so that you can really generate a conversation, not just a command-line system. That's how we really use it in chatbots, that's how we're deploying it, and it works pretty well. Because this model is able to generate its own replies, you're able to take all the previous sentences in a conversation, get context into your conversations, and generate a reply without really having to put in a lot of data or tell the bot what flow to follow. The best part is that it works on unseen data. To give you an example: let's say, while you were training, you had a sentence 'happy Diwali' and the output for it was 'happy Diwali to you too'. Now, when you've put this live and a user comes and says 'happy new year', the bot knows how to reply to this. It says 'happy new year to you too'. So even though you never defined Diwali or new year as an entity or anything along those lines, it has figured out that these might mean similar things, and it can generate its replies based on that. So it really works on unseen data. One of the pleasant surprises we had while using this is that, without us intending it, our chatbot learned Hindi. Some of our training data actually had a bunch of conversations that had happened in Hindi. And when we fed it in, one day someone was testing the bot in Hindi and said, hey, this chatbot knows Hindi. So it was a pleasant surprise how that happened.
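Building training examples the way just described, where each reply is paired with everything said before it in the conversation, can be sketched as:

```python
# Turn one conversation into seq2seq training pairs: each turn becomes a
# target, and all the turns before it (joined together) become the input.
def training_pairs(turns):
    pairs = []
    for i in range(1, len(turns)):
        context = " ".join(turns[:i])   # everything said so far
        reply = turns[i]                # the next turn is the target
        pairs.append((context, reply))
    return pairs

conversation = [
    "are you free tomorrow",
    "yes what's up",
    "want to grab coffee",
    "sure see you at 10",
]
for context, reply in training_pairs(conversation):
    print(repr(context), "->", repr(reply))
```

A single four-turn conversation yields three training pairs, each with a progressively longer input sequence, which is what lets the model condition its reply on the whole dialogue rather than just the last message.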
It can also have its disadvantages. As with the retrieval model discussion, it could start generating replies that you really don't want the bot to give, and it can get tricky to control these things. The good thing is that the sequence to sequence model has become very popular. Libraries like Google's TensorFlow and Keras provide these out of the box. You can start using them, and obviously, when you want to get more advanced, you can keep tweaking them. The disadvantages here are, first, that you need a lot of data. We actually had almost two to two and a half years' worth of conversations, generated over the life of the company, to feed into this for it to make something meaningful. So you require a lot of data, and it takes a lot of resources and time to build. We had to train it on GPUs, so it took real resources. It's getting a lot better now with TensorFlow, and you can optimize a lot of it, but it does take time, so you can't really retrain it again and again. And as I said, it's very hard to define precision boundaries, so sometimes it can say things you don't want. We have had examples along those lines, so we had to build a layer on top of this to remove replies that we don't want. We actually had to manually take those out.
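The filter layer mentioned here, which removes generated replies you never want to send, could be as simple as a blocklist check with a safe fallback. The patterns and fallback text below are invented for illustration, not Haptic's actual rules:

```python
import re

# Illustrative blocklist: generated replies matching any pattern are
# discarded and replaced with a safe fallback.
BLOCKED_PATTERNS = [
    re.compile(r"\bcompetitor\b", re.IGNORECASE),
    re.compile(r"\bguarantee\b", re.IGNORECASE),  # no promises the business can't keep
]

FALLBACK = "Let me connect you with an agent for that."

def filter_reply(generated_reply):
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(generated_reply):
            return FALLBACK
    return generated_reply

print(filter_reply("Sure, I can help you book that flight."))
print(filter_reply("Our competitor might be cheaper."))  # gets filtered out
```

In practice such a layer can grow beyond regexes into a classifier of its own, but even a blocklist restores some of the narrative control that the retrieval approach gives you for free.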
So this is the architecture we've put together at a high level. The user comes in and asks a query; you extract your entities and your intents. The graph model is essentially the dialogue management system: it checks whether there's a state through which it can find an answer, then checks whether it has all the required entities, and takes an action accordingly. If the dialogue management system doesn't know what to do, you fall back to the generative side, where the neural network generates a response that you give back to the user. So you use the two in tandem. Obviously, as I said, when you first build a brand-new bot, you really can't use the generative approach because you don't have enough data. So the way we actually work is: start with the retrieval model, build up the data, and once you have the data, train your generative model and build on top of that. Great. Apart from all of this, what I also wanted to touch on is a slightly higher-level machine learning problem that I really like to talk about: I have my model now, but now what? I need to put it into production and put it to good use. A lot of times I've seen people say, "My model's ready, I'm done, my job is over," but that's not true. Putting it into production and making good use of it is part of the exact same problem; you need to make sure it's actually working.
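The retrieval-first, generative-fallback flow described above can be sketched like this. `handle_retrieval` and `generate_reply` are stand-ins for the real dialogue management system and seq2seq model:

```python
def handle_retrieval(query, states):
    """Toy dialogue manager: return a predefined reply if a state matches."""
    for pattern, reply in states:
        if pattern in query.lower():
            return reply
    return None  # no state in the conversation graph matched

def generate_reply(query):
    """Stand-in for the seq2seq model's prediction."""
    return f"(generated reply for: {query})"

def respond(query, states):
    # Try the rule-based dialogue management system first...
    reply = handle_retrieval(query, states)
    if reply is not None:
        return reply
    # ...and fall back to the generative model when no state matches.
    return generate_reply(query)

STATES = [("book a flight", "Sure, where would you like to fly?")]
print(respond("I want to book a flight", STATES))
print(respond("happy diwali", STATES))
```

The key design choice is that the predefined, controllable path always wins when it applies; the generative model only handles queries the graph has no answer for.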
A lot of times what happens is you say, "I want to put this into production," and then you realize you don't know how your pipeline works. Do I put this on a server? Do I put it somewhere else? Is it a Lambda function? You don't know, because you never thought about it, and now you're asking how to go about doing this. There's also a very fine balance between accuracy and performance. You will continuously have this battle: "I'm getting good accuracy, but the performance in production is not the best," so you have to find the right balance there. And then there's maintaining the model. When you put out a model, put out a chatbot, don't expect it to work wonders on the first day. Just like any other software development, it's an iterative process: you've got to keep maintaining it and keep building on top of it. How do you go about doing that? If the generative model takes five hours to train, that's one of your constraints right there. So it's important to think about these things, because you will face these problems when you try to put the model to use. A couple of things to really think about before, or while, you're building: have some requirements defined. Simple things like, does this have to be used in real time, or can I use it offline? By offline, I really mean that I can take my own sweet time; the results of this model can take 12 hours to come out and that's okay. Or do I have to make sure it's real time? Especially in chatbots, as I said, our benchmark is under two seconds; no reply can take more than that.
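One way to enforce a reply-latency budget like that two-second benchmark is to run the prediction under a timeout and fall back to a canned reply when it's missed. This sketch uses Python's `concurrent.futures`, purely as an illustration of the idea rather than our actual serving stack:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

REPLY_BUDGET_SECONDS = 2.0  # benchmark: no reply may take longer than this
FALLBACK = "One moment, let me get back to you."

def reply_within_budget(predict_fn, query, budget=REPLY_BUDGET_SECONDS):
    """Run predict_fn(query), but never make the user wait past `budget`."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(predict_fn, query).result(timeout=budget)
    except TimeoutError:
        return FALLBACK  # model too slow: send a safe canned reply instead
    finally:
        # Don't block waiting for the slow prediction to finish.
        pool.shutdown(wait=False)

fast_model = lambda q: f"echo: {q}"
print(reply_within_budget(fast_model, "hi"))  # prints "echo: hi"
```

A real system would also log the timeout so the slow queries can be investigated, but the shape of the trade-off (accuracy versus bounded latency) is the same.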
So how do you build this whole architecture, all these models, to run in real time? Have your benchmarks on accuracy and recall. As I said, when you're finding your balance between performance and accuracy, you need to set your baseline: "Once I achieve this accuracy, I think we're in a good place. If I can get the right performance, I can put it into production and keep improving from there." It's really important to have those benchmarks. It's also important to understand what kind of systems you have. Obviously with the cloud and AWS, a lot of these resources have become easily available, but they are expensive; you can rack your bill up quite high if you don't think about it. So you've got to know how much memory you have, how many CPUs you can use, what kind of machines you can spin up. And as I said, one of the biggest questions is: how often can you retrain? How often do you need to retrain? Can it actually be retrained that often, and what kind of systems does that take? You have to take all of these things into consideration. One thing that has become standard practice is to separate training and prediction. Your training can take 10 hours, it can take 12 hours, but once you have your model, the real-time part, the actual prediction of what the output should be, has to happen in real time. So you split them up, deploy them differently, and use them differently. Building on that, one of the things we've done over time is take our ML models and expose them as APIs.
With the microservices era we're in right now, this becomes a lot simpler. You take your model, which is essentially a pickled file: you can load it into memory, store it somewhere, and whenever you want, you read it and just call model.predict to get your replies back. So you build an API around it, build a service around it, and keep it as a separate thing. We're a Python shop, so we used Python and Django REST Framework to build an API; anyone who wants to use it hits, say, /haptic/intent_detection, and that API calls the model and gets you the response back. The reason this works is, number one, separation of services, but it also makes it a lot, lot easier to scale: you put your web service behind a load balancer, and whenever traffic goes up, you just spin up new instances. A couple of things to keep in mind if you're trying this approach. We've loaded our models into memory: some of our models take up less than 100 MB of memory, some take a few hundred MB, some take a gigabyte and above. It varies depending on your model. So when you're loading a model into memory, you've got to check how much memory you have: can I actually load this? And obviously, if you're running multiple workers in a particular service, you can only launch as many workers as your memory allows. You could also put the models in Redis, which we've done. Redis is great for reads, and it makes a lot of things easier, so depending on your requirements it can work well.
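The core of that model-as-a-service pattern, stripped of the Django plumbing, is just serializing a trained model and loading it once at worker startup. The `IntentModel` below is a toy stand-in for a real trained classifier, used only to show the pickle round-trip:

```python
import pickle

class IntentModel:
    """Toy stand-in for a trained intent classifier."""
    def __init__(self, keyword_to_intent):
        self.keyword_to_intent = keyword_to_intent

    def predict(self, query):
        for keyword, intent in self.keyword_to_intent.items():
            if keyword in query.lower():
                return intent
        return "unknown"

# Offline: train (here, just construct) the model and serialize it.
model = IntentModel({"coffee": "make_plan", "refund": "support"})
blob = pickle.dumps(model)  # in production this would be written to disk/S3

# Online: the API worker loads the pickled model into memory once...
loaded = pickle.loads(blob)

# ...and every request handler just calls model.predict.
print(loaded.predict("Want to grab coffee?"))  # make_plan
```

The request handler behind an endpoint like the hypothetical /haptic/intent_detection would simply wrap that final `predict` call in HTTP.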
The challenge with Redis is that once your model becomes really big, the read latency becomes really high, and then your real-time performance really slows down. That's the trade-off that comes with Redis. But however you deploy, a couple of requirements always come up: how do you update and swap your models in real time? I've created a new model that's better than my old model; how do I swap out the one that's serving? One thing I've realized a lot of people don't do is version their models. Just like you have Git to source-control your code and keep versions of it, version your models; it's important to keep track of these things. Then, depending on the approach you've taken, you can see how to update or swap. If the model is in memory, you've got to keep in mind that you have to restart your workers, and workers take a little while to load these models into memory, so during that small window you might not be able to reply. Redis makes these swaps a lot easier, but as I said, Redis has its own advantages and disadvantages. So keep all of this in mind when choosing your approach. Google has recently come out with TensorFlow Serving: if you're using TensorFlow, it's a service that takes this whole headache away from you. It's high performance, it's flexible, and it handles versioning and swapping, so it's an interesting thing to check out. And lastly, automate. Automate all of the things we've spoken about, because you cannot scale manually. You can't go and manually restart all your servers for deployments, and you cannot manually test your accuracy, your precision, how well it's working. You've got to automate a lot of this, so do that from the get-go.
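Model versioning and hot-swapping can be sketched as a small in-process registry like the one below. This is a simplified illustration of the idea, not TensorFlow Serving or our production setup:

```python
import threading

class ModelRegistry:
    """Keep versioned models and atomically swap which one serves traffic."""
    def __init__(self):
        self._models = {}    # version string -> model object
        self._active = None  # currently serving version
        self._lock = threading.Lock()

    def register(self, version, model):
        with self._lock:
            self._models[version] = model

    def activate(self, version):
        with self._lock:
            if version not in self._models:
                raise KeyError(f"unknown model version: {version}")
            self._active = version  # the swap is a single reference update

    def predict(self, query):
        with self._lock:
            model = self._models[self._active]
        return model(query)

registry = ModelRegistry()
registry.register("v1", lambda q: "old reply")
registry.register("v2", lambda q: "new reply")
registry.activate("v1")
print(registry.predict("hi"))   # old reply
registry.activate("v2")         # swap with no worker restart
print(registry.predict("hi"))   # new reply
```

Because the old version stays registered, rolling back is just another `activate` call, which is exactly what versioning buys you.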
We've been bitten a lot of times where we didn't automate. We now automate everything: our regression tests on bots are automated, our deployments are automated. We also do phased rollouts, where you deploy to a certain few users, a few bots, a few models, so that you get data on how it's working, and then you spread it out. It's very important to keep all of these things in mind so that you don't have to face these problems later. So, lastly, Haptic. We've been around since 2013. As I said, we build bots for enterprises, mainly around customer support, lead generation, and feedback. We've built over 100 bots, and around 30 million devices have had over one billion interactions with our bots. Great, so that's what I had. Thank you very much.
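The phased rollouts mentioned above can be approximated by deterministically bucketing users so that a fixed percentage sees the new model. The hashing scheme here is an illustrative sketch, not our actual rollout system:

```python
import hashlib

def in_rollout(user_id, percent):
    """Deterministically decide whether a user is in the new-model cohort.

    Hashing the user id means the same user always lands in the same
    bucket, so they don't flip between old and new models per message.
    """
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

def pick_model(user_id, rollout_percent=10):
    return "new_model" if in_rollout(user_id, rollout_percent) else "old_model"

# The same user always gets the same model for the duration of a rollout.
print(pick_model("user-42"), pick_model("user-42"))
```

Ramping the rollout is then just raising `rollout_percent` once the metrics from the early cohort look healthy.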