Hello, okay, so let's get started. In today's presentation I'll be talking about building a production-level conversational interface for natural language processing using an open source platform called MindMeld. The presentation itself is generally about how to build production-level conversational assistants, so it's not specific to one technology; the best practices I'll be presenting today can be applied to other open source projects as well. Just a brief background on myself: I am a senior machine learning engineer on the WebEx Intelligence team at Cisco, and before that I was working at a startup called MindMeld, where we built conversational assistants for Fortune 500 companies like Starbucks and Uniqlo. Currently I build voice assistants for Cisco's conferencing units. I think we all probably know by now that conversational interfaces are already deployed in both consumer and enterprise use cases: about 20 percent of all searches are voice searches, and around 50 percent of all searches are projected to be voice searches by 2020, according to comScore. We also know that, at least at home, you have Alexa, Cortana, Google Assistant, and Siri being used pretty much everywhere. This trend is moving to workplaces as well, and at Cisco we built a product called WebEx Assistant, where you can say something like "Okay WebEx, join the meeting" and the conversational interface will join the particular meeting for your particular room. That entire WebEx Assistant experience is powered by the open source platform called MindMeld that I'll be discussing today. One of the challenges of building a production-quality conversational assistant is that there are many gotchas that come up when building such a product.
One is that accuracies can be hit and miss: when you say something like "call an Uber," sometimes your assistant will infer that Uber is a person's name, and it won't handle the query correctly. There are also issues with common slang: abbreviations and lingo can get misinterpreted. For instance, when you say "show me an apartment in FiDi," sometimes the assistant won't know that FiDi means the Financial District, and it will classify the query incorrectly. Preset rules also break when the user asks something a bit more challenging. For instance, when the assistant says "confirm order for one Cobb salad" and you say "I want to also add a dressing on the side," it won't really understand, because it doesn't know that this is a follow-up to the main query. And finally, shallow knowledge definitely fails. For instance, if an assistant was trained only to show a particular movie by title, then when you say "show Oscar-winning movies" it won't be able to produce the more sophisticated result of finding all new movies that are Oscar winners. So these are some issues that come up when building a conversational assistant, and we'll be talking about how to solve them using a pipeline. The machine learning tools today are roughly divided into two categories. On one side you have cloud NLP services: technologies like Dialogflow, Amazon Lex, IBM Watson, and so on. Again, this is just a general classification. On the other side you have more specialized ML toolkits like scikit-learn, TensorFlow, Keras, and so on. At least in our case, when we reviewed some of the cloud NLP services, we found they offer quite a lot of prebuilt models and prebuilt domains that are very useful when starting up a particular application.
But when you have custom domains that are specific to your enterprise or consumer use case, they break down at that point, because many of the prebuilt entities just aren't supported in your custom use case. The tools for building these custom domains can only handle very narrow models and trivial applications. For instance, in Dialogflow it's very easy to onboard, click a button, and train your entire machine learning model; but if you want to change certain hyperparameters in your particular model, you're not really given the option. So the configurability for a custom application, which is required in order to get very good accuracies, is not really supported. And the ML capabilities there only scratch the surface of what you need to build a production application. For example, an entity like a person name is very hard to detect because it can be very noisy: a person's name could be almost any sequence of characters. To detect that accurately, you probably need a different type of model than what you'd use for classifying a whole sequence of words, i.e., text classification. Mixing and matching these models is really hard to do with the cloud-native platforms. On the machine learning toolkit side, you have very powerful technologies like TensorFlow, scikit-learn, and so on. However, when you want to string together all the pieces, machine learning, information retrieval, and dialogue systems, it becomes a very challenging experience to bring them all together yourself. And so, for a software developer coming in to build a sophisticated conversational assistant, it's really hard to do from scratch using just these technologies.
So that's why we built the MindMeld conversational AI platform. It involves four different components, which we'll be discussing: the natural language processor, the question answerer, the dialogue manager, and the application manager, all talking to each other. It's a Python-based framework used to build conversational assistants, and it's what powers WebEx Assistant within Cisco. It's open source under the Apache 2.0 license, and it leverages open source libraries like TensorFlow, scikit-learn, NumPy, and SciPy in the back end to drive the entire experience. It provides functionality to build state-of-the-art models for natural language understanding and question answering. The GitHub URL is just cisco/mindmeld. It's open source; we have a bunch of pull requests and support from contributors around the world, and it's currently being used to power applications both within Cisco and outside it. So let's talk about the first component I mentioned, the natural language processor. It contains a sequence of subcomponents used to infer the meaning of a user's query, and it produces a representation that captures all the salient information within the query. It applies a combination of techniques involving pattern matching, text classification, information retrieval, and parsing in order to get the right result. What we generally found is that just using an end-to-end approach, for example a deep learning model, to predict a result is not really effective, especially for a production-level application. You need a combination of pattern matching, even regexes for that matter, and machine learning in order to get the right results, because there will always be edge cases you need to take care of.
The first subcomponent within the NLP module is the domain classifier. This classifier distinguishes between different bodies of knowledge. For example, a query like "what's the weather in San Francisco" belongs to the body of knowledge about weather, as opposed to sports. It's a fairly simple text classification problem that uses machine learning to predict which domain a particular query belongs to. The next subcomponent is the intent classifier, which classifies the verb of the query: what's the action you're trying to take? Obviously, not all queries contain verbs, but the intent classifier is generally used for picking an action for a particular query. For example, for a query like "can I find a Mediterranean restaurant in San Diego?", the intent is something like find restaurant, and again you'd use a text classifier to predict it. Just to make it clear, both the domain and intent classifiers are text classifiers: you take a sequence of words and predict one label for the whole query. The next classifier is a bit different; it's called the entity recognizer. Here, we're trying to isolate and classify a phrase or a particular word within a query, so it's slightly different from the previous models. A phrase could be a restaurant name like Starbucks or McDonald's, which are single-word names, but it could also be a multi-word phrase like "Persian restaurant," or a movie title. These are sequences, so the model here is going to be a bit different.
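To make the text classification step concrete, here's a minimal sketch of a bag-of-words naive Bayes domain classifier in pure Python. This is only an illustration of the technique, not MindMeld's actual implementation (MindMeld uses scikit-learn and TensorFlow models under the hood), and the training queries and labels are made up:

```python
from collections import Counter, defaultdict
import math

class NaiveBayesTextClassifier:
    """Bag-of-words naive Bayes: the simplest form of the text
    classification used for domain and intent prediction."""

    def __init__(self):
        self.class_word_counts = defaultdict(Counter)
        self.class_totals = Counter()
        self.vocab = set()

    def train(self, labeled_queries):
        for text, label in labeled_queries:
            words = text.lower().split()
            self.class_word_counts[label].update(words)
            self.class_totals[label] += 1
            self.vocab.update(words)

    def predict(self, text):
        words = text.lower().split()
        n = sum(self.class_totals.values())
        best_label, best_score = None, float("-inf")
        for label in self.class_totals:
            # log prior + sum of log likelihoods with add-one smoothing
            score = math.log(self.class_totals[label] / n)
            counts = self.class_word_counts[label]
            total = sum(counts.values()) + len(self.vocab)
            for w in words:
                score += math.log((counts[w] + 1) / total)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training data for a two-domain assistant
domains = NaiveBayesTextClassifier()
domains.train([
    ("what's the weather in san francisco", "weather"),
    ("will it rain tomorrow", "weather"),
    ("who won the giants game", "sports"),
    ("show me the basketball scores", "sports"),
])
print(domains.predict("is it going to rain in san francisco"))  # weather
```

The same "sequence of words in, one label out" shape applies to the intent classifier; only the label set changes.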
It's also going to be a bit more powerful than the previous models, because for every single word in the query you're predicting a label for what that entity is. So generally speaking, we use a more complex, heavier-duty machine learning classifier here, something involving an LSTM, to recognize the entities. There's also a role classifier, which handles a subproblem of entity recognition. For instance, when you classify a particular time entity, you may want to label it as a start time or an end time. When you see something like "set the alarm at 5 a.m.," 5 a.m. could be a start time, but if you see "show me movies between 5 a.m. and 7 p.m.," then 5 a.m. is the start time and 7 p.m. is the end time. These are subclassifications of an entity. Next up, we have the entity resolver. Here, you take a particular abbreviation or entity and resolve it to a canonical name within your business application. For instance, SF, which is an abbreviation of San Francisco, would be handled by this component, and "lemon bread" would be resolved to "iced lemon pound cake" within your particular application. We use Elasticsearch, another open source technology, to do this, and it primarily uses term frequency-inverse document frequency (TF-IDF) to retrieve the canonical name for your query. This is already baked into MindMeld, so this entity resolution comes for free. The next step is language parsing. This is important because you also want to group entities that are logically connected. Take a query like "I want a pepperoni pizza with extra cheese, a calzone, and two Diet Cokes from Domino's on Geary."
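As an illustration of the resolution step, here's a toy resolver that scores canonical entries by IDF-weighted token overlap, a much-simplified stand-in for the TF-IDF ranking Elasticsearch performs. The menu entries and synonym lists are hypothetical:

```python
import math
from collections import Counter

class EntityResolver:
    """Resolves raw entity text ("SF", "lemon bread") to a canonical
    name by scoring each entry's tokens (canonical name + synonyms)
    against the query, weighted by inverse document frequency."""

    def __init__(self, entries):
        # entries: {canonical_name: [synonym phrases]}
        self.docs = {}
        for canonical, synonyms in entries.items():
            tokens = set()
            for phrase in [canonical] + synonyms:
                tokens.update(phrase.lower().split())
            self.docs[canonical] = tokens
        df = Counter(t for tokens in self.docs.values() for t in tokens)
        n = len(self.docs)
        # Rarer tokens (low document frequency) get higher weight
        self.idf = {t: math.log((n + 1) / (df[t] + 1)) + 1 for t in df}

    def resolve(self, text):
        query = set(text.lower().split())
        def score(canonical):
            return sum(self.idf.get(t, 0.0) for t in query & self.docs[canonical])
        best = max(self.docs, key=score)
        return best if score(best) > 0 else None

# Hypothetical knowledge-base entries
menu = EntityResolver({
    "Iced Lemon Pound Cake": ["lemon bread", "lemon cake"],
    "Chocolate Chip Cookie": ["choc chip cookie"],
    "San Francisco": ["SF", "san fran"],
})
print(menu.resolve("lemon bread"))  # Iced Lemon Pound Cake
print(menu.resolve("sf"))           # San Francisco
```

In production, Elasticsearch handles this ranking with far richer analyzers (stemming, fuzziness, field boosts), but the core idea of matching against a synonym-expanded canonical entry is the same.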
Again, you can isolate all the different entities: pepperoni pizza, extra cheese, calzone, two, Diet Coke, Domino's, and Geary. These are all entities of interest within your application. However, only a certain grouping of them makes logical sense. Pepperoni pizza and extra cheese are grouped together; calzone is its own group; Diet Coke and two are grouped together because you're talking about two Diet Cokes, not two of anything else. These entity groupings are important, and this kind of language parsing is actually a well-studied problem; a lot of research has gone into how to group entities together. We built a heuristic within MindMeld that does this grouping automatically, based on rules specified in your configuration. So that covers the natural language processor itself: it has all these subcomponents that parse out all the information you need from a particular query. The next component is the question answerer. This is your heavy-duty knowledge base, used to get the right candidates within your particular application. When I talked about entity resolution, that was essentially talking to the question answerer to find the right resolution for an entity. For us this uses Elasticsearch, but you can use your own knowledge base if you prefer a different technology. Again, this uses information retrieval techniques: you're trying to match a particular piece of text to a canonical name within your application. The dialogue manager is a very important part of a conversational application.
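A minimal sketch of this kind of rule-based grouping heuristic (not MindMeld's actual parser): quantities attach to the nearest dish entity after them, and options attach to the nearest dish before them. The entity types and character offsets here are made up for the example query:

```python
def group_entities(entities):
    """Group flat entities into head/child structures using two simple
    directional rules: an option modifies the nearest dish before it,
    a quantity modifies the nearest dish after it."""
    heads = [dict(e, children=[]) for e in entities if e["type"] == "dish"]
    for e in entities:
        if e["type"] == "option":
            candidates = [h for h in heads if h["start"] < e["start"]]
        elif e["type"] == "quantity":
            candidates = [h for h in heads if h["start"] > e["start"]]
        else:
            continue
        if candidates:
            nearest = min(candidates, key=lambda h: abs(h["start"] - e["start"]))
            nearest["children"].append(e)
    return heads

# Entities detected in: "I want a pepperoni pizza with extra cheese,
# a calzone, and two Diet Cokes" (offsets are illustrative)
entities = [
    {"text": "pepperoni pizza", "type": "dish",     "start": 9},
    {"text": "extra cheese",    "type": "option",   "start": 30},
    {"text": "calzone",         "type": "dish",     "start": 46},
    {"text": "two",             "type": "quantity", "start": 59},
    {"text": "Diet Coke",       "type": "dish",     "start": 63},
]
for head in group_entities(entities):
    print(head["text"], [c["text"] for c in head["children"]])
```

Real parsers generalize this with per-entity-type attachment rules in configuration, but the "attach dependents to the nearest eligible head" idea is the core of the heuristic.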
Here, you're matching a particular domain and intent to a particular dialogue state, so that you can give a response back to the user or take an action based on the intent. All of this happens within the dialogue manager, which then moves the conversation forward to the next dialogue state. You can think of it as a finite state machine tracking where you are in a particular interaction with the user. For example, when you say "I want five pizzas," the bot responds "Okay, great, what else do you want?", and the person says "two coffees," that's a follow-up query handled in the same dialogue state. You need this state machine to control the sequence of actions taken to service a particular request, and that all happens in the dialogue manager. Again, this is provided within MindMeld itself, so you can design your own dialogue state system and give the appropriate response back to the user, whether that's a follow-up or a confirmation of an action you want to take. Finally, you have the application manager, which is the orchestrator that talks to all the systems. For instance, if you're making both a Facebook chat bot and an Alexa bot, you'd want both clients to interact with the same application manager, because the platform itself is generic. It doesn't have to be tied to Google Assistant or only to Alexa; it can be a general platform served in the cloud, and the application manager enables that by syncing with clients, so you can talk to different clients at the same time and return responses to the appropriate client when necessary. So again, this is very generic.
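The routing described above can be sketched as a small dispatcher, loosely modeled on the decorator-style dialogue handlers that frameworks like MindMeld use. The domain, intent, and handler names here are hypothetical:

```python
class DialogueManager:
    """Minimal dialogue manager: routes each (domain, intent) pair to a
    registered handler and keeps per-conversation context, so a
    follow-up like "two coffees" lands in the same ordering flow."""

    def __init__(self):
        self.handlers = {}
        self.context = {"cart": []}

    def handle(self, domain, intent):
        def register(fn):
            self.handlers[(domain, intent)] = fn
            return fn
        return register

    def respond(self, domain, intent, entities):
        fallback = lambda ctx, ents: "Sorry, I didn't get that."
        handler = self.handlers.get((domain, intent), fallback)
        return handler(self.context, entities)

dm = DialogueManager()

@dm.handle("ordering", "add_item")
def add_item(context, entities):
    # Accumulate items across turns in the conversation context
    context["cart"].extend(entities)
    return "Okay, great, what else do you want?"

@dm.handle("ordering", "place_order")
def place_order(context, entities):
    return "Placing your order for: " + ", ".join(context["cart"])

print(dm.respond("ordering", "add_item", ["five pizzas"]))
print(dm.respond("ordering", "add_item", ["two coffees"]))
print(dm.respond("ordering", "place_order", []))
```

A production dialogue manager would also track which state the conversation is in and restrict which transitions are legal, but the dispatch-plus-context pattern is the essence of the finite state machine described above.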
You can deploy this on any cloud infrastructure, or even on-prem, and it can talk to any client that interacts with it. So let's briefly walk through an example interaction. When a user says "schedule a meeting with Janice from accounting from 11 a.m. to noon in Aquarello," the bot, if successful, responds "your meeting with Janice has been scheduled for 11 a.m. today." We'll trace the lifecycle of this query through the conversational application. First, the domain classifier: it classifies this query into a particular domain. In this case it's the meetings domain, not expenses, procurement, or travel, so the classifier gives a very high confidence score for the meetings domain, which is correct. The next step is classifying the intent. The intent here is schedule meeting, not cancel meeting or check calendar, and hence the confidence score for schedule meeting is the highest. The entity recognizer does a lot of heavy lifting here: it detects that within this query, Janice is a person, accounting is a department, 11 a.m. is a time, noon is a time, and Aquarello is a room. There's a distinction between entity types here. One kind is a system entity; this is lingo used by Dialogflow as well as MindMeld for natively supported entities that you don't have to train your application on. Time entities are like this: 11 a.m. and noon should be detected by a conversational assistant out of the box, because they're generic entities that should always be supported. Something like Aquarello, a room name, is not a generic entity.
So you would need to add your own training data to detect these custom entities within your application. A person name could be generic, or it could be more specific to your application; in our case, we actually use other techniques to detect person names, for instance looking at the entire organizational hierarchy within the enterprise to detect them more reliably. Next up is the role classifier. Here, we only have roles for the time entities: 11 a.m. is the start time and noon is the end time. Once that's done, we call the knowledge base and get the resolved entity for every isolated entity that was detected. In this case, Janice is resolved to Janice Smith with her particular employee ID, accounting is resolved to corporate accounting, 11 a.m. and noon are each resolved to the right UTC times, and Aquarello is resolved to the particular meeting room ID within your organization. This again uses Elasticsearch. Finally, we have the language parser. Here, you group entities that are logically connected: Janice Smith and corporate accounting are linked together because you're referencing the Janice Smith who's part of corporate accounting and not, say, HR. The times and the meeting room are also logically connected, because a meeting room has certain bookable times. And finally, the dialogue manager transitions to a particular dialogue state, in this case book meeting. Within that dialogue state, you can have the logic for gathering all these entities together and finally making a call to book that particular meeting room. So right here, you have an invitee: you do a question_answerer.get against an index of employees to fetch the employee ID and department.
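Putting the pipeline stages together, the fully processed query might be represented something like the structure below. The field names, IDs, and timestamps are all hypothetical, shown only to illustrate the shape of the output and how a dialogue handler can pull out what it needs:

```python
# Hypothetical pipeline output for the example scheduling query
processed_query = {
    "text": ("schedule a meeting with Janice from accounting "
             "from 11 a.m. to noon in Aquarello"),
    "domain": "meetings",
    "intent": "schedule_meeting",
    "entities": [
        {"text": "Janice", "type": "person", "role": None,
         "value": {"name": "Janice Smith", "employee_id": "e-1042"}},
        {"text": "accounting", "type": "department", "role": None,
         "value": {"name": "Corporate Accounting"}},
        {"text": "11 a.m.", "type": "sys_time", "role": "start_time",
         "value": {"utc": "2019-06-01T18:00:00Z"}},
        {"text": "noon", "type": "sys_time", "role": "end_time",
         "value": {"utc": "2019-06-01T19:00:00Z"}},
        {"text": "Aquarello", "type": "room", "role": None,
         "value": {"room_id": "rm-207"}},
    ],
}

def get_entity(pq, type_, role=None):
    """Fetch the first entity of a given type (and optional role)
    from a processed query, or None if it wasn't detected."""
    for e in pq["entities"]:
        if e["type"] == type_ and (role is None or e["role"] == role):
            return e
    return None

start = get_entity(processed_query, "sys_time", role="start_time")
print(start["value"]["utc"])
```

A book-meeting handler would look up the invitee, room, and time range this way, and then pass the resolved values to the calendar call.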
So you have the invitee and you have the location, and finally you can make an external call to your calendar's API to book that meeting room with that invitee. Obviously, this call is to an external service; it's not included within the generic MindMeld platform, but you can imagine making this call to Google Calendar or Outlook to get the final result of the query. The next step is the natural language response. Within that particular dialogue state for book meeting, you have to fill some slots in order to give a response back to the user. The slots are the blanks you see here: "Your meeting with ___ has been scheduled for ___." If a slot is not filled, you would re-prompt the user to fill it. This task is called slot filling, and it's done in essentially all conversational applications. You fill the slots and then give the response back to the user; in this case, "your meeting with Janice has been scheduled for 11 a.m. today." So with those slides, I've briefly gone through the lifecycle of a query in a sophisticated conversational AI platform. The MindMeld platform itself is open to all developers. It's a Python-based framework, and it brings together machine learning toolkits, information retrieval, and dialogue systems to build the entire application. We have extensive documentation with tutorials as well as practical advice on how to build such an application. The community is growing; we actually open sourced only about a month and a half ago, so it's a new project that's still getting established in the larger community.
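The slot filling step can be sketched as follows: scan a response template for its slots, and if any slot is missing, return a re-prompt for it instead of the final response. This is a minimal illustration with a first-missing-slot policy; the template and prompt strings are made up:

```python
import string

TEMPLATE = "Your meeting with {invitee} has been scheduled for {time}."
PROMPTS = {
    "invitee": "Who should I invite?",
    "time": "What time should the meeting start?",
}

def fill_slots(template, slots):
    """Return the natural language response if every slot in the
    template is filled; otherwise re-prompt for the first missing one."""
    needed = [field for _, field, _, _ in string.Formatter().parse(template)
              if field]
    for name in needed:
        if not slots.get(name):
            return PROMPTS[name]
    return template.format(**slots)

print(fill_slots(TEMPLATE, {"invitee": "Janice", "time": "11 a.m. today"}))
# Your meeting with Janice has been scheduled for 11 a.m. today.
print(fill_slots(TEMPLATE, {"invitee": "Janice"}))
# What time should the meeting start?
```

Real systems prioritize which slot to prompt for and validate the filled values, but "check required slots, re-prompt until full, then render" is the core loop.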
It also has prebuilt applications called blueprints, which you can use to play around with, for instance, a home assistant or a food ordering bot. So please feel free to use MindMeld for your next voice or chat bot. You can find all the details at mindmeld.com. I'll now open it up for any questions on MindMeld, open source, voice assistants, and so on.

Yes. Yeah, so in this case, all you're trying to find is whether there is a person entity within that particular dialogue. Here you're filling it with Janice and a time, right? So it needs to satisfy two entities: there is a person entity and there is a time entity, and once you have both of them, you can fulfill the request. It's not doing a combinatorial exploration of all possible names; it's just checking whether the required entity types are present in order to fulfill that query. The number of entity types is generally not that large, we're not talking about tens or hundreds of entity types, so it's a much easier problem: just figuring out which entity types are needed to fulfill the request.

Yes. Yeah, so that's actually a very good question, and it's why we think having an open source platform for this is more helpful than what's in the market right now. Generally, what we find is that text classification, which covers domain classification, is quite an easy problem: you're taking an entire sequence of words and making just one label classification for it. So generally we're talking about between 500 and 1,000 queries to train such a classifier. This is not much data at all, and you can build quite a good production system with it.
So for domain and intent classification, we generally don't need that many queries, and we use simple models like logistic regression. Entity classification requires a lot more data, because it's a lot more powerful: you're making a classification for every single word. For that, it depends on whether it's an open entity or a closed entity. An open entity is something like a person name, where there can be effectively infinite variations of what the name could be; for open entities, we're talking about between 1,000 and 10,000 queries to train the classifier. A closed entity could be something like time, where you don't have that many variations, and for those types we generally need much less training data to get good classification accuracy. And we're targeting classification accuracies of over 95% on entity recognition for it to be good enough for a production application.

Yes. Yeah, so that's actually a very good question. In the MindMeld platform itself, we support two different types of question answering. One is this type, which is specific to your application, where you're trying to resolve things like Janice Smith or corporate accounting that are very specific to your organization. We have tooling for ingesting, for example, your entire HR data into this knowledge base, which again is based on Elasticsearch. That's the more targeted approach: finding, for every single entity type, the canonical names within your particular organization.
Then there's the more open-ended question answering, which is more like: I just want to ingest all my documents into Elasticsearch and be able to query them, similar to a search engine. That's generally more of an FAQ-style question answering, where you ask something like "what is the HR policy for my company?" and you want to retrieve a large body of text. MindMeld supports both types: one is more structured and targeted, the other more unstructured and open-ended. This is an important distinction, because in information retrieval, some of your search parameters change depending on whether you take the targeted approach or the open-ended one. So in MindMeld, we provide an argument in your QA call where you can specify whether the data is structured or unstructured, and based on that it uses a different set of parameters to retrieve results. Cool. Yeah, I think that's it. Thanks for your time.