Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager at DATAVERSITY. We'd like to thank you for joining the current installment of the monthly DATAVERSITY Smart Data Webinar Series with Adrian Bowles. Today Adrian will discuss natural language processing: from chatbots to artificial understanding with affective I/O. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A panel in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #SmartData. If you'd like to chat with us and with each other, we certainly encourage you to do so; just click the chat icon in the top right-hand corner for that feature. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now let me introduce our speaker for today, Adrian Bowles. Adrian is an industry analyst and recovering academic providing research and advisory services for buyers, sellers, and investors in emerging technology markets. His coverage areas include cognitive computing, big data analytics, the Internet of Things, and cloud computing. Adrian co-authored Cognitive Computing and Big Data Analytics, published by Wiley in 2015, and is currently writing a book on the business and societal impact of these emerging technologies. Adrian earned his BA in psychology and MS in computer science from SUNY Binghamton, and his PhD in computer science from Northwestern University. And with that, I will give the floor to Adrian to get today's webinar started.

Hello and welcome. Thanks, Shannon, and welcome to everybody with us today. I hate to be the one that interrupts the David Bowie concert. That's like being the last speaker before lunch. But maybe we can take a break in the middle and play a few more tunes. Anyway, today we're going to talk about natural language processing. And I really want to expand the lens, if you will, to look at the whole field of communications, what's happening with human-computer interfaces, and some of the technologies that are evolving to make them more personal. So before I get into the slides, I'd like to take just a minute to ask you to think about something. Just imagine: what's your favorite app on your phone? Then think about what your favorite possession is. I hope it's not the app. And your favorite person. And think for just a second about what makes them your favorite or what makes them special. One's a piece of software, one's a physical thing, one's a person. Often the common thread in what makes something special to us is how it makes us feel. There's some communication or some understanding of us that feels very personal. It could be something about the interface that gives us information in a way that's pleasing to us. It could be the possession that brings back a memory; there's some context there for why you like it. And with the person, often it's how they make you feel, how they've treated you in the past. But the common thread really tends to be the interface between us and the person or thing. It's how we communicate with them, how we interact with them. And that's a big part of what we want to talk about today. So let's dive right in.
Got a pretty simple agenda here. I'm going to set the stage with my thoughts on what natural language is, and maybe change the boundaries from what you're used to if you're coming from computer science. Then we'll talk about natural language processing basics: the difference between natural language understanding and natural language generation and how they work with each other, and get a basic understanding of why it's so difficult for machines to handle natural language effectively. And then we'll get into the meat of it: what's going on today in terms of tools and services to help you build natural language interfaces, going from the bottom up, from simple chatbots to real conversational solutions. And then I have a few words on getting started. So here we go. What is natural language anyway? If you took a language class in school, in high school or college, you may have taken a language other than English. And I know that we have many people participating in these webinars whose first language isn't English, so I apologize in advance if I start with an English-centric view of the world. Basically, a natural language is what we think of as a human language, like English, French, or German, where to learn the language you need a vocabulary, you need to understand some syntax, the rules for how to construct parts of speech, and semantics. So syntax is the structure and semantics is the meaning. And that's typically what we look at in an introduction to a new language. If you're writing compilers or interpreters for computer languages, that's really where all your focus is, because you have a very well-defined vocabulary, words that can and cannot be used. The syntax, the structure in a programming language, is very predictable, if you will; it's either syntactically correct or not. And you have the semantics that you map to those structures to understand what's being said. Now in natural language, when we're having a conversation or when we're reading something that's been written, there's usually more than just the words, and that's where it gets a little tricky. So I want to think of a language as a mechanism for communication, typically between humans, that will capture and convey meaning. As I'm speaking to you today, hopefully I'm speaking within the bounds of normal English, and I will try to keep it syntactically correct. But the issue sometimes gets into semantics, and a little beyond semantics: when I am speaking from a frame of reference that we may not share, what I'm saying may not be what you're hearing. The actual words, of course, assuming a reliable communication mechanism, you're hearing. But my intent may be colored by what I'm thinking. So here are a couple of pictures that I just added. The picture of the theater in Chicago: if you've been to Chicago, if you've been to that area and you see that, you instantly know more than the word Chicago conveys. It gives you a context. It gives you a neighborhood. Maybe it reminds you of the Broadway show. But it's more than just me saying Chicago. The word telephone on the telephone booth, by itself, is just a simple description; it's a noun, telephone. But if you see that booth, it probably conveys instantly that the picture was taken in the UK. It's much more than just the words. So the context around the words may not be part of the formal language. There's no syntax or semantics for the image that goes with it.
But when we have a conversation, we generally exchange much more than words. And that's what I want to get into today, to help understand how we can improve our interfaces and our communication so that we can convey much more than the words alone. Sometimes I'm glad that I'm not on video here, but it would be nice if you could see that as I'm speaking, I'm gesturing. I'm not sure it's adding anything to the value here. But in many cases, when you're having a conversation, just the animation of someone's body language will change the way you interpret their words. So gestures: you may have specific gestures, you may have specific symbols, like the heart symbol or the thumbs up, that are nonverbal communication. My feeling is that if you can put those in the context of a language, then we should be able to think of them as extensions to the language itself. There are some gestures that are benign in one culture and offensive in another. To me, that's as much a part of the language as the words themselves. And as we go through this, I want you to consider the words of General Michael Hayden, the former director of the CIA and NSA. He told his senior staff at both places that you're not just responsible for what you say, you're responsible for what people hear. And I think that's an important thing when we start to look at interfaces, and conversational interfaces going beyond simple stimulus and response. It's very easy to have misunderstandings in natural language, and we'll look at some of the causes for those. But as we start to build systems that are conversational in nature, it's going to be very important to be able to assure, to some extent, that what you are saying can only be interpreted in one way by the person that you're communicating with. Now we're talking about using human language between person and machine rather than person to person, so keep that in mind and let's take a look. How do we actually communicate? It's much more than words. There are things that we assume, probably without even thinking about it. Say I send you a text at 2 o'clock in the morning, and we are close enough that you have a special tone for my texts, as I have for my children; I ignore just about everybody else. If I get a text at 2 o'clock in the morning, my first assumption, well, it's a fact that I got a text. But before I even read it, from that tone, I start to think something bad has happened. I have to get from the fact that there's a text, to reading the text, to understanding the concepts and, just as important, the intent of the person that sent it. So this is where we get into sort of the negotiation phase of natural language. Now, the text example is person to person via natural language, if you will, rather than a system sending me that, but it's the same principle. When we start to do this for natural language processing with a computer, we have to be able to filter and augment, if you will, what's being received with what the intent of the sender was. Excuse me. So very briefly, in terms of natural language processing, we break it up into two parts, understanding and generation. And a lot of the work that's been done in the first 50 or 60 years of AI was focused on natural language understanding.
And this is where video probably would help, because if you could see me speaking right now, you'd see that I was giving air quotes to the word understanding. As we talk about this, when I say that something is understood in the mechanical sense, by an application or a service, it may not be understood the way it is by humans. But I contend that that doesn't matter. So we need to take a look at what the process is here, and then how we can extend it to pick up the affect on the input and provide some affect or some tone on the output. There we go. So in a simple model here, we've got the system in the middle, which is our natural language interface, or an application that has a natural language interface. And it's bidirectional. It doesn't always have to be that way; we can have systems with just voice input, for example, or text input, where you don't get corresponding text output. But for the moment, we're going to look at a complete bidirectional, full-duplex conversation, if you will. So coming in, we've got the text or we've got the voice, and the voice is generally converted into text. We can think of that as transparent at this point because it's a pretty well understood and reliable process. So it doesn't matter whether we're getting the input in the form of voice or text. What matters is that we may want to add to it, so we can capture more than just the voice. But right now, let's get the words in. We're going to analyze them, we're going to represent them, and we're going to put them in a form where we can take some action on them. That's what the model is all about. How do we represent our understanding of what's coming in? How do those words fit with each other? How do they fit with what we know about the world? And then we're going to use that model and the context of our historical data to generate an appropriate response. So we're analyzing on the input and synthesizing on the output. That's a generalization. We may have a system where the input can only be one of 10 possibilities, and anything the user puts in, we try to map to one of those 10 requests within a domain. And maybe we only have five actions that we can take. But at a higher level, a little more abstract, we'd like to be able to say: OK, tell me what you're thinking, I will process it, and within my understanding of the world, I'll give you an appropriate answer.
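To make that analyze-represent-respond loop concrete, here is a minimal sketch in Python of the constrained case just described, where anything the user says is mapped to one of a handful of known requests, each with a prescribed response. The intent names, keyword lists, and phrasings are invented for illustration; a production system would use trained classifiers rather than keyword overlap.

```python
# Minimal sketch of the "analyze -> represent -> respond" loop for a
# constrained domain: every input is mapped to one of a few known intents.
# Intent names and example phrasings are invented for illustration.

INTENTS = {
    "check_balance":  {"balance", "account", "how", "much", "owe"},
    "upgrade_plan":   {"upgrade", "plan", "better", "faster"},
    "cancel_service": {"cancel", "stop", "discontinue", "service"},
}

RESPONSES = {
    "check_balance":  "Your current balance is shown on the billing page.",
    "upgrade_plan":   "Here are the upgrade options available on your account.",
    "cancel_service": "I can start a cancellation request for you.",
    None:             "Sorry, I didn't understand. Could you rephrase that?",
}

def analyze(text):
    """Tokenize and normalize the input (the 'analysis' step)."""
    return {tok.strip(".,!?").lower() for tok in text.split()}

def represent(tokens):
    """Map tokens to the best-matching intent (the internal representation)."""
    scores = {name: len(tokens & vocab) for name, vocab in INTENTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def respond(intent):
    """Select the prescribed response for the recognized intent."""
    return RESPONSES[intent]

if __name__ == "__main__":
    for utterance in ["How much do I owe this month?",
                      "I want to discontinue my service.",
                      "Tell me a joke."]:
        print(utterance, "->", respond(represent(analyze(utterance))))
```

The third utterance falls outside the 10-or-so things this kind of system can handle, which is exactly where the simple stimulus-response model breaks down.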
So why is this so difficult? Well, it's difficult because natural language is inherently ambiguous. I don't know of any natural language that doesn't allow ambiguity. A programming language, by design, by definition, if it's constrained enough, has no ambiguity. So in English, we have rules, right? Like two negatives always make a positive. If I say that is not an uncommon occurrence, it's a negation of a negation, so we can interpret it as: it's a common occurrence. We also like to say that two positives don't make a negative, except that's not a formal rule. In the case of sarcasm, you get the person that will give you the "yeah, right" answer. That's two positives, and it's clearly a negative. So there are things within the language, whether structural or carried by inflection. If someone raises the pitch at the end of a sentence, it generally turns something into a question in English. Excuse me again. So those are the things that are in the book, that we learn. There are rules for how we process this, but we recognize that in real life there are regional differences, dialect differences, and experiential differences that make people interpret things slightly differently from the way the rules would indicate. And so to be complete and accurate is one thing, but to be useful in real life, your system needs to go beyond the formal rules. So when you look at natural languages, we have grammar theories. I talked about how a language has vocabulary, syntax, and semantics, and that's where you get into the grammar. But there are a lot of different options there. We can have a generative grammar, a system of rules that will allow you to specify all the sentences that can possibly be generated. This is typically how we specify programming languages; look at the BNF for whatever your favorite programming language is. But the reality is that in conversation, we often have things that aren't, strictly speaking, valid sentences. And yet they're easy for a human to understand, because we can make substitutions; we can handle things that are fuzzy. In a programming language, it's very straightforward: we can generally predict what the next type of token is going to be. In a natural language, we may, with experience, be able to recognize that the next thing is going to be a verb and figure out what it's likely to be. That's going to be different in different languages; the position of verbs in English is very different from where they are in German. But because these languages are ambiguous, and because even with the ambiguity we don't speak precisely in most cases, things like a cultural difference, sarcasm, metaphors, or even a pause are going to be interpreted by the human to have meaning. And so with a computer natural language system, we need to account for those. And just one last note on the grammar part of it. If we have something like this, a simple structural representation for how you write an address, it's very easy to process in the computer if the person has put in complete, consistent, non-ambiguous information. It's also easy for a human to process even with some missing information, because they know what to look for. So the idea is that computationally, the human is doing a lot of work that you may not even recognize you're doing, but it has to be programmed explicitly into a system if we're going to have the same level of conversation. So how do people do it? First of all, they understand things in context. They understand your frame of reference. You have to be able to resolve ambiguity: if I start talking about Bob and later I refer to "he," you know that the last male I referred to was Bob, and so you pull it together and say, OK, "he" probably refers to Bob. We still sometimes get into messy situations, but we use our knowledge about the person who's speaking. If I'm trying to do the understanding, what we know about the speaker and the context is how we filter that input. And we also use visual cues to distinguish between what was said and what was meant, to use Michael Hayden's words.
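As a rough illustration of that "he probably refers to Bob" step, here is a small sketch of a last-mention heuristic for pronoun resolution. Real coreference resolution is considerably more involved; the name lists and sentences here are invented.

```python
# Rough sketch of the pronoun-resolution step described above: resolve "he"
# to the most recently mentioned entity of matching gender. Real coreference
# resolution is far more involved; name lists are invented for illustration.

MALE_NAMES = {"Bob", "Adrian", "Michael"}
FEMALE_NAMES = {"Shannon", "Alice"}

def resolve_pronoun(pronoun, previous_tokens):
    """Return the most recent name that agrees with the pronoun's gender."""
    candidates = MALE_NAMES if pronoun.lower() in {"he", "him", "his"} else FEMALE_NAMES
    for token in reversed(previous_tokens):
        if token in candidates:
            return token
    return None  # no antecedent found; a real system would ask or widen context

history = "I talked to Alice and then Bob called about the tickets".split()
print(resolve_pronoun("he", history))    # -> 'Bob', the last male mentioned
print(resolve_pronoun("she", history))   # -> 'Alice'
```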
There's a commercial on TV right now for one of the cable companies, and the person installing the system says, oh yeah, you can get that, while he's shaking his head no. And you recognize what they're saying, and the point of the ad is: you may be led to believe that you can get these things, but you can't. So that's the physical cue, the visual cue. One of the things we try to do when we're having difficulty, particularly if there's a language difference or a cultural difference or an age difference, is to use pictures that may be more abstract than the words but convey the meaning very easily. So this is one of my favorites. As a father of three sons, and a frequent visitor to emergency rooms, I've seen this diagram many times. When kids are in the emergency room, or actually it's often used for people who don't speak the language of the emergency room, they're asked to point to the picture that relates to the severity of their pain. Right? Is it mild, moderate, severe, et cetera? And that's all well and good. The problem is that two people pointing to the same picture may have very different levels of pain; the diagram isn't calibrated in any way to tell you that. So in this case, take two of my sons: I can tell you that if they both pointed to a six, I would be alarmed with one and not with the other, because they have different thresholds of pain. And unless you know that, the word and the picture are going to be misleading. So one more thing about the idea of understanding, and then we'll go on to some of the technologies. Again, I think it's very important when we're talking about artificial intelligence to at least subconsciously, every time I say understanding, insert the word artificial, because the way we understand things as humans, the way this knowledge is captured, codified, stored, and acted upon in our brains, is not necessarily, not likely, and in general not the way we're going to handle that representation in software. It's very different. We're not dealing with models of neurons or hierarchical temporal memory. There are some attempts to do that, but for the purposes of today's discussion, nothing in terms of the capture, codification, and use of knowledge refers to a biological model. It doesn't have to. The fact that information is stored and manipulated differently just means that it's a different type of understanding, and we use understanding to represent the concept that we have captured some essential information about the input in a way that we can put to productive use to create context-relevant output. So that's what I mean by understanding, and that's important because, at the extremes, there are two fundamental approaches to designing virtually any kind of AI system, though we're going to focus on natural language input processing today. At the extremes, we're dealing with either symbolic logic or statistical models. We're either trying to represent some abstract properties in a symbolic logic fashion, where they can be subjected to the rules of formal logic, primarily deductive, inductive, or abductive reasoning, and those concepts. To use that sort of modeling, we have to have an abstract representation of these concepts. To do something that's purely statistical, all we're looking for is relationships between words, symbols, or higher-level constructs that can be mathematically modeled. And that's an important distinction.
The fact that we can identify these relationships based purely on a numerical representation, a mathematical representation, may be very useful even if it has nothing to do with the way humans make the same type of judgment about what something means. And I'll give you an example. This one is from Bay Eye Labs. I've been using it for a couple of years, and the "proprietary and confidential" doesn't apply; we have permission to use it. Basically, this is a representation of some text from an Al Jazeera publication, and they've pulled out concepts. Even if you just recognize the numbers, if you're a frequent flyer, you may start to recognize that these are all related to aircraft. There are some Boeing aircraft and some Airbus aircraft, and in fact, all the text refers to air travel relationships or air travel nouns. What's interesting is that this was analyzed by a system that made no attempt to understand the underlying language. It was representing the symbols mathematically, looking for the relationships between them, and trying to identify concepts based on usage, and so it pulled them out. So I would say that concepts were recognized, they were discovered, but in the terms we would normally associate with natural language, they weren't understood. Yet in terms of being useful, of providing analysis that could be used to direct some output, they were absolutely understood in those terms. The alternative is to use a representation of the language constructs themselves. And so here I've just got WordNet from Princeton, which is one of the larger publicly available resources that captures an understanding of the English language: how words relate to each other, what parts of speech they are, how they're used, the definitions, et cetera. In fact, it has been used for years on many large projects. So rather than just look for the statistical relationships, now we're trying to describe some meanings. That's the difference between what we think of as symbolic logic and statistical, or what some people refer to as sub-symbolic, approaches. One example that combines both of them, using what's called DeepQA, or question answering, is IBM Watson. If you're familiar with how Watson was configured for the game of Jeopardy, it had to analyze natural language answers: in the Jeopardy game, you get an answer and you have to figure out what it means and then figure out what question fits with it. So in that case, it was looking for those relationships. And I point this out only because it used a combination of the statistical modeling and the symbolic modeling to come up with a representation to guide the next step.
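To show the contrast in miniature, here is a small sketch: the statistical side just counts which terms co-occur in the same documents, with no notion of meaning, while the symbolic side asks WordNet how two words are related. The toy documents are invented; the WordNet calls use NLTK's interface and require the WordNet corpus to be downloaded.

```python
# Statistical vs. symbolic: co-occurrence counts discover that terms are
# related without any model of meaning; WordNet encodes meaning explicitly.
# Toy documents are invented; requires nltk with the WordNet corpus installed.
from collections import Counter
from itertools import combinations
from nltk.corpus import wordnet as wn

docs = [
    "the 737 and the A320 are narrow body aircraft",
    "Boeing delivered another 737 to the airline",
    "the airline added an A320 to its fleet",
]

# Statistical: count how often pairs of terms appear in the same document.
pair_counts = Counter()
for doc in docs:
    terms = set(doc.split())
    pair_counts.update(combinations(sorted(terms), 2))
print(pair_counts[("737", "airline")])       # co-occurs, so presumably related

# Symbolic: WordNet can say *why* two words are related.
plane = wn.synsets("airplane")[0]
craft = wn.synsets("aircraft")[0]
print(plane.definition())
print(plane.lowest_common_hypernyms(craft))  # shared ancestor concept
print(plane.wup_similarity(craft))           # similarity from the hierarchy
```

The co-occurrence counts would happily group "737" with "airline" in any language, or in no language at all; WordNet, by contrast, only helps for the words it has descriptions of.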
So let's get into how all of this fits with conversational interfaces. This is where we're going. If today you've got an Amazon Echo with Alexa, or Google Home, or one of those devices, or you use apps on your phone with Siri, you've got something that is taking in speech and representing it in a way that can drive some action. The action may be to do something, or it may be to produce a conversational output. Normally, these things are single cycle: you ask a question, you get an answer. What we're getting to is more persistent conversations, but they're still generally pretty circumscribed in terms of the domain. So we don't have something that can answer questions about everything in general conversation; probably Watson for Jeopardy was the closest to that, because it was multi-domain. But here we have just the input and the output. We're going to take in text, speech, and gestures, predominantly text today, and output either narrative text, maybe a story we tell based on the data, which can be rendered with text to speech (internally it's stored as a text representation anyway), or maybe haptics: if you're dealing with something like a game controller, you're going to get physical feedback. So this is where we want to go with the central processing, if you will, the understanding, reasoning, and learning that form the core of the cognitive systems in the middle. What I want to look at now is where we are today and where this is going in the future. So this is the interface for an application. Sometimes it's going to be completely separate; sometimes it's going to be the logic at the boundary between you and the application. And so we'll take a look now at the domain of chatbots, and in particular what I call an AI chatbot. The chatbot today is the new user experience or customer experience that can, for practical purposes, make or break your enterprise. If you think back to the first question I asked, about the applications you like or the things you value, having something that's very personalized and can respond to you in a way that basically tells you it understands what you mean is very valuable. But not all chatbots are equal. So for the simple chatbot here, we've got voice or text coming in. It doesn't matter which, because, as I say, the process of transforming verbal input into text is almost trivial at this point, so I don't want to spend any time on it. What's coming in gets analyzed: we break it down and look for syntax and structure, et cetera. Then we have some representation of what came in, and then the system, which may be entirely within the chatbot or may be within the application, has to say: OK, what are you asking for, and can I answer you? If we assume for the moment that we can, then we're going to respond by generating or selecting a response. So that's the simple chatbot. With an AI chatbot, the distinction I make is that the chatbot part, the interface, if you will, can learn from experience. Every time it provides a response, it evaluates what happens next to understand the quality of the response, and it uses that to either reinforce the internal model and say, yes, that was a good response, I'll keep doing that, or maybe recognize: I thought I had answered the question, but now there's another question coming, so maybe I didn't, and start to update the model. And all of this, if we think of it as a chatbot, is separate from the application itself and the data. So I can put all of this as a front end to basically any system. It could be your enterprise ERP system; it could be a help system. But we've pulled it out and have the interface separately. As the interface gets more complex, if we try to cover a broader domain, for example, then we're getting away from the chatbot and we're building a conversational interface right into the application.
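Here is a rough sketch of that learn-from-experience loop, building on the earlier intent-matching example: after each response, the bot looks at what happened next and reinforces or weakens its intent weights. The class name, weights, and thresholds are invented for illustration and are not drawn from any particular product.

```python
# Sketch of the feedback loop that separates an "AI chatbot" from a simple
# one: after each response, the bot looks at what happened next and
# reinforces or weakens its intent mapping. Weights are illustrative only.
from collections import defaultdict

class LearningChatbot:
    def __init__(self, intents, responses):
        self.intents = intents              # intent name -> keyword vocabulary
        self.responses = responses          # intent name -> canned response
        self.weight = defaultdict(lambda: 1.0)   # per-intent confidence multiplier
        self.last_intent = None

    def reply(self, text):
        tokens = set(text.lower().split())
        scores = {name: self.weight[name] * len(tokens & vocab)
                  for name, vocab in self.intents.items()}
        intent = max(scores, key=scores.get)
        if scores[intent] == 0:
            intent = None
        self.last_intent = intent
        return self.responses.get(intent, "Could you tell me more?")

    def feedback(self, resolved):
        """Reinforce or weaken the mapping based on what happened next,
        e.g. the user said thanks (resolved) or asked again (not resolved)."""
        if self.last_intent is not None:
            self.weight[self.last_intent] *= 1.1 if resolved else 0.9

if __name__ == "__main__":
    bot = LearningChatbot(
        intents={"reset_password": {"reset", "password", "login"}},
        responses={"reset_password": "I've sent you a password reset link."},
    )
    print(bot.reply("I can't remember my password"))
    bot.feedback(resolved=True)   # user said thanks; reinforce the mapping
```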
And for that, I want to give you an overview of the landscape. Chatbots fall into the bottom two rows here. We've got a responsive system, simple stimulus and response. That's a pretty dumb chatbot, if you'll forgive me, where there's a limited number of possible stimuli. Maybe this is a system for handling customer service for your telephone company, for your cell phone, and the only things it can do are answer questions about changing service, upgrading, discontinuing, maybe 10 things. And for each of those 10, once I figure out that that's what you're asking, I have a prescribed response. It doesn't matter who you are. It doesn't matter what your circumstances are. If you ask me to discontinue something, I'm going to go through that routine. So those are the basics, if you will. The smarter ones are where we try to build something that's actually going to have a conversation or a debate; there's going to be that interaction. Today, even the smarter ones in this space are generally restricted by domain. We're getting to the point where we can have something that's more open-ended, but they're context-based, and we can have something that's non-deterministic. What I mean by that in this context is that if I get an input and I'm the chatbot interface to another application, then based on my experience with data, with other users, with other scenarios, with other questions, historical data, plus what I'm getting now as my current input, I have a wide range of outputs. I may not just be responding to that stimulus; I may be generating something new. And that's where we're going to get into some of the natural language generation. These are more applicable for complex systems where there's a deeper or persistent conversation. You're not just asking for one thing; you're asking for things the way you would in a conversation. And frankly, these could be things that are diagnostic in nature. Maybe I ask you a question because I think that's the question that's going to get me to the answer. But in a real conversational system, if this is the front end for a system with deeper knowledge than the user may have, I may have to ask you a question in order to get the information I need to provide the answer to the question that you asked. That gets into: what did you ask? What did you need? And I always go back to my abnormal psychology professor, who said you have to determine, first of all, who has the problem and who's paying the bill. The person asking the question may not have all the information and may not have the context. So you have to be able to provide that ongoing conversation.
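A sketch of that diagnostic, ask-before-you-answer pattern: the bot tracks which pieces of information it still needs and asks follow-up questions until it can actually respond. The slot names and prompts are invented for illustration.

```python
# Sketch of a diagnostic, multi-turn exchange: before it can answer, the
# bot asks follow-up questions to fill the slots it still needs.
# Slot names and prompts are invented for illustration.

REQUIRED_SLOTS = {
    "symptom":  "What problem are you seeing?",
    "device":   "Which device is this happening on?",
    "duration": "How long has this been going on?",
}

def next_turn(filled):
    """Return the bot's next utterance given what it already knows."""
    for slot, prompt in REQUIRED_SLOTS.items():
        if slot not in filled:
            return prompt                     # still gathering information
    return (f"Thanks. For a {filled['device']} with {filled['symptom']} "
            f"for {filled['duration']}, here is what I recommend...")

state = {}
print(next_turn(state))                        # asks about the symptom
state["symptom"] = "no signal"
print(next_turn(state))                        # asks about the device
state.update(device="router", duration="two days")
print(next_turn(state))                        # now it can actually answer
```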
The innovative level, which is above chatbots, is where we're going, I think, in terms of conversational interfaces: something where we can actually create content, something novel that hasn't been seen before, rather than selecting or identifying it. These things tend to require more attention to the algorithms that are used than to the data. Today the prevailing trend in AI is to use a lot of data to train a system and have the system learn from experience with that data, whereas in the early days there was less emphasis on large bodies of data and more emphasis on putting expert knowledge into the system. You get back to that if you're trying to build a system that's going to be more innovative. And so the sweet spot for chatbots today, those first two levels, is where we're building them as an interface to an app where the domain is fairly specific and the set of tasks is fairly specific. So in this example, a pharma chatbot for customer service at your local CVS. If you wanted to expand along either dimension, opening it up so that the same app, the same chat facility, would be useful for these types of services in a different domain, say customer service for a different business than a pharmacy, that's generally more straightforward than adding to the tasks within it. But gradually, as you start to go up on either of those dimensions, you get towards the requirements for artificial general intelligence, and frankly, we're just not there yet. Looking at the market today, we've got consumer-focused versus business-focused markets, and these are some of the leaders in the space. Not surprisingly for the enterprise, if you're business-focused, the four major firms in the cloud services market, Microsoft, Amazon, Google, and IBM, all have services available via their cloud platforms to allow you to build a chatbot or a conversational interface. For the consumer, where you're not going to be building it yourself, three of the leaders, Apple, Amazon, and Google, have the lower-functionality chatbots, if you will, available now, but you're not going to be doing much customization on them. What gets interesting to me is as these things start to emerge and we start to see systems built for the enterprise that end up being deployed in such a way that you can access them via a consumer product. I just did an interview, and there's a video about it online, where I'm talking to the CTO for the Watson platform and partnerships about solutions that IBM is working on with Apple, where you can build a model using IBM Watson and some of their visual analytics and then deploy it on Apple iOS. So it's not using Siri, but it's combining Core ML on the Apple platform with Watson in the cloud to build something that can be deployed on mobile. So these things are starting to converge, which to me is the exciting part of the business. We'll go through the next few pretty quickly. Getting into the idea of perceptive input, this is where we really change and start to look at affect. The key to this diagram is the human input: if I have an interaction with you in person, you have all your senses available. You can see, you can hear, you can get the words that I'm saying, but also the context, and obviously you have other senses. So that's what's actually there. But what's derived, what we have to capture in this model, is an understanding of what these gestures and these words mean in terms of emotion, what concepts are there, and what the intent is, and all of that has to go on the input side. Today it's fairly common to analyze text for tone. The cloud players have these services; I'm just using the IBM example here, but it could be any one of them. The idea is that by looking at the text, you can analyze the emotional tone based on a model of the words. Again, this could be done with a symbolic model, a statistical model, or a hybrid. But once you start to do that, you get a richer understanding, and again, air-quote understanding. So we look at things like tone, emotion, and visual input, and then on the output side, expressive text to speech.
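The cloud providers expose this kind of tone analysis as a managed service; here is a deliberately simplified, lexicon-based stand-in that shows the shape of the idea rather than any vendor's API. The word lists and scoring are invented.

```python
# Simplified stand-in for a text tone-analysis service: score a message
# against small emotion lexicons. Word lists and scores are illustrative,
# not any vendor's model or API.

TONE_LEXICON = {
    "angry":   {"furious", "unacceptable", "ridiculous", "worst", "angry"},
    "joyful":  {"great", "love", "wonderful", "thanks", "happy"},
    "anxious": {"worried", "urgent", "asap", "afraid", "deadline"},
}

def tone_scores(text):
    """Return a rough per-tone score: fraction of tokens hitting each lexicon."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    total = max(len(tokens), 1)
    return {tone: sum(t in vocab for t in tokens) / total
            for tone, vocab in TONE_LEXICON.items()}

print(tone_scores("This is the worst, most ridiculous service, I am furious"))
print(tone_scores("Thanks so much, I love the new plan"))
```

A symbolic, statistical, or hybrid model would replace the word lists, but the output, a set of tone scores attached to the input, plays the same role in the pipeline.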
There was an interesting thing at Google I/O recently about one of their advances in natural language processing: handling a signal where people are talking over each other. This was one of the examples they used. It's very difficult for computers in general to pull out one signal from another and understand how they work; this has long been a problem in natural language understanding. But Google did a really good job of demonstrating that it can isolate them, so you can have multiple speakers being processed at the same time, and if you're listening, you can actually filter out and just listen to one. That brings me to understanding emotion in both the tone of voice and in facial expressions. One of the companies that I track on a regular basis for this is Affectiva, a spin-out of the MIT Media Lab. They've done a lot of work recently using deep neural networks to identify the tone or the emotion of a person speaking, not based on the words, but using classifiers that look at how they are speaking. And that's something that, again, goes beyond the words. You can look at a set of words and say, yes, these are angry words, but you can also say words you might think of as neutral in a way that would lead a human participant in the conversation to understand that the person is angry. This is really important when you're trying to do things like customer service. It's one thing when you're dealing with a fully automated system, but even with human call centers, sometimes there's that misunderstanding. So now this type of technology is being used in call centers to guide the human operator to better understand the context and the emotion of the person calling in. The other example that I use from Affectiva gets at what we talked about earlier, gesture and body language. They were one of the pioneers in facial emotion detection. It's similar to the range of emotions I showed with IBM doing it based on the text. Here, this is an example where, if you opt in as you're watching a video, which in this case happened to be a Budweiser commercial with a puppy, you allow your computer camera to capture your face and it will track your emotions. Obviously, the implication is that you can do something like this to calibrate TV commercials or other things, but the basic technology can now be used in a conversational system. So as we're having a conversation, if the party at the other end isn't a person, if it's a bot, or a human guided by technology, these things are getting accurate enough that you can use them to guide your response, because you have a better understanding of the context to frame it. One of the trends that I think is important as we talk about these new technologies is offering emotion detection using visual analytics as a feature or a service. We're going to see that pretty much everywhere, I'd say, in the next couple of years. Now, that's all been understanding on the input side, and we use that to create the model so that we understand the context. In the simplest case, we will have a flat affect when we give our response, but if we really want to make a system conversational and give it that personalized feeling, what you really want to be able to do is put the affect on the output. So we have to have the perception on the input and apply the right affect on the output. Natural language generation is the application of AI technology to generate or produce a context-appropriate message in a human-readable or understandable form. It can be text to speech, and it's the context that creates the value.
So this is something that can be done in a batch mode, if you will, but we're getting to the point where it fits a conversational mode. The point here is that when we're getting into something that's going to generate rather than just select output, there are so many words; how do we find the right words or the right affect to produce the right result? For that, the output has to come from the model where we've captured the concepts, the meaning, the intent. We know what the person wants. We know what they're looking for, and we have to give them an answer. And so now, on the output, we're going to be using language, avatars, and text to speech to complement, and this is key, complement the tone of the input. You don't want to mirror it. If somebody starts shouting at your automated system, that's not an indication that you should be shouting back. There are systems out there now, you may have already tried things like Google Smart Reply, that will attempt to read emails and generate automatic responses for you. These are getting better. The key is that from a modeling standpoint, we've got the data coming in and a model to help us interpret it, and now we're going to generate the appropriate output. That can support an emotion component on the output if the model supports it. A lot of things right now are at the stage where the emphasis is on understanding the emotion of the input, but not on producing something that is emotionally complementary; in general, it's selecting or trying to identify a path that will resolve the problem. But where we're going is a much more nuanced approach. I'll just mention a list of vendors that are doing work in this area. We've actually got a report that I'm working on now; it'll be out in a couple of months, so if you're interested in this technology, do stay in touch. Right now, the major uses for natural language generation are things like producing longer reports rather than being conversational, but it's the same underlying technology that's identifying what's important. Instead of long-form output, what we're looking for is taking this and applying it in conversational bursts. So this is an example from a North Carolina company whose client is a basketball team. The idea was that they looked at the data. It wasn't conversational input; it was data that was put into the model, and they said: OK, we can identify online season ticket holders that aren't using the tickets we're selling them, and maybe give them an offer that will help build loyalty. The same type of natural language generation for the output could be used if the input were coming directly from the subscriber, the ticket holder. This one is from Narrative Science, and there are a number of ways their technology is being used now to generate narrative from data. The dominant model for it today is working from structured data in a database, but as long as the input is coming in and being transformed using this understanding technology, it could be the basis for the next generation of output.
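Coming back to the complement-don't-mirror point, here is a small sketch of conditioning output style on the detected input emotion: an angry caller gets a calm, de-escalating rendering of the same underlying answer. The emotion labels, styles, and templates are invented for illustration.

```python
# Sketch of "complement, don't mirror": pick an output style from the
# detected input emotion, then render the same underlying answer in that
# style. Emotions, styles, and templates are invented for illustration.

COMPLEMENT_STYLE = {
    "angry":   "calm",        # de-escalate rather than shout back
    "anxious": "reassuring",
    "joyful":  "upbeat",
    "neutral": "neutral",
}

TEMPLATES = {
    "calm":       "I understand the frustration. {answer} I'll stay on this until it's fixed.",
    "reassuring": "No need to worry. {answer} We'll make sure it's handled.",
    "upbeat":     "Great news! {answer}",
    "neutral":    "{answer}",
}

def generate(answer, detected_emotion):
    """Render the answer in a style that complements the input emotion."""
    style = COMPLEMENT_STYLE.get(detected_emotion, "neutral")
    return TEMPLATES[style].format(answer=answer)

print(generate("Your refund was issued today.", "angry"))
print(generate("Your refund was issued today.", "joyful"))
```

A richer system would generate the wording itself rather than fill templates, but the division of labor is the same: the model decides what to say, and the detected affect decides how to say it.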
One more thing here that I want to show: neuro-behavioral animation. As we get to the point where we're talking to machines, the machine has to have some representation, something we're looking at. One of the most interesting things I've seen is animation based on a neural model, so that the face has an expression that's appropriate for the emotion it's trying to convey. This is still at a relatively early stage, but it makes a huge difference if you're talking to an avatar and the avatar looks like it is feeling the emotion you want to convey from the underlying model of what emotion is appropriate in this context. So if you're trying to smooth over a situation with a client that's angry at you, then maybe you'll pick a different avatar and a different expression than if you're trying to sell someone who's on the fence and you want a more assertive dialogue. All of this is leading to the next generation. I'm going to wrap up with just a couple more examples. One that you may have seen recently is Google Duplex, which was demonstrated at Google I/O a month or two ago. The examples they used had Google Duplex, within Google Assistant, making unaided telephone calls to accomplish goals: you tell your assistant to book a haircut for next Tuesday, and the Duplex technology can make the call, negotiate the transaction, and record it for you in Assistant, using natural language. But it knows, again with air quotes, when it's getting into a situation that it can't handle. The important thing there is that although it's a very advanced and impressive demonstration, they will tell you that Google Duplex at this point can't carry out general conversations. It's not meant to do that. It's meant to handle, or to represent you as an agent in, certain types of transactions. The ethics of using something like this without disclosing that it is a bot are dubious; Google has said that deception is not their intention. I think the issue, from a technology standpoint, if you start to look at things like this and say, this is something I want to build into my system, is that the use case was calling businesses that don't have an automated system for booking or negotiation, because if both sides had one, the ideal situation would be to have your automated agent call the other party's automated agent. There's no point in having two automated agents speak to each other in English when it would be more efficient and more effective to have them speak a standard machine language. But this brings me to my last point, which is that if you're building systems like this and you're getting into a conversation, and as I mentioned, one of the most common uses for them today is in call centers, you always have to consider an unresolved issue: when do you know that a conversation is going wrong? A lot of times, if you're talking to someone, you recognize that things are not going well, and there are different cues and different actions you can take. This points to a paper that came out of Cornell in May looking at detecting early signs that a conversation is going to go bad, based on some structural elements. And I think this is the kind of research that has great promise. We can often tell when something has gone bad; there are signs beyond somebody hanging up or asking the same question again and again. Research in this area is going to be very valuable in the future to help us automate the process of escalating, bringing in a different human expert or even a bot with different expertise, when a conversation has elements that show it's likely to go off the rails.
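The Cornell work learns early-warning signs from linguistic structure; here is a much cruder heuristic sketch of the same general idea, repeated questions and rising negativity as escalation triggers, not a reimplementation of that paper. The word list and thresholds are invented.

```python
# Crude heuristic sketch of "is this conversation going off the rails?":
# escalate when the user repeats a question or their turns grow more
# negative. Word list and rules are invented; real systems learn these cues.

NEGATIVE = {"no", "not", "wrong", "still", "again", "useless", "cancel"}

def repeated_question(turns):
    questions = [t.lower().strip() for t in turns if t.strip().endswith("?")]
    return len(questions) != len(set(questions))   # same question asked twice

def negativity(turn):
    tokens = turn.lower().split()
    return sum(t in NEGATIVE for t in tokens) / max(len(tokens), 1)

def should_escalate(user_turns):
    """Hand off to a human (or a different bot) when warning signs appear."""
    rising = (len(user_turns) >= 2 and
              negativity(user_turns[-1]) > negativity(user_turns[0]))
    return repeated_question(user_turns) or rising

turns = ["Can you reset my router?",
         "That did not work, it is still broken.",
         "Can you reset my router?"]
print(should_escalate(turns))   # True: repeated question and rising negativity
```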
So with that, I'm going to wrap up by saying the state of the art today is that emotion understanding is more mature than conversational generation with tone, but it's advancing rapidly, and there's already a plethora of tools and services out there that would allow you to build a simple chatbot as a front end. At this point, almost any application has some part where the interface would be more intuitive, more natural, or more personalized if you could give it this natural language capability. So the advice at this point is to start looking at these for your entire portfolio of applications, and talk with the vendors that are building the chatbot services right now. In many cases, for specific applications like customer service, or within specific domains, you can already get pre-built knowledge so that you don't even need to train some of these. But start looking, because as they mature over the next couple of years, by experimenting now you'll have an advantage by the time they become second nature to your audience. And with that, and a minute or so left for questions, I'm going to hand it back to Shannon. I'd love to continue the conversation. If you have questions now, great. If you don't, this is how you can reach me.

Adrian, thank you so much for your fantastic presentation, and thanks to our attendees. If you've got any questions, we've got a couple of minutes to sneak one in here in the Q&A. And just a reminder, I will send a follow-up email by end of day Monday to all registrants with links to the slides and the recording of this session. Next month, we are going to be talking about data scientists. Very excited. I hope you can join us for that on July 12th. All right. Well, thank you everybody for your time today. And Adrian, thank you as always. We will catch you on the flip side. Thank you. Take care.