Hi everybody, I'm so glad to be here today, and thank you for that; honestly, that's the best welcome I've ever had giving a talk, so that was really amazing. I'm here this afternoon to talk to you all about conversation, and the opportunities in front of us right now to use large language models and generative AI to automate at least a portion of the conversations that organizations want to have with their end users and their employees. Conversation is a fundamental part of what makes us human. We engage in conversation every day throughout our lives. We are immersed in it, and it's hard for us to thrive without the ability to converse. And because conversation feels so effortless, it can be really difficult for us to appreciate the marvelous complexity that underlies it. So the first thing I would like to impart to you today is the way linguists think about conversation. Let's think about what we're trying to automate before we automate it. As I said, conversation is a skill that we use every day. We no longer even need to practice it; we do it effortlessly. This is what we call an over-learned behavior, and as an over-learned behavior, conversation is quite similar to walking. We fail to see the complexity that underlies it until we try to teach a machine to do it. I think many of you will have seen the videos of the Boston Dynamics robots out in the world trying to walk. They're doing something quite close to walking; you recognize it as walking, but it's just different enough from what we know as walking that it falls into that uncanny valley. It's just slightly uncomfortable, and the same thing actually happens with a lot of automated conversations. Conversation, though, is quite different from walking in one important respect. I can walk here across the stage on my own, not in any kind of relationship with any of you.
Conversation, though, is different. Conversation is inherently a duet. It can be described as a joint activity: something we can only do in relationship with one another. It's not a solo activity; it's a two-party activity where we have to work together in order to make conversation work. This is a really important ingredient in understanding why it can be such a challenge to automate conversations. The thing about the cooperation between the parties in a conversation is that it is such a precise way of engaging. I shared a fact during my tutorial yesterday: cross-linguistically, across all kinds of conversations, with all different kinds of people engaging in them, there's a standard gap between when one speaker finishes speaking and the next one starts. Cross-linguistically, it's a hundred milliseconds, a tenth of a second. Now remember, we have all mastered conversation by the time we're very small children. So we manage this precision timing, this intricate collaboration, at an age when we can't tie our own shoes. It's a very high bar that any automated conversation has to clear, because this is a skill that we are excellent at very early in our lives. One way that linguists think about conversation is as a contract. When you enter into a conversation with someone, you are making a commitment, whether you realize it or not. You're committing to paying attention, at least devoting enough attention to fulfill your part of the conversation. You're going to listen. You're going to do your best to understand what the other person is saying, and then you're going to deliver a relevant response in a timely way. Remember that hundred-millisecond rule? This is what we agree to when we enter into a conversation. In a conversation, you also have what is both a right and a responsibility.
If we as humans sense a conversation is about to go off track, or maybe has already veered slightly off track, we have a responsibility to the person with whom we're conversing to make small corrections. When we are in face-to-face conversation, and I'm making eye contact with those of you I can see up close, we can bring a conversation back on track with the smallest of gestures: raising your eyebrows, gazing a little harder. Maximally, we do things like, pardon me, or, excuse me, can you say that again? These are things we do all the time. If you've ever seen a transcription of a real conversation, these small deviations happen continually, and we are so good at correcting ourselves, so good at keeping that conversation machine on track, that we don't even realize it's happening. So here's the promise that we're making: when we put conversational AI technology in front of a user, we are promising that this is going to be a conversation. If the bot invites the user to talk, to converse, it's saying, I'm going to play by the rules. And unfortunately this is not always the case. Before I get to the technology, let me stop doing the philosophy and ground this conversation in a real conversation. I like to share this with people because it demonstrates some really interesting things about the relationship between conversation, this over-learned behavior, and language. So keep those thoughts in mind, conversation and language, and let's enjoy this. It's always nice to have a cute baby to watch in the middle of the afternoon. And I have no audio. Team, can we get the audio to work? Okay, now that we have it, can we go back to the beginning? Bear with me for thirty seconds, it's worth it, I promise. Oh, we're back.
I don't have to spoil it: that is an excellent example of a conversation. Do you all agree? That was a conversation. And as you probably noticed, only one of the people in that conversation was actually speaking a language. This is an important thing to remember as you're thinking about what it means to have a conversation, and even when you're looking at the ways we label the various technologies that come into play here. Conversation is a joint social activity. It's the back and forth. It's the turn-taking. It's the engagement. It's the relationship part of this interaction. Language is something different. Listen, I know we use those terms interchangeably when we're speaking casually, but there's a big difference. Language is an association: a code of sounds that relate to meanings. That's what language is. It's a formal system. And you don't have to have mastery over that formal system to be an excellent conversationalist. So let's switch over to technology, and how well technology does in conversation. I'll give you a moment to read through this. This is a snippet of a simulated conversation that I had with a therapy bot. I of course went into it knowing this was not a live human therapist; this was an automated bot I was engaging with. I didn't even show you the entire conversation; there were many turns before this. I chose to show you this piece because everything had been going incredibly well, and I was thinking, this is actually kind of amazing. You know, I have a high bar here, and I do this for a living. It wasn't until that very last turn of the conversation, where the bot asks, do you believe it's normal to be afraid? The bot was expecting what? A yes or no response. I gave an unexpected response, and all of a sudden the illusion shattered.
That's why I call this a fragile experience. What we're able to do with conversational bots is create experiences for our users that are just like conversation, that seem to be following the rules of conversation, until they aren't. And it shatters in an instant. Now, sometimes it falls apart in ways like this that are kind of puzzling, that make you scratch your head and go, huh? Other times, it's much more apparent, much earlier in the conversation. I'm going to show you a snippet of another conversation with a therapy bot, from quite a number of years ago, as it turns out. How many of you know who Eliza is? How many of you are familiar with the Eliza bot? A few of us old-timers, right? Eliza is often described as the first chatbot. Eliza was coded when I was a baby, so it was a very long time ago. The reason I wanted to show you this example is not to say, hey, this is 1966 technology, and we're doing such amazing things now. Our bots today, in spite of the almost unimaginable advances in technology, suffer from the same fragility in terms of the experience. And there's a big reason behind that. The fancy name for it is attribution theory. When we're in a conversation with each other, we are automatically sizing up who it is we're talking to. We're making judgments about the other person's abilities, competencies, perhaps deficiencies. This is what allows us to automatically adjust, so that I wouldn't speak to you the same way I'd talk to a five-year-old, or the same way perhaps I would speak to my boss. We do this automatically in conversation. Remember, it's a joint social activity, so it's automatic for us. And guess what? Because it's an automatic process, we can't turn it off, even when we know we're interacting with technology.
So when these kinds of experiences happen, experiences that shatter the fragile illusion of this being a real conversation, we put our users in a really uncomfortable position: they're made to feel as if we were trying to fool them, as if they were duped into having this conversation, because they put their trust into something that turned out to be fundamentally untrustworthy. And that is one of the big challenges of relying on generative AI and large language models across the board to create your conversational experiences. So listen, it's not just me who has this opinion of conversational interactions as being not quite up to par. These kinds of observations are made throughout popular culture. There have been so many Saturday Night Live skits about bad conversational technology. There are fake commercials. You name the TV show, they've done something that references bad conversational technology. Beyond the humorous ways it enters our culture, we also see the more alarmist kinds of articles, like these three headlines here. A lot of the complaints you will see, whether in a Saturday Night Live skit or in a serious article in a serious publication, are framed as complaints about the technology: this is a problem with Alexa; this is a problem with the conversational AI. I think that's a fundamental misunderstanding of what's really going wrong for people. It's not the technology itself. It's the experience that we have designed and provided to those users. It's that experience that is insufficient. It's an experience that tells the user, you can't trust this other person. We are always conversing with a someone, but the someone in these automated conversations isn't playing by the rules that we all depend upon. This is why you so often see a particular behavior in users engaging with a conversational AI: what we call telegraphic speech.
So the bot comes on, welcomes you, and says, tell me how we can help you today. And instead of saying, yeah, I'm looking to make a big purchase and I wanted to make sure I had enough in my account to cover it, do you know what people say instead? Balance. They give these responses that are so short, it's as if they're paying by the letter, like in a telegram. This is just a very expected consequence of putting users in a situation where they don't know the rules of the interaction. So I'm going to get us to the technology, I promise, but I'm going to do it the way I do as a linguist. That's where my training is; I came into conversational AI through linguistics. So if we're going to think about conversational AI technologies, let's think about how a conversation works for us humans. This is a concept, not my own, but one that was really formative for me. It's from a book of the same name, The Speech Chain. There's a link at the bottom of the screen; you'll be able to see it in the slides. We've got the speaker side on the left and the listener side on the right. Let's start on the speaker side. Every conversation starts in the mind of the speaker. The speaker has an idea in their head. Now, what is an idea? That's a question for a philosopher, not for a mere linguist. But they have an idea; there's something going on in the neurons in the brain of the speaker. Our brains do this incredible thing: they take this ephemeral thing, an idea, and translate it into a set of motor commands that go out to our muscles, to the articulators, the many, many muscles of the lips and the tongue and the jaw, all coordinated with our breathing, so that we are ultimately able to go from that idea to sounds coming out of our mouths. What's a sound?
The sound wave that emanates from our mouths as we speak travels through the air as a series of compressions and rarefactions in the medium, the medium here being the air. So when you're hearing me, it's because the breath coming out of my mouth is moving air molecules, and those air molecules are impinging on your eardrums. This is a mechanical activity; we've gone from something neural to something mechanical. Sound waves move through the air from the mouth of the speaker to the eardrum of the listener. Your eardrum vibrates in response to the vibrations coming out of the speaker's mouth. The brain of the listener then does that same interesting process, but in reverse: we take an acoustic signal, a pattern of vibrations in the air, and translate it into a set of speech sounds, phonemes. Those phonemes are built up into words. Those words are built up into phrases and sentences. And then our brain says, oh, those phrases and sentences have a semantic meaning. So in a very real way, the air between us here in the room is transmitting the thoughts from my head into your heads. And by the way, there is a neurological reality to what I've just said. There is a concept called speaker-listener neural coupling: you can measure the brainwaves of speakers and listeners in conversation, and they become synchronized. We are changing each other's brains in conversation. What we as humans do is utterly amazing. You can tell I love this; this is why I got into this field. This stuff is so cool. So where does the technology come in? There is a technology at each of these stages along the way. The brain of the speaker: okay, well, there's no brain, right? There are lines of code. But when we're talking about deciding what is going to be said, that's actually the newest bright-shiny-object technology. That's generative AI. We linguists tend to call that natural language generation, NLG.
We do that because it matches up with NLU on the other side; I'll get there in a moment. Again, natural language generation is obviously an AI technology, brand spanking new, and everybody is super excited about it. If this is a bot that is going to use voice, we can then take the words generated by the AI and use another technology, TTS, text to speech. Those are the robotic, computerized voices that can speak the words for you. Let's move over to the listener side. In place of the human ear and auditory processing system, there's another technology, ASR, automatic speech recognition. You've all used ASR: if you've ever done a voice text on your phone, or dictated by hitting that little microphone icon, that is ASR technology in action. And what you get as the output of ASR is words, a string of words. To interpret that string of words, to do that final bit that happens in the brain of the listener (and again, remember, not a real brain, lines of code), that's where natural language understanding happens. As you read natural language understanding, put understanding in heavy-duty quotes. It is not understanding in the same way that you and I intend the term when we talk about understanding each other. Let me put it the simplest way I know how: no bot, even one capable of excellent natural language processing, speaks English. None of them speak a language. We've taught them to observe patterns and to behave in certain ways based on those patterns, but they do not speak the language. So I talked about generative AI, clearly an AI technology, and the newest shiny thing. I'd like to point out, though, that all three of those other technologies are currently, and have been for many years, AI technologies as well. There are deep neural networks, backpropagation, and hidden Markov models involved in things like ASR.
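Putting those four technologies together, one turn of a voice bot mirrors the speech chain: ASR and NLU on the listener side, then NLG and TTS on the speaker side. Here's a minimal sketch of that chain. Every function is a stand-in for a real engine; the stubbed transcription and the balance intent are invented for illustration, not any particular product's behavior.

```python
# One turn of a voice bot, sketched as the four stages from the speech chain.
# Each function is a stub; a real system would call an ASR engine, an NLU
# model, an LLM (or other NLG), and a TTS engine at these points.

def asr(audio: bytes) -> str:
    """Automatic speech recognition: audio in, a string of words out."""
    return "what is my balance"  # stubbed transcription for the sketch

def nlu(text: str) -> dict:
    """Natural language 'understanding': words in, intent and slots out.
    (Pattern matching, not understanding in the human sense.)"""
    if "balance" in text:
        return {"intent": "check_balance", "slots": {}}
    return {"intent": "unknown", "slots": {}}

def nlg(interpretation: dict) -> str:
    """Natural language generation: decide what the bot will say."""
    if interpretation["intent"] == "check_balance":
        return "Your current balance is $42.00."
    return "Sorry, I didn't catch that."

def tts(text: str) -> bytes:
    """Text to speech: words in, synthesized audio out."""
    return text.encode("utf-8")  # stand-in for real synthesized audio

def one_turn(audio_in: bytes) -> bytes:
    """Listener side (ASR -> NLU), then speaker side (NLG -> TTS)."""
    return tts(nlg(nlu(asr(audio_in))))
```

The point of the chain shape is that the generative piece is only one link; everything before it has been AI for years.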
All of these technologies are AI technologies. So keep that in mind as you're reading and learning about these conversational interactions: everybody's focused today on generating what the bot says, but the other side of it, where the bot is able to listen and understand, has been AI for quite some time now. So let's jump in and talk about generative AI a bit, because I think this is our role as user experience people: when we are confronted with a new technology, a new tool, we figure out how to use it to better do our job. There are things we can do with generative AI in conversational experience design that there is no other way to do. With generative AI, we have the ability to answer the exact question the user has asked. I've told this story a couple of times in the hallway, so those of you who have heard it already, forgive me, but I think this is the clearest example. Imagine asking a conversational bot, is it going to rain tomorrow? The usual kind of answer you'd get from a bot, at least pre-generative-AI, would be something like: the forecast for tomorrow is partly cloudy with a high temperature of 18 degrees; there's a 20% chance of rain in the afternoon. Now, is your answer somewhere in there? Yeah. The bot did a pretty good job, right? It understood you were asking about weather, about a forecast, and about tomorrow. So it got lots of things right. What it didn't do was give you a well-formed response to the actual question you asked. With generative AI, the bot's answer to, is it going to rain tomorrow? might be something like: there's a small chance of rain tomorrow afternoon, about 20%. And then it might go on and give the rest of the forecast. But notice the difference in terms of the experience.
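One way to get that direct answer is to hand the model the structured forecast data together with the user's verbatim question, and instruct it to answer the question first. This is a rough sketch of the idea only: the prompt wording is mine, and the commented-out `llm.complete()` call is a hypothetical placeholder, not any particular vendor's API.

```python
# Sketch: grounding a generative answer in forecast data plus the user's
# exact question. The forecast values mirror the example from the talk.

FORECAST = {"day": "tomorrow", "sky": "partly cloudy",
            "high_c": 18, "rain_pct": 20}

def build_prompt(question: str, forecast: dict) -> str:
    """Ask the model to answer the user's actual question before anything else."""
    return (
        "You are a weather assistant. Answer the user's question directly "
        "first, using only the forecast data below, then add any detail "
        "that seems useful.\n"
        f"Forecast: {forecast}\n"
        f"Question: {question}"
    )

prompt = build_prompt("Is it going to rain tomorrow?", FORECAST)
# response = llm.complete(prompt)  # hypothetical call; substitute your LLM client
```

Because the prompt carries the user's exact wording, the model can produce the "there's a small chance of rain tomorrow afternoon" shape of answer rather than a canned forecast readout.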
We can't do that with prescripted prompts, because there's no way for us to imagine all the different possible permutations of ways people might ask, even in this tiny domain of the weather forecast. When you've got bots that handle huge swaths of an enterprise's needs, there's no way we can prescript everything. Generative AI gives us that power, and it's tremendous. It allows us a ton of flexibility in the experiences that we deliver. Up here, I'm calling this entity collection. This is one of my favorite uses of generative AI. There are a lot of cases where the user wants to do some task and there are multiple pieces of information we need in order to complete that transaction for them. If you want to book a reservation at a restaurant, at minimum I need to know what day, what time, and how many people are in your party. You might also tell me things like, I'd like to sit on the patio, or, I need a high chair. It is possible to script out all the possible ways the user might give you that information. They could, weirdly, call up and say, I'd like to make a reservation for Saturday the 30th for five people at 9 p.m. That's really unlikely, actually. More likely, they'll give you some of that information, or none at all; they could just say, yeah, I want to make a reservation. You can script out all the paths: if they tell me just the date but not the time and not the number of people in the party, how does that flow? Or if they gave me just a time but no date? You can script those out. It is possible. But it's a heavy lift, a heavy and boring lift on the design side, and it's worse on the development side. With generative AI, what you can do instead is say: hey, I'm looking for at most four pieces of information, a date, a time, and a number of people, those three required, plus any special requests. If they give them to me, great. And if they don't, that's fine.
You don't need to ask about it. And instead of a giant spaghetti mess of a design, and code that looks even worse, it's a single shape in your design that just says, collect reservation information. It's one of my favorite things because it's good for the user, it's good for the designer, and it's great for the developer. Generative AI is great at a number of different things. It's great at comparisons. It's great at summarization. It's great at aggregating: oftentimes the bot will answer based not on just one piece of information but on multiple, and it can aggregate those things together. Okay, my time is up, but I'm going to tell you a couple of other things here. I'm going to keep you for one minute, and I know I am standing between you and afternoon tea, so I promise I will be brief. There are limitations to generative AI, like there are limitations to any technology. If you are dealing with processes that are highly rule-governed, whether by your business processes or because you are legally constrained by compliance, and you need to phrase a particular response to a user in a very specific way, generative AI is not your solution, friends. If things have to roll out in a very specific and predictable way every time, no matter what, you don't need generative AI. All it does is add unnecessary risk. Because remember, generative AI is non-deterministic. You could ask the same question today, tomorrow, and ten minutes from now, and you will get slightly different responses each time, no matter how good your prompt engineering is. That's just the nature of the technology. There are other limitations as well. The thing I want to say about these kinds of limitations is that they are addressable.
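To make that single "collect reservation information" shape a bit more concrete, here's a minimal sketch of the loop behind it. The `extract_slots` function is a deterministic stand-in I've invented for illustration; in a real bot it would be a generative-AI call that pulls out whatever entities the user happened to mention, in any order, across any number of turns.

```python
# Sketch of generative entity collection: keep merging whatever the user
# offers until the required slots are filled. extract_slots() is a stub
# standing in for an LLM-based entity extractor.

REQUIRED = ("date", "time", "party_size")

def extract_slots(utterance: str) -> dict:
    """Stand-in extractor. A real one would prompt an LLM to return any of:
    date, time, party_size, plus optional special requests."""
    found = {}
    words = utterance.lower()
    if "saturday" in words:
        found["date"] = "Saturday the 30th"
    if "9 p.m." in words:
        found["time"] = "9:00 PM"
    if "five" in words:
        found["party_size"] = 5
    if "patio" in words:
        found["special_requests"] = "patio seating"
    return found

def collect_reservation(turns: list) -> dict:
    """Merge slots from each user turn; stop once the required ones are in."""
    slots = {}
    for utterance in turns:
        slots.update(extract_slots(utterance))
        missing = [s for s in REQUIRED if s not in slots]
        if not missing:
            break
        # In a live bot, you would ask only about what's still in `missing`,
        # e.g. "And what time would you like?"
    return slots
```

Whether the caller gives everything in one breath or one piece per turn, the same loop handles it, which is exactly why the spaghetti of prescripted paths disappears.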
One of the ways those limitations are addressable: rather than relying on general-purpose large language models like LaMDA or OpenAI's models, people are now creating their own custom enterprise large language models. If you want to make sure the bot is always answering based on your company, your rules, your information, train it on just your stuff. It's an amazing thing you can do, and it's one of the big things coming in the core platform and elsewhere as well. These kinds of limitations are largely addressed by choosing the right use cases and making sure you train your language model appropriately. But there are some fundamental limits to this technology. As I said, LLMs are non-deterministic; if you need things to be precise, in exactly one way, this is not the tech for the problem. I'll leave you with three final things. First, we've talked a lot about bias in AI. Listen, friends, the bias that's in AI comes down to the bias that exists in the data on which it was trained. If you train an AI on biased data, that bias will also show up in the generative responses from that language model. Second, nobody really talks about cost. It can be pricey. I have seen more than one client say, maybe we'll pull that back a little, we don't need generative AI at every single spot here, because the costs add up. It is not trivial. And the final thing I will say, and this is the least technologically advanced thing you can imagine: a big problem with using generative AI to produce what the bot is going to say is simple latency. The more complex your prompt engineering is, the more you try to control it and say it should be just this way, the longer it takes for the large language model to compute a response. And remember the timeframes we're dealing with in conversation: a hundred milliseconds. You can get into big trouble very easily if you don't keep this latency in mind.
That is what I had for you here today. Thank you so much. I look forward to talking to you all over afternoon tea.