Big round of applause. Hi, everyone. Can you hear me? So my session is about conversational AI and Alexa. First, a bit about me. I am Sushi Kirk. I'm available online as Gorg Sushi pretty much everywhere: on D.Auto, Twitter, Facebook, LinkedIn, Drupal.org, everywhere, it's Gorg Sushi pretty much. I work for Accenture. My official title is application development manager, whatever that means, but I am tech leading. I'm also trying to build a team, so I'm hiring, so if any of you are interested... All right. So this is the agenda for today. We'll have some theory, we'll do a demo, probably the exciting part, and then I'll try to break down the demo into the Alexa part and the Drupal part and how the two talk to each other. So this is the agenda: introduction to conversational AI, what are the different options we have when we talk about conversational AI, why Alexa, why Drupal, then some basic terms of Alexa, then the demo, and then a breakdown of the demo. And then hopefully we have time for some basic questions and answers. Disclaimer: no animals were harmed, yeah. But the actual disclaimer here is that I am not an expert in Alexa. I have been doing Drupal since 2006, and I'm still not an expert in Drupal. So this presentation is all about sharing my experiences. I may or may not be able to answer your questions, but there we are. So conversational AI: what is conversational AI? Who here has interacted with Siri? I'm sure everybody has; most of you have iPhones. "Hey Siri", I don't want my phone to go off now, but "Hey Siri, get this done." Or "Alexa, what's the temperature today," things like that. So that's what conversational AI is, and that's our mindset about conversational AI, but the official definition, I think, is a set of technologies behind automated messaging and speech-enabled applications. So conversational AI actually includes not only speech-enabled applications but also the text-messaging chatbots that we interact with a lot.
So I'm sure you must have seen Facebook Messenger apps where they pop up with "what do you want to do?" and then five options. They are also part of conversational AI. And the main thing about conversational AI is that these systems try to provide human-like interactions between computers and humans. They can act like a human by, number one, recognizing speech and text; understanding the intent of the user; then trying to decipher different languages, because when it comes to natural languages, it's not a straight algorithm; and they try to respond in a way that mimics human conversation. So as I said, conversational AI solutions can use both text and voice modalities, and there are various channels and devices which offer these options, from SMS and web chats, to phone calls and phone call responders, to smart speakers like Alexa. And why is this important? Because when we started the whole website thing, and now we keep on talking about the mobile-first approach, et cetera, I think the next generation is going to be more speech-enabled, rather than even mobile-first. People are moving away from websites and even from phones and typing; they want to just interact. So that's why it's important. Some stats: as per that research, by 2020, 30% of web browsing will be done without any screen, and that's where these things come in. Just a quote; I like this quote. Read it through, nothing special about it. All right, so this slide explains how conversational AI works. I'll just come over here. To give you an example: "Hi, I forgot my account password, help me." So this is the user saying something, and it can go either to a voice-enabled thing or to a text-enabled thing. If it goes to a voice-enabled thing, that's where the voice part comes in.
So there, ASR, that is automated speech recognition, comes into play: it listens, tries to comprehend everything, and passes the result on to this next part. If it is just text-enabled, there is no ASR involved, naturally, and we go straight here. So this particular box is natural language processing, and it consists of two parts: understanding and generation. So this part says, yes, I got somebody saying "hi, I forgot my account password, help," and it is responsible for converting that into some kind of intent, some kind of dialogue management, and that's when we go here. Dialogue management happens, something happens in the backend, et cetera, and a response is created. But that response is still not ready for consumption, because we have ASR in place, so we have to output it as natural language, and that's where natural language generation happens, and then we get the answer: "Sure, I can help you reset the password." And across this whole spectrum, there is machine learning and deep neural networks. Essentially, it's all about the AI trying to train itself, improve itself, improve its algorithms. So that's, in a nutshell, how conversational AI essentially works. And what can the main challenges be, especially when we talk about voice-enabled things? Languages, dialects, accents, pronunciations: naturally, they are a big, big hurdle. Noisy surroundings, multiple speakers, unscripted questions, unsupported answers, tone and sarcasm. And then, of course, slang and jargon. Aussies may have one way of saying something, but Americans might have a totally different way of saying the same thing. So when we are creating any kind of application, we have to try to cater to these different things and take them into account. Then, of course, there's the concept of security and privacy.
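The pipeline described above can be sketched as a few stages in code. This is a minimal illustration in PHP; every function name here is a hypothetical stand-in for a whole subsystem, not a real API:

```php
<?php
// Hypothetical sketch of the conversational AI pipeline described above.
// None of these functions exist in a real library; each one stands in for
// an entire subsystem (ASR engine, NLU model, dialogue manager, NLG layer).

function handleUtterance(string $audio): string {
  $text     = asr($audio);              // Automated Speech Recognition: audio to text
  $intent   = understand($text);        // NLU: text to intent plus slots
  $result   = dialogueManager($intent); // backend logic, lookups, session state
  $sentence = generate($result);        // NLG: structured result to natural language
  return $sentence;                     // for voice, this is then fed to text-to-speech
}

// A text chatbot runs the same pipeline, minus the asr() step.
```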
We will not be going into that part of it now; this is just a demo, I'm just trying to showcase what I did. So, as I said, voice and text. On a technical level, the divide between the two is not huge. As I showed in the earlier diagram, it's just the ASR which is the differentiator; other than that, most of these things are pretty much the same. And they both rely on artificial intelligence, or simple decision tree structures, to carry out the two-way conversation with users. Some examples of voice assistants: Echo, Alexa, which is why we are here. Google Home, Siri, Cortana, if anybody has used Cortana, from Microsoft; I'm not very sure about that one. All right. Why Alexa and not Google Home? Specifically for me, these are some of the parameters. When it comes to documentation, Google is great, especially for the initial skill creation; the documentation that Google has is amazing, with a ladder-based approach. But once the initial skill is created, you don't really know what to do next; it's a bit difficult. It's kind of like Drupal versus WordPress. Sorry. With Alexa, the documentation is great once you get past the initial skill: once you understand the concepts and create your very basic initial skill, it gets better. Cloud integration: both are pretty much at par. Code-less skills: the templates in Alexa are amazing; Google has some templates, but they're not that great. And the current install base of Alexa versus Google really puts Alexa in a better position. But to be very frank, the real reason I chose Alexa is that I had an Alexa at home. I did not have a Google Home, so I thought, why not? Now, the question is why Drupal? Well, I'm a Drupalist, that's why. But jokes apart, the reason is that Drupal is a content management system. It has in-built workflows, translations, APIs, content modeling, et cetera. We don't want the people who are writing content to have to know development.
We want them to be able to just focus on content, and we want development to be kept separate. And that's why. Also, Drupal has a pretty good built-in front end; even though we might not need it, it always helps. All right, so before we start, let's look at the basic terminology in Alexa. How many of you actually have an Alexa at home or have used one? All right, at least one person. What do you use it for? Are you fairly new to Alexa? Okay, so "tell me the time," for example. So in Alexa's world, there is something called Alexa skills. People build Alexa skills, and then you can install them on your Alexa and use them. Take, for example, the time and the weather; those are pretty much built into Alexa, you don't really need to install them. I'm sure you didn't have to install a "tell me the time" skill. But I use a grocery listing app called OurGroceries, and it has an Alexa skill. When you are configuring your Alexa, you log in on your phone's Alexa app, and there you can search for skills and just enable them for your Alexa. So there are apps for Alexa, but the Alexa term for them is Alexa skills. So that's what we are doing here. Wait, what happened here? So, Alexa skills. Now, each skill can have multiple intents. Intents are essentially the goals that we want the user to achieve, and within the intents we can have utterances and slots. Utterances are the exact phrases that you expect the user to say. So for example, "what time is it," or "how's the weather today?" Or "how's the weather today in Hobart?" "How's the weather tomorrow in Hobart?" All those are utterances. And within these utterances, when I say "how's the weather today," that "today" is a slot, because "today" is a variable thing. I can say "how's the weather tomorrow." I can say "how's the weather tomorrow in Hobart."
So "tomorrow" and "Hobart" are variables; these are called slots. So when we talk about a basic skill, that's pretty much what we have: we create a skill, we can have multiple intents, and within the intents we have utterances and slots. Now, utterances, plural, and similarly slots, plural. The reason we have plural utterances: "how's the weather today?" "Is it sunny today?" "How's the weather in Hobart today?" "What is the weather today?" All of these are different utterances. The ways people can ask the same question vary, and that is why, when you're creating an Alexa skill, it is always useful to have multiple utterances: what do you expect the user to say? And again, slots, or entities, can also be multiple, like "today" and "Hobart". And slots can be required or not required, because the place may not be required when you say "how's the weather today." It will pick up the current location configured in your Alexa app and tell you the weather accordingly. So if I say now, "Alexa, how's the weather?"... it's not even up, all right, fine. So it may or may not pick it up. "Today" and "tomorrow" are also optional, because if you don't say today or tomorrow, it gives you the current weather, and if you don't say the place, it gives you the current location. So slots can be required or non-required. All right, so for the demo, let me see if it is working. Just to explain the demo: I've created a very basic recipe site. The different intents are: I have an ingredient, what can I make out of it, so list out the recipes which have this particular ingredient; or, I know the recipe name, can you tell me the steps; or, I know the recipe name, can you tell me the ingredients it has. These are the three very basic intents.
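To make the skill / intent / utterance / slot hierarchy concrete, here is roughly what it looks like in the JSON interaction model that the Alexa developer console builds for you behind the scenes. Treat the intent and slot names as illustrative; the actual names in the demo skill may differ:

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "drupal cooking",
      "intents": [
        {
          "name": "CaptureIngredientIntent",
          "slots": [
            { "name": "ingredient", "type": "AMAZON.Food" }
          ],
          "samples": [
            "I have {ingredient}",
            "what can I make with {ingredient}"
          ]
        }
      ]
    }
  }
}
```

One skill, a list of intents, each intent with its sample utterances, and each `{slot}` in an utterance declared with a slot type.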
All right, let's see if it works. Otherwise, we might have to give it a ceremonial sacrifice. "Alexa, Drupal cooking." "There are no other devices to drop in on." Right. And none of your... "Alexa, stop." "If you are trying to get into the camera..." Sorry, you have to be patient. "Alexa, Drupal cooking." "There are no other devices to drop in on." Sorry, please, be patient. "If you are trying to..." "Alexa, stop." "Alexa, Drupal cooking." "Do you want to search for a recipe using an ingredient? Or do you want to find the ingredients of a recipe? I can also tell you the steps of a recipe." Did you hear that? "Do you want recipes with..." "Alexa, Drupal cooking." "Do you want recipes containing Drupal cooking?" No. "Name the ingredient, please." "I have coconut milk." "Coconut milk." It's gone dead. Sorry. Alexa, Alexa... we'll have to start again. "Alexa, Drupal cooking." This is a development skill, so this doesn't normally happen. "Alexa, Drupal cooking." I have other options. "Alexa, Drupal cooking." "Welcome to Cooking with Drupal. Do you want to search for a recipe using an ingredient? Or do you want to..." "I have coconut milk." Let's just stop this; I'll show you the demo in a different way. So this is the Alexa Developer Console. We also have a testing option here. It's not as glamorous as an actual Alexa demo, but it's a backup, if it works. Which one is it? Drupal virus? Yeah, okay. So zero, one, nine. D-small? Yep. Okay. So I can actually do it through the mic as well as by just typing here. "Do you want to search for a recipe using an ingredient? Or do you want to find the ingredients of a recipe? I can also tell you the steps of a recipe." "Do you want recipes containing coconut milk?" "These are the recipes containing coconut milk: coconut rice." "Do you want the recipe for coconut rice?" Alrighty then. "The recipe for coconut rice is: take rice, wash and soak it for at least 20 minutes. Drain the rice. Add rice to a pressure cooker. Add coconut milk. Add water. Add salt and sugar."
"Close the pressure cooker and cook the rice for one whistle. Let the pressure settle. Open the cooker and fluff the rice using a fork." "Do you want the ingredients for coconut rice?" Alrighty. "The ingredients for coconut rice are: rice, one cup; coconut milk, one cup; water, half cup; salt, a pinch; sugar, one tablespoon." All right, so that was the very basic demo. Now I'll try to break this demo into the Alexa side and the Drupal side. Coming to the Alexa side first. This is the Alexa developer console. If you look at the first name there, it says Drupal cooking; that is my skill name, all right? And then these are the various intents that I have created, and each and every intent has certain things; I'll just walk you through them. I don't have a USB port, it just works. Yeah, thank you. So whenever you start creating an Alexa skill, the first thing you do is tell it which language it is in. For example, I said English (Australian). Then you have to set a skill invocation name: after you say "Alexa," what do you actually say? That is the skill invocation name. For example, I had set "Drupal cooking," so that's why I had to say "Alexa, Drupal cooking," and then it should respond to me. So that's one part. Then comes the intent creation. If you look at my skill, there were three intents. One was: I have this particular ingredient, what can I make out of it? That was: capture the ingredient and tell me the recipes. The second was: this is the recipe name, can you tell me the different ingredients? And the third one was: this is the recipe, can you tell me the steps for it? So if you look at the first intent I have here, that's the capture ingredient intent, and these are the sample utterances: "I have {ingredient}," "what can I make with {ingredient}."
As you can see, the thing in curly braces, {ingredient}, is essentially what we earlier called a slot. So "ingredient" in the curly braces is the variable that we will be using. I have put in just two sample utterances, but there can be many, many variants of the same thing: "what can I make with {ingredient}," "I have this," "I want to make something with coconut milk." All of those are sample utterances. And then this is the slot. When we are creating a skill, we have to actually say what the slot type is, so that if there is a relevant slot type, we can also put in certain validations. For example, if I say that my slot type is a date or time, I can put in validations as well. In my case, that didn't make a lot of sense. So let's look at this slot a bit further. The slot type is AMAZON.Food. And then: is this slot required or not? In my case, I'm trying to capture an ingredient in order to tell you the different recipes, so it is a required slot. It may or may not be required in your particular skill or your particular intent; in my case, it's required. And if, whatever the user said, the Alexa skill was not able to capture the slot, it gives the user a prompt, and you can tell Alexa what that prompt is: "Name the ingredient, please." And after Alexa says "name the ingredient, please," again, these are the sample utterances for the reply. If you remember, during the demo, when I said "I have coconut milk," it asked me, "do you want recipes for coconut milk?" That is a confirmation statement, controlled by "does this slot require confirmation?", just so that I am sure that whatever ingredient was captured is the correct ingredient. So I say, "do you want recipes for {ingredient}?" So this is my first intent, with its slot. Now let's look at the other intents. The second one was the find recipe intent: I know the recipe name from the first intent.
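The required-slot prompt and the confirmation behaviour described above live in the dialog section of the same interaction model JSON, alongside the language model. A sketch, with illustrative intent names and prompt IDs:

```json
{
  "dialog": {
    "intents": [
      {
        "name": "CaptureIngredientIntent",
        "slots": [
          {
            "name": "ingredient",
            "type": "AMAZON.Food",
            "elicitationRequired": true,
            "confirmationRequired": true,
            "prompts": {
              "elicitation": "Elicit.Slot.Ingredient",
              "confirmation": "Confirm.Slot.Ingredient"
            }
          }
        ]
      }
    ]
  },
  "prompts": [
    {
      "id": "Elicit.Slot.Ingredient",
      "variations": [
        { "type": "PlainText", "value": "Name the ingredient, please." }
      ]
    },
    {
      "id": "Confirm.Slot.Ingredient",
      "variations": [
        { "type": "PlainText", "value": "Do you want recipes containing {ingredient}?" }
      ]
    }
  ]
}
```

`elicitationRequired` is the "required slot" checkbox, and `confirmationRequired` is what produced the "do you want recipes for coconut milk?" question in the demo.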
The first intent's result told me that the recipe name is whatever; now I want the steps for it. So for that, naturally, the variable will need to be a recipe, and that's what we have here: "how do I make {recipe}?" And these are some sample utterances: "tell me the recipe for coconut rice," "how do I make coconut rice," "I want the recipe for coconut rice." There can be many more utterances. Similarly, just like the earlier one, we have the {recipe} slot; again, it is required, with "do you want the recipe for {recipe}?" And coming to our third intent: I know the recipe name, and I need to know the ingredients. So I have coconut milk, and I know one of the options is coconut rice; let me look at all the ingredients that are in it, and whether I have them or not. So, the recipe ingredients intent: "what do I need to make {recipe}?" "What are the ingredients for {recipe}?" This is pretty similar to the earlier one, because we have the same kind of slot, and it is used in pretty much the same way. So that is the basic skill building you need to do on the Alexa end, and all of this is available via the developer console; you don't have to really code anything here. There are also some built-in intents which Amazon Alexa provides for you, for example the cancel intent (what happens when you say "cancel"), or stop, and there are some others. Now, one important thing that you finally have to do is the endpoint. In our case, we have created an Alexa skill, and when you are initially creating an Alexa skill, there is a testing place, and you can actually host everything on the developer console itself; at the top, you do have a "Code" option where you can actually see the code. But in our case, we want to link it to Drupal. So what we are doing here is setting the service endpoint; I'm saying the service endpoint is blah, blah, blah. We will come back to this in the next slide. And please note two things here. Number one, this cannot be localhost.
You can't build it locally, because Alexa needs to be able to access it. And number two, it has to be HTTPS: it has to have an SSL certificate, and you actually have to define what kind of certificate it is. So that's the Alexa side. And coming back to our presentation: the demo didn't go really well, but oh well. So these are the three intents that we have: intents and slots. Now, the Drupal side of the demo. We need a D8 install, of course, with an SSL certificate, and as I said, the D8 install cannot be local; it has to be on an externally accessible server. The most important part now is the Alexa module, and this Alexa module actually uses this PHP library for the Alexa request and response handling. Then, naturally, the D8 install needs to have content with the correct modeling. And here comes the actual custom module development. We do have the Alexa module, yes, but in order to interact with the custom skill that we have created, we need a custom module. So let me try to show you a custom module. This is now where we get into the backend part of this particular presentation: you need to create a custom module. The Alexa module comes with a submodule called Alexa Demo; you can use that as a base, and I'll start showing that to you now. Okay. So I created a custom module called Drupal South; very unimaginative, I know. It's a custom module, so naturally it needs to have an info.yml, and it needs to have a services.yml, because basically we need an event subscriber, a request subscriber. And that's the only file that is really important in this particular module. So let's look at the actual file here. Okay, and I'll try to go to presentation mode. All right. Can you guys see the code clearly? Am I losing some people now? All right. If you look at the different parts of this module, you start to realize what we are actually doing here.
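For reference, the two YAML files mentioned above are tiny. A sketch, assuming the custom module is machine-named `drupalsouth` and the subscriber class is `RequestSubscriber`; both names are mine, not necessarily the ones on screen:

```yaml
# drupalsouth.info.yml: minimal info file for the custom module.
name: Drupal South
type: module
core: 8.x
dependencies:
  - alexa:alexa

# drupalsouth.services.yml: registers the request subscriber so Drupal's
# event dispatcher will hand Alexa requests to our class.
services:
  drupalsouth.request_subscriber:
    class: Drupal\drupalsouth\EventSubscriber\RequestSubscriber
    tags:
      - { name: event_subscriber }
```

The `event_subscriber` tag is the standard Drupal 8 mechanism; without it the class is never called.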
The base module I've just copied, as all good developers do: copy from somewhere else. So that's what I've done. I've copied it from the Alexa Demo module, and I created a class, RequestSubscriber, which implements the event subscriber interface. And this is the meat of this particular module. So look here at onRequest: the event is the Alexa event that is coming to our callback. From it I can get a request and a response. Then I check what the class of that request is. If it is a session-ended request, just say, okay, bye. If it is a launch request, and a launch request is essentially what happens when I say "Alexa, Drupal cooking," then that's what it echoes out: "Welcome to Cooking with Drupal. Do you want to search for a..." "Alexa, stop." It didn't work when I wanted it to work. So off you go. All right. So here I can customize how my Alexa should respond in different scenarios. In this particular scenario, when I say "Alexa, Drupal cooking," what do I want my Alexa to say? Do I want it to say, "okay, go to taste.com and find your recipe, why are you coming to me?" Or something like, "okay, do you want to search for a recipe, do you want to see the ingredients," et cetera. And, I'm sorry, before I do that, I should actually show you the actual site, just to see what the content is. All right. So this is my site where I have put in some content. If you look at the content type called recipe, I've added some sample content; they're actual recipes, not just filler. And let's look at one of them so that we can see what a recipe has: a recipe name, a prep time (we're not using the prep time and the cooking time in our Alexa skill as of now, but we can), and this field, which is what is used when we are doing an ingredient search. Everybody here knows Drupal: this is just a taxonomy.
So when I'm doing an ingredient search, basically I'm taking that ingredient variable and feeding it in here, doing a simple view invocation with a contextual filter: this is the ingredient, give me the list of all the recipes containing it. Then this is the actual method. And because I was a bit too lazy, all the ingredients are in one particular field, and that is what I respond with. So coming back to our code, because, yes, I should have shown this to you earlier. The first one, the launch request: I say, okay, do you want to do this, do you want to do that, whatever. Now, if the type is an intent, any kind of intent, then I can do a switch-case kind of thing. This one handles the capture ingredient intent, and this is the way I'm getting my ingredient name from Alexa's request. After that, it's all Drupal. I have a view whose filter is the ingredient name, and it just lists out the names of the recipes. So what I'm doing here is: the view name is recipe list, I set the display, the argument is the ingredient, I execute it, and I just say "these are the recipes containing..." and then give the value of the title of each view result. That's all. If I don't get anything: "sorry, I did not find any recipe." Similarly, if I look at the find recipe intent: again, $recipe; this is just a node load, and I emit the body field. That's all we are doing here: load by properties, title is whatever, and then I take the body value. Sorry, for those of you who code, there is some bad code here, but please ignore that. This is essentially what we are doing: this is the response text, where I say, you know, what is the recipe body, which is actually the method. And the third one is ingredients.
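Putting those pieces together, the request subscriber looks roughly like this. This is a sketch, not the exact code from the talk: the `alexa.request` event name, the request classes, and methods such as `getSlot()` follow the amazon-alexa-php library that the contrib Alexa module wraps, and the view name (`recipe_list`), intent names, and slot names are my assumptions:

```php
<?php

namespace Drupal\drupalsouth\EventSubscriber;

use Drupal\alexa\AlexaEvent;
use Drupal\views\Views;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

/**
 * Answers requests coming from the "Drupal cooking" Alexa skill.
 */
class RequestSubscriber implements EventSubscriberInterface {

  public static function getSubscribedEvents() {
    // 'alexa.request' is the event dispatched by the contrib Alexa module.
    return ['alexa.request' => 'onRequest'];
  }

  public function onRequest(AlexaEvent $event) {
    $request = $event->getRequest();
    $response = $event->getResponse();

    switch (get_class($request)) {
      case 'Alexa\Request\SessionEndedRequest':
        $response->respond('Bye!');
        break;

      case 'Alexa\Request\LaunchRequest':
        // "Alexa, Drupal cooking" with no further intent.
        $response->respond('Welcome to Cooking with Drupal. Do you want to search for a recipe using an ingredient?');
        break;

      case 'Alexa\Request\IntentRequest':
        switch ($request->intentName) {
          case 'CaptureIngredientIntent':
            $ingredient = $request->getSlot('ingredient');
            // Run a view with the ingredient as contextual argument.
            $view = Views::getView('recipe_list');
            $view->setDisplay('default');
            $view->setArguments([$ingredient]);
            $view->execute();
            $titles = [];
            foreach ($view->result as $row) {
              $titles[] = $row->_entity->label();
            }
            $response->respond($titles
              ? 'These are the recipes containing ' . $ingredient . ': ' . implode(', ', $titles)
              : 'Sorry, I did not find any recipe.');
            break;

          case 'FindRecipeIntent':
            $recipe = $request->getSlot('recipe');
            // Simple node load by title; emit the body field as the steps.
            $nodes = \Drupal::entityTypeManager()->getStorage('node')
              ->loadByProperties(['type' => 'recipe', 'title' => $recipe]);
            if ($node = reset($nodes)) {
              $response->respond('The recipe for ' . $recipe . ' is: ' . strip_tags($node->body->value));
            }
            break;
        }
        break;
    }
  }

}
```

The ingredients intent from the talk would be a third case in the inner switch, loading the node the same way and reading the ingredients field instead of the body.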
So again, taking the recipe, find the recipes which have that title and then list out that particular variable. Very, very simple; nothing else. And then the Amazon help intent: this is an intent that Amazon pre-builds, for what happens when somebody says "help"; this is what we emit then. That's pretty much it. And once you're done, this is working; ideally, your skill should be working. Sometimes it does not, but all right. Okay. Let me just quit PHP altogether. All right. So the broad steps in a nutshell: create the Alexa skill and make sure the endpoint is correct. On the Drupal install, create the content type and some sample content, install the Alexa module, create a custom module with the request subscriber, and set the endpoint. Then test: first test using the in-built test tool (that's where we did the demo), and then test using an actual Amazon device. Some troubleshooting steps, because these are things I faced and I want you to know about them. "I cannot see my skill in the Alexa app": make sure that you have logged into the Alexa app using the same credentials as your Amazon developer account. "I can see the skill in the app, but my Echo device is just not recognizing the skill": this is something I really had a major problem with, and it happens when the language of the skill and the language of your Alexa app are mismatched. So make sure the languages are the same; if you try to change the language of the skill after you build it, things break. Don't do that. And one more troubleshooting item: "Alexa is not responding to my utterances." There is a device log in the developer console that is really useful; look at that. In my case, for example, it was having a problem with an ampersand-NBSP entity; I had to string-replace that. All right, questions. "I was just wondering, in the Alexa developer console, what do we call that?"
"Is it possible to specify different endpoint URLs per intent and pass through the parameters?" No. The endpoint is per skill, not per intent. "Okay. If there was, then you could just point it directly to a view without having to go through the backend." You can't, but you can configure your Drupal accordingly. Yeah, yeah. "Hello. Tough luck on the demo, that's brutal. There was no sacrifice. Is there any methodology used to develop those intents, like designing the conversational language at all?" Not really. As I said, this was just a side project I was doing, so I didn't really have the time or the bandwidth to do that. By the way, there is a module that has appeared on Drupal.org recently; it was launched just the day before DrupalCon Amsterdam. Supposedly, it helps you create the Alexa skill from Drupal, but currently it's not working; there are some errors, et cetera. It's fairly new. I plan on working on this module a bit more, but no, there's no methodology that I followed. Any more questions? "I'm just interested in making it a bit more human. You know the require-confirmation thing: if you turn that off, would she just give you the answer straight away? And you know how you talk about coconut rice, and then you say, give me the recipe for coconut rice; is there any way you could make it so you can just say, give me the recipe for it?" Yes, there are ways to do that; it tries to pick up the earlier context. I have not really tried that, but yes, I have seen skills that do that. "I was just going to ask: you know how you've said to define the different phrases, whatever the terminology is. Does Alexa automatically... is it good at interpreting? Like, if there's a word it didn't expect, or you use a different word, does it have to be exact?" No.
So, for example, when I say "give me the recipe for coconut rice," if I instead say "give me a recipe for coconut rice," it will not understand that, unless I actually give it that as another utterance. So you pretty much have to define as many utterances as you can. Yes, yes. "Okay, that's cool." Anyone else? Thank you. Siri, repeat after me: assistant, not a pirate with beautiful plumage.