Good afternoon, everyone. Thank you. This is the talk on the Google Assistant. Actually, I didn't think it through properly when I submitted the proposal for the talk: the title should really have been "Building Conversational Experiences with Actions on Google". But since most of you do not know what Actions on Google is, I thought I'd simplify it by just calling it an overview of the Google Assistant.

My name is Mani. I have a very long name, one so hard that people in immigration offices struggle with it whenever I cross a border. I work in the Singapore Developer Relations team, and I take care of all of our developer outreach across Southeast Asia. So today, I'm going to talk to you about the Actions on Google platform.

But before I start talking, I want to know: how many of you have actually seen the Google Assistant in action? Raise your hands, please. And how many of you have heard of the Google Assistant at all? Over the next 20 to 30 minutes, I'm going to introduce the platform, what the Assistant can do, some of its capabilities, and how you as developers can build applications and experiences on top of it. And to introduce the platform, I'm going to play a video.

"OK, Google. Hey, Google. Hey, Google. OK, Google." "Hey, Google, play some dance music." "Sure." "This is Fresh Air. My guest will be..." "...Kimmy Schmidt, on Netflix." "OK, Google, count to 100." "Sure. One, two, three..." "Play Vacuum Harmonica on my TV." "...71, 72, 73..." "Play the Wonder Woman trailer." "Hey, Google, talk to Domino's." "Talk to Lonely Planet." "Talk to Cora." "Show me my photos from last weekend." "Your car is parked at 22B." "Today on the news..." "Turn the living room lights on." "OK, turning on the lights." "I'm back, baby." "Hey, Google, drop a beat." "Call Jill." "Set a timer." "Talk to Headspace." "And then, just for a moment, I'd like you to let go of any focus at all. Just let your mind do whatever it wants to do." "Done." "Hey, Google, good night." "Turning off all the things. See you tomorrow."

So what did we just see? We saw a few devices, a few actions invoked by users, and a few apps coming into the picture, like Netflix. Let's separate the different components we're talking about.

First, we have the devices: devices such as the Google Home. If you carry an Android phone or an iPhone, you can access the Google Assistant on the phone. You have stereos; you have... oops, someone's pinging me... you have numerous devices that can access the Google Assistant. We call these devices surfaces, because they don't necessarily have the same screens or the same input methods that our phones have.

The second piece, and the central piece we're interested in today, is the Google Assistant. And the last piece is Actions on Google. Actions on Google is the developer platform on which you can build applications and services for the Google Assistant. Please ignore my notifications; we use Hangouts a lot, and people keep sending me messages.

So that's the Google Assistant. As I shared with you, on devices that have a keyboard or touch capabilities, such as your phone, the Google Assistant can take those as input. But in situations where the only way of interacting is by voice, such as a Google Home, the Google Assistant can take that as well. So you have multiple input modalities.
It's available on a breadth of hardware, all the way from headphones, like the earbuds we launched, to stereos and phones; it's available on Chrome OS and on your wearable devices. An estimated 400 million devices are able to run the Google Assistant. And it's not just the plethora of devices; it's also available in a range of languages and countries. Just two or three weeks back, we announced something Southeast Asian developers had long anticipated: support for Thai and Bahasa Indonesia. So many languages support the Google Assistant and the apps built on it, across many devices. That is the opportunity you have developing on the Actions on Google platform.

Apart from this, note what Google brings to you as developers as platform features. You can understand who the user is via Google identity. You can perform transactions. You have individual speech recognition. And all of this is available to you just like it would be when programming a regular phone app. So you can bring essentially all the experiences you provide to a user on a mobile phone to the Google Assistant, which does not necessarily have the same input modality.

And it's not just earphones and phones. Just at CES this year, we made a big announcement.

"Hey Google, good morning." "Good morning, Rachel. You can see the full directions on your phone." "Hey Google, what should I make for dinner?" "Tell me the noodles recipe." "Great, let's get started." "Photos of China." "Here's what I found in your Google Photos." "Only YouTube videos of pandas." "Hey, Rachel, how's it going?" "Show me the baby's room."

The video can be quite attractive, but what we as developers need to do now is split it apart and understand the different capabilities the platform offers us. So what did we see?

First, what is called individual speech recognition. When the person said, "Hey Google, good morning," it replied, "Good morning, Rachel," because it was able to recognize who was talking to it. This is super important: if I say "call Mom" and my wife says "call Mom", our moms are different people, and it shouldn't call the wrong one. So recognizing the individual speaker is one capability.

Second, integration with Google services and third-party solutions. For example, the person asks, "How far is this place from here?", and the Assistant immediately pulls the data from Google Maps. Not only that, it also sends that piece of information to the user's phone, so I can take it with me as I travel.

Third, hooks for smart home devices, such as the baby monitor control. The Google Assistant is hooked to a range of smart home devices, including SmartThings and others.

Fourth, launching on the phone, something we just spoke about: seeing the directions on your phone so that you can carry that information forward after you leave.

And finally, the ability to invoke other applications and continue the conversation, which is again a big feature, so that no continuity is lost between one action and the next.

I've talked quite a bit, but let me show you some pieces of the code so that you understand how to build applications and services on the Google Assistant. This is a high-level block diagram of how any service on the Google Assistant is built. Let's walk through it left to right.
From the left, the user speaks. I'm going to take the example of a supermarket: you as a user are trying to make a purchase from, let's say, something like FairPrice Online; I'm giving fictitious examples here. The first thing you do is called invocation: you invoke the service. Coming from a mobile context, this is the equivalent of tapping an app's icon on the home screen. You say, "Hey Google, let me speak to FairPrice Online." Behind the scenes, FairPrice Online has registered their app against that invocation phrase, "FairPrice Online". The Google Assistant matches the two and connects you over to the Assistant app that FairPrice Online has built.

The second thing an app is supposed to do immediately is say, "Hello, I'm FairPrice Online. I can help you with orders, groceries, et cetera." That's the welcome message, and that's part two: the action response. Then comes the next piece, the user request. I say, "Send two kilos of mangoes to my house," and FairPrice Online needs to process that input and generate output. That is the entire cycle of a conversation.

Here is another fictitious example: an app called Personal Chef, which recommends recipes based on the ingredients I have and whatever mood I'm in at the time. Looking left to right again, the first thing the user does is speak to a Google Home, an Android device, an iOS device, whatever it is. The important thing to note is that these devices do not process this data. It is carried directly to the Google Assistant, and it is the Google Assistant that has the intelligence: it converts the speech to text, it has high-level NLP capabilities, it understands who the user is, and it brings in that layer of functionality. Then it says, "You're asking for Personal Chef; let me see if I know someone called Personal Chef." It looks through its entire directory listing and says, "OK, I know this one, the app called Personal Chef. Let me hand you over." That's where it says, "Sure, here is Personal Chef." Then Personal Chef comes in with its hello, its introductory message: "Hey, what mood are you in today?" That's the introduction and the handover of the conversation. After that, the conversation flows directly between the user and the application. You ask a question like, "Well, it's kind of cold outside, so I'd like A, B, C," and again, the app needs to do some speech-to-text, parse it, and give a response.
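To make that cycle concrete, here is a minimal sketch of what the two handlers could look like with the actions-on-google Node.js client library (which comes up again at the end of this talk), in its v2 style. The intent names, the parameters, and the FairPrice wording are all illustrative assumptions, not a real app:

```javascript
// A minimal sketch of the invocation -> welcome -> request -> response
// cycle with the actions-on-google Node.js library (v2 style).
// Intent and parameter names below are assumptions for illustration.
const { dialogflow } = require('actions-on-google');

const app = dialogflow();

// Invocation: "Hey Google, let me speak to FairPrice Online" routes the
// user here, and the app opens with its welcome message.
app.intent('Default Welcome Intent', (conv) => {
  conv.ask("Hello, I'm FairPrice Online. I can help you with orders, groceries, et cetera.");
});

// User request: "Send two kilos of mangoes to my house." The matched
// intent hands over the extracted parameters for processing.
app.intent('order.item', (conv, params) => {
  // Process the order here, then respond -- one full turn of the cycle.
  conv.ask(`Sure, sending ${params.quantity} of ${params.item} to your house. Anything else?`);
});

// The app object is a standard request handler, so it can be exposed
// directly as an HTTP webhook (e.g. on Cloud Functions).
exports.fulfillment = app;
```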
But here is where the enormous complexity lies: doing the speech-to-text and parsing the words. Here is a simple example. Read the first statement: "Well, it's kind of cold outside, so I'd like something to warm me up, like a hot soup, and I want it fast." Out of the 15 or 20 words the user has uttered, what ultimately matters is that the user wants a soup, a hot soup, and wants it fast. Everything else is just the natural way of conversation, and you want some intelligence to strip out the unnecessary words and capture the important ones. And do you know why this is complex? In this one statement there are three words that directly conflict with each other: hot, warm, and cold. If I just do a simple grep, I will match all three of these words, and the soup can come out lukewarm. That is a horrible experience. You want intelligence that identifies that the user wants a hot soup.

And here is the next case. Say the next part of the conversation is: "I have some chicken and also some canned tomatoes." The user is talking about two different things, tomatoes and chicken. Both are essential to the soup, but one of them is a protein, a main, and the other is more of a flavor. So we want to understand which plays the bigger role. It's an inherently complex task, and that's why Google acquired a company called Dialogflow. Dialogflow is the new name; the original company was called api.ai. Has anyone heard of api.ai? A few hands; that's nice. Api.ai is a company Google acquired roughly a year and a half back, and just a few months back the name changed to Dialogflow.

Dialogflow does two things very well. One is called intent matching: when the user says something, you need to identify what action the user is trying to perform. The second is entity extraction: this is what I was talking to you about, identifying the tomatoes, the chicken, the different items in that sentence, and understanding what each one stands for, which is, again, a complex task. Both intent matching and entity extraction are done very well by the Dialogflow product.

So let's look at the entire flow again with Dialogflow in the picture. The first half of the conversation remains the same. It's only in the second part that you delegate the NLP, the intent matching, and the entity extraction to Dialogflow, and then connect it back to your backend. And I'm going to show you how all this is done in the product. Can everyone see my screen well? People in the back? Yeah, OK.

This is how a project in Dialogflow looks. There is a lot of terminology, but let's go through some of the basics. The "EN" shown here indicates the languages this bot supports; currently it's English, but I can add more languages. Let's go to what are called intents. Like I said, an intent is any action the user will want to perform; in engineering terms, it's a feature you want to expose. Going back to the same grocery example, the features you want to expose are things like the shopping cart: adding items to it, removing items from it, checking the price. What the end user wants to do, adding items to the cart and so on, maps onto your intents, while the things being talked about, the items themselves, are captured in what are called entities.

My chatbot does one thing: it gives me back birthday reminders. When I ask it, "Hey, when was Paul born?" or "When did Mark get married?", it gives me back a date. That's all; very simple. And within Dialogflow, there is also a way to test this. So again, live demo, let's see how well this works. "When was Sam born?" It says Sam's birthday is on July the 4th. Essentially, what happened is that behind the scenes it hit my backend. There isn't a real backend database, the answer is actually hard-coded, but it is coming from a backend server, which gives me back the response.
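Behind the scenes, Dialogflow reduces that utterance to a matched intent plus extracted parameters, and that structured result is what eventually gets posted to your backend in the fulfillment step we'll get to in a moment. Here is a hedged illustration of roughly what it looks like, following Dialogflow's v2 webhook request format; the intent and parameter names are simply what this demo agent would define:

```javascript
// Roughly the structured result for "When was Sam born?" (Dialogflow
// v2 webhook request format). Intent and parameter names are
// assumptions based on this demo agent, not fixed by the platform.
const webhookRequest = {
  responseId: '...',
  queryResult: {
    queryText: 'When was Sam born?',
    intent: { displayName: 'when.occasion' }, // intent matching
    parameters: {                             // entity extraction
      'given-name': 'Sam',
      occasion: 'birthday',
    },
  },
  session: 'projects/<project-id>/agent/sessions/<session-id>',
};
```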
So what happened here? Dialogflow understood that I'm looking for a person named Sam, together with what I call an occasion. In my agent, these occasions are entities: what am I trying to remember? An occasion. And I have defined two types of occasions, one being birthday and the other anniversary. I also have the ability to define synonyms for some of these words, because sometimes we say, "add two cans of Coca-Cola," and sometimes we say, "get two cans of Coke." You and I know Coca-Cola and Coke are the same thing; the computer does not. So we have the ability to define what are called synonyms. Some people call it "wedding day"; where I come from in India, we don't say anniversary, we say wedding day. So again, you have the ability to define this. And if I ask, "When is Mark's wedding anniversary?", it says: Mark's anniversary is on July the 4th.

And the way this happens, let me show you pieces of it. Whenever a "when" query is triggered, the system looks at the training phrases I've given it, which show roughly how a question will appear: "When was David born?", "When was so-and-so born?", "What is his birthday?", "When is his wedding day?" These are the different phrasings in which the same question can be triggered. It also uses machine learning, so you don't have to define every possible way something like this can be phrased. Then I say: to respond to this request, I need two things. One is a name; the other is the occasion. So if I now just ask, "What is the birthday?", it asks me back, "Whose name did you say?", because I did not mention any name. I can solve this problem in two different ways. One is what's called a prompt, which is where "Whose name did you say?" is coming from. The other is what's called context, which remembers the subject of the previous question. So let's say I ask, "Hey, buy me two kilos of mangoes," and my next question is, "What is the price of it?" It already understands that I'm talking about the two kilos of mangoes, so there is continuity in the conversation.

The last piece, and perhaps the most important one, is connecting it to a backend. This is called fulfillment: fulfilling the user's request with data. Like I said, Dialogflow does the NLP well and understands the query well, but it does not have the intelligence to actually serve an answer. For that, you have the ability to connect it to a backend service, and that backend service can be anywhere: it can be running on Kubernetes Engine, it can be sitting on your own machine, anywhere. I have hosted my service at this URL, and I have a tiny piece of code; this is exactly what is inside it. Jan was just talking about FaaS, functions as a service, and this is a function as a service, hosted on a Firebase backend. It's a very simple webhook waiting for a request in a particular format. When that arrives, all it does is find the name, put it into a concatenated string, and always say that anybody's birthday is on July the 4th. That's exactly what the code does. Obviously, you can extend this by connecting it to a backend database, doing a lookup, and then responding to the user's request.
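In case it helps, here is a minimal sketch of what such a webhook could look like as a Cloud Function for Firebase. I'm assuming Dialogflow's v2 request and response format, and the parameter names are the ones this demo agent defines, so treat them as illustrative; like the demo, it hard-codes July the 4th, where a real version would do a database lookup:

```javascript
const functions = require('firebase-functions');

// A tiny Dialogflow webhook: read the extracted parameters, build a
// concatenated string, and always answer July the 4th, as in the demo.
exports.birthdayWebhook = functions.https.onRequest((req, res) => {
  const params = req.body.queryResult.parameters; // v2 request format
  const name = params['given-name'];   // e.g. 'Sam'
  const occasion = params['occasion']; // 'birthday' or 'anniversary'
  res.json({
    fulfillmentText: `${name}'s ${occasion} is on July the 4th.`,
  });
});
```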
So that is how easy fulfilling a request is. The easiest way to think about it: let's say you already have an API backend for your mobile app. Just connect that one to this one, and the two can talk to each other very comfortably.

So that is Dialogflow, the product I wanted to share with you. There are also things like analytics, so that you know how many requests are flowing in, where they are going, at what times, and how frequently. The other thing you should know about is integrations. Dialogflow being a product from Google does not mean you are confined to the Google Assistant environment. You can publish your chatbot or your assistant to Facebook Messenger, Slack, or numerous other services, because it's hooked up to all of them behind the scenes.

And what I'm going to do now is a quick demo of this. I can click "See how it works on the Google Assistant" and it will create a Google Assistant project, but I've already done this, so I'm going to go into this tab. Can everyone see this? That's nice, all right.

How many of you are familiar with publishing a mobile app, an Android app? A few hands, OK. Just like an app listing has a lot of metadata, targeting levels and so on, you have the same capabilities on Actions on Google. You can set the different languages and upload metadata such as images, which I haven't done. This is how my app will be invoked: "birthday reminder sample". And I can provide a lot of other details.

I also have a simulator, with either a voice-only simulation or a mobile-like simulation. Let's see how this one works. "Talk to birthday reminder sample." "Sure, here's the test version of birthday reminder sample. Good day." It responded with "good day" because I've set the welcome intent to pick one of these phrases: "Hello, I'm birthday reminder", "Hello", "Good day", "Greetings". So it can respond with one of these variations, and this time it picked this one at random. Now I can ask, "When was Martin born?" Again, it's going to say July 4th: Martin's birthday is on July 4th.

The other thing you should note is that the response comes back in two forms. One is audio, and the other is the simple text "birthday on July 4th". And that is coming from here: it's not in the request, it's in the response. So depending on the device, the surface on which the request is made, you can respond with speech, with plain text, or with a combination of both.

And that takes me into the last part of my presentation. We have a Node.js client library, and I think we have libraries for other platforms as well. This is how you would structure a backend application: you have incoming webhook requests, which I'm delegating to this response handler, and I have the ability to inspect the request; this is all common stuff. But the even more interesting part, like I was telling you, is the ability to return what is called rich, interactive media in the same Assistant interface. There is a lot in this response: you can display images, you can have links, and you can show what are called chips, short buttons that are easy to invoke. All of this is available through a simple API, with calls such as addBasicCard, setImage, and addSuggestions.
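Those calls come from the v1-style API of the client library; here is a rough sketch of how they fit together, with the action name, card text, and image URL as placeholder assumptions:

```javascript
const { DialogflowApp } = require('actions-on-google');

// Build a rich response: spoken text, a basic card with an image,
// and suggestion chips underneath (v1-style API).
exports.richFulfillment = (req, res) => {
  const app = new DialogflowApp({ request: req, response: res });

  const showCard = (app) => {
    app.ask(app.buildRichResponse()
      .addSimpleResponse('Here you go.')
      .addBasicCard(app.buildBasicCard('More detail about the reminder.')
        .setTitle('Birthday Reminder')
        .setImage('https://example.com/cake.png', 'a birthday cake'))
      .addSuggestions(['Next birthday', 'Add a birthday']));
  };

  const actionMap = new Map();
  actionMap.set('show.card', showCard); // action name is an assumption
  app.handleRequest(actionMap);
};
```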
The other very unique thing the Google Assistant pushes you on is ensuring that your app's responses do not sound like a robot. It is so important that you give a human character to your application, so that people do not feel as if they're talking to a computer. So, a few design guidelines for you as developers. First, you have to own what is called the persona of your assistant. Just like mobile apps had branding guidelines, color schemes, and all the other things that communicated the brand of your application, here you have to own the persona.

And for that we have tools. One of them is a markup language, SSML, that lets you control utterances very precisely: giving a pause, pronouncing something unusual like "www" character by character, playing an audio clip, or simply breaking a conversation with a moment of silence. There's a small sketch of this below.

The other thing you can do to make your users comfortable interacting with an assistant is persistence. The user should not feel like a stranger every time they start a conversation. Always reaching back to a past conversation, "Hello, welcome back! How was your last order?", keeps a human connection and makes the experience much more approachable.

Also, do not design with only one device in mind, such as a voice-only Google Home. Think of the multitude of devices on which this is going to be available, all the way from TVs to your headphones to speakers. You have the ability to query the capabilities of the surface, and you have the ability to tailor the response to wherever the user made the query; this is what I showed you in the code. And like I said, you can have suggestion chips here, as well as basic cards, carousels, and lists. Chips are for when you want the user to get something done quickly. For example, something I do very often: every day when I go back home, I want to know when the next bus is coming; I am sure people in Singapore are fairly familiar with this kind of use case. And I want those suggestion chips offered to me every time I launch that application. So providing suggestion chips is very, very vital.
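To make those last few points concrete, here is a rough sketch (v2-style library again; the intent names, pause length, and audio URL are illustrative assumptions) of SSML shaping an utterance, and of checking a surface's capabilities before deciding whether chips make sense:

```javascript
const { dialogflow, Suggestions } = require('actions-on-google');

const app = dialogflow();

// SSML persona touches: a deliberate pause, "www" spelled out
// character by character, and a short audio clip.
app.intent('welcome.back', (conv) => {
  conv.ask(
    '<speak>' +
      'Welcome back! <break time="800ms"/>' +
      'Find us at <say-as interpret-as="characters">www</say-as> dot example dot com. ' +
      '<audio src="https://example.com/chime.ogg">a welcome chime</audio>' +
    '</speak>');
});

// Surface capabilities: attach suggestion chips only when the surface
// has a screen; on a voice-only speaker, ask a follow-up instead.
app.intent('next.bus', (conv) => {
  const hasScreen =
    conv.surface.capabilities.has('actions.capability.SCREEN_OUTPUT');
  if (hasScreen) {
    conv.ask('The next bus is in five minutes.');
    conv.ask(new Suggestions(['The one after', 'Buses nearby']));
  } else {
    conv.ask('The next bus is in five minutes. Want the one after that?');
  }
});
```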
The other thing I would say in the last few minutes is: know your user. Try to understand where they are. Let's say you are from the National Library Board, NLB; then one important thing for you to know is where the user's nearest library is and what books are available there. You have the ability to ask for the user's address and to pick up other information. You also have the ability to have the user sign in. So whatever is available in the mobile world today is now available to you on the Google Assistant platform. You even have the ability to transact with the user. This is currently available, I think, only in the US and UK, and not to everybody, but we are looking to roll it out to more regions and more markets.

So again, all the way from starting a conversation with the Assistant, to building the shopping cart, adding items to it, fulfilling the request, and making a payment, everything can be done with the Google Assistant. That is the end-to-end life cycle, the transaction cycle, and this is how transactions typically look: you can put in delivery addresses, review the entire order, say "place order", and it then goes into your order history, where you can look back at the orders you have placed. So it is really an end-to-end platform on which commerce and delivery can happen.

And the Google Assistant is not limited to devices such as phones or car stereos. Let's say you build a custom device today, a unique piece of hardware, say a coffee machine that will respond to voice input. There again, you can put in the Google Assistant. Think of it as a library you can embed into any hardware so that it takes voice commands. If you search online, you'll find the cocktail mixer built at I/O last year: all you have to say is, "make me a mango shake," or, "a summer drink," and based on the drink it pulls in, say, 20% of the glass from the yellow bottle and the rest, the coconut, from the white one, and it does the mixing for you based on the request you have made. There is no touching; it's all voice, and that's because the Google Assistant went into the cocktail mixer.

Conversational design, like I said, is about making the conversation feel like speaking to a human. There is a lot to it, but we have put together a set of guidelines so that you can create better assistant-driven applications, not just on the Google Assistant, but for voice-enabled interactions in general. We also have codelabs so that you can try out some code, and we have a lot of videos like this one; take a photo of the URL if you'd like. That's me: you can follow the Google Assistant and all the latest news there. That's me again. And finally, in case you're a startup interested in building applications like this and you want to use the Google Cloud Platform, you can apply here and get $3,000 of credits. Thank you. Do I have a minute for a question? No, I'm already over time. All right, thank you, everyone. Bye.