Maybe we'll just start with a few numbers. 4.54 billion years: that's how old the Earth is. Does anyone know the significance of this number, 200,000 years? Not quite, that probably came before 200,000 years. Shout it out if you know. Yeah, that's right, I heard that somewhere: this is how long modern humans have been around. And does anyone know the significance of this number, 100,000 years? Hang on, it's relevant to what we're getting at. Yeah, so we've been using voice as a means of communication for 100,000 years.

Which is strange, because we've been using voice to communicate for so long, yet it has taken personal computing this long to actually become voice-enabled. If you chart how we have interacted with personal tech and personal computing from, say, the mid '70s, we started off with very basic character mode; think of it as an extremely primitive version of the terminal that a lot of developers use nowadays. You had a tiny black-and-white monitor with squiggly lines, and you typed things out. It wasn't the best user interface, but it was what was there back then.

Some ten years after that, in the mid '80s, we of course moved to the graphical user interface era, where a lot of companies started building operating systems that were graphical in nature. You could see files, a Word document, an Excel spreadsheet. Of course they were very primitive back then, and even just interacting with them took getting used to. I don't know how many people here used computers in the early '90s, but when I first used Windows 95 I found it so difficult to actually move a mouse and drag a file; even while playing Solitaire it was really difficult to drag a card, because I just wasn't used to that way of interacting with a computer.

Things progressed, and the mid '90s was the era of the web. I was one of those people who used Netscape Navigator and Internet Explorer back in the day. We had really basic HTML websites with blinking text and marquee text that moved across the screen. Of course, it's a lot more advanced now, and we still use the web.

In the mid 2000s came the mobile revolution, something we're all familiar with here in India. The web, GUI and desktop eras kind of skipped India, if you look at it from a popularity point of view, but we really felt the mobile era here. We moved from very basic handheld Nokia phones with clunky keypads to the advanced smartphones we have today. And even then, if you remember surfing the web a few years ago, websites weren't really responsive: you had to zoom in multiple times just to click on a link, because they weren't optimized for mobile. But then it was people like you, designers and coders, people who really worked on the mobile web, who figured out that we have to make these things responsive, that they have to look native on a mobile phone. So that's how we've come to this stage. If you look at it, almost every ten years there's been a shift in how we interact with tech, and sometime in the mid 2010s is around now.
It again shifted, this time to the era of the voice user interface. We've been using voice as a means of communication for ages, but it has taken personal computing this long to get there, and there are a variety of reasons why. One, and I can't stress this enough: I think voice is the most natural, easiest way to communicate. But earlier it wasn't easy to build for. You had systems that could process maybe tens of words, then hundreds of words, but to do something as simple as, say, a dinner reservation, you need extremely good computing power, and you also need a lot of data to really process what the user is trying to say. Right now we're at the stage where we have that data and we have that computing power, so it's a sort of perfect storm, really. Voice really represents the next major disruption in computing.

We're in Bangalore, which is sort of the Silicon Valley of India. There are a lot of startups here; for those of you who might not know, a lot of unicorn companies actually started right here in Bangalore. We hear the word "disruption" a lot, but I think in this case it's true that voice is the next major phase of disruption, because voice is everywhere, and since it's such a natural way to communicate, it's something we'll be doing a lot more of in the next few years.

Now, I must start by saying I'm pretty honoured to share the stage with Mr. Alan Cooper. If you saw his talk earlier today, he spoke about working backwards, and I think it was a fantastic talk. Coincidentally, Amazon has this philosophy of working backwards, where we essentially apply the same principles Alan spoke about. It was with that idea that, well, how many of you here use a Kindle? I use a Kindle a lot; I think it's a great device. That's awesome. Essentially, we had this idea of a device that could let you access any book in the world within a minute: anyone, anywhere with an internet connection, could instantly download any book in less than a minute. That was the idea, and we worked backwards from it, like Alan said. We didn't think of constraints; we basically asked, if everything were magically possible, what would the ideal device look like? Obviously that meant looking into publishing and looking into hardware, but it was with that principle that we came up with the Kindle.

Similarly, we saw this disruption, this shift towards voice, and our idea was to have a device that would be like the Star Trek computer. It's everywhere; you can get things done just by talking to it. For the millennials in the crowd, maybe Jarvis from Iron Man or the Avengers is a better example. That's what we really wanted to do, and it was with that idea that we came up with Amazon Alexa and Amazon Echo. That's a cross-section of sorts of the device, so let me quickly talk to you about the hardware itself, without turning this into too much of a product pitch.
The cool part about the device, I mean, it has a great speaker, blah blah blah, but the really cool part is that it has an array of seven microphones right at the top. This enables something called far-field recognition. There was a lot of tech earlier that tried something like this, but the thing that was very difficult to pull off was far-field recognition, where the device can actually recognize and listen to someone talking from across a crowded room, even if other people are talking, even if there's music playing. That's possible because of the seven really powerful microphones you see at the top of the device. Cool.

Now, when you buy a mobile phone, it comes with some capability out of the box: you can send SMS, take photos, you have a calculator and an alarm clock, you can even make phone calls, something not too many people seem to do nowadays. It comes with a few of these built-in things, but the first thing you do when you buy a new phone is obviously download apps: WhatsApp, Ola, Uber, Cleartrip, whatever. Similarly, there are a lot of things an Alexa device can do on its own. You can ask it the time or the weather, have social conversations with it, set lists and reminders, get the news, play music. But you can also extend its capabilities, really enhance them, using these things called skills, which are basically analogous to apps on phones. Alexa launched officially in India a couple of weeks ago, I think, and there are already tons of skills available; I think the number specific to India is 12,000. So do go check it out.

Now, I've spoken a little bit about Alexa, and I've used the word Echo, so let me make the distinction clear. When I say Echo, or Amazon Echo, I mean the hardware device itself. Alexa is essentially the cloud service that powers all these devices. I'll probably use the two interchangeably, but essentially Alexa is a really powerful cloud service that does things like speech recognition and natural language understanding, and there's a huge machine learning component built into it; I'll talk about some of those things. It's supported by two really powerful frameworks. The first is called the Alexa Skills Kit: say you have some content or a service, maybe an app or a website, and you want to bring that content to an Amazon Echo, or bring it to users through a voice-enabled experience; you use the Alexa Skills Kit. There's also another framework called the Alexa Voice Service: if you have a hardware device, anything with a speaker and a microphone, you can bring the capabilities of Alexa to that device. Maybe you've built a smart toy, or something cool on your Raspberry Pi, and you want it to be voice-enabled; you can use the Alexa Voice Service to do that. And yes, all of this is absolutely free, too. So I'll be talking a little bit about how you can actually build skills,
a little bit about the tech, and also about how voice design plays the most crucial role in actually building skills for Alexa. So let's talk about actually building for voice with Alexa.

This is the typical interaction diagram for a user talking to an Alexa device. Most of the magic happens in the Alexa cloud; the hardware itself does just two things. The first is wake word detection, and the second is beam-forming. Wake word detection means your device is not continually listening to and recording everything you say, which would really shoot your internet bills up, and you don't want that. Instead, it listens for this thing called the wake word, which in this case is "Alexa". Only once a user says "Alexa" does it start listening to what the user says, because that's when it knows it needs to process what follows. Once it hears the wake word, it does this pretty cool thing called beam-forming, where it detects the source of the sound. This is pretty much what enables far-field recognition: since there's a circular array of microphones, it can do noise cancellation on the sides the sound is not coming from, so it can pick up the speaker from across the room. That's pretty much all the hardware does: it listens to what the user says after the wake word and sends that to the cloud, and the cloud is where all the cool things happen.

If I had to divide up what happens in the Alexa cloud, it's a couple of things. The first is automated speech recognition, which is essentially speech to text, but slightly more complex than that. For instance, if I were to say a phrase like "forty times", you can see the phonemes, the units of sound, for that phrase. The user could have meant the number: do something forty times. The user could have meant "tea times", a reference to chai-drinking time, maybe they're a chai drinker. Or it could have been "tee times", as in the user wants to play golf. It's very difficult to work out what the user is trying to say. What Alexa does is figure this out accurately by using a bunch of different things. One is that, as a designer, you provide training data for this. Maybe your skill just needs the number of times something has to be done; you provide that as training data, and then, thanks to the machine learning that's built in, Alexa knows the training data says the skill needs a number of times, so it's probably the number forty. Or maybe your skill has something to do with ordering tea and coffee, and you've provided that training data, so then Alexa knows it's probably "tea time", the user wants some tea. Yeah, so the question was: why do you have to provide that information as a designer or a developer? Like I said, the skill you're developing could have something to do with a number of times, like how many times you want a song played, or your skill could be a coffee-and-tea-ordering skill.
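To make that a little more concrete, here's a rough sketch of what that kind of training data could look like, written out as a Python dict. This is a simplified, hypothetical interaction model for a tea-ordering skill: the invocation name, the intent name, the slot names and the slot type are all made up for illustration, and the real model you'd upload through the Alexa developer console is JSON with more fields.

```python
# A simplified, hypothetical interaction model for a tea-ordering skill.
# The "samples" are the training data mentioned above: different ways a
# user might phrase the same request, with {slots} marking the variable parts.
interaction_model = {
    "languageModel": {
        "invocationName": "chai time",                 # made-up invocation name
        "intents": [
            {
                "name": "OrderDrinkIntent",            # hypothetical intent
                "slots": [
                    {"name": "DrinkType", "type": "DRINK_TYPES"},
                    {"name": "Quantity", "type": "AMAZON.NUMBER"},
                ],
                "samples": [
                    "order {Quantity} cups of {DrinkType}",
                    "get me a {DrinkType}",
                    "I would like some {DrinkType} please",
                    "it's tea time",
                ],
            },
        ],
        "types": [
            {
                "name": "DRINK_TYPES",
                "values": [{"name": {"value": v}} for v in ("tea", "chai", "coffee")],
            },
        ],
    },
}
```

Given samples like these, the cloud has a much easier time deciding that the phonemes it heard were "tea time" rather than "tee time" or "forty times".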
Whichever it is, it's something you're developing, so you have to provide the training data of the things a user might say to interact with your skill, and that's how the speech-to-text happens rather accurately.

Okay. Once that's done, once Alexa has figured out the speech and converted it to text, it's again up to the cloud to convert that text into an intent. I'm going a little technical here, but natural language understanding, which is basically understanding what a human is saying and converting that into structured data, is the key to having a conversational UI. If I draw an analogy to the web: if you have an app or a website with a button that says "OK", you're relying on the user to figure out that the word there is "OK", and that clicking it probably means going to the next step. You're relying on the user's innate natural language understanding. With voice it's a little different: a user might say "okay", but they could also say "that's good", or "go to the next step", or "sounds great", or "love it", or "okie dokie". So how would you match all of these utterances and handle them all as "go to the next step"?

To do this you have this thing called an intent, which is essentially, like the word says, think of it as a feature your skill has. Let me give you another example. Say your skill does weather, and pretty much the one intent in your weather skill is to get the weather. There are different ways a user can ask for the weather: "tell me the weather", "get me the weather", "what's the weather", "how's the weather", but they can also say "do I need my umbrella today?" or "do I need my coat today?" All of these utterances map to the same intent. That's what the cloud does: it matches all these different utterances a user might possibly say, some of which you provide as a designer, to an intent.

Once you have that intent, it's structured data; like I said, natural language understanding converts human conversation into structured data a computer can really understand. So in your request you get structured data, which goes to your back end. In your back end, once you have that structured data, you can leave it to your developers to do what they want: call a database, call a public API, use some hard-coded data; it's really up to you how you code your skill. Once that's done, you send your response back.

Let's take the example of the weather skill. Someone says, "Hey, what's the weather in Bangalore?" Alexa converts that speech to text, converts that to an intent, say a "get weather" intent, and sends that structured data, intent: get weather, city: Bangalore, to my back end, which fetches the weather. The back end sends the answer back: "It's a pleasant 28 degrees in Bangalore right now." That is again sent to the Alexa cloud, which does text to speech.
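To give a feel for the developer side of that round trip, here's a minimal sketch in Python of what the weather skill's back end could look like. The JSON shapes are simplified versions of the request and response envelopes Alexa actually sends and expects, and GetWeatherIntent, the City slot, and lookup_weather are all hypothetical names used just for this illustration.

```python
# A minimal, hypothetical back end for the weather skill described above.
# "event" is a simplified version of the structured data the Alexa cloud
# sends once natural language understanding has mapped speech to an intent.

def handle_request(event):
    intent = event["request"]["intent"]
    if intent["name"] == "GetWeatherIntent":              # hypothetical intent name
        city = intent["slots"]["City"]["value"]           # e.g. "Bangalore"
        degrees = lookup_weather(city)
        speech = f"It's a pleasant {degrees} degrees in {city} right now."
        return build_response(speech,
                              card_title=f"Weather in {city}",
                              card_text=f"{degrees} degrees Celsius today.")
    return build_response("Sorry, I didn't get that.")

def lookup_weather(city):
    # Stand-in for whatever your back end does: a database call, a public
    # weather API, or even hard-coded data.
    return 28

def build_response(speech_text, card_title=None, card_text=None):
    # Simplified Alexa response: the speech gets read out by text to speech,
    # and the optional card shows up in the companion app (cards come up
    # again a little later in the talk).
    response = {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": True,
        },
    }
    if card_title is not None:
        response["response"]["card"] = {
            "type": "Simple",
            "title": card_title,
            "content": card_text,
        }
    return response
```

So a request carrying intent GetWeatherIntent with City set to Bangalore comes back as "It's a pleasant 28 degrees in Bangalore right now", plus a small card in the app.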
What really differentiates a conversation we have as humans from a computer just reading out words is things like emphasis, adding a whisper, spelling out words, or saying a local Indianism, for example. All of that is taken care of in the cloud by this thing called SSML, Speech Synthesis Markup Language. It's pretty cool tech, and it exists in a lot of voice technology; it's not something specific to Alexa. But essentially, with it you can add whispers, you can add emphasis, and if you want Alexa to say "balle balle" or "macha", you can actually get Alexa to do that; there's a small sketch of this coming up in a moment. So Alexa takes care of the text to speech, and once that's done, your device tells you the weather, or whatever the skill actually does. That's the general interaction flow of how a skill works.

Yeah, okay. The gentleman here asked: what is this card component? So, even though Alexa is a voice-first device, it comes with a visual component as well; there's a companion mobile app. A lot of experiences are really good voice-first, but sometimes you want to augment them with a visual element, like shopping, for example. If I say, "Hey, what's the price of a Raymond suit?", I'll get the price via voice, but I still want to see the suit before buying, so you can send a visual response onto a card in the app. For instance, if you ask Alexa what the weather is, it'll tell you the weather in Bangalore, say 28 degrees, but it'll send the weather for the next seven days to the card, so you have that data on your phone. It shows up in the mobile app, or even on the website if you log in. Think of it as a visual component that augments the experience; it's still a voice-first experience, but you want to augment it.

Yeah, it is. I'll talk a little bit about that in the design principles, but the way we see a lot of these things is how you would communicate with a human. If I send you an email, it's there; but if I tell you something now, it's there in this particular moment, and then it's up to you whether you want to remember it or not. So it's something similar: you can always come back to the skill, or come back to me, and ask the same question, and you'd probably get the same answer. That's how we see it.
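Going back to SSML for a second before moving on: here's a rough sketch of what that markup can look like, reusing the made-up weather response from earlier. Tags like emphasis, break, whispering and spell-out do exist in Alexa's flavour of SSML, but treat this as an illustration rather than a reference.

```python
# An illustrative SSML string for the earlier weather response. SSML lets
# the text-to-speech step add emphasis, pauses, whispers, or spell things
# out, instead of reading the text flatly.
ssml = (
    "<speak>"
    "It's a <emphasis level='strong'>pleasant</emphasis> 28 degrees in Bangalore."
    "<break time='500ms'/>"
    "<amazon:effect name='whispered'>Don't forget your sunglasses.</amazon:effect> "
    "Your airport code is <say-as interpret-as='spell-out'>BLR</say-as>."
    "</speak>"
)

# In the response from the earlier sketch, you would send this as SSML
# instead of plain text:
output_speech = {"type": "SSML", "ssml": ssml}
```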
Okay, so now I'm here to talk to you about the key design principles for actually designing Alexa skills. These apply to designing for voice in general, not just for Alexa. And the reason we need them is that we've all been used to designing visually. I'm guessing a lot of you are designers; you've designed websites or apps. Even those of you who haven't: if I asked you to rate an app's UX, I'm sure you'd be able to tell me fairly accurately what the good things and the bad things are, simply because we've used apps and websites so much. It's a little different when you're designing for something that's completely voice-first. For one, there are none of the guardrails a UI provides you; it's completely natural. But the biggest difference is that a lot of things that might read well, like an error message or a success message, don't necessarily hear well. When you hear them read out loud, there's something very off about them, and that's not how users typically converse.

Because of these things, and because people find it a little difficult to design voice experiences, we've come up with five principles to help you design for voice. A lot of them might sound like common sense, but they're things you really have to keep in mind while designing for voice. In fact, I started my career as an app developer, and a lot of times when we had to develop an app we'd start with "let's see if this API is working", write some boilerplate code, basically start with the code, maybe make a call to a database, and only then figure out the interactions, figure out what the user wants. It's probably not the right way to do it, but that's how a lot of tech projects start. When it comes to voice, though, we really recommend not starting with the tech at all. The first thing you do is literally simulate conversations between the user and Alexa: write them down and have people read them to each other, purely because then you know how it sounds by ear, not just how it reads.

So, on to the principles. The first one states that a skill should have a clear purpose. Again, this might sound self-explanatory, but it's really important to think of interactions that are made faster, easier, more fun, or simply better by being voice-first. You don't want "hey, I have a website, if I just get Alexa to read out my entire website it'll be cool"; that's not really a great use case. You want interactions that become more engaging, more fun, or faster through voice. Good examples are things like booking a cab: I just say, "Alexa, book me a cab"; she says there's a cab three minutes away, do you want me to book it?; I say yes; done, boom, over. Another great example is interacting with smart home devices. When I enter my house, I just say, "Alexa, switch on the light," and it switches on the light. I really don't want to pull my phone out, enter a passcode, open the app, go through some authentication, and then hit an "on" button in the app to switch the light on. Or, even better, do it old-school, stumbling around in the dark looking for the switch. That particular interaction is made easier and faster by being voice-first, and those are the sorts of skills you should actually be building.

I also typically tell people to choose use cases that solve very modular problems, as opposed to saying "my skill is going to do literally everything: I'll help you book a cab, book a flight, book a hotel," because then you're setting the wrong expectation. Remember, there are no guardrails in voice, so users can say random things to your skill. Yours might be, say, a skill to book movie tickets, but someone might say something as random as "pineapple", because you've not set that expectation.
You've not set that purpose very well. So try to set it, maybe via a good welcome message, or even through the name of your skill, but choose that specific purpose.

Okay, the second principle: skills should evolve over time. I think this is an interesting one. Think back to the first time you met a friend or a co-worker, and think of the interaction you had with them. It was probably a lot different from how you interact with them now. There's a good chance you introduced yourself: "Hey, what's your name? Where are you from? What do you do?", that sort of thing. That's also how we see Alexa skills evolving over time with the user. Now, evolving over time could mean a bunch of things. One is remembering preferences; the simplest case is remembering your name: I say "hey, my name is so-and-so" to the skill, and that's typically how that process goes. But with a voice-based skill there's also the opposite pull: you really want variety. You want your skill's UI to keep changing, in the sense that every time I launch a skill I want a different welcome message. I don't want the same robotic "Hi, welcome to the skill"; I want a "hey, glad to have you back", and sometimes I want something a little longer, like being told about a new feature in the skill. So really try to evolve your skill over time in terms of its UI; there's a small sketch of this a little further down. If you draw an analogy back to the web, that's not something we're usually told to do, because the web relies on repetition: the user knows this button is here and that menu is there, so these are the patterns to follow for a particular task. In voice, it's slightly different.

Okay, the third one: users can speak naturally to your skill. Again, this sounds self-explanatory, but it's important to remember we're in a voice-first interface here, so your user's cognitive load should be entirely on actually using the skill, not on trying to remember things within the skill. A good exercise I suggest to people trying to build skills is to think of a conversation you had earlier today, and think of its purpose. Was it social, "hey, how are you doing, how are the kids, how was your vacation?", or was it completely goal-oriented, "can you give me directions, what's the time, can you pass me that book?" Also think of the role you played in that conversation: were you the one asking the questions, or the one answering them? Really think of these things while developing your skill, and try to figure out how your users are going to converse. What role will your user play at that point in your skill? Will they be the ones asking the questions, or the ones answering them? Is the purpose of your skill social, or goal-oriented? It's when you start thinking about these things that you actually start designing for voice first.
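Coming back to the second principle for a moment, here's a tiny sketch of what a varying, non-robotic welcome could look like in code. The greetings, the weather-skill framing and the user_name parameter are all made up; the point is just that the skill picks a different opening each time and adapts once it knows something about the user.

```python
import random

# Hypothetical welcome messages for the weather skill used as an example earlier.
FIRST_TIME_WELCOMES = [
    "Hi! I can tell you the weather for any city. Which city should we start with?",
]
RETURNING_WELCOMES = [
    "Hey {name}, glad to have you back!",
    "Welcome back, {name}. Which city today?",
    "Good to hear from you again, {name}. By the way, I can now do seven-day forecasts too.",
]

def welcome_message(user_name=None):
    # First-time users get set up; returning users get a varied greeting,
    # occasionally mentioning something new, instead of the same robotic
    # "welcome to the skill" every time.
    if user_name is None:
        return random.choice(FIRST_TIME_WELCOMES)
    return random.choice(RETURNING_WELCOMES).format(name=user_name)
```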
Otherwise, we tend to go back to bad habits, rely on how we've been designing so far, and still design for the eye as opposed to the ear. Like I said, your user's cognitive load shouldn't be spent remembering an exact syntax to operate your skill. They should also be able to operate your skill with their attention distracted, just as you can when you're talking: when I'm talking to someone and a third person distracts me with a question, I'm still able to answer fairly accurately. Alan spoke earlier about thinking fast and slow; as long as it's a question that only needs fast thinking, we can all answer it. So think about these things while designing your skill.

Okay, the fourth principle: Alexa should understand most requests. With voice design, I come back to the point that there are no guardrails like there are in a UI, so sometimes things can get complicated. Let me give you an example. Take a simple travel skill, something like a Cleartrip or a Goibibo, and the question Alexa asks the user is: "Where do you want to go?" The user says, "I want to go to Goa this weekend." If you look at what happened there, the user has over-answered. Typically, on a website or an app you'd have a drop-down, "Where do you want to go?", you'd choose from a list of options, and that's it. But it doesn't work that way in conversation. If I ask someone "where do you want to go?", they'll say something like "I want to go to Goa this weekend." They're not only answering where they want to go, they're also saying when. So it really wouldn't make sense, from a design point of view, for your next question to be "When do you want to go?", because the user has already answered that.

Unfortunately, most skills are designed very flowchart-based, very IVR, interactive voice response: "press one to speak to an agent, press two to continue, press nine..." A lot of skill builders fall into the trap of saying, "I asked where you want to go; the next question will, regardless, be when you want to go, then what type of flight you want," and so on. A user might answer all of it in one shot: "I want to go to Goa this weekend on a first-class flight and come back next weekend," and then it would be completely pointless to ask the next six questions. Similarly, a user can do the opposite. Suppose Alexa asks, "Where do you want to go?", and the user says, "I want to go hiking from Bangalore." The user has completely under-answered: they haven't said where they want to go, but they've told you they're starting from Bangalore and they want to go hiking. You've got two other data points, but not the one you actually asked for. This is a big challenge, because if you design a skill to be a flowchart, like an IVR, you're going to give your user a really bad experience, the user is going to get frustrated, and it becomes very unnatural, very non-conversational. There are ways to handle this from a design point of view. I won't go into too much detail, but basically there's this thing called a graph UI versus a frame UI.
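Before describing the frame idea in prose, here's a rough sketch of how it could look on the back end for a travel skill with three slots: where you're going, when, and where you're starting from. The intent and slot names are hypothetical and the JSON is simplified; the Dialog.ElicitSlot directive itself is a real Alexa Skills Kit mechanism for asking the user for one specific missing slot.

```python
# A rough sketch of slot filling: collect whatever the user has already
# said, in any order, and ask only for the piece that's still missing.

QUESTIONS = {
    "ToCity":     "Where do you want to go?",
    "TravelDate": "When do you want to go?",
    "FromCity":   "Where are you starting from?",
}

def next_step(intent):
    # intent["slots"] holds one entry per slot; "value" is present only if
    # the user has already supplied that piece of information.
    filled = {name: slot.get("value")
              for name, slot in intent.get("slots", {}).items()}

    for slot_name, question in QUESTIONS.items():
        if not filled.get(slot_name):
            # Ask just this question, instead of walking a fixed flowchart.
            return {
                "outputSpeech": {"type": "PlainText", "text": question},
                "directives": [{"type": "Dialog.ElicitSlot",
                                "slotToElicit": slot_name}],
                "shouldEndSession": False,
            }

    # All three slots are filled, so the interaction can finish.
    return {
        "outputSpeech": {
            "type": "PlainText",
            "text": (f"Booking a trip from {filled['FromCity']} to "
                     f"{filled['ToCity']} for {filled['TravelDate']}."),
        },
        "shouldEndSession": True,
    }
```

So if the user over-answers ("I want to go to Goa this weekend"), two slots arrive already filled and the skill only asks where they're starting from; if they under-answer, it asks for whichever pieces are still missing, one at a time.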
In a frame UI you can get into the conversation at any point, so I can start with "I want to travel next weekend" and then say "I want to go to Goa", for instance. And there's this concept called slot filling: you define slots, which are essentially variables that complete an intent. Assume your skill has three slots: where you want to go, when you want to go, and where you're going from. Say the user says, "I want to go to Goa this weekend." That automatically fills two of those slots, and the skill realizes the one thing missing is where they're going from, so that's the question to ask. Instead of a completely linear flow, it's a little more horizontal: it fills in the slots, works out which slot is missing to complete the information it needs, asks the user for that, and finishes the interaction. That's what I really mean by "Alexa should understand most requests."

The last principle is that skills should respond in an appropriate way. Again, this might sound self-explanatory, but it's really important, and it's why we're doing all of this. If you look at it, humans weren't designed to interact or think in radio buttons, drop-down menus and constructs like that. In a sense, with apps and websites we've been forcing humans to think like computers; with something conversational, we're forcing computers to think like humans. So you really need your skill to respond appropriately, and that can mean a couple of things. It can mean the obvious: you don't put adult content or profanity in it, because Alexa is meant for the family, and so on. But you also need to shield your users from raw error handling. Remember, there are no real errors in conversation. For example, if I go to dinner with a friend and my friend tells me something really weird and unexpected, I'm not going to say "Error 404". I'm going to say, "I really didn't get what you mean," or "Did you mean this?", or "I'm sorry, can you repeat that?" So you really want to shield your users from that kind of error. There will be times when users say something your skill doesn't understand; maybe it's something random, maybe it's something you didn't account for when you developed the skill, but try to handle those instances gracefully. A few ways you can do this: say "I'm sorry, I'm still learning, is this what you meant?", or "These are the things I can help you with; what would you like me to do?", that sort of thing. There are also ways, using the tech, to see the things your skill hasn't handled and then account for them, because maybe it's something legitimate that you just didn't take into account while developing the skill. After all, there's no humanly possible way for you to say, "I know all eight billion ways a user can book a ticket," because it's completely natural; it can be a cultural difference, it can be a language difference.
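Here's a small sketch of what that graceful handling could look like. AMAZON.FallbackIntent is a real built-in intent that fires when the user says something the skill's model couldn't match; the reprompt wording and the logging idea are just illustrations of "keep the error away from the user, and keep a record so you can account for it later."

```python
import logging
import random

# Hypothetical reprompts: answer the way a person would, never with an error code.
REPROMPTS = [
    "Sorry, I'm still learning. I can tell you the weather for a city. Which city would you like?",
    "I didn't quite get that. You can say something like: what's the weather in Bangalore?",
]

def handle_fallback(event):
    # Record what we can, so unanticipated (but legitimate) phrasings can be
    # reviewed later and added to the skill's design.
    logging.info("Unhandled request: %r", event.get("request"))
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": random.choice(REPROMPTS)},
            "shouldEndSession": False,   # keep the conversation open
        },
    }
```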
So you will find instances where users have said the right things, they've said where they want to go and when they want to go, but they've said it in a way you didn't anticipate. It's important to analyze all of those and add them to the design of your skill as well.

Cool, so those are the five principles. Let me give you an example of a skill that does this pretty well. There's a skill called Travel Buddy, and you'll see all these principles applied in it. A user says, "Alexa, launch Travel Buddy," and here's what the skill says: "Hi, I'm Travel Buddy. I can easily tell you about your daily commute. Let's get you set up. Where are you starting from?" If you look at that, it has set expectations well: it has a very specific purpose, which is to help you with your daily commute. It's not saying "hey, I'll do a bunch of things related to travel and finance and whatnot." It's also very goal-oriented; there's not much socializing happening with this skill, no "how's your day, I'm doing great", it's straight to the point. And it's going to evolve over time, by asking questions like "where are you starting from?" The user says Whitefield, which is a place in Bangalore for those of you not from here, and the skill says, "Okay, where are you going?" I say Electronic City, another tech hub in Bangalore. Then the skill gives me very contextualized information: "Great. Now, whenever you ask, I can tell you about the commute from Whitefield to Electronic City." So it's evolving right there; it's going to remember my source and destination. It also says, "The current drive time is 1 hour and 42 minutes. There is an accident on Hosur Road." By the way, that's a completely unrealistic time; if you actually travelled between those places in Bangalore, you'd probably be on the road far longer. So it's giving me information that's extremely contextual to where I want to go at that point in time. Say I come back to the skill four days later and say, "Alexa, launch Travel Buddy." It's not going to ask me the same questions again, because it realizes it's a commute skill; you won't change your commute too often, you're mostly going from the same place, your home, to work. So it directly gives me the contextualized information, the drive time for that day. It's using all the principles we've discussed.

Now, if you take a step back: while building your skill, we follow this thing called crawl, walk, run. And if you look at it, I think the whole voice ecosystem itself is going to follow a crawl, walk, run philosophy. Right now we are really, really in the early days. Voice is going to be very ubiquitous soon, but I think all of us are still figuring out a lot about designing for it, because it's so new to all of us. So even with your skill, follow a crawl, walk, run approach, and first determine your real core functionality. If you take the example of that travel skill, it's a simple thing: just give an estimate of how long it will take to get to your workplace.
It's very simple: no jazz, no bells and whistles. Then get user feedback and optimize your skill. Maybe you start adding things like "there's construction on this road", "there's an accident on that road", a little more information, using an API maybe, so you're improving the core functionality of your skill, and you're also analyzing user feedback: you've gathered more utterances that a user might say, and you've handled more of them well. And at the end, of course, you have this really evolved version of your skill, where you can maybe proactively alert users about delays. To be honest, no skill right now is at that phase, nor is the ecosystem, if you ask me, but that's where we're really headed.

Cool, so that brings an end, of sorts, to my talk. What we really say to get started is to just start building things, even if you don't know a lot of the tech. There are a lot of tutorials and templates that help you build skills, and it's really easy to do, so please go ahead and build skills. There are some resources I'm putting up: the first one, alexa.design/guide, is a summary of a lot of the things I spoke about today, and I think it's pretty useful. There's a very cool free course on Codecademy; just go to alexa.design/codecademy. And alexa.design/india has India-specific things, so you'll find templates local to India, our upcoming events, our meetup groups, things like that. I just want to end by saying: you're designers, you're creative people, so do go out and build skills, see how you can creatively do voice design and really connect with your users. It's something completely new, so it's up to people like yourselves to shape what voice design is going to be. So yeah, that brings an end to our talk. I'd really love to hear what skills you're going to build; hit me up on Twitter or on my email ID right there. Also, we have a contest running where you can tweet, using this hashtag, design for Alexa, about skills you want to see or ideas you have for a skill, and we'll give out an Echo to one winner at the end of today. So thank you so much. Any questions? Yeah.

Yes, I think we have a couple of minutes, so I'll take a few. The question was: with Jarvis, you'd ask Jarvis to launch another app, and Jarvis ties it all together. Do you see Alexa going in that direction eventually, or do you know if that's where it's headed? Yeah, that's a really good question, and it is kind of where this is headed, even now. If you try Alexa and say something like, "Hey Alexa, book me a cab," Alexa inherently cannot book your cab, but there are skills like Ola and Uber in India that can, so she'd say something like, "I can open Ola or Uber to book your cab," for instance. Another thing it does: if you say, "Hey Alexa, what's my horoscope for today?", it says, "I don't have the horoscope, but I can open this skill that will tell you your horoscope," and if you say yes, it opens the skill and you get the horoscope immediately.
So it is sort of moving in that direction, and that also solves a lot of the discovery problem for skills. Because it's voice-first, there's no visual element for you to look at and say, "okay, these are the skills I can use." There is a companion app and a website, but having something like this really solves for discoverability.

"Hi, I had a question." Yeah, I'll just take that one. "I mean, everything is working great, but we can also see that it's in its initial stages. The things we've spoken about are just convenience things: I can check the weather on my app, and it's maybe saving me a couple of seconds if I just ask instead. So is there also research that Amazon is doing to actually solve a problem? Because none of these are problems humans are really facing, and maybe it's a long journey before that gets resolved, but I wanted to know: is there anything in the pipeline that can actually solve a problem we're facing, rather than just telling us the weather, or switching on the lights, which I can anyway do by going and switching them on?" Yeah. So, before I answer your question, I'd actually counter what you're saying. If you look at every tech innovation, you can argue the same thing: there wasn't really a problem, my horse-drawn carriage could take me wherever, but then someone came up with the car. Did you really need a mobile phone? I could already make calls. So I could argue that too. If you look at apps in the early days, they were extremely primitive, and they might not have provided the value they provide now; you had very basic apps that didn't do too much. Like I said, we're still in the early phase. I really think things like smart home devices are simply a better interaction when they're voice-enabled. Even things like playing music, where I can get a song instantly: I think it's a better interaction than opening your phone, searching for the song, typing it out, and then playing it. That's what I'm getting at; I think it's genuinely better.

And if you think about infants: they can't read or write yet, but they can understand sound and they can speak, they can mumble some words. So you can think of this as a fascinating way of exposing infants to the internet; they can access things without learning to read or write. There's a lot of fascinating stuff that can happen. Similarly, for blind people, for example, a home screen is a little more difficult; now you're enabling a conversational voice interface, so you can solve a whole bunch of problems we've not been able to. You're right about specially-abled people, and about lowering the barrier to accessing tech in general with something conversational. If you take my grandmom, I have to really teach her a new UI in an app, and she still finds it difficult, but with something conversational you don't really need to educate them; you can just say, "talk to this and it will get something done." Of course, there are questions like, "will it understand, say, Kannada?" We're not at that stage yet, but we will be.

One last question, I think