Let's welcome the Senior Technical Evangelist of Amazon, Germán Viscuso. Germán, are you there? Yes, yes, I'm here. Welcome, Germán. I'm fine, how are you? Doing great, thank you. Fantastic. Thank you for the great introduction. Take it away, Germán. It's all yours.

All right. Thank you. So, let me start my presentation. All right. So, my name is Germán Viscuso and I'm a Senior Technical Evangelist on the Alexa team. I do a lot of technical work related to Alexa every day, and that allows me to be in close contact with the latest features and everything we're doing to make Alexa smarter. There are a lot of things that I want to cover today and very little time, so it's very difficult to cover them in only half an hour. But I hope I can give you at least a glimpse of the latest progress that we are making with Alexa. Okay. So, if you want to continue the conversation with me later, you can reach me via Twitter. That's why I added my Twitter handle there.

So, let's get down to it. When the first pitch for the Echo device started circulating in Amazon, a few years before 2014, the first reaction was: this is going to be very hard, as it would require a lot of invention, right? There were a lot of things that were not... Sorry to interrupt you, but we're not seeing your screen. Are you sharing the screen with us? One sec. Oh, sorry. Just give me one sec. Right. There you go. Do you have to choose full screen? Can you see it now? You can see it. Okay, perfect.

So, as I said, you can reach me via Twitter if you want to continue the conversation. We missed one slide, which is this slide here. This is the original launch of the Echo device in 2014. And a few years before that, we were thinking that we had to invent a lot of things to get this technology out, you know? So, a few very important things had to happen before we could get to this point in 2014, when we launched the original Echo device.

First, machine learning applied to voice processing needed to become more mature, specifically the advances in deep learning applied to voice. And that only happened, to give you an idea, about 10 years ago, okay? As you will see during this talk, if you stripped all the machine-learning-based processes out of Alexa, the technology would not be viable today, okay?

All right. So, on top of that, the incredible compute power needed to process billions and billions of voice requests had to be tackled by mature and scalable cloud computing services, and that offering became mature at about the same time, around 2010. To put it in context, we launched AWS to the public in 2006, and Alexa runs on AWS services. Today, as long as you can pay for the services, you can tap into as much computing power as you really need, with great scalability options. So, we were at the right time then.

The third trigger of this evolution was the natural evolution of user interfaces. We seem to have a paradigm shift in interfaces about every 10 years or so. We went from classic input devices like keyboards, to the mouse, to touch screens, et cetera. And voice is really the next interface milestone, okay? But voice is special. It's not just another interface. The interesting aspect of voice is that our brains are already wired to learn languages from birth, you know? Voice is natural to us. It is the first thing that we hear when we are born, our mother's voice, right?
And it is the first interface that we don't need to learn afterwards. As long as we know how to talk, we don't need to learn anything specific to start interacting with the technology, okay? So we brought machine learning, the cloud computing capabilities available at that time, and this natural evolution of user interfaces together to create Alexa. And we welcome voice because we are wired for speech already. And that's why we have seen such a meteoric rise of Alexa since 2014.

So here's some historic context. I don't want to stop for a long time here, but this is the evolution of user interfaces in a very simplified way. The point here is that we have a paradigm shift about every decade or so that allows us to experience a new magic moment, you know? Today's magic moment has to do with voice user interfaces.

So why is this all happening now? Voice recognition has been around for some time now. ASR, which is automatic speech recognition, the technology that processes voice and converts it into text, has been around for decades, but it only saw marginal returns every decade since the 70s. These were all improvements over the same classical algorithms, or switches to different classical algorithms, as you can see in the graphic here. But after 2010, there is a spike, as you can see there, in the ability of machines to translate speech to text. If you take a look at the scientific papers from around that time, you can attribute this to new machine learning techniques being applied to voice signal processing. In just a few years, we made more progress than in the previous 30 or 40 years of research. And this puts the technology very close to human accuracy levels when transcribing speech. Right now is the single point in all of AI history where we achieved this milestone. So it's a very special moment for all of us, I would say.

So here you can see some of the latest announcements, the latest Echo devices. They are spherical now, and they include several improvements, but it seems that we announce new devices every September. We announced the second-generation Echo Show on September 20, 2018. Then we announced a whole lot of products and services, like 14 of them, last year, also in September, including the Echo Studio. And now, this September, we announced the new Echo, besides announcing a lot of Ring-related devices. As with the Echo Studio, the new Echo device is adaptive, and it automatically senses the space to fine-tune the audio playback. So that's really something, you know, it's really, really awesome. But we also introduced the Echo Show 10, which is our device with the rotating screen that you can see right there in the back of those devices.

These devices are special not only because of the improved audio, the design, and the features, but also because they include a new chip that opens the door to new possibilities, and that's part of my talk today. The new Echo is powered by Amazon's first-generation AZ1 Neural Edge processor, a module that's purpose-built for accelerating machine learning applications. With this chip, we can run powerful inference engines quickly on the edge. So these inference engines are running on the device, thanks to this chip. And we started with an all-neural speech recognition model that makes requests faster. Basically, the recognition happens much faster as it's happening on the device, and that makes Alexa more responsive. I'm going to comment on this in the next slides.
So if you take a look at this graphic here, you'll see what we introduced in 2014. We needed to be great at ASR, which is the automatic speech recognition that I already mentioned, at NLU, which is natural language understanding, and at text-to-speech. So this is more or less how Alexa works, the building blocks of Alexa. You get an utterance or a phrase from the user, then you process it in this first step, ASR, which converts the audio into text. Then, when you have the text, you need to decide what it means, and that's NLU, which tries to assign a meaning to what the user is saying. And then you have another phase, which involves converting the response from Alexa back to voice. For example, if you create a skill, you can send a response back to the user as voice, right? And that's called TTS, text-to-speech. And those are the basic areas a voice assistant has to be really good at in order to have high quality. Okay? Our vision is really to make interacting with Alexa as simple as speaking to another person. So this is just the basics, and it's a little bit transactional if you think about it, right? Now I will share with you a few AI advancements which are bringing us closer to that vision of making this more conversational. Okay?

First I will talk about self-learning, the ability that allows Alexa to learn without supervision, okay? When we misunderstand someone, for example, we are able to quickly course-correct by picking up nuances in how the person responds, or by asking clarifying questions. Like us, Alexa is now using similar self-learning to automatically correct her mistakes by learning from customer feedback signals. These signals include vocal frustration, such as someone saying, Alexa, that's wrong, for example, or direct interruptions, such as Alexa, stop, after some frustration with a specific interaction. Once Alexa determines that a particular action was not satisfactory, she automatically corrects herself without any human involvement. One example would be multiple interactions where the user sent a correction right afterwards.

So let me show you an example to make this clearer, okay? Suppose that we want to play a song and we don't remember the name of the song. We say, Alexa, play little shark song. That's not a real song, actually. It's not in the catalogue. So Alexa would not be able to play it, okay? But what if several users, let's say enough users, correct themselves right afterwards? They remember the song and say, oh, what I really meant is, play Baby Shark song. And they basically say that to Alexa right after they made this mistake. Alexa has a system that can actually learn from those mistakes. There's automatic detection of these types of errors, and there's a reformulation of the query. Basically, Alexa rewrites the query correctly, because Alexa knows that you meant, play Baby Shark, right? And it's paying attention to implicit and explicit feedback to do that sort of correction. So afterwards, when a user asks, hey, play little shark song, Alexa will directly play Baby Shark, making the interpretation that this is what you really wanted, without failing in the interaction, right? So that's one example of what we call automatic equivalence class learning, okay?
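To make the mechanics concrete, here is a minimal sketch, in Python, of how that kind of learned rewrite could be accumulated from implicit feedback. The class, the support threshold, and the signal names are illustrative assumptions, not Alexa's actual implementation; it just shows how repeated corrections can be promoted into an automatic query rewrite.

```python
# Hypothetical sketch of automatic equivalence-class learning: if enough
# users rephrase a failed query into one that succeeds, learn a rewrite.
from collections import Counter, defaultdict

class QueryRewriter:
    def __init__(self, min_support=100):
        self.min_support = min_support          # corrections needed before trusting a rewrite
        self.corrections = defaultdict(Counter) # failed query -> Counter of follow-up queries
        self.rewrites = {}                      # learned equivalence classes

    def observe(self, failed_query, followup_query, followup_succeeded):
        """Record an implicit feedback signal: a failed request followed by a rephrase."""
        if followup_succeeded:
            self.corrections[failed_query][followup_query] += 1
            best, count = self.corrections[failed_query].most_common(1)[0]
            if count >= self.min_support:
                self.rewrites[failed_query] = best  # promote to an automatic rewrite

    def rewrite(self, query):
        return self.rewrites.get(query, query)

rewriter = QueryRewriter(min_support=2)
for _ in range(2):
    rewriter.observe("play little shark song", "play baby shark song", True)
print(rewriter.rewrite("play little shark song"))  # -> "play baby shark song"
```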
Now, still on self-learning, we took a step forward by enabling customers to directly teach Alexa, right? We have an experience right now, just announced in September, where we want you to be able to directly and interactively teach Alexa something. So what I'm going to show you now, we call Teachable AI, and we have our VP and Head Scientist at Amazon Alexa showing how to actually teach Alexa something very specific. Let me show you the video right now, okay?

Alexa, set the thermostat to Rohit's vacation mode. What do you mean by Rohit's vacation mode? 60 degrees. The heat is set to 60. I will remember Rohit's vacation mode means 60 degrees. Alexa, set the light to Rohit's reading mode. What do you mean by Rohit's reading mode? 40% brightness. Okay, I'll remember that Rohit's reading mode means 40% brightness. Now, just to show that she has actually learned what reading mode is, let me first turn the lights off. Alexa, turn off the light. Okay. Alexa, set the light to Rohit's reading mode. Okay.

So this is a specific interaction where Alexa is learning something in a custom way, just for you, just Rohit's reading mode, et cetera. So you can teach Alexa specific things that are applicable to you. And we hope that we can enable this feature starting next year.

The other area that helps us get closer to conversations is context awareness, right? It's really awkward when you're talking to someone who doesn't remember what you just said, right? And that's something that tends to happen sometimes with voice assistants today. So we are working really hard in that area too. Let me show you one example here. Suppose that you say, Alexa, what's the weather like in Madrid, right? That's easy. Any voice assistant can actually solve that, and you'll probably get the right weather for Madrid. But what if the user says, immediately afterwards, what about tomorrow, right? If you think about the phrase, what about tomorrow, for a human that's super easy. We know what we're talking about. We have context, et cetera. We're talking about Madrid. For a voice assistant that is not properly handling context, what about tomorrow doesn't have enough information. It's referring to tomorrow, but it's not referring to the weather, to the city, et cetera. So we added a new layer for contextual interpretation that basically uses machine learning to carry over the context of the conversation, for example, that we're talking about the weather, that we're talking about Madrid, et cetera, to make the rest of the interactions more meaningful and more to the point, okay? So we have this system up and running in several interactions, and I hope that you can try it out later.

All right. So another area of innovation, and I'm sorry that I'm going quickly, because I have a lot of ground to cover in just under 20 minutes now, is understanding the state of the user, right? So being context-aware by understanding the state or persona of the user, that's what I mean. And we keep innovating in that area for specific interactions. Alexa can detect, for example, if an interaction was frustrating for the user and try to remediate that. That's something that we are starting to support now. Let's see an example of how context is interpreted based on factors like how the customer is actually talking. Let me show you the video. Rough day. What's for dinner? Who knows? Would you like to place an order or track an order? So this is Alexa Whisper. It detects when the user is whispering and whispers back the response. Here we are using machine learning on both sides of the equation, to detect the whisper and to generate Alexa's whispered voice. Okay.
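Going back to the contextual carryover example for a moment, here is a minimal sketch, in Python, of the kind of slot carryover that resolves a follow-up like "what about tomorrow?". The merge rule, intent name, and slot names are illustrative assumptions; as mentioned above, the production system uses machine-learned models rather than a hand-written rule.

```python
# Hypothetical sketch of contextual slot carryover for an elliptical follow-up turn.
def interpret(turn, previous=None):
    """turn: dict with 'intent' and 'slots'; previous: last resolved turn, if any."""
    if previous:
        if turn.get("intent") is None:           # elliptical utterance, e.g. "what about tomorrow?"
            turn["intent"] = previous["intent"]  # carry over the intent (GetWeather)
        merged = dict(previous["slots"])         # start from previous slots (city=Madrid)
        merged.update(turn["slots"])             # new slots override old ones (date=tomorrow)
        turn["slots"] = merged
    return turn

first = interpret({"intent": "GetWeather", "slots": {"city": "Madrid", "date": "today"}})
follow = interpret({"intent": None, "slots": {"date": "tomorrow"}}, previous=first)
print(follow)  # {'intent': 'GetWeather', 'slots': {'city': 'Madrid', 'date': 'tomorrow'}}
```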
So let's talk about understanding context without words. Okay. Some time ago we announced Alexa Guard. You probably heard of it. It's an exciting technology that allows us to extend wake word detection beyond words. Alexa Guard right now detects glass breaking in your home, for example, and also carbon monoxide alarms going off. And it's using the same technology that we use for wake word detection, which is what triggers when you say Alexa, or when you say Echo, for example. It's the same technology, but in this case the neural networks are trained specifically to detect these specific sound patterns. Okay. We are now extending this technology to other sounds. For example, a baby crying, that's one that is coming. And this all happens on the device. Okay. So the detection of these specific types of sounds happens on the device, not in the cloud. And we hope that the new AZ1 chip that I mentioned before is going to help boost this functionality.

What happens when a sound is detected? You get a specific alert on your phone, via the Alexa app. Like, for example, hey, there's glass breaking at home, or there's an alarm sounding in the home. Okay. Since it's critical to detect an alarm, for example a carbon monoxide alarm, it needs to run on the device, because if you add all the overhead of cloud processing, et cetera, it wouldn't be super quick, and an alarm is something critical that you want to detect right away. But a well-trained neural network won't fit on the device. When you properly train a network to detect several types of glass breaking, with all the variation, et cetera, you end up with a very big neural network. So you can't really fit it on the device. So we are using something called knowledge distillation. Knowledge distillation uses a smaller model that is trained on the outputs of the big model, in a student-teacher setup. You can think of these small neural networks running on the device as kids that were taught by a bigger neural network to detect those types of sounds. As you can see here, we use a specific type of neural network called LSTM, long short-term memory networks. These are specifically useful for detecting sequences, sequences in time. That's why you will see LSTMs all over. So it's long short-term memory, to be more accurate. Okay.
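Here is a minimal sketch of that student-teacher distillation idea, in Python with PyTorch. The architecture, layer sizes, temperature, and the random stand-in audio features are all illustrative assumptions; it just shows how a small on-device LSTM can be trained to match a big teacher's softened outputs.

```python
# Hypothetical sketch of student-teacher knowledge distillation for on-device
# sound detection (glass breaking, alarms, baby crying).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoundClassifier(nn.Module):
    def __init__(self, n_mels=64, hidden=128, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden, batch_first=True)  # LSTMs suit temporal patterns
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, time, n_mels) audio feature frames
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])            # logits over sound classes

teacher = SoundClassifier(hidden=512)      # big cloud-side model (assume it is pre-trained)
student = SoundClassifier(hidden=32)       # small model that fits on the device
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0                                    # temperature softens the teacher's distribution

audio = torch.randn(8, 100, 64)            # stand-in for a batch of audio features
with torch.no_grad():
    soft_targets = F.softmax(teacher(audio) / T, dim=-1)
loss = F.kl_div(F.log_softmax(student(audio) / T, dim=-1), soft_targets,
                reduction="batchmean") * T * T
opt.zero_grad(); loss.backward(); opt.step()
```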
We are also innovating in making Alexa more natural across several areas. Most people equate this with Alexa sounding more natural, but that's only part of the equation. There are a lot of areas where you can be more natural, especially in the interactions. I will give you one example here. Usually people can't remember invocation names. An invocation name is how you call a specific skill within Alexa. For example, if you want to get an Uber, you would say, Alexa, open Uber and get me a car. But people have to remember the names of the skills, the invocation names of the skills. So discoverability is a challenge. Okay. And we are trying to make skills more natural to discover. So take a look at this example, where the user is saying, Alexa, get me a car. That could be interpreted in many ways, right? You might want an Uber, as I mentioned before. You might want to buy a car. You might want a toy car for your kid. There are many ways in which you can interpret that. So Alexa has to understand what you really need. In order to do that, Alexa uses machine learning to create a contextual ranking of the possibilities of what you really meant, and goes with the best option. That way, when the user says, get me a car, and the context is right, Alexa will directly call the Uber skill and get you that car at home, right? Obviously, the right contextual signals have to be there. But the point is that you are speaking more naturally if you just say, Alexa, get me a car, rather than remembering which skill you have to open, et cetera. Okay, so let's move on to the next one.

Alexa now also sounds more natural. To give you an example, I'm going to play how Alexa used to talk back in 2014, when we launched the Echo device, using concatenative text-to-speech technology, which is a technology that just puts together small pieces of audio to create the full audio, like a puzzle. So this is how it sounded in 2014. Let's hear it. Star Trek is an American media franchise based on the science fiction television series created by Gene Roddenberry. We are now using a different technology that we call Neural TTS, where we use neural networks to model and generate Alexa's voice from scratch. We are not using concatenation of small pieces of sound anymore. So this is what Neural TTS, coming from Alexa in English, sounds like in 2020. Let's hear it. Star Trek is an American media franchise based on the science fiction television series created by Gene Roddenberry. Okay, as you can hear, it sounds much more natural. For those of you that were paying attention, when Alexa said Roddenberry in 2014 there was an artifact there, and now it's not present anymore.

This Neural TTS technology is revolutionary, and let me explain why it is making a big, big impact. When you use deep neural networks to power Alexa's voice, you can model specific speakers. For example, on AWS there's a service called Polly, and its 27 neural TTS voices sound just like this in terms of how natural they sound. And depending on the context of your request, Alexa can adapt to, or learn, a specific speaking style. And this is something that I wanted to show you, which I find really interesting. What if we train these voices based on someone that is speaking on TV, reading the news, for example? Newscasters have a very specific style of speaking. So we did that. We trained a type of voice that we call the newscaster style. Let me show you how it sounds, which is quite interesting. Let's hear it. San Francisco's $2.2 billion grand central terminal open. Spanning three city blocks, San Francisco's $2 billion transit terminal is open. This is after nearly a decade under construction. Coined the grand central of the West, the new Salesforce Transit Center is located near the heart of downtown. It is expected to accommodate 100,000 passengers each weekday. That's up to 45 million people a year. It welcomed its first passengers and visitors this month. As you can hear, it's using all the different inflections of someone reading the news, right?

So thanks to this technology, AWS is now offering a service called Brand Voice. Brand Voice is a service where you can work with AWS, as a company or as a brand, to come up with the voice of your brand, a voice that is completely artificial. It's trained using neural TTS, and you can use it as an API by passing text, and that text will be converted into natural-sounding voice.
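Since Polly came up, here is a minimal example of what calling its neural TTS engine looks like from Python with boto3, including the newscaster speaking style selected via SSML. The voice, region, and output file are just example choices, and you need AWS credentials configured for it to run.

```python
# Minimal Amazon Polly neural TTS call with the newscaster style via SSML.
import boto3

polly = boto3.client("polly", region_name="us-east-1")
ssml = (
    '<speak><amazon:domain name="news">'
    "San Francisco's $2 billion transit terminal is open."
    "</amazon:domain></speak>"
)
response = polly.synthesize_speech(
    Engine="neural",      # select the neural TTS engine, not the older standard one
    VoiceId="Matthew",    # one of the voices that supports the news style
    TextType="ssml",
    Text=ssml,
    OutputFormat="mp3",
)
with open("newscaster.mp3", "wb") as f:
    f.write(response["AudioStream"].read())  # save the synthesized audio
```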
One example is KFC, Kentucky Fried Chicken. Kentucky Fried Chicken worked with Brand Voice and created Colonel Sanders, the typical Southern-accent voice, and this is how it sounds. I just wanted to show it to you. Hi, I'm Colonel Sanders, the founder of Kentucky Fried Chicken. Let me tell you a joke. What do you call a chicken crossing the road? Poultry in motion. Got it? Poultry in motion. Food for laughter. This is quite interesting, because this is just plain text that is passed to the vocoder; actually, it's SSML. And this is 100% artificially generated speech, and it sounds very natural, and you can use it however you want. You can pass any text to it, so Colonel Sanders could be saying whatever. Obviously only KFC can do that with this voice, but as a brand, you know, you can tap into this technology today.

So let's continue with voice generation, which is one of my favorite topics. As Alexa continues to become more natural to interact with, we wanted to go a step further. We wanted to take it to the next level. When we speak to another person, we get both verbal and nonverbal cues and adapt our responses accordingly, right? So later this year, Alexa will adapt her responses based on the context of the conversation by adjusting the tone of her voice, stressing certain words, adding pauses, and even breaths. So it will sound much more natural than what you've heard so far. Let's hear a short sentence without what we call speaking adaptation, and then compare it to how it sounds with speaking adaptation. Here goes the one that doesn't have speaking adaptation. Very happy to help. Let's find something you like on Amazon Music. How about something from the 80s, Livin' on a Prayer by Bon Jovi? Does that work? And this is the same text with speaking adaptation. Very happy to help. Let's find something you like on Amazon Music. How about something from the 80s, Livin' on a Prayer by Bon Jovi? Does that work? It sounds much more natural; the final question, for example, sounds really, really natural.

So if you can model a speaking style, why not try to model a celebrity voice? That's exactly what we did last year with Samuel L. Jackson, with his permission, obviously. We not only recorded audio snippets of Samuel's voice and his laughs, for example, but we also generated a vocoder. A vocoder is the engine that allows you to pass in text and output voice, right? Obviously, it's trained with deep neural networks. So we trained this vocoder, similar to what you just heard with Colonel Sanders, for example. We introduced that feature as a skill. Basically, you had to say something like, Alexa, ask Samuel for the weather, and then you would hear Samuel Jackson saying something. But some people felt that they wanted more direct access, by, for example, saying, hey, Samuel, what's the weather like? So that's exactly what we did. And that came with several challenges, because it meant that we had to support a special wake word, a new wake word besides Alexa and the others. We had to support, hey, Samuel. And that's not trivial. Let me show you a video of how this sounds, because it's really, really natural. It feels like Samuel is there. So let me show you one video here.
Alexa, introduce me to Samuel L. Jackson. Allow me to welcome actor Samuel L. Jackson. Alexa will make sure you're good with the purchase. Then Samuel will ask about explicit content and if you want to add, hey, Samuel, as a wake word. You can also say, Alexa, ask Samuel, if you prefer. Let's try it. Hey, Samuel, what's the weather? No, I won't get you the motherf***ing weather. I'm just kidding. Let me see. Tonight's forecast calls for showers with a low of 52 degrees. You okay? Samuel L. Jackson just told me the motherf***ing weather. What else is he going to say? Now, remember, it's the real Samuel L. Jackson's voice, but he's not going to walk in. It's just his voice. So this is exactly how it sounds. It's really amazing. And as I said, hey, Samuel, is a new wake word that has been incorporated and will be available later.

Now that we are speaking about multiple wake words, let me tell you about the Voice Interoperability Initiative, because this is the perfect time to discuss it. We had experience and context in supporting multiple wake words. For example, multiple car manufacturers today support both Alexa and a custom wake word in parallel. So this initiative tries to bring together multiple companies to democratize agent invocation on devices. And Jeff Bezos himself said this about the initiative: Multiple simultaneous wake words provide the best option for customers. Utterance by utterance, customers can choose which voice service will best support a particular interaction. It is exciting to see these companies come together in pursuit of that vision. So we have more than 30 companies in this initiative right now, including Microsoft, Facebook, Intel, Sony, car manufacturers, Bose, Sonos, et cetera. And I can't wait to see what comes out of this, because there are a lot of people working together on this joint initiative, which is really interesting.

So let's move on and change topic. This is the final topic of the talk: conversational AI, which is definitely my favorite topic, top of the list there. How do we go from transactions, like, Alexa, turn on the lights, to conversations? That's challenging, but it's something we are paying a lot of attention to. So let's take a look at how a conversation can begin after a transaction, basically thanks to Alexa being more proactive. Let's take a look at the video and then comment on it. Alexa, wake me up at 6 a.m. Alarm set for 6 a.m. tomorrow. By the way, your living room light is on. Do you want me to turn it off? Yes. Okay. So we call this Alexa Hunches, which leverages a follow-up suggestion to make the interaction complete and helpful.

But this is just the beginning, okay? We can take it to the next level if Alexa can anticipate the customer's latent goals, which is what the customer ultimately wants to do. The customer might start by asking for something specific, but what they ultimately want to do might be something they did not state explicitly. And we are working really, really hard on that. So let's take a look at this example. In this example, someone is asking for the weather at a specific beach, okay? And Alexa is proactively saying, okay, do you want to see the live webcam of that beach? That was a proactive suggestion by Alexa. This seems very trivial, but actually there are several machine learning models working behind the curtains to make this possible. And this is something that is coming in Alexa.
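As a rough illustration of the idea, here is a hypothetical sketch of how a latent-goal trigger could be structured: score candidate follow-up actions given the dialog context, and only speak up above a confidence threshold. The candidate table, scoring stub, and threshold are invented for illustration; the real system uses trained models, as the talk says.

```python
# Hypothetical sketch of latent-goal detection: after completing a request,
# rank candidate follow-up suggestions and only offer one when confident.
CANDIDATES = {
    "GetWeather": ["ShowBeachWebcam", "SetSunscreenReminder"],
    "SetAlarm": ["CheckLightsLeftOn"],
}

def score(context, candidate):
    # stand-in for a learned ranker over dialog history, time, device, etc.
    return 0.9 if candidate == "ShowBeachWebcam" and "beach" in context["entities"] else 0.2

def suggest_followup(context, threshold=0.8):
    best = max(CANDIDATES.get(context["intent"], []),
               key=lambda c: score(context, c), default=None)
    if best and score(context, best) >= threshold:
        return best
    return None  # stay quiet rather than risk an unhelpful interruption

ctx = {"intent": "GetWeather", "entities": ["beach", "tomorrow"]}
print(suggest_followup(ctx))  # -> "ShowBeachWebcam"
```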
The first time that we showed latent goal detection was at re:MARS. At that event, we showed a technology called Alexa Conversations. And on that occasion, we showed an example where the customer wants to plan a night out. The customer might ask for movie tickets, might ask for a ride, might ask for a table reservation, but what the user really wants to do is plan a full night out. Let's take a look at the actual demo from re:MARS, because I think it's really, really interesting. And as you watch it, keep an eye on how the logos in the top right corner change, because those are multiple third parties. These are not native Alexa experiences. They're multiple third-party skills that are being brought automatically into a conversation. Let's take a look at the video.

What are the showtimes for Dark Phoenix on Saturday after 9 p.m.? Here are the showtimes for Dark Phoenix on June 8th after 9 p.m. Give me two tickets for 10:35. Okay, two adult tickets for the 10:35 p.m. show of Dark Phoenix on Saturday, June 8th at Regal Cinebarre Palace Station will be $21. Should I book the tickets? Wait, how long is it? Dark Phoenix is one hour and 52 minutes long. Actually, what are the showtimes after 7 p.m.? Here are the showtimes for Dark Phoenix on June 8th after 7 p.m. Give me two tickets for the 7:05 show instead. Okay, two adult tickets for the 7:05 p.m. show of Dark Phoenix on Saturday, June 8th at Regal Cinebarre Palace Station will be $21. Should I book the tickets? Yes, please. You will receive an email from Atom Tickets with your tickets. Will you be eating out near Regal Cinebarre Palace Station? Yes, find me a Chinese restaurant. Here are some Chinese restaurants near Regal Cinebarre Palace Station. Tell me more about Mott 32. Here's more information about Mott 32. That looks good. Give me a table at 6 p.m. Okay, you'd like a reservation for two at Mott 32 at 6 p.m. Should I book it? Sure. You will receive an email from OpenTable with the details. Would you like a cab to Mott 32? Yes, get me an Uber. Sure, I'll get you an UberX for two from home to Mott 32, which will cost about $8. Is that okay? Sounds good. Your UberX will pick you up at 5:42 p.m. Anything else? Show me the trailer. Now playing the trailer for Dark Phoenix. Ladies and gentlemen of NASA, this is Charles Xavier. Help us on the way.

So, as you can see, Alexa proactively suggests next actions, as it is understanding the overall intention of the user to plan a night out. It also carries context. For example, it doesn't forget what the user said: when the user asked for the trailer, it remembered the movie. And this is very interesting. I wanted to show it to you because Alexa Conversations is a paradigm shift in how we build Alexa skills, and it allows us to see a little bit of the future today.

We're almost done here. But what if you're having a conversation not with Alexa but with a friend, and you still need some assistance? Can Alexa join the conversation and blend in? Two years ago, we took a step toward making Alexa more conversational with something we call follow-up mode, which opens the mic after each interaction so you can continue the interaction. This mode lets customers make back-to-back requests to Alexa without needing to repeat the wake word. You don't have to say Alexa all the time. Today we introduced natural turn-taking, which is the ability for customers to speak to Alexa without using the wake word at all. I will show you a video so it's much easier for you to understand. And this is the last video, and then we close the session. I'm getting hungry. How do you feel about ordering pizza tonight?
I could definitely go for some pizza. Awesome. Alexa, join our conversation. Okay, let's talk. I want to order a thin crust specialty pizza from Mike's Pizza. Today Mike's Pizza has the following specialty pizzas: three cheese delight, meat lovers, veggie supreme, pepperoni party. Okay, what size? Medium. Okay, so I have a medium, thin crust veggie supreme. Do you think a medium is going to be enough? I'm hungry. I'm starving. Change that to a large pizza. Okay, so I have a large, thin crust veggie supreme for $18.75. Would you like to place the order? Yes. Okay, completing your purchase with Amazon Pay. You'll receive an email confirmation from Mike's Pizza. You want to watch a movie while we wait? That sounds good. Let's see what's playing. What movie do you recommend? How about The Goldfinch, starring Oakes Fegley? Who directed it? The Goldfinch is directed by John Crowley. What do you think? Isn't it a good movie? The Goldfinch has a rating of 6.3 on IMDb based on 14,000 votes. Play the trailer. Playing the trailer now.

Okay, so we had to solve a lot of problems here. Like, for example, are people speaking to each other, or are they speaking to Alexa? I'm so sorry to interrupt you. I know you're wrapping up, but I need to accelerate you even more, because we really are coming to the end, and I've got a few questions for you. So, 10 seconds. Yeah, going straight to the final thing. This is the AI stack, right? This is where we were in 2014: phenomena modeling. We are moving into intelligent decisions in 2020, and we hope we can get into reasoning and autonomy later, but we are not there yet. Just to give you an idea of the current state. Let's go straight into questions.

Fantastic. First of all, Germán, thank you so much. That was a very entertaining talk. I thoroughly enjoyed it. Except for the bit where we see Baby Shark being played in a car. I have to tell you, if Alexa ever plays Baby Shark while I'm driving, it's going to learn really quickly that that was a bad idea. But apart from that, I really enjoyed it. We started getting into conversations towards the end of the talk. So, I'd like to ask you: where is the next level, and do you envisage a point where we might be talking to a device such as Alexa not because we have a specific goal, but maybe just because we want to be entertained or stimulated, or just engage in some form of conversation?

I don't really know what the direction is going to be, but I think that we have this mission that technology should blend into what the customer really wants to do. So, the point here is that technology becomes kind of invisible. So, I don't foresee interactions where things become annoying; it depends on customer needs, actually, to be honest. It depends on what the customer really wants, ultimately. So, if that's what you want, it's possibly what's going to happen. But in general, we want technology to be behind the curtains, helping you out.

Okay, we have a question from Alvaro, and it's all about privacy, which I guess we couldn't escape talking about in some regard. So, can you expand a little bit more? A lot of people are worried about the privacy concerns with devices such as Alexa listening in on what we're doing. Yeah, I'm not the right person to answer those kinds of questions. They are very sensitive, obviously. But in general, we take privacy very, very, very seriously. And one example that I can give you is that today you can talk to Alexa and say, Alexa, delete everything I said.
For example, that's one of the things you can do. And if you don't want, basically, a small fraction of 1% of requests being tagged, for example, to allow Alexa to learn, which is what's happening ultimately, you can opt out. So, I think that people shouldn't be concerned, but I shouldn't be the one saying that. Basically, the customer rules, and the customer decides if they want to use this type of technology or not. But we take this very seriously, and we are adding a lot of things to keep the customer's peace of mind.

Absolutely. A very technical question: what is Amazon doing to mitigate the problem of attribute inference from a voice extract? Attribute inference from a voice extract. If you're not sure about the question, there's an article that has been referenced in the question. So, perhaps what we will do is ask the questioner to contact you directly via the chat. Yes, send that question directly to me. I will ask the person to give me a chance to read the article, and we can continue on Twitter if that person is interested. Okay, that would be great.

So, we're nearly up on time. A final question: you mentioned that every 10 years there's a paradigm shift in terms of input. So, I can't avoid asking you: where are we going to be in the next 10 years? What will be the next paradigm shift? I don't know. My opinion is that, I don't know how to say it, but people want to make things easier all the time. And when you make it easier for people, maybe it could be even easier than that, you know? So, we went from keyboards to touch screens, and it became easier. We went from touch screens to voice, even easier. And I don't know what people are going to ask for, but probably something that you can control with your mind, or something like that. Just thinking. That seems like the next step. Yeah, just thinking. But I don't think we're anywhere near that yet. Okay, we'll have to wait until Big Things 2030, I guess, to find out. Absolutely. Well, Germán Viscuso, thank you so much for that amazing talk. Thank you.