Hello, Mycroft friends. I'm Josh Montgomery, the CEO here at Mycroft AI, and I'm happy to be coming to you live from our lovely offices in Palo Alto, California, where we raise money and try to run the business operations here at Mycroft. I'm glad to have the opportunity to come and speak today. I think the topic is the future of voice, which is a fun topic to talk about.

So three years ago, when we started the project, it became clear to everybody involved on the team that there would be an opportunity in the near future to combine technologies that use machine learning to create an experience for users that is indistinguishable from a human. And a couple of weeks ago, or maybe a couple of months ago now, Google did a demo at Google I/O where they demoed their Google Duplex technology and proved the thesis. For those of you who haven't seen that demo, we'll put a link below. The Google Duplex demo is an artificially intelligent voice assistant making outbound phone calls to a restaurant to book a table and to a hair salon to book a hair appointment, and interacting with the person answering the phone so naturally that the representative of the restaurant or the hair salon wasn't aware that they were talking to an AI. Now, that demonstration was really narrow, and by narrow I mean it didn't have a broad domain.
They weren't going to discuss the application of Nietzschean philosophy to the Trump administration. The only thing they were talking about was setting a hair appointment or working with the restaurant and booking a table. And when you see that demo, you can very clearly see what the future looks like across the entire domain for voice.

I would argue that it's the same for video. I recently saw one where, I think it's Peele or Key, Key or Peele, whichever one it was, voiced over an avatar that was based on a video of President Obama. When you looked at the video it was mimicking his actions, and because the artist was able to mimic the voice, it sounded just like the prior president of the United States. It was kind of a demonstration of what artificial intelligence is moving towards in terms of being able to mimic a human being, and that's really where we feel the voice assistant space is going, both in voice and then of course also with video rendering: the ability to combine those two things into an experience where, when you get a video call, it might be me with my ukulele, or it might be a completely artificial intelligent assistant, and you won't be able to tell that the person on the other end of the computer isn't... I'm not going to say not a real person, because if you can't tell it's a person, it actually is a person, but not a human person.

So we saw this about three or four years ago, when we started looking at speech synthesis, at speech recognition, and at a variety of other pieces of technology, and we set out to build an artificial intelligence that runs anywhere and interacts exactly like a person. The idea being that you could run a full artificial intelligence stack on anything that's powerful enough and not network connected, or that's low power and network connected with a lot of the intelligence operating in the cloud. Run it anywhere: run it
on your audio amplifier for your guitar, run it on your car, run it on a laptop, run it on a mobile device, and be able to have a conversation just like the ones my assistant and I have.

So in my case, I pick up the phone and I say, hey, I want to meet with my good friends at Mozilla next week. Can you reach out, send them an email, and schedule a meeting for me? And that's all I really have to do. My assistant Johnny (some of you may have interacted with him) has access to my schedule, has access to my email, has access to my contacts list. He reaches out. In some cases it'll be directly to the person involved, if it's maybe a developer. In the case of maybe the CTO, Sean White, he would reach out to Anna, Sean's assistant, and offer up a couple of times. They would go back and forth and schedule a meeting. But the important thing there is that my interaction with Johnny is totally natural, and I don't have to tell him every step of the way exactly what to do. He has enough intelligence and enough access to my life to be able to schedule that meeting on my behalf.

And that's really where this entire space is going. It's going to dramatically impact business, certainly, and I think business is maybe the least creative application of this technology. I think there's a ton of really cool things that are coming down the pipe. But let's start with the least creative, most obvious, right?
So in Google's case they built this Duplex technology, and when they brought it to the stage the idea was that Google Duplex calls a real person at a restaurant and books a table. Okay, that is not the long-term application for this technology. The obvious application is that the person answering the phone is an AI. In some cases it'll be an AI calling, but in many cases it'll be a human calling, and the person who initiated the call doesn't know that it's an AI on the other side, certainly for inbound calls. Now admittedly, I think telemarketers are going to make wide use of technologies like Duplex to trick you on the phone, and I'm sure that scammers will use similar technologies in the future. But the obvious business application is being able to accept thousands and thousands of inbound calls using an artificial intelligence, without having to pay people to sit at a desk and answer the same questions over and over again all day long.

Other business applications include building technologies like cars and electronics that you can speak to naturally. In some cases those electronics will have the personality of the brand that built them. One of the things we learned during our integration with Jaguar Land Rover was that our friends at Jaguar were really interested in having a voice assistant that represented their brand, so a voice assistant with a British accent. I think the exact quote that I got from Jaguar was that they wanted it to be sassy and sarcastic. Right.
So something that is truly British (think John Cleese, right?) and could represent their brand globally. And I think that Jaguar is one of thousands or tens of thousands of companies that really have an opportunity to build a voice assistant or build an AI that represents their brand, interacts directly with their customers, reduces the personnel costs associated with having a relationship with those customers, and then of course improves the customer's experience. When a customer calls in and says, hey, I just bought a car, I'm 3,000 miles into it, and I need the oil changed, can you schedule an appointment? That's all they have to do. The voice assistant has access to their calendar, has access to Jaguar's calendar, and schedules a time. Maybe a Jaguar rep comes out and picks up the car, but the user experience is that you say what you want and it just happens.

That's going to happen across a variety of different industries. Call centers, obviously. Automotive is a big industry. Smart speakers are a big industry. I think that voice assistants integrated into individual apps are coming, certainly. My guess is that Brad and his team over at Google are working on how to pass voice queries into applications and how to query those applications in a way that Google's assistant can act as a front end to any application. Now, I would also hazard a guess that they will design that system in a way that favors Google and not necessarily the person who wrote the app, but I think that type of integration is definitely coming in the business world, and that's going to cut costs.
It's going to eliminate a lot of jobs that are low value, and let me be really careful how I say that, because I want to make sure that people understand: all labor has value. If you go to work and you work hard, by definition it has value, because people pay you for it. All work is important, whether it's work to create dignity for the worker or work that provides services for end users. When I say low-value work, what I mean is work that doesn't fully engage human intelligence.

The voice assistants that I see in the near term (I define near term as between now and the time my daughter reaches 30; my kid's 10, so the next 20 years or so) will be very limited, I think, in terms of their ability to creatively solve problems. And there are some really good examples of this. x.ai is the company that creates an assistant that provides support for email scheduling. So going back to that task that I talked about with Johnny, where I'm simply trying to schedule a meeting with Sean over at Mozilla: when people use x.ai for tasks like that, many become frustrated.
So that's a very simple, narrow application. It doesn't involve a lot of creative problem solving. I mean, you're effectively trying to match two schedules, something that you can do with a pair of arrays and maybe three lines of Python. But because of the complexities of human language, because of the complexities of availability, because of the complexities of geography, it's very difficult to solve that problem comprehensively in a way that provides an excellent user experience to every person who's engaged in the activity, right?

So given that, the idea that a voice assistant is going to be able to solve a problem like "Congress can't get anything done" is ridiculous. And the idea that some of these technologies will be able to be creative, so create art, write music, perform some of the other actions that humans are good at and engage in and are successful in, is also, I think, far too... right. They already have AI that can write music. They already have AI that can write poetry. But they don't have AI that can write music that's as creative and new and novel as the opening bars of Star Wars in 1977, something where John Williams just blew the audience away with a melody that's familiar to everybody today because it is so unique and powerful. Yes, an AI could write something that sounds very similar to that, write 50 different versions that sound similar to that. But to create it in the first place is something that's uniquely human.

And so I see the future of voice, at least in the next 20 years, as filling out this void in the current job market, in the current labor market, and certainly in services that are provided for just regular everyday folks: as reducing redundancy, reducing the amount of low-value work.
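As a rough illustration of the "pair of arrays and maybe three lines of Python" remark above, here is a minimal sketch of the bare matching step. The function name, the hour-slot representation, and the sample availability values are all assumptions made up for this example; it deliberately ignores the language, availability, and geography problems that the talk says make real scheduling hard.

```python
from typing import List, Optional


def first_common_slot(a_free: List[int], b_free: List[int]) -> Optional[int]:
    """Return the earliest hour slot both people have free, or None."""
    b_set = set(b_free)  # set membership checks are O(1)
    common = [hour for hour in a_free if hour in b_set]
    return min(common) if common else None


# Hypothetical availability for one day, as 24-hour clock slots.
josh_free = [9, 10, 13, 14, 16]
sean_free = [10, 11, 14, 15]

meeting_hour = first_common_slot(josh_free, sean_free)
print(meeting_hour)
```

The core really is about three lines; everything a product like the one described has to add sits around it (parsing requests, time zones, negotiating by email).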
So, non-creative work that needs to be done by people, and freeing those people to engage in work that is higher value, that is more creative, that allows them to spend their time doing what humans are best at, which is inventing and solving problems.

And so we're at the very, very beginning of this. When I hear people who are critical of voice technologies today and say things like, oh, that's not natural, or whatever: look back four years and none of this stuff even existed at all. The voice assistant stuff was this simple voice command thing that misunderstood almost everything you said, right? In many cases the voice companies were running these technologies on chip, and because they didn't have access to, or weren't using, the latest and greatest machine learning models, it was very difficult for them to create a user experience that was in any way realistic. I would argue Siri was the first widely adopted version of this, but of course it was followed very quickly by Google's assistant and then by Amazon, who took the lead by embedding it into a very capable smart speaker. Over this time, over the last four or five years, it's really improved substantially, to the point where individual users of the technology really can have a fairly natural experience for limited tasks.

So the question becomes, where does it go from there? And the answer is that it becomes more and more natural, then more and more customized, and then finally more and more capable of solving real problems.
That, I think, is the third stage.

In terms of making it more and more natural, that's primarily a function of data. These technologies are based on huge data sets that are fed into machine learning systems that spit out models that are able to solve problems using this very modern approach to data processing. The more data these systems acquire about end users, how they're using it, and the types of questions they're asking, the more capable they will become. And since these technologies are fairly new (I mean, Google, I think, just started shipping the Google Home less than two years ago), today they don't have the huge volume of data that they eventually will have about how these customer interactions function. They're just now, for example, beginning to put in place the ability to do call and response. So you say, "Hey Mycroft, set a timer." "Okay, how long would you like that timer to be?" That's a call and response, with me asking a question, Mycroft asking for clarification, and me providing clarification. That level of interaction is just beginning to exist now. And of course there are much more complicated human interactions that may involve three or four people in a conversation. That type of interaction, and that type of natural ability to carry on a conversation, I think is where a lot of the initial work is going on.

The next piece is making the technologies more useful. I think that x.ai is a really great example here of a company that's doing really great things in a very narrow space that will work extremely well with future voice assistants. In x.ai's case, their AI that schedules meetings (one of them is named Amy; I don't remember what the other ones are called) would be able to plug into your calendar on one side and your voice assistant on the other, and become the default destination for queries about your schedule.
So you would say, "Hey Mycroft, set me up a meeting with Sean White at Mozilla." That would trigger x.ai, which would reach into your address book and say, oh, I have Sean White's information. Hey, it turns out Sean White has an assistant named Anna, and I'm going to contact Anna instead of Sean. So there the voice assistant, or x.ai's technology, has made a decision that the best way to contact this person is through their assistant and not through Sean, right? It reaches out to Anna's email and says, hey Anna, Josh would like a meeting. Can we set something up? There's a back and forth in terms of availability, and then the meeting gets scheduled on Sean's schedule, provided he approved it, without any interaction from me as the initiator or, in this case, from Sean as the person I'd like to meet with.

Now admittedly, if it went straight to the person I was having the meeting with, they would have to have some interaction. But in this case the voice assistant and the x.ai system end up as a buffer between me as the user and Sean as the person I want to meet with, and from my perspective that's a very, very natural interaction, right?
I say, "Hey Mycroft, set me up a meeting with Sean White sometime next week," and then a couple of hours later, a day later, on my calendar, boom, is a meeting with Sean. And for that interaction x.ai, importantly, would be the intermediary. We at Mycroft don't have to develop all of the artificial intelligence surrounding interacting with the calendar systems, accessing contacts, and coming up with the logic surrounding when to connect to the person's assistant as opposed to when to connect to that person. We simply provide the voice front end, and x.ai is a plugin that adds intelligence to the voice assistant.

And so I see the future of this as various different companies providing utility across all of the voice assistants. I would argue that Mycroft is probably not most people's first choice for the front end for their voice assistant today. Now, I think that might change over the course of the next five or ten years if we're successful and continue to grow. But today most people probably start with Amazon Alexa or Google Home. If they're in the Chinese market, they might start with Genie over at Alibaba, or Viv over at Samsung. I don't know if it's customizable; I would hazard a guess that Viv, the one they're working on, will be, but let's set that aside.

People will go and build a skill for one of these voice assistants, and then hopefully we can end the platform wars at some point and start making these things interoperable. So the idea would be that if you provide a scheduling skill, it works with Alexa, it works with Assistant, it works with Mycroft, it works with Viv, it works with Genie, and you, as basically an intelligence provider, as a service provider, are able to focus on just what you do. In the case of this example, it's x.ai scheduling meetings, right?
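The split described above, where the platform supplies only the voice front end and a third party plugs in the intelligence, can be sketched roughly as follows. This is an illustration under stated assumptions, not Mycroft's or x.ai's actual API: the `VoiceFrontEnd` class, the substring-based trigger matching, and `scheduling_provider` are all hypothetical names invented for the example.

```python
from typing import Callable, Dict, Optional

# A provider is any callable that takes the transcribed utterance
# and returns a text reply (hypothetical interface).
Handler = Callable[[str], str]


class VoiceFrontEnd:
    """Toy front end that routes utterances to registered providers."""

    def __init__(self) -> None:
        self._providers: Dict[str, Handler] = {}

    def register(self, trigger: str, handler: Handler) -> None:
        # Store triggers lowercased so matching is case-insensitive.
        self._providers[trigger.lower()] = handler

    def handle(self, utterance: str) -> Optional[str]:
        text = utterance.lower()
        for trigger, handler in self._providers.items():
            if trigger in text:  # naive substring match, for illustration
                return handler(utterance)
        return None  # no provider claimed the utterance


def scheduling_provider(utterance: str) -> str:
    # Stand-in for a third-party scheduling service.
    return f"scheduling request accepted: {utterance}"


assistant = VoiceFrontEnd()
assistant.register("set me up a meeting", scheduling_provider)
reply = assistant.handle("Hey Mycroft, set me up a meeting with Sean White next week")
print(reply)
```

Because the provider only sees text in and text out, the same `scheduling_provider` could in principle be registered with any front end that offers a similar hook, which is the interoperability the talk is hoping for.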
And the platforms become a way to distribute your product, or distribute your intelligence, to the end users. And there's probably some way to get you paid to do that, whether it's you charging the users directly or some kind of an app store sort of integration. But that's the whole idea.

And so the first step becomes the naturalness of the interaction. The second step becomes utility, so stuff like scheduling meetings, and certainly tons of IoT stuff like lights and locks and heat and all of the things that people are beginning to build into their homes and connect to voice. And then I think the final step is creativity, right?

I think that the first versions of creativity for the voice assistant will be very similar to the type of creativity that our good friend Wil Wheaton brought to the book Ready Player One. I love the book, and I personally love the audio performance. Wil Wheaton went into a studio and read that book, and if you listen to it you can hear that, as an actor, he changes the tone of his voice, he changes the inflection of his voice, he changes the pitch. He does kind of pseudo-imitations of the characters that are speaking, so that when Aech is talking about killing bad guys in a first-person shooter and Parzival is talking about pursuing the egg, those two characters have different voices within the book, depending on who is being quoted. And so Wil and a voice engineer probably spent a significant amount of time in a studio somewhere, with Wil developing voices and imitations for the various quotes and the voice engineer processing the audio, putting together what's effectively a very creative performance of Ernest Cline's novel. That's the type of creativity
I think will first come to voice. So the ability to drop a book into a voice assistant and spit out an Audible-quality performance of that book, I think, is something that's coming in the future. Now, I don't think it'll be here next year. I don't know; Amazon spent 23 billion dollars on research, so maybe it will be here next year. But I think it is coming in the next five or ten years. I think that you will be able to drop text in, whether it's from a news article or a complete novel, and these voice assistants, or these voice rendering technologies, will be able to render it in a very creative way.

I also think that avatars are coming to this space, and I could be proven wrong. One of the things we always talk about here is "don't be Clippy." Right, like, don't be Clippy. We actually are very much in danger of that, because we do have the cute little Mark 1 device that could be Clippy if we do it wrong. I thought that Microsoft's experiment with that avatar... I don't know, I think there are probably people who liked it. But if you listen to the Wait Wait... Don't Tell Me! performance where Clippy suggests where to dig the grave, you can definitely understand what a lot of the public thought about Clippy on Windows XP, which I think is when it was launched.

But I think avatars are an important part of the future. Take that Key and Peele demonstration, where they tracked the facial motions of the actor. They voiced it over, because voice technology isn't ready to do that type of thing yet, though it's getting close, and they were able to make an avatar of President Obama say whatever they wanted it to say, right? I think that is coming with the entire interaction being synthetic. So the ability to create a photorealistic rendering of the user with a voice that sounds totally natural, with an interaction
that's very, very natural, I think will have a lot of value for a lot of folks. The original applications there might be sales videos. It might be customized nightly news from your nightly news anchor. I think that there's certainly an aspect of assistance there, so being able to go to my computer and say "Hey Mycroft" and have a little British guy pop up on the screen and say, "What do you need me to do?" and that type of thing. Don't be Clippy, right? I think that is coming. I think that people like to see who they're talking to, and I think voice is just the first step towards these virtual people that we will be interacting with all the time. Some of them will be experts for their brands. Some of them will be your representative on the internet. Some of them will be generic assistants that encompass the entire internet. And I think that those interactions will begin to get fairly natural.

And then I think that the final step of this is creativity. I think that soon (before I die, certainly, well, unless I'm destined to die tomorrow) these voice assistants and the technologies that underpin them will start being able to engage in truly creative endeavors: being able to make new music, being able to improvise and work together with musicians to play music that is new and novel, and to create the types of experience that professional jazz musicians can create on stage.

Actually, you know what, Hollywood already has a book on how to write screenplays that's formulaic (I think it's called something about a squirrel). Like, you're going to spend one minute here and one minute there. I mean, that's why all the movies seem the same, right? Act one, act two, act three. We already have a formulaic method of creating scripts for movies. I think that the ability to add avatars that are photorealistic along with
voice experiences that are very realistic, along with creativity, means that 30 years from now you may be able to render an entire movie just by building a bullet-point script. Right, like, here's 12 bullet points, render the movie for me, and have a two-hour movie drop out of the computer with all of the background scenes and everything, all cut. I think that that's coming. Certainly it's coming. I don't know if it's coming in 20 years or 30 years or 100 years, but it's definitely coming, unless the world ends, which is a real possibility. I, for one, welcome our robot overlords, if anybody's indexing this for the future.

So that's really where I see things. I see a really bright future for the technology. I see a lot of jobs where people aren't able to engage in activity that takes advantage of their full human potential, and I see a lot of those jobs changing substantially in the future, so that when an actual human is on the telephone for a company, they're not running through the security verification, they're not doing some of these other low-value activities. They're really helping you to solve problems, or creating deeper relationships with their customers. I certainly see improvements in the overall stack getting to the point where people are having difficulty telling whether or not they're talking to a person. That's already happening, and I think that trend is certainly going to accelerate. And then long term, I see these things being integrated into avatars.
I see creativity. I see these things being hyper natural in how they interact with other people, to the point where we as a society, or we as a species, achieve our goal, or at least our goal here at Mycroft, which is a voice assistant that interacts naturally and runs anywhere: a voice assistant, or an avatar, or an AI that's so natural that when you speak to it, you can't tell whether you're talking to a human or a machine. That's where we're headed with this.

And unlike some of the other technologies out there, we're trying to do this in an open way. The idea is that anybody can pull down the stack and put it to use for business, put it to use personally, put it to use in a classroom, put it to use in a movie studio, put it to use anywhere that they choose to use it (in a rocket ship, right?) and create a user experience that's hyper natural, so that the humans around them who are using the technology can focus on doing what they do best, being creative and problem solving, while the AI and the voice assistant deal with the mechanical everyday of human life.

So anyway, if that type of goal excites you, I encourage you to join our community. You can always find us on our website. We have very active forums where people are engaged. We have a very active Mattermost channel where folks are always online talking about Mycroft, talking about the voice space, talking about all of the things that are happening here. We are currently raising money on StartEngine, where you can become an investor in voice. As far as I know, we're one of the only voice platforms that has created a mechanism for average everyday people, for people who are contributing to the stack, to become investors. And then of course you can buy a Mark 1 and join the Mycroft community as a developer, or just as somebody who's donating data to the system so that we can improve it over time. We'd love to have you.
We're always looking for more folks, and we really appreciate you engaging with us and engaging with Mycroft AI. So thank you so much for your time today. I will, I'm sure, be online soon. If you have any questions, feel free to reach out to me directly through the Mattermost channel, or certainly post on the forums. I'm happy to engage in a conversation. Thank you so much.