Okay, Laurent Picard is from Google. And yeah, let's take it away. Okay. So can you start your screen sharing? Yes, all done. So hello, everyone. Thanks for having me today. I'm going to hide my own window.

Okay, quick introduction. My name is Laurent Picard and, as you can tell, I'm French. I'm based in Paris. As for my background, I'm an e-book pioneer: I started in the e-book industry 20 years ago and worked there for 17 years. And for the past three years, I've been focusing on cloud technologies with Google Cloud. Unfortunately, I cannot see you and ask you questions.

I very much like to start with this quote from Arthur C. Clarke, because it really captures the feeling I have whenever something new is done with machine learning. Even after a couple of years, it honestly still feels like magic. But this is just technology, and if you scratch a little bit, you see it's just technology: we can all understand what's behind it, or at least have a pretty good idea. My goal today is to scratch a little bit at some things you may not have seen.

I have my own definition of machine learning, and it's a weird one. For me, machine learning is solving problems where you have data: you have data, and you want to understand what's in it, to extract information out of it. That's my personal definition, but it's not the official one. The real definition is that machine learning is a part of AI, and within machine learning you have deep learning. Most of what I'm going to show you today is actually deep learning, but for the sake of simplicity, I'll say machine learning most of the time.

So how does deep learning work? The experts started working on the field 40 years ago. Last year, they actually received the Turing Award for it; it's like the Nobel Prize of computer science. At the time they thought, okay, let's try to mimic the way we think our brain works, with neural networks, and for that they needed many examples.
And the magic here is that they managed to solve problems for which we don't have a systematic answer: machine learning now solves problems that we couldn't solve before.

Why does it work today? First of all, we are inheriting from centuries of science, in particular many algorithms coming from mathematics and physics. For a couple of decades now, we've had everything we need for big data: thanks to computers, we're able to store and process a lot of data. And for, let's say, the past decade, technology, and especially cloud technologies, has given us the computing power to do everything. Of course, personal computers and laptops now have amazing computing power, but cloud technologies let you go to the next step and do in hours or days what would have taken weeks before.

To give you an idea, I'm going to talk generally about machine learning possibilities, but to show how important this is at Google, these are the numbers of projects, from a couple of years back, which have a machine learning model in them. And you've seen some of the results. For instance, in Gmail, when you start to type a sentence, you get a suggestion to finish it. In the latest version of Android, there is a local, customized machine learning model learning from your habits and optimizing battery life. And in Google Photos, maybe you've tried it: if you say, okay, this is my kid in one picture, it will find a match for your kid in all the other pictures, even ones from ten years back or two. It's amazing technology.

There are three ways today that you can benefit from machine learning. Of course, if you are an expert, you already know a lot about it and you're dealing with neural networks directly, and I hope you will still see a few things of interest in this talk.
But if you're spending most of your time developing solutions, then maybe you don't have the expertise to deal with machine learning. It doesn't matter: you can use existing machine learning models, available through APIs as ready-to-use models. And in between, since a couple of years now, there are AutoML techniques, which fill a big gap: you still don't need expertise, but you can automatically build customized models for your own needs. The purpose of this talk is to give you a quick overview of everything you can do with these two types of technologies.

So first, the machine learning APIs. If you remember my definition, machine learning is solving problems from data, and data here can be text, pictures, videos, or speech. Then you need models, and from that you can extract information. And sometimes, the result you want is your input transformed, transcribed into something else.

Now, let me start with the vision models. I really love this kind of model because in the 90s, I was a student, and we were not talking about machine learning at the time, but I was trying with other students to solve the problem of understanding what's in a picture, understanding its content to automatically detect things. At the time, we were just trying to detect edges, and it failed miserably: we could do it on a few pictures, and as soon as we brought in something new, it would not work anymore. Machine learning is the solution now.

Provide a picture to a machine learning vision model, and first of all, it's able to give you labels to describe what's in the picture in general. Here, this is a picture of Hobbiton, the place in New Zealand where the Lord of the Rings movies were shot. And on the right is the JSON response I get from the API: it tells me, with 95% confidence, that the picture is about nature, and so on.
So that's correct. More precisely, if I take the same picture but zoom in a little bit, flip it and crop it, a vision model is also able to match this picture with an existing public one on the web. Here, it's able to tell me that most likely this picture is about this place, and I even get the GPS location for it. More precisely here, it must be a picture of the cast in a restaurant, still in New Zealand.

It can also detect entities precisely, with a bounding box, in pictures; this is called object detection. Here, the results I get are that there are many persons (this one is a person, right?), but there are pens here and even dots. So it can be very precise.

Even more precisely, it can detect faces. Here it's a 3D rendition, and what I get is the crop box for the face, a large one or a close one, but also the location of the different features, like the eyes, the nose, the mouth, and so on, plus the position of the head in three dimensions. And a vision model can also be taught to detect emotions. There are a few generic emotions, and what it detects here is that this face is likely angry. This is Gollum, and Gollum is always angry, right?

Let's move on. Still on vision: optical character recognition, so OCR. This is a problem that is now fully solved thanks to machine learning. If I take this screenshot, the vision model is able to tell me that there are three main blocks, and inside them there are sentences (or lines, or rows, if you prefer), then words, then symbols. It doesn't make any mistakes here; it's really perfect. Even if I apply some perspective effect, say you take a picture of a table or a wall, it still works really great. So let's say it's a solved problem. But the next step for OCR is handwriting recognition, and it already starts to work really great.
It's the same principle. Here, this is handwriting from Tolkien, and the result is not perfect, not as good as for typewriting. Here it's detecting "the lord of the rings"; ideally it would detect the first word here and the second one here, but it works pretty well. It's making one big mistake here, on "shadows": it's detecting a V instead of the W, something that could maybe be auto-corrected with natural language processing. But it makes very few mistakes; here the bottom of the F is detected as something else. So it's almost perfect, really good. The limit, of course, is that if we are not able ourselves to read back something that is handwritten, then a machine learning model won't be able to either. The limit might be doctors' prescriptions: sometimes they are not even able to read them themselves.

It's also able to detect entities and match them with something close that it found on the web. In this example, I took a picture from a Spanish newspaper that I had never seen before; it's a very rare picture of Tolkien. Once again, I zoomed in, cropped the picture, changed the colors, so there's not a single pixel in common with the original one. Yet the vision model is able to match this picture and tell me that it's coming from this Spanish newspaper. And more than that, it's able to match it with the text on this web page and tell me that, most likely, this picture is about Tolkien. What I get here, for J.R.R. Tolkien, is an entity ID. This ID lets me work with a single identifier: I can deal with Tolkien this way wherever I'm working with these APIs.

OK, how can it be used? Just a few lines. This is the Python client library, available as open source on GitHub; it's a wrapper around the API. What you have to do is always the same: create a client, provide a content (an image here), and call the feature you're interested in, face detection, for instance.
And then you have the results right away, and you can deal with them.

OK, so we've seen what you can do with pictures as of today. You can extrapolate to imagine what you can do with videos, because videos are pictures with a time dimension, right? Maybe the easiest is to show you an example. If you can understand what is in a video, then you can index it. This video has gone through the video intelligence model, and I get labels telling me what's in the video and where. Here at the beginning, I have a spiral galaxy; a bit later, I have humans; and here, I have a polar bear. So you see, you get the results, and you can really understand what you have in your input data.

Let's move on, with just one code sample. If you're interested in checking out how it's done, I have written a tutorial, a codelab, here. To get the information that there is an insect here in a larger video, once again, it's always the same principle: you create a client, you indicate that you're interested in object tracking, and you call annotate video. Then you get the results. If your video's duration is a couple of minutes, then after about one minute you will have the results. It's of course not real time, because it's a heavier processing job to read all the frames of the video and understand what's in them. But you can actually track objects, so it's even better than on pictures: you can follow the objects in your videos.

Next, text. It's a very big field in computer science called NLP, Natural Language Processing; I guess we all learned about it if we went to a computer science school. It's a really big field, and the latest advancements came from machine learning again. You provide text, and the natural language model is able to analyze it and give you results.
On this sentence, it will first tell me that it's in English. It's able to give me the precise syntax of the sentence, with all the different relationships. Punctuation is detected; with lemmas, I know that "is" here relates to the verb "to be", and so on. Like in pictures, it's able to detect entities, and here I have three different classes, three different types of entities. In red, I have persons: Tolkien is a person. And by the way, if you look here, I have an ID, and it's exactly the same ID as the one from the picture before. So I can really deal with Tolkien across text, pictures, and videos.

Also, one cool thing: the natural language model understands context. Here, if Tolkien was actually not J.R.R. Tolkien but Christopher Tolkien, the son, then we'd get Christopher Tolkien with a different ID, the unique ID for the son. Then "British" here relates to the location, and the three books (here, here, and here) are each detected as works of art, which is perfectly correct.

You can also ask for content classification: I have a book, a chapter, a paragraph, a sentence. In this case, it tells me that this sentence should be classified under books and literature with a confidence of 97%, which is perfect.

And finally, like with pictures, you can ask for sentiment analysis: trying to understand whether the text you provide is talking positively or negatively. To try that out, I retrieved two reviews of The Hobbit, one from the New York Times (from the last century) and one from Goodreads, a social network for readers. The first one is very positive; the second one, as you can tell, is very negative. And the results I get are, for instance, a score between minus 1 and plus 1 for each sentence, and it does work. These sentences come from the New York Times; this one too, a neutral one. Most of the sentences, of course, are neutral.
These sentences come from Pauline's review; she really hated the book. Some companies, for instance, are using this to understand how people, or users, are talking about their products on Twitter or on the web: they are retrieving content and running sentiment analysis over it. Some companies are also using it on all the emails they receive, to understand how happy or unhappy their customers are, which can be pretty useful.

Again, to use this in Python: you create a client, you provide the content (the document can be text or HTML), you call analyze sentiment, and you have the result very quickly.

In the same vein: translation. I won't get into details, but I can share something with you. In 2016, I was still working on e-books and I was using Google Translate, and one day something happened: the results were a lot better. What happened, and I have since gotten the answer, is that historically, Google Translate was using a phrase-based model, so mostly a statistical model, and in 2016 it switched to a pure machine learning model. This is why, at the time, we suddenly got a big bump in quality, and it has kept improving since then. Here, I just need two lines to use it: I create a client, I call translate, and I have a translation right away. It works from and to over one hundred languages, so that's thousands of different combinations.

And finally, regarding machine learning APIs: speech. With speech as an input, you talk, and you get your speech transcribed into text. This is also a problem that is now solved thanks to machine learning. If you're able to understand the speech that is in your data, it means you can index it: for instance, if I have an audio file, I can get the position of every word in all my sentences. And it's also very easy to use: you create a client and you call recognize.
And then you have the text coming from your audio. This is, again, another tutorial I've written. All the slides are public, so you will get the link at the end; it's also on my speaker profile on the EuroPython site, if you want to try it. What I tried in this one: I recorded myself reading French poetry aloud, a very famous poem by La Fontaine. I'm helping the model a little bit here, telling it that I know beforehand the audio is French. I'm asking for automatic punctuation, a newer feature that is very important: it gives you capitalization, commas, and so on. And here I'm also asking for the word time offsets, so that I can index the individual words.

Now the opposite: text to speech. You provide text, and you get speech out of it. 20 years ago, I used a text-to-speech engine in the first European e-book reader we made. It was a big failure. I worked quite a few weeks on it and was very proud of the result, but at the time, when you pressed the play button to have the book read aloud, what you got was "A-lice in Won-der-land", a robot talking to you. Now this is finished; this is also a solved problem, thanks to machine learning. At Google, it comes from a technology called WaveNet, developed by DeepMind. Maybe you know DeepMind because they've beaten the world champion at Go; more recently, they are beating young professional gamers who are champions at StarCraft. DeepMind tries to solve problems by building machine learning models from scratch. And here it's really amazing. Let me get you to hear these examples: one is the original recording, and the other one is the same sentence synthesized. "She earned a doctorate in sociology at Columbia University." "She earned a doctorate in sociology at Columbia University." It is really hard to tell the difference.
In case you want to know, the one on the right is the original recording. I've tried listening to them very loud, and so on: it is a very, very natural result. Maybe it's the best model of everything I've shown you so far. I have to admit, even though I love the vision model because it solves the problem I was trying to solve as a student, this one is honestly really amazing, because it's so hard to tell the difference. WaveNet voices are the ones you can hear on Google Home and in the Google Assistant.

And by the way, let's try something all together. I don't know if you noticed, but on Google Search, you can actually search with your voice. So let's try that. "What is the temperature in Paris?" "It's 27 degrees in Paris right now." It gives you results in real time, even though it might be briefly wrong while I'm still pronouncing "temperature". On purpose, I used my French accent, and it was able to understand me.

Now let's try something else: I'm going to go to the French version. "Quelle est la température à Londres ?" (What is the temperature in London?) Oh, sorry, let me try again. Here's a matching video. "Quelle est la température à Londres ?" Oh, I know, I forgot: I told you I was going to go to the French website, but I was actually still on the English one. Now I switch to the French one, sorry about that. "Voici quelques informations à propos de La French. Marseille, 1975. Pierre Michel, jeune magistrat venu de Metz avec femme et enfants, est nommé juge du grand banditisme." (Here is some information about the movie La French: Marseille, 1975, Pierre Michel, a young magistrate from Metz, arriving with his wife and children, is appointed judge for organized crime.) "Selon Futura, chez l'humain, une température interne de 37 degrés Celsius est communément admise." (According to Futura, an internal temperature of 37 degrees Celsius is commonly accepted for humans.) "Il fait actuellement 23 degrés à Londres." (It's currently 23 degrees in London.)

Then I wanted to show you the opposite: asking a question in French, but with an English accent. I messed up a little bit because I started to speak at the wrong time, but what you could see is that you get results in real time, and you get the expected result.
It's able to understand me even though I'm really making it hard to be understood. So what does that mean? It means the speech-to-text engine has been trained on enough data that it has understood the essence of our languages, the characteristics and the specifics of speech, well enough to tell the different words apart.

OK, so we've seen everything you can do with existing models. There are of course more features and many options; it could take a day to cover them all. If you want to generate speech, again, I've made a tutorial: it takes text and generates these three sentences in three different languages. What you need to do is create a client again, call synthesize speech, and provide some parameters: the language you want to generate, the name of the voice (there are different WaveNet voices if you want a human-like sounding voice), and a few other options. Here, with only this, I can generate three WAV files. You can try that in this tutorial.

OK, so next, something that is filling a big gap and many, many needs: AutoML techniques. Let me show you an example; you will understand better. If I take these two pictures, which are different, right, and give them to the vision model, it will give me almost the same results: sky, cloud; sky, cloud. Because both pictures are actually clouds in the sky. But if I want to build a forecasting service, for instance, I need to be able to understand the shape of the clouds: I need to know that it's a cirrus here and an altocumulus there. And then I'm stuck, because the only info I have is that it's a cloud in the sky. AutoML can help you here, still without any expertise in machine learning. The difference compared to the APIs is that you need to work a little bit more: you need to provide your own dataset, you need to provide training data.
For the dataset, you need to look for examples and give them to the AutoML pipeline. Once you have the dataset, you can launch a training; it's fully automated. Generally you will need a couple of iterations to understand how well your dataset is doing, and once you're happy, you can deploy and serve. Then you're back to the previous case, where you have your own, this time private, API that you can use in all of your solutions.

This still works online. If you want something that can work offline, you can train what we call an edge model, because you will be able to deploy it on the edge, somewhere else. It's a smaller model, not as accurate as the cloud model, but maybe it can fulfill your needs. Once you have trained your edge model, you can export it and get it to run in a container, on your smartphone, or even in a web browser with TensorFlow.js. This is very useful because, for instance, in factories, on production lines, very often you don't have internet connectivity, for many reasons, or you don't want to have it; you need something that works offline. Even for web browser solutions: of course you need to download the model first, but then you can work offline, with a local model running in your browser tab.

OK, so let's build the dataset. Here, to tell the clouds apart, I need to label them: I have a cumulus, a cumulonimbus, and so on. So you label your pictures; it's a classification problem, you want to tell different pictures apart. You don't need millions of pictures like for the big machine learning models we've seen before. Here you just need a couple of hundred pictures per label, ideally 1,000, but with just a couple of hundred, it starts to work really great. And once you have your dataset, you can launch a training. Here, this is a one-compute-hour training.
Here is a three-compute-hour training, and then you get a sense of how well it is doing: 80% of your dataset is used for training, 10% to evaluate the best architecture, and the final 10% to evaluate how well the model is doing. For classification, you can use the confusion matrix to get an idea of how well it performs. Here, for instance, it's doing great with cumulonimbus and cumulus, but really badly with altocumulus: almost 50% of the time, it confuses it with something else. There are two reasons for this: first, we have fewer samples of altocumulus, and second, they all look alike. Building datasets is going to be an art, I think. You really need to understand that you want a balanced dataset, and you want to remove as much as possible any bias that could be in it, because you're going to get results out of the model, and if you interpret a result as causality, it may actually not be causality; this is the issue we can have with bias. That could be an interesting topic for another talk.

Once you have trained your model, you can use it through an API and provide it with new pictures it has never seen before. Here, this is my own private picture; I was in Poland that day, and it's telling me that there's a cumulus in this picture at 97 percent. Really great.

So, if you remember my definition: we have data, and we want information. AutoML techniques as of today already work on text, pictures, videos, and also structured data, and this time you choose the features you want to detect. For instance, you may want to do custom classification, or maybe you want to detect custom objects in your pictures or videos; as of today you can do custom classification, and there are beta features for custom object tracking and so on.
You can build your own models on text with custom natural language features, you can do your own custom translation, and you can do custom predictions on structured data. It's a new field, I would say it started about two years ago, so it's just the beginning, but it's going to be very useful, because you don't need any expertise to build a model.

So, I've made a demo that we are all going to be able to try live. If you are on YouTube, there is a delay, so maybe by the time you hear this it will be too late; I don't know how long the delay is on YouTube Live. What I've done is a small demo where you can upload selfies. In the first part, I'm going to call the Vision API and try to detect generic emotions. But since I can't see you, I'd also like to know if someone is sleeping, yawning, or having fun. So with my teammates, and with attendees from previous conferences, I've built my own private custom model that is able, I hope, to detect these situations automatically.

The way it works is the following: from your smartphone, you upload a selfie, which automatically triggers a Python function; that function calls the Vision API, and my own AutoML Vision API if needed. Then, based on the analysis, it stores the result here, and you see it on your smartphone. This is a small serverless application; this part is actually my administration backend for the demo, and on the screen you will see the results from the administration panel.

So let's try it. I invite you to open the camera on your smartphone. It's starting, sorry for the delay. You can either flash the QR code here, or enter this URL in your browser: bit.ly slash smart EP 20 (smart, EP for EuroPython, 20). If you go there, you will reach this page, and we still have about five minutes.
I'm going to go to step one. OK: bit.ly slash smart EP 20. It will ask you for authorization to use your webcam. Here, the generic vision model is going to be used: you can upload a selfie and try to trigger a detection of one of these emotions. Let's try that. Yeah, my network must be a bit slow, I'm sorry. It seems to be okay, but it's in the cloud, so maybe you'll get faster results on your side. I have no feedback, so I don't know if it's working for you. Oops, I should have gotten the results already. Maybe I forgot to pray to the demo gods before; maybe that's why. OK, let me try again. Sorry about that; I hope it works on your side. OK, I had an issue before. Yes, it does detect surprise, with a high level of confidence. Let's try another one. And maybe you've seen: since I have the position of the nose, the mouth, the eyes, everything, I can actually add a mustache to everyone. Let's try this last one. OK, joy, with high confidence.

Now let's switch (you can keep trying a few times if you want) to the AutoML part. If you refresh the page or click next, you should be on the same page, but with my own private model. This time, try to stick out your tongue, to yawn, or to sleep. OK, it found that I'm yawning; that's right. Another one. This time, to tell the generic API and my AutoML API apart, the mustache will have the French flag colors. Yeah, it does work. You could say that I'm cheating, and I actually am, because I built this model with pictures of me, of my teammates, and of previous attendees. So of course it works for me, but let's check whether it works for you too. Wow, many people! So, Marc-André, okay. Happy people; that was with the generic API. Surprised people; that's me here. Here, maybe it wasn't surprise I wanted to trigger.
It's in between surprise and sadness. Here it's between surprise and joy. Here are two sad people, yeah. Here, angry. Yeah, you're right. And my AutoML model: more pictures are coming. You all have your tongues out, great. Tired people, oops, sorry, tired people, yes, you see it. I did input people yawning with or without their hand, and it works; and people sleeping, yeah, it works too. And finally, if you remember, it's able to detect objects with a precise location: here I have all the attendees with glasses, and it seems to work great.

A couple of minutes left. So you see, it's really easy to do. One note: there are two ways to measure how well your model is performing, and for that, you have to understand the notion of positives and negatives, and whether they are true or false; there are four different cases. If you're focusing on quality, then precision is the metric you're interested in. If you're building a search engine, then you'll look at the recall metric: there, you want to minimize the number of false negatives, because you want more results. I'll let you have a look at this. And if you want to understand a little bit how AutoML works, at least at Google, there is a specific resource you can check.

If you want to do more machine learning, you can use frameworks. One of them is TensorFlow, an open source framework, maybe the most popular one on GitHub. Another one is PyTorch, which I hear a lot about from experts.

So what have we seen? There are three ways you can use all of this. With the APIs, you just need a couple of hours, and there's absolutely no difficulty. With AutoML, you need days, or weeks and months if you want to become an expert, and you need to build a dataset, which takes a couple of days. OK, a few links if you're interested in checking out some solutions.
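As an aside, those four cases boil down to two simple formulas:

```python
def precision_recall(tp, fp, fn):
    """Precision measures the quality of what you flagged; recall measures coverage.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp)  # of all predicted positives, how many were right
    recall = tp / (tp + fn)     # of all actual positives, how many were found
    return precision, recall


# Say a model finds 8 real yawns, wrongly flags 2 non-yawns, and misses 4 yawns:
# precision_recall(8, 2, 4) gives a precision of 0.8 and a recall of about 0.67.
```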
Here, this is an online comic from Google AI; you will find definitions for lots of the terms, so it's a nice refresher if you want to understand things a bit better. The slides for this talk are here, and you're very welcome to send me feedback too. So thanks a lot for having me today. My goal was to give you this overview of what you can do as a developer; you don't have to be an expert to do everything you've seen. I hope you learned a few things, and I hope it gave you a few ideas. Thanks a lot for having me today, have fun, and have a great EuroPython. Thank you.

Okay, thank you very much, Laurent. That was a very interesting talk, lots of topics, lots of things covered. We do have a number of questions, but the time is already up, so I would say we basically take them to the talk channel that I posted in the chat, and then you can answer them there. It would also be a good idea to post the links you have in the slides in the talk channel, so that they stay up and are easily reachable. Sure, I will do so immediately. Right, so let me give you your applause. Well deserved.