Thank you very much. This is the last talk, so I'll try to be brief, and I'll start with this image. This is what came to mind a few years ago when I was thinking about extracting insights from video. It's like a garage: it's full of mess, and I don't know where to start. Faced with that, most people just shut the door and say, "Forget about it, I don't want to deal with it." That's a pity, because there are actually a lot of opportunities out there in the market to create value for your customers, if you open the garage door and start dealing with those messy problems.

I'm going to give you a couple of examples of applications that leverage video as a source of unique insights. Here we are talking about civil avionics. On the left you have a video with moving parts, labels, and rectangles. What's going on here? There is an aircraft that is refueling and boarding passengers, and the computer vision system is detecting moving parts, moving objects, and events on the right-hand side. By the way, this is a technology and product brought to market by a company called Asaya, and the part on the left is the reconstruction of the whole workflow. This is a great example of deriving insights, and in this situation it's very easy, because everything is already automated.

Second example: digital marketing. As you can imagine, digital marketing now is primarily video. What you want to do as a marketer is spend your money on the ads that drive engagement. Again, this is another company, this time in the US, called Finia. What they do, essentially, is put respondents in front of the ads.
So the lady on my right is watching the video, reacting to the video ad that is displayed there, and the computer vision algorithm is trying to predict the engagement level, which is a score between zero and a hundred. Another brilliant example of how you can create value by looking at images, at moving pixels, and deriving insight from them.

Last example. This one is drawn from Beautify's work on life logging, which is increasingly popular. You want a healthier lifestyle? Why not look at what you're doing on a daily basis with an IoT device, in this case a body camera that records what happens in front of you. Whether your lifestyle is healthy or not depends on what you're doing, right? So in this case there are a lot of things going on. How can we make sense of all this data?

Now, the interesting thing is that it is not actually as messy as it seems. There is a set of steps we always go through to take video as a raw source of information and come up with insights that drive value for your clients. These are the four steps you can see here: setup, low-level feature extraction, high-level feature extraction, and then data analysis. What I'm not going to talk about today is the last step, data analysis, because you're all experts on this, so we don't need to go through it. But I want to bring you to a point where the data analysis is pretty straightforward, because all the rest, which is actually 80% of the work, is already done.

So let's jump straight into it and look at the first section, which is the setup. How do you set up your
computer vision pipeline to start your project? First of all, as many of you are already aware, when you're doing a data science project, the first thing you do is focus on the business objective of your client. What is it about for an airport? It's all about delays, right? You want to minimize delays, by using video to understand which operations are the most time-consuming and which are creating the most problems. The second example is about engagement: we want to maximize the engagement of our consumers, and we do this by looking at the ads that are most engaging. Final example, the lifestyle one: we look at activities.

The recommendation here is very simple: not only do you have to look at the business problem of your client, you also want to measure signals that are related to that business problem. To help with that, we have on our website, beautify.co/tools, a set of templates and tools that enable you to evaluate the use case and flesh out the alternatives when it comes to data. It's all there, it's easy, it's free, so please feel free to download and use it.

Now, we were talking about images and videos, and they are complicated to process because they are intrinsically redundant and very difficult to manipulate. But what if we were able to create an intermediate representation that greatly simplifies this signal into a compact set? The first recommendation here is to identify the visual cues, and maybe also the audio cues if you want to go down the audio-analysis route, that qualify the objective function we saw before. Let me give you an example. If you are in the airline business, obviously you want to detect those moving parts, as you can see here,
highlighted in green. I'm not an expert in this field, but all these moving parts, like doors and cargo containers, can actually be detected pretty easily in videos. What about faces? Well, we are all experts on faces. The important visual cues are certainly the eyes, the subtle movements of the facial muscles, and the lips. When it comes to a complex scene like this one, we can separate the visual attributes into four different categories that will do a good job of qualifying who is in the picture, when the event is taking place, what is in the picture, and what the activity is and where it is taking place. So here again, I just gave you an example of the things you should look at.

The final objective is to come up with a data structure. Once you have a data structure like this one, you are in business, because you have a framework: you can see, mapped out, the elements your machine learning or computer vision algorithm needs to detect to do a good job. Unfortunately, this is just the beginning of the story, because once you have the data structure, our recommendation is always to iterate and try to simplify it as much as possible. Here Occam's razor is the obvious choice: no more and no less than what you need to do a good job of describing the events you want to detect. Again, none of this is rocket science; it's just simple common sense.

Where it gets a little more technical is at this point in the pipeline, when you want to design the low-level features that take this data structure and essentially fill it in. So how does it work? Well, it all starts with data. And here we are not very different, in the sense that even when we are talking about computer vision, you have to find a dataset that approximates your problem well.
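As a concrete sketch, a minimal version of such a data structure could look like the following in Python. All field names and example values here are hypothetical, invented just to illustrate the four categories (who, when, what, where):

```python
from dataclasses import dataclass, field

# Hypothetical record for one detected event in a video, covering the four
# categories discussed above: who, when, what/activity, and where.
@dataclass
class VideoEvent:
    timestamp_s: float                               # when: seconds from video start
    who: list[str] = field(default_factory=list)     # e.g. ["ground_crew"]
    what: list[str] = field(default_factory=list)    # objects, e.g. ["cargo_container"]
    activity: str = ""                               # e.g. "unloading"
    where: str = ""                                  # e.g. "apron"

# A whole video then becomes a simple, compact list of events
# instead of millions of redundant pixels.
events = [
    VideoEvent(12.5, who=["ground_crew"], what=["cargo_container"],
               activity="unloading", where="apron"),
    VideoEvent(341.0, who=["fuel_operator"], what=["fuel_truck"],
               activity="refueling", where="apron"),
]
```

The point of the exercise is exactly the Occam's razor step: every field you cannot justify against the business objective is a candidate for removal.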
We said before that we have a data structure; now we want to find a good dataset that approximates your problem. There is a brilliant website called Visual Data (visualdata.io), and I strongly recommend you have a look at it: you can select a topic, for instance video classification, and find an awful lot of datasets that are already publicly available. What is the advantage for you? Well, if you don't have data from your client yet, you can start with these datasets. The second advantage is that you have the labels, and the third is that in most cases these datasets already come with a baseline and with some source code you can use for early experimentation.

Now let's talk about the thing that is most interesting to me: now that you have the dataset and the data structure, how can you train a machine learning algorithm to take the videos in that dataset and extract meaningful signals? Well, as you have probably heard in many other talks, the deep learning revolution has spread into every single field of artificial intelligence, and video recognition is no exception. The state-of-the-art algorithm here is called YOLO; the acronym stands for "you only look once", and you'll understand in a minute why it's really cool. I strongly recommend that whoever wants to play with video recognition use it, because it's open source, it's super fast, and it delivers state-of-the-art performance.

So let me give you the intuition behind YOLO. Let's say you have an image; let's take a very simple example, not whole videos but individual images within videos.
You chop the image into squares like the ones you can see here, and then what you're saying is the following: I want to look at each individual square and detect objects within that square or within adjacent squares. So what YOLO does for you is create an intermediate representation where you have all these rectangles, big and small, associated with each square. And not only does YOLO do a great job of detecting objects, or at least making hypotheses about potential objects, it also classifies them. This is the output you get when you combine the two: for instance, I think the yellow boxes correspond to the "dog" class, the red ones to "bicycle", and so on.

Now, this is obviously very messy, and it's not what we're going to get at the end. What YOLO does next is vote for the rectangles and categories that are most likely to be representative of the objects, applying a very simple voting technique, a non-maximum suppression (NMS) algorithm, to promote the good ones. At the end of the day you split your image into regions, and finally you get neat detections.

When you're running YOLO you essentially have a pre-trained, off-the-shelf deep neural network that can be really effective in distilling all that information and redundancy into a usable signal. So this is another recommendation:
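To make the voting step concrete, here is a minimal sketch of non-maximum suppression in plain Python. The box format, scores, and the 0.5 overlap threshold are my own assumptions for illustration, not YOLO's exact implementation:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(detections, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it too much, repeat.

    detections: list of (box, score) pairs with box = (x1, y1, x2, y2).
    """
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best[0], d[0]) < iou_threshold]
    return kept

# Two heavily overlapping hypotheses for the same object, plus one separate box:
# NMS keeps the stronger of the overlapping pair and the separate box.
dets = [((10, 10, 50, 50), 0.9), ((12, 12, 52, 52), 0.7), ((100, 100, 140, 140), 0.8)]
kept = nms(dets)  # → two boxes survive, scores 0.9 and 0.8
```

In practice you would run this per class, so that a dog box and a bicycle box never suppress each other.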
Please experiment with these tools, because they are incredibly easy to use and effective. But even when you are successful in running YOLO, you are left with an output like this one: a video with a lot of detection results going on, which can be very messy. At the end of the day you get a JSON, and the JSON is timestamped, but it's still really hard to imagine going to a client, showing them this output, and convincing them to pay you for insights. We're not there yet.

So we need something that connects this layer, which we call the low-level feature layer, to a higher-level layer. How do we do this? When it comes to higher-level features, we have to train a new classifier using these low-level features. Bear in mind that the level of abstraction is now much higher, because we are looking at objects or pieces of objects. We put them in a feature vector, which we call x̂, and we want to get the signals on the left, which are things like, for instance, "healthy", "neutral", or "unhealthy" if we're looking at body-camera video sequences. If we're looking at avionics sequences the labels will be different, but it's the same principle: you have low-level features, and you want to abstract them into higher-level features. How do you do that? You train a very simple linear classifier, which means
you're learning the weights, and then it's a dot product. Each frame gets a classification result, in this case "healthy", but it could be any other class. The gist of the story is this: you have a sequence of frames, and this sequence gets split into three different parts, one corresponding to the first category, "unhealthy", a second corresponding to the category "healthy", and a third which is "neutral". Now you are in business, because you have two things: the segmentation of your sequence, and quantitative data, because it's essentially an annotated time series. So you can extract two types of insights, quantitative and qualitative.

Let's wrap up. What I would like to share is a summary of the lessons we have incrementally learned thanks to a lot of clients in this field. First of all, focus on the most burning business problem you are solving. Secondly, try to design your insights even before starting, because as you have seen the pipeline is quite complicated, and you don't want to get to the end before doing this design work. Then, more importantly, when it comes to the computer vision part, really try to design your data model early, at the start of the project. I also think it's very key to reuse as much as possible things that are already there: we talked about YOLO, but it's not the only alternative; there are other deep learning frameworks that can be applied in a very straightforward manner, even if you're not a computer vision expert. Last but not least, try to build your pipeline incrementally, from low-level features to high-level features, so that you can really extract the most value for your clients. This is all. Thank you very much.
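The per-frame classification and segmentation described in the talk can be sketched as follows. The feature layout (counts of detected object classes per frame) and the weight values are entirely invented for illustration; a real system would learn the weights from labeled data:

```python
import numpy as np

# Hypothetical per-frame feature vector x_hat: counts of detected objects,
# here in the order [salad, plate, burger]. Weights W are made up, not learned.
classes = ["healthy", "neutral", "unhealthy"]
W = np.array([[ 1.0, -0.2, -1.0],   # weights for "healthy"
              [-0.1,  0.0,  0.1],   # weights for "neutral"
              [-1.0,  0.2,  1.0]])  # weights for "unhealthy"

def classify(x_hat):
    """Linear classifier: one dot product per class, then pick the maximum."""
    return classes[int(np.argmax(W @ x_hat))]

def segment(labels):
    """Collapse a per-frame label sequence into contiguous [label, start, end] runs."""
    segments = []
    for i, lab in enumerate(labels):
        if segments and segments[-1][0] == lab:
            segments[-1][2] = i           # extend the current segment
        else:
            segments.append([lab, i, i])  # open a new segment
    return segments

frames = [np.array([2, 0, 0]), np.array([2, 1, 0]), np.array([0, 0, 3])]
labels = [classify(x) for x in frames]
# → labels: ["healthy", "healthy", "unhealthy"]
# → segment(labels): [["healthy", 0, 1], ["unhealthy", 2, 2]]
```

The segment list is the qualitative insight, and the per-frame label sequence is the annotated time series from which the quantitative insights are derived.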
Sorry about the problems with the screen. If there are questions or comments, I'll be happy to address them. Thank you.