All right, so good morning everyone, and first of all, thank you for coming to the session. Today we're going to be talking about deep learning and the technology behind self-driving cars, but before diving into the technical details of the presentation, I'd like to set the scene by taking a look back at the most relevant events in the history of the automobile that, from my perspective, have led us to where we are today in the field of advanced driver assistance systems, or ADAS, and autonomous driving. So we have to go back to the late 1800s to acknowledge what is widely regarded as the world's first production car propelled by an internal combustion engine. In 1908, Henry Ford brought the Model T to the masses and really opened up the automotive industry with the first affordable car. Years later, Charles Kettering of the Dayton Engineering Laboratories Company, Delco, patented the electric self-starter, which was a huge revolution, as it avoided having to use human power to start the engine with the crank. Despite this invention,
his patent wasn't used commercially until a year later, in Cadillac production cars. In 1939, General Motors' Cadillac and Oldsmobile divisions developed the automatic transmission, which eliminated the need to shift gears in the car. And probably a huge event for ADAS and autonomous driving was the modern cruise control, which was unveiled in the late '50s for the Chrysler Imperial convertible. Now, interestingly, there's a nice story behind this. The invention is credited to a man named Ralph Teetor, and he was actually blind. The inspiration struck him when he was riding with his lawyer as a passenger, and he noticed a tendency in his lawyer of slowing down while talking and speeding up while listening. His invention goes back to the mid '40s, but wasn't used commercially in cars until 1958. One of the greatest inventions for safety is the anti-lock braking system, which is still used today, and the Sure-Brake system was the first electronically controlled anti-skid system that was able to prevent both the front and rear wheels from locking up and maintain steering control during a full brake stop. And probably the greatest invention for all of us here today as data scientists and software engineers was when we started to put computers in the car. This first engine control unit, or ECU, was going to be a transformative change in the automotive industry. Since then, there have been many other developments, and from my perspective, the next interesting thing came with connectivity: all of a sudden the car was connected to the network in some way. At first it was just a cell phone installed in the vehicle that would route calls to a call center and get emergency help when there was an accident, but today connected cars provide anything between remote diagnostics, over-the-air updates, Wi-Fi, turn-by-turn directions, and so on. And then another of the greatest advancements for ADAS was the lane departure warning system, which was developed
for the Mercedes-Benz Actros truck, and all of a sudden the electronics in the vehicle were starting to perceive the world around them. This is a very important point: the whole concept of having a computer make decisions, or help you make decisions, is at the heart of what AI is bringing to ADAS and autonomous driving. All right, so if we look into the fields of automated driving, there are really a lot of things we can cover. Just to mention a few and to set the scene, we can talk about controls, perception, and localization and planning. We won't be covering all of these, but maybe let's talk about a few trends. In controls: sensor models and model predictive control. As the vehicle becomes more and more intelligent, one of the things we really need to care about is the accuracy of the sensors, which plays a significant role in the control systems themselves. Also, in model predictive control, as we're becoming more of a passenger, we need to balance out behavior as well as ride comfort. In localization and planning, researchers are integrating their algorithms with ROS, the Robot Operating System, to solve problems like path planning. And in perception there are the fields of deep learning, which we'll cover deeply today, and sensor fusion, where you fuse data coming from different sensors to assess the world around the vehicle. Now, covering all these areas in 30 minutes is nearly impossible, so we'll be talking mostly about deep learning and maybe also mention a few of the sensors involved in sensor fusion. Of course, in traditional machine learning approaches we have to have some sort of feature extraction mechanism to learn discriminative information from images or from data, whereas in deep learning we're going to be working directly with raw data, and this provides an end-to-end learning mechanism. So we're going to start by talking about convolutional neural networks.
A convolutional neural network, often referred to as a ConvNet or CNN, is a type of deep neural network that can work directly on structured data like images in order to, for instance, classify them. Now, one of the important things is that because of the depth of the network, we're going to eliminate the need for creating handcrafted features: deep neural networks provide feature learning. And as opposed to classic approaches, in this case we'll be dealing with networks that have anything from five to hundreds and even thousands of layers. Because of this we're going to need very big data sets, and GPUs are going to play a big role in the training process of these ConvNets. If we think of the architecture — here's a simple architecture of a CNN — you can think of it as the representation of the data as it travels through the network. We're going to start with an image input layer, then we're going to have several convolution, ReLU, and pooling layers, up to the point where we flatten the activations and work with fully connected layers, and finally a softmax function to classify our data. The interesting thing here is that all these layers are going to be trained together, so the training process is going to involve adjusting the weights in all these layers to fit the network to the task. Another important thing is that the very first layers are going to be involved in learning the features, while the last layers will be performing the classification. All right, so now that we've looked into the most popular architecture, we can take a look at the typical deep learning workflow. We're going to start by accessing and exploring our data, which again can come in the form of images, signals, files, or text. Then we're going to go through a labeling process.
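Before moving on to the workflow, the convolution → ReLU → pooling pipeline described above can be made concrete with a tiny sketch. This is illustrative Python/NumPy, not the MATLAB code from the demo, and the helper names (`conv2d_valid`, `max_pool2x2`) are my own:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of an image with a small kernel."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0)

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling (downsamples by a factor of 2)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Toy 6x6 "image" whose values increase left to right, and a
# horizontal-gradient kernel that responds to that increase.
img = np.arange(36, dtype=float).reshape(6, 6)
k = np.array([[-1.0, 1.0], [-1.0, 1.0]])
feat = max_pool2x2(relu(conv2d_valid(img, k)))  # a 2x2 feature map
```

In a real CNN many such kernels are *learned* during training rather than fixed by hand, which is exactly the feature learning mentioned above.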
This is probably one of the tedious tasks that frequently nobody wants to do: we have to properly label our training data. If the data we have available is insufficient, then we might work with synthetic data, or use other techniques such as data augmentation, which basically consists of taking the data you have available and performing operations on it that are object-invariant — like cropping, reflections, translations, or rotations — to enhance the data set. Then we can choose our network architecture. In this case we'll be using MATLAB's deep learning framework, and we can either build networks from scratch or work with pre-trained networks from research to perform transfer learning. We can also leverage interoperability with other frameworks like Caffe or TensorFlow through ONNX. Then it's time to perform training, either on the CPU or the GPU, and scale to multiple GPUs or the cloud if we have them available; we can even perform hyperparameter tuning to come up with the optimal set of parameters for our network. And finally we have to share our work in some way. We might do that in MATLAB itself, or we can export the network to ONNX — ONNX stands for the Open Neural Network Exchange format. We can also choose to put our models in production in enterprise-scale systems, or even generate C++ and CUDA code for very fast inference. All right, so let's start with what is, a priori, the most time-consuming task, which is labeling the ground truth data. For that we have developed a set of tools, one of which is the Ground Truth Labeler, and I'm going to guide you through the process of labeling the data. In this case I can import data — either a video, sequences of images, or a specific video that might have a specific compression or file format that you may want to read. Here I'm just loading a simple video, and I can inspect my video timeline to see what the
data looks like. Typically it's convenient to work with a subset of the video, since small movements of your hand can mean large changes in the video timeline, so here I'm just zooming in on an interval to work within just those bounds, and I'm going to start defining labels. In this case, I'm going to define the label car, as well as the lanes. I'm going to give the car label a rectangle shape, and I can add a label description to tell others on my team how to properly label this data — in this case, using a tight bounding box, that is, a bounding box very close to the boundary of the object. Then I can go and label a few of these. The next thing I can do is use other types of labels; in this case, for the lane markers, we're going to use a line label instead of a rectangle shape, and as you can see, I can go through my video — in this case this frame — click with the mouse, and right-click to stop labeling. Now, another interesting thing about labeling is when you need to label an entire image or an entire timespan, and for that we have scene labels: a scene is something that is true for an entire image or an entire period of time. So here we can define, for instance, the weather conditions — sunny or not sunny — or even the lighting conditions — shadows or no shadows. And then there's the process of applying those labels to the selected data: we can either add a label to the current frame — and as you can see in the top corner, only the first frame has been labeled as sunny with shadows — or we can do that for the entire time interval. All right, so the next thing is: how do I automate this process? That's really the most time-consuming task, so for that we have developed a set of algorithms, and we also provide an open API so you can build your own automation algorithms to automate the ground truth labeling process. In this case I'm just going to automate the detection of this car, and for that,
I will be selecting a point tracker. Now, as you can see here in the interface, there are these Add Algorithm and Import Algorithm options in the API that I'll show you later, but for now I'll just go with the point tracker and simply click Automate. I can configure some of the detection process — for instance, which type of feature detector I'm going to use in the first frame — but in this case I'm just going to leave the defaults and go and hit Run. So in just about a second I've labeled that data, and I can go back and inspect it. I can make any changes or adjustments if needed, and when I'm done I go in and click Accept. Excellent. Now, the next thing is that we may want to know what the distribution of the labels looks like, so for that we can go to the View Label Summary. Here I can seek through my video timeline and take a look at what has been labeled; maybe there are specific ROIs or specific labels that I want to look at, or maybe there are specific scenes that I have to pay close attention to. So that's convenient for seeking through video. And then the very last part is exporting this data; in order to do that,
I'm just going to go to Export Labels, and I have a few options: I can export to the workspace or to a file. In this case I'm exporting to the workspace as a ground truth object that I can later use for, say, a deep learning algorithm. This object contains the data source — the video that I imported; it contains the label definitions — the car and lane-marker labels, and then the two scene labels; and finally it contains the label data: the first timestamp contains the manually labeled data, and then there's one second of automatically labeled data. So the next thing is: what if the available algorithms we provide do not do the task that you need to perform? Then you can create your own algorithm using the provided API. Here we're just doing lane markers, so we have a template that inherits from vision.labeler.AutomationAlgorithm and implements several methods, one of which is the run method — basically the one doing most of the work. So here, in this case, I'm simply going to import that class into the tool by going to Import Algorithm and selecting the file, and then it will become available in the tool. I'm just going to click the right lane and left lane in this case and automate the process. So I go to Select Algorithm, I find my auto lane detector, I hit Automate, and then I need to define the sensor variable and hit Run. Then I can review this process, and if there are any adjustments that need to be made, I can do them manually. All right, so we've gone through the process of labeling our ROIs, and if we follow this path toward creating a detector, we'll be doing ROI detection. Now, in automated driving, frequently ROI detection isn't enough and we need much more accuracy, and for that the other problem we can solve is pixel-level classification.
So each pixel is classified with one label or another. There comes another tedious task, which is labeling every image pixel by pixel, and for that the tool also provides the capability to label images at the pixel level. Here I'm just going to show you quickly how we can do that. In this case, I'm going to define a few ROIs, but this time the ROIs are going to be pixel labels instead of rectangles. So I'm just going to define road, vegetation, sky, etc., and then I can use the tools available in the app to fill the sky, for instance using flood fill. I can use a smart polygon to estimate where the vegetation is, and then use the different foreground and background editors to adjust the detection. And this is how it works. It's a time-consuming task that can eventually be automated in some way as well, but this is the way you do it. There's also a smart way to go about this, because every pixel can have at most one label, so you can order your work wisely, as we're doing here — labeling the road prior to labeling the car, for instance. Also, another interesting thing: when you're labeling, you might need to know which parts of your image haven't been labeled yet, and for that you modify the opacity, as we did, and take a look at what's left. All right, so in this case we're now going to solve the problem of pixel-level classification using a semantic segmentation network. For that, let's go back to the convolutional neural network that we know about. So this is a convolutional neural network: you have an image as the input and you have classes as the output.
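Going back for a moment to the flood-fill tool used in the pixel labeler above: the core operation is simple enough to sketch. This is an illustrative Python/NumPy version of the classic 4-connected flood fill, not the app's actual implementation, and the function name is my own:

```python
import numpy as np
from collections import deque

def flood_fill(labels, seed, new_label, background=0):
    """Assign new_label to the 4-connected region of `background`
    pixels containing `seed`. Each pixel keeps at most one label,
    so already-labeled pixels act as boundaries."""
    h, w = labels.shape
    if labels[seed] != background:
        return labels          # seed already labeled: do nothing
    out = labels.copy()
    queue = deque([seed])
    out[seed] = new_label
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
            if 0 <= nr < h and 0 <= nc < w and out[nr, nc] == background:
                out[nr, nc] = new_label
                queue.append((nr, nc))
    return out

# Toy 4x4 label mask: 0 = unlabeled, 1 = already-labeled "road" row.
mask = np.zeros((4, 4), dtype=int)
mask[2, :] = 1                      # the road row splits the image
sky = flood_fill(mask, (0, 0), 2)   # fill the top region as "sky" (=2)
```

Note how the previously labeled road row stops the fill — which is exactly why labeling the road before flood-filling neighboring regions is the smart order of operations.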
So the entire image is being classified as one of these labels, and as you can see, you have several convolution, ReLU, and pooling layers — here we've indicated the pooling layers in green to indicate that this is a downsampling operation. Now, to build the semantic segmentation network, what we're going to do is undo that work — so we kind of have an un-pooling layer — to end up with an output image of the same size as the original image. So now we have an input image and an output image of the same size, and in this case every pixel is going to be labeled with a different class. And here's one of the problems we can solve: figuring out what the drivable path is. For this example we're using publicly available data — the CamVid data set — so we don't have to go through the process of labeling, but if we had to, you've seen how we can do that. So let's see how this works. First we're going to load some data, and for that we have a datastore, which is a way to load data into MATLAB very efficiently. What we're doing here is also using histogram equalization to enhance the image, because the image is actually quite dark. And — sorry, this screen is about to turn off; if somebody from the team can take a look, I'll make sure it doesn't, since I don't want to turn my back on you. So then we have a pixel label datastore — we have
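As a quick aside on the histogram equalization step just mentioned: the idea is to spread a dark image's crowded gray levels across the full range by mapping each level through the normalized cumulative histogram. A minimal Python/NumPy sketch (in the MATLAB demo this is a built-in operation; the function name here is my own):

```python
import numpy as np

def hist_equalize(img):
    """Histogram equalization for an 8-bit grayscale image: map each
    gray level through the normalized cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf /= cdf[-1]                       # normalize CDF to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)
    return lut[img]                      # apply the lookup table

# A dark toy image: all values crowded into [0, 63].
dark = np.tile(np.arange(64, dtype=np.uint8), (64, 1))
bright = hist_equalize(dark)             # values now span up to 255
```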
So we're going to be able to load it with the pixel label datastore, and here, in this case, we're actually able to visualize an image and overlay the pixel labels on top. Another thing we can do is look into the insights of the data, and if we take a look here, we will notice that the data set is a little bit imbalanced. So this data set will pretty much do a good job of detecting roads, but it will probably not do a good job of detecting something like a bicyclist. If we want to detect bicyclists, we probably need to go through the process of collecting more data and labeling it. All right, the next thing is that we're going to be using a semantic segmentation network, and for that we have the segnetLayers function in MATLAB that will take VGG-16 weights and do the necessary work to build the network. Now, interestingly, this type of network is a DAG network, or directed acyclic graph network, and in this particular case it has a downsampling/upsampling architecture — or encoder/decoder, depending on what you like to call it. After that, we want to find a way to compensate for the imbalanced data, so what we're going to do is change the last layer in the network, taking into account the weights of the pixel classes available in the data. So we do some replacement: we remove some layers.
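One common way to compute the class weights mentioned above — compensating for the imbalance between, say, road pixels and bicyclist pixels — is median frequency balancing. An illustrative Python/NumPy sketch (the function name and the toy counts are my own, not from the demo):

```python
import numpy as np

def median_freq_class_weights(pixel_counts, image_pixel_counts):
    """Median-frequency balancing: weight_c = median(freq) / freq_c,
    where freq_c = (pixels of class c) / (total pixels in the images
    that contain class c). Rare classes get weights > 1, so their
    errors count more heavily in the loss."""
    pixel_counts = np.asarray(pixel_counts, dtype=float)
    image_pixel_counts = np.asarray(image_pixel_counts, dtype=float)
    freq = pixel_counts / image_pixel_counts
    return np.median(freq) / freq

# Toy counts: road dominates, bicyclist is rare.
pixels = [9_000_000, 900_000, 90_000]            # road, car, bicyclist
img_px = [10_000_000, 10_000_000, 10_000_000]    # pixels in those images
w = median_freq_class_weights(pixels, img_px)    # bicyclist weighted highest
```

The weights then go into the replaced classification layer so the training loss does not simply ignore the underrepresented classes.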
We add some connections and replace the last layer. We can use data augmentation to expand the training data set; here we're performing some reflections, rotations, and translations in X and Y, and that's very convenient when your data set is not massive. Then we can go through the training process. Here you see the execution environment has been set to auto, so in this particular case we're going to be training on the GPU if you have one; otherwise we'll train on the CPU. And if we have multiple GPUs available, or access to a cloud or a cluster, we can use either multi-GPU or parallel. All right, then we go through the process of training — and I've lost my slides here, but this trains in roughly 1,000 minutes with one GPU. After that, basically what we can do is evaluate the trained network, taking a look here at a single image and looking at the output of the network: we are overlaying the image and looking at how well it classifies. We can compare against the expected output, or ground truth label, and look at the differences between the two. And we can compute metrics, because at the end of the day that's what we need; in this case we're using intersection over union, which is a popular metric for semantic segmentation. Then we can look at the whole test data set that we may have set aside, and in this case we're evaluating against a whole bunch of data, and again we get results that we kind of would have expected: if we want to detect roads or skies, we have pretty good odds, but if we wanted to detect something like bicyclists, the network is not performing as well. So some of those underrepresented classes will not perform well, and therefore we'll need to collect more data and label it. Okay, so far we've been looking at data coming from images, right?
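The intersection-over-union metric used in the evaluation above is simple to state: for each class, IoU = |prediction ∩ ground truth| / |prediction ∪ ground truth|, computed over pixels. An illustrative Python/NumPy sketch (the helper name and toy masks are my own):

```python
import numpy as np

def iou_per_class(pred, target, num_classes):
    """Per-class intersection over union for label maps:
    IoU_c = |pred==c AND target==c| / |pred==c OR target==c|."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union if union else float('nan'))
    return ious

# Toy 2x4 segmentation with classes {0: road, 1: sky}.
target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
pred   = np.array([[0, 0, 0, 1],
                   [0, 0, 1, 1]])
ious = iou_per_class(pred, target, num_classes=2)
```

Averaging the per-class values, or inspecting them class by class, is what exposes the weak performance on underrepresented classes like bicyclists.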
So images are great because they provide great range — the camera is the sensor with the farthest range, more than 200 meters — but it lacks the ability to detect proximity, work in dark conditions, or detect speed, and for that there are many other sensors that can play a role in ADAS and autonomous driving. Some of those sensors are radar, ultrasonic, and lidar. In this particular case I want to mention how important sensor fusion is. If we leave lidar out for now and look at the cheaper sensors — the remaining three — this is kind of what sensor fusion looks like: you see that if we're able to work with multiple sensors, then we have good data to work with under all conditions and all scenarios. Now, today's talk, unfortunately, is on deep learning, so we won't be covering sensor fusion much, but I do want to talk about the sensor that has been left out, which in this case was lidar. Lidar stands for light detection and ranging, and here you see a lidar device on top of this retrofitted Google car. Basically, it's a 360-degree sensor that continuously fires off beams of laser light and then measures how long it takes for the light to return to the sensor. So what does that lidar data look like on the computer? It looks like this: on the left we have video, and on the right-hand side we have synchronized point cloud data coming from the lidar. The next thing we want to do is see how we can develop a deep neural network using lidar data, and the greatest challenge here is going to be data preparation and labeling — that part is really tedious.
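As a quick aside on the ranging principle just described: since the measured time covers the trip out *and* back, the range is half the round-trip time multiplied by the speed of light. A minimal sketch (the function name is my own):

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def lidar_range(round_trip_s):
    """Range from lidar time of flight: the beam travels out and back,
    so range = c * t / 2."""
    return C * round_trip_s / 2.0

# A return arriving after about 1.334 microseconds corresponds to a
# target roughly 200 m away.
r = lidar_range(1.334e-6)
```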
So: accessing the data, pre-processing it, and labeling it. I want to go first into accessing and labeling the data. In MATLAB we have efficient ways to read from the industry standard, which is Velodyne PCAP files, so we will be using that in this case. We also need an efficient way to visualize the data, and MATLAB has very efficient streaming point cloud players to view this type of data. So the first thing is accessing the data; the next thing is preprocessing it. Here we're going to have to be a little cautious, because as you can see, there's quite a lot of noise in this image: those circular rings are beams bouncing off the ground. That's basically information coming from the ground, so we want to remove it. In this case we're going to fit a plane using an algorithm called RANSAC so that we can remove those ground points, because that ground is basically noise. Once we have those points removed, we want to go from point clouds to objects by performing a clustering algorithm; here we're just performing a very simple Euclidean-based segmentation algorithm. And once we have that, we have to go through the process of labeling. For that we've developed a prototype that basically snaps a bounding box to a cluster, so you don't have to go and adjust the limits of the bounding box yourself — that's very convenient. So we can label cars and bicyclists and so on and so forth. Now, if you have some unique insight into your data, MATLAB can be quite handy, because if, for instance, you rotate the image, maybe it's much easier to label the cars from a top view than from a 3-D view, and you can still collect 3-D labels at the end of the day. So this labeler labels in a 2-D view, and when you rotate it, you have collected 3-D labels. All right, and the next thing is: how do we automate this process?
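Backing up a step: the Euclidean clustering used above to go from ground-removed points to object candidates can be sketched quite compactly. This is an illustrative Python/NumPy single-linkage version, not the MATLAB implementation from the demo, and it is O(n²), so it is only suitable for toy inputs:

```python
import numpy as np

def euclidean_cluster(points, tol):
    """Greedy Euclidean clustering: a point joins a cluster if it lies
    within `tol` meters of any point already in that cluster."""
    n = len(points)
    labels = -np.ones(n, dtype=int)   # -1 means "not yet assigned"
    cluster = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        stack = [i]
        labels[i] = cluster
        while stack:                  # grow the cluster exhaustively
            j = stack.pop()
            d = np.linalg.norm(points - points[j], axis=1)
            for k in np.nonzero((d < tol) & (labels == -1))[0]:
                labels[k] = cluster
                stack.append(k)
        cluster += 1
    return labels

# Two well-separated toy "objects" after ground removal (x, y, z).
pts = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0], [0.1, 0.1, 0.0],
                [5.0, 5.0, 1.0], [5.2, 5.0, 1.0]])
labels = euclidean_cluster(pts, tol=0.5)   # two clusters expected
```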
Because that's the ideal part. So here we're going to select a few objects, and in this case we're going to use a tracking algorithm to track those objects and predict the position of each object in the next frame, and the next, and the next, and thus label the data set. This ends up saving a lot of time. So what comes next? In terms of classifying individual point clouds, we decided to use a popular network from research — from Stanford, from 2017 — called PointNet. The idea here is basically to classify the point clouds as objects. Here you have it for reference, but this is a very straightforward network, very similar to the convolutional neural networks we saw earlier for images, but in this case for point clouds; it consists of several MLPs. I'm not going to go into much detail on how it works or why it works, but I'll show you some data. Here's some of the output from the classification process. I'm not sure if you can see this well, but you can probably see on the left-hand side a point cloud that has been classified as a car, and on the right-hand side what seems to be a bicyclist that has been classified under the label none — in this case we have three labels for this specific problem: cars, trucks, and none. Now, another option we may have in mind is to use semantic segmentation like we used before. In this case we're going to use LinkNet, which is a much more lightweight semantic segmentation network. We still have to collect the data, cluster it, and label it, but we have to figure out how to organize the data for training, and for that what we're going to do is basically project the data — the point cloud — onto 2-D, so that we work with X-Y data and the labeled point cloud data projected in 2-D.
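One simple way to realize that 2-D projection — and this is an assumption on my part, since the demo doesn't spell out the exact encoding — is a bird's-eye-view height map: bin the X-Y coordinates onto a grid and keep, say, the maximum height per cell, giving an image-like array a segmentation network can consume. An illustrative Python/NumPy sketch with made-up ranges and function name:

```python
import numpy as np

def birds_eye_height_map(points, x_range=(0, 10), y_range=(-5, 5), cell=0.5):
    """Project (x, y, z) points onto a 2-D bird's-eye grid, keeping
    the maximum height per cell. One of several image-like encodings
    for point clouds; the demo's actual projection may differ."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    grid = np.zeros((nx, ny))
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    for i, j, z in zip(ix[keep], iy[keep], points[keep, 2]):
        grid[i, j] = max(grid[i, j], z)   # tallest return wins the cell
    return grid

pts = np.array([[1.0, 0.0, 1.5],   # a tall return at x=1, y=0
                [1.1, 0.1, 0.4],   # same cell, lower return
                [9.0, 4.0, 2.0]])  # a second occupied cell
g = birds_eye_height_map(pts)      # a 20x20 "image" of heights
```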
So we basically turn the problem into a standard semantic segmentation problem working with images. Now, this is the architecture of LinkNet: it's again an encoder/decoder architecture like the one we used earlier, and here you see how we can use the network API to build a subnetwork. Interestingly, one of the greatest things about these DAG networks is that, for instance, decoder block 2 can have access to very low-level information coming from encoder block 1, as well as the high-level information flowing through the architecture. LinkNet is based on ResNet-18. In this case I'm using multiple GPUs for training the algorithm, and this can take roughly 30 minutes on a standard workstation with a few GPUs. Then we go through the last part of the workflow, which, if you recall, was about deployment. We need to deploy this in some way, and the way we're doing it is through MATLAB's GPU Coder, which allows you to generate C++ and CUDA code that you can then embed in some type of embedded system — like an NVIDIA Drive PX 2 — that could be on board the vehicle. If anybody's interested in benchmarking against other frameworks, this is how GPU Coder stands: the y-axis represents the number of frames per second, and the x-axis represents the batch size, or the number of images that you put through the GPU at once. In green you see GPU Coder with cuDNN in the back end, and in orange and gray you see it using TensorRT — of course, with INT8 data types the inference is much faster. And then there's ResNet again on an NVIDIA Titan V, comparing MATLAB and TensorFlow for both single and INT8 precision. So this is the final result.
This is what it looks like: you can see the point cloud, and you can see the segmentation being done in real time, and you can see that as the cars get closer to the lidar device, where there are more points, they're much better classified. All right, so that's about it. If you want to learn more about perception algorithms or deep learning and automated driving with MATLAB, I recommend you take a look at these two websites. Also, find time to visit us at the booth — there are some really interesting demos you can take a look at. And with that, thank you very much, and enjoy the conference. There's time for questions if there are any — all right, so I'll stick around for a while; feel free to come by. Thanks.