My name is Alex Wu. I work for Intel. I mostly work on media processing and deep learning acceleration for Intel hardware. My colleagues and I implemented the OpenCL acceleration for the OpenCV DNN module. Today I'm going to talk about deep learning in OpenCV. First, we will cover some background on the OpenCV project and deep learning. Then we will go into the details of the OpenCV DNN module, as well as its OpenCL acceleration. I will also introduce my current work on the Vulkan backend. At the end, I will show a sample that uses the DNN module.

So what is OpenCV? OpenCV stands for the Open Source Computer Vision Library. It's used for image processing, analysis, and manipulation. The library has more than 2,500 optimized algorithms for computer vision and machine learning. You can use the library from C++ or Python on various operating systems. The library has more than 20,000 forks on GitHub. Many would agree that OpenCV is the most well-known computer vision library nowadays. Several weeks ago, OpenCV 4.0 was released, and the gold version is planned for release at the end of this month. OpenCV 4.0 has many new features: it switched to C++11 and is no longer binary compatible with previous versions, it has better CPU performance with AVX2 intrinsics, a more compact footprint, and an improved DNN module.

For those who are not familiar with deep learning, I would like to cover a few key concepts of deep neural networks before we dive into the main topic. The primitive unit of a neural network is the node, also called a neuron or perceptron; these three words mean the same thing. A node is just the place where computation happens. A node combines its inputs with a set of weights: the input-weight products are summed, and the sum is passed through a so-called activation function. The output of the activation function is the final result of the node's computation. Nodes are organized into layers. Each layer's output is simultaneously the subsequent layer's input. The first layer of a neural network is the input layer and the last layer is the output layer; the layers between them are the hidden layers. Deep neural networks are distinguished from the more commonplace single-hidden-layer neural networks by their depth, that is, there is more than one hidden layer in the network architecture.

Training. Training is the process that makes your network capable of an inference task. The typical training process has four steps. Step one, initialize the weights. Step two, set the input data, for example an image, and compute the network output. Step three, compare the output with the ground truth and calculate the error. Step four, modify the weights and go back to step two, until the error is small enough. This process is a little bit complicated, right? But don't worry, deep learning frameworks will do that for you.

Inference. You have a trained model, that is, a set of weights and other parameters. Set the input data and compute the network output using a deep learning library. Done. Inference is pretty simple compared to training.

Here are some applications of deep neural networks in the field of computer vision. You can do face recognition, pixel segmentation, object detection, and a lot of other interesting things.

Okay, let's start today's topic: the OpenCV DNN module. In recent years and in many areas, deep learning has shown results far exceeding those of classical algorithms. This also applies to the field of computer vision, where a mass of problems is now solved using neural networks.
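To make the node computation concrete, here is a tiny NumPy sketch of a single node: the input-weight products are summed and the sum is passed through an activation function. The numbers and the choice of ReLU are made up purely for illustration.

```python
import numpy as np

def relu(x):
    # A common activation function: max(0, x).
    return np.maximum(0, x)

# Made-up inputs and weights for one node.
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([0.8, 0.1, -0.4])

# The input-weight products are summed ...
s = np.dot(inputs, weights)
# ... and the sum is passed through the activation function
# to get the node's final output.
output = relu(s)
print(output)
```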
Since OpenCV 3.3, the DNN module has been included in the main OpenCV repository. It implements inference only and is compatible with many popular deep learning frameworks, such as TensorFlow, Caffe, Torch, and Darknet. That means you can use net models pre-trained in these frameworks directly, without any model conversion.

So why do we need to reinvent the wheel with DNN in OpenCV, since we already have so many deep learning libraries? I think there are several reasons. First, lightness. It's possible to achieve a lightweight solution by keeping only the ability to perform a forward pass over the network. This simplifies the code and speeds up installation and compilation. Second, convenience. The DNN module is self-contained, so external dependencies can be reduced to a minimum. This simplifies the distribution of applications, and if a project already uses the OpenCV library, it's not difficult to add neural network support to it. Third, universality. The DNN module provides a unified interface to manipulate net models from different deep learning frameworks. It supports multiple target devices, such as CPU, GPU, and VPU, and it runs on Linux, Windows, Android, and macOS. It makes it possible to write your application once and run it everywhere.

The DNN module supports all the basic layers, from the basic convolution and fully connected layers to more specialized ones. In addition to supporting individual layers, support for specific neural network architectures is also important. The AlexNet, GoogLeNet, ResNet, SqueezeNet, FCN, ENet, and SSD architectures are well tested in the DNN module, and more support will be added later.

Let's go through the technical details of the DNN module. This is the architecture. From top to bottom, the first layer is the language bindings; Python and Java are supported. Other components in this layer are the accuracy tests, performance tests, and samples. The next layer is the C++ API. This provides the high-level interface to the DNN module. You can use this interface to load a net model, run it, and retrieve the network outputs. The next layer is the implementation layer. It includes the model importers, which convert pre-trained net models from the different deep learning frameworks into the internal representation. This layer also includes the DNN engine and the layer implementations, which implement the general logic of the neural network. At the bottom is the acceleration layer. The DNN module implements CPU acceleration, OpenCL acceleration, and Halide acceleration by itself. For the CPU acceleration, SSE and AVX intrinsics and multi-threading are heavily used. For the OpenCL acceleration, some highly optimized kernels are implemented. And for the Halide acceleration, the Halide language is used to implement the layer computations. Besides its own implementations, the DNN module can also use the Intel Inference Engine library to do the acceleration. The Intel Inference Engine is part of the Intel OpenVINO toolkit, which is a set of tools and libraries for computer vision applications.

The DNN module defines backends and targets to manage these different acceleration methods. It gives users the flexibility to choose the proper acceleration method according to their software and hardware environment. The OpenCV backend is the default backend. It supports CPU acceleration and OpenCL acceleration. These accelerations are built-in implementations without external dependencies, so you can use them out of the box. The Halide backend also supports CPU acceleration and OpenCL acceleration; it depends on the Halide compiler and runtime.
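As an illustration of the high-level API described above, here is a minimal Python sketch that loads a pre-trained Caffe model and runs a single forward pass. The file names, input size, and mean values are placeholders for whatever model you actually use.

```python
import cv2

# Placeholder paths: any Caffe model definition and its pre-trained weights.
prototxt = "model.prototxt"
caffemodel = "model.caffemodel"

# The importer converts the pre-trained Caffe model into the
# DNN module's internal representation.
net = cv2.dnn.readNetFromCaffe(prototxt, caffemodel)

# Pre-process an image into a 4D blob (NCHW): resize and mean subtraction.
image = cv2.imread("input.jpg")
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(224, 224),
                             mean=(104, 117, 123))

net.setInput(blob)    # set the network input
out = net.forward()   # run inference and retrieve the output of the last layer
print("output shape:", out.shape)
```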
If you have the Intel OpenVINO SDK installed, you can choose the Intel Inference Engine backend. It supports CPU, OpenCL, and Myriad acceleration, using MKL-DNN, clDNN, and the Movidius VPU respectively underneath. Use setPreferableBackend and setPreferableTarget to choose which acceleration method you want. For example, if you want to use a Movidius VPU for acceleration, set the backend to the Inference Engine and the target to Myriad. The DNN module will fall back to the default backend if it fails to detect the backend and target you set.

Network optimizations. The DNN module has some general optimizations at the network level. Thanks to its internal representation of the network, these optimizations are not tied to any specific deep learning framework. That means they benefit all net models, no matter what their original framework is. I will introduce these optimizations.

Layer fusion. When the DNN module sets up its internal representation of the network, it analyses the network structure and, if possible, merges some layers into other layers. This reduces network complexity and computation workload. There are three fusion types currently in the DNN module. In the first case, a convolution layer fuses the subsequent batch normalization layer, scale layer, and ReLU layer. You can find this structure in the ResNet-50 architecture. In the second case, a convolution layer fuses the subsequent element-wise layer and ReLU layer, and takes another convolution layer as its input. And in the third case, the concat layer is eliminated. This is a typical case in the SSD architecture.

Another optimization is memory reuse. This is the normal case without any memory reuse: the red box represents allocated memory and the green box represents referenced memory. Here, each layer allocates its own output memory, and the subsequent layer references that output memory as its input memory. The DNN module analyses the memory lifecycle and reuses previously allocated memory if possible. The first case is reusing the input memory. For example, if layer 2 supports in-place computation, it has no need to allocate output memory; instead, it reuses its input memory like this. The second case is more general. Considering that the inference process runs sequentially from bottom to top, only one layer works at a time. That means a higher layer has a chance to reuse the memory allocated by a lower layer. In this case, layer 3 doesn't allocate its own output memory; instead, it reuses the output memory of layer 1. With this memory optimization, the memory footprint of a net model can be significantly reduced.

OpenCL acceleration. The OpenCL acceleration is a built-in implementation. It has no external dependency except for the OpenCL runtime. It supports the float32 and float16 data formats. If you want to enable OpenCL acceleration, just set the backend to OpenCV and the target to OpenCL, or to OpenCL FP16 if you want to use the float16 data format. The OpenCL acceleration has some highly optimized convolution kernels. We took an auto-tuning approach to find the best kernel configurations for a specific GPU. There is a set of pre-tuned kernel configurations built into the library, and the DNN module will use them by default. But if you want to get the best performance for your GPU, try running the auto-tuning instead of using the defaults. It's easy to enable auto-tuning: just set the environment variable OPENCV_OCL4DNN_CONFIG_PATH to the directory where your config files will be stored.
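To make the backend and target selection concrete, here is a small Python sketch. The backend and target constants are the ones exposed through cv2.dnn; the model paths and the cache directory are placeholders, and the auto-tuning environment variable should be set before the first forward pass.

```python
import os
import cv2

# Optional: enable OpenCL auto-tuning by pointing the DNN module at a
# directory where tuned kernel configurations can be cached.
os.environ["OPENCV_OCL4DNN_CONFIG_PATH"] = "/tmp/ocl4dnn-cache"

# Placeholder model files.
net = cv2.dnn.readNetFromCaffe("model.prototxt", "model.caffemodel")

# Default OpenCV backend with OpenCL FP16 acceleration on the GPU:
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL_FP16)

# Or: Inference Engine backend targeting a Movidius VPU (requires OpenVINO):
# net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
# net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)
```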
If you enable auto-tuning, the first run of a net model will take a little longer. The next time, the DNN module will use the cached configurations directly, with no need to tune again. For better performance on Intel GPUs, use the Neo driver. Neo is the open-source OpenCL driver for Intel GPUs; it supports Gen8 graphics and beyond. The best practice is to use a version as new as possible: in our experience, a newer version always has better performance.

These are the inference times in milliseconds for CPU acceleration and OpenCL acceleration. The test machine has an Intel Core i7 CPU with 8 cores and an Intel Iris Pro GPU with 72 execution units. In this case, the OpenCL performance far exceeds the CPU performance. You can find more performance data on this page.

Vulkan backend. Vulkan is the next-generation graphics and compute API from Khronos, the same cross-industry group that maintains OpenGL. A Vulkan backend can extend the use of GPU acceleration in the DNN module. For example, Android has no support for OpenCL, but it supports Vulkan, so you can leverage GPU acceleration through the Vulkan backend on Android. The Vulkan backend uses compute shaders to implement the layer computations. I have been working on the Vulkan backend for some time; this is the PR in review. If you have any interest, don't hesitate to leave your comments there.

Okay, the sample. This is a Python program that does real-time object detection with MobileNet-SSD (a minimal sketch of such a script is included at the end of this transcript). The first thing is to import the OpenCV Python module and then define the paths to the net model files, the image size the model accepts, the confidence threshold, the mean values, and the class name list. Of course, you can pass in this information using command line arguments. On line 16, we open a camera device, and actually you just need a few lines of code to use the DNN functionality. On line 19, we load the net model; on lines 23 and 24, we pre-process the captured image; on line 26, we set the image as the network input; and on line 27, we run the network forward, which returns the detection results. The rest of the code is for visualizing the detection results: filtering the candidates according to the confidence threshold, drawing the bounding box, class name, and confidence, and displaying the image. You can find more sample code on this page.

Okay, I will run the program and see what happens. Watch the screen. Sorry. I used the 20-class MobileNet model. You can use another model, for example a 90-class model, to detect more objects. At the top left are the class name and the confidence. Okay, that's all. Does anybody have any questions?

Okay, thanks. Yes, that's right: the native API is C++, but for the demo I used Python, since it's easier for prototyping. Frame rate? Okay. On my machine, the frame rate can reach more than 60 FPS, or 30 FPS; that depends on your machine and on which backend you choose. Okay, thank you very much.
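For reference, here is a minimal sketch of the kind of MobileNet-SSD webcam detection script described in the demo. The model file names, threshold, mean value, and class list are assumptions (the usual Caffe MobileNet-SSD trained on the 20 VOC classes), and the line numbering will not match the sample referenced in the talk.

```python
import cv2

# Assumed paths to a Caffe MobileNet-SSD model; adjust to your own files.
PROTOTXT = "MobileNetSSD_deploy.prototxt"
MODEL = "MobileNetSSD_deploy.caffemodel"
INPUT_SIZE = (300, 300)      # input resolution the model accepts
CONF_THRESHOLD = 0.5         # confidence threshold
MEAN = 127.5                 # mean value used for normalization
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
           "bus", "car", "cat", "chair", "cow", "diningtable", "dog",
           "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]   # 20 VOC classes + background

cap = cv2.VideoCapture(0)                          # open a camera device
net = cv2.dnn.readNetFromCaffe(PROTOTXT, MODEL)    # load the net model

while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]

    # Pre-process the captured image: resize, scale to roughly [-1, 1],
    # subtract the mean, and pack into a 4D blob.
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, INPUT_SIZE),
                                 scalefactor=1.0 / MEAN, size=INPUT_SIZE,
                                 mean=(MEAN, MEAN, MEAN))
    net.setInput(blob)              # set the image as the network input
    detections = net.forward()      # forward pass, shape: [1, 1, N, 7]

    # Visualize: filter by confidence, draw box, class name and confidence.
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence < CONF_THRESHOLD:
            continue
        class_id = int(detections[0, 0, i, 1])
        box = detections[0, 0, i, 3:7] * [w, h, w, h]
        x1, y1, x2, y2 = [int(v) for v in box]
        label = "%s: %.2f" % (CLASSES[class_id], confidence)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, label, (x1, max(y1 - 5, 15)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    cv2.imshow("MobileNet-SSD", frame)
    if cv2.waitKey(1) == 27:        # press Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```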