Good evening, my name is Gopal, and I will be talking about compressing deep learning models so that you can build mobile apps or IoT applications on very low-memory systems. Can I assume that at least some of you, or most of you, have a deep learning background? Then let me give a short introduction to deep learning. Deep learning is something of a trending buzzword these days, but for this talk it is nothing more than a series of computations, much like image-processing filters, and at the end of the network you get either a classification or a detection of faces or of general objects. That is all the deep learning you need for this talk: think of it as a series of computations for image-processing or computer-vision applications such as detecting a person, detecting a car, self-driving, and so on.

Typically the process is this: say you want to detect a car; you collect millions of images of cars, you train a deep learning network on them, and at the end of the training you have a trained model whose filters can detect the car. That model is just a series of parameters, a bunch of numbers, and it ranges from a few hundred megabytes to about a gigabyte, depending on the complexity of the network. If you have to build such an application on a low-power, low-end device, you cannot afford to store a few hundred megabytes of model on your SD card; you would simply be wasting the space. So this talk is about how to shrink a deep learning model to fit on an SD card or into a mobile application, so that you do not ship an app with a 250-megabyte download that nobody is going to install. Hopefully this will help you build such computer-vision applications on low-end devices.

Here is the outline. This talk is fairly superficial on what is actually an active research area: I will cover IoT devices and their challenges, popular networks, how to reduce a deep learning model to a smaller size, and how to use low precision; then I have a case study on detecting objects such as faces or the human body, and I will show you a demo on this PC. As I said in the introduction, deep networks require a lot of memory: high SD-card storage, and the same amount of RAM once you load the model. They are also very computationally intensive, requiring a lot of CPU cycles, so you normally need GPUs to compute the output of these networks; and with large models of a few hundred megabytes, you need that much NOR or NAND flash on the board. Fortunately, you can apply some tricks to a deep network to reduce its size while keeping it almost as accurate as the original floating-point computation. We all know this is a challenge on IoT or mobile SoC platforms, so I will go through how to reduce the model size.
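Just to make those model sizes concrete, here is a minimal back-of-the-envelope sketch; the 60-million parameter count is a hypothetical figure picked for illustration, not a network from the talk:

```python
# Back-of-the-envelope model size: parameter count x bytes per parameter.
# 60M parameters is a hypothetical example, not a specific network.
num_params = 60_000_000

size_fp32_mb = num_params * 4 / (1024 ** 2)  # 32-bit floats: 4 bytes each
size_fp16_mb = num_params * 2 / (1024 ** 2)  # 16-bit floats: 2 bytes each

print(f"fp32 model: {size_fp32_mb:.1f} MB")  # ~228.9 MB
print(f"fp16 model: {size_fp16_mb:.1f} MB")  # ~114.4 MB
```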
Just to give a brief overview of the state of research in this field: reducing the model size falls into three categories. The first is at the algorithm level, where you reduce the network size itself; if your deep learning network has hundreds of layers, you reduce the number of stacked layers, then train the network on your data, and your deep learning model is ready. The second is network compression: you take a model that is already trained and remove certain connections inside the network, so that you can drop all of those computations and all of those parameters from the model. Since this happens after training, it is called compression. The third is quantization: the model's parameters are all 32-bit numbers, and you quantize them to a lower number of bits. These are low-precision networks, and you can go all the way down to one bit, which is called a binary network, where every parameter is either 0 or 1. In this talk I will focus on quantization, and actually on a subset of that topic: how to convert the model representation from 32 bits to 16 bits.

I have divided this into three levels. The starting point is your PC, where you have GPUs and millions of training images stored, and you train the deep learning model. After training you convert the model parameters to a low-precision representation, 16-bit or 8-bit, and build the application out of it; that goes onto your mobile storage, the SD card.

At level 1, when you run the application you load the model into system memory, but before loading you convert it back to floating point. The model size is low on the SD card because it is stored in low precision, but once loaded it occupies the full 32-bit footprint in RAM, and you do normal floating-point computations, so there is no reduction in accuracy. The advantage is that you save SD card space: if the 32-bit model is 100 megabytes, at 16 bits it is 50 megabytes, so you save that much storage on the end device.

At level 2 you do the same thing, but instead of converting to floating point before loading it into memory, you load the low-precision model as-is, and while you are processing an image to detect objects you convert it to float on the fly. That adds to your application's run time, because you have to convert again for every image you process, but you save both the SD card space and the RAM space.

At level 3 you have a low-precision model on the SD card, a low-precision model in RAM, and the computations are also low precision. You do not need floating-point hardware, a floating-point CPU, to run the application; you can use 16-bit or 8-bit or even lower-precision arithmetic operations on a fixed-point CPU. Most DSPs are fixed point, so you cannot do floating-point computation on them anyway. And when you use low-precision arithmetic, your application normally also speeds up, which is another advantage. All of this can happen without losing the final detection accuracy of the algorithm.
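As a minimal sketch of the level-1 scheme, assuming the trained parameters are available as a flat NumPy array; the file name and helper functions here are my own illustration, not the talk's project code:

```python
import numpy as np

def quantize_to_fp16(weights_fp32: np.ndarray, path: str) -> None:
    """Level 1, offline step: store the trained 32-bit parameters as
    16-bit floats, halving the on-disk (SD card) model size."""
    weights_fp32.astype(np.float16).tofile(path)

def load_as_fp32(path: str) -> np.ndarray:
    """Level 1, runtime step: read the 16-bit file and convert back to
    32-bit before inference, so RAM usage and accuracy match fp32."""
    return np.fromfile(path, dtype=np.float16).astype(np.float32)

# Hypothetical example: 1M random values standing in for a real model.
w = np.random.randn(1_000_000).astype(np.float32)
quantize_to_fp16(w, "model_fp16.bin")
w_restored = load_as_fp32("model_fp16.bin")
print("max round-trip error:", np.abs(w - w_restored).max())  # ~1e-3 for unit-scale weights
```

Storing at float16 halves the file size exactly, which matches the 100-megabyte-to-50-megabyte example above; the small round-trip error is why the accuracy is essentially unchanged.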
So, I will show you a demo, but I started this project only two weeks back, so the development so far has reached only level 1; my plan is to take it to the further levels so that we save RAM and CPU time as well. The case study is an object-detection deep learning network called YOLO; a very lightweight version of the same thing is called tiny YOLO. After training, the model size, all the parameters, is 64 megabytes, and inference takes about 7 billion floating-point operations. You can ignore the accuracy metric if you are not very familiar with object detection, but the key point is this: comparing the algorithm's accuracy with the original floating-point model against the 16-bit model, I do not think you can notice a difference, so the 16-bit model is almost as accurate as the floating-point one.

Now the PC demo; hopefully I can still show it. I do not know if you can see it, but this is the original 32-bit model, 63.5 megabytes, and this is the 16-bit model, which is 31.7 megabytes. Let me just check how much time I have left. OK, four minutes. Let me switch to the original floating-point model and compile it. This model is trained to detect 20 object classes, like person, chair, bird, horse, and so on. So this is the original model running here; now I will compile with the 16-bit model. Sorry, I do not have a very good UI; this was put together just for a quick demo. This one is using the 16-bit model, and hopefully it will detect all of you. So if the application is not very critical, you can use these kinds of tricks to reduce the model size: if the application fails to detect you sometimes, nothing bad happens, nothing is going to crash. As long as the application is a normal entertainment use case or similar, this works. I do not think the model is trained for those kinds of objects, but bottles, dogs, horses and so on it can detect.

That is it; any questions? This is running on my laptop GPU, a GeForce 920M. Yes, this is for inference, the final testing, only. Yes, this one is using the graphics card and running at around 8 frames per second; I told you it takes 7 billion operations, so if you run it on the CPU, it may run at 1 frame per second. [An audience member comments on NVIDIA driver issues with projection.] Oh, OK, thank you, maybe VGA will be better; I have not used this laptop for projecting before, it is my personal laptop. Any other questions? I have this as an ongoing project, so if any of you are interested, please go to the GitHub page and contribute, because I plan to extend this to level 2 and level 3.
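For context on where the project is headed, here is a minimal sketch of the planned level-2 behavior, assuming per-layer weights held in RAM as float16; the toy layers and shapes are my own illustration, not the project's actual YOLO code:

```python
import numpy as np

# Hypothetical per-layer weights kept in RAM at half precision (level 2).
layers_fp16 = [np.random.randn(128, 128).astype(np.float16) for _ in range(4)]

def forward(x: np.ndarray) -> np.ndarray:
    """Run inference, widening each layer to fp32 on the fly.
    RAM holds only the fp16 copies; the fp32 view is temporary,
    which is why the conversion cost recurs for every frame."""
    for w16 in layers_fp16:
        w32 = w16.astype(np.float32)   # on-the-fly dequantization
        x = np.maximum(w32 @ x, 0.0)   # toy layer: matmul + ReLU
    return x

out = forward(np.random.randn(128).astype(np.float32))
print(out.shape)  # (128,)
```

The per-frame astype call is exactly the recurring conversion cost mentioned for level 2, traded against halving both the storage and the RAM footprint.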