talking about edge computing on the Neural Compute Stick using OpenVINO. So before I begin, this talk is actually about an open source toolkit called OpenVINO. It's by Intel, and it helps you use pre-trained models on the edge. I'm a master's student at NUS and I'm also a course instructor at Udacity. I'll go ahead and begin my talk.

So what exactly is edge computing? With the number of IoT devices increasing, a lot of devices are connected to the web. But you don't necessarily want all your models to be trained, or to do inference, up in the cloud. You want it done right where the device is, thus bringing the compute closer to the edge. So where exactly is edge computing applied? Firstly, you have things like security cameras outside your house. Say you're trying to find out the number of people going in and out of your house. How do you do that? You can do it with edge computing, where you run a model on a microcontroller or a camera outside your house to detect the number of people going in and out. It's also used in self-driving cars. What I mean by that is, for example, if you have a car going through a tunnel and it suddenly has to apply the brakes, you wouldn't want that command to go to the cloud and then come back, because that would obviously take time, and time is very valuable in those seconds.

So what are the advantages of edge computing? First, latency. As I explained in the case of the self-driving car, time matters: a second can mean the difference between applying the brakes or not. Then there's internet bandwidth: you don't necessarily need to depend on the internet to communicate data. You can do it right where your device is, thereby not leaking any data outside, which gives you privacy and security right at the edge. And with all of this, you have no cloud dependence, which is the best part of it. And it's computationally less intensive, because you're doing inference right at the edge on devices with very little memory, so it's less expensive and less intensive than a GPU or a TPU.

One of these devices is the Raspberry Pi. It's a tiny computer without any peripherals. I have it with me here. So yeah, this is a Raspberry Pi, and the people who have used it know it has an ARM-based CPU. It has an on-board GPU too, and it has Wi-Fi and Bluetooth capabilities. It runs Linux-based operating systems like Raspbian. And the good thing about it is that it has a really good open source community, so if you run into any problems you can reach out to the community and tell them; they're quite active and responsive, so that helps. And then obviously it is small and low-powered; it takes about 1 to 1.5 watts. It can also be used to run small or medium-sized machine learning models, which is why it's very useful for inference in edge computing. It also has around 1 GB of memory.

So today I'll also be talking about another device, which is the Neural Compute Stick. The Neural Compute Stick is actually a VPU, a Vision Processing Unit. It has a chip called the Myriad X, and the supported frameworks include TensorFlow and Caffe. It's basically a USB 3.0 stick; you can see I have plugged it into the Pi. And it's quite small.
You can fit it into any camera that you have. And what is at the core of this is the Vision Processing Unit. So a Vision Processing Unit basically looks like this; the architecture has hardware accelerators and it also has vector processors. Vector processors basically work on vector data: they don't work on a single piece of data, they work on an array of data at once. Then it has its own RISC processor, and an interface unit, which basically talks to the host device. The Myriad X is the chip inside this VPU. Movidius actually came out with two Myriad chips: the first NCS had the Myriad 2, and this one has the Myriad X. It runs at really low power, provides around four tera operations per second, and is really good for vision processing.

So what we are going to be using today is the Intel OpenVINO toolkit. What the OpenVINO toolkit does is take a trained model, convert it into an IR format, and then run it through an inference engine, which can target different devices. But how exactly do you end up using OpenVINO? You can check out their documentation; it's there for different operating systems. You can do it for Raspbian, which is what the Raspberry Pi uses, and you can see they also have macOS, Windows, and different Linux distributions, I believe. So you can basically do it for different OSes, and they tell you how to download it. It's completely open source, so you can go ahead and download it. And for example, if you want to run the NCS, which is the Neural Compute Stick, you just have to add the particular udev rules and then run the script, and I'll show you how to run it.

Apart from that, it also gives you a model optimizer guide. What I mean by that is, in order to convert your model into a format the inference engine recognizes, you have to convert it into an IR format, and that is done by the model optimizer. And then you have the inference engine, which basically helps you run the model on various devices, including GPU, VPU, FPGA, etc. And it has a list of pre-trained models you can check out; it's available on GitHub under the model zoo. It also tells you about model precision. This means that not all devices support every model precision: for example, a CPU might be good at running 32-bit floating point (FP32), but at the same time the NCS might be better at running FP16. So it actually helps you optimize your model, or maybe quantize it, to run it on the edge, because larger models cannot be run on the edge; you have to quantize them in order to run them. So those things are also provided.

Also, I have a course on Udacity which is coming out next month and is actually dedicated to this. So if you have the time, please do check out the course.

And so yeah, I'll show you a small demo of how to run a model on the Neural Compute Stick. The Neural Compute Stick is attached to the Raspberry Pi, and I'll be using the OpenVINO toolkit for Raspbian. So one sec. Basically, out here I am specifying the model with -m; it's face detection. And I'm specifying MYRIAD as the device, because as I told you, the VPU has a Myriad processor. And I'm specifying the input image with -i.
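Just to make that concrete before I show the output: here is a minimal sketch of what a face detection run like this boils down to in Python. This is not the exact sample shipped with the toolkit; the model and image file names are assumptions on my part, and it uses the classic openvino.inference_engine Python API, but the -m, -d MYRIAD and -i flags of the real sample map onto the same steps.

```python
# Minimal sketch of the face detection demo described above (assumed model and
# file names; classic openvino.inference_engine Python API). The .xml/.bin pair
# is the IR produced by the model optimizer, in FP16 because that is the
# precision the Neural Compute Stick works with.
import cv2
from openvino.inference_engine import IECore

MODEL_XML = "face-detection-adas-0001.xml"   # IR topology (assumed model name)
MODEL_BIN = "face-detection-adas-0001.bin"   # IR weights
IMAGE_PATH = "input.jpg"                     # assumed input image

ie = IECore()
net = ie.read_network(model=MODEL_XML, weights=MODEL_BIN)
input_name = next(iter(net.input_info))
output_name = next(iter(net.outputs))

# "MYRIAD" targets the Neural Compute Stick; "CPU" would run the same model locally.
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Resize the image to the network's expected (N, C, H, W) input shape.
n, c, h, w = net.input_info[input_name].input_data.shape
frame = cv2.imread(IMAGE_PATH)
blob = cv2.resize(frame, (w, h)).transpose((2, 0, 1)).reshape((n, c, h, w))

# Run inference; SSD-style detectors return rows of
# [image_id, label, confidence, xmin, ymin, xmax, ymax] in relative coordinates.
detections = exec_net.infer(inputs={input_name: blob})[output_name]
ih, iw = frame.shape[:2]
for det in detections[0][0]:
    if det[2] > 0.5:  # keep detections above a 50% confidence threshold
        x1, y1 = int(det[3] * iw), int(det[4] * ih)
        x2, y2 = int(det[5] * iw), int(det[6] * ih)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.imwrite("out.jpg", frame)  # the sample similarly writes out an annotated image
```

On the Pi I'm running the toolkit's own sample over SSH, which does the equivalent of this; swapping device_name would let the same script run on any other OpenVINO target.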
So the image I actually have is, one sec, let me just open it. It's just a simple image of Adam Sandler, and I've taken it to detect the face. So I'm just running this, and it's basically running on the NCS right now; I'm just SSHing into it. And you can see the image created: out here it says that an image has been created. So this is basically a sample application, and it even tells you the execution time each stage took. And out here, you can just check it out, one sec. Yeah, so it just detects the face. So that is provided, and it's a very simple application; you can just download the toolkit and directly run it. You can even run it on your own system.

Yeah, so this brings me towards the last thing I want to present today, and that is the Intel DevCloud. The Intel DevCloud is beneficial because up to now I've explained the Neural Compute Stick, and you might be wondering: does this mean I have to go ahead and buy it, and what if I buy it and it doesn't actually fit my needs, then what exactly do I do? So yeah, choosing the right hardware is important. It means being able to tell whether you need an NCS or whether you might need some other microcontroller for your edge application. This is what the Intel DevCloud for the edge helps you work out, and it also helps you simulate your hardware. What I mean by that is you don't necessarily need to go ahead and buy it: you can run a simulation on the DevCloud, find out how your application might actually run, and then go ahead and buy the hardware. I mean, the NCS doesn't cost a lot, it's around $99 I believe, but on the other hand, if you're going for other hardware like an FPGA, that might cost you a lot, especially FPGA developer kits, which cost around 5k. So it's not feasible to just buy it and then test it; it's a lot better if you can test it beforehand and then run it on the hardware later on. This is what the Intel DevCloud provides.

As a user, you work on something called a development node. What the development node provides is that you can use Jupyter notebooks on the DevCloud, and then you have something called job submission. They have edge nodes through which you can submit jobs to various devices, and these devices can be Intel CPUs (they have Xeon, they have Atom), FPGAs, and GPUs; Intel provides an integrated GPU, so you have that as well, and VPUs like the NCS. And then you also get the inference on your particular image or video, anything that you have, basically an image or a video file, so you can see the inference results. So I'll go ahead and demo it.

So yeah, this is what the Intel DevCloud for the edge basically looks like. At present you have to go and request access. They basically give access to enterprises first, because most of their business partners come from there, but if you have a valid reason for using it, you can go ahead and request it and get access to the DevCloud. And this is what it looks like. You can go into the DevCloud overview, where they tell you what sort of devices are available; there's a list of devices available, and then you can get into creating sample applications. There are a bunch of applications that are already there, so that is another advantage, because a lot of people do not want to train a complete model just to try it out.
So if you are someone who believes in plug-and-play, this is really good, and so yeah, I'll be showcasing the people counter system today. This is basically the Jupyter notebook for it; you can just run it directly and see everything it contains. You have to import a lot of dependencies, which you do out here, and then they have something called demo tools that gets imported, and then this is the video that they have; this is also something they provide from before. It's a video where a person comes up to a table and checks something, and that is what it will detect.

Then you create an IR model, as I said before, using the model optimizer. It tells you all the models that are present, and you can actually go to GitHub and see the code for the models. Most of them are written in C++, but if you're familiar with that language you can go ahead and use them for your own purposes; if you want to change one for your own needs, you can check it out. Then you need to download your particular model, which is the person detection model here, and then you have to write a small job file in order to submit the job, and this is what you need to do to submit it.

It works through something called the Portable Batch System (PBS). The Portable Batch System means that they run all of your jobs as batch jobs: a job goes into a queue, runs, and then it tells you whether it's still running, has stopped running, or is still waiting in the queue. All of that goes to PBS nodes, which is why you might be seeing that here. And then you can check out how it's done. So for example, you submit to an edge node with an Intel CPU, and you can actually see the inference time it took, the total estimated time for inference. Similarly, you can try it on different Intel CPUs, say a Xeon CPU.

Or take the Neural Compute Stick: if you're someone who doesn't want to buy it straight away, you can just go here, submit the job, and see how it works. It's also there for the Intel Arria FPGA kit, which is pretty good, because as developers I bet a lot of us wouldn't be able to afford one, so it's great if you just want to try it out. So for example the first one, yeah, I think it's almost submitted; you can actually see the inference time and the time remaining. But one thing about the Neural Compute Stick is that the frame rate is actually pretty low, so it takes longer to run inference compared to the others; you would see that the FPGA and CPU are quite close, but the NCS takes a little longer. And you can view results for the things you ran. For example, in the first one we also ran the NCS, so we can check that out, and you can see how it would run on the CPU. The NCS one is still running, so that is why it hasn't been able to detect anything yet; you have to wait for the NCS run to finish and then view the results. And then you can assess performance as well: it tells you the difference in inference time and frame rate for the different devices.
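Just to sketch what that job file and submission boil down to (this is not the DevCloud notebook's own code; the script name, the node property string, and the qstat parsing are assumptions on my part, and the only things I'm leaning on are the standard PBS qsub and qstat commands the queue is built on):

```python
# Rough sketch of the submit-and-poll flow the DevCloud notebook wraps for you.
# JOB_SCRIPT and NODE_PROPERTY are hypothetical; qsub and qstat are the standard
# PBS commands, and the Q/R states match what the live queue status shows.
import subprocess
import time

JOB_SCRIPT = "person_counter_job.sh"   # hypothetical shell script that runs the inference
NODE_PROPERTY = "ncs2-node"            # hypothetical property selecting a VPU edge node

# Submit the job to an edge node with the requested hardware; qsub prints the job ID.
job_id = subprocess.check_output(
    ["qsub", "-l", f"nodes=1:{NODE_PROPERTY}", JOB_SCRIPT],
    text=True,
).strip()
print("Submitted job:", job_id)

# Poll qstat until the job leaves the listing: Q means waiting for resources,
# R means running, and a job that is no longer listed has completed.
while True:
    listing = subprocess.run(["qstat", job_id], capture_output=True, text=True).stdout
    if job_id.split(".")[0] not in listing:
        print("Job finished; results are wherever the job script wrote them.")
        break
    state = listing.strip().splitlines()[-1].split()[-2]  # the S (status) column of the row
    print("Current state:", state)
    time.sleep(5)
```

The notebook's own helpers do essentially this for you and then summarise the inference time and frame rate per device.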
So while it's running, does anyone have any questions? Was it clear, or should I maybe explain something again? Yeah, you get a real-time perspective, although it's a cluster, so basically it's like a cloud server: it tells you whether it ran, and it gives you the status right after it ran. For the benefit of our online audience, can you please repeat what the question was?

Can you please repeat what the question was? Oh okay, sorry, I thought you said something else. He asked me whether the job status is given right then and there, or whether it goes to the cloud and comes back. Was that your question? Okay, yeah, so basically he wants to know whether you get any job status, and that's true. I'm sorry, I forgot to run this, but you can actually see the live queue status here. If it's in a Q state, it tells you the job is waiting for available resources; if it's in an R state, it means it's running; and if the job is no longer listed, it means it's completed. So you can see the queue status right then and there: it tells you what exactly is running and gives you the status here right under S. It's running right now. Oh, let me see if the NCS job has stopped. Yeah, I think the NCS is done. So you can see the NCS took about 194 seconds, which is quite a bit longer compared to the others, but it's a cheaper alternative.

Have you built anything into a physical device as a result of what you learned from this, and if so, what? Yeah, I have; basically I've built things for the course, but we have also built projects based on this. An example? Okay, maybe I can describe one: for example the people counter, or say you have a security camera on a factory floor. You've put a security camera on a factory floor? Yes, a camera on the factory floor. We did it at my previous company: we actually used a camera on a factory floor and tried to count the number of things in a box. While doing that, you could use a Neural Compute Stick, because a lot of the time a client puts in a constraint that you can only do it within a certain amount of time, and when that happens you need devices that do quicker inference. Ideally, if you did it on the Pi alone it would take longer, but since you have an accelerator, which is the Neural Compute Stick, it takes most of the computation away from the Pi and is able to do it quicker. And if you take a factory floor, usually even the floor space is very expensive, so they can't afford to keep a CPU or a GPU there. In such cases it's better to deploy a microcontroller, and if you're deploying a microcontroller you do need accelerators for deep learning models so that they can actually run faster at the edge.

So have you used the information that you gathered from this evaluation tool to make decisions about which equipment to deploy on-site, or simply to validate your design? No, I didn't use this for deployment; I already had an idea before, so we used that. I only came to use it recently, and I've noticed that it's pretty good in terms of simulation.
So we still have a few more minutes, and I don't wish to monopolize. Any other questions? Yes. So basically you just need to sign up. Sorry, let me repeat the question: he wants to know if the Intel DevCloud for the edge is free to use. It's actually free to use; the only thing is that you have to go sign up and state the reason why you're using it, because they release it to enterprises first, that is, their business partners, and that's their first priority, because obviously compute is quite expensive and limited. So they give it to the enterprises first and then to developers, which is how I got it. So if you are a developer who has a valid reason to use it, they will give you access, but I would suggest that you sign up with your enterprise ID or something similar; that really helps.

So you can actually see it running even on the FPGA now, and you can see the inference times: the NCS took the longest, and the FPGA is almost the same as the CPUs. So if you were to choose hardware that fits the same time budget, it would come down to the CPU versus the FPGA, if cost is not a barrier for your deployment.

So yeah, that brings me to the very end. If you need to contact me, feel free to do so. This is a short feedback form, and at the end of the feedback form you can find a link to my slides and a link to the docs. So feel free to contact me, and if you have any more questions, I would love to take those as well. That's great, thank you very much.