Hello and welcome to your introduction to using Intel OpenVINO on the Red Hat OpenShift Data Science platform. My name is Sean Pryor. I'm a senior software engineer at Red Hat who works closely with Intel as a partner, and I'll be presenting on both Red Hat technologies and OpenVINO, which is an Intel technology.

To start off, let me talk a little bit about managed OpenShift and Red Hat OpenShift Data Science, which is an offering on top of the Red Hat managed OpenShift platform. At the bottom we have the open hybrid cloud platform with self-service capabilities: Red Hat OpenShift Dedicated, Red Hat OpenShift on Amazon Web Services, and managed Red Hat OpenShift. Underpinning that are accelerators from partners like NVIDIA and Intel, all currently powered on AWS, with other clouds planned for the future.

Red Hat OpenShift Data Science sits on top of that platform and provides the core data science tooling you need: Jupyter as the user interface, TensorFlow and PyTorch libraries, and source-to-image builds for publishing. It also ties in with optional cloud services like OpenShift Streams for Apache Kafka, OpenShift API Management, and all the other Red Hat things everyone knows and loves. Red Hat OpenShift Data Science, or RHODS, can also be integrated with several independent software vendors, notably Intel in this case, but also Starburst Galaxy, Anaconda, NVIDIA, and others who provide all kinds of useful features on top of it.

So let's take a look at the partner ecosystem, because no platform is complete without partners who can augment it and provide additional value. As you can see here, we have a ton of logos. The one we care about right now is Intel OpenVINO, which is a model optimization and serving framework. So let's go take a look at that.

One of the main benefits of running Red Hat OpenShift Data Science with Intel is that the default hardware in AWS is Intel hardware, so Intel frameworks give you out-of-the-box acceleration both in model development and training and when deploying models to production. As global data footprints continue to grow, it's critically important that we take advantage of these kinds of speedups and optimizations, and one way to do so is through OpenVINO.

OpenVINO can perform several optimizations. It can perform quantization on models, which reduces numerical precision, turning something like 3.14159... down to just 3.14. That can drastically speed up floating point computation without a large hit to accuracy; in many cases the accuracy changes by less than 1% in exchange for a drastic speed improvement. It also has accuracy-aware quantization, which lets users set an accuracy threshold below which OpenVINO will stop reducing precision. Other techniques include layer pruning and sparsification, which remove unnecessary complexity from the model by dropping nodes whose weights are zero and don't contribute to the actual results. And finally, it can fuse operations together, which can drastically reduce a model's footprint, and a fused layer can also take advantage of certain Intel-optimized instructions. All of this matters when we talk about edge deployments, where network bandwidth and compute resources can be severely constrained.
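To make the quantization idea concrete, here's a small hand-rolled NumPy illustration of mapping 32-bit floats onto 8-bit integers. To be clear, this is just a sketch of the underlying arithmetic, not OpenVINO's actual tooling, which does this per layer with calibration data:

```python
import numpy as np

# Hand-rolled affine INT8 quantization, purely to illustrate the idea.
weights = np.random.randn(4, 4).astype(np.float32)

scale = (weights.max() - weights.min()) / 255.0    # spread the FP32 range over 256 levels
zero_point = np.round(-weights.min() / scale)      # offset so the range starts at zero

quantized = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)
dequantized = (quantized.astype(np.float32) - zero_point) * scale

# The round trip loses only a sliver of precision relative to the value range.
print("max absolute error:", np.abs(weights - dequantized).max())
```

Each weight now takes one byte instead of four, which is where the memory and bandwidth savings come from.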
OpenVINO also handles model deployment, so let's start by comparing against a standard model deployment that doesn't use OpenVINO. Standard deployments perform well on hardware that roughly approximates what they were trained on, which makes sense: if you drastically change the hardware, you'll get different performance results. Unfortunately, with a large, complex model, you can end up tying up GPUs or other top-tier chipsets for inference as well as for training. Alternatively, when models are deployed to low-end chipsets, they can take a massive performance hit, because a big bottleneck on those lower-end systems is memory: if the entire model can't fit into the processor cache, the system spends a long time fetching weights from memory, and that can take far too long to produce useful results. A common response is to relax the accuracy requirements, which in many cases defeats the business goals the model was trained to solve in the first place.

So now let's see how OpenVINO solves that. It gives you much higher performance from essentially the same model: you can apply quantization, prune layers that aren't necessary, fuse operations, and optimize for an intended inference device. If you need to optimize for size, you can do that. Operation fusing can reduce the memory footprint so the model fits inside the processor cache of an edge device, and quantization makes the actual computations much faster: 8-bit values are a lot less to compute than 16-bit ones, especially in floating point, and you can even drop down to integer math, which runs orders of magnitude faster. All of that is to say you don't need to sacrifice accuracy, performance, or business goals when OpenVINO can give you the performance you're after.

And there's no end to the different industries that can benefit from OpenVINO: telco, healthcare, government, and others can all benefit from accelerated model inference. Increasing the computational efficiency of these models means higher throughput, lower latency, and more predictions in the same amount of time. It also means you can deploy models to the edge with confidence that they will return results in a timely manner. It increases the density of your compute footprint by letting you fit more models onto a single node. It lets you iterate on the overall data science process more quickly, because you don't wait as long for inference results, and it lets you solve more problems more efficiently, reducing your energy costs and improving how you divide up scarce resources.

Now, finally, let's look at a quick demo. Heading over to my cluster, I have, as you can see here, an OpenShift Dedicated cluster with Red Hat OpenShift Data Science installed. I've installed the OpenVINO Toolkit operator, and I've already created a Notebook custom resource, which lets me go into JupyterHub and use Intel's custom image with all of the OpenVINO libraries pre-loaded, as we'll see here.
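As a rough sketch of what targeting an inference device looks like in code, here's the classic OpenVINO Inference Engine Python API; the model file names below are placeholders:

```python
from openvino.inference_engine import IECore

ie = IECore()
print(ie.available_devices)  # e.g. ['CPU'], plus 'GPU', 'MYRIAD', etc. if present

# "model.xml" / "model.bin" are placeholder names for an OpenVINO IR model.
net = ie.read_network(model="model.xml", weights="model.bin")

# Compile the network for whichever device you intend to serve on;
# the runtime applies device-specific optimizations at load time.
exec_net = ie.load_network(network=net, device_name="CPU")
```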
If we go to the Red Hat OpenShift Data Science dashboard, we can see the JupyterHub launcher with OpenVINO enabled. The OpenVINO image comes with a lot of pre-loaded demo notebooks, and so as not to spoil all the fun things they have planned for you, we'll open the Hello World notebook, Hello Image Classification. It's a very simple one that uses an existing OpenVINO-optimized model to identify this picture of a puppy.

Going into the notebook, we have our standard imports, including the OpenVINO Inference Engine; once we import that, we can load our model. You'll note that the model consists of two files: an XML file that describes the model, and a separate binary file with the weights. We read this network, load it onto the CPU, and save the input and output keys. Then we load our image, the puppy from earlier, call the inference method with the input image, and finally take a look at the result: it's a flat-coated retriever. That shows just how easy it is to load up a network and call infer on it, all from inside a notebook.

You can also deploy through an OpenVINO Model Server. As shown here, there's a nice form view where you can fill in all kinds of model settings and deployment parameters, such as how many replicas to serve, and the OpenVINO Model Server is extremely high performance, so it should serve all of your needs here. You can set the gRPC and REST ports, and you can point it at wherever your model repository lives: Google Cloud Storage, Azure, S3, or local to the cluster.

And that concludes our demo. With that, thank you for coming to Introduction to Using Intel OpenVINO on the Red Hat OpenShift Data Science Platform, and good luck on all your data-sciencing in the future.
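One last pointer for anyone following along at home: once a model sits behind the OpenVINO Model Server, client code is just an HTTP or gRPC call against its TensorFlow-Serving-compatible API. Here's a minimal REST sketch, where the host, port, model name, and input shape are all placeholder assumptions:

```python
import json

import numpy as np
import requests

# Placeholder endpoint; adjust host, port, and model name for your deployment.
url = "http://localhost:9001/v1/models/my_model:predict"

# Dummy input shaped like one 224x224 RGB image in NCHW layout.
image = np.random.rand(1, 3, 224, 224).astype(np.float32)

response = requests.post(url, data=json.dumps({"inputs": image.tolist()}))
predictions = response.json()["outputs"]
print(np.argmax(predictions))  # index of the top-scoring class
```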