So let me just make an announcement and then we should be good. So hello and welcome back, everyone in this track. The next talk we have is optimizing image recognition with Intel OpenVINO on Open Data Hub. We have Sean Pryor and Ryan Looney for this talk, and if you have any questions, feel free to put them in the Q&A section or just put them in the chat directly. And with that said, the stage is all yours. Go ahead, folks.

Alrighty, hello everyone. I'm Sean Pryor, as mentioned, and this is Ryan Looney. We are here to present on Intel OpenVINO on Open Data Hub. So to get started, I'll be talking a little bit about Open Data Hub, a data and AI platform for the hybrid cloud. For anyone not already intimately familiar, Open Data Hub is a big meta-project that aims to bring together a lot of the tools that one would need to do all this kind of data science. So we have Seldon for model serving, we have Kubeflow as sort of the main backend component, and we have OpenVINO as part of enhancing training and inference on Intel-based CPUs.

When we go through an AI workflow, there are multiple sections here. In Open Data Hub, we have pieces to take care of data storage, things like Ceph, and we have parts to take care of data ingestion and transformation; that's all on the data engineering side. For the data scientists, we have all of your familiar tools for data analysis and for building, training, and testing your models. And we also have plenty of tools to do monitoring, model serving, optimization, detecting drift, all of that stuff. All of it is integrated into Open Data Hub as one source for all of your tool needs for doing AI and ML on OpenShift. And additionally, it is available today from the Open Data Hub community operator.

To take a look here, we have some of the names called out: Ceph for your storage, Kafka and Strimzi for streaming, Superset, and all of the Apache projects have a space somewhere in here, Spark, Thrift, all the stuff you'd expect for doing data analysis. For AI and ML, all of the stuff you expect like TensorFlow and PyTorch, and some distributed training mechanisms like Kubeflow, PyTorchJob, TFJob, et cetera. And we have things like Seldon, KFServing, and OpenVINO model server as different ways that you can serve your models, plus Grafana, of course, and the standard monitoring tools, pipelines, and other components that you might need. So the goal of Open Data Hub is to create this blueprint for building and running any kind of AI and ML workloads and to simplify all of the streaming and all of the minutiae of doing this. And it is all done in a nice, secure, hardened, Red Hat security-guaranteed kind of way.

So now to talk about the specifics of OpenVINO, we have Ryan Looney. Take it away.

Sure, thanks, Sean. So hi, I'm a product manager at Intel for the OpenVINO toolkit, and just to give sort of a high level: we take trained deep learning models from the popular frameworks like TensorFlow and PyTorch and optimize them for deployment on different Intel hardware. So whether it's a data center server with a Xeon processor, or at the edge with Core, Atom, or integrated GPU, we optimize the neural network so that it can be deployed for inference on these platforms. We also, and Sean, if you wanna tap one more time, I think it's, there we go. And so the goal is deploying on Windows, Linux, and macOS, and when we say Linux, we are also including OpenShift environments, where we have an operator for OpenVINO so you can easily manage and deploy.
And we're gonna talk a little bit more about that. We also do have some tools for quantization-aware training that actually sit on top of TensorFlow and PyTorch to do quantization-aware training or training with sparsity, but primarily we're focused on getting those trained models ready for deployment and getting the best performance on Intel architecture.

And this is just another high-level way to look at what OpenVINO's doing. If you push, there we go. Yeah, so taking an input image, doing some pre-processing, sending it to an OpenVINO-optimized graph that can run on any of the hardware backends, and then we're gonna get a prediction. So this is a simple image classification example, and there are a number of other use cases. Image recognition is really just the tip of the iceberg, but it's the most common, and where a lot of this journey started was with image classification.

So we've had a lot of adoption within the ecosystem. A number of our partners, whether they're ISVs, ODMs, or systems integrators, have adopted OpenVINO, and they're using it to build solutions for their customers and to help them optimize their deep learning inference performance, so taking AI into production. And this is just some of the partners that we've worked with who are actively using our developer tools and deploying with our optimized runtime. Only some of them, huh? Not everybody likes to have their logo on our slides for some reason.

Yeah, so one of the techniques that we use: OpenVINO is built on top of oneAPI and oneDNN, the low-level libraries that Intel provides for optimizing performance on Intel hardware, but we do some additional processing of these models to help them get additional throughput and further improve the performance. So some of these optimizations that we do automatically are operation fusing and related graph optimizations like convolution fusing; these help eke out additional performance gains so that you can have more frames per second for your image classification, object detection, segmentation, you name it.

And if that's not enough performance, we also have tools. I mentioned the quantization-aware training tools; we also have post-training quantization. So if you have a model where you've invested a lot of time, hours and hours, maybe weeks or months invested in training, a really great model that you wanna deploy, but you're not getting enough performance, we have a tool for post-training quantization. And this comes installed automatically if you use the Open Data Hub integration, and Sean's gonna show us a little demo of the Jupyter environment. So if you use Open Data Hub and you install the OpenVINO toolkit operator from the Red Hat ecosystem catalog, a free open source operator, you can access these tools and do quantization, which is the process of reducing the precision, so going from floating point 32, for example, and bringing it down to integer 8 precision with minimal drop in accuracy. And the accuracy-aware quantization is where you can actually define the maximum accuracy drop you're willing to accept, and it will quantize the model up to the point where the accuracy hits that threshold.
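To make that concrete, here is a minimal Python sketch of what post-training quantization can look like. It uses the NNCF `nncf.quantize` API that ships alongside recent OpenVINO releases; the IR path, the random calibration data, and the output filename are placeholders, and the exact tool surfaced in the Open Data Hub notebooks may differ from this sketch.

```python
# Minimal post-training quantization sketch (NNCF + OpenVINO runtime).
# Paths, calibration data, and preprocessing are illustrative placeholders.
import numpy as np
import nncf                                   # Neural Network Compression Framework
from openvino.runtime import Core, serialize

core = Core()
model = core.read_model("model/resnet50_fp32.xml")   # hypothetical FP32 IR

# A small calibration set; a few hundred representative samples is typical.
calibration_images = [np.random.rand(1, 3, 224, 224).astype(np.float32)
                      for _ in range(300)]           # stand-in for real data

def transform_fn(sample):
    # Map one dataset item to the model input; identity here.
    return sample

calibration_dataset = nncf.Dataset(calibration_images, transform_fn)

# Quantize FP32 -> INT8 with default settings.
quantized_model = nncf.quantize(model, calibration_dataset)

# Save the INT8 IR next to the original model.
serialize(quantized_model, "model/resnet50_int8.xml")
```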
So we also have a lot of pre-trained models. We have the Open Model Zoo, which is a collection of models for all of these different use cases: object detection, text spotting, natural language processing, action recognition, question answering, time series forecasting. These pre-trained models are both trained by Intel, and we also have a collection of public models that are trained and provided in open source, where you can download them, convert them to the OpenVINO format, and use them. And in one of the demos Sean's actually gonna show us, you'll see pulling some of these public models and actually using them for optical character recognition.

One of the key features for deploying on OpenShift is being able to use what we call a model server. So this is our way to create an inference endpoint for model serving. The OpenVINO model server takes the optimized OpenVINO runtime and provides it as a service in a container. If you use the operator from the Red Hat catalog, it's based on UBI, the Universal Base Image from Red Hat. And that will create a microservice that you can use to serve your models. And then if you wanna scale that, you'll see on the next slide that you can take this and scale it with OpenShift, use a service mesh, and load balance the requests that your applications are sending over gRPC or REST, the API interfaces that are exposed by the model server. And you can scale up or scale down these workloads.

This is a high-level view of the architecture. So like I said, there's a gRPC and a REST endpoint. We have the same front-end API as TensorFlow Serving, so if you've already built an application that uses TF Serving, it's the same API calls that you'll make to the front end from your applications. And under the hood, we have configuration monitoring, which is basically checking to see if a new version of the model is ready to serve, if you've added additional models or changed the versions that you would like served in production, anytime there's a new model. So for model management, we have a concept of a model repository, and I'll talk about that on the next slide. But basically, if you have a storage bucket like S3 or Google Cloud, or even an OpenShift persistent volume, you can keep copies of your models. In this example here, I'm showing a MobileNet V2 and a ResNet 50, one in ONNX format and one in the OpenVINO IR, which is an XML. And this is all you need to do to create a model repository. This can sit in, like I said, a storage bucket or a persistent volume. And what the model server does is check to see if you've added a new version. If you've added a new version of the model because you've updated it, retrained it, improved the accuracy for whatever reason, it can reload the newest model without interrupting the service. So your applications will start to call the new model for predictions, and there won't be any downtime, which is one of the key features of the model server.

And then one additional feature is taking multiple models and connecting them together. To reduce the number of round trips to the APIs, you can create what's called a directed acyclic graph, or model pipeline for short, which is just taking your input image and passing it along to one or more models: taking the input, going through one model, taking that output, and sending it to one or more additional models, and then returning the results. Like you see here, we have an image taken on the freeway, and there's some text being detected by a text detection model. Then the second model is a text recognition model that's actually recognizing the characters, and the response back to the application is just the detected text. So this really simplifies things and keeps these steps in memory, so that the inputs and outputs can quickly be passed along without having to make additional API calls from the client.
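Because the front end is TensorFlow Serving compatible, a plain HTTP client is enough to exercise the endpoints described above. The sketch below uses Python `requests`; the hostname, port, and the model name `resnet` are illustrative assumptions, not values from the talk.

```python
# Minimal REST client sketch for OpenVINO model server
# (TensorFlow Serving-compatible API). Host, port, and the model name
# "resnet" are illustrative placeholders.
import numpy as np
import requests

BASE = "http://ovms.example.com:8081/v1/models/resnet"

# 1. Model status: is a version loaded and available to serve?
print(requests.get(BASE).json())

# 2. Model metadata: input/output names, precision, and shapes
#    (batch size, channels, height, width for an image model).
print(requests.get(BASE + "/metadata").json())

# 3. Prediction: send a batch of one 224x224 RGB image.
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
payload = {"instances": image.tolist()}
response = requests.post(BASE + ":predict", json=payload)
print(response.json()["predictions"][0][:5])   # first few class scores
```

The same status, metadata, and predict calls are what the demo later checks against the route exposed from the cluster.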
And how do we integrate into Open Data Hub? That's actually, sorry, I put the wrong slide here. This should say Open Data Hub; OpenShift Data Science is based on Open Data Hub. It's a new product from Red Hat, and Open Data Hub is the open source version that we're talking about today. But OpenVINO plugs directly into the Open Data Hub JupyterLab environment, and we can also deploy the model server instances into the OpenShift cluster where Open Data Hub is installed. If you could do the next slide. All right. Oh, there we go.

Okay, yeah, so this is what it would look like if you're going to deploy the OpenVINO toolkit operator. So from OperatorHub, if you search for OpenVINO and click install, you'll get the OpenVINO operator. And if you want to focus just on deployment, like I was showing with the model server, to create a serving endpoint, you can do that by creating an instance of the model server. If you're planning to do development and you wanna have access to the developer tools like the Model Optimizer, the Open Model Zoo, and the tutorials that come in the form of Jupyter notebooks, you would create a notebook instance. And that's what's going to enable us to quickly access the OpenVINO developer tools and tutorials directly from Open Data Hub.

And so if you're using Open Data Hub and you go to the Jupyter spawner, which is one of the key features in Open Data Hub, and you've installed the OpenVINO operator, you'll have the option to select OpenVINO toolkit, as you see on the left, and click start server. And once you've started the notebook server with OpenVINO selected, you'll have access to some of the Jupyter notebooks, and you can see a screenshot of that on the right. The one that I'm showing here is a Jupyter notebook that shows how to quantize a BERT model, which is a natural language processing model. And it's an end-to-end tutorial: you can click run all, and it will download the dataset, download the pre-trained model, and execute step by step the process for doing this post-training quantization.

If you could get the next slide. Great, I think we missed one. Whoops. But it's... well, we'll cover some of this in the demo. Yeah, you'll see it in the demo actually, so it's fine. This is just showing, once you've deployed the model server instance, which actually Sean's gonna show you, so it's even better to see it live than have me try to explain it. And he's gonna go ahead and show us what those API calls look like. Once you've deployed your model and created that serving endpoint, we have an API reference that can show you how to make the API calls to the endpoint, and there's some sample Python and C++ code so that you can directly call the API from your applications.

And now, the thing everyone's been waiting for: the actual demo. So, as we see here, we have already installed our OpenVINO toolkit operator, providing both APIs here. We have a notebook already created, and over here, ta-da, we have the actual notebooks provided by Intel. So, we have our OpenVINO inference engine import here. And for this model, what we're going to attempt to do is image segmentation for detecting the road in this image. Very useful if you might happen to have a self-driving car.
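For reference, the calls inside a notebook like this boil down to a few lines of the OpenVINO Python runtime API. This is only a rough sketch under the 2022+ `openvino.runtime` API; the model filename, the image, and the input size are assumed placeholders, and each notebook's real preprocessing differs.

```python
# Rough sketch of the inference calls used in the notebooks
# (OpenVINO 2022+ Python API). Model and image paths are placeholders.
import cv2
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model/road-segmentation.xml")   # hypothetical IR
compiled = core.compile_model(model, device_name="CPU")  # CPU only, no GPU

# Read and preprocess one frame to the model's expected NCHW layout.
frame = cv2.imread("empty_road.jpg")
h, w = 512, 896                                          # example input size
resized = cv2.resize(frame, (w, h))
input_tensor = np.expand_dims(resized.transpose(2, 0, 1), 0).astype(np.float32)

# Run inference and grab the first (and only) output.
result = compiled([input_tensor])[compiled.output(0)]
print(result.shape)   # e.g. a per-pixel class map used to overlay the road
```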
And with the inference here (I restarted the kernel), if we hit run all, you see almost instantaneously it's able to detect the road. Just CPUs, no GPUs are attached to this pod whatsoever. And as we see here, we have the segmented road overlaid, which could be useful for telling a self-driving car or some other similar model what parts of the road are best to be moving on.

Additionally, we have our monodepth notebook. What this one goes into is detecting depth in a single image just from context in the image. No need to actually have one of those Intel depth cameras telling you how far away stuff is. So, lots going on here, again, all running on CPUs. And we can see here, it's able to estimate the depth in the image very easily. And interestingly, when we get down here to creating the video, it was able to process all 60 frames in around 10 seconds. And this is on a fairly modest system, something you might find on, say, edge devices; even smaller, less powerful systems are able to achieve this incredibly fast processing, even for video. With a slightly more powerful system, this could be done in real time.

And finally, for the downloading and running of the optical character recognition, we grab a public model and we load this nice image here with some text in it. And we are able to get the bounding boxes and print out the text here, similar to what the pipeline shown earlier would be doing.

And when it comes to actually serving all of these, we have a sample model server created here, which is just serving your standard ResNet. We're gonna check the API over TCP and show that it is compatible with the standard API. Now, we have created a service here and we've exposed a route, just a very easy couple of commands to create this, and it's now usable on the internet. So we're able to hit it here and see the model is available, it's being served; we can hit the metadata endpoint here and we can see it is a float input type, and we're able to see the dimensions: the batch size, number of color channels, and height and width of the image. And there you have it. So with that, I believe that's everything. Ryan, anything else to add before we do questions?

I know there was a question in the chat about how much performance gain you can expect, and I'd say it's very dependent on the model and the use case and the input image size, but we do have some benchmarks published, and they don't show comparisons to frameworks or anything, because there are so many different configuration options to consider. So I pasted a link to the benchmarks... oops, sorry, I didn't paste the right link, I pasted a link to our documentation. There's a benchmarks link that I'm about to paste. And then on top of that, several of the notebooks that you can view in Open Data Hub have a benchmarking step that will happen in the notebook. So if you wanna see the notebook that shows how to convert a PyTorch model: we take the fast segmentation model, which is an open-source public model, we convert it to OpenVINO, and then we show the performance difference on that same CPU or integrated GPU device. Okay, here's the baseline performance with PyTorch; here's the improvement in frames per second once I've run it on the same device with OpenVINO. And there are a few other notebooks that will show that, and there are some that have the benchmarking tool. So we have a tool called benchmark_app that's included and installed with the notebooks, and you can run that to get a rough idea of the performance you could get once you've converted the model. Awesome.
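For context, the conversion flow that notebook walks through is roughly: export the PyTorch model to ONNX, then convert the ONNX file to OpenVINO IR. The sketch below assumes the `openvino.tools.mo.convert_model` Python API and uses torchvision's ResNet-50 purely as a stand-in for the public segmentation model mentioned above.

```python
# Rough sketch of converting a PyTorch model to OpenVINO IR.
# The torchvision model here is only a stand-in; the notebook uses a
# public segmentation model instead.
import torch
import torchvision
from openvino.tools.mo import convert_model
from openvino.runtime import Core, serialize

# 1. Export the trained PyTorch model to ONNX.
pt_model = torchvision.models.resnet50(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(pt_model, dummy_input, "resnet50.onnx", opset_version=11)

# 2. Convert ONNX to OpenVINO IR (produces an in-memory ov.Model).
ov_model = convert_model("resnet50.onnx")
serialize(ov_model, "resnet50.xml")     # writes the .xml + .bin pair

# 3. Sanity check: compile on CPU and run one dummy inference.
compiled = Core().compile_model(ov_model, device_name="CPU")
print(compiled([dummy_input.numpy()])[compiled.output(0)].shape)
```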
Anybody else have questions for us? Yeah, I don't see any questions in the Q&A so far, but if you guys don't mind hanging out for like a minute or two, just... Absolutely. All right. Bring forth the questions and we shall answer them.

Actually, in the meanwhile, I have a question, and this might just be me really liking this and maybe getting too greedy with what Intel can offer. But I'm wondering, as a data scientist, generally you would run inference on the GPU side in a lot of cases. So is there any optimization that you can do during training time, like for loading images or cropping them, and all the work that you do before passing it on to a GPU? Can Intel OpenVINO help with that by any chance, or is it purely for inference?

Yeah, so I mentioned that we have a tool, and I'll actually paste a link to it. It's for quantization-aware training and training with sparsity, compressing the models in the training phase so that when you deploy them, regardless of whether it's on an Intel device or not, you can train them for low precision and with sparsity. So when there's hardware that can take advantage of this, you can see that additional performance gain.

And then to stay on the inference side with OpenVINO, the goal is you can get the inference performance, especially if you look at the cost of having an Intel CPU, compare it to the number of frames per second, and say, okay, how do I scale this to reach the number of frames that I need? So if I have video streams, let's say I have 40 video streams and I need to process 30 frames per second on each video stream, it's a simple calculation for me to say I need to have this many CPU cores available in that cluster, and I need to route those frames to those CPUs. You don't necessarily need a discrete graphics card to do inference, and it's actually much faster to load the models onto a CPU. You're not gonna get the same throughput as you might get with an expensive 250-watt discrete accelerator card, which Intel is also planning to offer soon. Today you can actually try the GPU architecture in the integrated graphics. So if you have an Intel Core processor in a laptop or workstation, Core i5, Core i7, Core i3, the 11th generation, code-named Tiger Lake, has the GPU architecture that's coming out in our new discrete GPUs, which will be released very soon. You can actually try that today, or even on an older laptop or desktop CPU you can use the integrated graphics for inference. But I would say that most of our customers are able to hit their performance KPIs with just using a CPU backend, and they don't need to purchase discrete accelerator cards.

Yeah, that makes sense. Thanks so much. Yeah. Awesome.

Looks like we do have one question here: how does the serving tool compare to others like Seldon? Yeah, so I don't know the exact architecture of Seldon. I know that in the past Seldon did have an OpenVINO backend, and I know that Seldon brings together many of the different inference servers or model servers. You can have TensorFlow Serving or KFServing or Triton as different backends, different engines that can serve the models. With OpenVINO model server, we only have the OpenVINO inference engine, which by design has a very low footprint. So the image that you need to just serve models on CPU is just around 145 megabytes, compressed. It's a very lightweight image.
If you look at Triton, for example, which has several backends as well, it's somewhere between eight and ten gigabytes for the image. So this model server is much more optimized to be lightweight and just have the one backend, which accesses Intel devices, whereas some of the other servers have additional backends. So Seldon, Triton, and KFServing pull together potentially many different backends; it's a slightly different approach. This is lightweight, and it only runs on Intel devices at this time. With the others, you can use NVIDIA GPUs or other hardware beyond just what we have from Intel.

Just quickly going off of that question: even if I didn't want to use the OpenVINO backend, can I still quantize the model post-training and then just put it back into my Seldon server or something?

Yeah, so I'm not entirely sure how Seldon is set up today, but if the OpenVINO backend is there and if it's the latest version (so let's say that it is the latest version, I don't know), then yes, that would be true. You could go do the post-training quantization, get the low-precision model, and then go load it into the backend. And that's true of Triton also. The NVIDIA inference server has an OpenVINO backend, so you can also go and load the quantized OpenVINO model on Triton inference server with the OpenVINO backend, and you'd be able to serve predictions on CPU only through the OpenVINO backend in Triton.

All right, perfect, thanks. Anyone else? We're here to answer your questions. Oh, actually, if there was another one, that would probably have to be our last, because I'm just now looking at the time. Yeah, I was gonna say, this is the final call for questions, because we're almost out of time. So if anyone has questions, now's your chance. Or if not, you can also head out to the breakout room and continue this discussion over there. But otherwise, I don't think there are any other questions. But thank you so much, folks, for the presentation. That was amazing, I really enjoyed it. And yeah, thanks for joining. Awesome, thanks for having us. Yes, thank you.