Hi everyone, if you've just joined us in the intelligent apps track, my name is Vanga Taylor, I'm a manager of solution architecture at Red Hat. Our next session is titled "Supersonic Model Serving Using Deep Java Library (DJL) and Quarkus." Honestly, I'm really excited to introduce you to our next speakers, who have worked closely together for the past three years. Our lead speaker is Jeff Bride. He is a senior principal specialist solution architect at Red Hat, and he specializes in cloud-native development and integration. We also have David Marcus here; he's an associate principal specialist solution architect and also a data science lead in North America. In this session Jeff will introduce you to supersonic model serving in a cloud-native environment using the power of DJL embedded in Quarkus. Now, if you have any questions, please put them in the comments, and if we have time we'll have a live Q&A session at the end. As always, all session recordings will be uploaded to the Red Hat Developer YouTube page, and the presentation will be made available later. Jeff, please take it away. Alright, thank you Vanga, and thank you all for joining us in this session for about the next 45 minutes. I will mention that certainly everybody is welcome, but my intended audience for this discussion is really enterprise Java DevOps shops that have spent the last couple of decades investing in expertise running Java applications in production environments, and that want to accelerate any initiatives they might have around applying data science and machine learning to those Java applications, or co-located with those Java applications. So that's where it's probably going to get interesting for you. We're going to talk about Amazon's Deep Java Library initiative, we're going to talk about Quarkus, and what I think is the power of the two of them combined, but let me start with Amazon's initiative just from an intro perspective.
So Amazon has this open source initiative called Deep Java Library, and they are using it both internally and for customer requests and requirements that they get for model serving and for supporting their data science projects. They're using it very successfully, it seems, and being that it's open source, other Java shops have picked up on it and are also using Deep Java Library. Now, the next statement here I feel is quite important, certainly an enormous value proposition: Deep Java Library brings a unified Java API wrapper over the most prominent C++-based deep learning engines. Just to level set here, when we talk about these deep learning engines, we're talking about things like PyTorch, MXNet, TensorFlow, TensorFlow Lite, and so on and so forth. There are lots of them, and for performance and optimization reasons these deep learning engines are predominantly written in C and C++. So how do other languages such as Python and Java get to take advantage of those C++ libraries? Well, Python, as an example, writes a wrapper around those C++ libraries and is able to interact with those deep learning engines. Java does the same thing; historically, going way back, Java has had the Java Native Interface, or JNI, for interacting with C and C++ based applications. And so that's what we have: JNI wrappers around these C++ deep learning engine libraries, similar to Python, and those already exist. Now, what Deep Java Library from Amazon does is a couple of things.
It unifies, or brings together, all of those JNI wrappers for all of those different deep learning engines, abstracts away the details and idiosyncrasies of those engines, and instead provides a single, intuitive, Java-based API. Someone like myself, who is not a data scientist but a traditional Java business application developer, can standardize on that unified, simplified DJL API for maybe 80% of my use cases, and at least get started with serving models at runtime. That's super powerful. Then, if and when need be, I can interact more directly with a specific deep learning engine, let's say TensorFlow, but the default is that I'm working with this very nice Java abstraction that DJL provides. It makes me much more efficient and productive than if I had to learn each engine individually, so my learning curve is dramatically reduced, and as I mature in my data science expertise, in the future I can go and dig into the other engines more specifically if I want to. As a user of DJL here at Red Hat, I have found that their open source community is very well supported and very active. They've got great documentation, they have an entire book on using Deep Java Library for data science, and their forums are great; all of my questions have been answered through those community forums, so it's been great. In my opinion, the real value of Deep Java Library is the ability to serve models in Java; that's where I've been using DJL and that's where I'm going to focus. But just so you're aware, Deep Java Library also has tooling that supports pre- and post-processing, data manipulation, and model training. These latter initiatives, data manipulation and model training, are traditionally done in Python or something equivalent that data scientists are comfortable with, but just know that, if you're interested, you can also do them in Java using DJL.
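To make that concrete, here's a minimal sketch of what DJL's unified API looks like for a single prediction. This is not code from the talk; it's an illustrative example that assumes DJL's model-zoo `resnet` image classifier and the sample image URL are reachable at runtime:

```java
import ai.djl.inference.Predictor;
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.modality.cv.ImageFactory;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

public class DjlQuickStart {
    public static void main(String[] args) throws Exception {
        // Describe what we want (image in, classifications out);
        // DJL selects a matching model-zoo model and engine for us.
        Criteria<Image, Classifications> criteria = Criteria.builder()
                .setTypes(Image.class, Classifications.class)
                .optArtifactId("resnet") // assumption: a model-zoo image classifier
                .build();

        try (ZooModel<Image, Classifications> model = criteria.loadModel();
             Predictor<Image, Classifications> predictor = model.newPredictor()) {
            Image img = ImageFactory.getInstance()
                    .fromUrl("https://resources.djl.ai/images/kitten.jpg");
            Classifications result = predictor.predict(img);
            System.out.println(result.best().getClassName());
        }
    }
}
```

The same `Criteria`/`Predictor` shape applies regardless of which engine ends up running underneath, which is the "unified API" point being made here.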
And in terms of the target environment where your model serving is going to run, basically the sky's the limit in terms of options. Certainly, your Java model serving initiatives using DJL are going to run great in a cloud environment; here I'm speaking specifically to Red Hat's OpenShift platform, which I've used extensively, and it works great. Another target environment is running your model serving at the edge, co-located with the data ingestion point that your model's predictive analysis is operating on. This edge use case is intriguing to me, so I'm actually going to use it as my demo here in a few minutes; I'll talk more about that later. And being that Deep Java Library is Java, it runs great on the Android mobile platform as well. Speaking of use cases, I just wanted to mention that Amazon has done a very nice job of blogging about the various customer and internal use cases that they've encountered. Obviously I'm not going to go through all the details here; I've cherry-picked their blogs and listed them on this slide, but they're really interesting and very insightful, so I encourage you to go over to their blog site and take a look at how they're using Deep Java Library in production for model serving across a variety of use cases. So one value proposition I was mentioning just earlier is that the unified Java API certainly accelerates adoption of model serving by non-data scientists like myself. That's great. Another value proposition I want to mention comes from the challenges that apparently occur in data science itself. Again, I'm not a data scientist, but I've been made aware that a real problem still exists: only a fraction of data science projects today actually make it into production.
A colleague of mine, David, who's on the presentation here, linked me the article that I've provided on the slide. It's from 2019, but apparently the problem still exists: a very low percentage of data science projects actually make it into production. The author of the article points out that one of the root issues is the disparity between the tooling and methodologies utilized by data scientists and the tooling and methodologies that a traditional DevOps team utilizes. And so that's the problem. It seems to me, in my opinion, that Deep Java Library attempts to alleviate, or even totally address, this issue. You can kind of think of the model as the contract between the data scientists and the enterprise DevOps team. The data scientist is going to use the tooling and methodologies that he or she is familiar with to create the model, train it, and maintain it over its life cycle. Then the traditional DevOps team is going to be able to serve that model at runtime using the expertise they've developed over the course of the last 20 years, with the tooling they're already very familiar with as they support many other Java enterprise applications. And so hopefully that disparity between the two teams is alleviated. Okay, the next thing I want to highlight, from another value proposition perspective, concerns serving models in Python. I'm not a Python expert, but anecdotally it's my understanding that under some circumstances it's a little more difficult for Python to scale. With the CPython interpreter specifically, which is the default Python runtime, scaling is a little problematic in that, if you want an additional thread of execution doing real parallel work, you typically have to instantiate an entire new operating system process.
And along with that come all of the RAM requirements to support that additional operating system process. From a scalability perspective, that becomes problematic. Now, apparently there are some workarounds and some alternative interpreters, but that seems to be the default experience. That's not the case with the Java Virtual Machine. The JVM is inherently multi-threaded, so using the same heap you have multiple threads of execution running through that same heap, and subsequently the scalability of your JVM-based application is going to be much more stable. All right, still keeping with Amazon's Deep Java Library, from an architecture perspective, one way that you can interact with Deep Java Library is through what they call the model server. You can think of it like a centralized, traditional monolithic application. It's a web application; it actually happens to run on a framework known as Netty, which Red Hat is a long-time contributor to. This monolithic web server exposes a variety of different APIs, like a REST API and an asynchronous API, and remote clients interact with that centralized web server. You feed this web server your model at runtime, and then the deep learning engines that are embedded in the model server execute on that model at runtime. So you're not embedding anything into your own application; you're interacting with that model server remotely. Being that it's all Java, it's monolithic, but it runs great on a Red Hat stack, no problem. In this diagram I'm depicting it running on OpenShift, so it's a Linux container running on OpenShift and making use of the Red Hat stack. If your OpenShift is enabled with GPUs and the corresponding NVIDIA libraries, then the DJL model server is going to auto-detect the presence of those GPUs and, if you choose, run your model on them. So with that, let's now transition into talking about Quarkus.
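As a rough sketch of the remote-client side of that centralized architecture, here is a hypothetical call to a model server's inference endpoint using the JDK's built-in HTTP client. DJL Serving exposes inference endpoints of the form `POST /predictions/{model}`; the host, port, model name, and image file here are all my own assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class ModelServerClient {

    /** Build an inference request for a named model hosted on a remote model server. */
    static HttpRequest inferenceRequest(String baseUrl, String model, byte[] imageBytes) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/predictions/" + model))
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(imageBytes))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumptions: a model server on localhost:8080 serving a model named "resnet",
        // and a local image file to classify.
        byte[] image = Files.readAllBytes(Path.of("kitten.jpg"));
        HttpRequest request = inferenceRequest("http://localhost:8080", "resnet", image);
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // classification results from the server
    }
}
```

The point of the sketch is simply that the client holds no engine or model at all; everything lives in the centralized server, which contrasts with the embedded approach coming up next.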
And this is where I think it gets interesting: the combination of Deep Java Library and Quarkus. So what is Quarkus? It is Red Hat's modern Java framework for microservice and edge architectures. It's something we've been creating and supporting through a traditional open source methodology for about the last five years, and it's kind of the evolution of Java EE, but for modern microservice and edge use cases. One of the nice things about it is that if you're familiar with MicroProfile or Java EE, you've come to lean on and expect a massive ecosystem of enterprise features, basically everything you need in a business application: reactive programming, integrating with messaging and streaming brokers, enterprise integration patterns using Camel as an example, exposing APIs using REST and SOAP, integrating with single sign-on, and so on and so forth. All of that is brought into the Quarkus framework, and done so with a simplified, unified API. So it's really nice and very intuitive to work with if you're a traditional Java EE developer looking to get into microservices or edge. Being a modern Java framework, it needs to run as a first-class citizen in a Linux containerized environment, and so there's a whole suite of tooling within Quarkus to fast-track you from your Quarkus Java application to a containerized environment. That's really nice to work with. Subsequently, in terms of container orchestration, there's a whole suite of tooling, again, to get you from your Linux container, which embeds your Quarkus app, into OpenShift itself. In the context of Quarkus you'll also often hear the term "developer joy," and that actually means something: the APIs that Quarkus exposes are very well thought out and very intuitive for Java developers, and again, the tooling is going to make you very productive.
So I think you're going to enjoy working with Quarkus under all scenarios if you're a Java developer. From an architecture perspective, where we bring together Deep Java Library and Quarkus, this slide depicts a microservice architecture, and I think this is where things start to get pretty interesting. What I'm depicting is a series of business applications that are Quarkus-based, so they're Java-based, and within each one of these Quarkus applications I am embedding the Deep Java Library, as well as the corresponding deep learning engine, into my business application. So my business application is not interacting remotely with a centralized model server, as I was depicting a couple of slides earlier. Instead, I'm embedding the library, the model, and the deep learning engine directly into my business application, in a very small footprint, and I'm running it as a microservice. I have a variety of different engines that I can choose to embed in my business application, and I'm containerizing those applications and subsequently orchestrating all those containers just as I would for any other Java Linux containerized application on OpenShift, so it's running on a Red Hat stack. And similar to before, if I've enabled my OpenShift with GPUs and the corresponding NVIDIA libraries, then my embedded microservice applications are going to auto-detect the presence of those GPUs and actually run on them, so I can take advantage of that as well. So that's the microservice architecture, where DJL is embedded directly into Quarkus, as opposed to the centralized approach I was mentioning earlier. Another use case architecture that I find interesting is the edge use case, and this is actually what I'm going to demonstrate here momentarily. Similarly, I've got my Quarkus application, embedded with the Deep Java Library, the deep learning engine, and the model, running on some type of Internet of Things device.
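As a rough illustration of that embedded approach, here is a hypothetical Quarkus REST resource that loads a model-zoo classifier once and serves predictions in-process. This is my own placeholder sketch, not Jeff's code; the `resnet` artifact ID and the `/classify` path are assumptions:

```java
import java.io.InputStream;

import ai.djl.inference.Predictor;
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.modality.cv.ImageFactory;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;

@ApplicationScoped
@Path("/classify")
public class ClassifierResource {

    // Load the model once; the loaded model is shared across requests.
    private final ZooModel<Image, Classifications> model;

    public ClassifierResource() throws Exception {
        model = Criteria.builder()
                .setTypes(Image.class, Classifications.class)
                .optArtifactId("resnet") // hypothetical model-zoo artifact
                .build()
                .loadModel();
    }

    // Predictors are lightweight but not thread-safe, so create one per request.
    @POST
    public String classify(InputStream imageBytes) throws Exception {
        Image img = ImageFactory.getInstance().fromInputStream(imageBytes);
        try (Predictor<Image, Classifications> predictor = model.newPredictor()) {
            return predictor.predict(img).best().getClassName();
        }
    }
}
```

The design point this tries to show: the model, the engine, and the business endpoint all live in one small-footprint process, so there is no network hop to a model server on the inference path.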
And that device is co-located at the point of data ingestion, so the predictive analysis is done right where the data is, and then ideally only state changes are streamed out as they occur; I'll speak to that here in a second. The edge device, similarly, is probably going to be containerized if you choose to do so. You could just run on a JVM on the device if that's what you prefer, but most often we're going to run in a Linux container. You could also optionally run it on MicroShift if you choose, which is a smaller-footprint OpenShift for edge devices. If your edge device is enabled with GPUs and the NVIDIA libraries, then you're going to be able to pick up on those GPUs and leverage them, similar to the other architectures. So let me spend a few minutes going through my demo. Again, I could have picked a variety of use cases, certainly anything cloud-based where the predictive analysis is running in the cloud; we've got several demos of Deep Java Library workloads doing exactly that. But for this presentation I decided to demonstrate the edge use case, so let me show you that. I'll show you my device here in a second, but this demo pertains to live object detection. You've probably seen these types of demos before, where predictive analysis is done on video frames and the presence of different objects is detected in real time. I call this my "intelligence at the edge" demo, because the prediction, that is, the model serving and the predictive analysis the engine performs on the model and the video frame, is happening at the edge, and my Quarkus application, which the model is embedded into, is acting on state change events. All that analysis is being done at the edge, and then when a state change occurs, that event, and only that event, is sent over the network to the cloud. Edge devices are typically operating in disconnected, intermittent, low-bandwidth environments.
So it's often the case that you don't want to just be continuously streaming video frames; you want to stream only the actual event itself. That's what this demo is doing. Just so you're aware, I'm going to be using the PyTorch engine and Deep Java Library's unified API. The model I'm going to be using is from what Deep Java Library calls its model zoo, so I'm just using an off-the-shelf model; I haven't modified or optimized it at all. All of this is going to be running in Quarkus on a Raspberry Pi 4, so it's the ARM architecture, and on my Pi 4 I have Fedora 38. So all of that is happening on my Raspberry Pi. When a state change occurs, I transmit the event via MQTT to my OpenShift environment, where the MQTT event gets persisted into a queue maintained by my Red Hat AMQ broker, and then I have a simple Quarkus application that consumes that event and streams it to my browser via what's known as server-sent events. So let's take a look at that now. All right. This is going to be my very simple web application running up in OpenShift, and here in a second you should see video frames of myself from the webcam that's on my Raspberry Pi. So this is the web app, and now let me navigate over to my Pi 4 itself. What I've got here in front of me, and hopefully you can all see this, is a standard Raspberry Pi 4. I've got an Ethernet cable connected to it so that it can stream MQTT events as it detects some type of state change, I've got power running to it, and I've got this webcam connected to it. The webcam is currently off because my application is not running on the Pi 4. So that's basically the hardware that I'm using, and so here in this terminal...
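The "only publish on state change" idea at the heart of the demo can be sketched in a few lines of plain Java. This is my own illustrative version, not the demo's actual code; a real edge application would plug an MQTT publish call (for example via Eclipse Paho or Quarkus messaging) in as the `publisher`:

```java
import java.util.Set;
import java.util.function.Consumer;

/** Emits an event only when the set of detected object classes changes between frames. */
public class StateChangeDetector {

    private Set<String> last = Set.of();
    private final Consumer<String> publisher; // e.g. an MQTT publish to the cloud broker

    public StateChangeDetector(Consumer<String> publisher) {
        this.publisher = publisher;
    }

    /** Returns true if this frame's detections differed from the last and were published. */
    public boolean onFrame(Set<String> detected) {
        if (detected.equals(last)) {
            return false; // same state as before: stay quiet, save edge bandwidth
        }
        last = Set.copyOf(detected);
        publisher.accept(String.join(",", detected));
        return true;
    }
}
```

This is exactly why the demo suits disconnected, low-bandwidth edge environments: identical consecutive frames produce zero network traffic, and only a genuine change crosses the wire.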
I am shelled into my Pi 4, so I'm on my laptop and I'm shelled into my Pi 4, and I've got my application that I previously wrote in Quarkus and Deep Java Library, and it's got a little bit of smarts to be able to detect state changes. So I'll kick this off, and it will take just a second to start, but you should see that as it starts up, my webcam turns on and it begins to... yep, there it goes, it turns on a green light, and we should see now that it's streaming state change events. There we go. State change events, as it picks them up, are streamed to my cloud-based web application when need be. And if I turn the webcam around, you get an intimate look at my work environment, my home, and so that's it. So this is model serving using Deep Java Library and Quarkus on a Raspberry Pi, forwarding state change events as they occur to the cloud. Let me go back and turn this off so it doesn't distract me, and let's continue on with the presentation. Okay, just a few more slides on a few technicalities about Deep Java Library and Quarkus, and then we'll do a quick Q&A. I was mentioning earlier the C++ libraries that exist for a variety of deep learning engines, and how they're implemented predominantly in C and C++, and I just want to mention that Amazon's Deep Java Library maintains pre-built jar files of the JNI wrappers for most of the common triplets that you would encounter. By triplets I mean the combination of processor architecture, operating system, and the deep learning engine that you choose at runtime. There are wrappers for all those different types of triplets, and the Amazon DJL community maintains them for us, so that's super nice to have.
Those JNI wrappers are embedded in, or included in, jar files, and if you really wanted to poke around in your application that's using DJL, you would see that for a particular triplet, in this case TensorFlow, you'd be able to find the specific JNI wrapper and engine that was used at runtime. In terms of using those C++ libraries and getting them into your application, either in a microservice architecture or in an edge scenario like I just showed, you have a couple of different options. The first is that your application can auto-detect the needed triplet, or the needed set of engines, based on the model; that's a nice feature that DJL provides. At runtime you feed it a model, and it auto-detects the best engine to run that model, so for quick starts, demos, and getting-started scenarios that's super convenient. Now, that's probably not ideal for a real production environment, because production environments are typically locked down. The alternative is that you decide up front which triplet you need for your runtime environment, and then you specify the dependency for that exact triplet in your Maven pom.xml at build time. That specific JNI wrapper then gets embedded into your application, so you're not dynamically determining this at runtime and pulling down libraries; instead, it's already built into your application. That's probably going to be the more secure approach when working in production environments.
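For illustration, a pom.xml fragment pinning an exact triplet at build time might look something like the following. This is a hypothetical sketch pinning a PyTorch native build for Linux on ARM (matching the Raspberry Pi demo); the artifact coordinates, classifier, and versions are assumptions, so check the DJL documentation for the exact coordinates for your platform and release:

```xml
<!-- Engine API -->
<dependency>
    <groupId>ai.djl.pytorch</groupId>
    <artifactId>pytorch-engine</artifactId>
    <version>0.25.0</version>
</dependency>
<!-- Native library pinned to one processor/OS triplet, instead of runtime auto-detection -->
<dependency>
    <groupId>ai.djl.pytorch</groupId>
    <artifactId>pytorch-native-cpu</artifactId>
    <classifier>linux-aarch64</classifier>
    <version>2.1.1</version>
    <scope>runtime</scope>
</dependency>
```

With the native jar declared at build time, a locked-down production container never needs to download engine binaries on first run.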
A few more notes about Quarkus and Linux containers. Just so you're aware, Quarkus has a lot of tooling to support containerizing your Quarkus-based application, and it's very simple to do. As a Java application developer using Quarkus, you typically don't need to know the mechanics of how to containerize your application; all you have to do is specify which approach you would like to take. The default approach uses the pre-generated Dockerfiles that come with a newly created Quarkus application, but there are other approaches as well, and all of this makes it very easy not only to create the Linux container but also to push it to a container registry in OpenShift and deploy it. A few parting notes: there are some alternatives to Amazon's Deep Java Library initiative. One of them is known as Deeplearning4j; I apologize, I don't know a whole lot about it, but I just wanted to make you aware that alternatives exist. I'm not sure that they have a unified API, or at least not one as mature as Deep Java Library's, so keep that in mind, but there are folks using it. And here's a slide on a variety of other approaches where you can use Java to interoperate with different deep learning engines, so there are alternatives to DJL. As my final slide, I just wanted to leave you with a few references where you can learn a bit more if you're interested. So, Vanga and David, that's all I have for now; did we have any questions?
Thanks Jeff, that was really good. We had one question, but Dave Marcus already responded; I'll just repeat it here. Someone asked: what's the difference between using Python and Java with regard to multi-threading on GPUs and CPUs for training models? Dave responded with a bunch of links in the chat, so that's there. And the other question was: does Java have the same breadth of libraries as Python, for things like CNNs, Transformers, and others? Dave responded with the libraries and engines supported by DJL, with official documentation. We have a couple of minutes; Jeff, anything else to add? No, not at this time. I guess I do have one question, Dave, that I was thinking of while you were speaking: with DJL, does the model get built into the Linux container at all? Well, that's a good question; you have a couple of different options. You certainly can embed the model into the application if you choose to do so, but it seems to me that you probably might not want to do that. Instead, it seems to me that the model would exist outside of the application and outside of the Linux container, but co-located with it. As one example, you might mount the model on a persistent volume, if you're running on OpenShift, where the Linux container has access to that model; it's mounted at runtime, and the Deep Java Library application in the Linux container then loads the model from the persistent volume into the application. That's one approach you could take. You could also, through DJL and Quarkus, pull from a model repository in the same environment in a secured manner; that's another approach. Or object storage of some sort, whatever you choose. And I think the nice thing, or one of the nice things, about not embedding the model in the Linux container is that the model can evolve over time, and as changes to the model happen, it gets reloaded into the application at runtime. So, a variety of different approaches. Thanks for that, Jeff. So I think we're now out of time, but
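The persistent-volume approach described in that answer might look roughly like this in DJL code. This is my own sketch, not code from the session: the `/models/object-detector` mount point is a hypothetical path, and a real model loaded this way would usually also need a translator configured on the `Criteria`, which I've omitted for brevity:

```java
import java.nio.file.Paths;

import ai.djl.inference.Predictor;
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

public class MountedModelLoader {
    public static void main(String[] args) throws Exception {
        // Load the model from a path mounted into the container (e.g. a PV at /models),
        // rather than from a jar baked into the container image.
        Criteria<Image, Classifications> criteria = Criteria.builder()
                .setTypes(Image.class, Classifications.class)
                .optModelPath(Paths.get("/models/object-detector")) // hypothetical mount point
                .build();

        try (ZooModel<Image, Classifications> model = criteria.loadModel();
             Predictor<Image, Classifications> predictor = model.newPredictor()) {
            // Because the files under /models live outside the image, the model can
            // evolve independently; reloading means recreating the model/predictor.
            System.out.println("Model loaded: " + model.getName());
        }
    }
}
```

The design benefit mirrors what Jeff says: the container image and the model have independent life cycles, so a retrained model can be rolled out without rebuilding the application image.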
Thank you Jeff, that was a great presentation and demo. Like I said before, all session recordings will be uploaded to the YouTube page, and the presentations will be made available later. Our next session is a fun one with Dave Marcus; we'll come back with a session on an active learning loop on OpenShift. So thanks everyone, and see you later. Great job, Jeff!