Good morning, good afternoon, everyone. This is Hongfei from KPMG. I'm a director in cloud engineering, co-presenting with Kevin Martelli, who is a principal in our cloud practice. Our topic is how we leverage OpenShift container storage to deploy our Ignite machine learning platform.

Yeah, so as Hongfei was alluding to, as part of this database session we're having here at Commons, we plan on showing our KPMG Ignite platform. The KPMG Ignite platform is our data science, AI, and ML platform used for business, and it takes advantage of a lot of these technologies. We want to show you, in the context of a use case, how we leverage it for high throughput and for large volumes of stored data. That will take us through the next two agenda items. And then finally, if we have time, we thought it was interesting that some of the more robust models we were storing required object storage, or PVCs with a MinIO layer on top. Not exactly a database, but a data storage type of platform and application, so we thought it would be worth sharing as well.

If you go down one slide, Hongfei: as I was mentioning, let me set the background on what KPMG Ignite is. Many years ago KPMG built what we call our data science, AI, and ML platform, powered on top of OpenShift. It's a platform built in a very modular way to allow the use of the best pieces of open source, proprietary, or commercial software, which can plug in to build your use case or application. Initially it was built for data scientists and engineers. However, there's a hook in there for the business to engage and interact with the data sets that are coming out, and to keep the human in the loop through the end-to-end process. And finally, it was built mainly around unlocking the value of unstructured data.
It has since expanded to handle structured and semi-structured data as well, but it was really built around all the rich text that needed to be pulled out of unstructured documents. What I want to quickly show here, before we dive into the details, is how our use case methodology is built, which aligns with the ways we're using different database technologies. A use case is put together from components. A component can be something open source, like an OCR engine; a component can be classification or data extraction. Many components get strung together into a workflow to produce an output. And as these components communicate back and forth, Kafka is the messaging channel, if you will, that allows them to talk to each other. Then there are interfaces for that human in the loop, so users can see the output and help retrain and update the models.

And then finally, this is the last slide before we dive into the content. If we think about Ignite, we think about it as a layer cake, if you want. At the top is the user experience part: interfaces, annotation UIs, and management consoles through which people interact with the data coming out of the platform. In the middle layer we have what we call the Ignite AI platform; these are the AI tools that enable you to build and execute pipelines. As I mentioned earlier, these can be proprietary capabilities KPMG has built, things that are open source in the market, like Ignite Tesseract, or things we've built as drivers of certain more tactical data extractions, which we call our intelligent domain engine. If you look to the left, it covers some of the core fundamentals of the platform. Loon is a way that we store data.
So there's a consistency in how you put something into a particular component and how something comes out of that component. And finally, as one would expect, we have the orchestration layer, which is really powered by OpenShift, and we have some workflow engines in there. But I want to highlight the core infrastructure, because that's where we're going to focus most of our talk today. These are the different database-like applications we're using: Kafka, Postgres, MinIO as we talked about, and also Elasticsearch, though we won't go into that for timing. We'll go through how Kafka is set up in the platform, with its pros and cons, and then we'll also talk through how Postgres is being used. All right, with that, let me turn it over to Hongfei.

All right, thank you, Kevin. For the rest of the presentation, let me introduce how we set up and leverage OpenShift container storage to deploy the databases for the Ignite platform, and share some lessons learned, best practices, and the benefits of deploying on top of OpenShift. The first component I'm going to introduce is Kafka. We leverage Kafka as the message broker to stream our Ignite workflow metadata and job results to the multiple worker containers. To simplify, here is a three-node Kafka cluster with a high availability setup, and each worker container pod has multiple persistent volume claims mounted to it. We have a customized storage class for the persistent volumes, which uses encrypted OpenShift Container Storage (OCS). The storage setup here is mainly for the distributed Kafka messages. Also, in our current version, Kafka requires ZooKeeper to store the cluster information, so we also set up a high availability ZooKeeper cluster.
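As a sketch of what this broker storage setup can look like, here is a minimal StatefulSet with a per-broker persistent volume claim drawn from a custom encrypted OCS storage class. All names, the image, and the sizes are illustrative placeholders, not Ignite's actual configuration:

```yaml
# Illustrative only: three Kafka brokers, each claiming its own volume
# from a custom encrypted OCS storage class ("ocs-encrypted-rbd" is a
# placeholder name).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka-headless
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: kafka:latest          # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ocs-encrypted-rbd   # custom encrypted OCS class
        resources:
          requests:
            storage: 100Gi
```

A ZooKeeper StatefulSet would follow the same pattern, with its own volume claim template against the same storage class.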
As one simple example, we have three ZooKeeper nodes as a minimum quorum cluster, and each ZooKeeper node, similar to Kafka, has multiple persistent volume claims mounted to it. We found that when deploying Kafka and ZooKeeper on top of OpenShift, versus a traditional VM or IaaS-type deployment on cloud, the benefits of OpenShift include the following. First, as was mentioned, there's a strong advantage in building a hybrid cloud strategy with a cloud-agnostic approach using OpenShift. Out of the box, OpenShift offers default orchestration and failover through StatefulSet deployments and built-in replica sets. Also, the whole deployment uses an automated CI/CD workflow, which helps us significantly with Kafka restores, rolling updates, patching, et cetera. Last but not least, through OpenShift we can easily scale our Kafka or ZooKeeper clusters up and down based on workload needs.

Hey, Hongfei, maybe there's one thing I want to add on this slide. One of the challenges we did run into, and had to engineer around: as we were mentioning before, we have the concept of a component. A component can be OCR, a component can be classification, a component can be some heuristic rule that's pulling information off a document. If you're going across hundreds of thousands of documents, and you have thousands of these components spinning up to operate on the documents, there's a lot of communication and traffic going back and forth through Kafka, saying one component's done, next component take it, next component's done.
So all of that interchange between executing component one, two, three, four to produce some type of output put a lot of heavy throughput on how Kafka needed to be deployed and configured on the platform, to meet the SLAs that needed to be in place and to keep the resiliency of the tooling. There were a couple of things the team worked through, and Hongfei can talk through them. But that was initially a challenge: how many messages were going back and forth because of the spin-up of pods to compute those individual components for selected workloads.

Thank you, Kevin. Right, the next component I'm going to talk about is Postgres. In the Ignite platform, we use Postgres as a relational data store for our workflow metadata. Similar to Kafka, we also wanted to deploy Postgres with a high availability cluster setup. What we found is that OpenShift offers Postgres operators through vendor implementations, which significantly reduces the complexity of deploying a high availability Postgres cluster. Also, we have built some customized solutions for backing up the Postgres data, which leverage object storage, MinIO, as a landing-zone type of solution. We dump the Postgres data to MinIO, and when a Postgres cluster needs to be restored or backed up, we can share the data across different clusters or restore it to a new Postgres cluster. When we deployed Postgres on OpenShift, we found the following benefits: easy deployment through the operator, very good integration with storage support, and built-in enterprise-grade high availability, orchestration, and failover, which help significantly with the database deployment.
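As an illustration of what such an operator-managed Postgres cluster definition can look like, here is a sketch in the style of the Crunchy Data operator's custom resource. The exact CRD schema depends on the operator vendor and version, so treat every field and name below as illustrative rather than Ignite's real configuration:

```yaml
# Hypothetical sketch of an operator-managed HA Postgres cluster;
# field names follow the Crunchy-style PostgresCluster CRD but are
# illustrative, not a verified schema.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: ignite-metadata          # placeholder cluster name
spec:
  postgresVersion: 14
  instances:
    - name: instance1
      replicas: 3                # HA: one primary plus two replicas
      dataVolumeClaimSpec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ocs-encrypted-rbd   # placeholder storage class
        resources:
          requests:
            storage: 50Gi
```

The operator reconciles this single resource into the pods, services, and failover machinery that would otherwise be hand-built.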
Also, similar to Kafka, it provides a cloud-agnostic hybrid cloud approach for deployment and easy migration. CI/CD is integrated using our existing tooling, like Jenkins, Ansible playbooks, and Tekton, which reduces deployment time. Last but not least, the built-in security model supports our policies and hardens our deployment. So there is a lot of benefit for us in deploying the Postgres database on OpenShift.

Next, I'm going to quickly talk about another type of storage we leverage directly for Ignite machine learning models. Different from Postgres and Kafka, here we directly use standalone persistent volume claims running on top of OpenShift Container Storage. Like many other machine learning platform ecosystems, Ignite has a model database, or model inventory, to store pre-trained models. Sometimes a model can be very large; if it involves, for example, deep learning or natural language processing, it can be several gigabytes in size. To speed up model prediction and classification when we serve the model, and to avoid repeatedly downloading it from the model database or model inventory, we set up a centralized shared ReadWriteMany persistent volume claim to store those large pre-trained models, which is then shared across multiple machine learning worker containers or pods. This keeps data download time to a minimum, since the data is loaded only once, and it significantly reduces network traffic between the model database and the OpenShift cluster. And given that a model is relatively static compared to the other data we store in Kafka or Postgres, we can use a separate deployment that loads the model at the beginning of the model serving job.
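The shared model volume just described can be sketched as a single ReadWriteMany claim that one loader job writes into and many serving pods mount. Names, sizes, and the storage class are placeholders:

```yaml
# Illustrative RWX claim for a shared model cache: a loader job writes
# the pre-trained model into it once; serving pods mount it afterwards.
# "ocs-cephfs" stands in for a file-based OCS storage class that
# supports ReadWriteMany.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache
spec:
  accessModes:
    - ReadWriteMany        # shareable across pods on different nodes
  storageClassName: ocs-cephfs
  resources:
    requests:
      storage: 20Gi
```

Serving pods can additionally set `readOnly: true` on their volume mount, since they only ever read the cached model.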
It only requires infrequent data updates, and we have a separate deployment job for those model updates. On the right-hand side, it shows that before the deployment, we mount the ReadWriteMany persistent volume claim to our deployment pod, which downloads the model from MLflow, our model inventory. Once the model is persisted there, any model serving pod or worker job can mount the persistent volume claim as ReadWriteMany and reduce the network traffic.

Last but not least, I'm going to quickly touch on the object storage setup inside Ignite. We also leverage MinIO as our object storage layer on top of OpenShift Container Storage. Here, MinIO is deployed as a StatefulSet, and each MinIO StatefulSet has multiple persistent volume claims with a customized storage class. The benefit for Ignite is that MinIO supports many concurrent readers and has an API with secured access keys to allow different worker containers to access the MinIO data. For example, we store runtime logs, job inputs, document lists, et cetera, on MinIO as our shared object storage.

Finally, to conclude our Ignite-on-OpenShift deployment work: we found that leveraging OpenShift, especially its operators, is key for enterprise-grade deployment of the Kafka, Postgres, and machine learning platform storage. OpenShift offers a lot of out-of-the-box functionality to support high availability, failover, and CI/CD pipelines. Also, for better high availability, we prefer to deploy our platform to multiple clusters in different regions and data centers. Enabling ReadWriteMany persistent volume claims is key to reducing network traffic for large-scale pre-trained machine learning models, like deep learning models for NLP, shared across multiple NLP or model serving jobs.
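The MinIO StatefulSet mentioned in this section might be declared roughly as follows. The replica count, image tag, secret name, and storage class are all placeholder assumptions; the distributed-server argument follows MinIO's documented `{0...N}` expansion pattern:

```yaml
# Illustrative MinIO StatefulSet: four nodes in distributed mode, each
# with its own persistent volume claim from a customized storage class.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: minio
spec:
  serviceName: minio
  replicas: 4
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
        - name: minio
          image: minio/minio:latest   # placeholder tag
          args:
            - server
            - http://minio-{0...3}.minio.default.svc.cluster.local/data
          envFrom:
            - secretRef:
                name: minio-access-keys   # access/secret keys for workers
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ocs-encrypted-rbd   # placeholder storage class
        resources:
          requests:
            storage: 500Gi
```

Worker containers then reach MinIO over its S3-compatible API using the keys in the referenced secret, rather than mounting the volumes directly.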
Also, a customized persistent volume claim backup utility is key to helping us quickly rotate or update our existing databases like Postgres and Kafka. Last but not least, migrating from the old storage class to the OCS encrypted storage class gave us better throughput and encryption from the OpenShift storage perspective.
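For reference, an encrypted OCS storage class of the kind this migration targets might look like the sketch below. The provisioner string matches the usual OCS/ODF Ceph RBD CSI driver, but the encryption parameters and KMS connection name are illustrative and depend on the OCS/ODF version and KMS setup:

```yaml
# Hypothetical encrypted RBD storage class for OCS/ODF; parameter names
# and values are illustrative, not a verified configuration.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ocs-encrypted-rbd
provisioner: openshift-storage.rbd.csi.ceph.com
parameters:
  clusterID: openshift-storage
  pool: ocs-storagecluster-cephblockpool
  encrypted: "true"
  encryptionKMSID: my-kms-connection   # placeholder KMS connection name
reclaimPolicy: Delete
allowVolumeExpansion: true
```

Because storage classes are immutable once bound, moving existing Kafka or Postgres volumes onto a class like this is where the backup-and-restore utility mentioned above comes into play.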