All right everybody, welcome back to another OpenShift Commons briefing, as we like to do on Mondays. Today we are bringing you one of the upstream projects behind some of the more interesting workloads these days. Not that your workload is not important, but this is one of the more interesting ones: Open Data Hub, the AI platform. The team is here to tell you a little bit about this project at Red Hat. We have a number of members here: Juana, Vaclav, Chad, Landon, and Beverly. Landon LaSmith is going to walk us through first a little overview of what Open Data Hub is, and then we're going to open it up to Q&A and have an AMA session, as we like to do, and have a little bit of a demo of it. So give us your questions wherever you're watching this, whether it's Facebook, Twitch, or YouTube, or if you're in the BlueJeans, and we'll aggregate those questions and answer them after the demo and lecture part, and have a conversation about what Open Data Hub is and how to use it. So take it away, Landon. Thanks, Diane. My name is Landon LaSmith. I'm one of the engineers on the Open Data Hub team along with Juana, Vasek, and Chad. I'm just going to give a quick overview of Open Data Hub, and hopefully we can answer all of your questions. So on this slide: we're going to cover what Open Data Hub is, give a brief introduction to Kubeflow, the upstream project that we stay in sync with, tell you where Open Data Hub is used, and give you a quick demo of how you can deploy it. So what is Open Data Hub? The original goal of Open Data Hub is to build a platform for data science. We want to make it as easy as possible for a data scientist to stay within their workflow. We know that they have many tools that they use for model training, model development, and model serving, and we wanted to make it as easy as possible to do that on OpenShift.
OpenShift allows us to scale out to different needs and configure the workflow exactly how they want it. One of the issues we tried to tackle is making it so that everybody on the team can contribute to the data science workflow. We want a team of data scientists to be able to work on shared data using some type of storage, and use a development environment that they're comfortable with, in this case Jupyter notebooks, but also allow data engineers and DevOps to work within that workflow to create the best solution possible. So this began what we are now calling the Open Data Hub. Open Data Hub is not an official Red Hat product; it is a community project. We set out to create a reference architecture to provide best practices on how you can deploy these different tools within this data science workflow. We have a lot of information on these best practices, how to deploy Open Data Hub, and how to use the different components of Open Data Hub on our website at opendatahub.io. And the core part of Open Data Hub is the meta operator, or meta project: the Open Data Hub operator. With this operator we can deploy the different tools that will be used in the workflow by the data engineer and the data scientist, and make it easy for DevOps to deploy this project. So if you want to deploy Open Data Hub, you can find it on any OpenShift cluster under OperatorHub on that cluster; just look for Open Data Hub. It is a community operator that's available to install for free, no Red Hat subscription required. The Open Data Hub ecosystem combines a lot of different parts where we gather input on use cases and best practices for Open Data Hub. We work with a lot of customers, internal and external, to lay out how we want Open Data Hub to proceed. We take public requests, and you can contribute to Open Data Hub. We also work with Red Hat partners to see if their tools help further the Open Data Hub.
And we work with a lot of upstream components that have downstream projects within Red Hat. Our goal is to use completely open products within the Open Data Hub and also provide a path where you can substitute in these downstream products if necessary. But everything is freely available. This nice graphic shows a few of the components that are in Open Data Hub. We focus on Jupyter notebooks for the development environment, object storage provided by Ceph, Apache Spark for data engineering, and Seldon for model serving. Argo Workflows are the core pipeline technology that we've used in the past, plus Prometheus, Grafana, TensorFlow, and Kafka. With the recent release of Open Data Hub 0.6 (we're currently on version 0.7), we are an official downstream of Kubeflow. The Kubeflow project brings together all of these data science tools into an ecosystem that works on Kubernetes, and we do the work to make sure that this workflow also works on OpenShift. But we also bring in a lot of products that are covered by Kubeflow. And all of this is available on OperatorHub. So, a little bit of backstory about Open Data Hub. Probably a year ago, we had our official release of 0.5. This contained a few of the components: JupyterHub, a data catalog with Hue and Thrift, GPU support, and Argo, all in an Ansible operator. With the switch to downstream Kubeflow, we refactored the operator so it's purely based on Go. It works with the KFDef manifest, and it fully supports Kubeflow products. So using this Open Data Hub operator, you can deploy Kubeflow on OpenShift in addition to Open Data Hub components. The current release is 0.7, and you can see a few of the components we have released. We have full support for Kubeflow version 1.0; you can deploy that with our operator on OpenShift.
KFServing support with our operator, I think this can be mixed in; we can use it with Open Data Hub. Full CI testing on all of our updates and releases: as soon as we submit any updates to Open Data Hub, we run a full battery of CI tests to make sure that a new component doesn't break any existing functionality, and that it provides working new functionality. You can mix and match ODH and Kubeflow components. Right now we're verifying a small subset, but with the 0.8 release, we plan to verify and test all of the default Kubeflow 1.0 components mixed in with ODH components and some OpenShift Container Storage. The current operator for Open Data Hub is a phase one, basic-install operator. This means that it will deploy Open Data Hub and do some minor updates, but for the most part it does a full install. We have plans, as time goes on throughout the year, to bring this up to a phase five operator, but those are long-term plans. As of right now you can deploy your Open Data Hub architecture on your OpenShift cluster without any issues. Kubeflow, for those that may not be aware of it, is an open source project dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable. A lot of the work we did to bring Open Data Hub in line with Kubeflow was to make sure that there are no issues when going from Kubernetes to OpenShift. We had to introduce a lot of updates and fixes to make Kubeflow more secure. We want to make sure that not every container is running with root privileges, and that you don't have to elevate any container privileges beyond the standard runtime permissions. And then we verify that model training and serving work on OpenShift. So these are a few of the goals for Open Data Hub and working with Kubeflow: we want to incorporate best practices and a simplified install.
We want to use UBI, the Universal Base Image, as the base for all of the Open Data Hub components. This provides anybody deploying Open Data Hub with the level of security that comes with that UBI base image, so you get a lot of the Red Hat effort toward providing a secure base image in Open Data Hub. We also want to make sure that we secure the deployment of Open Data Hub, and by extension Kubeflow. That means using well-defined permissions: we eliminate any containers that require root privileges and work within the standard, secure deployment that OpenShift provides. This is a quick graph of some Open Data Hub components that we are bringing to the new releases, 0.7 and 0.8. We're working on allowing you to deploy storage along with Open Data Hub, based on Ceph object storage. We have components that are using Postgres, and as of right now you can deploy Kafka, and you can deploy Spark clusters. We're working on updates to provide data exploration: we do have SuperSet, which allows you to do data visualization, so you can work directly with your external databases or data sources to visualize that data. We're working on adding data cataloging with Hue so that you can navigate your object storage, but also run Spark SQL queries on that data. We're hoping to get that into the next release. With the ability to mix Open Data Hub and Kubeflow components, I think we do support TF Serving, and I think PyTorch is in the final verification steps. We do support Seldon model serving, and Argo and Kubeflow Pipelines, along with monitoring by Prometheus and Grafana. And like I said, in the bottom right, our data scientist workflow includes JupyterHub, and we fully support OpenShift authentication for those notebooks.
So, JupyterHub being a multi-user notebook server, a team of data scientists can work within their own notebooks separate from each other, but potentially share data, either through object storage or even by sharing notebooks, by allowing others to access their notebook pods. All of that is fully integrated with the Spark cluster that you can deploy. And we have our AI Library with example AI models that you can utilize in your workflow. If you want to join Open Data Hub or follow it, as always, feel free to go to our website at opendatahub.io. We are fully functioning on GitHub at github.com/opendatahub-io. If you want to track any issues or progress that we're making in the project, all of our Open Data Hub projects exist under that opendatahub-io organization. Again, we're a community project, so feel free to take a look, file issues if something doesn't work correctly, or submit PRs. If you see an issue or you want to add a new feature, definitely go there and submit a PR. If you want to track progress, we have an announcements list you can subscribe to, and then a contributors list if you go the extra mile to submit PRs and want to become a contributor. And we have bi-weekly Open Data Hub community meetings whose archives you can track on our GitLab site. I want to clear up some confusion here: our old operator exists on GitLab, but to make sure that we can stay in sync with Kubeflow updates and become a fully functioning downstream of Kubeflow, we migrated to GitHub. A lot of our old projects are still on GitLab, the Open Data Hub community being one of those. It's still current for the Open Data Hub community, so you can see old meetings and get notes from any of the meetings where we have guests present use cases that are utilizing Open Data Hub, or open the discussion to add new features to Open Data Hub.
These are some examples of where Open Data Hub is being used. Originally Open Data Hub was an internal project that started with a basic ELK stack, if I remember correctly, and we worked with internal customers so that they could work with their data in an easy fashion. We provided storage and Elasticsearch to interact with that data, and from that we got a lot of customer use cases that helped to form the Open Data Hub. A lot of that internal work we transitioned to the Open Data Hub, so that our experiences with this type of workflow can be utilized by the community as a whole. One of the early adopters of Open Data Hub is the Massachusetts Open Cloud, a collaborative effort of a few universities to run their data science and high-resource workloads on a high-availability cloud. Open Data Hub is part of the backbone for some of this work, where professors, researchers, and even some students can get access to run their OpenShift or data science workloads. So I'll give a quick demo. I just want to demonstrate how you can get access to Open Data Hub and deploy it within your workspace. Let me switch over to my OpenShift console. Here I have a basic OpenShift cluster; you could deploy this on any OpenShift cluster. Right now I'm using a three worker node cluster, which is pretty standard for any OpenShift install. We do have support for deploying on something as small as a CRC, or CodeReady Containers, cluster. You could also use OKD, which I think just went GA, general availability, for OpenShift 4 clusters. The current iteration of Open Data Hub supports OpenShift 4.x; the current version is 4.5, which I think was released a week or two ago. So any of the freely available OpenShift clusters can be used to deploy Open Data Hub, but if you go down to CRC or OKD on your laptop, you'll need to scale it accordingly.
So if you want to deploy Open Data Hub, you can log into any OpenShift cluster and go to OperatorHub; this should be available in every single OpenShift 4 cluster. And you search for Open Data Hub. Actually, let me backtrack. Let me go ahead and create the namespace first; just create any namespace, I use opendatahub. And here you'll see the Open Data Hub operator. Again, it's available as a community operator, which means it's freely available for anybody to deploy on any OpenShift cluster. You'll get a rundown of the current components that we deploy as part of Open Data Hub, with additional info about where you can track the project, where we're pulling the operator image from, and more. This page also describes the available channels that you'll see in the next step. When we click install, we're presented with the standard options for any operator. The current iteration of Open Data Hub is a cluster-wide operator, which means that you can deploy a KFDef custom resource, which the Open Data Hub operator watches for, into any namespace in the cluster, and the operator will see that and deploy Open Data Hub. The update channel is Beta; Beta is what you want to use right now, as that is where we're hosting our new operator. Legacy is the older, namespace-bound Ansible operator. That still works, but we are providing minimal support for it, so a lot of the components deployed there will not be receiving updates, since we're doing all our updates on the Beta channel. And we'll leave the approval strategy as Automatic, which means that whenever we release newer versions of Open Data Hub, the operator will update automatically. And hit Subscribe. Now we're just waiting for the operator to be installed by OLM, the Operator Lifecycle Manager. We utilize OLM a lot.
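For readers following along outside the console: installing through the OperatorHub UI creates an OLM Subscription behind the scenes, which you could also apply yourself. A rough sketch, assuming the channel, catalog source, and namespace names match what the demo shows (check the operator's OperatorHub page for the exact values):

```yaml
# Hypothetical sketch of the OLM Subscription that the console
# install of the Open Data Hub operator produces.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: opendatahub-operator
  namespace: openshift-operators    # cluster-wide operators typically land here
spec:
  channel: beta                     # "legacy" is the old namespace-bound Ansible operator
  name: opendatahub-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic    # new operator releases install automatically
```

With `installPlanApproval: Automatic`, OLM applies operator updates as soon as they hit the channel, which is exactly the behavior Landon picks in the demo.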
So in the older operator, one of the issues that we encountered was that we had to recreate the deployment strategy for every component we deployed. If we deployed Prometheus, we had to create the deployment objects, the roles, the service accounts; every single item that was required to deploy a component, we had to embed in the operator image. Now, if there is a component that Open Data Hub uses that is available in OperatorHub, so any component that has put forth the effort to be listed on OperatorHub, we can easily leverage that entry for Open Data Hub. We're not recreating the deployment strategy or plan for every component; we can literally say, for Seldon version 1.2, reach out to OLM and deploy that operator. That's good because we aren't required to stay in sync with their update strategy. As Seldon updates their operator and pushes that to the OpenShift OperatorHub, we automatically get those updates for that version, and OLM will handle the deployment. So now that the operator is deployed for Open Data Hub, we'll just click on it, and you'll get another overview of the deployment. The Open Data Hub operator is looking for KFDef custom resources. A KFDef is essentially the customized manifest format for Kubeflow; once you create one of those on an OpenShift cluster, the Open Data Hub operator will see it, and based on the information that's in there, it will deploy it. So we'll go ahead and click Create Instance, and hopefully you can see this, but this is a sample or example KFDef that we provide. What you can do is look through this, and you'll see that every entry in this applications list has the same basic format: a kustomizeConfig with a repoRef, and a name. kustomizeConfig, repoRef, and a name.
So this determines what is getting deployed as part of this KFDef. Here you'll see that we're deploying AI Library cluster and AI Library operator entries. One of the things that we set out to do whenever we add a component to Open Data Hub is to separate out the cluster-wide permissions and cluster-wide actions, mainly things like deploying to a cluster-wide namespace or checking that required CRDs exist, into the cluster component, while anything specific to the deployment of the operator or application lives in its own entry. So a lot of these components will have two portions, two configs. Here you'll see kafka-cluster and kafka: anything that's not named cluster, so the plain kafka entry, has the deployment files necessary for the Kafka deployment, while cluster is generally the CRDs and any required cluster-wide options. As you look through this KFDef, you'll see all the components that we're deploying: Kafka, Grafana, the radanalytics Spark operator, Prometheus, JupyterHub (JupyterHub will be the entry point for a lot of the Open Data Hub use cases if you watch any demos or examples), Airflow, Argo, and so on and so forth. With the latest release of Open Data Hub, this is one of the new features we wanted to focus on: you'll see in this repos section that we have kf-manifests and the regular manifests. kf-manifests is a downstream fork with the fixes and updates that are required to deploy Kubeflow on OpenShift. If you go to github.com/kubeflow/manifests, that is the pure vanilla Kubeflow deployment that will work on Kubernetes, and they do have support for additional cloud providers: Azure, IBM, Google Cloud. But in this opendatahub-io manifests fork are all the files, updates, and fixes that you need to deploy on OpenShift. And the plain manifests repo, which is the odh-manifests repo, is the Open Data Hub proper.
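The applications-and-repos structure being described looks roughly like this. This is an abbreviated sketch, not the full sample KFDef; the exact paths and tarball URIs are assumptions, so check the example linked from the operator's OperatorHub page for the real values:

```yaml
# Abbreviated KFDef sketch showing the repeating
# kustomizeConfig / repoRef / name pattern, plus the two repos.
apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  name: opendatahub
  namespace: opendatahub
spec:
  applications:
    - kustomizeConfig:
        repoRef:
          name: manifests          # ODH proper (the odh-manifests repo)
          path: kafka/cluster      # CRDs and cluster-wide pieces
      name: kafka-cluster
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: kafka/kafka        # the Kafka deployment itself
      name: kafka
  repos:
    - name: kf-manifests           # fork of kubeflow/manifests with OpenShift fixes
      uri: https://github.com/opendatahub-io/manifests/tarball/master
    - name: manifests              # the curated Open Data Hub components
      uri: https://github.com/opendatahub-io/odh-manifests/tarball/master
```

The repoRef name is what ties each application entry to one of the two repos, which is how ODH and upstream-Kubeflow components get mixed in a single KFDef.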
So these are components that we've specifically curated as part of the Open Data Hub reference architecture. If you deploy anything using manifests as the repo name, that is the Open Data Hub implementation. If you see anything that references kf-manifests as the repo name, that is based on the upstream deployment of Kubeflow, with a few fixes we have added to make sure it deploys successfully on OpenShift. Right now, I think everything just references manifests, but as the next version and newer versions are released, you'll start to see more and more mixing of Kubeflow and Open Data Hub components. Potentially you'll see the TFJob operator, the PyTorch operator, maybe even some pipeline work; it just depends on what we have time to verify before that release. So in order to deploy Open Data Hub, you just hit Create. You'll see the KFDef file is created, and you can view the YAML. And now we just wait for everything to deploy. Slowly you'll see the different components come online based on that KFDef: AI Library operators, Seldon controllers, SuperSet, and so on and so forth. Once these pods come online and deploy successfully, you can start to use any of the components that are deployed. This may take a few minutes, but that is Open Data Hub in a nutshell. So I don't know, while we wait, if we just want to go ahead and open the floor to any questions. Certainly. We always have questions. One of them, while you're doing this, maybe explain a little bit: one of the questions that's often asked is, is Open Data Hub available for generic Kubernetes, which flows into the question, is Open Data Hub available on operatorhub.io? Yeah, so there's a lot of confusion between the OperatorHub that you see in OpenShift and operatorhub.io. The operatorhub.io website is for operators that are certified to work on vanilla Kubernetes; so not OpenShift, but the upstream Kubernetes server.
So we are certifying that we work on OpenShift, which means that we are only available in the OpenShift OperatorHub that is deployed with all OpenShift clusters. Just because you don't see us on operatorhub.io does not mean that Open Data Hub isn't available on OperatorHub; it just means that we're certifying that we work on OpenShift. Any OpenShift deployment, whether it's OKD, CodeReady Containers, OpenShift on AWS, or OpenStack, if OpenShift is supported on that type of infrastructure, then you have access to Open Data Hub. Yeah, and you did mention, and I'll mention this while we watch your screen scroll here, that OKD is now available. OKD is the open source distribution of OpenShift, and it went into general availability on July 15th. It's running on Fedora CoreOS, but you should be able to deploy Open Data Hub from OperatorHub easily on OKD. I don't know if anybody's tested that yet, but if you have, let me know. I am one of the chairs of the OKD working group; we'd love to get your feedback on that and help you through it if there are any issues whatsoever. I don't think anyone on the ODH team has done that yet. It's probably too soon; that was just last week, so we definitely have to get that tested. Yeah, and just to build on top of what Diane just said: if you deploy it on OKD, or on any infrastructure provider's OpenShift cluster, and you're experiencing issues, please, please submit the issue to any of our projects. If something isn't working with the operator, create an issue in opendatahub-operator. If any of the components aren't working correctly, feel free to file an issue on odh-manifests. If you are deploying pure Kubeflow on OpenShift, then feel free to file that on the opendatahub-io manifests repo. And if you file it to the wrong one, that's fine; we will definitely make sure it goes where it needs to be.
We'll definitely straighten that out and point it in the right direction. And really, if you're listening to this and you are running this reference architecture, or want to, please do reach out. I'm seeing it pop up in lots of conversations across the ecosystem, from healthcare to COVID tracking to all kinds of interesting things, so it's definitely starting to get a lot of overflow into other spaces, market spaces, and use cases. We're definitely looking for more feedback, so send any bugs, anything you find, to us. So how's your demo going? Everything's deployed. We're missing one key thing for JupyterHub, but I'll investigate that and we'll go from there, so we can look at the other components. So, there are questions. Let's see. One of the things I'll say: whenever you deploy Open Data Hub, we make sure that everything's ready in a state where you can use it immediately. If any of the components need to be accessible, so not just backing components where component A is utilizing a service from component B, but something that the user needs to interact with, we make sure that there's an OpenShift route created for it, so that once Open Data Hub is deployed you can just go to Networking, Routes, and access that component. Here you'll see SuperSet: now that this has been deployed, it's ready to interact with, and you can start your workflow from there. What was the piece of JupyterHub that didn't deploy here? This is why we love to do live demos while we're live streaming: it makes it much more interesting, and people believe us that it actually works and it's not smoke and mirrors, and this truly isn't smoke and mirrors. Let me check the operator; I'm trying to see if there's any mention of JupyterHub.
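For reference, the routes Landon mentions are ordinary OpenShift Route objects created by the deployment for each user-facing component. A sketch of what one of those might look like; the names, namespace, and TLS settings here are illustrative assumptions, not values taken from the demo cluster:

```yaml
# Illustrative Route for a user-facing ODH component such as SuperSet.
# The actual route is generated by the deployment; names are assumed.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: superset
  namespace: opendatahub
spec:
  to:
    kind: Service
    name: superset       # the backing Service for the component
  port:
    targetPort: http
  tls:
    termination: edge    # terminate TLS at the router
```

In the console these show up under Networking, Routes, which is why that page is the quickest way to find the entry point for each deployed component.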
Maybe while you're doing this, we can answer a few more questions. I'll unmute some of the other folks from your team, and you can debug it and just raise your hand when you figure it out, or not, and we can do that. So let's see who else we have. Juana is here. Vaclav is here. Beverly is here. Hey, Juana. How are you? And it's pronounced Joanna, right? It's Joanna, and I'm going to just kick myself because I ought to be able to remember that each time. No worries. I'm sorry. And Vaclav is here. So while he's doing that, a couple of other questions. I think you answered the one explaining where it stands in terms of vanilla Kubernetes versus OpenShift, and I think we do have a pretty strong, fully open source stack with the complement of OKD now. So anybody who wants to do a full stack without licensing OCP could if they wanted, and I'll see if I can get the OKD working group to find someone to test it out for us. But one of the questions that came in, and Beverly is probably going to guide us through some of them, maybe the first question if you want to go through that. Absolutely. So Juana, are all components from Kubeflow available or included in Open Data Hub? Actually, no, not all of them are. For example, KFServing today doesn't work with the Kubeflow 1.0 that we have. And if you look at the example manifest that is linked from our operator's main page description, you'll see that some components are commented out; these are the components that we are still working on getting to work on OpenShift. So it's a work in progress for us. So Juana, right now it's probably a heavily Red Hat-led contribution base. Do you have people external to Red Hat contributing and helping out? We do have a few people, mainly opening issues and guiding us through fixing the issues.
I wouldn't say we have major contributions, but we do have some contributors from IBM contributing heavily with regards to the operator. Our community meeting is always busy with many different developers from different companies, and we work really closely with many of the component owners, such as Seldon and Kubeflow. So I think as this community expands, the end users become really important, because they're giving the feedback on how it's being used, and the integration partners like Seldon and others become important too. It'll be interesting to see how the ecosystem grows around this, because you have incorporated a whole lot of partner and integration points there. That's going to be fun to watch as we go through. And Diane, now that you spoke about the end users, we also have a question on whether we have active use cases for Open Data Hub. So from a use case perspective, from an industry perspective, we do have one use case that's already out there, which is the fraud detection use case; we have all the code and all the instructions on GitLab for it. And then we're working on a couple more. We also have AI on the edge that Landon's working on, and then we're working on other industries; I think we have one in the banking industry and a couple more coming down the line. Is that what you mean by use cases, or did you mean how is Open Data Hub being used currently? Well, that answers the question, but we could also look at it in terms of, do we have clients that are already using Open Data Hub in their infrastructure? Yeah, so we do have a couple of clients. We have ExxonMobil using it, and they have done many presentations about using Open Data Hub. We also have an internal implementation of Open Data Hub that is being used by internal data scientists and data engineers at Red Hat.
And then we also have the MOC that Landon described, and I'm sure Vasek can add a couple more details about this and where it is today. So with the MOC, we are working on support for Open Data Hub on POWER9 machines and POWER9 OpenShift clusters, and we have Open Data Hub deployed in the MOC, being used by students for their research work. We have a couple of early adopter projects there. I don't think that any of them is live right at the moment, and part of that is that we have moved to using and verifying the ODH deployment on OpenShift 4, while the MOC is still running on OpenShift 3. It's been kind of hard to keep it running there, so we are working with them; we have weekly syncs to see where they are, and when they have OpenShift 4 ready for us, we will come back to having Open Data Hub fully running there. Part of our roadmap, which you can find on opendatahub.io as well, is for the next release to have a plan for how we could do CD, continuous deployment. Landon mentioned we have an internal deployment of Open Data Hub running at Red Hat, and then we have that partially public deployment on the MOC, where researchers that are part of the Massachusetts Open Cloud can use it. Our goal for the next release is to come up with a plan for a reproducible continuous deployment solution, or process rather, where new releases of Open Data Hub would go to our internal instance and to the MOC deployed instance. It would hopefully also be reproducible for our users, where they can use that process to get their deployments bound to our releases and things like that. I saw on your roadmap as well that you were thinking about disconnected deployments.
Yeah definitely it is a big is a big ask and not only for open data but also for the key flow the upstream project that we pull components from there is plenty of people that are running disconnected be that with edge deployments or or generally on a remote locations where they maybe only have mobile connections or something like that and they need to be able to make sure that they can control the traffic that is coming in and out of the clusters so we want to make sure that that is possible open data hub when deploying that like everything goes kind of smoothly they can pre pool the images and they can deploy to a disconnected environment whenever they are ready so we'll be looking at that probably this this fall we've been looking in that for some time in our previous versions which was based on Ansible operator but it was kind of hard because that was just a lot of parameterization with Ansible and it was just like all the repos all the registries all the images and it was kind of a mess so we hope that with this kubeflow based solution it will be a bit easier and also kubeflow was working in the past I'm not 100% sure if they were able to finish it but we will definitely look at the kubeflow solution for that but that there is any and if not maybe we can help or finish or bring it back to the community and see what they have in mind for that I know because we had someone come to the okd working group who wanted to do on arm 64 an ml use case using okd in a disconnected fashion so I think maybe open data hub is a bit overkill for what they were trying to do but I think it gives them a good roadmap and a good maybe a collaboration point that to work through so I'll see if I can feed you that use case as well I think I think it's an interesting point whether open data hub is overkill you don't have to use all the components right yeah if your only reason to run open data hub is to deploy seldom and something else then maybe it's it's still good to use open data 
Hub, because we have verified that the components run well on OpenShift, and there are some integrations, with more coming. If it's just one component, it's already in Operator Hub and you can just depend on it, so maybe it doesn't make sense to run Open Data Hub. But if it's, say, three things that you would be running, it gives you a single point where you apply one custom resource and it all comes up, integrated and configured in the background.

Did you get that working, Landon? Yes. So, just so it doesn't look like magic, here is what we did. I was playing around with the KfDef, and the operator was throwing an issue with the Grafana deployment. It was a timing issue: we rely on OLM to deploy Grafana from its configuration, and since we have it separated into a Grafana cluster piece and a Grafana application piece, we needed a little wait time of a few seconds between the two. One of the dependencies that the Grafana deployment required wasn't present yet; it would have been deployed by the Grafana cluster piece, so it was a small race condition. Let me just go over what we did. When you have a problem, always try the simplest solution first, so I went to the KfDef and moved the Grafana component to the bottom. This is pretty simple: I just cut that text from higher up in the KfDef YAML and moved it to the bottom. Once we save that, it triggers an update, which the operator detects, and then it reprocesses the KfDef. Thanks to the previous attempt to deploy Grafana, all the dependencies were already installed, and now Grafana deploys successfully. That's all I did, and it unblocked the dam: once that was resolved, all the components below Grafana deployed successfully. So you'll see we have a lot more deployments now: Argo is available, Grafana is online, and JupyterHub. Then here we can
just sign in; luckily I didn't expose my password to the world. Again, we're using OpenShift OAuth for a lot of these components by default. And here is one of the customizations we've added to JupyterHub, where you can select your notebook from a list of notebooks. We have a minimal notebook, which is pretty bare bones (I think just Python is installed), a SciPy notebook for the SciPy library, a Spark notebook that has Spark 2.4.5 and Hadoop 2.7.3, a Spark SciPy notebook, and a TensorFlow notebook. Any user that deploys this has access to them: if they have basic read access to the namespace, they can deploy their own notebook. And we have different sizes, so if you have a team that needs different-sized notebooks, there's the ability to handle that. These are the defaults that we provide, but you can provide your own custom resources; we support that both internally and externally. So if you wanted to change the small/medium/large configurations to be, say, 10 CPUs and 256 megabytes (or gigabytes) of memory, or even a profile with small CPU but large memory, you can do that. If the user wants to add any environment variables, they can, and then they just spawn the notebook. At this point it is under the full control of the user: they don't have access to the project space where JupyterHub is running, but they have full access to their notebook pod. If anybody has any questions about the process we went through to debug this, or about any of the components, feel free to ask. I think Beverly has a couple more queued up here.

Yeah, so we've got a question: since Open Data Hub is a platform or blueprint for building an AI-as-a-service platform, can you talk about whether it works with GPUs? Yes, it does; we have full support for GPUs. Open Data Hub does not do GPU enablement itself, but the notebook that we
just spawned gives the user the option of requesting GPUs. Let me do a quick shout-out to opendatahub.io: we have a quick guide for how you can utilize GPUs in Open Data Hub, with links to upstream partners. Right now on AWS you can add a GPU node, and you would use the NVIDIA operator that's available in, I'm not sure if it's Red Hat operators or community operators. It's Red Hat operators, I'm pretty sure, but I will double check. Since it's in Red Hat operators, you will need a fully subscribed cluster if I'm not mistaken, but access to the NVIDIA operator is "free" (I'm using air quotes): if your cluster has access to Red Hat operators, then you have access to the NVIDIA operator. The NVIDIA operator is responsible for doing GPU enablement. You provide the GPU node and install the NVIDIA operator, and it handles the rest. It requires a dependency that's installed automatically, the Node Feature Discovery operator, which will essentially catalog every node in the cluster and give you labels for the different hardware features that are available. Once it sees the right label, the NVIDIA operator will go out to that node and install the appropriate drivers for the GPU that's installed. Once that's done, you should get this line here when you describe the node: if you `oc describe` that GPU node and you see a non-zero value, that means you have that many GPUs available for requests. At that point Open Data Hub can request some number of GPUs, and from there you can spawn any notebook that will request the GPUs and use them in your model development; you have full access to that GPU. I think right now all of our examples use TensorFlow to crunch the numbers on the GPU. Hopefully that answered the question.

That was great, Landon. And can you talk about what the AI Library is? That's a good question.
Maybe Chad can answer that; I know he's done some work with the AI Library. If you pop over to the docs, there's a little overview of what the AI Library is. Here we go: the AI Library is an open source collection of AI components, machine learning algorithms, and solutions to common use cases, to allow rapid prototyping. Again, if you have any questions about any of the components that we provide in Open Data Hub, or want to know more about them, feel free to go to opendatahub.io. We're always improving the docs and increasing the amount of documentation that's available for different components, so as we add a new component, or update a component to add features, that will be covered on the opendatahub.io website. In short, it's a collection of models that you can use in your workflow, and we're using Seldon: Seldon is a dependency for the AI Library, where any of these models will be deployed and the API will be available for you to submit data to.

Then, back to the GPU topic, we just got a question coming in from Oleg on YouTube: how do you automatically run calculations on a free GPU? Is there any spawner for that, for GPUs with some RAM available? Sure. So how does this generally work in OpenShift with GPUs? Basically, when you're spawning a container and it requests some resources, which can be CPU, memory, or special resources like GPUs, the container will run on a node that can provide them, and based on the configuration it will get those resources. So for memory, if you ask for 100 gigabytes of RAM and there is a node which can accommodate that container, it will get it.
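The scheduling behavior described above is the standard Kubernetes resources stanza. A minimal sketch, with illustrative names, image, and values (none of them from the talk):

```yaml
# Illustrative sketch: a container asking the scheduler for memory and CPU.
# The pod is only placed on a node whose free capacity can accommodate
# the requests; once placed, the container is guaranteed those resources.
apiVersion: v1
kind: Pod
metadata:
  name: training-job                       # hypothetical name
spec:
  containers:
    - name: trainer
      image: quay.io/example/trainer:latest  # placeholder image
      resources:
        requests:
          memory: "100Gi"                  # the 100 GB example from the discussion
          cpu: "4"
```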
If it's a GPU, and there is a node that has a free, unassigned GPU, it will get that GPU. Right now there is no good solution, as far as I know, for splitting GPUs or anything like that: if you are talking about using a GPU, it will be one GPU per container, or multiple GPUs per container, but it cannot be multiple containers per GPU. There are hacks around it, but they don't really work yet. So we cannot automate this: you can't just say in your code running in a container, "hey, if there is a GPU, I'd like to run this on the GPU." That's not how it works. The container spec itself has to specify that it requires a GPU to run; if there is a free GPU, the pod will be assigned to that node, it will run there, the GPU will be mounted into the devices of that container, and the code inside the container can use it. The problem is that as long as that container runs, the GPU is allocated and cannot be allocated to anything else. So there is no smart way right now to say, "okay, if there is a free GPU, run on the GPU; if not, run without it." We don't have that, and I don't think anyone really has it. Potentially you could write an operator which would take your code, inject information about whether there is a free GPU, and based on that the code would change its execution path, so that depending on the state of the cluster it would either get or not get GPUs. But I haven't seen such a solution yet. In general there is no automated way to decide this, in Open Data Hub or in OpenShift in general.
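For the GPU case just described, the request uses the extended resource `nvidia.com/gpu`, which the NVIDIA operator exposes on the node once the drivers are installed. A hedged sketch (names and image are placeholders):

```yaml
# Illustrative sketch: requesting one whole GPU. GPU requests go in
# "limits", must be whole numbers (no fractional GPUs), and the GPU
# stays allocated to this container for as long as the pod runs.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-notebook                       # hypothetical name
spec:
  containers:
    - name: notebook
      image: quay.io/example/tensorflow-notebook:latest  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1                # one GPU per container; not shareable
```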
Thanks for that. I'm looking at the time and we're almost to the end of the hour, so maybe, Landon, if you want to share that slide where people can find additional resources again while we talk a little bit more, that would be a great way to end the hour. And I'm just wondering, because you have lots of different partners and different integrations into this: have you done anything, just because they are now part of our family, with the IBM Watson stuff? Has anyone integrated any of that in and used it from a Jupyter notebook, or is that something for a future briefing?

We actually have an issue created around this. IBM provides CUDA-enabled container images. As Open Data Hub, and as Red Hat, we cannot redistribute CUDA binaries in our images; we always have to build them on the spot. So if you deploy Open Data Hub and you want to use GPUs with CUDA-enabled images, you have to build them in your cluster, which is fine: we provide all those build configs and everything, it just takes time and some resources for the build. IBM, on the other hand, provides these images in the actual Red Hat registry, so we have an issue open to look at whether we can use that image as a base for some of our Jupyter notebook images, so that we don't have to rebuild them but can leverage what they already provide. On the other front, IBM is very active in the Kubeflow community, so we talk to them often in our community calls and in the Kubeflow community calls. The Kubeflow operator, which is now the base for the Open Data Hub operator, has been built by an IBM team with our guidance and contributions of ideas, documentation, and some code, but they did the majority of the work, the open source team at IBM. So really good collaboration there.

Awesome. Well, we're going to have to get them on again soon and see if we can't make that all work and explain how it works too. I really want to thank Beverly for stepping
up and organizing this today and making it happen, and the whole team from Open Data Hub for coming, answering questions, and sharing your wonderful project. Congratulations: it's really come a long way since the last time I did an upstream conversation on it, so it's amazing to see all this. I know I've been talking with folks like Guillaume, who is up in Canada as I am, around some of the work that he's doing with the COVID project that the Ontario folks are running, and hopefully we can get him back on again to talk about that some more; I think we did a briefing on it a little while ago. A lot of people out there in lots of different spaces are leveraging what started out as simply a reference architecture and has turned into a real community, so kudos to y'all for making this happen, and thank you, Landon, for making the demo work and explaining the fix. We will have you back again soon for new updates and new use cases. So again, here's all the information you need to find everybody, and hopefully we'll get your feedback and have a few of you on showing us what you're doing with ODH soon. Thanks again, everybody. Thank you, bye. We will upload this and link the slides on the Red Hat OpenShift YouTube channel, and I'm sure the Open Data Hub folks will steal that video and put it out on their feeds as well, so look for that shortly. Thank you all for taking the time today; take care and be safe.