Thanks, everyone. I'm Sherard Griffin, a senior manager at Red Hat in the AI Center of Excellence, where we work on Open Data Hub, and I'll explain a little more about what Open Data Hub is in a second. All right, we're good to go.

Before I dive into this: Dan did an excellent job talking about what it means, from an ethical standpoint, to do AI, and what we have to think about to turn it into a practical application. I'm going to take it one step further and get a little more into the technical part. How can you, and your customers, get started with your AI initiatives? And we're going to round that out with something called Open Data Hub.

I'll start with a little background on how we got the name "data hub." We started this project internally at Red Hat, and the focus was more specifically around aggregating data. That was the primary reason I came to Red Hat: my experience dealing with big data. We knew we wanted to do that on OpenShift and show OpenShift's capabilities as a platform for data engineering and ingestion. Once we had the data lake, one of my colleagues, Marcel, who heads up the AIOps team, said, "Hey, Sherard, we have all this data, and I want to do some data science work on it. Where can I point my data scientists?" And we said, "Oh, we've got all these tools for data ingestion." "No, no, no, I want to do more of an AI type of thing." So we decided to figure out how to bake AI into what we were doing on top of OpenShift, and that's how we got Open Data Hub. The problem we were really trying to solve was this: instead of a data scientist bugging everyone on my team every time they want to do something, how can we give them more of a self-service type of infrastructure?
How can they just go into an environment, request the resources and technologies they want, have that working in a collective ecosystem, and then get on with their initiatives and get results out of it? If you look at that whole self-service model, that's one of the big drivers of Open Data Hub: how do we enable data scientists to do what they need to do in a flexible manner? It's what we found was needed internally, so that I don't get a bunch of ServiceNow tickets and onboarding requests. But we've also talked to a lot of customers. We've had a bit of a roadshow over the past year where we've talked to many different customers, and it turns out they're interested in the same thing.

Some of the current challenges that data scientists, both internally at Red Hat and at our customers, were facing: one of the big ones is that they were all working in their own isolated, one-off environments, whether that's a laptop or a server tucked away underneath one of their desks. It's very challenging for them, number one, to have a way to share their work, to take some model they built and say, "Hey, this is really cool, it does something tangible, why don't you go check it out?" The other big thing is simply limited resources. Imagine the environment where you do have that machine tucked under your desk. What happens if you need more hardware? You have to bubble that up the chain. And even if you have traditional IT infrastructure, what happens if you need more hardware?
You have to send a request to IT, IT has to order the hardware, the hardware takes a few weeks to come in, and next thing you know you've spun it up and you're off to the races, but, by the way, it took you three months. So we're trying to find ways to address those challenges, to not only lift the burden from IT but also make things a little more flexible for data scientists.

What we gravitated towards was OpenShift to help solve that problem. When you look at what we're trying to do, it's not that different from an application lifecycle: you want some kind of self-service environment, and in this case we're replacing the term "developers" with "data scientists." You want them to be able to do all the work they need to do and push something out into production.

When we look at why OpenShift was so key and so relevant for us to build this platform on, the number one thing that stood out is that it gives us such an easy mechanism to deploy something into production. It's very similar to what an application developer does: you have these iterations of testing something, and then you want to push it out. The other thing it allows us to do is load balance these services: you can scale out horizontally, but you can also scale vertically very easily, especially if you integrate OpenShift with something like OpenStack. So every time a data scientist deploys a model, that model is a microservice, and we scale it out depending on demand; I'll show you an example of that shortly. We also have the ability to orchestrate these machine learning microservices: schedule them for training, deploy the model as a service, do whatever we need to do there. And we don't just deploy into one environment. We
actually have a real-world problem today at Red Hat: some teams have infrastructure in Amazon, and some teams have infrastructure on-prem. How do we run the same workloads and shift that work pretty freely to whatever resources we want, while using the same infrastructure? That's the whole hybrid cloud solution OpenShift gives us, and we're showing that capability from a machine learning perspective.

Where is AI strong right now? A lot of the customers we're working with are starting their AI initiatives. This is not a proof of concept; this is not something way out in the ephemeral cloud that everyone talks about but no one does. There are real, tangible results being generated using OpenShift, Red Hat products, and open source products. Here are a couple of interesting ones: we have ExxonMobil, and they'll talk about some of their use cases today. It's a growing list. And not only that, we're using it internally at Red Hat as well. We generate about 300 gigabytes of data per day that flows through OpenShift and is available to our data scientists, and that's just from our build systems. We also have a massive amount of telemetry data being generated, and we're doing AI work on it on a daily basis. So the data volumes are growing, and the data scientists are getting more and more capabilities for their workloads.

That all leads me to the Open Data Hub project. I've mentioned a lot about the experiences data scientists are looking for and the experiences IT is looking for. How does that all get rounded out?
What we decided to do is take all of those lessons learned from running an internal AI-as-a-service platform at Red Hat and surface them as an open source project: the Open Data Hub. It takes all of those little things we worried about, not just the machine learning aspect, but what comes before and what comes after. So you'll see a lot of focus in Open Data Hub, as I mentioned at the beginning, on data ingestion, collecting data, and how you build a data lake, whether that's a virtual or an actual data lake across many different clouds. Then we focus on how you prepare and massage that data, doing things like cleaning it. If anyone ever tells you, "The very first time I ingested data, it was perfectly ready for a machine learning exercise," you should probably not put that person on the project again. There's always some work that has to be done on the preparation side. Then you're all familiar with the machine learning part: building a model, training it, pushing it out to production. But once you push it out to production, your data scientists can't say, "Cool, I'm done, I can go home now." No, there's a lot of work that happens after it goes into production. How do you monitor it for drift? How do you make sure that model continues to be accurate over time?
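A minimal sketch of what such a drift check could look like: compare the share of each predicted label in a recent window against a baseline window and flag a large shift. The labels, window sizes, and alert threshold below are invented for illustration; they are not from the talk's spam-filter demo, and real drift monitoring usually looks at input features and accuracy as well.

```python
from collections import Counter

def prediction_drift(baseline, recent):
    """Return the largest absolute shift in any label's share between
    a baseline window of predictions and a recent window (a crude
    drift score between 0 and 1)."""
    labels = set(baseline) | set(recent)
    base_counts, recent_counts = Counter(baseline), Counter(recent)
    return max(
        abs(base_counts[l] / len(baseline) - recent_counts[l] / len(recent))
        for l in labels
    )

# Baseline window: roughly 10% of messages flagged as spam.
baseline = ["spam"] * 10 + ["legit"] * 90
# Recent window: suddenly 40% spam; the model's world may have changed.
recent = ["spam"] * 40 + ["legit"] * 60

score = prediction_drift(baseline, recent)
if score > 0.2:  # alert threshold, chosen arbitrarily for this sketch
    print(f"possible drift: {score:.2f}")
```

The point is only that "monitoring for drift" can start as something this simple, wired into whatever metrics system you already run.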
Just as if you were doing application development and pushed some code out, you want to make sure it's actually functioning over time. And it's a bit of a joint operation between the data scientists and the SREs; we'll show a little of that as well.

If you're interested in the Open Data Hub project, there are t-shirts out front; I think they're pretty cool. The funny story about the t-shirts: I realized we've been handing them out for about a year and a half now, and no one on my team has ever gotten one. So I get to stuff my suitcase full of t-shirts on the way back, just to make sure everyone has some.

Open Data Hub is a blueprint architecture. If you look online, you'll find more detail on how to get started, the whole vision of Open Data Hub, and how you can get involved. I'll show you the components that are in there, but first I want to level set on what the vision of Open Data Hub actually is and where we're pulling from. It's not just a collection of one-off technologies; we're deliberately tying ourselves to upstream communities. The relevance of that is: as those communities grow, the AI workload capabilities we enable on OpenShift grow as well. We do a lot of partnership with NVIDIA, we have several components that are part of Kubeflow, and we're building stronger integration with Kubeflow in Open Data Hub. We have some things with Seldon, PyTorch, Spark, a lot of open source technologies. We're pulling from those communities and wrapping it all into a nice package that can be delivered on OpenShift. Now let's get into the meat and potatoes.
I'm a meat and potatoes kind of guy. What I'm going to show here is a little of what gets deployed when you actually install Open Data Hub. We have a number of things relevant to both the data engineering side and the data science and machine learning side. For data science work, you have Jupyter notebooks; you have Ceph for your data lake; for ingesting data, you have Kafka, and in this case we use Strimzi, which is an operator. We also have Argo, which is great for your pipelines; from a monitoring perspective we have Prometheus and Grafana; we have Seldon for model serving; and we have Spark. That's just what's available today in Open Data Hub.

Internally, we run a much broader stack. Don't worry about the details of this diagram; you can go to the website for more information. But it gives you a little more insight into what's running internally at Red Hat and what's being PoC'd to move back up into Open Data Hub. One of the things we do is a lot of processing of data as it flows through Kafka, so we do things with Kafka Connect, with KSQL, and with Kafka consumers and producers. We also have Logstash, Fluentd, and rsyslog for data ingestion. We build a data lake; we just happen to do ours in Ceph, but there's also S3 or any other technology you want for your data lake. If you look all the way at the top, from an analytics perspective, we do a lot of analysis with Hue, which is Cloudera Hue, and we have Kibana as well. For the model lifecycle we have Kubeflow, MLflow, and Seldon. We also have something called the AI Library: a predefined set of machine learning models that you can get up and running really quickly, built with community efforts.
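To give a flavor of what "a predefined model you can run right out of the gate" means, here is a toy sentiment scorer. This is purely illustrative: the word lists and function are invented for this sketch, and the real AI Library ships trained models, not hand-coded lexicons like this one.

```python
# Toy lexicon-based sentiment scorer. Real sentiment models are
# trained on data; this hand-built word list only illustrates the
# input/output shape of such a prepackaged model.
POSITIVE = {"great", "love", "excellent", "cool"}
NEGATIVE = {"bad", "hate", "broken", "slow"}

def sentiment(text):
    """Classify a message as positive, negative, or neutral by
    counting matches against the two word lists."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("OpenShift is great and I love it"))  # positive
```

The value of the library is that a data scientist gets the trained, production-quality version of something like this as a deployable service, without building it from scratch.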
So you have things like sentiment analysis, cluster detection, all kinds of interesting things right out of the gate. And from a business intelligence perspective, we're rolling out Superset pretty soon.

Now, really quickly, right before I get to the demo, I want to go back to what we said before. We're moving from a world where data scientists are on their own machines, or in some isolated environment where they do their work. What OpenShift allows us to do is move them to more of a centralized place for all of that work. Not only can they share the resources, they can share their models and their notebooks, and it makes a nice place from which to push a model out into production as a service. That service itself can then be managed and monitored just like any other application in OpenShift.

With that said, I'm going to roll this demo. This is going to be interesting, because all of my links to the demo are on the other machine, so I'm going to try to do this from this machine, and we'll see how it goes. All right, Diane, I'm going to pull this up here. No, we don't need the VPN. All right, I'm going to log in.

One of the first things I'll show you, if you want to get started with Open Data Hub: again, it's very easy to do. You go into this thing called OperatorHub. Just a show of hands: how many people have actually played around with OpenShift 4? Okay, a good number of folks. I'll take a little step back and explain OperatorHub. We're moving into the world of operators.
You'll hear this a lot; Diane mentioned it earlier. OperatorHub allows us to build out these operators, which are really intelligent ways of managing infrastructure and managing your applications. We've released Open Data Hub as an operator. You can basically think of it as a meta-operator, responsible for other operators like the Spark operator, Strimzi for Kafka, and Seldon. So in this case, if you want to get started, instead of looking for all those individual applications to install, you can just come in here and type in "open data hub," and you'll see this Open Data Hub operator. If I click on it, I have the ability to install it, and once it's installed, anyone who has a project in OpenShift can deploy their own instance of Open Data Hub. For the sake of time I've already done that, and I'm going to zoom in a little in case it's hard to read in the back. I have a project already ready to go.

Really quickly, I just want to show exactly what gets deployed in Open Data Hub. Again, these are just the options I've selected; you can choose different things to deploy, whether you want all of it or just a couple of pieces; it's at your will. As I mentioned before, we have Grafana, we have the Spark operator, we have the Strimzi operator, and a couple of other things deployed here, like Jupyter notebooks. The example I'm going to work through today is a spam filter, where you're trying to tell legitimate messages from fake messages. Now that you've seen what's set up, I'm going to open up JupyterHub and log in. Actually, before I do that,
I want to show you one more thing: one of the capabilities we've added into JupyterHub. I'm going to log into an environment here and show you some of the interesting things you can do with it. When your data scientist first goes into JupyterHub, they have the ability to select from these notebook images. We have several different images, and an image allows us to prepackage resources we want available to the data scientist right out of the gate. What you see here is Spark along with SciPy, but you can do anything, like TensorFlow or whatever other technologies you want to add. They have the ability to simply select and choose. We ship some predefined images out of the box, but of course it's a community: if there are others you want to add, you can always contribute them, or if you have something private you want to roll internally, you can do that as well.

They can also select the size of the environment they want: small, medium, or large. In this case, that's specifically for the container running the Jupyter notebooks. The other thing that happens here, and I won't show an example, but you can play around with it on Open Data Hub: when I select that I want a Spark cluster, then behind the scenes, when I start my notebook server, you'll actually see a Spark cluster spin up specifically for that data scientist. That can be whatever size you want, anything from a couple of workers to 10 or 15 workers, whatever you need for your work. And the cool thing is, once you terminate that notebook and say, "Hey, I'm done with today's work," the Spark cluster cleans up automatically and goes back into OpenShift's pool of available hardware. You can also decide how many GPUs you want to use, and the workloads will actually run on those GPUs. So if you have GPUs enabled
in OpenShift, it's as simple as changing this number to whatever you want. There are some other options here I won't really explain, but if you're interested in them, you can always take a look online.

I've already started a notebook server for this demo, and I'm going to walk through a couple of things. The first thing I'll show you is this feature engineering notebook. Feature engineering is really about preparing the data and making sure we have training data ready to go for the model. What you see here is all this fancy data science. I'm not a data scientist, so I had a data scientist create this; it makes me look smarter than I am. But he told me that when you look at the graph, all we really need to know is: blue is legitimate messages, orange is spam, and you want to see those diverge, so that you're correctly separating legitimate messages from spam messages. And so we're good to go; we know we have a good training data set we can use. Now let's move on to the training aspect. I'm going to go back and select the notebook to train my model.

Once I have this notebook for training the model, you'll see here, and let me zoom in a little for you, that all we're really doing is deciding what's a legitimate message and what's a spam message. In this case, dark blue is great.
This lighter color is great as well. Again, it means the messages have diverged and we can clearly tell a legitimate message from a spam message. Awesome, we're good to go.

Now let's play around with this a little and get a bit more information about what's going on. In this case, I can see the messages I predicted as legitimate where the actual result was legitimate: I have a pretty good accuracy of 94 percent. I have some that were predicted legitimate but were actually spam; that's only about five percent. I have some that were actually legitimate but we thought were spam; that's about two percent. And the ones predicted as spam that actually were spam: 97 percent. So I'm good, right? I'm happy with those results. Again, it's not perfect, as Dan said, but it's good enough to start with. Then I'll run this next cell and show the accuracies again; everything looks pretty good.

Now that I have that, I want to start deploying this as a service. As I deploy it as a service, I'm just going to run through all of these cells really quickly; it doesn't take long, and we'll see some results start to come out. One of the things I want to show you is how I deployed it as a service. Let me take a step back and go into my build configs. I have something called a pipeline, and in the pipeline what I've actually done is taken the model itself, this model notebook, and you'll see this is the actual model notebook we have here. We have something called source-to-image, and we built source-to-image to work on notebooks. If your data scientist has a notebook and they want to deploy it as a model, as a service, they can quickly run source-to-image on it just as if it were an
application, and deploy it into OpenShift as a running notebook. You'll see we actually have that running here; let me show you which pod: the spam filter pod. And we can scale that guy up and down however we want; we can scale it up to 10 replicas if we like.

With that said, let's look back at the service notebook. We just ran a quick test now that it's up and running. All I want to do is send it a message over REST. I send the first message, about dog food, and dog food is detected as spam. Then I send the second message, and that one comes back as legitimate. Great, everything looks good. I'll keep going here and predict a few more messages. Again, all I'm doing is sending a REST message to the service and getting results back, and you can see these were pretty accurate.

Now that I have that, there are a couple of things I can do. Open Data Hub also deploys Prometheus. As a data scientist, I've tested this out, and that's awesome, but now I want to see how it performs over time. That's always an interesting one, because we always think about the deployment, "hey, it worked that first time," but we don't think about how to test and validate it over time. So I'm going to go back to Prometheus, and I'll show exactly what's happening over time. Let's see if I can remember the metric; hold on one second. Oh, I'm not going to have time; it's on my machine. What I would actually show you is a really cool graph; I wish I could show it here. No, I won't be able to pull that up, unfortunately; the query for that is on my machine.
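For the curious: what Prometheus scrapes from a service like this is plain text in its exposition format, which you can read yourself. A minimal parser sketch follows; the metric name predictions_total and its values are hypothetical stand-ins, not the actual metrics from the demo, and this handles only simple "name value" sample lines.

```python
def parse_prom_metrics(text):
    """Parse simple 'name{labels} value' sample lines of Prometheus'
    text exposition format into a dict (labels kept as raw strings,
    HELP/TYPE comment lines skipped)."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

# Hypothetical scrape from a model service's /metrics endpoint.
sample = """
# HELP predictions_total Total predictions served.
# TYPE predictions_total counter
predictions_total{label="spam"} 97
predictions_total{label="legit"} 412
"""
counts = parse_prom_metrics(sample)
print(counts['predictions_total{label="spam"}'])  # 97.0
```

In practice you would query Prometheus itself rather than parse scrapes by hand; this just shows how little magic there is in the data it collects.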
But what's cool is you'll actually see a graph of activity here. What I can show you is that if we look at the metrics themselves, there are tons and tons of metrics that come from what I have deployed: metrics like pipeline predictions created, total pipelines, basically everything going through the system. If I click on one of these, say this one, you'll start to see some activity. It's not as cool a chart as I wanted to show you, but this is how, right out of the gate, you can see that I've deployed something, it's a nice microservice, it's giving results, and those results are changing over time. This is all coming through live, and the data is flowing through.

One more thing, really quickly: we also have Grafana hooked into this. When we go into Grafana, we can check out a number of other metrics. In this case we have some Kafka metrics, and again you'll see everything flowing through: this is from when I started the spam detector this morning, and you can see data actually moving through the system. So with Open Data Hub you have Prometheus, Grafana, Seldon, all of these different technologies rolled into a nice deployable package, and we're going to continue to release more technologies to round out the whole ecosystem of the end-to-end AI pipeline for you. And that's really it. That's the whole project, and we're very excited about it. We're glad to have it as a nice foundational piece of how you can do AI and ML on top of OpenShift. The landing page, so people know how to find it? Oh, yes, yes, I never remember to do that.
Yes. Of course, always go to opendatahub.io. There's a link to the community if you want to know how to reach out to us, and docs if you want to know how to get started. Thank you, Diane. Cool. Yeah, there you go. Great. Thank you.