Well, hello, everyone. Does anybody else notice the summer flying by? If you're watching the show, we do this the last Tuesday of every month, and that means it's already the end of July. I'm not sure when that happened. I'm Langdon White, the host of the show. This is Kube by Example Insider, and what we do on the show is interview people who are involved in the Kubernetes community and talk to them about what they see coming next in Kubernetes. And why do we do this? Because if you've ever worked with open source before, you understand that product managers often do not know what is coming, because the challenge with open source is that even if somebody tells you you can't do something, you can spend your weekend and do it anyway. So as a result, it's often better to talk to the people who are actually involved. My co-host today is Daniel Oh, who you've all seen before. Daniel, would you like to introduce yourself?

Sure, yeah, thanks for the rundown. Hey, everybody, I'm Daniel Oh. I work for Red Hat as a developer advocate, and I really love having conversations with platform engineers and SREs. I'm super happy to be back here, and it'll be a great time to have a conversation around Open Data Hub today.

And Anish, do you want to introduce yourself?

Yeah, sure. Hi, everyone. My name's Anish Asthana. I'm a senior software engineer at Red Hat who's been working on cloud-native things for five, six years now, straight out of college, actually. The last couple of years for me have been focused mainly on machine learning on the cloud, more from a platform perspective than the actual data science workloads: just enabling them, I guess, is what I'd say. Most recently I've been more involved with distributed computing, but again, that's pretty much all in service of the machine learning.

That's cool.
So one of the things we always like to ask people is: what got you into the open source world in the first place?

Well, this is not going to be as interesting as some of the answers you've had before, but I'd say free pizza. Red Hat had a hiring event on campus, and I'd never heard of them before, even though the machines we used in all the labs ran their software. I was like, okay, free pizza, I'll go check it out. I talked to some people, got an internship, and today I can't actually imagine working at a company that doesn't do open source. I have friends, classmates, whatever, who work at companies where they can't publish anything or talk about this stuff, and that sounds insane.

Yeah, it really corrupts you. It's funny, because I spent 15 years or something working in the proprietary software world, especially in consulting, which was even worse. They didn't even want to let you retain the IP, much less make it public; you were always doing it for a client. But after I don't know how many years at Red Hat, I can't imagine working the other way now. It's really quite interesting how much it changes your perspective. So for everybody out there listening, if you haven't contributed to anything in the open source world, we recommend it. It really does make you a better programmer, and it makes you better at playing well with others. You should definitely try it out. So the next follow-on question, of course, is: what brought you to Kubernetes?

Yeah, that's a great question. The boring answer is that I interned on a team in the OpenShift business unit, so I ended up living and breathing Kubernetes there. Over time I actually started seeing the value of Kubernetes. Initially I was like, okay, what is this thing? We have virtual machines, so why am I bothering with it?
But as I spent some time running an internal platform for Red Hat, the benefits of containers became clear to me. Kubernetes as a system handles so much stuff for you, and it made at least my operations much easier than they previously were. I don't want to turn this into a talk of, oh yeah, here's why you should use Kubernetes. I don't want to be a salesman for it, but I kind of am.

Yeah, well, I think that's a common experience, right? Especially if you start out working with containers, you quickly realize that not being able to orchestrate them is really a pain in the butt, even on your laptop. That basically explains the popularity of Docker Compose: it makes it so much easier to orchestrate things. But as soon as you want something a little more complicated, that's when you step into things like Kubernetes. So I know you're primarily responsible for what's called the Open Data Hub, or that team. Can you give a little bit of background on what that is?

Yeah, sure. I'll give you the whole historical context behind it. When I joined Red Hat full-time, our team was working on running an internal platform for machine learning workloads. Previously, you had lots of different teams running their own data platforms, and it just didn't make sense to have all these bespoke implementations. So we started building that out, already on Kubernetes. Over time, we realized these problems are actually relevant to the community, and Red Hat does open source, so we said, okay, let's put this out in the open. And I realize I haven't actually talked about what it is: it's a machine learning platform.
The primary idea behind it is that with any machine learning workload, you don't have just data science or a model that's being served. A lot of different things go into your overall workflow. You have to have data ingestion. You have to have data storage. You have to have some way to automatically run workflows to clean up data and train your models, and a place for your actual data scientists to experiment with data. As a data scientist, you poke around the data, come up with models, train them, and serve them. And once you've served your models, that's not it: you have to monitor your models. You have to make sure they're behaving as you expect, that they're hitting their target metrics, whatever those may be. And that's really a pain in the butt to run. It's not easy to set up. There are lots of great open source projects that do portions of it, but running it as one unified story is quite challenging. That's really what the Open Data Hub is doing. We've said, here are some user stories and workflows we think should be supported well, and we've looked at the open source projects that are very popular for those use cases. You have Jupyter notebooks: how do you run those on the cloud? You have Kubeflow Pipelines for running repetitive workflows. You have all these kinds of things. How do you bring them together into a cohesive user experience? Because today, if I were to install each one individually, my users would hate me, because the experience is not the same; the UIs are very different across the board. So how do you bring that together into something that's easy to deploy and also easy to use for data scientists?
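The end-to-end loop Anish describes (ingest, clean, train, serve, monitor) can be sketched at toy scale. This is a hypothetical illustration in plain Python, not an Open Data Hub API; in a real deployment each stage would be backed by a platform component such as Kubeflow Pipelines or a model server.

```python
import json
import statistics

# Toy versions of the workflow stages: names and logic are invented
# for illustration only.

def ingest(raw_records):
    """Parse raw JSON lines into dicts (data ingestion)."""
    return [json.loads(line) for line in raw_records]

def clean(records):
    """Drop records with missing values (data preparation)."""
    return [r for r in records if r.get("x") is not None and r.get("y") is not None]

def train(records):
    """'Train' a trivial model: always predict the mean of observed y."""
    mean_y = statistics.mean(r["y"] for r in records)
    return {"predict": lambda x: mean_y}

def monitor(model, records, threshold=10.0):
    """Check the served model against a target metric (mean absolute error)."""
    mae = statistics.mean(abs(model["predict"](r["x"]) - r["y"]) for r in records)
    return mae <= threshold

raw = ['{"x": 1, "y": 2}', '{"x": 2, "y": 4}', '{"x": null, "y": 9}']
data = clean(ingest(raw))       # the record with a missing x is dropped
model = train(data)
print(monitor(model, data))     # mean of y = 3.0 -> MAE = 1.0, within threshold
```

The point of a platform is that each hand-off here (where the data lives, where the model goes, who checks the metrics) is handled consistently rather than by ad hoc scripts.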
You don't want people learning new UIs constantly and going back and forth between ten different pages; you want to make it easy. That was a lot of words. Any questions on that?

Yeah, I've got one question, maybe.

Yeah. And you're muted, I think.

I was muted, yeah. But go ahead, Daniel.

Yeah, so I've got one question, actually, and I can raise it from a developer perspective. If I understand correctly, Open Data Hub provides an AI platform and the data scientist tooling. From a developer's standpoint, I'm wondering: is there any kind of consumable API for a developer to use, some AI specification or implementation? For example, generative AI like ChatGPT or Copilot, things like that. From a user perspective, I want to use that kind of API as a consumer rather than a producer. So is there any such thing in Open Data Hub, some feature or service, even as part of the roadmap?

Yeah, so Open Data Hub has an operator, which we're actually re-architecting right now, and I can talk about that later. To your point about consuming or interacting with APIs: you can serve models using Open Data Hub. So as a producer you could train a model and serve it up for anyone to interact with and use. We've had a number of very interesting use cases come up where people have models and lots of other APIs wrapping them, interacting with the models and getting insight from them. You may or may not have seen some demos at Summit that involved Open Data Hub.

Yeah, I saw that, perfect.

So, to further answer Daniel's question:
The idea is that if you have an organization and you want to offer some of these models to your customers, whether internal or external users, here's a platform that can do it. Because one of the challenges is that you often have data scientists doing the work of creating your models, and they are generally not trained as software engineers, and especially not as operators. So how do you make that model available in a way that isn't, oh yeah, download this model and run this Python script on your command line? Which I see a lot at BU from students: oh yeah, it's all done, it's all deployed, you can just use it; all you've got to do is download this 50-gig model, install Python and these libraries, run it, and it'll work just fine. So that's really the kind of problem Open Data Hub is trying to solve, right?

Yeah, that's part of the problem, but yeah.

Okay, so what I was going to ask for a little more background: if I'm an organization and I'm running some of these models, what do you think the trigger point is where it becomes worthwhile to deploy something like Open Data Hub? Because, like I was saying with containers and Kubernetes, there's a threshold you reach where, oh, now you really need orchestration before it burns you. Do you have a sense of where that is with Open Data Hub?

That's a great question. I'd say one thing we've really tried to do is make it as easy as possible to set up and install. So if you have cluster-admin, or you know someone with cluster-admin on a cluster, you can go ahead and just install it today.
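On Daniel's consumer-side question: a model served from a platform like this is typically consumed over REST. Many model servers (KServe, Triton, and others) implement the v2 open inference protocol, whose request body looks like the following. The input name, data values, and endpoint shown here are hypothetical placeholders.

```python
import json

def build_v2_inference_request(input_name, data):
    """Build a request body following the v2 open inference protocol
    (used by KServe, Triton, and others). The input name and values
    are placeholders for illustration."""
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [1, len(data)],   # one batch of len(data) features
                "datatype": "FP32",
                "data": data,
            }
        ]
    }

body = json.dumps(build_v2_inference_request("dense_input", [0.1, 0.2, 0.3]))
# A consumer would POST this body to the served model's route, e.g.
#   https://<model-route>/v2/models/<model-name>/infer
# (the path shape comes from the protocol; the host is deployment-specific).
```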
Speaking personally, I think it's when you start working with more than a few people on your project. Everyone has different ways of, okay, I want to deploy and work on my models. Sure, you can do data science work on your own machine. You probably shouldn't, but you could. When you get into a team of more than a couple of people that needs a shared environment for running notebooks and a consistent way to serve models, when you're out of the student-project, proof-of-concept stage and into something people might actually care about, or you're working on a problem people care about, I'd say that's where I'd draw the line. If today I knew nothing about Open Data Hub and nothing about data science, probably not. But I know a little bit about it, and I'm aware of all the different problems that can come up in a data science project, and I think it 100% makes sense to use it.

So what I'm hearing is that it's almost people-based rather than problem-based, in the sense that, like I was saying with Kubernetes, the trigger there is that you can't keep track, whereas with Open Data Hub it sounds more like collaboration enablement. If you've got two people sitting in the same room, maybe you don't need it; if you've got five people spread around the world, it's probably a different story. I think that's always a good thing to know, because then you can keep an eye out for the triggers when it might come in handy. So I wanted to ask next: what is the user type you're really looking to support? Who's your primary customer, from a user-persona perspective?

I'd say there are really two main ones.
One would be your platform administrators, who don't want to go crazy. Running all these separate things is challenging, really challenging. So we want to make it as easy as possible for them to manage what resources are available to their internal customers, what RBAC is set up, those kinds of things, even what projects are available for people to use. That's the boring back-end side of it. From a user perspective, the focus today is data scientists. Not everyone working to build a model is deeply technical already, and we don't want to force people to write YAML for days, or go into command lines and mess around with things like that. We want to focus on data scientists using the workflows they're familiar with, in a way that makes sense for them, rather than forcing people to do things that don't really align with their skill set. And why do I only say data scientists? Because a data scientist shouldn't have to care about what's underneath.

Yeah, one more on that. A lot of people have heard about new roles and responsibilities around AI platforms, such as MLOps engineers, data scientists, and data engineers. Could you explain the role and responsibilities of each, a data engineer versus a data scientist, in terms of how they'd use Open Data Hub and which features map to which role?

Yeah, and I can even talk through some projects I've seen recently that illustrate this. The data scientist is the easiest one, so I'll start with that. They look at the data, right?
You go into your notebooks, you access the data that's stored somewhere, you experiment with it, and then you try running some models on top of it, or training some models using that data. You look at the numbers and say, okay, this seems good enough, and then you save your model to storage and serve it out. Data engineers are more focused on keeping the data stored efficiently: data ingestion pipelines, workflows you run regularly to compact and clean data. You may have data coming in as JSON daily, and then a job running every hour, day, or week that compacts all of it into one single Parquet file, or something similar. That's where data engineers focus. I have a really nice slide from another talk, which I should have brought with me today, that clearly demarcates where these roles fit in. That's where they primarily play, but it doesn't mean that's where it ends. You may have data scientists write pipelines (I use pipelines and workflows interchangeably) that regularly retrain the model: they say, okay, X amount of new data has come in, let's make sure the model is kept up to date and retrained. And you mentioned DevOps; there's also the MLOps engineer, who is more about bringing software engineering practices to the full cycle. Personally, I feel like it's just another term people have made up. It's a DevOps engineer, but you're working on machine learning.

That's because most people who have the DevOps engineer title don't actually do DevOps. That's kind of part of the problem.
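The compaction job Anish describes (daily JSON drops consolidated by a scheduled workflow) can be sketched minimally. This is an illustrative stdlib version; a production job would typically write Parquet via pyarrow or Spark, and would be scheduled as a pipeline step rather than called by hand.

```python
import json
from pathlib import Path

def compact(input_dir, output_file):
    """Merge every .json file in input_dir (one JSON object per line)
    into a single consolidated file, the way a scheduled compaction
    job might. A real job would usually emit Parquet instead of
    JSON lines; the format here keeps the sketch dependency-free."""
    records = []
    for path in sorted(Path(input_dir).glob("*.json")):
        with path.open() as f:
            records.extend(json.loads(line) for line in f if line.strip())
    with open(output_file, "w") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")
    return len(records)

# Hypothetical usage, run hourly/daily/weekly by a pipeline:
#   compact("/data/incoming/2023-07-25", "/data/compacted/2023-07-25.jsonl")
```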
Yeah, that's a big problem, if you haven't noticed already.

Yeah, sure. That's really good to know, because a lot of people actually ask me about AI platforms and AI technology, and they're really looking for how to change their processes and organization when they adopt an AI platform. They want to create genuinely new roles and responsibilities, rather than just a fancier DevOps engineer title. That's really good to know, thanks. And then you have site reliability engineers, right? Where do the lines and titles end? They really want to bring up a new AI workflow as part of the organization, and they also need new roles and responsibilities to handle that. That's really good.

So I guess what you're wondering is: does this create new specialties that are needed inside an organization, or do existing skills apply?

So, I did a master's with a focus on machine learning, and I think having some data science background has helped me with all the work I'm doing. When I ran an internal platform, I was able, at least at a high level, to speak the language. Users would say, oh, we're trying to do this thing with this model, and here's where it's breaking, and being able to understand that high-level terminology helped me develop solutions the right way for them. When someone said, hey, I'm trying to do something with PyTorch, I can train this model and it isn't working, I wouldn't be starting from scratch; I'd be at a jump start versus someone who had no idea what was going on in the data science world. To me, being a generalist has always been important.
You need to be able to pick up new technologies, and data science and machine learning are just another evolution of that.

So, related to that: on your day-to-day, do you consider yourself more a data scientist, a software engineer, or a site reliability engineer, an operator as I often say for short?

Today I'd say more of a software engineer. If you'd asked me this a couple of years ago, I would have said operator or site reliability engineer. Recently I've been focused more on building out the upstream, open source community around Open Data Hub, and with that I've transitioned to just developing the project. One of the really nice things about working in open source in this space is that there are a lot of projects, a lot of communities with a lot of smart people in them, that I get to work with, contribute to, and learn from.

Yeah, what they used to call a force multiplier, right? Because all the terminology we use in tech comes from the military. So, you said you were in the midst of re-architecting Open Data Hub. Why, and what are you hoping it's going to bring to the table?

Oh yeah. Open Data Hub originally started as a couple of fancy scripts we ran to spin up some pods on Kubernetes, five years ago or so. Then it was, oh, here's this thing called operators, we should do that. There was something called Ansible Operators, and we said, okay, this is going to be the least amount of work for us, so let's just make an Ansible operator. It was a great way to get familiar with operator concepts, but...
Ansible doesn't give us a lot of flexibility with what we want to do, not like having something in a language like Go would. So after that we said, okay, let's make this a Go-based operator. We did that, but we carried a lot of the Ansible mindset over: oh, we'll just do all the things, and make something called a meta-operator where we try to deploy lots of different components and keep the platform as generic as possible. One thing that really hurt us there was that we could do a lot of things okay, but we weren't able to do anything really well, to really focus in on doing something intelligent with, say, Jupyter notebooks, or any sort of special logic. The most recent iteration, the re-architecture that's ongoing now, is that we are really paring down the components deployed by Open Data Hub to a set of core components. We're saying, okay, here are the core technologies we think you should care about; here's all this other stuff you could deploy, which would be nice to have, but we're not going to treat it as core. By doing so, we're able to invest more time in doing things intelligently, so if you need advanced logic for dependencies or anything of that sort, we can do it. Sorry, I'm very excited about this re-architecture, because historically we've really struggled with how many different things we were trying to do, and now we're trying to do a few things better.

Gotcha. And how do you see those things going forward? For lack of a better term, are you considering something more like a plug-in architecture, a way for other people to participate in a formal sense?

Yeah, I'd say so. Again, in the past...
...if someone said, oh, we should add this to Open Data Hub, we'd say, okay, yeah, sure, we'll deploy it. Now we've built out a more formal community and acceptance process for these plugins, where people can propose them and make the relevant pitch: here's the feature, and here's why we should care about this technology. Then we'll work on either tightly integrating it with the operator or, like I said, a plug-in architecture where it can plug into the UI and have some clearly supported way to use the technology. So a mix of both: you'd probably start off as a high-level plug-in and then really nail it down over time.

Gotcha. So I'm going to ask this question twice in a sense, both for now and for post-re-architecture, and I think the answer may differ. Basically: can I run this on my laptop, and is it something I'd want to run on my laptop? I come from the software engineering side, so I sometimes have a hard time with the data science side; I can do something in a Jupyter notebook, but ultimately it'll be in straight Python at some point. So what I'm wondering is: would it be worthwhile, especially as a training exercise, to set it up on my laptop, so that when I want to deploy this stuff to production, or throw it over the wall, or start moving it into something that's not a toy, I'm ready? From a resource-requirements perspective, is it something I can run on kind or similar on my laptop?

That's a loaded question. If you have a really, really kick-ass laptop, you can run anything on it.

Yeah, exactly. You could run it on your laptop.
As long as you have somewhere to run it; it's just images that you deploy. Now, depending on how many different portions of Open Data Hub you're running, you'll quickly run into resource limitations. If you say, okay, I just want notebooks, Open Data Hub can do that easily: just the UI and notebooks, fine. If you say, okay, now I want to do pipelines, and I want to do model serving, then you're going to start struggling on your laptop. My personal opinion is that I probably wouldn't run it on my laptop, unless I'm actively developing and don't feel like going through a full Kubernetes cluster.

Yeah, and this goes back to the earlier question, right? It really makes sense when you have four or five people; one person is below that threshold.

Yeah, exactly.

I got you. Do you think the same will be true post-re-architecture? Or does it come down to being people-based: maybe it's unrelated to the software overhead as much as it just doesn't bring a lot of value for an individual.

Yeah, exactly, that's it. It doesn't bring a lot of value for an individual, which is why I'd say running it on your laptop is probably overkill. Alternatively, the way you could look at it is: say I'm someone who wants to do machine learning, but I don't know everything I need to deploy and run for it. If you just install Open Data Hub, you get all the core components you need for a machine learning use case. So from a learning perspective, it could still be useful, because coming in, you don't know what you don't know, and Open Data Hub can help you with at least some of those problems or questions.
Right, knowing what questions to ask.

Yeah, exactly.

That's interesting. I mean, that's one of the nice things about pipelines in general: they tell you a lot about how stuff is supposed to work, assuming somebody who knew what they were doing set up the pipeline.

Exactly.

Which is always a question mark. Cool, that was what I was curious about, because it is a problem I've run into before, and that's why I was wondering whether it made a lot of sense or was more of an independent thing. It sounds like what we should really do at a place like Boston University is have central IT set up an Open Data Hub, and then students or whoever could use the feature set, but it would be managed at an institutional level.

Yeah, exactly. Even if we're not working together, even if we're working on individual projects.

Yeah. I went to BU, Boston University, as well, and that's one of the things I found most lacking: centralized infrastructure that I could easily interact with for my data science classes. We had a lot of the old-school software engineering stuff, but then it was, oh, jump into notebooks, you're working in teams now, good luck. And I was like, ah, I barely know Python and you're asking me to do these kinds of things. I've actually heard of people using Open Data Hub for classes, where someone set up a cluster and even set up restrictive resource requirements. They found that students don't need tens of gigs of memory, so they set very small limits to keep runaway workloads from abusing the platform.
And they found it worked well for teaching. One of the things I learned recently is GitHub Codespaces: you can actually have Codespaces use Jupyter Notebook instead of VS Code.

That's neat.

And you can set it as your default preference as well. So I might be using GitHub Classroom combined with Jupyter Notebook Codespaces, or whatever you call it, for one of my classes in the fall, to solve some of that same problem. It's GitHub, so it's not open source, but it is a really nice way to get a Jupyter notebook, and for me it's also a good way to do assignment distribution. Which leads me to my next question: what does Open Data Hub do about the data, or is that just not a problem it's trying to solve? One of the huge challenges, particularly coming from the software engineering side, is that everything is in source control except the most important thing, which is my data. So does Open Data Hub try to solve that? Or do you have any recommendations for data scientists or software engineers who are working with a lot of data on how to do source control without checking multi-gig files into Git?

I don't, but this is something we are actively thinking about. There are a bunch of open source technologies for data versioning. One of the projects I'm thinking of is Pachyderm, which is a project and a company, but there are open source communities and projects around this stuff. It's really hard, though. I've seen everything from, okay, we have different folders with different versions in them, to APIs built out around the storage: you have your S3-compatible (or whatever) storage, and you build an API on top of it that does the versioning for you.
I would say this is something Open Data Hub is actively thinking about: versioning both from a data perspective and from a model perspective. You may have different versions of your model over time, trained on different data sets or just with different parameters. How do you keep track? How do you version them? This is somewhere on the roadmap for the next year or two, so if you're interested, stay tuned.

Well, one we're looking at, if you haven't heard of it, is called DVC, data version control. The idea is that it integrates more directly with your source control tools. My problem so far is that I haven't found a particularly good solution that doesn't require both a server and a client, and getting something onto a server is a lot of the challenge. So that's not as useful, but it is definitely worth checking out. I'll throw that one in the chat too, because it's been around for quite a while and it does purport to solve some of these problems.

That's not something I've heard of. Admittedly, I haven't spent a ton of time thinking about data versioning recently; I can almost imagine my team members screaming at me from behind the screen.

Exactly. Yeah, it's a frequent problem for me. Even just for a student class, the data is not privileged or anything, it's just big, and I want to stick it somewhere that makes it accessible to the students without having to mail around a zip file. There are various ways to do that, but I haven't found any particularly nice ones, which is why I was asking. So, Daniel, did you have another question you wanted to ask?

Yeah, for sure.
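As an aside, the core idea shared by DVC and the "API on top of S3 storage" approach Anish mentioned is content addressing: store the big file under a key derived from its hash, and commit only a small pointer to source control. A minimal sketch of that idea (this is the general concept, not DVC's actual on-disk layout):

```python
import hashlib
from pathlib import Path

def snapshot(data_path, cache_dir):
    """Content-address a data file: copy it into a cache keyed by its
    SHA-256 hash, and return a small pointer dict that could be
    committed to Git in place of the multi-gig file."""
    data = Path(data_path).read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    (cache / digest).write_bytes(data)  # idempotent: same content, same key
    return {"path": Path(data_path).name, "sha256": digest}

def restore(pointer, cache_dir, dest_dir):
    """Materialize a file from the cache using its pointer."""
    data = (Path(cache_dir) / pointer["sha256"]).read_bytes()
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / pointer["path"]
    dest.write_bytes(data)
    return dest
```

Because identical content always hashes to the same key, snapshotting the same file twice stores it once, which is what makes this style of versioning cheap for large, slowly-changing data sets.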
Yes, as always. In case a lot of people want to use Open Data Hub after this KBE Insider, what is the best practice or process for a developer, a data scientist, or even a platform engineer in terms of contributing back to Open Data Hub, for example creating a new component or other stuff? Yeah, so lots of different ways, right? Like the first place I'd point anyone at is Slack. We have a community Slack for Open Data Hub where, you know, the entire team lives, like the entire community lives, right? We've had people sort of join in from there and find things that they're interested in, right? Like it doesn't make sense for you to just throw stuff over the wall. Can they add a mention with Anish in there when they're asking some questions in the Slack channel? Yeah, definitely. I can always directly add someone who can actually answer your questions. But no, yeah, I think the Slack is quite active. We also have GitHub pages and the ODH community repo, which I will drop in here as well. But I think the main place to interact is Slack. Over there we have... Maybe there's the Slack invite link to the... Yes, our moderator is desperately putting that up right now. Is it on the Open Data Hub website? Yeah, it should be. I can't figure out how to type it into the chat. I can do it if you, yeah. This is our place, right? So one of the things we've recently started doing is we've established SIGs for all the different spaces that Open Data Hub is playing in. SIG stands for Special Interest Group. So we have one that's focused on developer experience, kind of the front-end side. We have one for platform, which is more generically applicable to the entire Open Data Hub.
Then we have stuff around the different working groups for the different components. And for all of those, we will provide links. That would be awesome. There you go. You guys just want all the links. Yeah. There are also the community meetings, which is really awesome. So everybody interested can join the community meeting, ask any kind of question, and then find out about opportunities to contribute back to the community. Yeah. Yeah, I think that's one of the great things about open source, right? Like we do everything out in the open, and all of our meetings are out in the open. Yeah, that would be awesome. You know, also, I can't resist asking this kind of question, because I get asked it by some of the Red Hat customers. They watch this KBE Insider online, but they also watch, you know, the recorded video as well. And some of those people haven't heard about OpenShift AI. It's something like AI in an enterprise product environment. So can you explain what the differences are between Open Data Hub and OpenShift AI, which looks like a similar thing from an AI standpoint? Yeah, so, and Daniel, cut me off at any point if I go into too much detail. So we have Open Data Hub, which is the community project, right? Like anyone can participate. And then we actually built out a downstream Red Hat product around it called RHODS, which is Red Hat OpenShift Data Science, right? I think that's actually what the customer I mentioned was using, RHODS, for a lot of these use cases that I just talked about. As more interest has come up, we've sort of, I say we, right, done a sort of rebranding to something called OpenShift AI, which is broader than RHODS, right?
Like RHODS is like, okay, I want Open Data Hub, I want the whole kitchen sink. But not every customer wants the kitchen sink, right? Like there may be someone who wants just pipelines, right? Or they want just model serving, and that's fine, right? So we'll have products under that umbrella, and those will be called, you know, whatever they may be called, right? But that umbrella itself is OpenShift AI, right? And so... That's really good to know. Is, sorry, is it offering models, or is it helping you build models? Oh, helping you build models, right? Like, I don't want to say anything too prescriptive, but as far as I'm aware, we don't offer models or train models for you or anything of that sort, right? Like our focus is more so on the platform level, where we allow you to build and serve your models and all that. Okay. So with OpenShift AI, whether it's open source or not, is there an upstream, or is it kind of the same? Because there's Open Data Hub and RHODS, right? Is there something equivalent with OpenShift AI, or is it kind of the same thing in a sense? I'd say, speaking as just an engineer, right, I'd compare OpenShift AI more to Open Data Hub, or like the Open Data Hub organization or community, right? And then within that, you may have specific components, and one installation of all of these components is RHODS, which installs all the components. So I'd map OpenShift AI to Open Data Hub, and again, all of this stuff is upstream all the way, right? Like everything happens out in the open. So there are no other organizations I'm aware of. All projects. Yeah, yeah, yeah. That's why I was asking, you know, because it has OpenShift in the name. That's why I wasn't sure if it was like the downstream product and there would also be, you know, some sort of upstream. Yeah, it's all Open Data Hub.
I'd say it's the upstream for everything. And then, just to add, OpenShift AI is the brand new name of OpenShift Data Science, aka RHODS. So it's more like it covers not only the platform, but also a more consumable API for not only developers but also a bunch of other users. So that's why we changed the name from RHODS to OpenShift AI. So, but if I wanted, you know, if my company went and bought OpenShift AI, right, and then deployed it or whatever, and I wanted a new feature to show up, I should go contribute it at Open Data Hub. Yeah, I mean, you know, assuming it gets approved and all that, right? But that's the path by which I would get my feature or my bug fix or whatever to land. Yeah, I'm understanding. Everything goes through Open Data Hub for us. Right, gotcha. Okay, that's what I was really asking: if I want something to show up in OpenShift AI, how do I make that happen, right? Well, first you create an issue for it on GitHub, and then the maintainers have to, but yeah, you know, yeah. Right, right. But it's on Open Data Hub. That's kind of what I was looking for. Okay, that's cool. And so what do you think is going to be the next big change or new feature in the next three to six months that you find most compelling? Like, what's most interesting for you, not necessarily for your users, but for you? What do you think is the most interesting thing that's coming? Yeah, so one thing I already talked about was the re-architecture of the operator. From a developer standpoint, I'm very happy, like very excited, about this, because it just makes our lives, all of our lives, a lot easier.
And then something I'm personally working on is the distributed workloads functionality for Open Data Hub. So, you know, this involves projects like Ray and something called CodeFlare. And really this is offering capabilities for your data scientists or your data engineers to spin up jobs, and like, you know, job scheduling and those kinds of things. For those who aren't aware, Ray is comparable to Spark, in that it's a distributed engine, right? So you can submit a job, it's going to have like six different workers, and it'll schedule stuff onto them. And Ray is kind of a more Kubernetes-native Spark. At least that's how I personally think of it. That's cool. Yeah. I'm also looking for the link to Ray to drop in the chat. And yeah. So there is the Ray project. Yep, that's exactly it, yeah. So something else to check out. I swear, Spark is like the most common name for anything. You know, there's even an organization within the city of Boston that's also called Spark. Really? Yeah. So it's like, it's everywhere and we always have to clarify. But yeah, I totally hear you. I mean, right now there's a collision with BU Spark, because I was talking to someone the other day and I said, oh yeah, I'm working on this thing called Ray, which is cloud-native Spark. And they're like, what the heck does BU have to do with what you're working on? Yeah. Yeah. I normally have to clarify the other way, you know, most of the time. The particularly entertaining thing is I'm actually going to offer a course in the spring that is going to cover Spark. So that should be entertaining. Well, instead of talking about Spark, you should do it with Ray. Yeah, yeah. Well, maybe I should. Then I wouldn't have the name collision. Yeah.
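The submit-a-job-and-let-workers-run-it pattern described above can be sketched with Python's standard-library thread pool. To be clear, this is not Ray or CodeFlare code: Ray schedules the same kind of tasks across the nodes of a cluster rather than onto a local pool, and `train_shard` and the shard data here are invented for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def train_shard(shard):
    """Stand-in for a unit of work, e.g. processing one data partition."""
    return sum(shard)

# A local pool of six "workers". Ray's scheduler plays the analogous
# role cluster-wide, placing submitted tasks onto worker nodes.
shards = [[1, 2], [3, 4], [5, 6]]
with ThreadPoolExecutor(max_workers=6) as pool:
    results = list(pool.map(train_shard, shards))

print(results)       # per-worker partial results: [3, 7, 11]
print(sum(results))  # combined result: 21
```

The design point is the same at both scales: the data scientist writes `train_shard` and submits work; the scheduler, whether a local pool or a Ray cluster on Kubernetes, decides where each task actually runs.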
So, Daniel, did you have any more questions you wanted to ask? Not really. Anish touched on a lot of stuff. I'm really interested and curious about the AI platform stuff. And then, you know what, I'm more on the developer side, and everybody's talking about LLMs and GenAI stuff. And I'm really looking forward to Open Data Hub playing some core key role in becoming more of a GenAI platform, as part of the Kubernetes-native stuff, and probably cloud-native platforms eventually, in the end. Yeah. So I would say Open Data Hub today would probably support those kinds of workflows, right? Where we can help you build out your LLM, right? Like if you wanted to train a large language model through Open Data Hub components, you could do that. I guess, Daniel, you're more referring to just having generative capabilities built into Open Data Hub, where it does something machine-learning-y for you? Well, I think for me, at least, it's a little bit of both. I mean, one kind of related question is, have you considered how you might integrate LLaMA, you know, because they're quasi-open license, into Open Data Hub, such that if I wanted to check that box when I install Open Data Hub, I could now offer LLaMA as a service to my users, whatever that means, or any of the other tools like that? That's definitely a very interesting idea. Not thought about it at all. Like all I've done with LLaMA is, okay, there's a website I can go to now and type in my questions and have it answer them for me. So that's... Right. Yeah. Yeah. And I totally hear you. I just, you know, especially with the increasing, let's say, availability of at least quasi-open, pre-built models, you know, partially offering this.
One of the things I go back to is, there's a project in Apache land that's been around for a long time called Mahout, right? And it's basically a bunch of crowdsourced AI tools that you can kind of just use, right? And they're not so much the model side as, like, the whole kit and caboodle to build one, right? They don't have the pain level of an LLM in terms of how much data they need and all that jazz, but it's kind of a nice packaging, right? It's kind of a nice way to just say, oh, I go here and I look for the one I want, the kind of thing I want to use, right? And then I'm off to the races. Something interesting that, again, we've talked about with some customers has been around foundation models, right? So what a foundation model is: you have a company like IBM or Google or something who has all the money in the world, right? Like they just go and train these massive large language models, right? So, for example, it's like, okay, I have a model that I've taught English, right? So it understands English; if I ask it something, it can answer, right? That's all it does, right? And the same goes for most languages. Now, as a customer, right, say I am in the business of selling a product, right? I can take this foundation model, which knows English, give it all of my data so it understands my specific use case, and then I can create a chat bot around it so it can answer questions related to support for my product, right? So if I have, I don't know, OpenShift, right? Like I can give it all the OpenShift customer support articles, and then it can start answering questions about OpenShift commands, OpenShift issues. And the process of getting it from that base model into something that works for you is called fine-tuning.
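The fine-tuning described above has the same shape at any scale: start from pretrained weights and continue training on your own domain data. Here is a deliberately tiny sketch of that loop, using a one-parameter model and plain gradient descent; this is a teaching toy, not an LLM workflow, and every name in it (`fine_tune`, the data, the learning rate) is made up for the illustration.

```python
# Toy illustration of fine-tuning: begin from "pretrained" weights and
# keep doing gradient descent on new, domain-specific data. A real LLM
# fine-tune does the same thing with billions of parameters.

def loss(w, data):
    # Mean squared error of a one-parameter model y = w * x.
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def fine_tune(w, data, lr=0.01, steps=200):
    for _ in range(steps):
        # Gradient of the MSE with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

pretrained_w = 1.0               # weights from "generic" training
domain_data = [(1, 3), (2, 6)]   # our "support articles": y = 3x
tuned_w = fine_tune(pretrained_w, domain_data)

print(round(tuned_w, 2))  # converges close to 3.0
print(loss(tuned_w, domain_data) < loss(pretrained_w, domain_data))  # True
```

The analogy to the OpenShift support-articles example: `pretrained_w` is the foundation model that already "knows English", `domain_data` is your own corpus, and the loop nudges the model until it fits your use case.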
And that's actually something we are building infrastructure and support for in Open Data Hub: okay, we'll make those workflows easier. Gotcha. So I would, like, go check the box that says, you know, I want to use Bard or whatever. And kind of that would be the beginning of the pipeline in Open Data Hub. And then I would write my code, whatever that means exactly, and put it into the fine-tuning layer, and then out the other side I could offer an API for my chat bot or whatever. Yeah, something like that, right? Yeah, exactly. Yeah. Yeah, that'd be very cool. You know, like I said, one of the things I actually really miss from the really old days of OpenShift, though it's kind of sort of similarly there now, is the offering of other services through a marketplace. An Open Data Hub marketplace of those kinds of things might also be interesting, especially if you knew what the status of their licensing was, for example. Yeah. But on that note, and before ChatGPT just takes over the rest of the show, thanks so much for coming, and maybe we should wrap it up here. Yeah, because we are already over time. So. Yeah, exactly, exactly. Yeah. So Anish, thank you again, seriously, for coming. We appreciate getting more understanding, exploring the space around Kubernetes and how it's growing. And, you know, for all you data people out there, or anybody who needs to support data people out there, you should check out Open Data Hub. And yeah, we'll see you next month. Hey, thanks for having me. Thank you.