Good morning, good afternoon, good evening. Wherever you're hailing from, welcome to another edition of the Data Services Office Hour. If you noticed, the name changed during the show, so we're gonna talk a little bit about that today. I'm Chris Short, executive producer of OpenShift TV. I'm joined by Chris Blum and Guillaume Moutier. I'm very happy to have both of you here. I'm very happy to have learned that pronunciation this morning as well, so thank you very much for the lesson in French pronunciation this morning, Guillaume. So Chris, Data Services, what's up? You haven't been online for a while, you had to go to Portugal, like you've got a lot going on in your life right now. It's great to be back. Yeah, so some of you, our true fans that always watch the Office Hour here, probably remember me. I was gone for a little bit because I live in Berlin, Germany, and a lot of people have decided to work from home, and that kind of overloaded the internet capabilities of Germany. And so I was limited to 0.4 megabits upstream. And so I said, we want to deliver- Which surprisingly just doesn't work for live streaming. Nope. So right now I've spent three months with my aunt, very remote, I don't have a postal address anymore, but we do have fiber internet here, 100 megabits synchronous. And so I'm able to come back, and I hope you're not too sad not to see Michelle today. Michelle will be back from time to time though. She will be back. Yes. She will be back. So that's not the only change, that I'm back. We also changed the name. We changed from the OCS Office Hour to the Data Services Office Hour. And that's not just a name change, it's also a change in focus. Previously we always talked about storage, storage, storage, and a lot of people just think about OCS as the solution to provide persistent storage to PVCs, just that single use case. Just do the dumb stuff that a lot of others can do, and it was sometimes difficult in these conversations to really position the true values of what our product does. So with the name change to Data Services we're going a step further. We're not just talking about storage, because that is a problem that has been solved over and over by multiple companies. We want to show you that we're thinking one step further. We're not just trying to provide some kind of storage capacity, we want to help you run your workloads on top of the storage. And the storage is a little bit smarter: it understands a little bit more about what you want to do and helps you in doing that. So that's it in a nutshell. In a nutshell, yes. So we basically have two parts to this. We have the, let's call it the old part: OCS is now renamed to ODF, OpenShift Data Foundation. And the other part is why we have Guillaume here now: we want to talk a little bit about data science. So, do smart things on the storage, and that involves AI/ML workloads, and we can do fun stuff with JupyterHub, just prototype something in Python, let it run. Guillaume will talk a little bit about that later. Awesome. And that's the idea here, if I may: it's changing the perspective about what we are proposing. It's not only about storage, which is the implementation, how you do it really, but much more about the business value. What can you do with the storage? Which in fact is: okay, I will work with my data. For data science, or purely data for your applications, or things like that.
But of course, at the end of the day, it's the same thing that is running. You have to store your bits and pieces inside some storage. But now we want to lean a little bit more on this aspect of how can I use my data? How do I integrate my data inside my overall architecture, and not only leave storage at the end of the chain as something that you don't even consider until you really need it. It has to be part of your architecture. Therefore the slight shift of approach. Right, like you can't just dump everything in one place, right? Like, you know how we used to think about how we partition disks, we put the boot volume at the very front, right? You have to think about all of your data, not in the same light, but you've got to think about it, right? Like, what am I going to do with it? Is it just going to sit here? I have to keep it for regulatory reasons. Can I do anything else with it while it's there, right? Like, how do we manage all that? So the entire engineering effort behind Data Foundation, I think, takes the name and actually applies values to it, right? Like Chris, you mentioned values, and it's a very interesting proposition here, right? Like, I like it a lot. Yeah, so the foundation is literally a foundation. What we had before is now the foundation of what we can put on top of it. And what we want to talk about is use cases. You tell us more, high level, about what you want to achieve, and then we can talk about how ODF or the data services can support you in doing this. And also, one of the consequences is we just released a new version, ODF 4.7, and with that we also thought a little bit more about the pricing. So the pricing will be a lot easier to calculate with this new version. And yeah, we're starting on this data services approach, where we add things onto our foundation that you can then use. Beautiful. So, name change, a whole transformation inside your department, business unit, whatever you want to call it. Is it working so far? Yeah. Asking the hard questions here, huh? Yeah, that's important. Someone needs to ask the hard questions. If you have any more hard questions, just write them in the chat. Yeah. So I don't know if I'm up for it. Like I said, we just released it, I think yesterday was the GA of 4.7. So in the last 12 hours, it has worked. In the last 12 hours, we've killed it. Yeah, awesome. But obviously before the GA, we had a lot of internal discussions about this. We wanted to actually understand: are we doing the right thing? Do people want this? Do people understand this? There was a lot of conversation about how we should position this, how we should do it. And in those conversations, when people really understood what we want to do, obviously there were a lot of people that were sad that we were leaving that term storage behind. A lot of people look at this and say, well, now we're not storage anymore, what does that mean? But it's a new term, it needs some getting used to. Once it sinks in, and that's what we've seen internally here, people understand that we were limited before. We were limited to being a storage department that just cares a little bit about disks: how to partition those disks, how to make them available, how to be fast storage, or storage that only needs few resources. But now we can actually drive our conversations further. We can talk further about: hey customer, what do you actually want to do?
And then we can deliver a full-blown solution to the problem, not just a product that you install, you put in a CD and install it, and then you're done and you never think about it anymore. This is interesting, so go ahead. It's also a change of the people we want to talk to. Not only the sysadmins and the storage admins, but going a little bit broader: the architects, the solutions architects, the CTO, the CIO. If you speak to a CIO about storage, they will say, oh no, that's a thing for my IT admins, I don't really care about storage. Now if you're talking about what you can do with the storage, what you can do with the data, then you have their attention. And anyway, we have experienced this shift for the past 10 years in all the IT infrastructure components becoming more and more commodities, especially with the cloud. Okay, let me take you back 10 years, back when I was working at Laval University. We were out for an RFP for new servers and things like that. We, the architecture team, would spend hours looking at the bus architecture, the processors and how it's all handled and everything. Fast forward 10 years: oh, just bring me a server. A Dell or an HP or whatever, I just don't care, because that's not relevant anymore. What has become relevant is the containers that you are able to reschedule automatically, or your VMs, or things like that, but not the infra itself. Not saying it's not important, but it has become so easy, you know? Oh, I can have a server from AWS or Azure, or even on-prem, now I have all my pipelines to deliver VMs on my internal cloud, or things like that. It's not really the subject anymore. Now the subject is how you can deliver this to your devs, to your people, for them to be able to use it in one hour, because that's what you are competing with, with AWS. You know, 10 years ago for servers it was: oh, you want a new server? Yeah, call us back in three weeks, because we have to order it, and then it will be delivered, and then we have to rack it and connect it and things like that. And you know, it's obvious for servers, but the same thing has been happening for storage. Storage is becoming a commodity, because, oh, you want object storage? Just go to AWS S3 and you have object storage. That means we have to deliver exactly the same experience. Therefore, leaning more on the data aspect: what do you do with the storage? Of course we will continue to speak and to work really closely with the storage people, the people purely focused on storage, because at some point you have to do this, you know, for performance and scalability and things like that. But if we want to have more impact, we have to go a little bit up the ladder and talk directly about the usage with those different personas, beyond the ones we were used to working with before. Nice. So, changed the name, changed the product name, changed all these things. We're thinking about data holistically now, not just, hey, we're going to sell you the storage, and you know, three different open source projects glued together. What has changed in the product as a result of this so far? Did you hear me? I think the biggest change is actually in the area where Guillaume is, right? Okay. So internally in Red Hat, we merged two teams: what was formerly the storage people, and then we got the data science people on board too. So the biggest change that you can see today is that the data science part has been added, and Guillaume can talk a lot more about this.
The other thing is that now, with the 4.7 release, we're starting on the DR things; it's now dev preview in 4.7. These will gradually improve: you get a little preview of it in 4.7, 4.8 will already be a lot better, and then we're looking at the releases afterwards where that will really land. So that is now available, but the biggest change is the data science part that has been added. Nice. Yeah. And it's not, you know, we changed the name last week or so, the announcement was yesterday for the official new name. So it's not about the changes that we put in the product, because this has been happening for a few months: adding more features towards this ease of consuming storage, or being able to deliver data services. So we've been doing that, and we will of course continue to do that. That means integrating more things directly into the OpenShift console, into the OpenShift UI, so that for people it's easier to work with storage, especially as a dev. You know, in fact you don't even want to talk about storage, you just want to put your data somewhere. So we... Like, tell me where to dump these, give me the API call. Exactly, give me an API. I only want to talk about an API and an SDK. The rest, in fact, is not my skill set as a developer, and I don't want to learn more about it, because there are tons of other things to learn that are directly linked to what I do. So this storage, again, is a commodity from this point of view. So we have already brought, and we will continue bringing, more of those features, those integrations, inside OpenShift as much as we can. Let me give you an example. In Ceph, since last year, we have this feature called bucket notifications in object storage. That means whenever something is happening on your bucket, of course you configure it, say you have uploaded a new image or something, the bucket has the ability to send a message, a notification, to an endpoint. It's just a simple message saying, hey, this file with this name has just been created inside this bucket. And we can send this message to different endpoints: HTTP REST API, Kafka, AMQP messaging. And then you are able to act upon this event. Okay, so that's the first illustration where you bring data as an intelligent thing within your architecture, because now it's part of your event-driven architecture. It's not just something where you dump your data and retrieve it when you need it. Now it can totally be part of the architecture. But this feature, well, it's not that difficult to configure bucket notifications. It's pretty standard, and we reuse the same mechanisms and protocol as you have in AWS S3, so all the SDKs are here and everything. But still, it may be difficult for some people. So there is work going on right now to bring this in as part of a YAML definition, so a purely native Kubernetes way of programming things. Oh, I want to have a bucket and I want it to send events to this endpoint: three lines of YAML, bam, you have your object bucket and you are able to work with it from your applications, and then create this event-driven architecture. You don't even need to know how it's implemented. Behind the curtain, you don't even know if it runs on Ceph or whatever else, or how many nodes or replicas or things like that. Normally your IT team is supposed to take care of that and provide you with this performant, scalable storage that you need to work with.
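To make that concrete, here is a minimal sketch of what wiring up such a bucket notification can look like through the S3-compatible API mentioned above, using Python and boto3. The endpoint URL, credentials, bucket name, and topic are placeholders, and the topic itself (for example, one that pushes to Kafka) would have to be created beforehand; this illustrates the mechanism, not the exact demo setup.

```python
import boto3

# Placeholder endpoint and credentials for an ODF / Ceph RGW S3 endpoint.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3-openshift-storage.apps.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Ask the bucket to publish an event to a pre-created topic whenever an
# object is created. Ceph reuses the AWS S3 notification API, so the
# standard SDK call works; the topic ARN format here is an assumption.
s3.put_bucket_notification_configuration(
    Bucket="xray-images",
    NotificationConfiguration={
        "TopicConfigurations": [
            {
                "Id": "notify-on-upload",
                "TopicArn": "arn:aws:sns:default::xray-topic",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```

From then on, every upload produces a small "this object was created in this bucket" message on the topic, which is exactly the event-driven hook described above.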
That's the kind of thing that is happening. Nice, so if, go ahead. Sorry Chris, it looked to me like you wanted to talk, not me. Yeah. So in addition to what Guillaume said, one of our goals is to keep ODF aligned with the goals that we already had with OCS, where you have an interface that's very simple to use, very integrated into the OpenShift experience. A lot of the other products that can give you storage on OpenShift platforms might not even be written for OpenShift or written for Kubernetes, and they sometimes work with OpenShift, and then there's a new version and there are compatibility issues. ODF is developed primarily for OpenShift and works with OpenShift. It's deeply integrated, you get dashboards, and even though we add more and more features, and we have a ton of features in Ceph in the backend that we can port to ODF, we do still want to have that ease of use, so that you don't need dedicated storage people who need to understand it. It's just there, you can use it, and your regular people can use these products easily. Regular people using the product easily. That's music to my ears, it really is. I mean, because we all know I'm the storage idiot on the show, right? And the data science idiot, right? Like, I've helped data science people embrace containers. Now it's like, can the data science people help me embrace some of what they're doing? That'd be awesome. So yeah, I like this, I like the direction we're heading here. And we've shown this in the first couple of shows, if you want to go back to the archive, where Chris himself installed OCS and it worked. It worked, man. It's one of those things where I'm patiently just waiting for the next cool thing to come out about it, right? It's maintaining itself, it does what it does. It's an operator, it's going to handle things for me. Just waiting for cool features. "Is there anything else I can do with this?" is always the thing that I think about. Like, okay, the cluster's here, it's doing things for me. Can it do more things for me, right? And that's what most people do with their infrastructure too, right? Like, we'd like to take on a new project: can we do it with what we've got, or do we need to add something else? A lot of people sit there and think that. And it sounds like ODF is going to have some foundations in data science that will have folks thinking, yes, we can do more with what we have now. That's kind of the premise, is what I'm hearing. You don't have to go out and re-architect, or do that RFP that takes, you know, months to figure out what you need. It's up and running from the get-go, kind of thing: just install OCS, or ODF, and be on your way. I'm sorry, that one's going to take a little bit. I told you I was going to say it. So where do folks go right now to learn more about ODF, data services, that whole gamut of things? I dropped one link in here that I found, OpenShift Data Foundation from the technologies section, but it's just a high-level overview. I'm assuming the docs and everything have been updated, right? Like, what else should folks find? Exactly, so we just updated our access.redhat.com site. Let me just fetch the link for you. We try to make it more obvious what we talk about, what data services is and all that. And yes, you'll find us in the documentation. There, you'll learn how to do it.
And there's going to be a lot more material out there in the next couple of days that will talk about data services, how it is positioned, all the things that I talked about earlier: why the name changed from storage to data services, what does it mean, and... Right, let me see here. I'm looking forward to... So what are you most excited about, looking forward? Now folks, this is when we're talking about the future. There are no dates, no times being promised here, just keep that in mind; future talk is happening right now. So, not saying that this is promised in the next release, or promised ever, right? What are you most looking forward to as a part of this change? I know it's bringing people together, which is always good. It's changing people's perception of what they can do with data, but what else? Well, a lot of it is these talks about the use cases. We want to understand more about what you run on your data foundation and then help you leverage that. A very good example was what Guillaume already talked about with the object bucket notifications. So you have an application, and the object storage can enable you, by adding new features, to actually do that. And now that we have the object bucket notifications, we can deliver that on all platforms, no matter where you run. Previously maybe you only had that available in AWS and not in other environments; now ODF follows you wherever you want to go. The other thing that I'm looking forward to a lot is the DR thing. I already talked about it a bit, but that will enable you to have multiple sites and have your persistent data available wherever you need it. A site fails, you can switch over. And that is something we're actively working on. The underlying technology is already in Ceph, so it's nothing new. It's not like we go out and say, okay, it's quite complicated to synchronize data across an internet link. Sometimes you want to do it synchronously, so it's updated immediately on both sites; sometimes you want to do it asynchronously. That difficult part is already handled and has been used by customers already. So even though it's a new ODF feature, it's not going to be like you have to be careful or afraid to use it. But we want to make it so that the user experience is great. We want to make it so that it fits the goals of ODF: it follows you on whatever platform you use, but it's also easy to use. You don't need that storage guy. Chris Short can do it. Yay! We need a stamp, like, "Chris Short can do it." Yeah, that would be awesome. So right now we're at that phase where Chris Blum can do it, and we want to get it to the stage where Chris Short can do it, because it's easy, it's in the UI, and we do have specific Kubernetes DR objects that we can use to describe how we want to do the synchronization. That's what I'm looking forward to. And that's also an area where talking about data services takes the next step, because there we're not only transferring your data, we also need to think about workloads. It doesn't help you in any way if you have the PV over on the other site but your application isn't on that site. We also need to take your application, what we call the metadata, and move it to the other side as well, even though that's not persistent data.
So talking hybrid here, let's think hybrid, since you've mentioned that. Hey, cross-cloud maybe, or on-premises to cloud: what are the advantages of putting ODF across a fleet of clusters where data scientists can access it easily, and then you have, like, a team over here, a team over there, and one big bucket of data that they use. What is that experience gonna be like for everybody using it? If I'm pulling up a Jupyter notebook as a data scientist, what is that experience gonna be like? Is it gonna be more simplistic? More "I need to learn a little bit," or "I need this one snippet"? Tell me more about that journey for a data scientist now. Well, for a data scientist, if you work inside your Jupyter environment, you're already one layer above, so you shouldn't be concerned about storage. And there are different things you can do. I guess the main interesting point brought by ODF is that it brings all three different types of storage you will need to make data science or data engineering happen, okay? I will take a first example: the team in Ontario that I helped build a data science platform for their COVID-19 research. It's a loose group of 300 researchers from different organizations, different ministries and things like that in Ontario, and they grouped together as a community to work on the data that was available for COVID-19. Short story, they were kind of fed up with the way the government was publishing the data, which was not really useful. Well, the data was useful, but not for researchers, because it was not raw data, it was not updated in the right way. So they took it upon themselves: okay, we'll do this data aggregation, data scraping, and recreate data sets that we can really work with. So I helped them set up this Open Data Hub environment, this data science platform environment, and they had these specifications: they wanted to be able to share notebooks, and they wanted to be able to share data with each other. How do you achieve that? Normally a notebook, when you launch it with JupyterHub, is connected to your storage, but that's your storage, your own stash. But with ODF, we also have file system storage with CephFS. That means we are able to have those RWX volumes, meaning volumes that you can connect to multiple pods at the same time. And from this, you can build a shared library, a shared library of notebooks or a shared library of data. That's the first step. The second step is: oh yeah, but we want to be able to access all that data from many different points and interconnect all those things together. Then object storage is much more suited for this kind of thing. And generally we tend to see more and more people shifting to object storage for this exact reason. It's easy to work with, and now it's built into most of those scientific or data science libraries. And because of this disconnected mode, it's not a file system that you mount on your server; it's only an HTTP request that you can make from wherever you are in the world. So this connection between your notebook, or the container or the VM that is processing the data, and the storage makes it really well suited for data science environments. And it's also brought by ODF, because ODF also has object storage. And then at some point you will need a database. Oh, for a database, I would use block, because I need this block approach for an intensive workload. Well, it's still ODF.
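Going back to that first step, the shared notebook library boils down to a ReadWriteMany claim that several notebook pods can mount at once. Here is a rough sketch using the Python Kubernetes client; the namespace, claim name, size, and storage class are assumptions (in OCS/ODF deployments the CephFS-backed class is typically named ocs-storagecluster-cephfs).

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in a pod

# A ReadWriteMany (RWX) claim backed by CephFS, so multiple notebook pods
# can mount the same volume and share notebooks and data.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="shared-notebooks"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],
        storage_class_name="ocs-storagecluster-cephfs",  # assumed class name
        resources=client.V1ResourceRequirements(requests={"storage": "50Gi"}),
    ),
)
client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="data-science", body=pvc
)
```

Every JupyterHub-spawned pod that mounts this claim sees the same files, which is what turns a personal notebook directory into a shared library.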
So you see, that's what I find interesting, because, granted, there are many different storage vendors that have fantastic offers in block or in object storage. But usually they don't have this fully integrated approach all across the board of storage, which is what you get in ODF. As we said, even you, Chris, are able to deploy it in a few clicks, and bam, you have file, block and object. And then you are able to do mostly whatever you want, depending on the use case. Because data science and data engineering is exactly about this: it's always trying to reinvent something, because the context changes or you want to use new things or test new things. It's really different from a standard application. Say I'm an insurance company and I want to do the architecture for my new application: well, I will work a few months on my architecture, I will say, okay, I need this type of storage, I will go out and buy it, and then I bring everything in and it will stay the same for five, ten years. That's okay. It's not true in data science. In data science, what you are implementing now did not exist six months ago and will be obsolete six months from now. So if you don't have this agility, being able to pick and choose the different types of storage that you need, or recreate architectures easily by just using, again, PVCs, persistent volume claims, or object bucket claims, or things like that, it begins to get really, really difficult to work with. So again, I think the best thing is that ODF, being fully integrated into OpenShift, totally makes it the platform of choice to set up those data science environments. Plus, you're totally agnostic of the real infrastructure that is underneath. Meaning, whether you are creating it in AWS or Azure as a test, trying things just to learn more, or maybe you have a subscription to RHODS to begin to use OpenShift Data Science and you say, okay, I see, it fits my need, but I want to be able to do something on-prem: yeah, you can totally do the same thing on-prem, because you're not tied to the specific storage that is brought by AWS, or, when RHODS is on Azure, you're not tied to the specific storage that will be brought by Azure. So again, it's about flexibility, and I guess that's our main strength here. So the flexibility is the main strength. My understanding is you have some examples, use case demo type things, that you could maybe bring to the table here. Yeah, I can show you some of the things I'm doing. Let me share my screen. Screen share dance. Here we go, folks. Uh-oh, what is it telling you? There we go. All right, very small print right now, though. Yeah, I will zoom in a little bit for you. Thank you. Is that better? Okay, so for those who are not familiar with Jupyter, it's an environment where you are able to write notebooks. And for an example of a notebook, I'm gonna open this one. Basically, a notebook is a web interface that connects you to a kernel, the kernel being the engine that will run your code. Okay, so here we can see I'm in my environment, so again, a fully web environment, and I can see that I'm connected to a Python 3 kernel. That means whatever I run inside my notebook will be run against this kernel. And this kernel doesn't run on my computer. It's running on the cluster, on the OpenShift cluster, in the container that I've launched.
That's the first advantage of setting up this data science platform on top of OpenShift: it means you can bring to your users the full capabilities of a cluster. Okay, I could do this from my iPad, and it would work exactly the same way. But the code that I run will run on this cluster, so maybe with eight CPUs and two GPUs, 32 gigabytes of RAM, whatever I don't have on my iPad; it will still run. And the way it works with notebooks, you enter your code into cells, like this one. This is a Python cell, this is Python code, okay. And you are then able to run those cells independently. So I will run the first one, I click on run, and I have the result here. This is what you entered: okay, hello world, you know, very basic, but it has run only this cell. Now I want to run the other one. Perfect. And it has run the same function, the function that I had created in my first cell, but with a new text. Okay, so it's an interactive way of developing your Python code. This is basic, and of course you can take notes; you have cells with code and cells with markdown, and you can create your environments. This is basic, but where it's really used is for this: for example, this is a notebook that I used to create a model that is able to recognize a risk of pneumonia in chest X-rays. Okay, that's the kind of thing that you do with AI/ML tools. And here I have my notebook. So I have first a small description about what it is, what it does, then I have my imports and my code. But as you see, it's fully documented code that people are able to read directly, understand what's going on, and replay all of those different cells one by one. When you are developing your algorithms and things like that, that's a really easy way to do this, because of course you are always adjusting parameters and things like that, and you don't want to rerun everything from scratch. Normally in standard development, you would put a breakpoint and then rerun everything, see what happens at the breakpoint, or try to debug everything. Here you can go totally step by step for each of your functions, each of your parameters, and things like that. So here- And because it runs centrally, you can also share this, right? So Guillaume develops this, has a problem, and he calls me, for example, and says, hey, can you look at this? Why is this Python thing not working? I can just open up his notebook and look at it. I know. That's why it has become so popular with data scientists. And it's called a notebook because that's exactly what you would do as a researcher doing experiments. You have your research notebook and you take notes: okay, here I'm running experiment number one with these parameters. You run the things. I don't know what you do, you're in chemistry, you mix up different liquids, see what happens, and you write down the results. Here it's exactly the same thing. I'm writing, okay, this is what I'm gonna do here in this specific cell, and then I have the result. So it's the same workflow, but applied to code. Okay, and once you have this, once you have this model that has been developed, you can put it into motion in a real application. And this is what I have here. Let me first switch back to this view to give some explanation. Okay, here it's working in this way: I have X-ray images that I am sending into a bucket. Okay, an object storage bucket.
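For readers who have never touched a notebook, here is roughly what those two hello-world cells amount to in Python. Each cell executes independently against the same live kernel, so the function defined in the first cell is still in memory when the second one runs:

```python
# Cell 1: define a function in the kernel's memory and run it.
def say_hello(text):
    print(text)

say_hello("Hello world")
```

```python
# Cell 2: run later, on its own; the kernel still remembers say_hello.
say_hello("Hello again, with a new text")
```

In the demo, that same interactive flow scales up to the pneumonia model, and the X-ray images it works on land in a notification-enabled bucket, which is where the pipeline picks up next.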
But because this bucket has been enabled with notifications, every time I send a new image, it will send a notification to Kafka, to a Kafka topic. Nice, okay. And here in my OpenShift, everything runs in OpenShift. In my environment, I have this, which is a Knative eventing component. It's listening to this Kafka topic, and whenever some message comes in, it will send this message here: I have a Knative serving component with a serverless function, in which I have my model, the model that is able to recognize the risk of pneumonia. And in this container, I'm making this risk assessment, okay? And then I will save the results into a database. So here you see we have this workflow where we go from, okay, I'm sending my data to my data repository, which is an object bucket, which is part of this overall architecture, where it sends a message to Kafka and then to my risk assessment container. And we can see it live on this dashboard. We have here, you know, I started the generator a while ago, so I'm sending all those images into my object bucket. I have here this counter that will count the images coming in, then. I think you need to zoom in a little. Yeah, but it will mess up the dashboard a little bit, unfortunately. Can you just open up the pipeline progress as its own view? That would be nice. No, unfortunately I cannot. I mean, it's legible, but I can't read the print inside the blue boxes. Okay, but let me describe it, you know, it's what I've described here. Okay, I'm putting everything into my object storage, it's sent to a Kafka bus, and here I have my container that is doing the risk assessment. I have this counter that is counting the number of images that have been recognized so far. And some of those images also have to be anonymized, because the model is not able to recognize exactly if there is a risk or not. So for further processing, those images are first anonymized and then sent to another process. And here I have all the data about the last images that were recognized, and so on, with the images themselves. But it's just to illustrate how you go from this, oops, this one, which is your data science development environment. And here again, you are leveraging different things. You are leveraging block storage, because that's where my notebook is residing. You are leveraging object storage, because that's where the data set with my 6,000 or so raw images is, to train the model. And that's a small data set. You know, sometimes the data set that you have to work with is 500 terabytes of data. Of course you don't put that on your USB key. That means you have to have those bigger environments, and that's where OpenShift plus Ceph, with its scalability, comes into play, because you can have those 500 terabytes of data residing in Ceph with no problem. And then you can have hundreds of data scientists using this data set, the central data set, in object storage. You see, that's where it works. And you have your data scientists working on the data set, working on the data, creating those models. And then on the same platform, the very same OpenShift platform, they can deploy the model and use it for real, in the real implementation. So again, that's what I find really interesting with the business proposition that we are making here: it's the same OpenShift plus ODF platform that you can use both for your data science development, day-to-day usage, and also application production.
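Here is a hypothetical sketch of the serving side of that pipeline: the eventing component forwards each bucket notification to the function as an HTTP POST, and the function fetches the new image, scores it, and records the result. All the names (endpoint, credentials, the assess_pneumonia_risk and save_result helpers) are made up for illustration; this is not the actual demo code.

```python
import boto3
from flask import Flask, request

app = Flask(__name__)

# Placeholder S3 endpoint and credentials for the ODF object store.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3-openshift-storage.apps.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

def assess_pneumonia_risk(image_bytes):
    # Placeholder: the real demo would run the trained model here.
    return 0.0

def save_result(key, score):
    # Placeholder: the real demo would write to the results database here.
    print(f"{key}: risk={score}")

@app.route("/", methods=["POST"])
def handle_event():
    # Ceph bucket notifications follow the S3 event record layout:
    # bucket name and object key arrive inside a Records[] list.
    record = request.get_json()["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    image = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    save_result(key, assess_pneumonia_risk(image))
    return "", 204

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)  # Knative routes traffic to this port
```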
You don't change your environment, and it's totally portable. So it's, yeah. Nice. So I can put all... Do you have any other demos that you wanted to show off, or anything? Or was that the... Oh, no, not at this point. Maybe in a later show. Yeah, yeah, yeah. So the data set was public, I'm assuming, and we're just using it. Of course. Okay, cool, I don't have to put any disclaimers out there. Yeah. Like, that's incredibly powerful, right? To train models and be like, okay, taking this a step further: this patient had COVID, this patient didn't. What's the difference, right? Like, we're gonna have to get through this pandemic. There's gonna be some aftermath, right? Something has to happen for these people that are dealing with the after-effects of COVID. Research is being done there. Hell, my wife just told me the other day that some group in Europe developed an mRNA vaccine, just like the COVID vaccines, but it's, like, pandemic agnostic. It doesn't matter, right? So it's like, okay, great, how did you do that? What data did you consume to figure out that you could create a vaccine to fight any coronavirus? That is impressive. And it seems like we're giving people the tools to be able to do that, all in one spot, as opposed to having to bring together, like, your entire IT department to figure this out, kind of deal. Yeah, that's why data science has been on the rise for the past few years: because now we have the capabilities, the processing power, we have the techniques, we have everything to be able to train these models, to do real AI/ML. The mathematics part of this is really old, it's 30, 40 years old, but until the mid-2010s we didn't have the real means to be able to leverage that, okay? That's not true anymore. Now we have this. And here it was an example with image recognition. This is now pretty much standard; image recognition is easy. But there are many other things that have been tried for COVID-19. For example, someone trained a model where you just cough a little bit on the phone and it's able to detect if there is a risk or not. Here it's the same: it's about having those thousands of samples of people coughing, and training a model to be able to detect what the human ear cannot, obviously. So it's those tools that we are bringing, tools that until a few years ago were reserved for some specialists, and really difficult to use, really difficult to implement. Now it's a little bit more mainstream, and by bringing it on top of OpenShift, it's even more mainstream, because it's the standard platform that you may already have in your enterprise. Most customers I'm working with already have some OpenShift installation or some OpenShift knowledge, and now they're interested in this data science thing: oh yeah, we have this data and maybe we think it will be useful, how can we do this? Well, you already have OpenShift. Let's deploy Open Data Hub; five minutes from now, you have all those tools. Oh, perfect. And where do I start? Yeah, and if you notice, like, I have a bigger data set than I expected: because of OpenShift, you can use a machine set, you scale it out with a different instance type that is bigger, and you don't need to touch anything, because the OpenShift machinery is handling all the installation, and once you're done, you can get rid of it again. Yeah, and I have customers I'm working with that are doing exactly that.
They have this huge processing to do periodically, every 24 hours. You know, it takes tens of machines to be able to run that. But of course, as it runs in the cloud, they don't want to keep it running 24/7, because it costs a fortune. So now it's part of the workflow: at the beginning of the process, they will just increase the machine set, it will spawn some new machines, then they will launch the process using those data science tools, Spark and the rest. They will do the processing, which takes a few hours, and then when everything has been done, they just scale down the cluster, and they save a lot of money. I mean, this is really reminding me of a time when I worked for a financial services, like, marketing company, and the data science team, we were having so many problems with, like, infrastructure and all this other stuff, right? Like, oh, my model didn't finish running before the spot instance shut off, and now I've wasted all that time and money, right? So OpenShift, like, it puts all the power in the people's hands, is what it feels like, right? I don't have to worry about some other team or some other configuration touching my workloads. I'm managing this now. I have a machine set that does what I need. It spins up, it spins down as I'm processing data, and off I go, kind of deal. It seems really powerful, right? Like... And you know, we've known for a few years now all the benefits that OpenShift can bring to development, okay, in general: all this flexibility, agility and everything. It's about bringing the exact same advantages to data science. It's really well suited. Now that most data science tools run, and will run, in containers, it's, oh, fantastic: now we can run them on top of OpenShift, and we can use all the know-how, the skill set that we have developed around DevOps and around creating those infrastructures, for data science. That's perfect. And when you add to the mix Ceph, with ODF, then you bring the scalability and the performance that you need for data science. Because it's not only about, you know, storing a little data here; for more and more people now, we're talking about petabytes of data, and petabytes of data that have to be processed in as small a time as possible. Meaning you have to have performance on the storage side. And that's where Ceph shines, you know, especially with the predictability of performance: this perfectly straight line, the more capacity you add, the exact same performance you get. That's really important in data science. You don't want to be like, okay, now that I'm reaching over one petabyte of storage for my specific stuff, the performance totally drops because the storage is not able to cope with it. We don't have those kinds of issues with Ceph. So it's kind of bringing the best of both worlds, storage and Kubernetes, to data science. That's why I'm so excited to work with it. It's, yeah, a perfect match. That's amazing. Yeah, but in my daily life, I'm not actually handling a lot of data, or wearing lab coats or anything. So one thing that I want to mention about JupyterHub is it's not just for doing what you showed us. You can also do regular development in it. And maybe Chris, you can share in the chat a link that I just sent. There's, like, a list of all kinds of kernels that you can use. As I mentioned at the beginning, the kernel is the language that you write your notebook in. And there are kernels for pretty much anything.
I like to see that there are Go kernels, so you can write your Go applications in a Jupyter notebook in your browser, share it with anyone. Or one thing that's very popular, and that's pretty cool: there's an Ansible kernel. So if you've ever written an Ansible playbook, you know that it's hard. Like, you write it and you want to have it so that you can repeatedly run it. You want to test it, you want to document it. You can start writing your Ansible playbook in a Jupyter notebook and test it in there, and then you can immediately see what it does, what the output is, and all of that. That's pretty cool. Yeah, I'm about that life. You say Ansible and my ears perk up, obviously. But yeah, I see it right here: Ansible Jupyter kernel. That's awesome. It's funny because, you know, hardcore developers will always swear, you know, by their own IDE, you know, it's... But when you come from a different background, or you're not, you know, I'm not a full-blown developer, that's not what I do. I have the same approach as Chris: taking the best tool depending on what you want to do. And for Ansible, I've never done this before, but I have tons of Ansible playbooks to rewrite to deploy those demos into our RHPDS. And I totally see the point of, oh no, I want to test only this part of the playbook and not rerun everything, or just comment out the parts that I don't want to replay because I'm just working on this part. That's the interactive-mode proposition from the notebooks that is interesting there. So, yeah. And you can document it in full markdown. So it's also great if you want to teach someone a certain language, or Ansible, whatever. There's also a bash kernel. So if you want to teach all those millennials what you can do in bash, then you can write a notebook and make it fancy with the markdown, tell them exactly, hey, this is a for loop and that's how you do it. And they can run it and see immediately what it does, what the output is. So, where do folks go to learn more? Is it our normal learning places, like learn.openshift.com, for example? There are things on learn.openshift.com related to RHODS, Red Hat OpenShift Data Science. And let me check. Yeah, I was going to pull it up too. We need some elevator music. Sorry, I do need some elevator music, Bobby. So, for folks that aren't aware, I have an intern this summer, and I'm very happy about that, because he gets to take notes and tell me what I'm doing wrong, because he has production, like, this kind of production experience in his background. Or not this kind, but, like, movie production experience. So I'm sure he's, like, blushing or whatever in the background. But yeah, I like talking about my intern. So, yeah, some mood music as I'm searching for data here. AI/ML? Yeah, in the meantime, I can read some more kernels. There's Redis, there's PowerShell, Wolfram. Wolfram, oh, okay. Yeah, you can write Wolfram Mathematica stuff in it. I like that, because I had this app on my phone when I was a student. I didn't always have everything in my kitchen, so sometimes I only had a scale. So I wanted to know, okay, how much does 100 milliliters of flour weigh? Right, yeah. Or 100 milliliters of milk, kind of deal? Yeah. How many whatever liters, or you had to do a conversion or something? Yeah, and obviously Wolfram would overdo it. It would tell me, like, okay, you have 100 milliliters of milk. Is that, like, whole milk? It wants to know the consistency. Yes. Because it would differ by probably two grams. Right, exactly.
All right, I mean, let's not belabor the point. We are approaching the top of the hour. Is there anything else we wanna talk about before we sign off? We don't have any questions in chat, or at least I haven't seen any; I hope I haven't lost any by just not looking at YouTube and Twitch directly. Okay, no, I haven't. All right. So, yeah, anything you wanna sign off with? I would reiterate that, you know, for the part I'm working on, which is data science and data engineering, the important thing you have to consider when building the thing is the platform. Okay, it's not only the tools by themselves; the tools are easy to figure out, but it's the platform. And here, running those kinds of workloads, you know, AI/ML or statistical workloads or pure data analysis, on top of OpenShift, with everything that goes with it, you know, ODF, all the other components that we have, serverless and so on, that makes a great platform. So that would be my takeaway from this. Beautiful. All right, so don't ignore all the other things that you see happening in OpenShift. Maybe poke around; you're not gonna break things, just create your own project and off you go. All right, so thank you, Guillaume and Chris, I appreciate your time today. As always, later on the channels, 11 o'clock Eastern, 1500 UTC, we're gonna be talking about the value of GitOps, and we're gonna have some guests on, so please tune in for that. And until the next Data Services Office Hour, we will see you then. Stay safe out there, everybody, for real. See ya.