The rise of artificial intelligence has heightened our awareness of the importance of data. The need to analyze vast amounts of data to gain new insights and take action to drive business value is on the minds of executives around the world. Current trends, however, strain traditional storage systems and are driving new requirements for data storage platforms, and new thinking around how we access, process, manage and store data in a cost-effective, efficient and high-performance manner.

We've covered the business angles today. In this session of the IBM Storage Summit, we want to focus on some of the technical aspects of Scale. Specifically, we want to explore the changing role of global storage platforms and how organizations are leveraging data to create a new breed of AI applications that manage the flow of data throughout the information supply chain. To do so, we welcome Chris Maestis, worldwide chief executive architect for storage solutions, data and AI, at IBM. Chris, welcome.

Thanks for having me here today.

You bet. So look, you've been working on storage systems at scale for quite a number of years now. From your perspective, what's changed with the recent awakening around the potential impact of AI, and how should we be thinking about that?

That's a really good question. In the past, the applications talking to our storage environments were either ingesting large amounts of data for long-running streaming workloads or, in some of the healthcare and life sciences cases, creating lots of itty-bitty files. So we saw workloads changing across media and entertainment, healthcare, life sciences and financial services. AI really changed things because it picked the middle of the road: not the itty-bitty files, and not the large streaming data either. Think of the exercises you and I do to prove we're not robots, tagging various kinds of images. Whether those samples are video, images or audio, we're seeing the data size change, and we're having to adapt to a size of data we haven't traditionally handled.

I want to come back and ask you about the importance of scale, because when you talk to folks who have developed large language models, for instance, they say they really didn't appreciate the importance of scale until just a few years ago. Once they were able to ingest and work on a lot more data, they realized the potential of what they could create. Now we're all seeing that, and I feel as though your architecture has been preparing for this day for quite some time. So I wonder if you could talk about the importance of scale and how you accommodate it.

One of the things I've observed here at IBM: I've been here over a decade now, and in this industry for about two decades, and I've watched how the way we work on data has changed. When I came to IBM, let's remind ourselves, this is GPFS, a product that has been in the field for over 25 years. We're actually having a big anniversary, a big birthday party, for Scale this year.
And look at the workloads it has handled in the past: media and entertainment streaming video, then healthcare and life sciences workloads full of small files, and it adapted. A few years ago, when Hadoop and big data arrived with that whole MapReduce style of workflow, once again the product adapted. So Scale adapting to AI workloads was, for me, business as usual: here's another workload to address, can we make sure the environment gives you high performance? Job one is always high-performance access to the data, because that gives you reduced time to results and lets you get more real work done.

Then we layer on the enterprise focus we've built here at IBM with Scale. It's not just a high-performance environment; it gives you different ways of interfacing with the data, whether you want to explore containers in the future or push data out to an object store. So we're adapting to how you access the data, and also to the way we connect to other data sources, because we recognize that today we're not the only storage vendor in the room anymore and we need to play nicely with others. We can connect to and cache from non-IBM data sources, object storage, NFS, NetApp-style filers and so on, bring that data in, catalog it and manage it for you. You can make new decisions based on an AI workflow you're running, have the data brought in from one data source and pushed out to another. So Scale has really grown from being just that GPFS high-performance-computing environment to offering other ways of interfacing with the data, and having been in this industry for so long, I'm very confident that as a new protocol or a new way to access data comes along, we can have Scale support that interface too.

It's worth mentioning, Chris, that GPFS has been around for a long time, but it provides concurrent access to a single file system, or multiple file systems, from many nodes. It's a global namespace, and it has the enterprise features you're talking about: recoverability, high availability, replication, non-disruptive updates, the things you would expect from the enterprise. What you're saying is that as workload requirements change, in the ways you've just described, GPFS is actually in a good position to handle that.

That's correct. And to pull on your point about having a common data platform where I can ingest from a variety of different data sources: GPFS can do that, and it has had that type of technology for over ten years. As AI workloads have come onto the scene, working with accelerators, whether they're FPGAs or NVIDIA DGX-style systems, we've been able to support new interfaces like GPU Direct Storage. So Scale has been well positioned to deal with new interfaces, all while continuing to provide high-performance access to the data when I need to read and write to the environment.
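As a rough illustration of the connect-and-cache capability Chris describes, the sketch below shows how an Active File Management (AFM) fileset might be set up so that Scale caches data from an external, non-IBM NFS source. It is a minimal sketch under stated assumptions: the file system name fs1, the fileset name ingest_cache, and the NFS export on filer01 are all invented, and exact mmcrfileset flags vary by release, so treat it as illustrative rather than a definitive recipe.

```python
import subprocess

# Hypothetical names: file system "fs1", cache fileset "ingest_cache",
# and an NFS export on a non-IBM filer -- adjust for a real cluster.
FILESYSTEM = "fs1"
FILESET = "ingest_cache"
AFM_TARGET = "nfs://filer01/export/projects"   # external (non-IBM) data source

def run(cmd):
    """Echo and run a Scale admin command, failing loudly on error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Create an AFM read-only cache fileset that points at the external source.
# (Other AFM modes, such as single-writer, would allow read/write flows.)
run([
    "mmcrfileset", FILESYSTEM, FILESET,
    "-p", f"afmmode=ro,afmtarget={AFM_TARGET}",
    "--inode-space", "new",
])

# Link the fileset into the namespace so applications see it as a normal path.
run(["mmlinkfileset", FILESYSTEM, FILESET, "-J", f"/gpfs/{FILESYSTEM}/{FILESET}"])
```

With something like this in place, reads under /gpfs/fs1/ingest_cache would pull data from the external NFS home on demand and cache it on the Scale tier, which is the behavior Chris alludes to when he talks about connecting to and caching from non-IBM sources.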
And again, we sprinkle an enterprise view on top of that environment: if you need to back up your data, or apply security and encryption to the environment, you have those choices available to you as well.

That's interesting, because you're bringing up silicon diversity, something we talk about on theCUBE a lot. You mentioned accelerators, you mentioned GPUs; it's not just the x86 CPU anymore. There are many other aspects of connectivity within that complex, a lot of data moving around, and of course you're doing things in parallel.

That's correct. Essentially, that's the other focus of what GPFS has been able to do. Beyond the business-as-usual performance, it's being able to grow, to phase storage in and out of your environment, all while staying online. That enterprise perspective from IBM means delivering high-performance parallel access online: when I'm adding new storage, I need to be able to add it without interrupting critical work, and then help migrate the workload onto what we've just introduced so it can take advantage of it. That's what has been interesting to me, watching this industry: seeing how GPFS can take on a new workload and give it that enterprise view, where I'm not going offline for a couple of days. I can do a rolling upgrade, I can add storage when I need to, and I can bring in new types of hardware. Maybe I'm exploring other types of chips: we have AMD, we have Intel, we still have Power from the IBM side, and as new architectures emerge, we're going to be prepared for those as well.

Yeah, you kind of chuckled when you said Power. Power actually has some incredible memory management and is doing some amazing things in high-performance computing as well. I want to ask you about the so-called data, or information, supply chain. How do you define that, or how do you think about what it is? Help the audience understand.

Sure. One of the common themes I see in talking with clients is that, as I said, they may have made an effort to explore cloud, they may have other storage they procured in the past, and they need a common platform to figure out how to talk to that environment. From a workflow perspective, I call it going from ingest to archive. In the AI ladder it was always from ingest to insights, and that's where your workload runs and you get the insights, the business value of your AI workload. From the storage perspective, I need to be able to ingest from a variety of data sources, whether that's S3, NFS or other GPFS environments: can I connect to that environment, get the data and bring it in? Once I'm doing that, I need to catalog it and gain some additional insights, and it's not just what you and I think of as system metadata: who's the owner, how big is the file, when was it last touched, who was the last person to modify it?
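To make that system-metadata layer concrete before Chris goes further: below is a small, hypothetical sketch of the Scale/GPFS policy language being used to inventory files with exactly those attributes (owner, size, last access). The mount point, list name and SHOW expression are assumptions for illustration; attribute names and casts can differ by release.

```python
import subprocess
import tempfile

# An illustrative LIST rule in the Scale/GPFS policy language: report each
# file's owner, size and last-access time -- the "system metadata" layer.
LIST_POLICY = """
RULE 'inventory' LIST 'all_files'
  SHOW(VARCHAR(USER_ID) || ' ' || VARCHAR(FILE_SIZE) || ' ' || VARCHAR(ACCESS_TIME))
"""

with tempfile.NamedTemporaryFile("w", suffix=".pol", delete=False) as f:
    f.write(LIST_POLICY)
    policy_file = f.name

# mmapplypolicy walks the file system metadata in parallel; "-I defer" only
# generates the candidate lists, it does not move or change any data.
subprocess.run(
    ["mmapplypolicy", "/gpfs/fs1", "-P", policy_file, "-I", "defer"],
    check=True,
)
```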
So we have that system metadata, but we can also extract additional metadata insights and create a project focus that groups together the data I've pulled in for a particular project. I've cataloged it, I'm preparing it to run, and that could be in an accelerated environment, an analytics environment or high-performance computing. The job runs, and once I'm done I can decide to put the data into an archive. Ingest to archive, because what we want to do is store the data we used, along with the insights we gained from the metadata extraction, and be able to reference it later. The archive we push it to may not be the original source we talked to; it may be a different target, where I push it off to tape offsite, or to a different object store, or apply some other form of compression and leave it there, because it will still be needed for the next three to six months. These are the decision points you now have along this workflow. In the past you were really siloed into accessing data one way in and one way out; we let you bring the data in from many sources and then decide where you want to put it at the back end of the environment.

As you can imagine, because the platform is so powerful, customers are often almost in disbelief: can you really do that? Then we show that we can connect to these data sources. We can show them, through the command line, through the REST API, through the GUI: look, we can create a policy for you, and it will do exactly what you're describing to me. I want this data stored in this environment, I want it compressed after a certain amount of time, or pushed off to another data store. We can do that with a customer and then demonstrate it, and I think that's the golden egg we've had here in GPFS for many years.

Right. So thinking about that workflow you just described, data moving across the supply chain from ingest to refinement, taking action on insights, sunsetting the data, archiving and so on: what do you see as the biggest customer challenges, and how are you approaching solving them? I'm inferring it's optionality, flexibility, access to different data formats, but maybe you could describe it.

I would say that, in terms of what we can do with Scale in the environment today, we're ready for the next workload. What I see with customers is that they may have a workload today, maybe a mainframe coupled with Hadoop coupled with some PowerAI type of environment, and they really need to figure out how to connect the data flows among those three types of environments to bring common data to them.
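Picking up the policy idea Chris mentioned a moment ago (command line, REST API or GUI), here is a hedged sketch of what such a lifecycle policy might look like: compress files that have gone cold and migrate the coldest ones to a capacity or archive pool. The pool names, the 30- and 90-day thresholds and the 'z' compression library are assumptions chosen only to show the shape of a policy, not a customer recipe.

```python
import subprocess
import tempfile

# Hypothetical information-lifecycle rules in the Scale/GPFS policy language.
# "system" and "archive" are assumed pool names; thresholds are placeholders.
ILM_POLICY = """
/* Compress files untouched for 30 days, in place, in the fast pool. */
RULE 'compress_cold' MIGRATE FROM POOL 'system' TO POOL 'system' COMPRESS('z')
  WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30

/* Move files untouched for 90 days down to a capacity/archive pool. */
RULE 'sunset' MIGRATE FROM POOL 'system' TO POOL 'archive'
  WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 90
"""

with tempfile.NamedTemporaryFile("w", suffix=".pol", delete=False) as f:
    f.write(ILM_POLICY)
    policy_file = f.name

# "-I test" asks mmapplypolicy for a dry run: it reports what each rule would
# do to which files without actually moving or compressing anything.
subprocess.run(["mmapplypolicy", "/gpfs/fs1", "-P", policy_file, "-I", "test"],
               check=True)
```

In practice such rules would typically be installed with mmchpolicy or run on a schedule, and could just as well be created through the management GUI or REST API Chris refers to.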
So we build a platform for them, a data platform that can be global and that lets us connect to those types of environments. Customers, as I said, are challenged by data silos they want to connect. In the past the story was, well, we'll just bring you all into our global namespace and you won't need to do anything else. Now the message is: we can present you a global data platform that connects to your existing environment, and we're not asking you to throw away the investment you've already made. We can stand up a high-performance tier to serve the AI work you're preparing for now, or have started to explore, and then, as client workloads are modernized and start to run in container environments, we give customers that option as well. So you're not buying, as in the past, a separate storage platform for each new workload that comes in, a Hadoop environment, a mainframe, PowerAI, containers. You can have Scale providing a global data platform and serving all of those workloads and all of those access methods in one simple, easy-to-use environment.

Thank you for that, Chris. All right, my last question: bring us home. Paint a picture of what you would consider the ideal architecture to address these challenges, for today's world and for the future of this AI era we're entering.

That's a really good question, because what I've seen over the past couple of years is that, whether it's a protocol or an interface, we'd hear rumors that POSIX was going to be phased out, and it really hasn't been. What we want to do is give you a nice, easy-to-use platform, and essentially that's our Scale system environment today, one that can provide the media you need. Whether that media needs to be NVMe-based storage, flash with inline compression, HDD or tape, or whether it needs to talk to other types of environments, what IBM can do is provide a solution that scales and that adapts to give you the types of storage the workload calls for. We're not here to sell you a point solution; we're here to work with you and your workloads and give you the right type of media to interface with, and as you grow and change, we can bring those new access methods and new caching methods into the environment very easily.

I'm chuckling: this is IT, right, Chris? We never get rid of anything. Hey, thanks very much. I really appreciate you participating in the program. Great stuff.

Great, thanks for having me.

You're very welcome. All right, keep it right there for more technical discussions from the IBM Storage Summit. We're live and on demand from theCUBE's Palo Alto studios, and of course on theCUBE.net.
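As a closing, appendix-style illustration of the "one data platform, many access methods" idea from Chris's final answer, the sketch below imagines the same Scale-backed data being reached both through its POSIX path on the cluster and through an S3-style object interface from outside. Everything here is an assumption for the sake of the example: the path, endpoint, bucket name and credentials are invented, and it presumes protocol (object) exports have already been configured on the cluster.

```python
import boto3  # assumes the cluster exposes an S3-compatible object endpoint

# Hypothetical names for a fileset that is reachable over both POSIX and S3.
POSIX_PATH = "/gpfs/fs1/projects/experiment42/sample.json"
S3_ENDPOINT = "https://ces-object.example.com"   # assumed object endpoint
S3_BUCKET = "experiment42"                        # assumed bucket for the fileset

# 1) A training job on the cluster reads the file directly over POSIX --
#    the classic high-performance, parallel GPFS access path.
with open(POSIX_PATH) as f:
    payload = f.read()
print(f"POSIX read: {len(payload)} bytes")

# 2) An application outside the cluster lists the same data through the
#    object interface, without first copying it into a separate silo.
s3 = boto3.client(
    "s3",
    endpoint_url=S3_ENDPOINT,
    aws_access_key_id="REPLACE_ME",       # placeholder credentials
    aws_secret_access_key="REPLACE_ME",
)
for obj in s3.list_objects_v2(Bucket=S3_BUCKET).get("Contents", []):
    print("object:", obj["Key"], obj["Size"])
```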