from downtown San Francisco. It's theCUBE, covering the IBM Chief Data Officer Strategy Summit 2018, brought to you by IBM.

Welcome back to San Francisco, everybody. You're watching theCUBE, the leader in live tech coverage. We're covering the IBM Chief Data Officer Strategy Summit, hashtag IBMCDO. Ed Walsh is here. He's the general manager of IBM Storage. And Steven Eliuk is the Vice President of Deep Learning in the Global Chief Data Office at IBM. Steven, good to see you again, welcome to theCUBE.

So this is a great story. We heard Inderpal Bhandari this morning talk about the enterprise data blueprint, laying out for practitioners how to get started and how to implement, and we're going to have a little case study on how you're actually doing this. But Ed, set it up for us.

Okay, so we're at this Chief Data Officer Summit, the spring one. We do it twice a year, and we really just get chief data officers together to think through their different challenges and share. That's what this summit is about. And IBM has tried to put its best foot forward, to be the cognitive enterprise and to show very transparently what we're doing in our own organization to be more data driven. We've said it a bunch of different times: everyone needs to be data driven, everyone wants to be data driven, but it's really challenging for organizations. So what we're doing with this blueprint, which we're presenting as a showcase, in fact you can physically come in and see our environment, but more importantly, we're being very transparent about all the different components, the high-level processes, what we did in governance, but also down to the real technology level, and we're sharing that. Not because clients want to do all of it, maybe they want to do some of it or half of it, but it's a blueprint that's worked, and we're being transparent about what we're getting internally from our own transformation as IBM. Because really, we looked at this as a platform. It's an enterprise cognitive data platform that all of IBM uses for all transformation work. So our client, in fact, is Steven, and I think he can tell you what we're doing. By the way, the same type of infrastructure lets you do what we did in the national labs, you know, the largest supercomputers in the world, same infrastructure, and the same thing we're trying to do here: make it easier for people to get insights from data at scale in the enterprise. That's why I wanted to bring Steven on, right? I joked with Inderpal, I said, if you can do it at IBM, you can do it anywhere, because as he's pointed out, we're a highly complex organization. So Steven, take us through how you got started and what you're doing.

For sure. So I'm probably what's referred to as a difficult customer, right? Because we're so multifaceted, we have so many different use cases internally, on the order of hundreds, I can't just say, hey, here's one specific pattern that I need, Ed, make sure your hardware is efficient in this area, because the next day I'm going to be hitting him and saying, hey, Ed, I need you to make sure that it's also efficient in terms of bandwidth as well.
And that's the beauty of working in this domain: I have those hundreds of use cases, and it means I'm hitting, you know, low-latency requirements, bandwidth requirements, extensibility requirements, because I have a huge amount of headcount that I'm bringing on as well. And if I get it right now, I don't have to worry in six months about saying, hey, I need to roll out new infrastructure so I can support these new data scientists and so they can get outcomes quicker. I need to make sure that all the infrastructure behind the scenes is extensible and supports my users. And what I don't want them to have to worry about is how that infrastructure works. I want them to focus on those enterprise use cases, and I want them to touch as many of those use cases as possible.

So, Inderpal laid out his five things that a CDO should do. He starts with: develop a clear data strategy. So, as the doer in the organization, how did you go about doing that? Presumably you participated in that data strategy, but you're representing the lines of business, presumably, right? To make sure that it's of value to them, that you can demonstrate business value. But how did you start? I mean, that's a big challenge.

For sure, yeah. It's a huge challenge. And I think effectively curating, locating, and governing that data, and the quality aspects of that data, is one of the first pieces. Where does that data reside? How do we access it quickly? How does it support structured and unstructured data effectively? Those are all really important questions that had to come to light, and they shaped some of the approaches we took. We look at the various business units, and we ask, are they curating the data correctly? Is it the data that we need? Maybe we have to augment that curation process before we're actually able to apply new machine learning techniques to that use case. There are a number of different aspects that get rolled into that. And bringing effective storage and effective compute to the table really accelerates us on that journey.

So Ed, what are the fundamental aspects of the infrastructure that supports this sort of emerging workload?

Yeah, good question. Some of it is what we're going to talk about, what's the storage layer, what's the compute layer, but also what tools we're putting in place to take a lot of these open source tool sets and make them easier for people to use, and also to use the underlying infrastructure better. So if you look at it at a high level, we use a storage infrastructure that's built for these AI workloads, which are closer to an HPC workload, right? It's the same infrastructure, we use the term ESS, or Elastic Storage Server. It's a turnkey solution, half rack, full rack, right? But it can start very small and grow to the biggest supercomputers in the world, like what we're doing in the national labs, the top five supercomputers in the world. What that is, is a file system called Spectrum Scale that allows you to scale out on performance, but also gives you low latency, fast access to the metadata, and high throughput. And we can do tiers on that, with all the hot tiers on flash, because it's not just throughput you need, which is high.
So our lowest-end box is close to 26 gigabytes a second, and our highest, like the national labs, is 4.9 terabytes a second of throughput, but it's also the low-latency quick access. So we have a storage infrastructure, but then we also have high-performance compute. What we have is our Power Systems, POWER9 systems with GPUs. And the idea, we use the term feed the beast: how do you have the right throughput or IOPS to get the data close to that CPU or GPU? The Power systems have a unique bandwidth, so it's not what you find on a commodity or Intel server; it's much faster throughput. It allows us to move data between the GPU, the CPU, and storage or memory very fast, so you get these deep learning times, and maybe you can share some of that, the learning times come down dramatically, so you get to the insight.

And then we're also putting layers on top. IBM Cloud Private is basically, how do you have a hybrid cloud, container-based service that allows you to move things seamlessly across, and not have to wrestle with how to put all these things together either? So it works seamlessly between a public cloud and a private cloud. Then we have these tool sets, and I talked about this last time. It might not seem like storage, or what you'd expect to hear from us, but we use the term PowerAI: taking all these machine learning tools, because they're all open source, and making them, one, more scalable, but also, two, easier to use. So you have a bunch of great GPUs and CPUs and great throughput, how do you scale that? A lot of these tools were basically designed to be run on one CPU. To make them distributed, key research from IBM allows you, with PowerAI, to take the same TensorFlow workflows, dot, dot, dot, and run them across a grid, dramatically changing your learning times, right? Anyway, you can probably add more, but it's multiple layers. It's not one thing, but it's also not the typical storage or compute infrastructure you'd use for normal workloads. It is custom. A lot of people try to deploy maybe their NAS storage box, maybe it's flash, and you can get going that way, but then you hit a wall real quick. This is purpose-built for AI.

So Beth Smith was on earlier. She threw out a stat. She said that, based on some research, I'm not sure if it was IBM or Forrester or Gartner, 85% of the customers they talked to said AI will be a competitive advantage, but only 20% can use it today at scale. So obviously scale is a big challenge, and I want to ask you to comment on another potential challenge. We always talk about elastic infrastructure: scale up, scale down at the end of the month, okay. We sometimes use this concept of plastic infrastructure. Basically, plastic maintains its shape, because these workloads are so diverse, I don't want to have to rip down my infrastructure and bring in a new one every time my workload changes. So I wonder if you could talk about the requirements from your perspective, both in terms of scale and in terms of adaptability to changing workloads.

Well, I think one of the things that Ed brought up that's really, really important is that these open source frameworks assume they're running on a single system. They assume that storage is actually local, and that's really the only way you get effective throughput from them, if it's local.
So extending it via PowerAI, via these appliances and so forth, means that you can use petabytes of storage at a distance and still have good throughput, and not have GPU utilization coming down, because these are very expensive devices, right? So if the storage is the bottleneck, if the controller is limiting that flow of data, then ultimately you're not making the most effective use of those very expensive computational resources. But more importantly, it means that your time from ideation to product is slowed down, right? You're not able to get those business outcomes, and that means your competitor could get to those business outcomes instead. And for me, what's really important, I mentioned this briefly earlier, is that I need those specialists to touch as much of the data, as many of those enterprise use cases, as possible. At the end of the year, it's not about touching three use cases. It's touching three this year, then five, ten, more and more. And the infrastructure, storage and computation, all of those are key attributes to achieving that goal.

Without having to rip that down.

Without having to rip it down and rebuild it every time.

And just being able to deal with the grid as a grid, so you can place workloads across the grid. That's our Spectrum Computing products, which we've been providing to all the major banks in the world to take these workloads and place them across a grid. That's also a key piece of this.

So Ed, that's not just storage infrastructure.

No, but you need that, and that's why it's part of my portfolio, to build out the overall infrastructure for people on-prem. But also, everything we did with Steven on-prem is hybrid and goes to the cloud natively, because some workloads, we believe, will be on the cloud for good reasons, and you need to have that as part of it. So everything we're doing with him is hybrid cloud today, not in the future.

Today, 100%. And that's one of the requirements in our organization. We call it one architecture. If we write it for on-prem, we have to be able to run it on the cloud, right? And it has to have the same look and feel, the same pane of glass, things like that as well. It means we only have to write it once, so we're incredibly efficient, right? Because we don't have to write it multiple times for different types of infrastructure. Likewise, we have expectations from the data scientists that the performance also has to be up to par. We want to be moving the computation directly to where the data resides, and we know that it's not just on-prem, it's not just in the cloud, it's a hybrid scenario.

So don't hate me for asking you this, Ed, but you've only been here for a couple of years, right? Did you just stumble into this? You've got this vast portfolio, you've got this tooling, you've got cloud. You've got part of your organization saying we've got to do on-prem, another part saying we've got to do public. Or was this designed for the workload? Was it a little bit of both?

Well, luck is good, but it was really an embarrassment of riches inside IBM. Between our primary research, some of the things we just talked about, how do you run these frameworks in a distributed fashion when they weren't designed that way, and do it performantly at scale? That's Research; that's not even in my group, right?
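Ed and Steven are describing distributed deep learning: open source frameworks such as TensorFlow were written to train on a single node against local storage, and IBM's PowerAI work spreads the same training job across a grid of GPU machines. PowerAI ships its own launcher and libraries; purely as a rough illustration of the general idea, here is a minimal sketch using stock TensorFlow's tf.distribute API, with a placeholder dataset and model rather than any IBM workload.

```python
# Minimal sketch of data-parallel training spread across several GPU workers.
# This is stock TensorFlow (tf.distribute), shown only to illustrate the idea
# discussed above; PowerAI's distributed deep learning uses its own libraries.
import tensorflow as tf

# Each worker reads TF_CONFIG from its environment to learn its place in the
# grid; with no TF_CONFIG set, this degenerates to single-worker training.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

def make_dataset(global_batch_size):
    # Placeholder data; in practice this would stream from the shared file
    # system (e.g. an ESS / Spectrum Scale mount) rather than from memory.
    (x, y), _ = tf.keras.datasets.mnist.load_data()
    x = x.reshape(-1, 784).astype("float32") / 255.0
    ds = tf.data.Dataset.from_tensor_slices((x, y))
    return ds.shuffle(10_000).repeat().batch(global_batch_size)

with strategy.scope():
    # Model and optimizer are replicated on every GPU; gradients are
    # all-reduced across the grid after each step.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# The global batch is split across all replicas; adding workers shortens
# wall-clock training time without changing the per-node script.
model.fit(make_dataset(256), steps_per_epoch=100, epochs=5)
```

The design point matches Steven's complaint: the training script stays the same, and it is the strategy layer that turns single-node code into grid-scale code.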
What we're doing for workload management, that's in Storage, but we have these tool sets. The key thing is working with clients to figure out what they're trying to do. Everyone's trying to be data-driven. So as we looked at what you need to do to be truly data-driven, it's not just having faster storage, although that's important. It's not just about the throughput or the scale-out, it's not just about the CPUs, it's not just about having the open frameworks; it's how you put that all together so it's invisible. In fact, you said it earlier: he doesn't want his users to know at all what's underneath. He just wants to run the workload. You have people from my organization, because I'm one of your customers and you're one of mine, and we come to you and say, we're trying to use your platform for the 360 view of the client, and it's not data scientists, not data engineers, it's the ops team using this platform. So I actually think it's because IBM has this broad portfolio that we can bring together, and when IBM shows up, and we're showing up with AI together with the cloud, that's when you see something we can truly do that you can't get from other organizations. It's because of the technology differentiation we have from the different groups, but also the industry context that we bring. And when you're dealing with data, it is about trust. We can engage clients at a high level and help them because we're not a single-product company. We might be more complex, but when we show up and bring solutions, that's where we can really differentiate, and I think when IBM shows up like that, it's pretty powerful. And I think it's moved from trust me to show me, and we're able to show it now because we're eating what we're producing, right? Like the cognitive blueprint, we're using that effectively inside the organization.

So now that you've built this out internally, do you spend a lot of time with clients?

Probably 15% of my time, not more, because I'm in charge of internal transformation and operations, right? They're expecting outcomes from us, right? But at the same time, there are clients that are in the exact same boat. They're realizing that this is really interesting. There's a lot of noise, a lot of interesting stuff in AI out there, from Google, from Facebook, from Amazon, from Microsoft, but image recognition isn't important to me. How do I do it for my own organization? I have legacy data from 50 years back. This is totally different, and there's no Git repo that I can go to and download a model and use it. It's totally custom, and how do I handle that, right? So it's different for these guys.

What's on your wish list? What's on the to-do list?

Oh, geez. I want it to be so simple for my data scientists that they don't have to worry about where the data's coming from, whether it's a traditional relational database or an object store. I want it to feed that data effectively, and I don't want to have them looking into where the data is to make sure the computation's there. I want it to just flow effortlessly. That's really the wish list. Likewise, I think if we had new accelerators, something outside the box, not from the traditional GPU viewpoint, maybe dataflow or some new avant-garde type of stuff, that would be interesting, because I think it might open up a new train of thought in the area, just like GPUs did for us.

Great story.
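Steven's wish list, data scientists who never have to care whether a dataset sits in a relational database or an object store, amounts to a thin data-access abstraction. The sketch below is hypothetical: the load_table function and its URI scheme are invented for illustration and are not the actual API of IBM's internal platform.

```python
# Hypothetical sketch of a loader that hides where a dataset physically lives:
# callers get a DataFrame back whether the source is a relational database or
# an object store. Names and URI scheme are illustrative, not IBM's platform.
import io

import boto3
import pandas as pd
import sqlalchemy

def load_table(uri: str) -> pd.DataFrame:
    """Return a dataset as a DataFrame without the caller knowing the store."""
    if uri.startswith("db://"):
        # e.g. "db://postgresql+psycopg2://user:pass@host/warehouse?table=claims"
        conn_str, _, table = uri[len("db://"):].partition("?table=")
        engine = sqlalchemy.create_engine(conn_str)
        return pd.read_sql_table(table, engine)
    if uri.startswith("s3://"):
        # e.g. "s3://data-lake/claims/2018.parquet" on any S3-compatible store
        bucket, _, key = uri[len("s3://"):].partition("/")
        body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
        return pd.read_parquet(io.BytesIO(body.read()))
    raise ValueError(f"Unsupported data source: {uri}")

# A data scientist's notebook only ever references the logical location:
# df = load_table("s3://data-lake/claims/2018.parquet")
```

The point is the one Steven makes: the person building the model never has to know which store, or even which environment, the bytes actually came from.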
Yeah, I think, you know, we've talked about AI for business, and what you're seeing is we're trying to showcase what IBM is doing to really be an AI business, and what we've done with this platform is a showcase of that. So we're trying to be as transparent as possible. Not because it's the only way to do it, but it's a good example of how a very complex business is using AI to get dramatically better, with everyone using the same common platform. And we've learned, effectively learned, that being open is much better than being closed. Look at the AI community; it's where it is right now because of its openness. Following that same lead, we're doing the same thing, and that's why we're making everything available. You can see it, we're doing it, and we're happy to talk to you about it.

Awesome, all right. So Steven, you stay here. We're going to bring our next guest on and drill down into the cognitive platform. That's a good idea. Thanks for setting it up, I really appreciate it. Thank you very much. All right, good to have you guys. Keep it right there, everybody, we'll be back at the IBM CDO Strategy Summit. You're watching theCUBE.