From Midtown Manhattan, it's theCUBE. Covering Big Data, New York City, 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. Okay, welcome back everyone. Live here in New York City is theCUBE's coverage of our event here, Big Data NYC. Our own event we've been doing for five years, been covering the Big Data space with Hadoop World for eight years, since 2010, since the beginning of the Hadoop craze. Now it's evolved to Strata Conference, Strata Hadoop, now called Strata Data. Soon it'll be Strata AI, but a lot of things happening. We'll still be theCUBE. I'm John Furrier, your co-host. Our next two guests are Steve Roberts, who's the Power Big Data Offering Manager at IBM, and Keisha Ranaghathan, who's the Senior Offering Manager for Power Systems Analytics. Welcome back to theCUBE. Thank you. So let's just level set. Power has been a big part of IBM's strategy. We've covered it at a lot of your events, but it's taken on a life of its own. Ecosystem success has been well documented. People have been jumping on Power in general in an open source kind of way, if you will. But now that the ecosystem's developed, you start to see some real things emerging. Let's level set: what is going on in Power Systems at IBM? Well, firstly, Power Systems is a rich family of scale-out servers: 1U and 2U, two-socket, up to 22 cores. So just think about scale-out Linux servers that deliver the same Linux that you would know and love on any Intel system. We're there to compete head to head with commodity Intel, but with optimization points that can deliver unique value to big data and AI workloads. And those optimization points have specific things going on now. We'll get to that in a second. Keisha, Power Systems Analytics, that's where you guys take advantage of the optimization with PowerAI, is what you guys are announcing.
Right, so when we talk about Power Systems, when we designed it from the ground up, it was designed for big data and analytics. If you look at the enterprises, they have tons of databases already running on Power, but if you look at the new data sources, the modern data platforms, the newer applications around advanced analytics, the way that Power Systems are designed, what we are delivering on Power Systems is specifically targeting those advanced analytics applications. So in that context, our goal is to simplify deployment of AI and related applications in the enterprise. We announced the availability of Data Science Experience on the Power platform recently. Data Science Experience is the software offering from our analytics organization targeting data scientists, targeting data scientist productivity, specifically around making it much easier for them to develop applications, learn, and collaborate within their organization. So it's an end-to-end platform. And what we have done on Power is basically bring in our expertise in deep learning. So we are saying we are bringing deep learning into the data science world based on PowerAI. PowerAI, to state it simply, is a set of libraries and frameworks. Things like TensorFlow, you know, Caffe and Theano, you name it, we have it. We have taken all of those packages, we have optimized those on Power, we have optimized those to run not just on the Power architecture, but also on Power with GPUs. And we are delivering a phenomenal platform that takes a user from doing simple statistical modeling all the way through deep learning workloads. Well, soon you'll have a blockchain miner tool out there, I'm sure. You guys have a lot of power in those systems. Let's go back to the optimization, because Rob Thomas was just on. He's launched the Integrated Analytics System. So it's interesting, as you mentioned, end to end, you guys are looking at this holistically.
I want to make sure I get this right and share this. I think it's an important point. It's not IBM silos doing things. You guys are looking at this from a holistic perspective. Rob's on the front lines on the analytics side, thinking about the developer experience. He's got free software people can play with. The developers will love that. Deep learning has been adopted by developers pretty heavily. Google is clearly winning that, although some are arguing that Amazon and Azure are killing them on the enterprise, which I would agree with. But TensorFlow is resonating with developers. So now you have the developer action happening. This is DevOps happening. So your Power systems are optimized for that. Is that kind of what I'm getting here? And if so, what specifically is the benefit for the developer and the enterprise now? Someone might try something, do a little sandbox, get things going and then go, wow, I could scale this thing out. What's my next step? Is that right? Am I getting it right? Explain. Yeah, I mean, so when you talk about what it means for developers, right? If you take a deep learning workload, when you run it on just CPU boxes, it could take days for you to iterate and get to your result. Again, it depends on the level of optimization that you're trying to work out, the level of accuracy that you need, but it takes a long time to get the results. For example, if you're doing image recognition, you're using a large number of images to feed into the model to train it to recognize the next set of images you want to feed into it. It could take days for you to really train those models. Now what we're talking about with PowerAI, and some of the newer capabilities that we're introducing in PowerAI, is basically bringing that time down to hours. So we are improving productivity, we are helping them get to the results much faster.
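The iteration the guest describes is the heart of training: every pass nudges the model's parameters, and total wall time is iterations times time-per-iteration, which is exactly what GPU acceleration attacks. A toy, generic sketch (plain Python, not IBM or PowerAI code) of that iterative loop:

```python
# Toy gradient-descent training loop: fit y = w * x to data by
# repeatedly computing a gradient and updating w. Real deep learning
# does the same thing with millions of parameters per iteration.

def train(points, lr=0.05, epochs=200):
    """Fit y = w * x to (x, y) pairs with plain gradient descent."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
        w -= lr * grad
    return w

# Data drawn from y = 3x; the loop converges toward w = 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
```

The point of the sketch: cutting the cost of each iteration (the line computing `grad`) is what turns a days-long training run into hours.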
It's an iterative model. In the deep learning, machine learning world, it's all about iterations. And bundling some of these core ingredients is critical. I want to ask specifically, take a minute if you can, to explain what PowerAI is. So PowerAI, like I explained earlier, is basically a software piece, but there is a very critical hardware component associated with it. For PowerAI, we took all the deep learning frameworks, libraries, whatever else you need, optimized them on Power, optimized them for GPUs, basically running those on GPUs. And we have now created a single binary package that clients can take and deploy within a matter of hours. Typically, if you look at an enterprise trying to go into deep learning, you are going and picking up a ton of different open source libraries from various sources. You know, you might want Caffe, you may want TensorFlow, you might want Theano; you're going to all these various locations and trying to bring it into your organization, build it, deploy it, and whatever challenges you face, you're trying to address on your own. What we have done is completely taken all of those challenges off the table. You want any of these frameworks, we have it all in a single binary. You start the installation process, go for lunch, come back, and you have your complete framework ready to go. And it's in a hardware appliance. It's not an appliance, it's a software piece. Okay, but it can sit on any bare metal. Correct. Any Power system with GPUs, and we actually have a very specific Power system attached with GPUs, and we are delivering some unique innovations there in terms of how the CPU connects to the GPU. We have what is called NVLink, and that completely changes how CPU and GPU communicate and what we can do with a CPU-GPU-based system.
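Since the pitch is "one install, all the frameworks ready to go," a natural post-install sanity check is simply probing which framework packages are importable. This is a generic sketch, not a PowerAI utility, and the module names listed are common community package names, not a confirmed PowerAI manifest:

```python
import importlib.util

# Hypothetical post-install check: see which deep-learning framework
# packages are importable in this environment, without actually
# importing the (heavy) packages themselves.
FRAMEWORKS = ("tensorflow", "caffe", "theano", "torch")

def available_frameworks(names=FRAMEWORKS):
    """Return {module_name: True/False} for each candidate framework."""
    return {name: importlib.util.find_spec(name) is not None for name in names}
```

After the "go for lunch, come back" install described above, every entry should come back `True`; on a bare machine most will be `False`.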
So that's probably helpful when you start thinking about some of the augmented reality, VR in the enterprise, which seem to be really good use cases, whether it's pharma companies trying to do visualization of molecules and drug tests, not in the laboratory, things that people can imagine with some of the new deep learning. The question is, how does someone get started? So let's just say, you know, I love this Power, I want some of that. I want to go to lunch and load the binaries, get it all up and running. So what do I do? I have to have a Power system? Correct. Take me through the playbook. So you have a Power system, specifically a Power system with GPUs. That's one option. We do have partners that have the same system accessible in the cloud. So you could go and access that same thing in the Nimbix cloud; that's one of our partners that has Power systems with GPUs in there. You basically take the libraries, the PowerAI libraries that are all freely available. We have done all the hard work and made it available for our clients. And you basically install it and you are up and running. If you already have an application that is built on TensorFlow, you're now running it, mostly without any changes, because it's the same APIs that are exposed to the user. You're just seeing that it's much faster. So I've got to buy a Power system if I don't have one. So it's either from IBM or an IBM partner. Correct. Or I have it in the cloud. I go through Nimbix. Nimbix. And I'm up and running, start getting my hands dirty with it. Correct. Great. And I can integrate TensorFlow and some other cool things. Anything else? Right, so we talked about the deep learning part quite a bit, right? I mean, this is where the DSX announcement comes in, right? The Data Science Experience announcement. Not everybody does deep learning. But we are probably the best platform for doing deep learning applications.
But we are now making it much broader in terms of what we can do on the Power platform with Data Science Experience on Power. So if you're doing machine learning, as part of DSX, there are machine learning libraries that are all pre-built and ready to go. And in fact, if you want to try out PowerAI within DSX, you install DSX, you get PowerAI. So there are multiple points where we are simplifying how clients can get access to it. So in these cases, I want to give my data scientists, or boot up a data science practice, I bring in PowerAI and that jump-starts all of it. Correct. Machine learning, just standard machine learning that IBM has built. Is it open source? Is it unsupervised, supervised? What kind of machine learning are we talking about here? So all of those are possible. When I talk about IBM ML, there's a specific IBM machine learning package that is included in the Data Science Experience offering. So that's a software offering. You can bring that in and you can deploy it on Power. You just need a couple of Power servers to get started with that. If you are just doing deep learning, if you want to get hands-on experience with deep learning and GPUs, all you have to do is get one single Power box with GPUs, download the libraries, and get started. Okay, well, talk about the systems and storage impact, because obviously in the big data world, you need to store stuff. Yeah. And that stuff needs to get stored somewhere. So optimization around configurations is important. Talk about the impact there. Well, firstly, data scientists and data analysts want fast access to the data. They don't really care what infrastructure you're running on, but they do care if they can get the results twice as fast, or if they can perhaps load a data set that's two or three times bigger so they can get more accurate results.
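For readers wanting a concrete flavor of the "standard machine learning" discussed above, here is a minimal supervised example, a one-nearest-neighbor classifier. This is a generic illustration in plain Python, not the IBM ML package's API:

```python
def nearest_neighbor_predict(train, query):
    """Classify `query` by the label of its closest training point (1-NN).

    train: list of ((feature, ...), label) pairs
    query: tuple of features
    """
    def dist2(a, b):
        # Squared Euclidean distance between two feature tuples
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(train, key=lambda item: dist2(item[0], query))
    return label
```

Usage: with training points `((0, 0), "a")` and `((5, 5), "b")`, a query near the origin is classified `"a"`. Production libraries do the same thing with optimized index structures and many more algorithms, which is what the pre-built DSX libraries are packaging up.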
So our focus with Linux on Power systems, underpinning both Data Science Experience and the Hadoop-based data lake, is really our partnership with Hortonworks, which has expanded from Power systems to storage and now to analytics. So we really have the full stack integration. From a Power systems perspective, it's about time to results. In fact, we are so confident in Linux on Power systems running Hadoop workloads that we currently have a 3X price-performance guarantee for a Power cluster versus a comparable Intel cluster. So data scientists will get three times the work done, or much faster access to the results, running on a comparable system. And what's the guarantee? Is it throughput? Dollars? Money back, extra servers? Well, yeah, there are terms and conditions, but in a nutshell, the guarantee is that if a customer can't achieve those results, we'll apply complimentary services to achieve the results or we'll apply additional hardware. So you'll bring resources to the table. People or systems. Yeah, that's okay. So you'll come in, look it over, and get them up and running. So if someone's not happy, maybe it's a tweak, maybe it's some code, you come in with professional services. Cool, so you just say guarantee. It's a good guarantee. It's hard to get your money back in this world of software. Okay, talk about the elastic model, because that has been a big part of cloud operations. Elastic cloud concepts really drove the horizontally scalable movement, DevOps, and now you've got this movement in the enterprise. Yeah, one of our focuses is really about elastic access to the data, because of course you can get fast query results, but the data has to be where the data scientists and data analysts need it. So from there, there are a couple of strategies.
One is, of course, you can do different federation options, and Big SQL, as one engine that can federate SQL across disparate data sources, is a great option for that. It runs great on Power and can really chew through complex queries fast. The other option is to avoid having data silos in the first place. So we have the Spectrum Scale file system, formerly known as GPFS. It's been around for like two decades, born in the HPC space, but it's a truly global, scalable file system which can support over nine billion files, can span data centers, and supports all the core interfaces, not just HDFS but POSIX, SMB, NFS. It's a monster data bus. You can connect it everywhere. Monster data platform. So it could allow, you could have really a single view of your data serving Hadoop, non-Hadoop. So mixed analytics on a single copy of the data, so you don't always have to be moving data to different workloads. But then the additional part of elastic that relates to Hadoop is that right now, classic Hadoop deployment with local storage requires 3X replication for resiliency. And that often means that when clusters grow beyond tens into hundreds of servers, you waste compute resources, because you're adding servers just to get storage capacity. That's Hadoop sprawl. That's the sprawl. And we have a lot of large customers, retail, manufacturing, banking, running into this problem. They say, how can I get better value for this? So you separate your data from your compute, put your data into a resilient storage appliance. The IBM version is called IBM Elastic Storage Server. It includes the file system software I just mentioned, but also a pair of Power servers and really a wide array of different storage drawers and storage types. Now you can reduce your storage requirements, because that has built-in software RAID. So you have a 30% overhead versus 3X. But more importantly, you can right-size your compute. You can now pick the compute that meets the workload.
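The sprawl arithmetic above is worth making explicit. Using the figures from the conversation (3X copies for classic HDFS replication, roughly 30% overhead for the appliance's software RAID; the exact overhead varies by configuration), the raw disk needed for a given usable capacity works out like this:

```python
def raw_capacity_tb(usable_tb, scheme):
    """Raw disk needed to provide `usable_tb` of usable capacity.

    'replication': classic Hadoop keeps 3 full copies of every block.
    'erasure':     ~30% overhead, the ballpark cited for Elastic
                   Storage Server's built-in software RAID.
    """
    if scheme == "replication":
        return usable_tb * 3
    if scheme == "erasure":
        return usable_tb * 1.3
    raise ValueError(f"unknown scheme: {scheme}")
```

For 100 TB of usable data, that is 300 TB of raw disk under 3X replication versus about 130 TB with the ~30% overhead scheme, which is the gap that lets you stop buying compute servers just to get disks.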
So you guys actually took a different perspective. You basically built the engine knowing what the problem was, rather than throwing the problem at general purpose hardware. And in this case, the data layer is super important, because the joke on theCUBE yesterday was that real time is you got low-latency query results, but it's last week's data, right? That's real time to the guy who thinks it's real time: I got you the result to the query you wanted, but I'm not looking at the latest data. So the question really comes down to, with PowerAI, where is that data? Is it addressable? And this is a huge issue. So how do you guys view that? I mean, when you talk to customers, real time is always kind of subjective in definition. But certainly important. Well, part of the journey with Data Science Experience and Hortonworks is tightening up that connection between DSX and Hadoop-based data. The first phase of integration will be through secure Knox access, to be able to run Spark workloads from DSX directly on the YARN cluster. Going forward into next year, as Hortonworks expands to HDP 3, which supports containers in YARN, tools like PowerAI and DSX can run directly in a container model and be run in place, managed through the same resource scheduler as your other Hadoop jobs. Well, congratulations, big fan of the Power system. My final question for you guys both is, when I see things that look like a Ferrari and run like a Ferrari, you think, wow, that's high speed, that's really great. You got a lot under the hood, you got PowerAI. How do you get people to learn how to drive these things? Is that a challenge? It should be easy; how's the ease of use on this thing? Or is it more of a Corvette or a Rolls Royce? Let Steve comment. We have simplified, I mean again, the name of the game is simplification. If you think about it, for a lot of these applications, the underlying operating system is Linux. It's the same exact Linux.
It comes from the same exact set of vendors, and whether it's Red Hat or SUSE or Ubuntu, we have the same set of interfaces supported, and that's one of the reasons why we have grown, right? I mean, our open source ecosystem has grown and the application set has grown. The administration is not any different. All you gain is lower footprint, reduced sprawl, and much better performance. The engine's faster and optimized. I mean, Linux, I mean IBM, you guys have been doing Linux for a long time, it's well documented, but worth noting here, Microsoft is just now getting on the Linux bandwagon. Microsoft's Ignite event has gone on this week, I saw some tweets, hell froze over again. So I mean, Linux has already won, so we get that. This is about the engine. You guys are optimizing the hardware and systems in context to what Rob's also doing around Data Science Experience. Okay, so the preferred vision of data science in the future is what? They're pushing buttons, it's voice activated? What is PowerAI, what does it extend to? So if you think about it, if you peel the onion, so to speak, right, what is AI? There is first the training part and then the inference part, right? Everybody focuses on the inference part, where things happen magically and I'm able to drive an autonomous car and whatnot, but to make that happen you need to do a lot of hard work: getting the data ready, doing the training, getting these models ready to go. And that is where we are focused on simplification. With PowerAI, what you're going to see is we are going to improve the scalability of those models. We have a new capability called distributed deep learning that completely takes deep learning from a single server to now 64 servers or more. So scale-out models for deep learning. You're going to see large model support, where instead of just having a dependency on the memory that is inside the GPU, we are allowing access to the entire system memory.
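The scale-out idea behind distributed deep learning is data parallelism: each of N servers computes gradients on its own shard of the data, then the gradients are averaged before a shared parameter update. A generic toy of that averaging step (plain Python, a stand-in for the real all-reduce communication that distributed training libraries implement, not IBM's DDL code):

```python
def allreduce_mean(worker_grads):
    """Elementwise average of per-worker gradient vectors.

    worker_grads: list of equal-length gradient lists, one per worker.
    This is the core communication step in data-parallel training:
    every worker ends up applying the same averaged update.
    """
    n = len(worker_grads)
    length = len(worker_grads[0])
    return [sum(g[i] for g in worker_grads) / n for i in range(length)]
```

With two workers reporting gradients `[1.0, 2.0]` and `[3.0, 4.0]`, every worker applies the averaged `[2.0, 3.0]`. Making this step efficient across 64+ servers is precisely what the scaling work described above targets.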
So if you have 16 gigabytes of memory within your GPU, now you have a terabyte of memory in your system and you have access to it, which means you're going to completely change the game in terms of how training is done, right? And we have additional tools that we are delivering for developers to make it much easier to develop these applications. And you're going to see a number of things coming from this. So a URL where they can go real quick, is there a URL? I think ibm.com/powersystems. And then you can just do a search on PowerAI. PowerAI, PowerCube, that's a new application hopefully we'll have out there as part of our deep learning. We're here getting all the data so you can learn here deeply in theCUBE. I'm John Furrier, back with more live coverage after the break. IBM Power Systems here inside theCUBE, more after this short break.