Covering DataWorks Summit Europe 2017. Brought to you by Hortonworks.

Welcome back to Munich, everybody. This is theCUBE. We're here live at DataWorks Summit, and we are the leader in live tech coverage. Steve Roberts is here as the offering manager for big data on Power Systems for IBM. Steve, good to see you again.

Yeah, good to see you, Dave.

So, you know, we're here in München, a lot of action, good European flavor. It's my second European one, formerly Hadoop Summit, now DataWorks Summit. What's your take on the show?

I like it. I like the size of the venue. There's a lot of ability to interact and talk to a lot of the different sponsors, clients, and partners, so you can network with a lot of people from a lot of different parts of the world in a short period of time. It's been great so far, and I'm looking forward to building on this toward the next DataWorks Summit in San Jose.

Terri Virnig, a VP in your organization, was up this morning with a keynote presentation, so IBM got a lot of love in front of a fairly decent-sized audience, talking a lot about the ecosystem and how it's evolving, the openness. Talk a little bit about open generally at IBM, but specifically what it means to your organization in the context of big data.

Well, I'm from the Power Systems team, so we have an initiative we launched a couple of years ago called OpenPOWER. OpenPOWER is a foundation of participants innovating from the POWER processor through all aspects: accelerators, I/O, GPUs, advanced analytics packages, system integration, all to the point of being able to drive OpenPOWER capability into the market and have Power servers delivered not just through IBM, but through a whole ecosystem of partners. This complements quite well the Apache Hadoop and Spark philosophy of openness as it relates to the software stack. So our story is really about marrying the benefits of an open ecosystem for OpenPOWER, on the system infrastructure side, which drives the same time to innovation, community value, and choice for customers through a multi-vendor ecosystem, with the same premise as it applies to Hadoop and Spark. And of course IBM is making significant contributions to Spark as part of the Apache Spark community, and we're a key active member, as is Hortonworks, of the ODPi organization advancing the standards around Hadoop. So this is a one-two combo of open Hadoop and open Spark, either from Hortonworks or from IBM, sitting on the OpenPOWER platform built for big data. No other story like that really exists in the market today: open on open.

So, Terri mentioned cognitive systems. Bob Picciano has recently taken over and obviously has some cognitive chops, and some systems chops. Is this a rebranding of Power? Is it a layer on top? How should we interpret this?

I'd take it a bit more as a layer on top. Power will now be one of the assets, one of the member families, of the cognitive systems portion of IBM. System z can also be used as another great engine for cognitive, for certain clients and use cases where they want to run cognitive close to the data and they have a lot of data sitting on System z. So Power Systems is a server family really built for big data and machine learning, in particular our S822LC for High Performance Computing. This is a server which is landing very well in the deep learning and machine learning space.
It offers the Tesla P100 GPU, and with NVIDIA NVLink technology it can deliver up to 2.8x the CPU-to-GPU bandwidth of what would be available through a PCIe Intel combination today. So this drives immediate value when you need to ensure not just that you're exploiting GPUs, but that you can move your data quickly from the processor to the GPU.

So I was going to ask you what makes Power so well suited for big data and cognitive applications, particularly relative to Intel alternatives, and you touched on that. IBM talks a lot about Moore's Law starting to reach its limits and innovation coming from other places. I love that narrative because it's really combinatorial innovation that's going to lead us in the next 50 years. But can we stay on that thread for a bit? What makes Power so uniquely suited and qualified to run cognitive systems and big data?

Yeah, it actually starts with the fundamentals of the POWER processor. The POWER processor has eight threads per core, in contrast to Intel's two threads per core. That matters for parallelizing your workloads. Workloads that come up in the cognitive space, whether you're running complex queries and need to drive SQL over a lot of parallel pipes, or you're running iterative computation over the same data set, as when you're doing model training, can all be highly parallelized and benefit from this 4x thread advantage. But of course, to do this you also need large, fast memory, and we have six times more cache per core versus Broadwell. So you have a lot of memory close to the processor, driving the throughput that you require. And then on top of that, we get the ability to add accelerators, and unique accelerators at that, such as the NVIDIA NVLink scenario I mentioned for GPUs, or using CAPI and OpenCAPI as an approach to attach FPGAs or flash and get processor-memory access speeds with an attached acceleration device. So this is the ability to offload specialized compute processing to the right accelerator at the right time, so you can drive way more throughput. The upper bound for driving workloads through individual nodes, and the ability to balance your I/O and compute on an individual node, is far superior with Power Systems servers.

Okay, so multi-threading, giant memories, and this OpenCAPI gives you primitive-level access, I guess, to a memory extension.

Yeah, it links the accelerators through this high-speed memory extension.

Instead of going through what I often call the horrible storage stack, a.k.a. SCSI. So that's cool, some good technology discussion there. What's the business impact of all that? What are you seeing with clients?

Well, the business impact is that not everyone is going to start with souped-up, accelerated workloads, but they're going to get there. So part of the vision that clients need to understand, as they begin to get more insights from their data, is that it's hard to predict where your workloads are going to go. So you want to start with a server that gives you some of that headroom for growth. You don't want to keep scaling out horizontally, forced to add nodes every time you need to add storage or add more compute capacity.
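To put the bandwidth claim in context, here is a small back-of-the-envelope sketch in Python. The 2.8x factor is the figure quoted above; the PCIe baseline of roughly 16 GB/s and the 256 GB dataset size are illustrative assumptions, not measured values from the interview.

```python
# Rough, illustrative arithmetic only: how a 2.8x CPU-to-GPU bandwidth
# advantage translates into data-movement time for a GPU-bound workload.

PCIE_GBPS = 16.0        # assumed effective PCIe 3.0 x16 host-to-GPU bandwidth
NVLINK_FACTOR = 2.8     # CPU-to-GPU bandwidth advantage quoted in the interview
NVLINK_GBPS = PCIE_GBPS * NVLINK_FACTOR

def transfer_seconds(dataset_gb: float, gbps: float) -> float:
    """Time to move a dataset from host memory to GPU memory at a given bandwidth."""
    return dataset_gb / gbps

dataset_gb = 256.0      # e.g. a training set staged out to GPUs in chunks
print(f"PCIe   : {transfer_seconds(dataset_gb, PCIE_GBPS):6.1f} s")
print(f"NVLink : {transfer_seconds(dataset_gb, NVLINK_GBPS):6.1f} s")
```

With these assumed numbers the staging time drops from roughly 16 seconds to under 6; the point is simply that faster CPU-to-GPU links shrink the data-movement share of each training or query iteration.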
So firstly it's the flexibility to bring versatile workloads onto a node, or a small number of nodes, and exploit some of these memory and acceleration advantages without necessarily having to build large scale-out clusters. Ultimately it's about improving time to insight. With accelerators and with large memory, running workloads on similarly configured clusters, you're simply going to get your results faster. For example, in a recent benchmark we did with a representative set of TPC-DS queries on Hortonworks running on Linux on Power servers, we were able to drive 70% more queries per hour than a comparable Intel configuration. So this is just getting more work done on what is now similarly priced infrastructure, because the Power family is a broad family that now includes 1U and 2U scale-out servers along with our 192-core horsepower at the enterprise end. So we can directly price-compete on the scale-out box, and we offer a lot more flexible choice as clients want to move up the workload stack or bring accelerators to the table as they start to experiment with machine learning.

So if I understand that right, I can turn two knobs. I can do the same amount of work for less money, a TCO play, or for the same amount of money I can do more work. Is that fair?

Absolutely, absolutely. Now in some cases, especially in the Hadoop space, the size of your cluster is somewhat gated by how much storage you require. If you're using the classic scale-out storage model, you're going to have so many nodes no matter what, because you can only put so much storage on a node. In that case your cluster can look the same, but you can put a lot more workload on it. Or you can bring in an IBM solution like IBM Spectrum Scale or the Elastic Storage Server, which lets you essentially pull that storage off the nodes and put it in a storage appliance. At that point you still have high-speed access to storage, because network bandwidth has increased to the point that the performance benefit of local storage is no longer really a driving factor in the classic Hadoop deployment. You can get that high-speed access in a storage appliance model, with resiliency, at far less cost, because you don't need 3x replication; you just have about a 30% overhead for the software erasure coding. And now you can choose and scale your compute nodes purely for your workload purposes, as the sizing sketch below illustrates. So you're not bound by the classic how-big-is-my-cluster calculation, where the number of nodes equals total storage required divided by storage per node. That just doesn't work once you get over 10 nodes, because you start wasting something, right? You're either wasting storage capacity or, more typically, wasting compute capacity, because you're over-provisioned on one side or the other. So you're able to scale compute and storage independently, tune them for the workload, and grow each resource more efficiently.

You can right-size compute and storage for your cluster, but just as importantly, you gain the flexibility that the storage tier, that data plane, can be used for other, non-HDFS workloads. You could still have classic POSIX applications, or you may have new object-based applications, and with a single copy of the data, one virtual file system, which could also be geographically distributed, you can serve both Hadoop and non-Hadoop workloads.
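As a rough illustration of the sizing argument referenced above, the sketch below contrasts the classic "nodes = total storage / storage per node" calculation under 3x HDFS replication with a shared storage tier carrying roughly 30% erasure-coding overhead, where compute nodes are sized only by the compute requirement. All the capacities and workload figures are hypothetical, chosen just to show the shape of the trade-off.

```python
import math

# Hypothetical inputs, for illustration only.
usable_data_tb = 500.0       # data the cluster must hold
storage_per_node_tb = 48.0   # assumed local disk per classic Hadoop node
cores_needed = 400           # compute the workload actually requires (assumed)
cores_per_node = 40          # assumed cores per compute node

# Classic local-storage model: 3x replication, node count driven by storage.
raw_tb_replicated = usable_data_tb * 3.0
nodes_classic = math.ceil(raw_tb_replicated / storage_per_node_tb)

# Shared storage tier with ~30% erasure-coding overhead (as described above);
# compute nodes are now sized only by the compute requirement.
raw_tb_erasure = usable_data_tb * 1.3
compute_nodes = math.ceil(cores_needed / cores_per_node)

print(f"Classic model : {raw_tb_replicated:.0f} TB raw, {nodes_classic} nodes (storage-bound)")
print(f"Shared storage: {raw_tb_erasure:.0f} TB raw + {compute_nodes} compute nodes (right-sized)")
```

With these made-up numbers, the classic model needs 1500 TB of raw disk spread over 32 nodes even though only 10 nodes' worth of compute is required, which is exactly the over-provisioning the interview describes.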
So you're saving additional replicas of the data from being required, by onboarding it onto a common data layer.

So that's a return-on-asset play. You've got an asset, it's more fungible across the application portfolio, you can get more value out of it. You don't have to dedicate it to this one workload and then over-provision for another one when you've got extra capacity sitting here.

It's a TCO play, but it's also a time saver. It gets you to insight faster because you don't have to keep moving that data around. The time you spend copying data is time you should be spending getting insights from the data. So having a common data layer removes that delay.

Okay, because it's HDFS-ready. I don't have to move data from my existing systems into some new, separate silo.

Yeah, we just present it through the HDFS API as it lands in the file system from the original application.

So all this talk about flexibility, agility, et cetera: what about cloud? How does cloud fit into the strategy? What are you guys doing with your colleagues and cohorts at Bluemix, a.k.a. SoftLayer? You don't use that term anymore, but we do. When we get our bill it still says SoftLayer. But anyway, you know what I'm talking about. The cloud with IBM, how does it relate to what you guys are doing in Power Systems?

Well, the born-on-the-cloud philosophy of IBM's software analytics team is still very much the model. As you've seen, the Data Science Experience, which was launched last year, was born in the cloud. All our analytics packages, whether it's our BigInsights software or our business intelligence software like Cognos, land first in the cloud for future generations. And of course we have our whole arsenal of Watson-based analytics and APIs available through the cloud. What we're now seeing as well is that we're taking those born-in-the-cloud offerings and also making many of them available in an on-premises model, so they can participate in a hybrid model. So the Data Science Experience is now coming on-premises; we're showing it at the booth here today. Bluemix has an on-premises version as well, and the same software library, BigInsights, Cognos, SPSS, is all available for on-prem deployment. So Power is still an ideal place for hosting your on-prem data and running your analytics close to the data, and now we can federate that through hybrid access to these elements running in the cloud. The focus is really on cloud applications being able to leverage the Power Systems-based data through high-speed connectors, and being able to build hybrid configurations where you run your analytics where they make the most sense based on your performance, data security, and compliance requirements. A lot of companies, of course, are still not comfortable putting all their jewels in the cloud, so for them it's going to be a mix and match. We are expanding the footprint of cloud-based offerings, both in terms of Power servers offered through SoftLayer and through other cloud providers. Nimbix is a partner we're working with right now who is actually offering our PowerAI package. PowerAI is a package of open source deep learning frameworks, packaged by IBM, optimized for Power, and easily deployed, with IBM support available. It can be deployed on-premises on a Power server, but it's also available on a pay-per-drink basis through the Nimbix cloud.
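To make the single-copy, two-access-paths point above concrete, here is a minimal Python sketch. It assumes a Spectrum Scale style file system mounted at a hypothetical /gpfs/datalake path and exposed to Hadoop clients through an HDFS-compatible endpoint; the paths, connection settings, and use of pyarrow's generic HDFS client are illustrative assumptions, not the specific IBM integration.

```python
# Sketch: one copy of data on a shared file system, readable both as a
# plain POSIX file and through an HDFS-compatible interface.
import pyarrow.fs as pafs

POSIX_PATH = "/gpfs/datalake/events/2017-04-05.csv"  # hypothetical POSIX mount
HDFS_PATH = "/datalake/events/2017-04-05.csv"        # same data, HDFS namespace

# A classic POSIX application reads the file directly from the file system.
with open(POSIX_PATH, "rb") as f:
    posix_bytes = f.read()

# A Hadoop/Spark-side consumer reaches the same single copy via the HDFS API
# (here through pyarrow's HadoopFileSystem; requires a configured Hadoop client).
hdfs = pafs.HadoopFileSystem(host="default")
with hdfs.open_input_stream(HDFS_PATH) as f:
    hdfs_bytes = f.read()

assert posix_bytes == hdfs_bytes  # one copy of the data, two access paths
```

The design point being illustrated is simply that no copy or ingest step sits between the original application writing the file and the analytics job reading it.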
All right, we're covering a lot of ground here. We talked strategy, we talked strategic fit, which I guess is sort of an adjunct to strategy. We talked a little bit about the competition and where you differentiate, some of the deployment models like cloud, other bits and pieces of your portfolio. Can we talk specifically about the announcements that you have here at this event? Maybe just summarize them for us.

Yeah, absolutely. As it relates to IBM and Hadoop and Spark, we really have full stack support: the rich analytics capabilities I was mentioning, deep insight, prescriptive insight, streaming analytics with IBM Streams, Cognos business intelligence. This set of technologies is available for both IBM's Hadoop stack and Hortonworks' Hadoop stack today. Our BigInsights and IOP offering is now out for tech preview of its next release; the 4.3 release is available for technical preview and will be available for Linux on Intel and Linux on Power toward the end of this month. So that's one piece of new Hadoop news at the analytics layer as it relates to Power Systems. As Hortonworks announced this morning, HDP 2.6 is now available for Linux on Power. We've been partnering closely with Hortonworks to ensure we have an optimized story for HDP running on Power Systems servers, as with the data point I shared earlier, the 70% improvement in queries per hour. And at the storage layer, we have work in progress with Hortonworks to certify the Spectrum Scale file system, which really unlocks the ability to offer this converged storage alternative to the classic Hadoop model. Spectrum Scale actually supports, and provides advantages in, the classic Hadoop model with local storage, where it can offer the same sort of multi-application support but in a scale-out storage model. But it can also form part of a storage appliance that we call the Elastic Storage Server, which is a combination of Power servers and high-density storage enclosures, with SSD, spinning disk, or flash depending on the configuration. That certification will make it available as a storage appliance which could underpin either IBM Open Platform or HDP as a Hadoop data lake. But as I mentioned, it's not just for Hadoop; it's really about building a common data plane behind mixed analytics workloads, which reduces your TCO through a converged storage footprint and, more importantly, gives you the flexibility of not having to create data copies to support multiple applications.

Excellent. IBM opening up its portfolio to the open source ecosystem. You guys have always had, well, not always, but over the last 20 years, major, major investments in open source, and they continue. We're seeing it here, Steve. People are filing in, the evening festivities are about to begin. Really appreciate you coming on theCUBE.

Thanks very much. Thanks a lot, Dave.

You're welcome. All right, keep it right there, everybody. John and I will be back with a wrap-up right after this short break. Be right back.