Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017, brought to you by Hortonworks.

Welcome back to theCUBE. We are live at day two of the DataWorks Summit in the heart of Silicon Valley in San Jose. I'm Lisa Martin with my co-host, George Gilbert. We've had a great day and a half so far, with so much energy and so much buzz about the next leg of the big IT trends. We are here with a couple of gentlemen next: Alex Chen, the director of storage for IBM. Welcome to theCUBE.

Thank you.

And we also have Nadeem Asghar, field CTO and global head of technology alliances and partner engineering from Hortonworks. Welcome. Good to have you guys on. So one of the big things yesterday was the news about the IBM announcement, really the extension of the partnership from a technical perspective and a go-to-market perspective. But Alex, I want to talk to you as the director of storage for IBM. There's been quite a bit of news from IBM and Hortonworks in the last eight months or so. So talk to us about the evolution of some of the certifications and where you are now, and then we'll get into Data Science Experience.

Certainly, certainly. I would say that our server colleagues started their relationship last year, and we are following very shortly behind with certification work that's underway now between IBM storage and Hortonworks. That work really started at the beginning of this year. We have been doing testing for six months, nightly tests, to make sure the environment is solid, sound, and proven for our customers. And we're basically ready to go for all the current versions of Hortonworks.

So Nadeem, as a field CTO, you engage with big customers globally. You're also the head of technical alliances, and you work a lot with Alex. Talk to us about what this means, the evolution from a server certification to a storage certification with IBM.
Now with IBM standardizing on HDP as their Hadoop distribution, and Hortonworks standardizing on IBM as their data science platform, talk to us about what you're seeing in the field. How have the field and the customers driven this technology alliance to grow?

Thank you so much for asking this question, and just to give you a little bit of background about this relationship: we started, as Alex mentioned, with the Power certification. Power is open hardware. Think about it just the way Hadoop is open source software; Power is completely open hardware. That's where we started our relationship. Very quickly, the Spectrum Scale team, the storage team, figured out that this is the future: how can we bring value to their existing and new customers by attaching all the engines that Hortonworks and the open source community provide to their existing data sets? And now this relationship has gone beyond that, where both teams see a very compelling reason: we are doing a lot of great work around the data space, and IBM is doing a lot of great work on the data science side. Combining these two together is going to bring a lot of value to the customer. And you will start seeing that our customers are super excited about this news. We have constantly been talking with them and getting feedback from them, and the reaction is amazing, because there were certain limitations of Hadoop that we are trying to solve through this partnership. At the end of the day, the customer is the winner here.

Let me follow up on that, about the limitations of Hadoop. Especially the way you expressed it, that you want to enable customers to bring compute or analytics to their existing data assets. So were those data assets on shared storage, whether in a NAS or SAN configuration?
And then the idea being that you can have a compute cluster that's separate, so you can expand or collapse the amount of horsepower you're devoting to data that's stored separately?

Okay, I guess I'll take that one. So first of all, it's both, and the reason is that Spectrum Scale has been a very successful product for quite some time. It's number one in HPC, especially commercial HPC. We have publicly stated that we have over 4,000 customers, and we're in a lot of large HPC deployments, whether commercial or research HPC. So there's a large number of customers already deploying Spectrum Scale in a multi-hundred-petabyte fashion with tens of billions of files. Some of our customers have 10 billion files, right? And as we venture deeper into machine learning and deep learning, the amount of analytics over a larger number of data sets is only going to grow. It's not just about the data that your sensors generated, but also perhaps you will bring in weather data, or other data that somebody else generated, even in the cloud. So the data set will continue to grow, and only through more data will your model, your analytics engine, become more complete, smarter, and more refined. We're only at the beginning of this explosion in using analytics to make the right business decisions, and the key to that is to incorporate more data into your analytics engine. Spectrum Scale allows you to do that: the data you have everywhere, even data you didn't generate, you can bring into this data platform.
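The point Alex makes here, that a model gets better when you enrich your own data with data somebody else generated, can be sketched in miniature. Everything below (the field names, the join key, the records) is hypothetical and purely illustrative; it is not any IBM or Hortonworks API.

```python
# Toy sketch: enrich locally generated sensor readings with an externally
# sourced weather feed before analysis. All names/values are made up.

sensor_readings = [
    {"site": "A", "hour": 0, "output_kw": 120.0},
    {"site": "A", "hour": 1, "output_kw": 95.5},
]

weather_feed = [  # data "somebody else generated"
    {"site": "A", "hour": 0, "cloud_cover": 0.1},
    {"site": "A", "hour": 1, "cloud_cover": 0.7},
]

def enrich(readings, weather):
    """Join readings to weather on (site, hour) so the analytics
    engine sees one combined record per observation."""
    by_key = {(w["site"], w["hour"]): w for w in weather}
    return [
        {**r, **by_key.get((r["site"], r["hour"]), {})}
        for r in readings
    ]

combined = enrich(sensor_readings, weather_feed)
print(combined[1])  # the reading plus its matching cloud_cover
```

The same shape of join is what an engine like Spark or Hive would do at scale; the value comes from having both data sets reachable on one platform.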
So just to follow up on that: when you're talking about high-performance compute, that's generally a different configuration from, as far as I understand, traditional shared storage, in that it makes it much easier to have the different compute nodes communicate with each other. Can you tell us more about the use cases that are appropriate for Spectrum Scale?

Yeah, sure. To go back to the original question of whether this has to be NAS storage or is still shared-nothing: we really have both modes. It's software-defined storage with a flexible deployment model. You can deploy it in a shared-nothing mode on servers, potentially storage servers, and do data protection across the different servers. Or, if your compute-to-storage ratio is skewed, where you have a lot more data and a lot less compute, you could separate the data out into a shared storage pool. That shared storage pool is also completely scale-out, so you can grow it as your needs grow, in both performance and capacity. So both modes are flexible. And not only that, because of the beauty of software-defined storage (IBM is number one in software-defined storage, and this is a key component of that), you can deploy on flash, you can deploy on disk. We also have connectors that allow you to do hybrid cloud, so you can move data to cloud storage. We even have cases where people are moving inactive data off to tape via our Spectrum Archive, or LTFS, the Linear Tape File System, making tape cartridges look like drives. So the beauty is software-defined storage: maximum flexibility.

So guys, talk to us about where the customers came into play here. Did they drive this tighter partnership, this tighter alliance, on the technology side?
Maybe Nadeem, I'll ask you from your perspective: what were you hearing from existing Hortonworks customers in the enterprise? And then Alex, the same question to you about what's really driving this partnership.

One of the most amazing things about this particular relationship, and about what is happening here, is that a lot of our customers have petabyte-scale data sitting in Spectrum Scale and other storage mechanisms. And they're not interested in setting up a separate cluster for Hadoop, where they need thousands of nodes and then a huge data movement from Spectrum Scale to the Hadoop cluster to make it work for them. What we are bringing to the customer in this scenario is: you keep your data over there, you keep doing whatever HPC you're doing around it, but now you have all the open source tools available and attached to it, and you can run Spark, Hive, and all the other engines on the same data, which is getting generated and accessed by a lot of other applications. So that is exactly what our customers wanted: I don't want to do data movement, I don't want to make a copy of the data, how can you help me solve that problem?

And as we look at data in motion versus data at rest, structured, unstructured, and semi-structured, one of the things that was actually talked about today was this now end-to-end data-in-motion capability of DataFlow, and that it's really much more complete now than it was in the last couple of years. Tell us about that from an enterprise customer perspective. Where does that come into play in terms of facilitating that and partnering with IBM on the data science platform side?

That's a great question. As Alex mentioned, Spectrum Scale can be run in multiple fashions. Spectrum Scale can run on Power hardware as well. And Power is already certified on HDP, and they're already certifying on HDF.
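The "no data movement" idea Nadeem describes can be sketched in miniature: the analytics engine reads files in place from a shared POSIX mount, as a Spectrum Scale file system would present, rather than staging a second copy into a separate HDFS cluster. The sketch below simulates the shared mount with a temporary directory; nothing here is Spectrum Scale's actual API.

```python
# Toy sketch: analyze files in place on a shared mount, with no copy step.
import tempfile
from pathlib import Path

def word_count(mount_point):
    """Scan every file under the shared mount in place and count words,
    without staging a duplicate of the data anywhere."""
    counts = {}
    for path in Path(mount_point).rglob("*.txt"):
        for word in path.read_text().split():
            counts[word] = counts.get(word, 0) + 1
    return counts

# A temporary directory stands in for a /gpfs-style shared mount.
with tempfile.TemporaryDirectory() as mount:
    (Path(mount) / "part-0.txt").write_text("hadoop spark hive spark")
    (Path(mount) / "part-1.txt").write_text("spark scale")
    result = word_count(mount)

print(result)
```

In the real deployment, engines like Spark or Hive would be pointed at the same mounted file system, so the data generated by HPC and other applications is analyzed where it already lives.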
So, also, when people are running clusters on Power, it brings 3x efficiencies in terms of performance and cost. So now you have an end-to-end play where your data is stored in Spectrum Scale on Power hardware, which has support for GPU and CPU, and which supports everything from streaming all the way to data-at-rest use cases. At the end of the day, you will see that it brings a lot of value to the customer from a cost perspective, from an accessibility perspective, and from a connected-data-platform perspective.

You said one thing that actually made this really clear: we've got this software-defined storage that's shared, that we use for high-performance compute use cases. So this sounds like an example where you would take the Hadoop-ecosystem analytic capabilities and project them onto these devices that are running the software-defined storage. So, this is going to sound technical, but are you saturating the network? Are you constrained by how you're moving the data around, or are you constrained by the processing when you're trying to crunch through a big job? Where's the bottleneck?

I'll take that. As you know, Spectrum Scale is a parallel file system, so there is no single bottleneck as such, and that is a core value. On one side, you could have your HPC compute attached to Spectrum Scale. On the other side, you could have your HDP cluster attached to Spectrum Scale. And without affecting either of those, you can do a lot of things in parallel, without saturating your network and without saturating your I/O on the file system. So think about it like the HPC use cases Alex was talking about: you are generating a ton of data, and today, unfortunately, you don't have tools to access that data at that scale to drive value out of it.
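The parallel-file-system point, that many readers can work on the data at once with no single server in the I/O path, can be illustrated with a small sketch. Threads stand in for compute nodes and ordinary files stand in for data stripes; this is an analogy for the idea, not how GPFS/Spectrum Scale actually stripes or serves data.

```python
# Toy sketch: independent readers process separate stripes concurrently,
# with no coordinator serializing the I/O.
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def process_stripe(path):
    """Each 'node' independently reads and summarizes its own stripe."""
    return sum(int(line) for line in path.read_text().splitlines())

with tempfile.TemporaryDirectory() as mount:
    # Lay out four stripes of numeric data on the shared "mount".
    stripes = []
    for i in range(4):
        p = Path(mount) / f"stripe-{i}.dat"
        p.write_text("\n".join(str(i * 10 + j) for j in range(3)))
        stripes.append(p)

    # All stripes are read concurrently; partial results merge at the end.
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(process_stripe, stripes))

total = sum(partials)
print(total)  # 192: the sum across all four stripes
```

The merge of partial results at the end mirrors how engines like Spark aggregate per-partition work, while the file system itself imposes no single choke point on the reads.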
After the HPC job runs, what is essentially happening in the data? Through these tools, now you can actually get to that level, where it can provide you that insight. And so far, we haven't seen any kind of limitation around networking and I/O that would limit that adoption.

Well, thank you guys so much for talking about the technical partnership between IBM and Hortonworks. It sounds like there's a tremendous amount of momentum there. We wish you continued success with that, and we want to thank you for stopping by theCUBE and having a chat with George and me.

Thank you.

And for my co-host, George Gilbert, I'm Lisa Martin. You're watching theCUBE, live from day two of the DataWorks Summit, hashtag DWS17. Stick around, we'll be right back.