Live from New York, it's theCUBE, covering Big Data NYC 2015. Brought to you by Hortonworks, IBM, EMC, and Pivotal.

Welcome back to New York City, everybody. This is day three of Strata + Hadoop World and our event within the event, Big Data NYC. This is theCUBE: theCUBE goes out, we extract the signal from the noise. Thank you, everybody in our audience; we really appreciate all the back channels and the questions that you guys send in. Let's see, Chris Harrold is here, back for day three, we saw you yesterday. He's the Global Field CTO of EMC's Big Data Solutions. And Kumar, no, Kumar's not here. Anant Chintamaneni is here from BlueData, sitting in for my friend Kumar. Hello, Kumar, hope you're feeling better. I miss you, but thank you very much for sitting in. Welcome, gentlemen, to theCUBE. It's good to see you guys both.

It's good to see you again.

It's good to see you.

So, a big week here. The feedback from the Javits is more people, more technologies, more customers demanding solutions. So let's start with you, Chris. What's the vibe from customers this week?

Absolutely. You and I were just talking about this, of course, before we came on camera: this is the time of operationalizing data analytics, right? We see it consistently, in both of our separate customer sets and working together, that people were in that early-adopter stage. They were trying it out, they were kicking the tires. They started to build a use case, build a model around data analytics, Hadoop specifically, obviously, but any one of the analytics tool sets. And then they got to a certain point and they went, this is awful. I can't run this way forever, I can't grow, I can't change, I can't be flexible. I really need to take that next step and operationalize Hadoop. And that's where you had organizations like BlueData and EMC thinking, we have the operational expertise, we have the IT expertise, there's got to be a better way to set this up for our customers. I really believe that we're addressing that solution need with our customers right now, and it really resonates as you start talking to all of the organizations that are here this week. People are hungry for that solution, and I think what we're building together is really providing it.

So, three big themes. The big one, of course, is ingesting data in real time; we've certainly heard that. But the other two are really, you know, make it simple, right? We're hearing that a lot. And then obviously there's a lot of discussion around solutions and being able to adopt them, and we're hearing some side conversations about security. But you guys are in the make-it-simple business. So talk a little bit about BlueData, what you guys are doing, and then I want to get into where you guys fit.

Yeah, absolutely, thanks. BlueData is an infrastructure software company. We virtualize a pool of servers or VMs that are provided to us, and we eliminate the barriers of complexity with Hadoop by making it extremely simple to get an Amazon-like experience, spinning up Hadoop clusters and Spark clusters and getting elasticity, all on-premises in the customer's data center. And then we go beyond that by making sure that customers don't always have to copy terabytes of data from their existing storage systems as a first-order principle of doing Hadoop. They can start their fail-fast experimentation by just spinning up clusters and pointing at data in place, especially on systems like Isilon that have terabytes of data to unlock value from. So it's agility for the end users: one-click clusters, pointing at data wherever it might be.
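To make that one-click, Amazon-like experience concrete, here is a minimal sketch of what API-driven, on-premises cluster provisioning of the kind Chintamaneni describes might look like. The controller host, endpoint, and payload fields are hypothetical illustrations, not BlueData's actual API.

```python
# Hypothetical sketch of one-click, API-driven Hadoop cluster provisioning.
# The controller host, endpoint, and payload fields are illustrative
# assumptions, not BlueData's actual API.
import requests

CONTROLLER = "https://bigdata.example.internal/api/v1"  # hypothetical controller

def spin_up_cluster(name: str, distro: str, workers: int) -> str:
    """Request an elastic Hadoop/Spark cluster from a shared server pool."""
    resp = requests.post(
        f"{CONTROLLER}/clusters",
        json={
            "name": name,
            "distro": distro,         # e.g. a Hadoop or Spark image
            "worker_nodes": workers,  # elasticity: grow or shrink later
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["cluster_id"]

if __name__ == "__main__":
    cluster_id = spin_up_cluster("marketing-analytics", "spark-standalone", 8)
    print(f"Provisioned cluster {cluster_id}")
```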
Wait a minute, I thought you couldn't virtualize Hadoop. You guys were just chatting about that. Can we talk about that a little bit? Because there are so many myths about virtualization. I remember back in the day, it was like, you can't virtualize Oracle, you know? We would tell our practices, no, no, no, that's not true. So talk about that dynamic. Share with our audience what you've learned there.

Yeah, absolutely. The implementation matters very much as to how you virtualize Hadoop. Hadoop is actually two integral pieces: one is the processing layer and the other is the data, or storage, layer. When you virtualize Hadoop, if you try to run all of Hadoop in virtual machines with virtual disks and so on, you can get some good performance, but you'll have to do a lot of optimization. But if you separate compute and storage, run all the compute services in the virtual machines and keep storage outside, and then have innovative technologies like what BlueData brings in terms of IOBoost, then you can get performance equivalent to, and in many cases better than, bare metal.

Yeah. Okay, and then talk about the relationship between BlueData and EMC. Where does that...

Sure, and that's actually really funny; we were just talking about this. Around about the same time that Kumar left VMware and started BlueData, I was inside EMC running around like a crazy person, espousing this theory that virtual Hadoop should work and that we should be doing this, and getting cock-eyed stares from people: you can't virtualize Hadoop. So it's really been a good synergy to come back together on the other end of that three-, four-year journey and realize that our visions are pretty tightly aligned. As you know, we've talked earlier this week about the Federation Business Data Lake platform that we're building, and our partnership with BlueData is around, again, easy: give me the big red push button, get the cluster experience that customers expect from Amazon and from service provider environments, and translate that into my on-prem world, with my critical data. Where we really come together with BlueData is in the marriage of that simplification of the platform with our engineered ecosystem of tools around the outside: for compliance and governance, for ingest and index management, and for surfacing and consuming data sets and tools, in a really tightly packaged, tightly integrated form factor. I think I said this to you when we talked earlier this week: it's data science without the data scientists, right? Just like we need Hadoop clusters without Hadoop gurus, clusters that people can easily adopt and implement, we need to be able to provide that packaged, appliance-like experience to allow analytics to really reach all the places it needs to reach.
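A minimal sketch of the compute/storage separation just described: compute-only workers run in containers while the data stays on an external system, such as an Isilon HDFS endpoint. The image name, environment variable, and endpoint below are illustrative assumptions; BlueData's actual provisioning and IOBoost layers are proprietary.

```python
# Minimal sketch: compute-only Hadoop workers in Docker containers, with
# storage kept outside on an external HDFS endpoint (e.g. Isilon).
# Image name, env var, and endpoint are illustrative assumptions.
import docker

client = docker.from_env()

EXTERNAL_HDFS = "hdfs://isilon.example.internal:8020"  # data never moves

workers = [
    client.containers.run(
        image="example/hadoop-worker:latest",  # hypothetical compute-only image
        name=f"hadoop-worker-{i}",
        detach=True,
        environment={"FS_DEFAULT_NAME": EXTERNAL_HDFS},
    )
    for i in range(4)  # elasticity: change the count to grow or shrink
]
print("compute layer:", [w.name for w in workers])
```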
So you're creating a solution: BlueData software, EMC bringing, obviously, its IP around storage, presumably Isilon, all the connections to Hadoop, et cetera. Do you want to add to that?

Yeah, what I wanted to add is that we're big fans of the Federation Business Data Lake strategy that EMC has. The ecosystem is very diverse: there are obviously the distributions, Spark is emerging, there are NoSQL platforms out there, there are streaming platforms, and then there are obviously the BI, ETL, and visualization tools. What a customer is looking for is a solution that spans the entire pipeline, from ingest to serving the data, and that could include a NoSQL layer at the end to build your mobile app or your web app, right? So what we are essentially doing, partnering with EMC, is providing the container in which that entire pipeline, the whole Hadoop ecosystem spanning from BI and ETL tools to NoSQL platforms and even things like Splunk, for example, can run, and making it easy for the customer to mix and match what they want. Ours is a software layer that allows customers to provision the pipelines of their choice.

So you're containerizing all that complexity. The intent being simplification, not necessarily portability, you know what I'm thinking, with Docker containers.

Okay, so actually you bring up an interesting point to tie onto that: it is a little bit about portability for us. Obviously we're the Federation, we're big fans of VMware, they're part of the family, we lead with our Federation technologies, that is our solution. However, as you just heard with IBM and EMC on theCUBE together not a minute and a half ago, this is a team sport, right? We can't be a closed ecosystem. We've talked about this a number of times: we have to support all the major Hadoop vendors, even though we have Pivotal, and we have to support all of the potential virtualization stacks, even though we support VMware. It is too much of an open ecosystem for us to force some sort of closure on people. So for us, this partnership, and indeed our work with Pivotal around Cloud Foundry and everything, is all directed at portability. Let the customer have the choice and the flexibility to innovate, but give them at least that framework, that structure, so that they're not off in the weeds all the time trying to sort out very esoteric problems.

Now, I want to just clarify, because people who know Kumar as the vSAN lead might think, oh, BlueData, storage play. It's not a storage play.

That's right.

Otherwise you guys wouldn't be partnering like this. So can you help us visualize the stack, if you will?

Yeah, sure, I can take a quick stab at that and then let Chris slot the Federation Business Data Lake into it. Essentially, our stack is a layer that software-defines big data from a cluster and compute perspective. We are using Docker containers to provision Hadoop clusters, so that's on the compute side only. Then we have a layer called DataTap, which is essentially a way to connect to remote storage systems, whether it's HDFS, NFS, or an object store, get that data over the physical network through the native protocol, and present it up to the virtual Hadoop clusters through an HDFS API, using the same semantics as HDFS. So we are not a storage system. We're not a software-defined storage play in the sense that we're not storing any metadata; we're not doing anything in that layer. We're just boosting the IO, and we have some innovations in that area in terms of how we map the data into the Hadoop cluster. But the Hadoop applications, Spark, or the BI and ETL tools run unmodified. And so that really allows a customer to mix and match and get the pipeline they want in a few clicks.
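From an application's point of view, the DataTap model described above means an unmodified Spark job simply reads a different URI. Below is a sketch, assuming a cluster where an HDFS-compatible DataTap client is already wired in; the dtap:// scheme and the path are assumptions used for illustration.

```python
# Sketch: an unmodified Spark application reading data "in place" through
# an HDFS-compatible API instead of first copying terabytes into local
# HDFS. The dtap:// scheme and path are assumptions for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fail-fast-experiment").getOrCreate()

# Identical DataFrame code to an hdfs:// read; only the URI changes.
df = spark.read.parquet("dtap://isilon/warehouse/clickstream/2015/")
df.groupBy("page").count().show()
```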
Okay, and the intent, of course, is to bring cloud-like simplicity on-prem, which certainly the very large customers want, and a lot of the mid-sized customers as well. But what about the public cloud? Can DataTap treat the public cloud as some dumb target if it needs to, or...?

As long as you have a pipe that can go from your data center into a public cloud or into a hosted data center, we don't care. Our fundamental premise here is that networks are getting faster, and that's really what's enabling compute-storage separation. 10-gig networks are very common right now, and 40-gig networks are upon us. In fact, there's been research out there, whether it's reference architectures from other big data vendors or work from AMPLab, showing that disk locality is irrelevant in the data center: as long as you have a fast network, the CPU becomes the bottleneck.
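That disk-locality point, echoing AMPLab's "disk locality considered irrelevant" line of research, is ultimately arithmetic: once the network delivers bytes about as fast as disks can, data placement stops mattering and the CPU becomes the constraint. A back-of-envelope comparison, with round-number throughput assumptions:

```python
# Back-of-envelope: why faster networks erode the value of disk locality.
# Throughput figures are round-number assumptions for illustration.
GBIT = 1e9 / 8  # bytes per second per gigabit

rates = {
    "one local disk": 100e6,   # ~100 MB/s sequential read, spinning disk
    "10 GbE link": 10 * GBIT,  # 1.25 GB/s
    "40 GbE link": 40 * GBIT,  # 5 GB/s
}

for name, bw in rates.items():
    print(f"{name:>15}: {bw / 1e6:6,.0f} MB/s")

# A single 10 GbE link already outruns roughly a dozen local disks, so with
# a fast network the CPU, not data placement, becomes the bottleneck.
```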
Right, right. Okay, but generally speaking, your customers obviously want to do stuff on-prem. You guys have been advocates of the hybrid model for a while. Maybe talk about that; give us your perspective.

Yeah, our initial solution focuses on-prem. I think we all agree that, especially in the small and medium enterprise, and even on the upper end of the SMB-type customer, there are security concerns and requirements, and a number of our early adopter customers for this platform have been heavily regulated. The data's not leaving the building, right? They're just not doing it. It's not a cost concern, it's not even a size concern; they just don't want to let go of the control, and we totally get that. However, there are subsets of those workflows and workloads where it does make sense sometimes to burst to the cloud. Now, obviously, at a Federation level, we have the benefit of having the best hybrid technology: we acquired Virtustream, as I'm sure you guys know, over the summer. That deal is final and they're a full Federation member. We're all getting aligned and figuring out how to best engage with them, but I think that's going to offer us that next layer of capability: integrating the platforms together so we can offer not just data burst, but actually bursting compute need out to the cloud, doing your work, bringing the results back down with you, and spinning cloud capacity up and down.

Yeah, and you've got a couple of options there. I mean, Virtustream's one, vCloud Air is obviously another one. Virtustream, of course, has a big SAP presence, but that's sort of inning two, right?

Yeah, exactly. We just don't see a lot of customers in this particular space that are wholesale putting everything in the cloud or wholesale on-prem. There's a mix, there's generally a hybridization. But the bulk of them much prefer to build this in-house. They just want the cloud-like experience; they just want it in their own building, right? Everybody wants to go to Amazon, but nobody wants to leave the house to get there.

Well, I think what's really relevant here is that if you look at what Amazon, Google, and Microsoft are doing, the appeal is that they're helping people build this data pipeline as a service, and you're essentially replicating that on-prem. That's the concept, right? That is BlueData's tagline at the conference as well: Hadoop-as-a-Service, with Hadoop being kind of the broader term for big data, on-premises in your data center. And I think that's what customers are looking for, very simply put: how can I get that Amazon-like experience on-prem, with that level of elasticity? Think about the ecosystem we just talked about. If somebody wants a pipeline with a BI or ETL tool, a Hadoop cluster, and a NoSQL platform, they're going to have to go rack and stack servers and build out that pipeline, and if something goes wrong, they have to wait six or nine months to set it right. With compute-storage separation and the ability to automate and orchestrate these pipelines in a software-defined fashion, it's kind of software-defined big data: you can mix and match based on the workload you have and on different tenants; multi-tenancy is another way of saying Hadoop-as-a-Service. You can have each line of business run different pipelines based on their specific use case, because the workloads are different: there are streaming use cases, there are batch analysis use cases, and there are different functional use cases across supply chain and marketing and so on.

And the solution is a Hadoop solution, it's a big data solution, it's a governance solution; it's kind of all of the above.

Yeah, really what we're building is an analytics platform, right? We're focused 100% on the end-to-end capability of analytics: I need to get the data in, I need to store it, I need to be able to analyze it, I need to take the insights I create from that and surface them, and then I need to be able to drive some sort of activity with them, right? My boss Aiden, who you had on this morning, is very fond of saying that copying and pasting insights into PowerPoint is bad, because that's where insights go to die, right? What we're doing, and we were talking about this earlier, is that the top layer of the Federation Business Data Lake platform is built on Cloud Foundry; it's built on that app deployment and app management engine. By bringing the data fabric, the data itself, the analytics tools and capability, and the application platform together in one package, I increase that data gravity; I pull everything together, so I can analyze data at rest, and if, oops, I goofed up, this cluster's not big enough, or it's too big, it doesn't matter anymore. And most importantly, after I create those models and get those insights, my application platform is right there on top of it, running in the same framework, with the same security and governance and policy management wrapped around it, and now I can actually start to drive behavior: customer behavior, employee behavior, whatever it might be.
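One way to picture the software-defined, per-tenant pipelines described above is as declarative specs that a control layer provisions on demand. The structure and component names below are hypothetical illustrations, not BlueData's actual format:

```python
# Hypothetical sketch: per-tenant, software-defined pipeline specs, so each
# line of business mixes and matches components for its own workload.
# Structure and component names are illustrative assumptions.
TENANT_PIPELINES = {
    "marketing": {                # streaming use case
        "ingest": "kafka",
        "compute": "spark-streaming",
        "serving": "cassandra",   # NoSQL layer behind a web/mobile app
    },
    "supply-chain": {             # batch analysis use case
        "ingest": "sqoop",
        "compute": "hadoop-mapreduce",
        "serving": "hive",
    },
}

def provision(tenant: str) -> None:
    """Stand up a tenant's whole pipeline in software, in a few clicks,
    rather than racking and stacking servers for six to nine months."""
    for stage, component in TENANT_PIPELINES[tenant].items():
        print(f"[{tenant}] provisioning {stage}: {component}")

provision("marketing")
```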
And what about the services catalog, the menu, if you will, of all the Hadoop capabilities that I want: the distros, the various streaming options that I have, the zillion projects that are announced every year? How are you handling all that complexity?

So yeah, in BlueData we announced the App Store concept in our product. We have an App Store, which is essentially Docker images of all of the leading Hadoop distributions, multiple versions, plus Spark standalone for those who are interested in running just Spark. And through an extensive partnership with a whole bunch of BI, ETL, and visualization vendors, as well as search vendors, we've also now included the images of those applications in our App Store. So it's really like an iPhone experience, as I was saying to Chris earlier: when you install an app on an iPhone, you know it's going to work, and you have one person to talk to if it doesn't. Similarly, with BlueData being that software-defined layer, we have an app-store-like experience. You want a BI or ETL tool? Sure, one click, and you get Hadoop under the hood all wired up, because it's software-defined. And the next person, the next team that comes in, wants a different kind of application, because there's a right tool for the right job; they can go ahead and do that.

And whatever distro I want? I mean, if I want the ODP, if I want, you know, MapR, Cloudera Manager, that's all there?

Yeah, so today BlueData supports Cloudera with Cloudera Manager and Hortonworks HDP with Ambari, and we'll soon be launching more; especially given that the ODP platform is on Ambari, it makes our life a lot easier to support BigInsights and Pivotal HD. In fact, we've rolled out an application workbench that makes it super easy for folks to register images of their own Hadoop distributions, because sometimes folks have patches and very custom things that they've done; they can do that as well. We're using Docker containers to store these images, and that's one of the key advantages of Docker: application packaging. So we're making it easy for IT teams to register their own applications. We ship with about 15 of them out of the box as kind of a guide to how to add new applications.

And pretty much any popular capability I could think of? I want ZooKeeper, I want Sqoop, I want Flume...

Yeah, typically there are two classes. Sqoop and ZooKeeper will usually come with the Hadoop distribution. The more borderline ones are things like Kafka: do you want it standalone, or do you want it with Hadoop?

If I want DataTorrent...

Whether you want DataTorrent or a distributed BI analytics layer, any distributed platform, we have the capability to orchestrate that.

Pretty much anything that you would potentially build on your own.

Exactly. And that's where I think the FBDL fits in, because when you go to customers you can offer them a whole variety of ecosystem products, but there will always be this one product that may not be in the list out of the box, and they will want to add that at some point.
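Because the App Store entries are Docker images, registering a custom or patched distribution, as the application workbench is said to allow, conceptually reduces to pulling an image and cataloging it. A sketch using the Docker SDK for Python; the image reference and catalog format are illustrative assumptions, not the actual workbench.

```python
# Conceptual sketch: registering a custom (e.g. patched) distribution image
# in an app-store-style catalog. Image reference and catalog format are
# illustrative assumptions, not BlueData's application workbench.
import json
import docker

client = docker.from_env()

def register_app(image_ref: str, app_name: str, version: str,
                 catalog_path: str = "app_catalog.json") -> None:
    """Pull a Docker image and record it in a simple JSON catalog."""
    image = client.images.pull(image_ref)   # Docker does the app packaging
    try:
        with open(catalog_path) as f:
            catalog = json.load(f)
    except FileNotFoundError:
        catalog = {}
    catalog[app_name] = {"version": version, "image_id": image.id}
    with open(catalog_path, "w") as f:
        json.dump(catalog, f, indent=2)

register_app("example/patched-hadoop:2.7", "acme-hadoop", "2.7-patched")
```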
Okay, and the go-to-market, obviously, is through EMC's channels, right?

Yes, yeah.

Okay, because your sales force is slightly larger than BlueData's.

Incrementally.

So, making stuff easy is hard. Maybe talk about how you did that, and talk about the team a little bit. I presume much of the team came from VMware, but also a lot of other parts of the world.

Yeah, I think we've got a rockstar engineering team: folks from VMware, Cisco, Intel, folks who are kernel hackers and so on, on the engineering side of BlueData. Our coming-out party was last year at this same event, where we won the best-of-show award, and since then we've made tremendous progress. But really, the big change in the last 12 months for us is that we started off with virtual machines. We worked hard with our partners, Intel and many of our customers, to improve performance in a significant way for some of the compute-intensive jobs, and along the journey we partnered with a whole bunch of ecosystem players, you know, Hortonworks, Cloudera, MapR and so on, as well as the BI and ETL vendors I mentioned. And the biggest move for us has been to Docker containers. We moved away from virtual machines to using Docker containers as the building block for the infrastructure, whether it's Hadoop clusters or BI and ETL tools. There are a couple of benefits there. A number of our customers actually wanted virtual machines; they said, you know, I can give you 10 virtual machines instead of physical servers, can you deploy on that? This really makes it possible for us to deploy on virtual machines, and I'll leave it to your imagination where else this can go once you run on virtual machines. And we've got dozens of customers, folks who are using our product with Isilon, and we've actually demonstrated performance benchmarks where we've just hands-down beaten HDFS on local disks using remote HDFS, because our DataTap can connect to remote HDFS as well as remote NAS. So a lot of great progress, and our team has grown: we were about 20 people last year, and we're about 40 now. We had a round of funding a month ago, where we raised our Series C with Intel. So things are very exciting, and we're really looking forward to partnering with EMC and making our technology more broadly available.

Chris, I'll give you the last word: thoughts on the event this week, the main takeaways from the show?

Yeah, I think the big three for me are, first, the make-it-simple message, right, which we're all 100% bought into. The subtext underneath that is that we're all in this together. Just like you guys have said, we're sitting in a room with Cloudera and Hortonworks and Pivotal at the same time, because you have to; you can't go to market closed anymore. You've got to be able to support one another in this ecosystem, and so we're bringing together a lot of organizations that have traditionally not gotten along particularly well. It's exciting to see those barriers coming down, because it's important for the customer. And that's really the final takeaway from all of this: seeing the customer attendance go up so heavily, and seeing a move away from the whales of industry, the giant customers that were generally represented at Hadoop conferences in the past, to where there are some small-business customers here. This is pretty cool, this is exciting, and it's the right time for everybody to be coming together on this message. The timing is perfect for this.

Awesome, and there's a huge need in the marketplace for this. People can make it work, but a lot of people are struggling, and really, the market needs to do what John Furrier said the other day: start to focus on the value. Stop talking about Hadoop, start focusing on business value.
So, gentlemen, thanks very much for coming on theCUBE. Really appreciate it. And Kumar, heal up, you know, so we can take another long walk like the one to the Sequoia event we did recently in San Francisco; you weren't able to make that, but I hope you're feeling better. All right, keep it right there, everybody, we'll be back. This is theCUBE. We're live from Big Data NYC at Strata + Hadoop World. Be right back.