The Cube at Hadoop Summit 2014 is brought to you by Anchor Sponsor Hortonworks. We do Hadoop. And Headline Sponsor WANdisco. We make Hadoop invincible.
Welcome back, everybody, to The Cube at Hadoop Summit. I'm Jeff Kelly with Wikibon. My next guest is Ankur Gupta, head of sales and marketing at Metascale. Ankur, you've been on before. Welcome back to The Cube.
Thank you, Jeff.
So for those in our audience who aren't familiar with Metascale, why don't we just start with a kind of level set. The company spun out of Sears, leveraging some of the big data skills and knowledge you built up there. Tell us a little bit about the company and the business model.
Certainly. So Metascale was born out of Sears, born out of a very large enterprise, where Sears started on its big data journey several years ago. And what we have seen is that there is a need for a strategic partner like Metascale in the big data space, and it's growing with the number of vendors that are out there. There are a lot of vendors providing different kinds of services: analytics, infrastructure, big data as a service, platform, cluster, everything. But because the technology is so new, you're looking for a trusted advisor who has done it in a large enterprise. It's one thing to do a couple of POCs, fire up a few clusters and try a few use cases. But when you actually implement in a large enterprise, you run into pain points that you have not experienced or even thought through before. And that's where someone like us comes in. Because we have done it in a large enterprise, because we were born out of a large enterprise, we have gone through these mistakes, pain points, decision trees and whatnot. We can plan for it well in advance and make sure that you don't have to go through it. So Metascale, as a big data company of a large enterprise, helps other companies accelerate their big data journey.
So they don't go through the same pain points that we did in our parent organization, and they can seamlessly have a long-term strategy for big data while gaining results quickly. So that's where we are.
So when we talk about big data, you're talking about, among other things, Hadoop and some of the related technologies. I know you guys had an announcement that seemed pretty interesting. Tell me about this. It's big data as a service. So explain the new offering and how it fits in, how it would be consumed by an enterprise customer.
Absolutely. So one of our main offerings is a big data service, actually managed services, where we provide complete end-to-end infrastructure. So an enterprise doesn't have to think about what distribution to use or what kind of reference architecture to use when looking at a big data implementation. We provide Metascale appliances, which we announced a couple of months ago, that are ready to go, plug and play. We actually move your data from your data warehouse into the appliance and have you focus on the business side of it while we take care of the infrastructure. We manage it 24 by 7, either remotely or on-premise. What we realize is that a lot of companies, especially on the business side, say, a marketing organization or a finance organization, don't care what the underlying infrastructure is. They want the results. They want that analytics report quickly so they can make business decisions. And for a lot of companies that we run into, because of large-company bureaucracy or not having enough funds or whatever the reason may be, their big data implementation may be taking longer than expected. So we announced big data as a service at this conference: if you as a company are looking to get end results, if you're looking for a report, whether it's social media analytics or sentiment analytics of customers about your brand or your products, you can get that report from us directly.
So we will use our infrastructure, our clusters, our Hadoop infrastructure, and then we will use our resources to do that custom development for you and provide you the end report that you're looking for. So again, you don't need to worry about setting up the infrastructure or hiring resources to do coding or anything of that nature. We will take care of all that and provide you the end results.
Yeah, you don't have to worry about waking up at 3 a.m. in cold sweats because your cluster failed and something went down. You just abstract away all that complexity and let your clients focus on their business.
Absolutely. With us, you will never have to worry about work at 3 a.m., because we will be managing the infrastructure.
You might be up at 3 a.m., but your clients won't be. You give them some peace of mind. So something you mentioned there, talking about the on-premise model, was that you'll help your clients take data out of their data warehouse and move it into the cluster, which touches on the bigger question that we've been talking about all day here, and yesterday as well, and in some of the sessions: the relationship between Hadoop and the data warehouse. Now, there are 80-plus vendors here, and some of them have a very vested interest in the data warehouse market, understandably. So we hear a lot about a complementary approach and that Hadoop is not going to replace the data warehouse. We've heard a few contrary voices on that. But for the most part, it's being positioned as very complementary. You're out in the field. What are you seeing in reality on the ground? How are people looking at Hadoop as it relates to their data warehouse?
Well, I mean, similar to what you just said. By the way, great session this morning with two very interesting organizations together.
Yes, thank you very much. That was a fun...
I think you managed it well with the two competing organizations.
Yes, that was a fun session to do with Arun Murthy from Hortonworks and Doug Cutting from Cloudera, two of the critical players.
Certainly, and I think if you ask those guys, they will say that Hadoop will replace the data warehouse. But what we are really seeing in the field is that it is a complementary system. So a lot of organizations, including our parent organization, have used Hadoop for all batch processing, so kind of warm to cold data. But then for all your hot data, or say, RDBMS processing, or your quick processing, you're still using the existing EDW that you may have. But Hadoop fits really well with this ecosystem if you design the data warehousing really well, or account for Hadoop infrastructure with long-term goals in mind. So what we have seen, again, in the field is a lot of organizations that started using Hadoop for use cases as rudimentary as just a storage device: move all my log files, or even Outlook PST files, to Hadoop first. But from there, they're now asking, can I use Hadoop along with a NoSQL database like Cassandra, HBase, or MongoDB and do more real-time processing and whatnot? So I don't think that organizations that have already invested in an EDW are going to throw everything out and say, let's replace it all with Hadoop. That doesn't seem practical. It may happen a couple of years from today when they're out of the contract. I doubt, though, that they're going to take everything out; large organizations don't make such moves. In fact, they're taking too long to bring Hadoop infrastructure into their existing data warehouse setup today. So really, practically speaking, we don't see Hadoop replacing all of what organizations have today, even if it may be capable of that today. But what we do see is Hadoop becoming a really good complementary system. The other thing is organizations do not need to spend a lot of money on expensive EDW systems anymore.
So instead of growing their investment in whatever boxes they may be using, they could use Hadoop for a lot of backup storage or as an archival mechanism and then, as I said, for batch jobs and cold data for the most part.
You touched on something. It'll be interesting, as this market progresses, to see what the impact will be on the EDW vendors. I agree, I don't think it's going to replace the EDW, but I think you're going to see revenue potentially start to stagnate at some of the data warehouse vendors. And it'll be interesting to see how they adapt to this new paradigm, because it's a really different approach from a cost perspective and in the way you process the data. That'll be interesting to watch. So another big trend here, of course, is SQL on Hadoop. We keep hearing about this from any number of vendors who have their own approach to it. It can get a little confusing for customers. How does Metascale approach that? Are you getting a lot of requests from your clients for those types of capabilities? Is SQL the best way to start to interact with data in Hadoop? What's your take?
It's interesting. While a lot of organizations are taking time to adopt Hadoop, the organizations that have adopted Hadoop, our existing clients, are asking for more. And one of the new asks is now NoSQL. So our clients are looking for more real-time types of processing and looking for databases such as, as I said, the Cassandras and MongoDBs of the world. I truly think that NoSQL is becoming more and more mainstream. The challenge, though, is that NoSQL is not pretty today. I have DBAs on my team who have 20-plus years of experience managing relational databases. And for those DBAs, moving to NoSQL wasn't an easy task. They will learn over time, but it will certainly be hard for them to move from the relational databases they've been used to over to NoSQL.
But clearly, NoSQL seems to be picking up more and more, just because our data is more and more unstructured. You're seeing more video file analytics. You're seeing more audio files, such as customer call data and whatnot, and semi-structured data. And NoSQL seems to process that data much better. So clearly, NoSQL is becoming more and more mainstream. But I think I could see tools that move data from, say, your RDBMS to NoSQL, or data from RDBMS to Hadoop. I think generally people say it's SQL to NoSQL, but I guess RDBMS to Hadoop is what the main term is.
Well, that's interesting. So with NoSQL, what's the style of deployment you're seeing? Are you seeing Cassandra being deployed basically inside Hadoop, or are you seeing it side by side? What are some of the deployment models, and what are some of the use cases where they're bringing in something like Cassandra or MongoDB alongside Hadoop?
Yeah, so for one of our customers on the retail side, we actually just put in their real-time inventory system using Cassandra. And we chose Cassandra because the design for the implementation was done a couple of months ago, and it wasn't as strong at that time as it is today; it has come a long way. So what we're seeing is that, for that particular use case, Cassandra is a standalone system. It's its own cluster. So what we did was use Cassandra: we took data from the POS system and used a Storm and Kafka combination to move data in real time from the POS to Cassandra, and then from Cassandra to Hadoop, to use Hadoop as an enterprise data warehouse. But we put analytics on top of Cassandra. So take one of the analytics engines, whichever one you want to use; there are tons, as you said, 80-plus vendors, and a lot of them provide analytics. Put it directly on top of Cassandra, and once the data is in Cassandra, you can do whatever BI you want.
Oh, okay.
So you're doing some kind of operational reporting right on top of Cassandra. You're also moving the data into Hadoop, so you can combine that with other sources and do some more historical-type analysis and things like that.
Exactly. So what we did for this client was use Hadoop: we moved data from Teradata boxes to Hadoop and built Hadoop as an enterprise data warehouse for them, so you had a single source of truth for that data. And then for real-time, we moved data into Cassandra, but then from Cassandra, again, we put all the data into Hadoop so that, like you said, you can do all the historical analysis you want and store the data in Hadoop.
Yeah, I mean, like I said, it's really interesting, the deployment models and some of the use cases we're seeing. And it mirrors in a way some of the more traditional ways we worked in the old world, if you will, with the transactional database and your analytic data warehouse, but now the scale is so much bigger, along with the types of data you can incorporate and the speed. It's got to be real-time. It's pretty much a requirement these days.
Absolutely.
And if we were talking three years ago about Hadoop being real-time, you would have been laughed at, but we're seeing it now, where people are building real-time systems. What's your take on Hadoop specifically, on how quickly it's developed, and what do you attribute that to? Is it the community? Is it the vendors? Is it a combination of the two?
Yeah, so I truly believe that Hadoop is not a breakthrough concept. I mean, distributed file systems and parallel processing have been there all along. But I think what made it really successful is, one, cost, which is a major player, and then the open source community.
I think what has been done on top of Hadoop is phenomenal. Like Doug Cutting said this morning, we never expected this to become this big, and I believe part of the reason is open source: if it wasn't open source, I personally do not think it would be as big as it is today. And there's more and more development happening, so certainly that's one big reason. You talked about real-time, certainly. In fact, at our booth here, the Metascale booth, what we are demonstrating is real-time analytics on Twitter data: sentiment analytics on social media in real time. We have put a single-node local Hadoop cluster here on one of our laptops, and you can type the keyword you want. In real time, we are pulling data from Twitter and actually providing sentiment analysis to you, using natural language processing and whatnot, as you type the keyword, and collecting the tweets. So it's phenomenal how you can use a combination of Hadoop and one of the NoSQL databases. In our case, we're actually using MongoDB for this use case, but as I said, you could use whatever NoSQL you want. But certainly Hadoop has come a long way. In fact, Metascale was a keynote at Hadoop Summit two years ago, where we talked about some of the use cases, and those use cases were primarily around batch processing. It's very interesting: I was listening to the keynote this morning and hearing about analytics 3.0, and then your discussion with Doug Cutting and Arun Murthy, and I was really thinking back to two years ago. What we discussed was cutting edge two years ago, and it was about batch processing using Hadoop, and now you're talking about analytics 3.0 and real-time processing, machine learning and artificial intelligence. It has certainly come a long way, I believe.
It is moving fast, and a lot of it is due to the community. So for our audience out there that isn't here at the show, describe the vibe here. What's your sense of the vibe? I'll just quickly give you my take: it's definitely got that excitement to it, but I also find a little bit more serious tone this year than maybe in past years. And I think part of that is because we're talking about enterprise-grade capabilities like security and governance, which maybe aren't as sexy as some of the analytics but are critical to really make this an enterprise-grade platform. What's the vibe of the show from your perspective?
Yeah, so there are a couple of things. One, the food is great, so everybody loves it.
That's always good, because it's a crapshoot with conferences. You know, sometimes the food is...
When you have a conference in such a large hall and people are walking a lot, you want to make sure that they're fed well. But other than that, jokes apart, the key is, like you said, the tone is more serious. I guess in previous conferences people were kind of testing the water. They're still testing the water, but in a more serious manner. They're here to talk about POCs. They're here to talk about use cases. I see a lot of large enterprises, more than what I saw before. It used to be more vendors, or typically e-commerce, web-based organizations that were early adopters of Hadoop. Now you're seeing more mainstream organizations, and multiple folks from multiple departments within the same organization. So it seems like not only are organizations looking to embrace these technologies, but multiple departments within the same organization are, so probably they have multiple use cases or a long-term view on Hadoop and whatnot. So it's exciting to see. It's exciting to see new development, and exciting to see traditional EDW vendors having their big booths here with Hadoop as part of their
solution. So clearly, I guess they have realized that they'll be a part of it: be in or be out.
Yeah, I mean, all the big whales in the industry are here, and they're taking different tacks in some cases, but they all realize, I think, that they need to be in this market. It's the way of the future. It's either get in now or you're going to be left behind, I think.
Exactly. But you're the one who writes about it, you're the expert in it. How are you feeling about the whole thing?
Well, I think Hadoop is going to be a challenge for some of the larger vendors to adopt. It's sort of the innovator's dilemma problem. Part of it is that the business model is different in this environment than a traditional software sale, so I think that's one challenge that the big players are going to have a little trouble adapting to. It's very different to sell, I think, when you've got an open source community. And we also need to start moving beyond some of the talk of speeds and feeds and talk about business outcomes. So you've got to talk to business people, and you've got to be able to talk their language. That's going to be a challenge as well, I think. Some of the vendors are doing a better job than others, but long-term, it's encouraging that they all know that they have to be in this market. We'll see. It'll be really interesting in five years, ten years to see which ones were in this for real and which ones maybe were not.
And how many of these organizations will merge together?
Well, that's true. I mean, there are 80 companies out here, and a lot of them are really small startups, so that'll be interesting to watch too. Some of them are going to get snapped up, some of them will hopefully go on to be viable independent companies, and some of them will go away. I mean, that's just the nature of a fast
moving emerging market. But that's what makes it fun to cover as an analyst as well.
Well, we'd like to read it on Wikibon.
Wikibon, and quick pitch: we're going to be doing a webinar together coming up. We haven't set the date yet, but stay tuned and keep an eye out for that. We'll promote it and let you know where you can see it.
Sounds good. Wikibon is a trusted research area, and we're really proud to have you as a partner in a research webinar where we can provide an enterprise perspective and you provide what you're seeing in the industry. So we certainly invite everybody to be a part of that research, and you can see the details on the Wikibon website or the Metascale website.
Yeah, when we nail down those details, we will certainly get them to our community. So Ankur, thanks so much for coming back on theCUBE. Appreciate it. Wrapping up day two here at Hadoop Summit