Okay, we're back here live at Strata Conference. I'm John Furrier with SiliconANGLE.com. This is theCUBE, our program where we go out to the events and talk to the folks in the hallways: keynoters, entrepreneurs, developers. We have the CEO of InfoObjects, Rishi Yadav. Yadav, is that how it is? Okay, great, welcome to the program. So tell us about your company.
Yeah, so we are an around seven-year-old company. We have always focused on open source technologies, and for the last one and a half years we have been in big data and Hadoop. Our positioning is the same as it was before Hadoop: we always work with the open source stack, nothing else, nothing proprietary. That's the positioning we have had, and we are getting a lot of traction with it.
So let's talk about what you're seeing in the market today. What's your take on the whole competition? We just had Intel on, Greenplum announced a distribution, there's Hortonworks, Cloudera; now there are all these different distributions. What do you think of that, and what's your point of view on it?
Okay, so honestly I would say services companies like us kind of lose money because of it. The reason is that it delays the client's decision-making process. I mean, if somebody has to spend, say, a couple of million dollars and they see five or six distributions, they get confused, and all those distributions look similar.
So be specific, what do you mean you're going to lose money? The slowing down of the decision about which version to go with slows down the purchases?
Exactly. What I've seen in the last six months is a lot of POCs happening, right? So there's a lot of interest. You talk to anyone and they say they have big data and they have big problems to solve. But although a lot of POCs are happening, it's not moving much beyond that, because everybody is waiting and watching.
Okay, so you think it's a bad thing for the ecosystem?
It's a bad thing for the ecosystem when everybody has different flavors.
But if it's all Apache, the argument would be?
So what's happening is, I mean, you see companies like Cloudera and Hortonworks, and I really thank them because they have contributed so much to open source. But at the same time, there is the pure Apache distribution, right? And then everybody adds something proprietary to it, whether it's Intel Manager or Cloudera Manager, with all the good things they have in them. What that does is lead to delays in deciding which one to go with.
So Cloudera puts out CDH, which is their management. It's still open source. So does that slow you down?
Cloudera Manager is not open source.
The CDH.
CDH is their Hadoop distribution.
Oh, their manager, that's proprietary.
Yeah, but CDH is open source.
CDH is open source, that's the model. So are they cutting you out of the services business or?
No, I'm not saying that. No, we are getting a lot of traction, and that's the reason I'm saying this. I mean, thanks to Cloudera and the other companies that they've opened this whole thing up.
They actually commit. So let's talk about EMC Greenplum. What do you think about their announcement, Pivotal HD?
Yeah, so Pivotal HD, again, they have their own proprietary stuff there, and that again leads to more confusion in the market.
Do you think they have a credibility issue?
EMC being EMC, I'm not sure they would have a credibility issue, they are way too big for that, but overall...
In the community they might, because obviously they actually don't care, right? They're going in saying, hey, we're going to do it our way, and they have their own clients that they need to take care of.
Exactly, yeah. So community acceptance becomes a challenge.
For Greenplum?
For Greenplum kind of companies.
I mean, I don't want to name companies, but yeah, for the big players.
I just did, yes. What about MapR? What do you think about MapR?
So it's an interesting thing. On one side you have the Apache Hadoop distribution, right? And on the other side you have the API. What MapR is saying is, okay, we will comply with the API, but we will have our own distribution.
But they're clear. I mean, you know where they stand, right?
Yeah, that part is good. But at the same time, what do you call a Hadoop distribution? If you have some software and you just provide a Hadoop API on top of it, does that become Hadoop software? Or if you just bundle everything together, like what Hortonworks is doing, is that Hadoop software? I mean, what do you call Hadoop? That becomes the question. And if you look at big data, I mean, every company, everything is big data in the Bay Area at present, but...
Yeah, we were even joking that SiliconANGLE is going to come out with its own distribution of Hadoop, just to jump in like everybody else. So basically your thesis is, okay, this is fragmenting the marketplace. It slows down the pace of deployment.
Exactly.
Okay, so that's not good. So, news flash, in a way there's a cold war going on, and it slows everything down.
I do call it a Hadoop war, and I say it's a very local war. The war starts somewhere around Palo Alto and goes all the way to the city, and there is a company which provides all the weapons for the war, and they are sitting in Seattle. That's Amazon. So yeah, that's what we joke about in the company.
It's a good joke. Let's talk about Amazon for a minute, because Amazon obviously has made great strides with AWS.
I mean, we love Amazon, we use Amazon, developers love Amazon, shadow IT. I wrote a post today about a company called Skyline Networks that just launched a great security product targeting the cloud, and it's awesome. So there's a lot of cloud happening, and we love the cloud. However, with Hadoop, moving data in and out of the cloud is challenging. What do you think about the cloud aspect of supporting Hadoop? How early is it? Is it ready? Is it expensive? What are you seeing with clients? Are they moving there?
So, my take is this: as everybody knows, data has gravity. I mean, in our office, as a small company, we are setting up a 10 terabyte cluster, and it's going to take us 20 days just to move the data, based on our 10 GB.
What, from bare metal to cloud? Or a big cloud?
No, I'm saying just moving data. I mean, even if you just want to collect some public data, which is 10 terabytes, it's going to take 20 days to do it. So the point I'm trying to make is that whether to adopt the cloud or not depends on where a company's data is sitting. If you're starting something completely from scratch, you can choose to put all of your data in AWS, in S3, because compute is easy. Compute doesn't have any gravity, but the data does.
So, gravity equals lock-in. If I go to Amazon and put all my data in the cloud, do I get lock-in there?
So, gravity means two things. Number one is that it takes time to move data from one place to another. And the second thing is, yes, I mean, you are locked into Amazon whether you really want to be or not.
Let's talk about Hive. Hive has been kind of the whipping post these days for everyone. I mean, everyone's blaming Hive. Hive's been beaten up. Greenplum basically came out with benchmarks saying, I'll just say, that Hive's not holding up. And then they even went further to criticize Impala, which is a competitor's product.
But Hive is very well respected and in use. So, what's your take on that? Is it warranted? Is Hive really kind of sucking wind right now? Is it working well or?
So, my take is that the reason Hive came into place was to increase adoption, because there were these people for whom the only thing they did was SQL. I mean, I've done Java for 15 years, so I love Java. For me, writing a MapReduce job is probably still faster than writing a Hive query, because that's what we have done. So Hive did solve a big problem, but now a lot of people have run into latency issues, because you don't want to wait half an hour for your query to run, right? So I think a lot of interesting things are happening there. I think Google Dremel also changed things, showing that you can come up with a lot of innovative solutions.
What do you think about Amazon's Elastic MapReduce? Have you played with that at all?
Yeah, so we do, I mean, a couple of our clients are running on EC2. So that's great, but as I said, it depends on where your data resides. If you're starting a new project and you make it your strategy to put all the data in S3, or EBS now, which is another thing that works far better with EC2, right? Then it becomes easier. But if you have, say, even 100 terabytes of data residing in your data center, then it becomes a problem, and you think twice before moving the data there.
What are some of the biggest use cases you're seeing with your customer base right now with Hadoop? You said a lot of proofs of concept. Someone was kicking around a statistic here earlier this morning that one in five POCs makes it to production. Are you seeing the same thing? Is that realistic? And what are some of the projects you're working on?
So, it's kind of interesting.
I mean, I try to keep it high level because we have a lot of NDAs signed, but there is one company...
Yeah, you don't need to name names.
Yeah, yeah. So for example, there is one company that has data on almost 70% of car users in America. It's an old company, almost a $3 billion company. They have had that data forever. But now with Hadoop they are thinking, well, we have this data, so let's try to make some sense out of it, you know? Maybe let's try to get some marketing dollars out of it. Right, so that's one thing.
What do they do? How's it stored?
At present, they store it in regular Oracle and other relational databases, and some of the data is not being stored at all, right? It's in the logs and so on. But now with Hadoop they have this option: we have all this data, so let's collect it and try to make sense out of it. Besides that, for example, one company I was talking to had location data, lat and long data, right? And they wanted to merge that data with OpenStreetMap (OSM) and then try to make some sense out of it for advertising purposes. In the same way, there's a company that has electric sensors, I mean light sensors and so on. And they are saying, besides light, we are also sensing temperature and other variables, so can we make some sense out of that? So what I mean is that any company I talk to has data, and they have a lot of use cases for that data, right? But how do they derive intelligence out of it? They have data, but they don't know whether there's any KPI they are actually going to get out of it, because they never got one before, right? But now they see this promise with Hadoop. So they say, you know, this thing looks interesting, we definitely have interest in that.
Rishi, my final question to you as we end this segment, and thanks for coming on, by the way, and giving your perspective on the Hadoop wars here, is: what's your vision with InfoObjects? You've been a Java developer for a while. Looking out at the landscape of the big data world, watching it grow and become aggressively competitive with some of the things going on now, what do you think is going to happen? What's your vision for the next couple of years?
So, it's kind of interesting. I think this market is going to explode, not expand. I mean, we have been growing at almost 50 to 60% every year, and with our whole focus on Hadoop, I'm assuming it's going to be probably 200 to 300% per year from this year on. So we definitely see a lot of growth in the market. And we have this interesting position: being a services company and being an open source purist three or four years back was not a big deal, right? But now with Hadoop, people love our position and we can get some traction using that.
Well, you're an open source purist. I want to ask just one more question, because that's a great term. What does that mean? I'm from the old open source days when it was just starting. We're now in the fourth or fifth generation of open source, and it's a standard. Open source is the way, and open source always wins. So obviously you can take that prediction and say proprietary will lose. So that handicaps Greenplum a little bit. But that being said, what does open source purist mean, and what is the business model for open source? Obviously, Red Hat was a success, and I remember when Cloudera and Hortonworks came out, they were called the Red Hat for Hadoop. They're not using that anymore. It's a little bit different business now, but how do you see the business model evolving, or does it stay the same? Obviously, support's a key part of it, but open source will continue.
We all agree on that. What do the business model and partnerships look like?
So, it is interesting. For a company like ours, we have three things in the business model. Number one is training. Training doesn't earn us a lot of money, but it works as a catalyst. Number two is resources, because when we train resources, a lot of companies need those resources. Every other article you see about open source talks about the skills gap now. So that's where I think there's a lot of money to be made for a pure services company like us.
Like, what kind of resources? What do you mean? Like, people resources? Programming resources?
People resources. Programming resources. That itself is a...
Yeah, got it. And maybe some automation, maybe some software that you might write.
Exactly. And the third thing is implementation and support.
Okay, just making some notes here. Love that. And we're always covering this, because Hortonworks is pure play. They say they're pure play, but yet they have Stinger and they announced two more Apache projects. So, do you like the Hortonworks business model? I mean, they claim 100% open source.
Yeah, so if I have to choose the company we come closest to, that's Hortonworks. No doubt about that.
Not Cloudera. Hortonworks over Cloudera, because of the management piece that Cloudera has.
Yeah, Hortonworks. I mean, if I have to choose between Hortonworks and Cloudera, I would say Hortonworks. But Cloudera, I mean, I am really thankful to them. They started the game. They have really contributed a lot to the community.
They have a nice business. I mean, they balance it. They have a nice balance. And they're doing good work. They have the contributors. Okay, thank you for coming on theCUBE. That was an open source breakdown of the business models. We'll be drilling down tomorrow on business models and partnerships.
Business models in Hadoop and open source, tomorrow here on theCUBE. This is theCUBE, SiliconANGLE's coverage of O'Reilly's Strata Conference. We'll be back with our next guest after this short break.