Live from the San Jose Convention Center, extracting the signal from the noise, it's theCUBE, covering Hadoop Summit 2015. Brought to you by headline sponsor Hortonworks, and by EMC, Pivotal, IBM, Pentaho, Teradata, Syncsort, Attunity, and WANdisco. Now your host, John Furrier.
Okay, welcome back, everyone. We are back here at Hadoop Summit 2015, live in Silicon Valley. It's theCUBE, SiliconANGLE's flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, founder of SiliconANGLE. I've got my two guests here. Rishi Yadav, CEO of InfoObjects, welcome back to theCUBE again. And we have Juan Asenjo, lead technology architect at Rockwell Automation. Guys, welcome to theCUBE. Nice to meet you. Thanks for coming on. Great to see you again, InfoObjects. Business is good. Last year in theCUBE, you said Spark was going to be big. Hey, it's big. Spark Summit's next week. What's going on? Give us a quick update on InfoObjects, then we'll talk about what you guys are doing with some of your customers.
Yeah, so Spark has been a blessing, as we talked about a few months back also. The whole big data ecosystem has stabilized a lot, whether it's HDFS, whether it's Spark, or whether it's Cassandra. So now even clients are very knowledgeable. They know what they are looking for, and that kind of makes it easy. The place where they get confused is all these distributions. I remember two years back, when we were at Strata, I think an hour before my interview, Intel had a distribution, and our point was: how many distributions does this planet need? Now the number of distributions is far smaller than it was two years back, but it still sometimes becomes confusing enough for the clients. And that's where we come into the picture, right? We come as a vendor-neutral partner to our clients. So we are on our client's side as opposed to any specific Hadoop vendor's side, right? And then we sit with the client.
We understand their requirements, and then we suggest what exactly is going to work for them: what is best for them out of all the different distributions and the open source versions, which is a big thing out there.
I'd like to get Juan's perspective on this, but before I go to Juan, I want to ask you a question, because earlier today we had an interview with Pivotal and EMC, and it came out in other interviews as well. Really the highlight was the whole consumerization of IT thing. That's now digital transformation. That's one. Two, it's a services-led market right now, because there's so much technology out there that the bridge to the future has to be built. You've got to have one side of the bridge, and then you've got to have a partner to go with. We've talked about that in the past, but the word future proof is back. Future proof is back. It's a word that's been around for a while. That means lock-in or, hey, whatever. But there are real issues now. Future proof is a real deal. So I've got to ask you, in your role with customers, is that part of the deal? What are you guys doing to be that partner? Because being vendor neutral means you're certainly making a decision. You're partnering with the customer. That's a services-led approach. But there's still technology needed to construct that bridge to the future and still maintain the future proofing. So explain how you guys fit into that paradigm.
Actually, it's a very good question. What has happened with open source is this: whether it's a Hortonworks distribution or a Cloudera distribution, they are going to use the same open source version of Hadoop, and then they add a lot of extra value-added features on top of that. So the idea is that in the future, no matter what happens to these companies or any other distributions, right, the pure open source will remain there, right?
I mean, there are a lot of companies like us who understand open source, so tomorrow, if any issue comes, you're not really stuck with any specific vendor, right? Because the base is the same, and it is open to everyone. Everybody can look at the code base, everybody can improve the code base, everybody can contribute to the code base. So in fact, the open source movement has taken future proofing, as you said, to a completely new level.
So Juan, talk about the relationship. How did you guys pick InfoObjects? What's going on in your world? What was happening? What made you say, hey, I've got to do stuff? Was it a gun to your head from the boss? Was it the infrastructure? Was it everything? I mean, what were the pressure points that made you guys look at quickly accelerating into the future?
The rationale behind that is we have a strong relationship with many of the main vendors, like Microsoft, but it's a lot of technology, a lot of moving parts. So when it comes to picking technology and integrating the technology, you can't really have the whole team yourself, so you need help, and you have to find a company that brings all those elements together. That's the reason we approached them.
What did those guys do with you? Did they help you with the technology? Did they do the bake-off?
They help with understanding our use cases. If you talk to any of the vendors in big data and analytics, you may be confused about which one to pick, so we need to really understand our use case and pick the one that's, as you just said, future proof, cost effective, and workable with the resources that we have. I would say we're probably in the same boat as many companies, and we have to augment our capacity by going to a consultant that has experience with these topics.
What was the big factor in deciding on InfoObjects? Vendor neutral, domain expertise?
Yeah, I would say both, but mostly the domain expertise. I've been very impressed with the capabilities, and especially Rishi himself is a source of knowledge, and that's the reason that we like that.
So Rishi, what happens? Take us through how this all works out with customers. Tell us how they engage with you and what the relationship is. What do you guys do? What do they do? What don't you do? Take us through that.
So it's interesting. In fact, I was talking to someone a few days back and I was saying that it looks like we are in the content business, right? On our website, three fourths of the hits are about the kind of content we have about Spark, different Spark recipes, and now my book is coming out in less than a month, right? So what's attracting the future prospects and the future customers is the content: the content we are making available on our website, the content we are making available through theCUBE and other channels, right? That kind of makes them interested: okay, this company definitely has the knowledge. Then they reach out to us and engage, and then obviously there's the depth of knowledge as well as the breadth of experience, right? I think both help. I think the one advantage of engaging a consulting company, even compared to hiring full-time staff, is this: the problem you are trying to solve, the consulting company has already solved, or is solving, with five or 10 other clients, right? So the overall time gets accelerated, right? In some cases it can be half the time, in some cases it could be one tenth of the time, sometimes it can be one fourth of the time. You get that additional mileage, and a lot of companies are not in the business of building software anyway, right? I mean, a company like Rockwell, one of the biggest industrial automation companies in the world, right?
They are in the business of industrial automation, right? They are not in the business of figuring out how to make Spark the most optimized, how to make Hadoop the most optimized.
They just want results.
Exactly.
They don't want, so you want, they want you to pick the tech.
Exactly.
They don't want, you guys want results. You have business objectives, right?
Exactly. So we have a market to serve, and if we are worried about building the queries in Hadoop or putting everything together, it will take a lot longer. And technology is moving so quickly that something you master today will probably be obsolete soon.
So where are you with the relationship? You guys deployed software? You guys working? The products out there, the outcomes coming in? Did you get what you wanted?
We are in the process of evaluating what is the best technology for our use cases. We're in the industrial IoT business at Rockwell, remote monitoring IoT, and from the data ingestion to the platform that does the analytics and the number crunching, that's what we leveraged.
So is this a solution up and running right now?
We have parts and pieces up and running. What the InfoObjects expertise brings to the table is the expertise in big data analytics that we don't have.
This is the internet of things.
It's the internet of things. The industrial internet of things.
GE, well, that's GE, they're trying to co-opt that, but General Electric also has the same mindset. Rishi, explain what's going on for the world out there. It's a hot buzzword, internet of things, industrial as well, running equipment. This is a hot area. What's going on? What are these guys doing? What did you guys do to help with the key technologies? What were the building blocks? Because there's some stuff out there, but if you pick something that's not baked, it could come back to bite you. So how did you make that selection of technology?
So it's interesting, because the internet of things, the whole IoT, is actually the upstream system to big data, right? So as you see more and more movement in IoT, you will see many times that movement in big data, because IoT is actually creating data. At present there are a lot of sources of big data, but the biggest source is going to be sensor data, because most of IoT is about sensor data, right? And that's the reason for a lot of technologies, even in the big data space. For example, Kafka, right? Kafka is a technology with which you can do real-time streaming, right? Now, because IoT is picking up and sensor data is picking up, clients are showing a special interest in Kafka, right? Which was not needed in the whole batch-oriented world, maybe a couple of years back, right? So-
All right, well, what's next? What's next for this relationship? Where do you go next? Obviously, you've got to get things deployed, got to get the results in, and then it sounds like there'll be more work. How do you expand? Where does the relationship go from there? What do you guys do next?
Yeah, what we do next is we have some proofs of concept that we need to execute to select the right technology. Sometimes it's even difficult for us to figure out what our use cases are, because, as you just said about future proofing, sometimes you don't know your use case in the future, but you have to plan as much as possible for that.
That's hard.
Picking technology, really, in big data, you're making a big commitment, because if you invest in this vendor, you don't know if that vendor will be available in the future, correct? So, being as open source, as open-minded as possible, but getting the job done. So, that's the reason, it's a good-
Well, these guys- Well, these guys will be around. They've got a good business. And they've got smart people. So, I mean, you guys are going to do good.
I'm in for a lot, which is solid, but-
No, I mean, in public, what technology do you pick?
Yeah, yeah. Like, look at Storm, we just talked about it earlier. Storm was hot, now Spark. So, you know, I mean, things are happening. Some things can come and go like flavors.
MapReduce is a thing of the past.
Yeah, yeah. So, we just talked about Databricks earlier, and they run on Hadoop, but maybe that's all cloud. Now what if they want to go on-prem? You know, so interesting dynamics, right? So, okay, Rishi, I've got to get your take on this, because we've been looking back over the past couple of years. What is your message to the folks watching? As Merv Adrian from Gartner just said about Hadoop, and I agree, this is going mainstream right now. So the language of mainstream is not speeds and feeds, or MapReduce or Spark. It's: I have to automate this analytic process and create a product that we're selling, and/or value. So the value equation is huge. Time to value becomes the number one criterion. So what's your take on this? How would you share, being on the inside of the industry, with the folks watching, this new transition of the industry?
Yes, so two years back, or even up to one year back, we had to explain to our clients why they need a data lake, or enterprise data hub, as Cloudera likes to call it. Now that problem is solved. When prospects come to us, they say, Rishi, I understand that I need to have one big data lake in the company. Now tell me, I have these disparate data sources, right? Can I get the right connectors for them, right? So the first part is already there.
So you'll write more software.
Right, or use already built connectors, right? So that problem is solved, right? Now, the next problem they have is, they ask, can I do all my compute on this one platform? I mean, does it do what it's been promised to do, right? I mean, especially with Spark, right? Or maybe Cloudera Impala for distributed query processing, right?
I mean, whatever is advertised, that I have this one big data lake and I can get all the analytics from there, is it really possible? Some of them say, what about my visualization tools, right? The kind of visualization which I can do with the OLTP sources, can I get the same visualization, right? Can I get the same real-time performance? And can you also make sure that my batch jobs will not get affected by it, right? So, all these kinds of issues. But the point is that no convincing is needed now, right? Clients are already convinced. Now they are saying, tell me whether whatever I want can really be done or not, right? Because we need somebody who has done it in the past.
So, you agree, consultative help right now is definitely mandatory. Two, we also hear about reference architectures. What are you hearing on that? Are there certain reference architectures that help people get the data lake going? Are there certain things that work well that you've seen out there? Are there certain playbooks?
So, playbooks have been more client-need-specific. I don't think the whole big data thing has evolved to a stage where you know that this building block will go here and that one will go there.
You can't cookie-cutter or boilerplate it. It's pretty much you've got to go in and do some assessment.
Yeah, you have to do some assessment based on their needs. And as I said, if we talk to different vendors, I mean, each would say that my playbook is the right playbook, right? And that's where we, as a vendor-neutral partner, come into the picture. We look at the client's needs and we say, you know what, probably this much you can do with pure open source, and maybe this piece you can pick from this vendor and that piece you can pick from that vendor, and that is going to make your solution perfect.
Okay, guys, I really appreciate you taking the time to come on theCUBE. I'll give you the final word.
Tell us what's going on in your world, and your prediction for this industry this next year.
I think, I mean, in our case, as I said, Spark has been a blessing, and Spark along with HDFS, I think that's going to be the continuing trend. The other thing I see, and we covered it in the last interview, is the old world of ETL versus machine learning. Now I see more and more use cases of machine learning coming up. Spark has really evolved a lot. Its machine learning libraries have become richer and richer. But interestingly, now clients are talking about machine learning, because mostly whenever clients talk about it they talk about SQL. And they're still talking about SQL 90% of the time, but 10% of the time they're talking about machine learning, which is an interesting trend, because it means they are using these technologies not just to do the old workflows but also to find new insights, which is the promise of big data technologies.
Juan, what's your assessment and prediction for the future of this industry?
No, that's a tough question. I can only speak from my perspective, from my company, Rockwell Automation. All these technologies are relevant to what we do, and I think they're relevant to every company. Essentially, we are in the industrial IoT. And some of these terms are kind of overloaded. I guess, what is analytics? What is big data? It's a number of things that you can do with analytics and different tools. And I think we're investing in learning more and more about how we can differentiate ourselves, bringing value to our customers by leveraging tools. And maybe no one vendor has all the tools, so you have to mix and match multiple vendors. And thus, I see the strong relationship with the consulting company continuing.
That's what Merv Adrian just described as a mainstream need. I have needs, I don't care about the vendor. I've got to mix and match whatever fits the bill.
Exactly, and with the technology and the knowledge, I guess we can achieve those within the time. If we talk next year, I guess maybe we will have accomplished some of those goals.
We'll do an audit and see how things have gone over the last year. Rishi, Juan, thanks so much for coming on theCUBE. We appreciate you sharing your insights and expertise. This is theCUBE, live at Hadoop Summit 2015. We'll be right back with our next guest after this short break.