From San Jose in the heart of Silicon Valley, it's theCUBE, covering Big Data SV 2016. Now, your hosts, John Furrier and Peter Burris.
Hello everyone, welcome back to SiliconANGLE's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, with my co-host Peter Burris. Our next guest is Rishi Yadav, the CEO of InfoObjects. It's great to have him on, because he's not only the CEO of a really awesome company, he's also on the front lines, so he's almost like an expert analyst who can come on and share his thoughts on the industry. Welcome back to theCUBE, CUBE alumni. Good to see you.
Good to be back here on theCUBE.
So last night we had an interesting event. You were at our event, Peter was doing some new research, and we were talking afterward. The market is really robust right now with big data, but, as we were saying, the O'Reilly Strata + Hadoop World show is not getting as much traction. I can see from the tweet stream that day one certainly got a pop, but the engagement seems to be down a little bit. Is that a function of Hadoop, or a function of the event? What are your thoughts?
I think Strata has been the oldest show, and I think the whole big data thing has expanded or moved on. Big data has become fast data. One thing I noticed at the show was that there are more machine learning companies than anything else. Two, three years back, every other company was building some kind of database, and SQL on Hadoop was a big deal. Now SQL on Hadoop is standard; you need to have it. The theme I see on the ground is streaming, real-time streaming. I didn't see a lot of companies about that, but the reason could be that streaming has become a standard now. I did see a disproportionate number of machine learning companies, which was interesting.
And you've got machine learning. Last night, the Forrester analyst on the panel I was on with Peter Burris was talking about how we need algorithms to police the algorithms, because you might not know what's going on. So there are some key technology enablers that are pretty hot right now. You mentioned machine learning, but the business stories are also big. So we're seeing threads of "I need more machine learning," more under-the-hood advantages, where Spark certainly still has the momentum. But then the conversation shifts. As EMC's Bill Schmarzo, who talks to customers all the time, said when he was on, it's back to outcomes. So you're seeing a thread between some really key stuff going on under the hood and business outcomes as the core themes. What are your thoughts on that? I mentioned machine learning; is there anything else you're watching with Spark, and what are some of the customer conversations you're having?
Yeah, you're right. It has become less technology-oriented and more use-case oriented. In fact, when most customers talk to us, yes, Hadoop is there, Spark is there, Kafka is there, but most of the time they come to us to talk about integration, right? Integration is the underlying theme, and it's always underrated here. Nobody talks about integration, but most of the work we're getting is about integrating different sources, whether IoT sources, enterprise sources, or anything else.
Let's talk about that, because integration sometimes seems very simple conceptually, and the industry has gotten really good at moving data around. But there are still a lot of challenges: defining formats, defining the tooling, how much it's going to cost, how long it's going to take. So when you talk about integration, what are some of the new considerations people have to factor into their decisions about what they do?
Yeah. The first thing about integration goes back to Hadoop, and I'll get back to why Hadoop itself is losing relevance. Hadoop started with all the unstructured data and the stories around that. But most of the data in enterprises is structured data. So when you're integrating data, most of your sources are structured, relational sources, and they all had their own siloed approaches to pulling data from them. Now Kafka, for example, is doing pretty well. Kafka has something called Kafka Connect, which came out a month back, with which you can ingest directly from relational sources using the JDBC connector. So connecting with relational sources is becoming the main theme, along with real-time streaming of other event data coming from multiple sources.
And those connectors really speak to the integration piece. That's a big thing we keep hearing. Peter was talking about this on opening day here, in our opening segment: the path to digital business really has nothing to do with technology. And one of the questions I thought was awesome last night was when you asked, "Do you worry about the technology or something else?" And everyone said they worry about something else, not so much the technology.
Or at least, last night they said they worry maybe 50% about the technology, but that the focus is moving up. They're increasingly worrying about the business objective they're trying to serve and the people they need to work with to serve that objective. So with integration, there's always going to be a technology component, but there also need to be some policies, some governance.
We have to take the business activities, the business insights, and what the business needs, and drive that into the different challenges, or different approaches to thinking about integration, and that's more than just moving something over the wire. So again, as you think about the integration challenges, it's moving out from relational. But are there some new things? Machine learning is clearly something people are trying to turn into a killer app. Are we seeing some new integration challenges beyond just moving the data around as we try to get at that application value?
I would say we are looking at the old integration challenges, and by old I don't mean old in the Hadoop space but old in other enterprise technologies. Security, for example, and governance is a big deal now. Security and governance, which people did not care about in Hadoop five years back, have now become front and center. Those are some of the biggest integration challenges. It's not just moving the data around. Moving the data is good; you have to move data at low latency, and you have to clean the data before moving it. But cleaning data has been around for the last 20, 30 years. Nothing new about that.
So that's an interesting signal. Would you say that when you start hitting those checkpoints, or speed bumps, whatever you want to call them, it means this is coming more onto the enterprise radar, because those are table stakes for the enterprise?
I think that's exactly it. What have we learned from data warehousing that can now be applied to some of the things we're doing with big data? Governance and those types of issues are really crucial. So how is that starting to factor into your customers' thinking?
Absolutely, you guys are right. Rather than being siloed, Hadoop has become mainstream now.
So security, governance, and the other cross-cutting concerns that have always applied to every enterprise application, but never applied to Hadoop because it was more of a siloed side project people were doing, now apply there too. At the same time, Hadoop itself is becoming less relevant, because when you're connecting with the sources and you want to do stuff in real time, you need Kafka and Spark. If you want to store data, yes, Hadoop is good for storing data, but you know what, you'll probably go to Microsoft Azure or to S3, right? So Spark has cut off Hadoop's head, and S3 and Azure have cut off the legs. There's nothing left except the name, right?
Let's talk about that, because that's a really big statement, and you invested in Spark early. First, I want to get the update on Spark; then I want to come back to the cloud chopping off the legs. So give us the update on Spark, because this is a real dynamic. You don't need Hadoop to run Spark; that's a misconception IBM certainly cleared up this week. So, Spark: what's the update?
Spark Streaming is becoming more and more mainstream now. That is front and center. Over the last year, all the new business we've been getting is about integrating with real-time, or to be honest near-real-time, streaming sources. And Spark, mostly because of Flink and the market push, has paid a lot of attention to streaming, so streaming has become really, really important there. At the same time, you still want access to SQL sources. SQL on Hadoop, you don't even talk about it anymore, because that's a given; you have to have it. Whether it's streaming or machine learning or anything else, SQL access has to be there.
Yeah, and I validated that last night with a tweet.
I threw out a tweet from our quick board on CrowdChat to some targeted people, and they came back: absolutely, unequivocally, SQL's not dying or going away. That's how you get attention: say something's going to be dead and you get 10 responses. But that's a good point. So with respect to Hadoop getting squeezed, if you will: you mentioned the legs being chopped off by some of the big guys like Azure and others. What do you mean by that? Are cloud storage and cloud data stores replacing Hadoop? Take us through that.
Yeah. Hadoop has a compute part and a storage part, and the compute part was taken away by Spark. So the only part of the torso that's left is the YARN piece: YARN as a resource negotiator to manage multiple compute resources. That has remained. As for storage, yes, you can use Hadoop for storage, but there are the public cloud providers and their storage technologies. Microsoft is coming out with this.
And the impact on Hadoop is what I'm trying to piece together. I get Spark; I get that Spark part. What's the other piece of it? Just the cloud guys?
Yeah, the cloud guys are doing the storage, right? You don't need HDFS for storage when the cloud guys are already taking care of it. Microsoft has invested big time in the Microsoft data lake solution they're coming out with, and S3 has been a standard for ages.
So the visual, really: the head's off and the legs are off, and you've got a torso. That's basically what you're saying. And the torso's YARN?
Yes.
Yeah, and that's a really powerful way of describing it. So we have a kind of metadata for resource management and a way of applying that metadata. What about the management or administration interfaces that are going to be necessary for the data officers, the CIOs, who are responsible for knowing where the data is and who controls it?
Is that going to be more associated with Spark, or how is it going to be incorporated into this whole integration challenge?
Yeah, the data management and governance piece, I think, still needs a lot more work, right? Cloudera has Navigator, and other vendors also have their own tools. That's the piece where I think a lot of work is going to be done in the next year.
So the word Hadoop still means something. I want to get you to talk about that. We're going to be at Hadoop Summit, the Hortonworks show, coming up, and we're also here at Strata + Hadoop World, which is the Cloudera show. What's the future for these guys? You've got Cloudera and Hortonworks. Where do they fit in? How do you see them settling in? Cloudera just got its valuation clipped by Fidelity Investments, so they're not at that valuation anymore. And Hortonworks is public; you see their numbers. What's your take on Cloudera versus Hadoop, I mean, Hortonworks?
So they do, I mean, that's... Cloudera, I mean, they're good friends. Cloudera has a huge first-mover advantage, and I think that's going to remain. Hortonworks and MapR do not have that advantage, so it's going to be slightly more challenging for them. MapR is kind of moving away from Hadoop. They're saying, "We are more of a data management platform." I think I heard somebody say, "Compare us to Splunk as opposed to comparing us to Cloudera."
So they're already seeing the positioning shift. They're jumping to an area that's going to be safe for them. And what is that? Basically Hadoop plus something else. And we saw that Hortonworks has the data platform, and they have this new emerging products group, which seems compelling to me. So you'd say they're basically making their moves?
Yeah, I haven't followed Hortonworks for the last few months. But I think the challenge with going too far into open source is: how do you make money? I think Cloudera has been able to maintain a fine balance there. But profitability is still a challenge; I think it's going to be the main challenge for all of these vendors.
Nothing's ever free. That's what Jerry Held was saying, and that's the key. All right, your thoughts on the next step for you guys. What's going on with InfoObjects? Share with us some of the things you're working on in the business.
For us, streaming is what we've been focusing on for the last year, and I think streaming is going to remain a theme for 2016. Then, as Intel comes out with 3D XPoint toward the end of the year, there's the whole non-volatile RAM story. First the hard disk was taken off; once 3D XPoint comes, I think flash will also be taken off, right? So only the memory chip will remain, and that's going to become the primary storage. In fact, I have three commandments for this year: a streaming-first approach, a streaming-only approach, and in-memory storage. Streaming first means that no matter what type of data you have, it's going to be streamed in first. Streaming only means you don't need a Lambda architecture or anything like that; all the data will be streamed. First the data is streamed, and then you figure out what to do with it. And while you're figuring that out, it doesn't have to go to Hadoop or anywhere else; the data will be stored in memory.
Well, there are still some limits to streaming. As you think about what your clients are going to ask you to do with Spark, they're probably going to ask you to do things Spark itself isn't necessarily built for.
How do you think you're going to add value on top of that stack for clients as they try to solve challenges or build applications that may require lower latency, the ability to act on single events, those types of things? What's the role of services as we move this forward? It's got to be more than just implementing Spark, right?
Yeah. The first thing clients say when they come to us is that they want to lower the latency: right now the latency is three minutes, and we want to make it, say, 30 seconds or 10 seconds. And there, yes, Spark is one part of the play. Data is coming from Kafka, and we have to reduce all kinds of disk I/O there, or make it almost zero.
That's why it goes in memory.
Yeah, that's why it goes in memory. But you're right: it's not about technology anymore. It's more about the use case, right? How they can get the highest performance, highest throughput, and lowest latency for their use case. And I think that's going to remain the theme.
Rishi, the final question I have for you is around a concept: new way versus old way. You mentioned a few things in the interview here around the old way of doing auditing and integration, and about streaming. Hadoop is losing relevance in the sense of the overall big picture, but it's finding its spot: integration, machine learning, and streaming. I see key components you're seeing on Spark, and you guys are on the front lines, so I want to get your thoughts. As you talk to your customers and prospects, you have a lot of candid conversations around the big picture. If you were a machine learning algorithm, what pattern would you see? What would you say is the pattern around what the new way is, what they want to do, versus the old way? In other words, what are some of the old things that are going to be retrofitted, thrown away, or replaced?
And what are the new ways they want to do business with big data? Can you share any observations or anecdotes?
Yeah. One is that rather than a technology-oriented approach, whether it's Kafka, Hadoop, or Spark, it's going to be a more solution-oriented approach. This is what InfoObjects is focusing on, too. For example, one of the verticals we're going to focus on big time this year is manufacturing, right? So it's not about what Hadoop or Spark can do; it's about what InfoObjects can do for the manufacturing industry. I think that's going to be the new way: it's not about what the technology can do or what your throughput is going to be. That's all okay, but for a given business problem or a given industry, what have we done so far, and what different things can we do?
And what are some of the conversations around that thread? Is it integration? You mentioned that. Is it the technology? Is it architecture? Where is the customer in the journey of figuring out these new solutions? Is it a new conversation? What are some of the specific conversations?
It always starts with integration, and it's going to end with predictive analytics, predictive maintenance, and those pieces, the machine learning piece. But we are still far from there, because I think for at least the next two to three years it's going to be about integrating with multiple sources. Integration is a big deal when you have hundreds and thousands of sources.
Well, we've been saying on theCUBE that integration is the new barrier to entry for young companies, and the way big companies maintain relevance in their enterprises, because that's now the glue, and there's a lot of work being done there. Great observation. I appreciate it, great stuff. As always, great to see you. Great perspective; it's like having an extra analyst on the set here.
Appreciate it. You guys are doing a lot of great work.
Speaking of streaming, we'll be streaming live at Hadoop Summit in Dublin next month, so keep following theCUBE on Twitter, @theCUBE. This is theCUBE, with more live coverage here in Silicon Valley after this short break.