From San Jose, in the heart of Silicon Valley, it's theCUBE, covering Big Data SV 2016. Now, your hosts, John Furrier and Jeff Frick.
Okay, welcome back, everyone. We are here live in Silicon Valley, in San Jose, at the Fairmont Hotel, right across the street from the convention center where Strata Hadoop is happening. It's part of our Big Data Week and Big Data SV. I'm John Furrier. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm joined by my co-host Jeff Frick, and our next guest is CUBE alum Ryan Peterson, who is a hot property on the market: CTO of a hot startup called Data Republic, formerly with EMC, and you've been on before. You were at EMC as EMC and Dell merged, or Dell bought EMC and took it private. All that stuff's happening. So even the front row of big data is now doing startups. So give us the update. What is Data Republic? Why did you leave EMC? What enticed you to jump ship into a growing, A-round funded startup?
Well, before we get into that, let me just stop and say something. You guys have been fantastic. Through the entire process of transition, you've kept up with me. You guys are connected to everyone, and it's been really, really fantastic, and I appreciate everything you guys do. But let me tell you about Data Republic. At EMC, we were talking about data being the most important commodity of the future, and we thought about how you value data and how you use that data for more value. Certainly everyone's been talking about the data lake and bringing everything into one place, but when we talk about everything, everything really is just your data. It's everything you care about inside of your company, and the reality is that's a small slice: what a person is actually doing with you.
You don't know what they're doing with everyone else, and that's what we set out to change. If we can get everybody else's data into your environment, where you can start looking at the holistic view of a person, now you've got real power in the process.
So were you guys waffling between Data Republic and Data Empire? I mean, bring the whole Star Wars theme together. Republic, Empire, what's going on here? Is it a Republic or an Empire?
The Republic's all about breaking down the Empire, the imperial control, and letting businesses get back their data. Frankly, we think a lot of data is starting to show up in a couple of really big key companies. We'll keep them nameless, but everyone knows who they are. If you're a business, you end up working with one or two of these guys to get your own information back. Well, I thought that was silly. If you're able to get your information, bring it into your environment, and analyze it the way you want, you shouldn't have to give your information to someone and then pay for it.
Well, it's always interesting, because I think some of these big companies, which we won't name but which start with G and M and others like that, figured out early on the value exchange: a service in exchange for that data. And the value of that data obviously exceeded what it cost them to provide the service. I don't think a lot of people really understood that value. Maybe the value of one person's data isn't that much; it's really the value in aggregate. So how are you wrestling that back? I'm sure a lot of people are very interested, one, in getting it back, but two, in really starting to get compensated, or getting value that's worth what they're actually providing.
So we decided to go closer to the source. Instead of looking at somebody who's observing our behavior, we're looking at what actually happened.
The fact that you searched for an iPhone on the internet doesn't mean that you're necessarily going to buy one. But the fact that you bought one means that you bought one. So I'm going to the actual banking data. I'm getting the transactional information and finding out exactly what people are doing, getting that kind of content. We're going to the merchandisers and saying, hey, I know what you bought because you bought it. And I'm figuring out how to do that in a very secure, private, and regulated way, and we believe we've solved that biggest challenge. We spent two years building the company to the point where it's all built on a legal framework. We just launched out of stealth about a month ago, and we're already seeing profitability from it. It's amazing.
So talk about the big trends that you guys are riding. And a specific question I want to ask you, since you're here and such an expert on the market: what's the big story here at Strata? We're out hunting, we're fishing for stories. Obviously, on theCUBE, we're extracting the signal from the noise, and there's still a lot of noise in this market. A lot of people are groping for success. We know a lot of companies are going sideways, yet some are growing exponentially. So there are going to be the haves and the have-nots. And then you get the big guys coming in; the rich are getting richer, as Dave Vellante always says, with this kind of softening of the market. Yet there's underlying technology change. So with that, what's your view on the top story here at Strata Hadoop?
It may be a contentious point, but I actually feel like there are too many people doing the me-too. There's going to be value in that; the me-toos are going to get to the point where they can be acquired. I think there's a lot of consolidation ahead, with the bigger organizations starting to swallow up the guys that have figured out something interesting.
And we'll start seeing this show become just about the big guys over time, as well as, of course, the open source community, which will never go away. What I think is interesting is that I'm not seeing a lot of unique, differentiated companies come out of the marketplace. I'm not seeing a lot on the floor of, hey, here's something we're doing that no one else has done.
So you think the rich are getting richer?
Yeah, the rich are getting richer. And I think the startups are spinning out, and ultimately they can spin back in.
You've got Intel, you've got Oracle, you've got Cisco here, you've got the big whales, and you've got Cloudera, not yet public, and Hortonworks. They're all here. So who's going to break out? Or what category, I should say; you don't have to name companies. Categorically, if you're going to implement this data republic, you hope people don't go away, but you want certain actors to be in the market. Who are they?
Certainly, I think you've seen the majority of the fight be between Cloudera and Hortonworks in the marketplace, with MapR closely following behind. I think that's going to shift a lot. There have obviously been a lot of ups and downs in the Hortonworks stock. There have been a lot of big announcements by Cloudera, some good and some bad. I think MapR has actually made quite a push recently, from my view at least. I think those three are going to continue to push forward. With Pivotal, it looks like they've changed a lot of their methodology and direction. At the same time, I think what they've done with ODPI is incredible. So we're going to see those four players, and of course IBM just doesn't go away.
So Data Republic really implies data working together: data fusion, whatever word you want to use, essentially an overlay, mix and match, alchemy, whatever the term is. I want to get your thoughts on a topic, since you're so close to the action.
You're really in first-party mode, right on the front lines. There's a theme we've been kicking around in the journey of big data, which you're continuing on, and now it's a little bit hotter in your kitchen because you're at a startup, though there was a lot of pressure at a big company like EMC too. There are three areas I want to get your thoughts on: one, completeness; two, integration into production, you know, innovation in production; and three, hybrid cloud. Talk about what those mean to you in the scope of this industry: are we complete, what needs to be completed, where's integration, which is now coming up as a core theme, and ultimately the role of hybrid cloud.
Going back to what I was saying earlier, starting with completeness: we believe the majority of data is what you actually have about your customer, plus survey data that you buy from someone else, and we think that survey data is pretty thin. So you get deep information on what you know about your customer, and thin information about everything else. We think filling in those gaps is going to come from things like banking transactions, merchandise information, all the other collective data about a particular individual or a particular company, whatever it might be, and that's going to complete the actual data set.
So on the progress bar of completeness, are we halfway there? Not complete?
We're certainly incomplete, if you think about it. I think the most successful organizations, the ones that have built a data lake and have all of their data in there, are still only 10% of the way there. They only have a very small slice of the data they need to truly be successful. That's why I really wanted to dig into data sharing and data exchange, and how you get information from other companies. So we're incomplete, on a road to completion.
Okay, now integration.
This seems to be the new barrier to entry, a hurdle some startups can't get over, because enterprises are saying, okay, we've done some data lake stuff, we've done some Hadoop, but I've got to integrate with a lot of other operational systems. Your thoughts on that?
Yeah, I think part of the problem is that there are so many different tools to choose from, and there hasn't been a lot of consolidation yet. That will get better as we start to see tools combine with each other, and as the bigger organizations bring a higher class of product than the smaller guys can actually create. I think governance has been a real problem for data. We're seeing great companies emerge, like Waterline and Zaloni, trying to solve these challenges of governance and cataloging. Part of the problem has been that the data lake is just another place to put the data. We've extracted it from the data warehouse and put it in a data lake. Now what is it? The same thing I had before: a bunch of flat files that I have to figure out how to work with.
You've got to use those. It's got to be usable, right?
It's got to be usable, it's got to be searchable, it's got to be findable. And I think the next step is really to integrate truly: you've got to get to the point where you can find all that data, search through it, and also find data sources outside.
Okay, hybrid cloud. Just the compute engine role of it: is it relevant, irrelevant, a non-factor, or just a compute or deployment factor?
Yeah, I think the companies that push hybrid cloud think it's very relevant and very important, and to some extent there's truth to that. There are certain things you don't want to see outside of your organization, your physical organization, and there are some cases where you want stuff in the cloud. People are saying, hey, let's try to keep both, because the people who are losing on either side want to say it's better to work together, right?
So hybrid cloud, I think, is a word that says, hey, let's try to keep our data where people care. Frankly, it's a bit of a challenge, and it's something I was working on quite heavily at EMC: how do we make it so that you can analyze data that is geographically disparate? When you have data in a cloud and data that's local, and you know that locality is really important to being able to search through it and run through it, how do you make that happen?
Ryan, I want to follow up on your point. You said most of the data that you want is stuff you already have, and the stuff that you don't have is thin. The counter would be that there's so much of it. Does the sheer volume make up for the thinness? Or is your hypothesis that it's so thin that, even though there's so much of it and it's growing, this ancillary data really isn't as relevant, so, if I understand you, you're horse trading for the richer transactional data that other people have?
Yeah, I think the challenge is that people are going through and looking at their data, and the CEO's asking a question, and CEOs have been trained not to ask a question they don't think can be answered. That's a real problem, I think. The reality is that they're starting to ask harder questions, or, to quote Cloudera's Mike Olson, bigger questions. I think those bigger, harder questions are things you have to be able to answer with other people's data. Say you have an airline that asks, how much am I making compared to everyone else? How much of the spend do I have? Until you go to other airlines to figure out how much spend they have, you really don't know. You can't just go ask your competitor how much money they're making, but you can certainly find out, by looking at credit card transactions, what the percentage of utilization is at a particular place.
Getting that information answers that question more specifically and more accurately than anything that's been available before. Those are the kinds of harder questions we're trying to get to, and we think we've found the answer for being able to ask for that data quickly, easily, and efficiently. Today people do this; it happens all the time. Two partners that work together exchange some information, and sometimes they do it completely out of compliance. We're trying to solve all those problems.
And is it peer to peer? Is it within a group? Is it buying into a network? What's the business model, or the transactional model, in terms of the actual sharing of the data?
Our business is to take it from being peer to peer, which it is now. Imagine somebody working on a legal contract where it could take a year to finally work out one deal with one partner. We make it a star topology: everyone comes into the central environment, they all agree to standard terms and conditions, and that's what makes the data available for everyone to use.
One of the things we've been talking about on theCUBE: the great thing about theCUBE is that we go to all different events and talk about different topics, and we were just at Oracle Cloud World in DC, and it was interesting. We were geeking out with some product managers about how the infrastructure has certainly changed, and we always talk about that, but specifically the DMZ. The DMZ, the demilitarized zone, was the area in IT where you would have extranets and do some of these exchange concepts, and we saw B2B exchanges and all kinds of web services technologies enable that. But with security now kind of perimeter-less, there's no more perimeter, so there's no more DMZ; it's been gone for years now. So you guys are doing something interesting with an exchange of data, which is cool.
How does all this fit in? Is it an enabler for you? Is it a technical challenge? Is it a condition that allows you to get a position in there? Because the notion of a DMZ was to create a safe zone.
Yeah, so we are the DMZ, to some extent.
You're the new DMZ.
We're the new DMZ.
For data.
Yeah, we're the place where data comes in and data flows out, and we figure out how to handle all of the process in the middle, whether it's the financial transaction, which is probably the lowest-level component: how much are you going to pay me for the data? And, as an aside, imagine a future where people start seeing the biggest number on their P&L, the biggest number on their balance sheet, be data, which I believe is going to be the future for us. But in any case.
So it's a capital item, not oil; it's going to be an actual asset. They've been making money selling data for a long time, right? They were kind of ahead of the curve, and they didn't have a lot of competition.
Data Republic's all about bringing that back to the users, to the people who own that data, so they can actually make money off of their own content, as opposed to companies like that buying data and reselling it for a profit. They can sell it for their own profit.
All right, so we've heard these stories in the past. They've been grandiose, a lot of them have flamed out, and a lot of it's timing. So what makes you think the timing is good now for this not to flame out? Because it's very democratizing: oh, we're going to democratize data. You can smoke the peace pipe all day long, but what's actually going to happen with that? Do you think the time is now, and why?
Well, I think the first problem we had to solve was the data lake: getting all the content into a single place, cataloged, where you know what the data looks like. The next challenge becomes, how do you value the data?
How do you know what it's worth? Until there's an open marketplace for data exchange, where people can start to figure out exactly what the financial value is through the economics of supply and demand, you won't know how much data is worth. And so we figured that if we don't start doing that process in a shared way, where we can manage and look at that kind of information, we won't ever be able to tell somebody how much their data is really worth.
So is your goal to help people quantify what their data's worth?
Yeah, I think that's one of the goals.
For customers or for end users?
We believe the biggest goal is to get more data into their environment so they can learn more. The ancillary goal is, of course, that they'll make money off of it and put it on their P&L. But long term, we want to be able to tell people how much their companies might be worth. Look, here's a great example: Caesars Entertainment. Bankruptcy, big problems, sold their data set, a billion dollars. Had they known their data was worth over a billion dollars, would they even have had to go into bankruptcy in the first place? Those are the kinds of issues we want to surface early, before they become a problem, right? Where can I find a billion dollars to get the company going? Well, I've got a billion dollars of data that I can go sell right now.
So, Ryan, I want to get your perspective. You were at a big company for a long time, and now you're at a little company. We see it over and over at all the shows we go to: the big guys are carrying the freight, but it's the little guys where a lot of the cutting-edge innovation really happens. From where you sit now, and you said a little earlier that you're not seeing a lot of original things here at this particular show, right?
What would you tell people who are out there trying to find opportunities to come at this problem, add to the problem, or put a slightly different twist on the solution from an entrepreneurial point of view? Are there still a lot of great opportunities out there? How should they look at the problem? And where do you see things that obviously aren't directly competitive with what you guys are trying to do, but are still greenfield opportunities, even though we've been coming to this show since 2009?
Yeah, I think you have to... it's hard not to quote Steve Jobs, but I think he's got that great quote about how you have to build what people need, not what they ask for. I think part of the challenge is that a lot of companies just look at what people are asking for, at the market analysis, and they tend to take a superficial view of things. You really have to look at what could completely change the world. I think I've told you guys in the past, I'm all about changing the world, doing something and changing it for good: how do we make it better, faster, easier? And I think there are so many opportunities out there, and we're seeing some really great innovation. I just don't think we're seeing as much in big data as we're seeing in the rest of the technology world. Apps are coming out every single day with something new. I just saw an app where you can rent a car by simply walking off the plane: somebody has parked their car over there, and you just go. Those are the kinds of things that I think are going to change things. It changes the way the world operates.
Well, Ryan, great to see you, great to have you on theCUBE again, and thanks for coming in and sharing your insight on our CUBE Insights segment. Ryan Peterson, now an entrepreneur, CTO of Data Republic. Congratulations, good luck with your success, and we'll be following you guys. This is theCUBE, extracting the signal from the noise at Big Data Week, Big Data SV, and Strata Hadoop. We'll be right back with more after this short break.