Live from the Fairmont Hotel in San Jose, California, it's theCUBE at Big Data SV 2015. Okay, welcome back everyone. This is our final wrap-up of day three of Big Data SV, which is our event called Big Data Silicon Valley. And this is our companion event to our Big Data NYC, which is our event in New York City. And this is SiliconANGLE Media's event where we have specialized conversations about what's happening in Big Data on both coasts. In New York City, we follow the money, follow the business logic, follow the key winners around going public, Wall Street, capital markets. And in Silicon Valley we cover the innovation, startups, the growth strategies, all the key things that make people successful, that power the engine of innovation. We're so pumped to do that, and we want to thank our sponsors who let us do that and have our own event in conjunction, out in the open, inclusive with everyone: WANdisco, Pivotal, IBM, Hortonworks, Syncsort, H2O, Tresata, InfoObjects, Platfora, Teradata, BMC. We want to thank our sponsors because we bring a lot of great conversations and we have great guests here and we're sharing that with you. And our headline sponsor, EMC. I can't forget the headline sponsor, big logo there, EMC. We're introducing headline sponsors and we want to bring more content. At both New York City and Silicon Valley, we had a great presentation from Jeff Kelly on his new research. We'll be doing more of this, introducing new free research and then bringing a panel of experts to extract the signal from the noise in a panel format in front of a live audience, streamed live to the whole world for free, and we're proud to do it and certainly throw a party for our community after. So Jeff, very exciting week here for Big Data SV. I am personally pumped and excited to associate myself with great content and the people that we've interviewed. 
And again, New York City, same thing for the capital markets, for the public companies and whatnot. So, great event. What's your take? I mean, we're excited. Your new research was groundbreaking. I thought you had an amazing response from the crowd. The live stream is available on YouTube. Share with us your new research. What did you talk about and how did that play out over the course of Big Data SV? Sure, well, we talked about a few things. A couple of the themes that we've been following here during the CUBE broadcast as well were all about the enterprise. Making Hadoop ready for the enterprise was kind of what we'd been hearing about leading up to the show. In this show, we started to kind of expand even beyond that to some extent, with what are we gonna do with all that data? Big data analytics, data science, and really a lot of innovation's gonna come from the internet of things. You know, we covered some of the news that happened this week. Obviously, the big news was the Open Data Platform announcement from Pivotal, Hortonworks, IBM, EMC and others, really with the goal of kind of hardening and solidifying the Hadoop core so we can open up application development on top of that core and accelerate adoption in the enterprise. One of the themes in my presentation was around some of the market dynamics and how we're expecting consolidation in this market. There's so much pent-up demand from the enterprise. And we have a lot of innovation, and if you go to the floor of Strata + Hadoop World, you'll see all these great startups out there and they're innovating great. But what startups do, they invent stuff and they're great at that. What they're not always as good at is kind of opening up their distribution channels and setting up some of the processes necessary to get that out into the enterprise. So we think there's going to be some consolidation. We think some of the big guys are going to actually do quite well in this market. 
They're not necessarily being disrupted by big data. It's an opportunity for them. So that was another theme that we covered this week. I mean, as you mentioned, your sponsors. How about the big three? Cloudera, Hortonworks, MapR, obviously, those are the big three. WANdisco and others are all changing their strategy, or changed their strategy. And doing very well. They're not doing their own distro. That was a few years ago. Now they're winning. Yeah, I don't think you're going to see any new Hadoop distributions anytime soon. In fact, you might see some go away. WANdisco, good call by WANdisco. Yeah, well they were smart. They got out of that business. You know, I think there's room for one or two vendors in that Hadoop distribution business. I don't think there's room for five or six. So I think those were some smart decisions companies like WANdisco made. I mean, at one point we even had companies like Fujitsu that had their own Hadoop distribution. So, you know, that's solidified around those three companies you just mentioned. Of course, Pivotal and IBM also have their own Hadoop distributions. But now, as part of the Open Data Platform, they're not getting rid of those; they're keeping those if customers want them. But it seems to me like they are kind of sprinkling the holy water, if you will, on Hortonworks, the ODP team. Then you've kind of got Cloudera and Intel in another camp. They're well capitalized. Certainly the big three are very well capitalized. You talk about that. Between the three of them, they've raised over $1.6 billion over the last couple of years. So they're well capitalized. Hortonworks has gone public. Cloudera is probably on pace to go public sometime later in the year. They made an announcement around some of their revenues just a couple of days ago, which leads us to believe it's gonna be a little bit further out. You can't comment on revenues if you're about to file. 
So it'll be a little further out. But yeah, they're all very well capitalized. And as we talked about on the show, part of the reason for that is it takes a while to build this market because of the open source nature of it. On the one hand, the innovation happens very quickly in open source. We've seen that in the Hadoop space. It's hard to keep up with all the new projects and all the innovation around Spark and everything else that's coming out. And it's fantastic. The challenge, though, is that building a business around all that can be hard when you've got so much innovation happening. And so you need patience and consequently you need money. So that's one of the reasons we've seen so much money being raised. And I think the other thing is you've got a lot of startups who are, sorry, you've got the big guys who are playing in this market as well. So the startups need cash to fight the marketing battle. They've got to cut through the marketing machines of the big players. So that's a couple of reasons I think you're seeing so much money being raised in this market. Is it a bubble? You could say some of these companies might be a little bit overvalued, but I don't think it's a bubble like we've seen in years past. It's not like the internet bubble, I don't think. But the most exciting thing for me is we're really starting to get to the point where we're talking less about the infrastructure, less about the plumbing and more about the applications. We're getting there. It's a little bit frustrating. We'd like to see more of that happening, but I feel like we're really on the first place. Yeah, but you've got to look at the industry. I mean, you're right on the money. Your research, I thought, was spot on. But if you look at the big three, and this is why it's close to my heart, Cloudera, Hortonworks, and MapR really have shined and blossomed as companies. I'm personally disappointed that Cloudera would not come on theCUBE. 
That's a company that I personally love and have a soft spot in my heart for. I love that company, watching that company grow, and Mike Olson and Amr, all great people that have enabled a lot of people around them to be successful. And the fact that O'Reilly will not let them come on theCUBE, or whatever that rumor is, really disappoints me and I'm bummed about that. Hortonworks, we thank them for their sponsorship. They went public. I mean, I remember criticizing Hortonworks when they announced that they were coming out, because we were such fanboys of Cloudera, and I challenged them and I said, guys, play, do well. And they did, they grew and they went public. And then MapR is sticking to their knitting, doing the boring stuff really well and making a boatload of money. I would even say they're not just doing the boring stuff well. I mean, they're really innovating as well because of the nature of their business. You know, they've been dinged because they're more proprietary than some of the other open source focused big data companies. But the reality is they're building pretty high performance technology in this space, which you can do to some extent if you're not relying on a larger community, because you can just focus on it, build it out and get it out to your customers. So they're doing pretty well as well. I mean, it's gonna be a very challenging market for all three of them going forward. But the other thing is there's just huge opportunity. I mean, the TAM is, you can't even calculate it. And the innovation's there. You know, on our panel, your research really highlighted the trends about the rich get richer and then there might be some depth in startups. But also, there's innovations still on the start. 
We talked to Ping Li and Frank Artale and they highlighted specifically, yeah, the grass is growing a little slow in some cases, in some places you've got some blowout examples, but there's still great innovation. The thing about this market that excites me is, you know, when you see a new market emerge, you get a lot of startups innovating, and then you see some of the big companies come in and maybe make some acquisitions, and their mandate then is to ramp it up and get it out to the enterprise and start making a lot of money on it. And unfortunately in some cases that's where the innovation stops. The great thing about this market is, because of its open source nature, the innovation is gonna continue even as the big players come in. I mean, you know, IBM comes in, HP comes in, whatever the case, whoever the big whale is that comes in and makes some acquisitions, that's not gonna stop Facebook and Netflix and Google and the other web companies from innovating and open sourcing their new technologies, and then there's gonna be new startups to commercialize that. So I see this innovation continuing and that's the exciting part about this market, and there's still so much opportunity because of the application space, where ultimately that's where the value comes from. Yeah, and also one of the things we had through the guests, the common theme was, so our content event with your research and the panel and then the live audience went extremely well, and the party was fantastic. The party was great, it was busy. Kim Stevenson was there, CIO of Intel, CUBE alumni, amazing person, on the board of Cloudera. She's been with us ever since theCUBE was here. We saw CEOs, we saw developers. It was really a great community, some great exchange of ideas. It wasn't a pub crawl kind of format, it was just more of a high-end sharing of ideas. It was really fun, and then our guests were awesome this week. 
I thought the energy was intoxicating in terms of the content was so motivating and inspirational. You had entrepreneurs, you had multiple PhDs on. We had great insight from data scientists and of course we had a ton of crowd chats. I thought I was gonna OD on crowd chats this week but Jeff, great content. Anything stand out for you in the conference from a theme standpoint? What was the core theme that you saw percolate out of the CUBE this year? Like I said, I think the biggest theme that I heard over and over again is that the conversation is increasingly turning to analytics, data science, and then that kind of next step is operationalizing that to the applications. So we heard great examples from Bill Schmarzo, the dean of Big Data came on and talked about some healthcare examples that he's working with. We talked to some practitioners from Simply Hired talking about what they're doing with Big Data. It's not just about storing it, it's about analyzing data and delivering that in terms of data products. Halliburton, kind of a 100 plus year old company in the energy market listening to what they're doing with essentially the industrial internet data flowing off drilling equipment and seismic data. So to me that was the most exciting theme hearing about what people are actually doing to move the needle in terms of driving new revenue, solving some big societal problems and really doing stuff with all this data rather than just talking about the infrastructure. So to me that was the big theme that I really enjoyed talking about this week. So I gotta get the take from the guest perspective. Who do you think was the big sleeper out of our guests this week? The big sleeper, that's a good question. Let me think about that for a minute. Let me turn that back on you while I think it. Who do you think was the big sleeper? I mean, I thought Michelle Chambers was, for me personally, that interview was a good one because RapidMiner's a Boston based firm. 
They're venture backed, they're from Germany, and what was interesting is it's not your hipster, Product Hunt-driven venture-backed company. It's blocking and tackling with a new twist. They're doing some very interesting things, solving their problems using modern techniques of signaling and crowdsourcing. I thought that was phenomenal. That was one of my favorites. And the Google guy, Eric Schmidt. His name is Eric Schmidt, but it's not Eric Schmidt, the billionaire who was the CEO of Google. He's a product manager in the Dataflow cutting-edge area and he validated my data ocean thesis. Google is right on the money. The guy said data lake is batch. Data ocean is real time and streaming. A lot of concurrent changes going on. So data ocean is back on the table, Jeff. So we need to do a whole research report on this. If anyone's got a data ocean, it's Google. But do you buy the thesis? Batch is a lake, okay? Well, ocean I guess you can say because ocean has currents and it's moving. All right, I'm with the only analogy. Ecosystems. Yep, okay, okay, I can go with that. I think. Riding the waves of innovation. That's the ocean. I think if a lake is still and not evolving, I suppose, I think the ocean works very well. Look, I'm just going to say, ocean's the way to go. Ocean's the way to go. The lake is passé. We'll confer at Wikibon, but maybe we'll work that into our next research. EMC's got to change their foundation to the data ocean foundation. Part of it's semantics, but I think that the important thing is that people are talking about operationalizing. We're going to start a meme. Data ocean. Getting into the enterprise. Hashtag. Yeah, I mean, I'm just looking back at some of the guests we've had. I mean, I was impressed with Donna Prlich, who's the VP of product and solutions marketing at Pentaho, and now Pentaho's acquisition is still pending; Hitachi Data Systems is acquiring Pentaho. 
I thought she was really articulate about their approach, kind of blending the data integration, essentially the streamlined data factory, if you will, the data pipeline, having visibility end to end from the ingestion through the analytics. I thought she was a great guest. You know, you reminded me that a big winner for me was early in the day. On the first day was Rob Thomas at IBM with Beth Smith. IBM, I believe, has a great strategy, and Warren Buffett's now buying more shares, so he's believing it, and I don't know if that's just more supporting the CEO or the company, but I see IBM as a winner, and I'll tell you why. They have a huge legacy in large scale systems, obviously mainframe, and a huge database background. You know, Oracle always talks about IBM. That's the only company Oracle constantly talks about, IBM. IBM's background in databases gives them a unique insight, and Rob Thomas and the team at IBM, I think they see the vision and they're on to this whole systems of engagement thing. And no one here in this show, maybe a handful of people that I've talked to, really groks this whole systems of engagement concept out to the edge with the internet of things. Systems of record I see, Jeff, everyone's got a good handle on, data warehouse analytics, but systems of engagement is going to be very cutting edge and IBM is all over this. So I loved that interview, I thought it was really, really diverse and broad and interesting. Yeah, some others, just looking back at the schedule. Sometimes you've got to look back, we did so many interviews, I've got to refresh my memory a little bit. I'm just going down day one. I love that one, I love that one. Shaun Connolly and Leo Spiegel, who's the SVP at Pivotal, he answered the questions. I gave it to him straight about the ODP, and he answered straight up, looked me in the eye, delivered the answer. Shaun Connolly was his normal self and teased out the frameworks, where the lock-in specs are. 
He opened the covers to the secrets of open source right here on theCUBE, that was awesome. Yeah, and I think looking back here, I think certainly Dr. Priyadarshi from Halliburton, data scientist, and Dr. Chang from Simply Hired. We heard some similar themes from them about building data science teams, and it's very much a team sport. They're not really looking at finding that unicorn data scientist, they're looking at finding various data scientists to build a team approach. Those were two great interviews. Bill Schmarzo was my favorite, he actually became an interviewer and interviewed us. Yes, he did. He turned the tables on us. He turned the tables, so we were talking to Jim McHugh at Cisco last night and Bill Schmarzo, we might bring CUBE alumni to be like guest anchors on theCUBE. You could have guest hosts. They'll be like ESPN, like take the pressure off us. And then the last one I'd point out, today actually, was talking to John Cardente from EMC about the climate change project they're working on, the climate change data lake essentially, or data ocean, sorry, that they're creating to help deal with that kind of challenging environment, if you will, no pun intended. So those are some of my favorite guests this week, but overall it was a great show. I don't think we had any bad guests. Well, just to summarize, it's a wrap for our Big Data SV and I want to just thank all of our sponsors, our headline sponsor EMC, Pivotal, IBM, Hortonworks, Syncsort, WANdisco, BMC, Pentaho, Tresata, H2O.ai, Teradata, Informatica, InfoObjects and Platfora. Thanks for your support. We are going to bring you more events like this, Big Data SV in Silicon Valley and in New York City. We're off to Las Vegas for IBM InterConnect, and go check out interconnectgo.com. That's a website powered by the CrowdChat platform, and of course now CrowdChat's available for everybody. Go get some innovative CrowdChats this week. 
We are going to create the conversations, join the conversations and bring you what's happening in the moment at the events on the front lines. This is theCUBE, I want to thank the team, everyone here. I want to give a shout out to the team too. They did a great job this week and these guys got to get up early tomorrow and they're heading to Vegas, so great job guys. Appreciate all your work. That's a great team, guys in the ranch. Dave Vellante couldn't make it because it snowed in Boston and holding down the fort down there with Stu Miniman, Jeff Frick, Greg Stewart, the whole team. Guys, thanks for watching Arianna with the CrowdChats. Thanks so much guys, appreciate it. We'll be back off to Las Vegas. Stay tuned and watch theCUBE, SiliconANGLE.tv. Thanks for watching and see you at our next event next week.