Okay, we're back here live at the Strata Conference, O'Reilly Media's Strata Conference in Silicon Valley, in Santa Clara, California. This is the epicenter of big data innovation: startups, businesses, growth, a lot of competition, a lot of news happening. We're here on the ground covering it for SiliconANGLE.com and wikibon.org. I'm John Furrier, the founder of SiliconANGLE.com, joined by my co-host. I'm Dave Vellante of wikibon.org, and we're here with Edd Dumbill, who's a longtime CUBE guest, principal analyst at O'Reilly Radar and, of course, one of the co-chairs of the Strata Conference. Edd, welcome back, always good to see you. Thanks, it's great to be here. So we know you're super busy, we'll get right to it. A lot of action. Quickly, what are you seeing? What's the vibe? What are the quick highlights off the top of your head right now, and then we'll do a drill down? Sure. Well, as I said at the beginning of the show, you know, the death of big data, as referenced in that VentureBeat article I'm sure you read, has been greatly exaggerated. And, you know, the fashionable company is wearing a Hadoop distribution this year. Big data's dying, and yet the big money players are actually investing millions into growing the market. That's interesting, right? Absolutely. You know, the reality is that we're just getting started, getting down to business and finding out how to make this stuff sustainable and how to make it really work. So we talked with Roger yesterday at O'Reilly, and we talked about a concept, we just kicked out a term, applied big data, and Greg Sands, who's a VC, just wrote a post about it today. We're at the beginning now: the applications are coming in, the analytics are out there. Mike Olson said a while ago, oh, it's applications, but they really haven't come on yet. So analytics really was the only killer app, and the platforms are being developed. So what are you seeing here at the show?
And talk about some of the keynotes this morning. As you put the program together, how do you decide between all the great things you could showcase? Well, you know, one of the key things for me is transferable knowledge. So we really do give a lot of preference to people who can walk the walk as well as talk the talk. And I think Gail Garten from LinkedIn, in the keynotes this morning, really showed how they use a data-driven approach to their product development. She was very insightful and had a lot of hard information about what she did. So that to me was, in a nutshell, exactly what we're looking for. So Intel brought up this user experience angle when they talked about their distribution on theCUBE. A lot of people tend not to think about that; they think about the speeds and feeds. But what LinkedIn points out, I think, and what you're essentially alluding to, is that there's a huge user experience component that needs to be on the design side of things. Can you comment on your point of view on that, and on how companies are thinking about big data in terms of user experience, whether it's data as code or applications? Yeah, it is absolutely critical. That's one reason we renamed our UX and viz track as design this year, because we think it's not just about the pretty graphic that comes out at the end, but also about the design of how you collect the data, how you iterate and work with the data. And frankly, we're at a very early stage of even understanding that right now. But I had this thing about data really being the lifeblood of a digital nervous system that we're building out, because tech is everywhere now. We're kind of past the phase of instrumenting paper activities, and now we're in a situation where companies are producing 100% digital solutions in their business. And so the data is kind of this blood that goes through; there are different parts of the body, and it runs everywhere.
So every time it touches people, it really matters that you get that right. You have to have a good heart, and don't have a heart attack. Don't have the clogged arteries. No, that's a good analogy. Big data is the blood, it has nutrients, it can feed a value proposition. What we're doing is building a model, frankly. Big data is about building a model of the world. And we start with small things like users, what they're doing on mobile apps and so on, or maybe your customers. But that really is the task we're about: modeling the world with data systems. And the world is about people. Therefore, it's obvious that the people factors should be very important. You talked this morning about some of the hard problems. The last couple of years we've been doing some of the cool stuff, and now we're really starting to attack some of the harder problems. We were at the startup showcase last night, talking to some folks from Berkeley, and they said, we can't find problems hard enough to solve. So of course we threw the speed of light at them, and they explained how they solved that algorithmically and put us in our place. So maybe elaborate a little bit on what you were speaking of this morning, and give us some examples of the problems this industry is trying to solve. Yeah, actually I don't think that the hard problems are necessarily the algorithmically hard problems. I think they're the organizationally hard problems: how you use data in an organization and make decisions around it. But if you want to go down to the technical level and the trends we'll see, great. So you've got a bunch of big data. Where did it come from? What have you done to it since? Who's allowed access to it? Is Dave the same as Dave two or Dave three? All these things that the BI and data warehousing world made some effort towards, they all still apply in big data.
And if you're going to start using it for real, you need to be able to manage it and track it. Those are some of the areas where the data systems need to grow up, and people need to understand how to use those too. I spoke to somebody recently who is building out an architecture for big data at a very large enterprise, and they're saying, if you don't accompany your data with descriptions of what the fields mean, where the data came from, who touched it, it's not getting into HDFS. We're not going to put it in. So those are the hard problems, right? How do you do it so you get the best value beyond the pilot? Being able to automate that in a way that can scale. So, Edd, obviously we're big on design. In fact, yesterday we were talking about the data science piece, about the art and science of data science. There's an artistic and creative side as well as the geeky side. But let's not forget, there are other issues going on with security. Tim just tweeted five minutes ago; Tim O'Reilly expects some black hat data scientists soon. Joe Turian also made a comment on that as well. So there's this notion of, hey, you know, society's changing, bad actors. And Tim said that on theCUBE here. But we also just had Peter Wang on from Continuum Analytics, who brings a physics mindset to machine learning, or I would say the instrumentation. So there's some science involved, high performance computing. So it ranges from the sexiness of design and user experience down to the alpha geeks of math and science, and everything in between. Right, so where are we? Which side is floating up? How do you view the marketplace right now? I mean, you know, you look at the range of what we've got in the conference. That's pretty much my mind, right? That we have to include all these things, that we have to look out for the bad actors. You know, it's funny, people think data crime might be a new thing.
What is spam, right? Spam is data crafted to get through a system and affect us in adverse ways, and spam email is something we all deal with. But imagine a problem like that, a thousand times greater. Evil genius. I just tweeted back to Tim. I said, you know, it brings back the notion of the evil genius. And in order to fight the bad actors, we actually have to think like evil geniuses to combat that. So, you know, it's just evolution. But in that regard, what technology are you seeing here? Besides the competitiveness thing, I mean, obviously the big money players are coming in, and that shows the growth of the market. Big data is not dead. VentureBeat's all wet on that and really should rethink their editorial. But more importantly, there's tech involved, right? So you have real innovation going on, disruptive technology around databases, tools. What are you seeing outside of the competitive thing, around the tech? Well, I think one of the things that is going to be important, that will come out this year, is an agreed data architecture, right? So great, we have this tech, we have fast streaming data coming in through Storm and so on, online analytics. We also have Hadoop and whatnot. Where do these pieces fit together? It's a really important problem. And so it's kind of a technical achievement to produce something that isn't just a horizontal layer. You know, I joke that whoever releases something at this show will be doing everyone a favor, because at the next show everyone else will be able to say, well, I'm 10 times faster than that, right? 400. Yeah, we just had it. Hadapt was just on, just at the show last year for SQL on Hadoop. Now EMC comes out, so again, to your point. Exactly. You know, Impala was faster than Hadoop, now we're faster than Impala. But that's really just a sideshow, and these guys need to be working harder at building up vertical applications that people can put to use.
So some of the technology is architectural in that sense. That said, I think there's great scope for companies who can apply machine learning in a way that doesn't need too many experts, to really automate more of the analysis and the decision-making process. You saw Eric Colson's keynote this morning talking about Stitch Fix and the way they use machine learning and analytics to still involve humans, but make it so they have to do less of the grunt work and can bring a personal touch where the machine has done most of the things. Yeah, he's coming on later. That was like IBM Watson for apparel. You know, that's such a great story, that's not bad. You know, when Tim was on theCUBE, we had a great conversation, and we'd love to talk to him if he's around; we want to have him back on. But one of the things we talked about, I think last year, was that the user shouldn't have to work as hard. And one of the things that was really innovative about Web 1.0 was search engines, Google in particular: they reduced the time it takes to do things. They reduced the steps and made it simple, elegant and fast. And you win. So users shouldn't have to be entering data. So we're moving to a world of adaptiveness. Absolutely. And this is why you need to think about it vertically, right? Why just coming up with a faster MapReduce or something isn't really helping people. Frankly, I don't think it's a great business model. I think you need to look at the problem and work back from it. And if you're in any doubt that the problem, you know, really affects things all the way down, you should listen to Facebook. I was listening to a talk by Jay Parikh, who's their VP of infrastructure. They talk about their photos, right? You think photos go up on Facebook?
Well, it turns out, you know, after a couple of hours, most of them never get looked at again; they just go through the stream and then they get old. So Facebook has got these data centers where everything's replicated seven times on instant, you know, high-power availability for the photos. 80%, 90% of that never gets looked at again; it's a total waste. So they had to architect low-power racks for these, you know, what they call cold photos. They even built a data center just for cold photos, right? And that's all software driven. If you think the software you write doesn't matter all the way down to the buildings you're building, you're wrong. So with this stuff, you need to look up and down, not just at one level. Well, this brings up a theme we brought up yesterday on the show here, which is performance, right? So, you know, because of their scale, they've got to do their own; they're doing Open Compute on the hardware side. So Facebook is really a good indicator, but we're talking about performance. I mean, that's one thing I give Intel props for: they come out and say, hey, you know, we want to increase the performance and security. And so that's their approach. Yeah, that's a great announcement. You know, I think it validates for people exactly what we're doing here. Tell you what it reminds me of. You know, back in the day, you had compilers, and with free software people would use GCC. But before that, it was the chip manufacturers that were making the compilers. And if you see Hadoop as kind of the compiler of the big data space, it's not surprising that Intel would want to come out with that. IBM made a similar play with their POWER series, about how the chip helps, and the fact of the matter is it's got a lot of on-chip cache. So all this kind of in-memory stuff can be made much faster.
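The cold-photo point above, that most photos stop being viewed after a couple of hours and so should not sit on high-availability hardware, can be sketched in a few lines. This is an illustrative sketch only, not Facebook's actual system: the `Photo` record, the `choose_tier` and `partition_by_tier` helpers, and the two-hour threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical threshold: the interview only says "after a couple of hours".
COLD_AFTER = timedelta(hours=2)


@dataclass
class Photo:
    """Hypothetical record: an id plus the time the photo was last viewed."""
    photo_id: str
    last_viewed: datetime


def choose_tier(photo: Photo, now: datetime,
                cold_after: timedelta = COLD_AFTER) -> str:
    """Return "cold" if the photo has not been viewed within cold_after, else "hot"."""
    return "cold" if now - photo.last_viewed >= cold_after else "hot"


def partition_by_tier(photos, now, cold_after=COLD_AFTER):
    """Group photo ids by the storage tier they would be assigned to."""
    tiers = {"hot": [], "cold": []}
    for photo in photos:
        tiers[choose_tier(photo, now, cold_after)].append(photo.photo_id)
    return tiers
```

In a real deployment the "cold" list would presumably feed a migration job that moves those objects to low-power storage; here it only shows how a software-level policy ends up deciding which hardware the data lives on.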
Oh, Dave and I were talking last night after the show, commenting on the growth curve of this marketplace. It feels so explosive, but in any market you're always asking, how much bigger can it really get? So we're really debating where we are on the growth curve. If it's going exponential and going straight up, are we at the bottom of the curve? Because right now it feels like it's going up, but we don't even know where we are on the curve. So I want to ask you, where do you think we are on the curve? I mean, it feels early. You know, we're reporting that it's early. We're saying top of the second inning. But what's your take on that? You know, I totally buy the top of the second inning thing. And that's why I talked about architecture being very important. We don't have an accepted general big data architecture right now. And that's going to be one of the things that will really unlock growth, when people know where to put it. The second thing is when people know a bit more about how to use it in their business and their organizational structure. So I agree, top of the second. One of the things we brought up with Intel, and Dave and I were trying to tease this out and get our arms around it, is that it's not so much about Intel. First of all, it's validation. I agree with that 100%. But it's not Intel competing with other people. What they're doing is, they have interests in the data center. They have an interest in the internet of things. These are all instrumentable devices that have data. So it's not so much that big data is about analytics. It's a huge range of use cases. And you know, if you think about the battles between Hadoop distributions, that's not really what big data is about or where the opportunity is. Frankly, I'm a little bored of the bickering between the Hadoop guys, and I'll say it to their face if they come up to me. It doesn't look good.
And it's not really helping the problem that we're trying to solve. So, great, the more the merrier at this level. I'm bored of the Hadoop battles. Let's look at the next level up. Well, that's why the Intel announcement matters. I mean, you talk about the internet of things; there's a whole new security model that needs to be put in place. Well, on value creation, we were teasing it out today. We had Greenplum on, in particular, well, we'll name names. We said, hey, look, you know, be aggressive, put a stake in the ground. But remember, you're in the data warehousing space. It's a bigger world out there. And the point is, there is a lot of value creation on the table. You don't have to fight for scraps in a growing market. Well, to Edd's point about business value too: ironically, we size the market, and we've still got little vendor revenue. That's fine, but so much more value is being created by the practitioners than is being created by the vendors. You know, we asked Michael Dell about that at Dell World. He said it's always been that way. But in a way, in our industry, it hasn't. I mean, Microsoft's, you know, valuations, Cisco's, et cetera. You don't see those kinds of massive valuations exploding anymore, okay, Apple notwithstanding, but so much more value is being created, you know, within the end user community. So it's interesting. I think there are three places that it's really good to be in this market. The first one is down on the metal side, right? Yeah, right. If you're making metal, making chips, great. You know, the second one is the data side. If you're a data owner, then you can really turn that into business value. The third place where I think there's value, and this is the difficult one to tease out, is around the interaction of data with people. So this is the analytic tools, the groupware tools. I was saying, one of the most important things that we're creating is an agile data platform.
And the best of breed I've seen is what enterprises are building right now. They're building something that enables their staff to get on and do their job without asking permission, without going around a three-month BI cycle to get a new report up, without having to requisition kit from IT. And so that's where the users are really going in and building the agile stuff out. One thing I... You talked about that this morning, starting off with smart data and agile data, which were the two sort of examples that you gave. That's right. For businesses, they want to make smart use of the data, not just reports; they need to understand there's a lot more to big data than reports. And for the IT side of the house, you need to be creating infrastructure that enables your colleagues to be agile. Yeah, one thing we find, obviously we track the IT market pretty aggressively, is that it's evolving from decades of consolidation and cost reduction to an enablement model where growth is on the table. And so there are a lot of transitions going on. We were joking this morning, we were on a tweet chat with IBM and threw out the notion of data as code. And then we just said, hey, what do developers do with unstructured data? Because data quality came up. And we were saying there's no one data mart anymore; you can have a data mart that evolves into multiple data marts, you integrate third-party data marts, so you get data mashups. So that was one concept. The other one was that shadow IT has really enabled some innovation. Even though it's been kind of don't ask, don't tell, you're seeing enterprises saying, hey, let's do the innovation, it's been the R&D, okay, let's adopt that. So we talked about shadow data. So the data equation is different. I want to get your point of view on this, because companies have to store the data, and there are also compliance issues and legal issues. So there seems to be a shadow data market emerging. What's your take on this?
Well, this is really the hard problem I was referring to in my keynote remarks. Finding the data, keeping track of it, making sure people don't do too much duplication. I think, as you say, this kind of iterative cooperation between the IT organization and the rest of the business needs to be in play. If they shut it down, they're going to shut down innovation. They can't do that. But for vendors, there's a huge opportunity in the market to really take advantage of some web technologies. What is the web, right, but a large shadow data network where you don't know where everything is and there are things in multiple places? All that stuff, you can turn inside. You can use search, great. And here's my controversial bet: all the stuff that's been built up with the semantic web technologies over the last 10 years, and people go, ah, that's a failed project. No, it's a 20-year project, and we're at year 10. And a lot of that stuff about cataloging and organizing things is going to be very important. It enables that, you're right. You've seen it with Google now, the knowledge graph; they know it's important. Someone who figures out how that can work in a scalable way in the enterprise is going to be a very happy person. Well, we're trying to do our share on that with the little tools that we're building. But again, Edd, you're a thought leader. Final comment here on the show. Obviously you're doing a great job. You guys run an amazing show, and certainly it's going to have a lot of legs. Final comment: what are you expecting for the rest of the day, the next two days? What are you looking forward to? How much sleep have you had? You know, have you checked email at all? You know, I'm just very happy. These three days make it all worth it, you know, like throwing a party. I'm just impressed by people's energy and their imagination. I love that DataStax put a pub into the expo hall.
And it's just great to see everybody coming together and recognizing this is a place where work can be done. Edd Dumbill, program chair here at Strata. He runs the conference with Alistair Croll. Fourth show, I believe, right? This is the fourth one. You know what, this is our third in Santa Clara. It's our seventh Strata altogether. Seventh Strata. We're looking forward to doing more with you guys, great show. Stay here on siliconangle.com for all the exclusive coverage with O'Reilly Media. This is theCUBE. We'll be right back with our next guest after this short break.