Okay, we're back here live inside theCUBE. We are live in Boston, Massachusetts. This is SiliconANGLE and Wikibon's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. We're here at the HP Vertica end user conference. The hashtag is HP Big Data 2013. This is HP Vertica telling their big data stories. It's not a vendor promoting themselves with products and wares. It's really, you know, they're doing it through their customers. This is a customer event, a very intimate conversation with their top customers. A lot of heavyweights here doing some really good work, getting into the trenches. And we're here to extract the signal from the noise, provide commentary, and share with you what we've learned. And there are some stories we can't share with you because we're under a strict kind of NDA not to talk about them until they get through their quiet period, and we'll respect that. But when Suzanne's done, we'll be going crazy on that. SiliconANGLE.com and Wikibon.org. I'm John Furrier, the founder of SiliconANGLE. I'm joined by my co-host. I'm Dave Vellante of Wikibon.org. Curt Monash is here. Curt and I just came off a panel. Curt Monash is a longtime industry analyst, blogger, opinion maker. Check out his blog, dbms2.com, and other resources; just Google Curt Monash, with a C. Curt, welcome to theCUBE. It's great to see you. We get the absolute best guests. Yeah, you do need a headset. So I'm just going to keep talking while John puts the headset on. So Curt and I were just on the panel with Vinnie Mirchandani. It was hosted by Chris Selland, and we were covering big data, you know, the term big data: is it BS or is it real? We're going to talk with you about privacy issues. We're going to get your opinions on the evolution of the database business and BI. But again, welcome to theCUBE. Yeah. Thanks for having me. So you're well known in the industry, and I like to refer to you as a man who likes to cut through the BS and go right to the meat of the story. 
You've got a great, respected reputation. You're not afraid to say what's on your mind, and obviously you're not influenced, and that's well known. And you get a great readership, certainly in this space, so congratulations on that. So I've got to ask you, since the panel had some controversy. I mean, what do you think about all these end users here? I mean, obviously HP is trying to speak through their customers. There's not a lot of grandstanding from HP, pushing the product. They're trying to work with their customers. What's your take here at HP? I'm not totally sure what you're asking. I didn't attend the sessions. What do you think about the topic of big data relative to Vertica and their opportunity, their openness, the ability to create this software enablement on top of it, and the prospects of, say, HAVEn, for instance? Okay, so I hate the term big data, because one of Monash's laws of commercial semantics is that bad jargon drives out good. And since big data is hot, everyone uses the term for whatever they could possibly mean by it. I'd like to have different terms for each of the famous V's, or maybe combine volume and velocity and have variety and variability be something else. So that said, there are a lot of things that are called big data that have a lot of reality to them. Certainly the size of the databases is large and getting larger. And if we need to scale out today, we will always need to scale out, because on the machine-generated data side, the sensors that generate more data get cheap just as quickly as the systems that could hold more data on a single node. I think the dynamic schema thing is really important. For decades, we've had the so-called Ted Codd guarantee in the relational system, which is that you can do whatever app you want against the same relational database. And so the apps have been loosely coupled to the database, but the data structures of the different apps have been tightly coupled by the DBA. 
And that's not always a good system if you want to develop something quickly. The other way around is that for one app you have the data and the app tightly coupled, and you pay your technical debt later on. And there are just so many reasons for schemas to change. You do M&A, you're acquiring new things. You get new apps, you're doing new things. You try a new marketing campaign, you're doing new things. You put in a few new sensors, or you replace sensors with a new model that generates some more data. You're doing new things. You're doing analytics where you derive data along the way and put it back in the database. And then you discover what is useful. You're doing new things. So trying to stay in a tight schema can be quite painful. So help me understand what you just said about that being tightly coupled to the database. So take, for example, Workday. My understanding is that the app is tightly coupled to its database. First of all, is that correct? And are you suggesting that there are trade-offs down the road? Workday has, as a database architecture, something really completely different from what most people would recommend for an in-house enterprise app. I mean, they just take a bunch of Java objects and serialize them to basically a key-value store, pretty much, although they use MySQL to do that for historical reasons. And then they also have a few things like payroll details that they throw into huge tables in the traditional way. So talk a little bit about privacy. This is something that is a hot button of yours right now. You've been spending a lot of time thinking about it, and you're not the type of person who just throws out ideas without having thought about them for a while. So you've been ruminating on privacy for a while. What's your current thinking? I want to talk about PRISM a little bit as well and get your thoughts there. I mean, I think it's hugely important. And, you know, governments have guns; they could kill everybody. 
We have rules in place that normally dissuade them from killing everybody. But, you know, we have to have those rules or we would live in a tyranny. I think surveillance is almost on that level of seriousness. You take a look at what's being tracked. You know, every transaction is tracked by the credit cards. Our physical location is tracked by the license plate cameras and our mobile devices and the credit cards. Our communications are tracked. That's been the big revelation of PRISM over the past couple of months. You know, I call them the Snowden revelations, because PRISM is just one particular program. And the revelation is that, yes, that's real. And, you know, the implications are extremely dangerous. Obviously, you can have any kind of government doing whatever it wants to the citizen once it has the information. You could have discrimination by employers and insurers and credit granters and so on. And our rules as a society, whether in the US or in other countries, are not yet adequate for controlling that. And the attempts to control it aren't going that well, because controlling information flow alone won't get the job done, because you have the anti-terrorism argument that we need everything in the hands of the terrorism fighters. So the government will have all the information, and therefore we need more rules as to how the information is used. And we need to prevent actual abuses, and we need to prevent them so well that we don't have a chilling effect on the exercise of ordinary freedom. You shouldn't be afraid to go rock climbing for fear that it's going to make you look like a risk taker and keep you from getting hired by a conservative organization 10 years down the road, or pay only cash because you don't want to let a credit card record show that you do recreational rock climbing. That would be way too far. And as an industry, we have to show leadership in educating the lawmakers and regulators, because this is tough technology. 
And it's so early too. I mean, one of the things that I commented on about that whole privacy thing was that obviously it's still early post-9/11. Look at what the whole purpose of all the surveillance was, and it's always been going on. The question is how early we are in the innovation of the data analysis and the kind of new algorithms that are being put in place with machine learning or other computer science approaches that are relatively new. So you hear about that in the big data space. To me, the question always is, yeah, we don't want lobbyists controlling the government, which they do now. That's even worse, to me, than what we're seeing in the privacy debate. Those are the guys who are actually going to be regulating privacy, with, say, PRISM and the effects of the Snowden leaks, or whistleblowing, or traitor that he is, as some people say; he's been called all three. The question is, how do you balance the innovation that needs to happen here in the U.S., not just for federal but for commercial? I mean, Dave brought this up on the panel about the chief data officer. Great theory, a new CIO-like title, but is that going to be e-discovery? Is that going to be more restrictions? And I just think it's too early, Dave and Curt, and I want to get your comments on that. When do you say, okay, something has risen to the point of maturity to be looked at, that it's beyond innovation? Putting in the new laws and new procedures is a very difficult problem. And it's going to take years. So we'd better start right away, or continue right away. Right now, the best political action for avoiding tyranny through surveillance is just inhibiting the exchange and storage of information. That's a very crude tool. It has downsides. I mean, HIPAA from the 1990s has tremendous downsides because it interfered with health research, and we could interfere with business or other things. So we need to get started as fast as possible on doing it right. 
But if what you were implying was that the analytics are much cruder than they will be at some point in the future, I absolutely agree with you. I mean, these uses of graphs are just people looking at the two-hop or three-hop graphs or whatever and guessing what patterns might be relevant; there's really no sophisticated analytics there. Statistics isn't that far past linear regression, except for a few machine learning things. And I think there's a new generation of startups which are trying to change that in a profound way. We know the classic use of SAS or SPSS or even R is really not that interesting. It's a few basic algorithms and a lot of variants on them. It's very stultified procedures for getting the work done. So yeah, the analysis is going to get a lot more capable, and therefore the privacy threat is a lot more deadly. So have you been on this privacy kick for a while? Because you mentioned credit card information, and I really haven't thought about it that deeply, but I feel like if the government is tracking the metadata around who I call, I'm less concerned about that than I am that Equifax is selling my information. So am I being naive about that? Have you always felt like this has been a problem, or is it because of the massive amounts of data that are now hitting us that this is going to become one? One of the least prescient things I ever wrote was in November 2008: oh good, Obama is president, he's going to fix this problem that has been emerging. So that didn't work out very well from the surveillance standpoint. I'm a good Democrat, and in other ways I am pleased with what happened in the elections, but the surveillance side has been a disaster as far as... Is it what he knows now, that we don't know, that he's privy to? That is where I completely concede that the government needs all the information it can get for narrow use cases. A, they probably do, and B, even if they don't, we're never going to win that battle. So who cares? 
Let us just assume that for narrow use cases the government has access to everything, and that there is something of a trade-off. If they were doing a better job of snooping on our residents, maybe the Tsarnaev brothers would have been stopped and we wouldn't have had 9,000 paramilitary cops running around shutting down Boston. There are trade-offs. The TSA lines are ridiculous, and if they did a better job of snooping on people, maybe we wouldn't suffer so much at the airport. There are legitimate reasons for the government to defend us against the worst threats. We just have to very narrowly constrain the use of that information. I don't even want it used in a murder trial. Certainly I don't want to give it to the DEA; that's already going too far. Yeah, it's a trade-off. I want to change gears a little bit and go into the world of Hadoop. We had George Kadifa on, and we talked about why HP doesn't have their own distribution of Hadoop. And he was clear: hey, we support open standards, we do not want to see fragmentation. I then went on my little diatribe about how Linux was forced by Solaris, HP-UX, and IBM's AIX, the main Unix software, to kind of consolidate. But Hadoop didn't have that same threat of some looming monster incumbent. But now fragmentation seems to rear its head, and in some people's minds fragmentation's the enemy, not unification. So will Hadoop unify? Obviously you've been tracking it; you wrote a blog post about Hortonworks and the change in leadership and some other updates. What's your take on the Hadoop ecosystem? What's your view on how this could track, what are the upsides and downsides, and what do you think might happen? Well, I mean, remember that Cloudera still has a huge share of the commercial market. You know, Hortonworks is number two, but it's largely through the help of Teradata and Microsoft. And Microsoft is the company that way back in the 1970s convinced the world that proprietary, paid-for software was a good thing. 
And that we shouldn't assume all software was open source. So to have the rah-rah, everything-should-be-open-source company being pushed by Microsoft is a little odd. But Hortonworks is clearly doing good work. IBM has this loyal, wears-blue-underwear customer base that buys whatever IBM puts out, but there's really no good reason for IBM to have its own distro other than that they feel like it. And they can have their own anything. And what's Intel's motivation? Is it just that they don't want to pick a pony because it's too early, or do they really want to have their own distro to embed into some sort of intelligent edge, or? I mean, everybody wants, I mean, Intel may just be trying, well, okay, so first of all, there were a lot of rumors that Intel was going to buy Hortonworks and so on, or offered $700 million, and was trying to buy other things. And I found them pretty credible. Those were pretty credible, widespread rumors. So maybe they're just looking to diversify. But secondly, Hadoop is very immature in many ways. So the decision to accept the latest and greatest and less mature technology is a bit individualistic. So for example, in June of last year, Cloudera shipped elements of what is called Hadoop 2. Hortonworks still hasn't, because they're waiting for the whole thing to be baked. But those elements that Cloudera shipped, a lot of them developed by their own engineers, were good things, like a major speedup to HDFS, the Hadoop Distributed File System, and this big NameNode failover that everybody cares about. They were shipping that then, even though the true essence of Hadoop 2, the new execution engines, still isn't ready for prime time. And that was, I think, the right choice, as I blogged this week, and I think everyone should agree it was a responsible choice, whether or not you think it was right. So I think that's what's driving it. 
You know, there are certain features that should be in the main trunk, and certain companies get impatient and say, we'll take responsibility for getting ahead of the generally released stuff. Will that cause fragmentation? People won't wait, they get impatient, kind of jump the starting line a little bit and try to force it into their stacks. Is that a good thing, a bad thing, indifferent? I mean, I don't think it's a big problem. A lot of that, you know, the fragmentation's on the management side, in some sense of management, or the performance side, rather than in the actual programming APIs. You know, as of a year ago, the only way to really talk to Hadoop was through MapReduce 1, or 14 months ago, let's say. And that's that. So fragmentation below that doesn't strike me as a huge problem. Now, would I suggest somebody hook up with MapR because they think MapR performs better this year? Probably not. Because, yeah. Cloudera's a safer bet. And is Map, what are you hearing about MapR? What's the latest on the? I'm not hearing much. Yeah, you've written that. You've written that there's not a lot of action going on. I mean, I certainly know of more than one company that felt MapR had enough users that they needed to support MapR. But that seems to have come in large part from the early burst when EMC was pushing MapR, which they no longer are. Yeah, and they've got Pivotal, which, what is that all about? I mean, Greenplum pivoting, again, seems to be kind of trying to find its home, its centroid, if you will, in the world. Yeah, they've thrown a lot of resources at it. They say a lot of things in the sales process that may or may not be entirely accurate. Perish forbid that salesmen should do that. And I think they've been on the call, which was pretty dramatic along those lines. But, you know, there is- Like the number of contributors to Hadoop. 
Yeah, but, you know, there is a problem, which is that EMC would like to sell you expensive storage, and I'm not really sure why storage should be expensive. OpenStack and Swift and all that are other alternatives, I mean, and I'm not sure there's enough value in there to pay for the expensive storage. In Pivotal or OpenStack? In EMC's core business. Oh, EMC, yeah, exactly. What you're taking- And basically, they're saying, EMC is strategic to you, therefore you should buy our database software and our Hadoop distro. And why should you? I mean, I think, you know, Oracle is more strategic to them. I think having the best Hadoop APIs and support is more strategic to them. Well, I think if anything, EMC is strategic to the customer. I meant to the customer. No, I'm saying- I'm saying, if they get into a strategic war, say, pick one to follow, Oracle or EMC, they're more locked into Oracle. Well, but the better example might be Oracle or VMware, because it's not EMC's storage container that is strategic to the customer; VMware probably has a stronger case than EMC's spinning rust. And VMware is awesomeness for legacy applications. Yeah, I think every enterprise will, you know, do most of its computing on a small number of clusters. I think one of them will be a VMware cluster. But that doesn't mean you need to do your analytic database management on VMware. No, I'm just arguing- That's not what it was built for. From a Pivotal operating leverage standpoint, they'd probably get more from the VMware juice than they would from EMC's container. Nonetheless, they have a bunch of EMC salesmen around. That's true. That's true. Saying that, you know, a large fraction of your data, counted by byte, will be in HDFS, and you should store data on EMC, and that's a synergy. But- Basically, we're just saying it's a weak story, and we're phrasing it differently, but we're agreeing that it's a weak story. 
I guess so, I think, although I like EMC because they've reinvented themselves many, many times, and I actually think Joe Tucci realizes that, you know, the end is near, whatever near is, you know, mid-term to long-term. Well, they've got to ride that horse into the sunset, but storage is not going down. We were at EMC World. Symmetrix was supposed to be dying, you know, 10 quarters ago, and it's still growing, you know. So it's like, there's going to be a need for massive storage. The question is how much and where it sits, relative to commodity hardware and software. I mean, a lot of the adoption of Hadoop and HDFS has been as a big, big bucket, which now is called the data lake or reservoir, another wet metaphor. I hate that term, data lake, I hate it. It's an ocean. It's an ocean. It's a freaking ocean of data, you know. And the whole point is that it's a cheap way to do it. Yeah, yeah. So that's kind of antithetical to EMC. So one thing we've been hearing here, and we've been hearing it since day one of being here, is from a lot of the guys, and we met with the guy from Looker, a startup that was founded by an architect who used to be at Netscape. And he didn't have to do any more startups. He's about my age, in his late 40s, early 50s. And he says, hey, you know what? I want to do BI software, because I can now build software without the constraints of the BS that was involved because of the slowness of access to the data. So with Vertica, which he'll build on top of, he doesn't have to deal with that anymore. He's writing specific software and taking out those old software layers. So that kind of brings up a mindset question. What I want to ask you about is the mindset of these new developers and entrepreneurs. You talk to a lot of startups, you get briefed a lot, you see a different mindset in these new software developers. What are you seeing, and is that similar? 
Do you hear similar stories about guys coding differently, stripping away some of the older models of business intelligence? You've got Platfora, which just got massive funding, and Looker's going to announce a round of funding. So what are you hearing from software developers, entrepreneurs, folks writing new software, not retrofitting legacy? Is there a mindset shift? Are there certain things they look for in vendors and technology? Well, it's no one thing. I mean, they're addressing a lot of new needs and taking a lot of new opportunities, and if a company claims to be taking every opportunity at once, I am very, very, very skeptical. So there are many ways to reinvent BI that are worthy. There is: do a Tableau or QlikView style interface, but do it on different scales of database, which is basically the Platfora story. Do it with somewhat more friendliness to data and messy schemas. ClearStory's in stealth mode, but that's very much the ClearStory story. No one's seen that, ClearStory. Have you seen any of their code or product yet? Oh yeah, you have? Yeah, and in there, I mean, that is focused, yes. I mean, that is focused on data marketplaces, third-party data, stuff like that, and therefore focused on the problems that that orientation causes. But yeah, that's focused, because combining data from multiple data sources you don't entirely control can give you heartburn. Yeah, and you mentioned schema changes on stage. What's your take on that, what did you mean by that? You mentioned that, come on, I feel the question would be like, there's a huge issue around the multiple dynamic schema changes. Yeah, but that's a lot of that, so should we be talking to the guys? No problem. So I mean, yeah, Mongo is hot, and to a lesser extent, but still admirably, so are Cassandra and HBase. And at this point that's less about the pure ability to scale, because the NewSQL guys and MySQL have caught up to some extent. 
And it's just about the fact that you don't have to specify the schema before you write code. Now, why is that a good thing? Part of it is laziness. There are some very good developers who graduate with computer science degrees, and they know how to write a DBMS and they don't know how to write a SQL query. So some of it is just that, but there's also just general flexibility. Yeah, it handles any data, right? I mean, that's what you mean by the flexibility piece, that it's easy to integrate different data types, or? Yeah, I mean, data types is a technical term, so I wouldn't use that word. What's the technically correct term? But the sense of what you were saying, absolutely. But it's sort of that the structure in which you want to accommodate data changes quickly. So if you're doing a 360-degree customer view, and you're pulling in data from a lot of different systems and have a system of quasi-record for just that purpose, then that's actually your use case for Mongo inside the enterprise, and it has nothing to do with some sort of website or anything like that. If you have a product catalog with very disparate products that you keep adding to, say you're a cell phone company and you have service plans and you also have electronic devices you're selling, you know, accommodating that with a simple classical relational schema is sort of annoying. I've already referred to many other cases where you can buy data from many sources. And then, not so much a Mongo but a Hadoop dynamic schema use case, or one adapted to the relational case, is that you're doing analysis, you derive some data, you enhance some data, and then next week you do a better piece of analysis and you slightly change the data you want to have stored in your database. Those schemas can change very fast. Curt, thanks for coming inside theCUBE. I know you've got to catch a car and we've got to get our next guest on. Thanks for coming inside theCUBE. 
Great stuff at Monash Research. Go to his blog, search for Curt Monash: a variety of different technology newsletters, sites, and blogs, well followed in the industry. Thanks for coming on theCUBE and sharing your knowledge. This is SiliconANGLE and Wikibon's theCUBE, here at the HP Vertica end user conference. We'll be right back with our next guest after this short break.