Live inside theCUBE for Hadoop Summit 2012. This is the heart of big data in Silicon Valley. I'm excited to be here. I'm John Furrier, the founder of SiliconANGLE.com. This is theCUBE, our flagship telecast. We go out to the events and extract the signal from the noise. I'm here with Jeff Kelly, my co-host. Jeff, we have Val Bercovici on from NetApp. Val's in the Office of the CTO at NetApp. He's been on theCUBE many times. Val, welcome back.
Thanks, John.
So you're a tech geek. You were also at SNW a few years ago and told us on theCUBE that, early on, you've been following big data, and Hadoop in particular, mainly because NetApp is in the storage business and you guys know tech, you're pretty deep in tech. But you've been tracking Hadoop for a long time, from right when Cloudera was formed through now.
Yep.
So the first question is, what's your perspective right now in this market? I mean, besides the fact that it's out of control, growing like crazy, take us through your observations of where it's come from and where it is right now.
So I think the first thing we're all trying to figure out is where we are in climbing the hype cycle. Have we reached the peak? Is the peak still far off? Is it a big peak for big data? Is it further up? Certainly not sure; I'm not ready to call that one yet, but we're definitely climbing. A lot of energy, a lot of excitement. But I was reflecting last night on this very question. I was drafting a quick blog in support of the press we did for this show. And the one word, as I mentioned to Jeff, is maturity for me. I definitely see that our core market remains enterprises, and enterprise IT departments in particular. I was up on stage keynoting last year at this event, and really, I looked at the video, I literally said, "I come in peace." We had a very, very generic, high-level message. We had a lot of speculation about our value-add to the Hadoop ecosystem, particularly for enterprise deployments.
And over the past year, what's really encouraging to me is not just the progress that NetApp has made, but that our ecosystem, the one we know well, so Cisco, VMware in particular, Microsoft and so forth, has published some really solid proof points: really detailed reports on network utilization for Hadoop deployments, on storage utilization, on I/O profiling throughout the stack, from the app through the file system layer, the hypervisor in VMware's case, down through the storage. So there's just a lot of credible technical maturity in terms of research findings published. And if I'm in enterprise IT right now, Hadoop is no longer something that's scary because it's unknown. It's becoming very, very well-defined. And the business anecdote, you must have heard this before. Last year, the business quote from executives was, you know, "What is Hadoop? I'm hearing a lot about it." This year it's, "I don't want to be the last guy on the block not taking advantage of Hadoop." Because of the competitive opportunity, the business. There's the threat and the opportunity now.
So adapt or die is always the mantra we talk about here in Silicon Valley. A question for you around the existing marketplace, where technology has been an enabler. NetApp has been a leader in storage and SAN and all kinds of enablement for customers for years. Like companies like HP on the server side, Cisco, you know, the old guys, right? I was talking to someone at Intel yesterday in New York, and there's an interesting statistic. Facebook and Google make up 20% of PC shipments in the industry in terms of what they're buying, okay? Both have built their own traditional storage systems. And you know this because you're always one of these emerging clients, these hyperscale clients. I mean, those are two, you know, skewed data points, but still 20% of the market. It's an indicator that the market's changing, at least on the infrastructure side, on the storage side.
Where data is not stopping the growth. So that's going to enable a new class of applications. So I'd love to get your perspective, both as a tech CTO and also as someone who works at NetApp, as this evolves to the mainstream. Obviously data warehouses are impacted, traditional storage architectures are attacked. What do you see coming? Freight train, opportunity to jump on the train, head-on collision?
The first thing that's very self-evident is that the only constant is change here. So we certainly can't look in the rear-view mirror or look through old IT projections and analyst estimates as to where the market's going to go in the future, because it's definitely shifting. But I don't know if I mentioned this to you before. There are three fairly distinct phases of evolution we've seen in any one of these markets. And the Facebooks, the Googles, the Apples, and the Yahoos, which is a big customer, completely apply. The first phase, particularly for these large web-scale deployments, which are now reality, they are businesses, is the flexibility phase. And that's the phase where you definitely need a traditional configuration that enterprise IT is familiar with today. A standard hypervisor-based deployment, a shared virtual infrastructure for agility, is really the primary factor there. Flexibility and agility really trump cost and trump scale at that point. And whether you're a startup or an enterprise IT department, the next phase you inevitably go to is scale. Now, clearly, in the high-profile example of the Facebooks and Googles of the world, scale at that phase becomes the number one driver of infrastructure. And they had to, by the way. Their app was growing in leaps and bounds. You eventually hit, and again, it doesn't last, but you eventually hit this insatiable demand phase where all you want to do is satisfy the demand. So scale is your number one IT priority at that point.
But what I love about Facebook recently going public, despite all the brouhaha about the NASDAQ and all the issues with their trades and their symbol, is the fact that this is a lesson Apple teaches us over and over. This is a lesson from any large public company, you know, T-Systems, another provider: once you're a public company and you've got to report on your financials, there's scrutiny on your P&Ls. And efficiency, operational efficiency, and really balance-sheet efficiency, OPEX efficiency, rises to the forefront. And at that point, we actually see the need for shared virtual infrastructure management, but at scale now. So it's the OPEX versus the flexibility that's driving the opportunity. And that is a real big focus of engineering for NetApp. Since we acquired E-Series, we've been able to participate in the scale phase of all these evolutions. And now that we've had E-Series under our belts for about a year, I know my research team and certainly our development teams have been working on some really, really cool frameworks to combine a lot of our traditional shared virtualization flexibility value-add with storage efficiency, but in a big data context. So all of these things at scale are where I think the information infrastructure market's headed.
Let's talk about, now, the real world. Facebook is kind of, I don't view them as the real world because, you know, well, they're an example, but they're a skew in my mind, but still an important part to look at.
A great outlier.
A great outlier. I mean, a big outlier. But let's go back into the mainstream, a pharmaceutical company. We're doing some research in the Wikibon community around conversational indexing. So in the pharma vertical, we have data that shows, and this is again using Twitter as a proxy, and it might be bad data, that only 3% of the conversation is IT related.
And so in a lot of these environments, they have huge infrastructures, they run their business, they have NetApp, they probably have EMC and other stuff as well. They're looking at a big data mandate. So you work on both sides of the fence. You look at Hadoop, you see under the covers early, you're watching the trends, but you also look at the other side of NetApp, where you've got big customers. What's your advice to those big guys, because that's ultimately going to be the store for the startups and the applications? How do you advise them, one, to prepare for big data? And what specific steps can they take now with their existing infrastructure?
So I mean, the big guys have an inherent problem in that, like in any large organization, natural barriers due to distance and time zones and organizational boundaries get in the way of communicating. And what we find with the larger customers is, a year or two ago, what I had to tell our own sales force was, look outside of IT for that first big data project. And I had that three-strike rule, right? Ask once: "What is Hadoop?" Ask twice: "No, there's no Hadoop in my customer account." Ask a third time: "Oh yeah, there's a pilot somewhere outside of IT and it's already a million or two dollars in." So what that's evolved to right now is it's not-
The vectoring in is not in IT.
Always outside of IT. You know, always is a long time. For the next two or three years of this 15-year phase of evolution of the cycle, the disruption, it's going to start outside of IT. I've yet to see a single instance, including at NetApp, where the money, the sponsorship and the business problem started inside IT. Some progressive IT departments step up to the fore and actually get ahead of the curve and help, but they're not the ones that come up with the idea: let's, you know, tap into all of this sentiment data, let's tap into all sorts of telemetry we get from our customers and actually find out more about it.
That's a marketing business opportunity. That's a sales opportunity. That's a customer support opportunity. That's a manufacturing opportunity. Those are the business units that come up with the math for these things. So the biggest advice I'd have for large enterprises today, especially for that segment and that demographic, is that there's likely more than one pilot going on. You know, General Electric could be an easy one because they're a holding company, but you don't have to be a big holding company to notice that if your marketing department's interested in this, then your sales department's likely interested. And if both of those guys are creating a whole funnel for this kind of information, you can damn sure bet that the customer support department, the customer service department, also has a strong interest in this technology. And if you're in manufacturing, if distribution's part of your business, big data can impact your entire business operation. And what's probably going to happen, this is me projecting out into the future, is that big data will surround enterprise IT, you know, kind of circling the wagons, and eventually it'll penetrate enterprise IT by sheer force of will and osmosis at that point.
So now, coming back full circle, let's talk Silicon Valley, let's talk New York, Boulder, Colorado, where all the startup ecosystems are really booming. And all around the world with big data, you're seeing people in the UK as well; London's actually a big market. The BRIC countries are into this in a big, big way. So the entrepreneurial landscape is going to fill the void. Applications are the big rage. What are you seeing in terms of that penetration? What's your take on the startup landscape?
I saw you in Silicon Valley; we spent some time talking at a big data after-hours event about that startup scene. As startups are being acquired, they're rapidly building up with tech, some core technology is being bought, some are expanding. What's your view of the ecosystem?
I think the scene is really, really vibrant, more than I can ever recall. There's obviously the perfect storm of factors combining: the technology trends are proven right now. There's money there, and there's more money than I've ever seen before, not just at the venture capital level, but at the angel investor level. Even some of the angels now have funds, which I think is an oxymoron, but there's even a sort of true angel, private investor level. There's a lot of money, there's a lot of technology and there's a lot of talent in this space right now. At this show in particular, I'm seeing a lot of folks that aren't at the same company they were at last year. They're either at different companies or, more often, as is the case, out at new startups, trialing a new idea. Data-driven businesses are now a really, really hot investment area, probably the hottest that I can think of, at least out on the West Coast. And for me, I see potential in horizontal markets, and from a technical perspective, I see potential where you can establish a platform to build upon. I was seeing hints of this last year, but there's definitely a lot of confirmation right now that Hadoop, pick your distribution, is becoming a platform. And people are now funding and building entire businesses layered on top of that as a platform.
You had some very intuitive and astute comments at SNW. For folks out there who want to watch Val's last interview with us on YouTube.com slash SiliconANGLE, look for SNW 2011, I think it was.
We talked about this and you said, quote, "Open source is about winning with code, donating competitive advantages, to contribute code, be a good citizen, but win with honor and code." Okay, cool. Love that. That's a great interview. You've got to listen to that; it came from Val. But I want to ask you on that thread: we heard from Rob Bearden, the CEO of Hortonworks. This is good citizenship, technology purity around distribution, and the open Red Hat business model. So the question is, will that work? Can you be the Red Hat for Hadoop, given that the market is completely different? Linux had a little bit different market path, with different dynamics going on at the time. Could you share your perspective on that? During the Linux revolution, you had a clear leader, a unified force, you had a benevolent dictator kind of in that model. This is more community collaborative, which is riskier.
Yeah, yeah.
Can the Red Hat model win?
Short answer, yes. I think there are pluses and minuses here. The situation, as you say, is different, but honestly, you've got the benevolent dictator on the Linux side, you know, moving that entire community forward at a pretty aggressive pace, leading and moving.
Apache can be that way too.
Could be. That's a downside of the community model with Apache: there's a lot more politics. But remember, the GPL license is almost what I call a communist license. It's very capitalist-unfriendly; it's very hard to monetize IP with a GPL license, whereas the Apache BSD-style license is a lot more business friendly.
Yes.
So I think that positive outweighs some of the political negative of the governance model and the committee.
But will that license also create more innovation, and so less of a clear leader like Linux? Because that really rallied the enterprise vendors, like the HPs, who, you know, do some tinkering around, but then say, okay, we'll just do a deal with the leader.
But now you've got Cloudera as the number one, Hortonworks following as the number two, and a variety of sub players.
I still feel, you know, if we were in the first inning of this last year, then we're in the second inning at best this year. There's a long way to go, and I predict inevitable consolidation in this market. And I know it's controversial to say, like EMC and NetApp becoming friends, but I actually predict Cloudera and Hortonworks have a natural set of alignments. And they did. Yeah, and I think we'll be seeing them combine, because they actually bring different strengths to the table, often for the same customers. We see that. So I can see them certainly becoming better and better business partners over time. And a rising tide, I mean, the Cold War argument kind of went away for two reasons: one, both bring something to the table, plus the market was all of a sudden in exponential growth. And I actually think the Apache distribution, and I see it more and more from a technical code perspective, is the distro that's becoming key, and that's not fragmenting at all. We even find MapR contributing back into that, for example. So that distro is getting stronger and stronger over time, leaving plenty of gaps for vendors to fill right now and make money, but over time, that distro is becoming the distro. It's getting, you know, functionally stronger, it's getting more resilient. And, you know, that Red Hat model, I mean, Red Hat isn't the pure Linux kernel either, but adding value and working off that same core kernel, I think, has proven to be a viable model. You've got Rob Bearden, of course, with his old Red Hat experience, wisely guiding the ship, I think, at Hortonworks right now. He's wisely guiding. I've got a lot of confidence in him. Mike Olson's proven himself as a businessman, so I'm optimistic, I'm very, very bullish, I think.
This will work itself out. Not in any nirvana scenario, and it won't be a panacea, but for business people looking to solve business problems with this technology, the barriers are falling.
Where do you see the real opportunity, let's say above the distribution layer? Is it in the applications? Is it the integration?
I'm hearing today, you know, especially, and I don't think it's just today, that integration's still a huge gap and a huge opportunity for people to fill. I think it was Scott from Teradata, actually, who had a very, very astute keynote today, which is that the large opportunity here, outside of outliers like Facebook and so forth, is absolutely in the tens and hundreds of thousands of enterprises around the world, small and large. They have existing infrastructure, they have existing applications and application integration projects underway right now. Vendors that can eliminate the friction in integrating existing apps and existing data marts and data sources with Hadoop technologies, and vice versa, are the vendors that are definitely going to be seeing a lot of, I think, short-term wins and opportunity right now. And, you know, in a three-to-five-year timeframe, I expect the core Hadoop distro to grow with some mature open source projects which will solve that problem, but that is a real business problem today and it's a real opportunity that vendors can solve.
Let me ask you a question. This is more of a looking back, knowing your background in the industry. You've seen the virtualization movement from the front row. I mean, VMware, when they came into that business prior to being bought by EMC, really helped themselves to the server marketplace, right? So the server market was an existing market. They essentially were re-monetizing existing server business, and we kind of know what happened there, so you get the point. Now you've got Hortonworks. Their strategy's very clear. They want to re-monetize existing positions.
We look at the high availability with VMware. They're filling the holes strategically. Do you think that's a good move? Do you think that's viable? What's your take?
I think it's a little premature to predict whether Hortonworks or others will succeed to the scale that VMware has. But one of the things I look at, just architecturally, is that the dichotomy between what VMware and the Hadoop community are trying to do is quite stark. VMware made a killing on consolidating workloads and consolidating servers, effectively. A lot of customers, including NetApp customers, have those before-and-after pictures: a whole load of dump trucks with old IT equipment, basically, and a much smaller rack serving the same workload. You don't see that with Hadoop. You see more and more racks, pictures of multiple racks versus smaller racks. So if anything, it's definitely, I think, just a...
It's the reverse.
Yeah, it's the reverse from a server and storage infrastructure perspective. Right now, you look to deduplicate and reduce physical copies in a shared virtual environment, whereas you look to create multiple copies for performance and availability in a big data environment.
Obviously you work at NetApp. You guys are a leader. You guys are in the storage vertical. But let's talk about other companies like Microsoft, who have a different market challenge. They bought Skype. The Xbox is booming. Great stuff going on at Microsoft there. Search, Bing, they're trying to reboot that. But ultimately their core franchise, Windows, and the enterprise business that they have in IT, is still massive, but it's under threat. This is threatening their business. So they're looking at this marketplace as a pivot point, as a lever. What about companies like that, like Microsoft, like the EMCs, like other companies? How do they move their value from an existing market to another market in this expansionary IT market?
I think the battleground continues to be developers.
And as I've mentioned on this show and others in the past, Microsoft has an incredibly strong legacy in terms of catering to, understanding, and recruiting a large and loyal development community. And I think that's the lever they're able to use. They are pivoting quite a bit, I notice here. They've announced, obviously, Linux VM support in Azure. They've announced Hortonworks support for the Apache distro on Windows. I know from a storage perspective, they've been proactively open and engaging with standards bodies on the SMB protocol, which has matured and evolved to be a pretty cool technology and protocol. So I see Microsoft making certainly a lot more aggressive and a lot more nimble moves than they have in the past. I don't really see them resting on or trying to wall off the Windows franchise, which would have been a strategy quite common with them three or four years ago. So they're making some good moves. And I think it's a different battlefield. They definitely have to establish credibility in some of these new communities, but they're coming from a point of strength, which I continue to believe is that developer loyalty they have. And as long as they offer their developers access to these cool new technologies, and I forget even what it was called, they dumped their Hadoop clone.
Well, everyone has kind of figured that out. You join in, everyone grows together.
They're making savvy moves. And again, if they can continue to feed their developers and recruit new developers, I think they'll continue to dominate.
As much as we're bullish on big data, and as you know we're falling over ourselves around the opportunities and the wealth creation, from developers to business, we're still constantly hearing about the shortcomings of the market. Obviously it's batch right now. HDFS has got some great improvements; we've heard from folks earlier around security.
So you're saying people are filling in with good code, but still, from a high availability standpoint, from a scale online transaction processing standpoint, no one's going to run their credit card transactions in a Hadoop environment in the near future. So there's still a long way to go. So what is your prescription for the community? If you had to give your speech to the guys out there and say, hey, opportunity, but here are the key work areas, let's see some double time on these areas?
I'm actually seeing a lot of heterogeneity in the marketplace right now. I was having dinner with a big customer last night, and I see this with lots of customers. Customers aren't dumb, right? They're not trying to solve every problem with one technology. And again, this was, I think, acknowledged by either Rob or Eric or Sean up on stage at the keynote this morning as well. They referenced NewSQL and NoSQL technology specifically. We are really in an era right now where, once we've outgrown our traditional tools and traditional infrastructure, we're looking towards all sorts of purpose-built technologies to solve these specific problems. And because we are venturing into new ground, certainly into new scales of data, I am seeing people combine the best of Cassandra for ingest, the best of Couchbase for in-memory processing, certainly SAP HANA for in-memory processing if you're into that kind of infrastructure, the best of Hadoop and HDFS for batch, and HBase for a little bit more real-time and transactional workloads. It's very much a polyglot environment out there right now, and I'm seeing people be very pragmatic about applying appropriate tools for the job. And that will again be a sustainability issue over time, because we don't want to fragment and dilute a lot of expertise across these different technologies.
And we're probably going to see, I think, that Membase and CouchDB, that combination, is perhaps a bit of foreshadowing of what might happen in this business: startup-level consolidation before big vendors eat up the small. But that's really where the market is right now. People are using purpose-built solutions for these tasks that are challenging existing infrastructure, and I see that proliferating as opposed to diminishing at this stage.
So you mentioned earlier that big data is kind of surrounding enterprise IT. So from a practical advice perspective, if you're a storage pro, you're seeing this happen, maybe you're seeing a few deployments here and there in your organization, what practical advice do you have for them? How do they approach this, both from a technology point of view and from a bigger-picture organizational point of view, kind of looking forward at the architecture, how it's going to look in five, 10 years?
Part of it is drop-dead simple. Go out on Google and SlideShare and learn what this technology is all about. You've really got to grok the fact that we're talking about parallel processing now in business computing for really the first time in history, as opposed to in high performance computing only. With parallel processing comes distributed infrastructure, distributed storage if you're a storage pro. These are, again, contradictory messages to what you've probably been living historically. Cloudera, of course, is the standard; Hortonworks has a very nice educational program now. Get out there and convince your manager to invest in education on administering Hadoop and/or developing for Hadoop. And what I've seen, certainly over the past two years, is that folks I work with, customers I work with, have gone through that motion and they've completely turned around right now.
I mean, the biggest Oracle and SQL Server bigots right now haven't abandoned their love for that technology, or SQL in general, but are realizing the potential of this whole new unstructured, semi-structured, multi-structured world, and the opportunity that being a very marketable technical expert in these technologies provides them. So it's very pragmatic advice to begin with, and the rest of it is standard MBA advice, right? If you're in technology and you want to get ahead in your career, go find a business person, learn what their problems are, and then go back a month later and try to solve one of them. So you combine those two things and it's, I think, a very pragmatic roadmap into this market right now.
Val, we've got to break here for our next segment, but I want to give you a last kind of bumper sticker, a futuristic Val perspective, because you are a good predictor of some of the trends. What's your outlook? What's the next leg of the journey in this Hadoop meets data warehouse, business intelligence, storage mashup, data growth craziness?
I'll send out a bit of a teaser, because it'll be a good discussion topic for six or 12 months from now. We're seeing some massive changes in terms of the infrastructure in support of these things. And specifically, what I mean by that are some of the limitations we're encountering with flash technology right now, while at the same time having firmly established solid state storage as a requirement, not even a luxury but a must-have, within enterprise IT infrastructures. So what if solid state storage now moves into storage-class memory? And what if we have bit-addressable storage for the first time in history? As an application developer, I don't even call it that. I call it memory that I can just mark persistent, so that it'll survive a crash or reboot whether I want it to or not.
If you're the Facebooks of the world, you're going to innovate; if you're Apple, actually, you're going to innovate in terms of how you rewrite your OSs, how you rewrite your apps in that environment. And that'll make the notion of real-time complex event processing a lot more accessible than it is today. So this is-
And long-term storage is a lot different than what we know. And the outcome of that is great user experiences, new user experiences.
The outcome of that is fantastic user experiences, and great economics once the economies of scale for these technologies kick in. And again, as we started this discussion, massive, massive change is the only constant here.
I love my job when we have these kinds of conversations with Val Bercovici from the Office of the CTO at NetApp. Great innovation in the infrastructure, great user experience, new user experience, great economics, which is the equation for wealth creation opportunity across the board. So from entrepreneurs to the big companies, this is the center of a whole new industry. We're excited to cover it. This is theCUBE, SiliconANGLE.tv. We'll be right back with our next guest.