Live from the Fairmont Hotel in San Jose, California, it's the Cube at Big Data SV 2015. Okay, welcome back everyone. We are here live in Silicon Valley for Big Data SV. This is the Cube, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier with SiliconANGLE, and we're getting back into the conversation about open source. We've got Shaun Connolly, VP of Corporate Strategy at Hortonworks, and Ryan Peterson, Chief Strategist for the Emerging Technologies Division with EMC. Guys, welcome to the Cube again. Thanks, John. So, the EMC Hortonworks announcement. Give us the update. A lot of partnering going on. A lot of high-fiving at this event. And talk about the context of the event. Okay, a lot of partnering. You guys have been partnering for a long time. Hortonworks, it's in your DNA. EMC now, especially the Emerging Technologies Division, is very much in the market, right? They're not coming in whenever they want. They're actually part of the strategy to get in and be open, be part of the community. Talk about the relationship in the context of this new inside-out organization of open source that's happening in front of us. And the accelerant to get to value. That seems to be the theme at this event. Customers want value. They're part of the conversation. It's community-driven. Give us some context to the announcement. I'll sort of start off, because you brought up the customer. And when we were talking yesterday with the Pivotal half of the Federation, right? What we're doing here is customer-driven, right? And so we have a variety of joint customers who asked us to make sure, particularly around the Hadoop platform on EMC storage, particularly some of their emerging storage technologies, that we work together to make sure it works seamlessly so they can get the value out of their existing investments. And you could walk through a little bit of the outline of that, but I think we're executing really well.
Yeah, I think we have 6,000 customers using Isilon, exabytes of data available. And going back to that whole driving value, we can now take that information, have Hortonworks come in, attach to the information, and get unique insights very quickly, as opposed to having to think about starting up a whole new architecture, copying the data from one place over to that architecture and having to run that same analytics. So it just gives you kind of that direct, remote access into the data sets. We had Microsoft on earlier. Actually, Revolution Analytics is now part of Microsoft. David Smith. And the discussion coming out of the Apache Software Foundation is that this whole fragmentation and all this stuff is going on as a conversation. So I want you guys to talk about one concept again. The difference between fragmentation and choice. And that's an important nuance. So fragmentation means a certain thing. I don't want to give my definition, but I want to hear your version. Fragmentation, unification, choice. Customers want choice, they don't want lock-in. So EMC has a lot of partnerships, and you guys partner all the time. You know, for the benefit of the customer, what does fragmentation mean, and what does choice mean? And are they mutually exclusive, and what is that? That seems to be a debate this week about fragmentation of standards. Open source is a done deal. Certainly we know that, but... I think certainly the ODP initiative is a great example of how you're taking some of the components and starting to standardize them so that you can get to a place where it's a little bit of a cleaner implementation. I think the idea was that the Hadoop ecosystem was built to be somewhat fragmented, components that could be swapped out very easily: find, you know, this one's good and replace it with another. Well, at the same time, we're finding that customers are having problems with, well, which one? Is it this one? Is it that one?
What's the best SQL-on-Hadoop engine, et cetera? So it's not good. Fragmentation is bad. Well, I think it's good for development, but bad for getting really clean implementations, right? Solutions. Yeah, you need really good solutions, tight solutions. From a customer standpoint. Absolutely. And choice, what does that mean? I mean, from your view, you have a portfolio of products. Obviously, you have two divisions now, emerging technologies. Absolutely. I think there's benefits to different kinds. And I'll talk about storage. You know, we have this ECS product we launched. It's all about object-based stores, large-scale, geographic dispersion. That's something that the Isilon product isn't really that great at. So from a customer choice perspective, you've got to install the right tool for the right job. At the exact same time, Hadoop has shown that it's this really great system for analyzing content. We want all of that data set to be available for our customers and be usable, so they can extract value very quickly. And when I think of fragmentation, I view it at a couple of different layers, right? So there's market fragmentation, forcing the ecosystem to have to make choices, making it more complicated for them to be able to plug in and get the value, right? And I think that was what the common core stuff was around: if you can rally around some commonality, then it takes a little more, you know, of the guesswork out for the ecosystem, so they can plug in cleanly and not be bogged down with that level of fragmentation. With the core concept. Yes, but then there's, you know, and that's market fragmentation. You can do things to help coalesce and make the market function better, right? And remove some of that friction. Then there's architectural fragmentation, right? The classic silos, right? And so it's incumbent upon vendors like ourselves to actually work together to develop architectures that reduce that fragmentation, right? So, you know, it's easy to talk fragmentation.
You have to qualify it: specifically, what are you talking about, right? I like to solve market fragmentation, friction issues, architectural fragmentation issues. This announcement, frankly, is around architectural choice, right? We're able to allow our customers to deploy and get value from the HDP platform. So you're eliminating architectural fragmentation with this announcement. Exactly. So what you're saying, if I hear you correctly, is that market fragmentation is a little bit easier to deal with when it's open. Architectural is where the trip wires are, right? Exactly. There's work to be done in both, right? But at the end of the day, it requires, you know, vendors, particularly in an architectural fragmentation situation, if there's skirmish areas, to identify what those skirmish areas are and then collaborate around the areas, listen to the customers. Like when I was at JBoss in the early days, you know, I formed a relationship with Microsoft. I was like, let's put Java versus .NET aside. Let's optimize this platform on top of your platform. How does it integrate with the rest of your ecosystem? Because the customers asked for it. That's unification. That's unifying. And that's a unifying architectural approach. Previously, it wasn't able to be done. Architecturally, you remove the issues. So I'm an executive. I'm a business school strategist. I want to lock in. I want to win the battle. I got the smart bombs built in. That's where fragmentation can help me if I'm that evil genius. I'll say, hey, I'll go in the architecture. No one's looking at why these two things are together, so that I lock in. Isn't that the way it used to be? Two approaches: one is fragment the market, take a piece of the puzzle and own that area. Another, and we talked a little bit about it yesterday, is make the pie as big as possible and then your slice is as big as it can be.
And if the pie includes architectural choice in that, where you're plugging in an A1 choice, then inherently that pie is going to get bigger because the customer has a lot more options. So talk about the announcement now, the specifics of your announcement with this piece here. Architectural unification and core. Yeah, I'd say taking it a little further to the full validation and kind of proven nature of it, it's taking all these kind of components and, like I said, trying to lock it down to something that's really enterprise class and quality. One of the things that's great about what Hortonworks has done is they've taken all of the tool sets and created these test suites. Test this, see if this works. Kerberized, non-Kerberized, this particular tool versus that tool, and then put that over the top of our different storage technologies and test. I think it was over 6,000 tests. And so as a result of us passing 6,000 plus tests, that's when they said, okay, we're good to go, we can certify this platform. And then it just goes through. And vice versa, you get certification on an open source platform like Hadoop, they get access to the customer base, so it's kind of a win-win. At the end of the day, the customer's happy because they know we work together and it didn't run through just a Barney set of 10 tests. It was real deep rigorous testing. So there was some real integration testing going on. Absolutely, that's the whole definition of joint engineering, deep certification, right? And 6,000 tests is a lot to pass through. It's run the gauntlet of HDP and Isilon. Frankly, it's the best, most secure integration out there today on the market. You know, we had the dean of Big Data, Bill Schmarzo, on earlier. He's a Cube alumni, actually wrote a book and launched it on the Cube, by Wiley, great publisher. And he says, you know, he goes and talks to customers all the time, and we love to hear their customer stories. And you know, I said, what's the big takeaway this year? Bottom line for him.
And his answer was simply, Hadoop's a done deal. I mean, like beyond done, it's like no longer a discussion. So there's no more discussion. Hadoop is it. It's not a question of when; everyone's doing it. So that's a fact. So that's cool. So customers look at Hadoop. So now it comes back down to the EMC equation. You guys have a huge installed base. What does this mean for you guys now? You have a lot of customers certainly doing POCs, maybe going into production with Hadoop at large scale, medium scale or whatnot. But they've got a lot of drives laying around. You've got tons of new stuff happening. What does this mean for you guys? I think I look at it historically, when we came out with systems like Symmetrix and VMAX and it was big block stores. The RDBMS was getting bigger. The RDBMS, it started out with, well, it's a great tool set, but where's the app? And now we're in that same position, right? The infrastructure's been built out. Hadoop has gotten stable and it's gotten to the point where we can start building the applications over the top. So I think this is the year of deriving real value and repeatable value, because I think Hadoop has kind of been a science project for many customers and we get to the point now where we can start building actual applications. Maybe it's fraud analytics applications for banking. Take that and repeat it over and over and over for various banks and we can start creating real value. That's a repeatable process. I think that's the next big step. Yeah, and then we had a chair kicked around the cube this week and today, and it was amplified: we've got to get out of this proof of concept business and get into the proof of value business. So with that, what have you guys seen as key touch points that you can point to and say, hey, there's some real value proofs coming through the market? I mean real value, not like proof of concept, but like where the Hadoop thing is certainly making a big difference.
Can you guys share any stories in that regard? I mean, we play across a wide range of industries. Particularly in our area, ingesting more sensor, Internet of Things use cases, high-speed analytics mixed with sort of deep historical analytics is a very common pattern. Everybody talks about the 360-degree view pattern. We see it not only for the customer, we see a 360-degree view of product, of supply chain. We're actually seeing some advanced use cases where, with telemetry into the customers and their buying patterns geographically, you're actually able to optimize the supply chain to get the right inventory flow through into the right areas based on fairly advanced analytics. It'll be interesting to see the impact of big data on classic supply chain thinking. I think that's something that's going to be playing out over the coming years as well. Well, that's real value. You start to get into the ROI equations. Yeah, I mean, there's real revenue. Someone's happy and someone's a smiley face, a happy customer. Talk about real money. Real revenue with better margins on that revenue, and that's what we hear in those types of use cases. I like to define those as kind of three major categories. You have the kind of CEO conversation: can I create a new business for you? There's the CFO conversation: can I optimize your business, save you money? And then there's this third category that's my favorite, which is that change the world category. It may not have a financial value, but just the personal value you get out of it is incredible. So I think I have a great customer example where they set it up and pulled out call detail records and started looking at and analyzing the content. And as a result, they were able to find missing children and return them to their parents and things like that. And I think you really can't quantify how important the technology has become until you start seeing those kinds of messages come through.
So the easy part is going to be building an application to do fraud analytics. The hard part is going to be how can we use these technologies to save the world. So I've got to ask you both. And Ryan, we'll start with you because you, at EMC, have a large customer base, big installed incumbent positions, right? So we were talking earlier, especially in big data. We have the same issue with our CrowdChat. When we talk to social media people, it's the same thing. It's like, oh, I don't need another platform. So people are tired of people coming in, selling a platform of something. So that brings up the question of time to value. This comes down to what is the tooling? What is the platform where someone can get to value fast in the enterprise? So Hadoop for the enterprise certainly is a real deal. What is that speed to value that you're seeing, that customers can get without having to just roll out a full-on platform? What are you seeing? Because you're dealing with large customers. So it's not as simple as saying, hey, I'll wrap some Hadoop. Drop it in. You can drop it in. You've got professional services and whatnot. But I don't want to have to spend a ton to go look at value. I think the easy thing from my perspective is go set up a quick virtual machine or two and get a Hadoop system set up and get to the point where you can look at some of your data. I think that the one that everyone likes to talk about is ETL, right? That seems to be quick and easy. Save yourself a lot of money, because ETL systems are typically fairly expensive and EDWs are even more expensive. And so if you can figure out a way to save some money through that process, that's usually a good way to provide instantaneous value. At the same time, I think that our perspective is look at storage from a protocol perspective.
So with all the different kinds of tools that you ever may need to use, whether it's at an endpoint sensor level or if it's a Microsoft device or a Unix device or any sort of particular item, being able to get information in and out over that particular protocol is going to save the customer a lot of time and money and get them to faster results. So what's your take on what you see? I see you have a breadth of different industries. What's the low hanging fruit that you see? Just anecdotally... We see a very prescriptive journey, and actually in the first wave of Hadoop adoption I think the ETL offload and that optimization path freed up budget. Whenever I talked to customers I was like, free up your budget. That's great. Pick two applications. If you don't, you will stall, right? It's not about saving costs. Biting off more than they can chew, or... No, no, it's scope issues. From my perspective, the successful deployments that I've seen are application driven, right? Business driven. And frankly, I'd rather have the cluster start small and then add in two, four, eight, sixteen apps than try and assemble a data lake to cut costs and then figure out, now what can I do with that, right? And so the first wave was, let me assemble all my data. Maybe I might figure out something. I'm less a fan of that approach, frankly. I'm more a fan of start with some concrete applications you might chase, optimize some of your data architecture along the way to free up the budget to fund those, and then have an onboarding strategy where... So get a proof point knocked down, hit a few singles. Don't swing for the fences. Exactly. Start small, it will grow. Like I said, you don't have to work at filling the lake. If you have successful applications, the lake will be filled. Right, data lakes. Again, I love that term data lake. I'm more like a data ocean these days, where customers see this tsunami, but that's not going to get adopted any time soon. It's a journey to that lake is the point.
Lakes seem boring to me. Lakes, you play in a lake. Ocean, you go to the Pacific Ocean. You can die. The thing is that the data business is really heavy right now. If you look at the dynamics, real-time and things, I've got to ask you, what's the next wave? As a company, if they don't get the data strategy right, they literally could go out of business. The competitive pressure is strong. With that, what is the next wave? What are customers looking at? What I see is the advanced analytics, fast analytics, those situations are popping up left and right. Ultimately, I do think, and we're even seeing some of the technology emerge, transactional semantics will come in and around the platform. I think that's going to play out. But right now, where a lot of businesses are headed, and I heard it on the show floor, at least a dozen times, is if I'm a manufacturing company, I want sensors on my assembly line so I can get out ahead and be proactive in the maintenance of that. But I also want deep analytics and archival-type analytics scenarios all on one platform. And that fast, combined with deep, is a common pattern. Sounds like an ocean to me. What's your take on the... It's a speedboat. Actually, Lake Tahoe had six-foot waves this winter, so that's a couple weeks ago. It's like an ocean. It's interesting. I love that question, by the way. I feel like customers, in general, any company needs to be thinking about how they replace themselves. And I think that what Hadoop can bring to the table is taking all of the data that they already have and figuring out how to use that to replace themselves. And I think it's a great... A good testament of this is, I think Uber is a great example. How many people have been switching over from using taxi cabs into Uber and the taxi cab companies are really hurting as a result. And I think that any company who thinks that they're safe is not safe. 
I would think that Uber is a great example. You just have to look at the taxi line in front of the hotel. I'm like, look at these guys waiting around. I'm like, they're going to be extinct soon. I mean, why are they standing on the side? They're waiting for people in the cab lines. Like, no one's going to the cab line. And when we talk about these different patterns, right, it gets back to some of the work you guys have, particularly in the emerging technologies. We have the Isilon stuff that we're talking about here today. But there's the SSD technologies, there are higher-speed technologies, there's cloud-type technologies. And the reality is you need an architecture that's going to support that whole gamut, right? All right, so talk about the deal now between you guys. So what specifically are you announcing? What's the product integration? What's the offering? Is it GA? Is it just certification? Give us the details. So today, it's certification. We've gotten through, like I said, over 6,000 tests, full validation, ready to go. And now we're ready to start going to market. We've got some customer base that started off a little earlier, before certification, that is now very excited to see that we have a joint-supported offering. So is it joint sales? Go to market? Are you going to market together? Are they selling through you? Can you share that? Today I think it's meeting in the channel, but certainly there's opportunity and discussions about how that will work. Our fields have been engaged for a while, and that's really informed a lot of the work that we were doing. But we really wanted to make sure we, you know, tightened the screws and made sure that we blessed it so there weren't sharp edges, there weren't any insecurities. And it actually worked across the full platform, and that's really where we spent our time. So Ryan, talk about EMC, honestly, emerging technologies. It's done some really amazing work.
You've got ViPR, and we've seen a lot of stuff certainly from ViPR, and you've got the big data stuff going on and cloud's happening. A lot of good stuff. What is the EMC culture like now? See, Jeremy Burton has brought a lot of mojo there. We interviewed him when the Cube just started in 2010, in his first week on the job. Now he's president, and we have a chat with him on Friday, a CrowdChat, ask him anything, at 10 o'clock on Friday the 20th, a little plug for Jeremy Burton. But he brought a culture of risk-taking and big ideas, right? So getting into the trenches is going to be a challenge for EMC. I mean, EMC has always been a field-driven company on the sales side. Can you bring that same mojo into the community world? And what's the strategy? What's the plan? What's the culture like? Is the mindset ready? Is everyone running that way now? Or is it just starting? Or is it already happening? Yeah, it's very interesting because, I mean, just over the last few weeks, we've seen things like we're doing more open-source contributions. Obviously, you know, some of our Federation partners have been taking things and completely open-sourcing components. And I think this is a big shift in thinking from traditional old EMC thinking to new. I think the Emerging Technologies Division is a great example of where, whether it's acquisition or a build-from-scratch, we're creating new technologies that are intended to replace ourselves and get to the point where we can constantly keep up with the latest and greatest trends. Hence the separation of the different focus. Obviously, the other group's more the blocking and tackling, bread and butter EMC. Except for, you know, XtremIO, that's the flash stuff that's built in. So this is just a new team: go invent the future. Is that kind of the mindset? It is. It's really about, let's see what the future looks like and start to build out those technologies, and it's things like ScaleIO.
It's things like big data and object and cloud, and we're really figuring out how do we tackle what the future looks like and learn how to replace ourselves. Well, we're psyched to broadcast on theCUBE. We've been following EMC now, like I said, since 2010. It's been great support. Thanks so much for the support, and obviously Jeremy Burton's a visionary. We love that guy. So he's making some great changes. You know, Microsoft now under Satya Nadella, similar mojo going on. You know, since it's no longer about Office and Windows and you've got a guy who knows cloud, you've seen a lot more open source. They donated a great reference design to Open Compute. So this is the new model, right? It's exciting days. So what's your take on this? I mean, what is the new formula of success in open source? This new generation, the young guns are coming up. We're all the old guys now. The young guns. I don't even have a goatee anymore. It'd be all white, you know. But if I did, it would be all white. We tried to synchronize. Yeah, exactly. I could grow mine out fast enough. There's a new school. What's retained? What's retained in the model and what's new? What's developing? Yeah, I think it enables a more open way of collaborating. And frankly, the new organization has made it easier to collaborate across multiple fronts. So from my perspective, less friction in the collaboration is a good thing, right? And I think that's what we're seeing. We're beginning to see those walls be knocked out. People go faster, too, when you have decision making that's not like five chains of command across organizations, right? Yeah, we're seeing, you know, with open source, there's a religion there. And the religion is morphing a tiny bit. So I love the concept that open source is starting to feel comfortable working with some of the closed-source vendors. I think when it comes down to it, we're all going to be in the same bucket.
Customers are going to choose different things for different reasons. Open source, I think, really helps grow out technologies very quickly. At the same time, I think the vendors that ultimately take those components and build them into something that's a little more secured or more enterprise class, they're both going to exist in harmony. Well, it's been great to watch everybody. And certainly we've been watching this evolve. I remember when Pat Gelsinger was at EMC, we said, hey, you don't get in the sandbox or throw sand around in this open source world. You've got to be careful. And you guys have had to evolve after the Federation with Pivotal. You guys have been very successful at Cloudera. The whole ecosystem is fruit on all the trees. There's plenty of beach head to be had in this trillion dollar TAM opportunity total addressable market. So it's like a growth machine. So good luck, guys. We'll keep tracking. Thanks for coming on theCUBE. Hortonworks and EMC. This is theCUBE live in Silicon Valley. I'm John Furrier. We'll be right back after this short break.