Live from the Fairmont Hotel in San Jose, California, it's The Cube at Big Data SV 2015. Welcome back everybody. We are live here in San Jose at Big Data SV. This is the Cube's premier Big Data event. Welcome to the program. We're joined in this segment by Jim Walker, who is Senior Director of Product Marketing at Hortonworks, and Rob Rosen, Senior Director of Product Solutions at Platfora. Guys, welcome to The Cube. Thanks for having us. Long-time frequent guest. Rob, I think it's your first time joining us on The Cube, so great to have you guys here. So we've got you both here. I want to start with a question about the relationship. We know that building out the partner ecosystem is very important to Hortonworks; it's a really integral part of your business model. Jim, why don't we start with you: talk a little bit about where Platfora plays, and set the stage for where the relationship is. Yeah, you know, it's funny, a fundamental belief at Hortonworks has always been to enable the ecosystem. From day one, when I walked through the front door, I sat down with Rob Bearden and he said, look, our strategy is really a triangle, and one of the pieces of that triangle has been a partner strategy and an ecosystem strategy. The partnership with Platfora has been going on for, oh gosh, quite some time, really since the inception of Platfora. Really early on, we saw that the visualization and analytics layer that needed to live on top of Hadoop was definitely a necessity in our customer base. So we've been working together for, like I said, about two years, and Hadoop first had to be established as a platform; people had to dump a lot of data into it.
I think initially people were using Platfora as a way to kind of play with things; that's completely changed. We're at the stage now where people are looking at tools like Platfora in particular to say, hey, let's really extract some real value out of this Hadoop thing. Let's start with EDW optimization, whatever we're doing, but it's really critical to a lot of our customers today. So, Rob, tell us a little bit about your approach at Platfora. Jim kind of alluded to the first phase of Big Data being a lot about getting data into the data lake, if you will. And now we're moving to, well, how do we get a lot of value out of all that data in there? And that's where I think Platfora plays. Tell us a little bit about your approach to helping customers actually get some value out of that data through analytics and visualization. So, when we founded Platfora four years ago, we heard very frequent comments along the lines of, hey, we feel like we're data rich but insight poor. And when we looked at how most of the traditional business intelligence vendors were approaching the market, the conclusion was that bolting Hadoop onto a traditional BI pipeline was going to make that problem way worse. So we decided that we really needed to enable end users, business analysts who want to be able to get into all the data with Hadoop and find patterns very quickly. We needed to enable those folks to iterate to insight very quickly by doing a lot of the work that IT would traditionally do around data prep and analytics enablement, getting them out of the world where adding a new column of data takes nine months and allowing end users to do that themselves in, like, nine minutes, right? So that's really been our focus over the last four years.
We've seen some amazing use cases that were enabled by HDP, helping people move from that initial probing around what do we do with this stuff to some really critical business insights that they can find very quickly with Platfora and HDP together. So dig into that a little bit more. As that evolution has occurred, what are some of the common use cases you're seeing among your customers? So we see about 70% of our customers doing work around customer analytics. The traditional customer-360 sort of use cases, where they're combining the traditional data sources around highly structured data, which they're Sqooping into HDP, with maybe sensor data or lots of less traditional data sources, like clickstream data. And they're getting a full picture of how a given customer might be interacting with a brand on their wireless device versus on a website versus in a store, right? And once you get that complete picture, you can really get to the bottom of what people are really perceiving about your brand and where they're getting hung up when they're trying to transact. That's been a real common area. Now are you seeing that develop side by side with some of the more traditional ways they've been doing things? Are you seeing more of a rip-and-replace kind of strategy, or is this more complementary to what they've been doing? It's really complementary, right? So you might take some of the same data sources that a traditional customer analytics solution like Omniture would use and enrich that with other data sets that people wouldn't have the ability to use easily with a traditional analytics solution. And the same sort of paradigm holds true for some of our other customer solution sets, like security and Internet of Things, right?
There are a lot of very traditional analytics approaches that people are using, and they're complementing those with HDP and Platfora together to flesh out the overall big picture. And over time we anticipate seeing some of those traditional workflows migrate over to a Hadoop-and-Platfora kind of play. So you mentioned security; obviously that's a hot topic. Pretty much every other day there's some kind of data breach on the front page of the Wall Street Journal. I'm curious about the security perspective from both of you guys. Jim, let's start with you from the Hortonworks perspective. What are you doing in the security space? I know you launched the data governance initiative a little while back, which is related to security. How do you approach security in a big data context? So there are really two sides to this question, Jeff. There's security of the platform itself, and we're entrenched in that. We incubate projects; we actually purchased a company last year and then incubated that as an Apache Software Foundation project, and it's called Apache Ranger. So we're in the business of really securing the platform, right? And I'm not gonna talk too much about that. What becomes interesting to me is the insights point of view, and what tools you can use to actually implement security for an organization. Don't get me wrong, I love the bits. I love the Apache projects and all these things. But from a pure consumer point of view, what can people use Hadoop for to implement better security in their organizations? A whole lot. I was in the computer security business; I was coding ACL frameworks in Smalltalk and Java in the late '90s. And then I moved into master data management, so it was single-view-of-the-customer stuff.
And when I found Hadoop, gosh, six years ago or so, I looked back on my career and said, wow, there's a much better way of doing all these things. The single view, the 360-degree view, as you just said. We start to look at the amount of data it takes to actually do forensics on, say, a breach. When we look at the amount of data it takes to actually understand how holes happen, or to look at the perishable insights, the moments in time that are security breaches, be it fraud in a credit card transaction, right? Being able to do that at massive scale has always been a challenge, because back-end systems are just too difficult to scale; you couldn't throw enough hardware at it, and you couldn't throw software at it to actually deal with this. And as the problem progressed, it just compounded, right? So I think what's happening with Hadoop is that people are starting to look at the way they've done things in the past and saying, let's do them better. And when I look at something like Platfora simplifying that front end, being able to investigate data, or interrogate it, I guess, on the security side, that's really interesting to me. Like I said, there are two sides; we could talk a long time about the security-insight side, Jeff, but I think that's the security story. Is that what you guys are seeing too? Yeah, the kind of canonical problem for security analysts is that 95% of your network traffic is normal and appropriate, and 5% is really dangerous, and it's really hard to find that 5%, right? So what we enable, with Hadoop underneath us, is extending the timeline past the traditional 30-day boundary that you see with most security appliances. Most of those security appliances deployed on the enterprise network have MySQL or Oracle as a backing store, and that takes you out to about 30 days. After that, you're kind of on your own, right?
So to really get the big picture on where that dangerous 5% might be, you need to take data from all those different devices and ingest it into Hadoop, and now you have a multi-structured data set and a much bigger picture that goes back much further in time. So it's much easier to figure out whether a traffic pattern is, quote, normal, because it's the same pattern that's been happening over the last six months, or abnormal: all of a sudden you've got an endpoint on your network that's sending sensitive data out to China and has never done that before, okay? That's an area you need to pay immediate attention to, and that's really what HDP enables for us: covering a much longer period of time, because you have a very cost-effective storage platform. Right, and now you don't have to sample, you don't have to interrogate just a very small set of data where you can miss those anomalies, which in security is exactly what you're looking for. Exactly, yeah. We all know that hackers have been very, very sophisticated in their ability to get in underneath the wires, stay effectively dormant for six months, and very slowly pick up very sensitive data. So if you've got the big picture and you can get down to the individual record level, you can detect those things much more easily than if you're sampling the data or working off of just aggregate data. If you want to find a needle in a haystack, you have to have a haystack. That's right. It's really as simple as that. You have to have the haystack, and we simply have not been able to capture the haystack. This stuff just ended up on the data center floor for years, and so all of that was gone. Just understanding IP traffic in itself is great; ingesting data from every laptop in the organization, corporate data, you simply weren't able to do that before.
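The detection pattern Rob describes, a long historical baseline making abnormal traffic stand out, can be sketched in a few lines. This is an illustrative toy, not Platfora's or HDP's actual engine; the endpoint names, daily-byte figures, and z-score threshold are invented for the example.

```python
from statistics import mean, stdev

def flag_anomalies(history, recent, z_threshold=3.0):
    """Flag endpoints whose recent outbound traffic deviates sharply
    from a long historical baseline (the 'haystack' kept in Hadoop).

    history: {endpoint: [daily outbound bytes over ~6 months]}
    recent:  {endpoint: outbound bytes observed today}
    """
    flagged = []
    for endpoint, baseline in history.items():
        mu, sigma = mean(baseline), stdev(baseline)
        observed = recent.get(endpoint, 0)
        # An endpoint that suddenly ships far more data out than its
        # six-month norm is worth immediate attention.
        if sigma > 0 and (observed - mu) / sigma > z_threshold:
            flagged.append(endpoint)
    return flagged

history = {
    "laptop-042": [1_000, 1_100, 950, 1_050, 990, 1_020],
    "laptop-117": [2_000, 2_100, 1_900, 2_050, 2_000, 1_950],
}
recent = {"laptop-042": 1_010, "laptop-117": 9_000_000}  # sudden exfiltration
print(flag_anomalies(history, recent))  # → ['laptop-117']
```

The point of the long window is exactly what the conversation emphasizes: with only 30 days of samples, a slow, patient exfiltration never deviates enough from its own short history to trip a threshold like this.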
And now we can do that, which is really awesome. So let's turn back to enterprise adoption. One of the challenges we hear from practitioners all the time is that Hadoop can be complex; you've still got to put a lot of different pieces together. We're starting to see that solidify with things like the announcement yesterday around the Open Data Platform. Really, the goal there is to solidify the core to make it more consumable for the enterprise, and in the case of partnerships, to bring together, if not a one-stop shop, a much more cohesive solution, essentially, that an organization can bring into its data center. From Platfora's perspective, how do you look at this market and this technology? Does it require a platform play, or is the idea of cobbling together best-of-breed solutions sustainable? So it really depends on the physical geography, literally where you're at. We see adoption of best-of-breed approaches accelerating much more quickly on the coasts of the United States, and in certain territories in EMEA and in Japan, as examples. In other areas, if you look toward the later adopters geographically, they can really get a lot more leverage out of a pre-integrated platform. They don't have to spend as much time in the weeds figuring out how everything works, which is a big issue for them because they typically can't find the talent in those areas. So there's definitely huge potential for a common platform that's pre-integrated. And I completely agree with you. What we're seeing is that in the Global 1000, the really big enterprises are taking steps with Hadoop, and you've got the born-data-driven startups that have it kind of built into their DNA. But it's that fat middle of all the other enterprises out there that are just starting to think about it.
But we saw this in the data warehouse space, for example, around consolidation, around the appliance model, kind of making it easier to consume. So Jim, talk a little bit about that approach. Do you see that playing out in this market? And is ODP part of that? Yeah, absolutely. So with ODP itself and the way this market is now progressing, we've reached a really interesting point in time with Hadoop. For years the question was, do I roll my own or do I use a distribution? And God forbid you rolled your own, because a Hadoop distribution comprises, depending on whose it is, 11, 13 different projects, maybe 15 sometimes; that's really complicated. That's not easy to pull together. And actually, that's why there are distribution companies. If it were really easy to create a distribution, there'd be a lot more than a few of us out there playing the distribution game, right? So that's not a simple task to take on. And if you take a step back and say, okay, great, we've solved that, we have distribution companies, it's kind of its own standard that people can now build on top of. I think this market, and you and I spoke about this a little bit earlier, has moved from the build phase to the scale phase for Hadoop. We see people out there saying things like, Hadoop is no longer kind of an option; it's a key piece of the overall data architecture. And as that happens, in order for everybody to scale and move forward, there does need to be some sort of core, if you will, that people like Platfora can build on top of, so they don't have to build five different versions of their product to interact with five different distributions, or whatever that's going to be.
And so really this is an effort to push forward in that direction, to help our ecosystem of partners move forward in a much simpler way. That's the way I look at it. It's not about slowing down. I mean, innovation is still gonna happen. Are you crazy? The innovation that's happening in and around Hadoop and the various projects is astonishing, awesome, right? That upstream production of code is not gonna slow down. In fact, it's accelerating, I'll say, because more organizations are becoming involved in the general Hadoop community, and that's one of the roles we help play within this whole game. It's about consumption: how do I ease consumption of Hadoop within the ecosystem of applications that are gonna live in and on top of Hadoop? The other conversation I had earlier with you, Rob, was: go walk around the show floor. I'm gonna tell you, there's a lot less focus on the distributions and a lot of focus on the Platforas of the world and the applications that are sitting on top of Hadoop. That's just where we're at in this marketplace. And I'm excited to see it, having been in this game for quite some time. You've been in this for quite some time too, Jeff. This is an exciting time for the overall Hadoop market. No, I agree. Like I said, I think we're moving from Big Data 1.0, whatever you want to call it, where a lot of the use cases were focused on cost savings from an IT perspective; I'm gonna save some money on my data warehouse because I'm gonna move some of that data over to Hadoop and maybe just store it there for long-term archiving, but I'm not doing much else with it. To, okay, how am I gonna actually drive revenue? How am I gonna drive maybe cost savings, but in a bigger-picture way?
Think about some of the things that you see GE doing with predictive maintenance on industrial equipment, that kind of thing. So I agree with you; I think it's a pretty exciting time. We're starting to make that move from kind of 1.0 to 2.0. It's a little bit of an overused phrase, but I think that's kind of where we are, right? Hadoop 2.0 was really the beginning of this change, right? Turning on YARN turned Hadoop into this multi-tenant platform with shared security, governance, and operations, and multiple different applications really turned on a lot more things than just, say, one single application living on Hadoop. And I think that actually makes it even more important for you guys, because now we have lots of different types of data. How do I do discovery across not just one set of customer data but, oh my gosh, 15 different sets of data feeding in? How do I make sense of that as a business person? That's where these types of tools become extremely important. That was the beginning of the shift, is all I wanted to add, but I think it's really imperative to understand that too. Yeah, I completely agree. Now the interesting thing about this market is that the industry heavyweights are getting involved a little bit more in depth, whether it's the IBMs of the world or EMC-slash-Pivotal, et cetera. And they're moving up the stack; they're doing some of the analytics work, the database work. So my question for you, Rob, from Platfora's perspective: what is it like out there as a startup? You're innovating, you're trying to disrupt this market, but now you're competing against, and hearing, a lot of noise from some of the big players. How do you approach that? How do you fight through that noise and gain mindshare and market share? Yeah, so it's kind of interesting.
You see a lot of the taglines literally being identically copied from vendor to vendor, especially in the BI market, right? Everybody claims they do self-service analytics on Hadoop, right? So I think the real challenge is to go back and say, well look, how long does it take you to actually get to insight? If you're a business end user, you're probably really frustrated, especially if you've been using these traditional business analytics tools for years, because they haven't really changed that much in the last 20 years, right? Your ability to get to insight quickly, and to figure out which questions to ask quickly, has been frustrated for a very long time. So we really focus on enabling that insight, and we talk about how our customers are seeing some very significant things they didn't expect to see when they started going down the road with Platfora, right? The great thing about Hadoop, and having a schema-on-read kind of approach, is that you can put all your data in there; it doesn't matter how it's structured. And solutions like Platfora can help you find insight and discover things you didn't even know about your business. So we have great stories that I tell all the time about how people are getting to the bottom of what is preventing them from driving more customer adoption around specific use cases, and they're discovering things like, hey, if I'm operating a web conferencing service, it looks like it's actually poor audio quality that's driving most of my customers to tear their hair out. And I didn't even know that before, because I didn't have the data around the poor audio quality to begin with. So being able to get to that insight quickly, and then, once you discover it, knowing which direction you want to go and being able to migrate through all the data in the Hadoop environment very quickly, that's a new thing.
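The schema-on-read idea Rob mentions can be illustrated in miniature: raw events land exactly as produced, with no upfront modeling, and structure is imposed only at query time, when a question (like the audio-quality one above) is actually asked. The field names and records here are invented for the example.

```python
import json

# Raw events land in the lake exactly as produced -- no upfront schema.
# Note the second record lacks the audio_quality field entirely.
raw_events = [
    '{"user": "a1", "action": "join_call", "audio_quality": 0.31}',
    '{"user": "b7", "action": "join_call"}',
    '{"user": "a1", "action": "cancel_subscription"}',
]

def query(events, fields):
    """Apply a schema at read time: project only the fields the current
    question needs, tolerating records that never had them."""
    rows = []
    for line in events:
        record = json.loads(line)
        rows.append({f: record.get(f) for f in fields})
    return rows

# Ask about audio quality only once churn analysis suggests it matters.
print(query(raw_events, ["user", "audio_quality"]))
```

The contrast with a schema-on-write warehouse is that adding the `audio_quality` question later requires no new column, no reload, and no IT cycle; the data was captured whole and the schema is just a lens.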
And a traditional BI approach is only going to treat Hadoop as a bolt-on, which means that the ability to get to insight very quickly isn't really changing from where it was 20 years ago. Well, I think you hit on two important things: one, time to insight, which is critical, and two, for lack of a better term, self-service capability, where you can empower a business user to go ahead and do this without having to go through the nine-to-twelve-month cycle with their IT department, building out the data warehouse, modeling the data, that kind of thing. That is what's frustrating a lot of users and creating a lot of interest in what's happening in the Hadoop ecosystem, in what Platfora is doing, and others. So we've got to wrap up. I guess one last question would be: what's next on the agenda for the Hortonworks and Platfora relationship? Looking forward over the next year, how do you see this evolving? So I think we both see tremendous potential around Spark as an enabling technology, and we're using it in different ways, right? I'll let Jim speak to what Spark does in terms of enabling ingest and other sorts of technologies around Hadoop. What we're doing with Spark is leveraging it to enable much more complex transformation work, and we're exposing a whole interface called Platfora extensions, which allows end users to create very customized transformations for very complex data sets, and also to do things like machine learning and graph applications, which have traditionally been the province of predictive analytics solutions. So Spark is a great enabling technology for us, and we're building it into our product. We've been doing that for the last couple of releases, and we should see it fully formed later this year.
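The "end users create very customized transformations" idea can be sketched as a pluggable pipeline: users register named transform functions, and the pipeline applies them by name. This is a generic illustration of the pattern, not Platfora's actual extensions API or a Spark interface; the registry, decorator, and field names are all invented.

```python
# A sketch of the pluggable-transformation pattern: end users register
# custom functions that the pipeline applies to records by name.
TRANSFORMS = {}

def transform(name):
    """Decorator that registers a user-defined transformation."""
    def register(fn):
        TRANSFORMS[name] = fn
        return fn
    return register

@transform("session_minutes")
def session_minutes(record):
    # Derive a human-friendly duration from raw start/end timestamps.
    record["minutes"] = round((record["end"] - record["start"]) / 60, 1)
    return record

def run_pipeline(records, steps):
    """Apply each registered step in order, leaving inputs untouched."""
    for step in steps:
        records = [TRANSFORMS[step](dict(r)) for r in records]
    return records

events = [{"user": "a1", "start": 0, "end": 930}]
print(run_pipeline(events, ["session_minutes"]))
```

In a real Spark-backed system the per-record function would run inside a distributed `map` over the cluster rather than a Python list comprehension, which is what makes the same pattern viable on data-lake scale.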
And from a relationship point of view, thank you for going right at the technology, because yeah, there's a go-to-market side where we figure out how to go out and sell and talk to customers, and we're great at that, right? We both understand data, we understand schema, we understand time to insight. That's why I think people are choosing companies like Platfora and like ourselves: because we're nimble, we get it. We get this new world. That's all great. But our partnerships are led by engineering relationships. That's what we do at Hortonworks. We're gonna work with our partners to make sure that Spark is a key piece of what's gonna happen in Platfora, yet Spark is still gonna work on YARN, so that I can deliver on the promise of a data lake, where I have all my data and can access it in multiple different ways, no matter what the engine needs to be, because, again, people are gonna use Hive on that same data at the same time. So our role is to make sure that all these things do work together, and to have these deep partnerships. Be it Platfora or any one of our other partners that you talk to, any one of them, it's an engineering relationship as well. And I think it's a unique approach to partnerships. Our breadth is wide, and we take pride in making sure the tech is going to work together, that it's gonna be simple, and that it's gonna really fuel the use cases their customers are gonna have, because ultimately we're a platform, and for us to foster adoption of where we wanna be, Jeff, that's what's critical for us. So it's rinse and repeat for every one of our partners, and this is a great one because they're right there in the middle of the space with us. Well, some interesting things happening. So guys, Jim from Hortonworks, Rob Rosen from Platfora, thanks for joining us on theCUBE. Appreciate it. Thanks for watching, and we will be right back after this short break.