From New York, extracting the signal from the noise. It's theCUBE, covering Spark Summit East, brought to you by Spark Summit. Now your hosts, Dave Vellante and George Gilbert.

We're here live at Spark Summit East in New York City, midtown Manhattan at the Hilton, the home of where Hadoop World really started. So it kind of feels like we're going back to the geeky early days of Hadoop World and the Hadoop Summits. But at any rate, it's our pleasure to have Shaun Connolly here, the Vice President of Corporate Strategy at Hortonworks. Shaun, always a pleasure to see you.

George, Dave, welcome back. Appreciate it. Always a pleasure.

So the last time we saw you, at least as part of theCUBE, you were kind enough to participate in a panel we had at our event, Big Data NYC, which we run concurrent to Strata. You stole the show, you really did a great job, I think, representing the big data community. Well, you have a lot of insight and you're articulate, and it was good. So give us the update. What's new since that timeframe?

A lot's happened. We've gotten through our first fiscal year of reporting, so we're proud to have busted through that. We were 122 million in revenue for 2015, just a little over four years in. We set really strong guidance for 2016, and I know Rob Bearden was quoted as saying, make no bones about it, we're looking to drive to that adjusted EBITDA breakeven point in Q4 this year. So we actually moved up expectations. From a business perspective, we're scaling pretty well, running around 75 to 80 percent software subscriptions versus services. We achieved that point probably a little earlier than we anticipated, and that's good. It just means there's appetite and need for the software we're doing, either directly or indirectly. We're here at Spark Summit.
So, you know, I had about 10 minutes or so when I was talking, and since I've been in open source for probably about 12 years, early days, JBoss and stuff, I was really just talking about: how do you make really great, innovative tech consumable by the enterprise? And why should they care? That was a bit of the theme of what I covered.

Yeah, so talk a little bit more about where we've come. In the early days it was all about attacking the cost equation; then it maybe evolved into coming up with a better data warehouse solution with a lot more data. We killed sampling. Where are we headed now? Where does Spark fit into this collection of assets that we've built up over the last decade?

Yeah, so like I said, there are a lot of technologies in a broader data architecture. When I look at the market, I look at it from a platforms perspective: data-at-rest platforms and data-in-motion platforms. You bring them together and you can actually create really innovative new analytics-driven apps, right? Smart applications. And it's interesting to see where the center of gravity is. If you look at the early web, Red Hat went public in '99, Linux was clearly on the scene, the Apache HTTP Server was established, and the Apache Software Foundation was established in '99. There was a really strong center of gravity at that point. You look now, and Apache is actually serving a very similar role as it relates to data, right? Hadoop, Spark, Kafka, a lot of innovation happening there. It's moving very fast, and how do you actually harness that for the mainstream enterprise, for the early-majority and late-majority adopters?
From my perspective, it's exciting to see a lot of these use cases that aren't just tech-enthusiast applications. So I used a few examples. Webtrends was a really great high-scale one, where they're running Spark and Hadoop all on the same shared, multi-tenant HDP platform. But I also used the example of a train company, right? This isn't just about web monsters. It's the "train doctor" capturing video feeds and images of the track so they can detect faults and go back and make sure they're doing maintenance. You know, I was on New Jersey Transit, and it was a little bumpy a ride, so they probably should have run the train doctor over that track. They very much want to do it from a logistics perspective, if the trains are delivering stuff, but also just from safety, right? And who would think a 100-plus-year-old business would actually participate in the age of data? But they are.

So you're actually a good person to ask this question, because you see so many different use cases. We've been working on the premise that one of the big problems companies are trying to solve with big data is better understanding demand. The demand profile is changing as information becomes disseminated. We all have information about pricing and product quality, we see reviews on Amazon, and as consumers we're really informed. It seems like data is a way for the brands to learn more about us, so they can attack that problem of demand. First of all, is that a reasonable premise, that these brands are trying to regain competitive advantage, not only against competitors but, in a way, against consumers who are so knowledgeable today? Is that a valid premise, that that's a big problem people are trying to attack?
Well, that's the classic single-view-of-customer initiative, and we see that in spades, right? And one of the points I made in my keynote is geolocation. There are pivot points and centers of gravity, things that will drive innovative new apps. So entity-resolution analytics is great for single view of customer and things like that: if multiple Shaun Connollys have gone into the store, am I the same one, so you can rationalize it down to one? But geolocation is another one, right? That train example is geolocation: the lengths of track are captured, their GPS locations are captured, and the analytics are very much around the location of faulty track. And can you infer other issues? There may be other issues, water, weather, or what have you, that might be causing faults. That's a really interesting problem to address, and it requires a lot of data at rest to do really deep machine learning against, but it also benefits from analytics against the real-time stream, the data in motion. That combination, being able to play the two off each other in a closed-loop application, is really where people are trying to go.

Your point on where the market is, right? Yeah, it started on renovate: active archive, ETL onboarding and enrichment, data warehouse, data marts. The more interesting applications are really around data discovery. The Webtrends example: their Webtrends Explorer product is very much a self-service discovery platform for their end users, so they can have pre-built analytics as part of the Webtrends experience, but they also have a place to actually get down and dirty with the data, and they're monetizing that as a new revenue stream. I find that fascinating, right? Single view of customer, single view of property, right?
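The entity-resolution idea described here, collapsing multiple near-duplicate "Shaun Connolly" records into one customer, can be sketched in a few lines. This is a minimal illustrative toy, not anything from a Hortonworks product; the field names, sample records, and the 0.85 threshold are all invented for the example.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized string similarity between two field values (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def same_customer(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Decide whether two customer records refer to the same person
    by averaging per-field similarity scores against a threshold."""
    fields = ("name", "email")
    score = sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)
    return score >= threshold

records = [
    {"name": "Shaun Connolly", "email": "sconnolly@example.com"},
    {"name": "Sean Connolly",  "email": "sconnolly@example.com"},
    {"name": "Dave Vellante",  "email": "dave@example.com"},
]

# The two Connolly spelling variants resolve to one entity; Dave stays separate.
duplicates = same_customer(records[0], records[1])
distinct = same_customer(records[0], records[2])
```

Real entity resolution adds blocking, weighted fields, and trained match models, but the core move is the same: score candidate pairs and collapse the ones above a threshold.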
In the insurance industry, what's your risk exposure when weather patterns or global events happen in a particular area? That's a geolocation-centric app, if you will. So it's changing how we think about that app space. And then clearly predictive analytics, real-time and intelligent, is the nirvana, but there are all these steps along that journey.

Right, and that was the nirvana of decision support, right? I often tell people that one of the problems big data helped solve was, first of all, the cost problem of data warehousing, and its promise now is: we will give you that 360-degree view of the customer, that customer-of-one personalized view. So you're optimistic that big data will live up to that promise?

Well, I think it's redefining certain application areas and industries. Take cybersecurity threat intrusion: that's undergoing a significant transformation because you can't rely on small sampling or manual steps in the process. There's just too much inbound, right? You have to automate it. It has to be based on machine learning, and it has to be self-adjusting models. And so that changes how you think about that space. In single view of customer, the next logical step after that is: how can you use that intelligence to actually optimize your supply chain and your distribution channels? That's going to wash through, and that's going to transform a lot of those traditional industries.

So in that cyber example, the innovation seems to come both from improving the models, the self-adjusting models, and from the ability to deal with more data. Yep. Right? And those are sometimes like a tug of war, right? If I have to spend all my time getting data and cleaning data, I don't have enough time to improve those algorithms. Yeah. How do you see customers dealing with that challenge?
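A "self-adjusting model" in the sense used here, one whose baseline keeps updating as the stream drifts instead of relying on hand-tuned rules, can be sketched as an exponentially weighted mean-and-variance anomaly detector. This is an illustrative toy, not how any shipping cybersecurity product models threats; the smoothing factor, sigma multiplier, warm-up length, and sample values are all invented.

```python
import math

class AdaptiveDetector:
    """Online anomaly detector: maintains an exponentially weighted mean and
    variance of a metric and flags values far from the current baseline.
    Because the estimates update on every event, the alert threshold
    'self-adjusts' as normal traffic drifts over time."""

    def __init__(self, alpha: float = 0.1, n_sigma: float = 4.0):
        self.alpha, self.n_sigma = alpha, n_sigma
        self.mean, self.var = 0.0, 1.0
        self.seen = 0

    def observe(self, x: float) -> bool:
        self.seen += 1
        if self.seen > 5:  # warm-up period before alerting
            sigma = math.sqrt(self.var)
            anomalous = abs(x - self.mean) > self.n_sigma * sigma
        else:
            anomalous = False
        # Update the running estimates: the model keeps adjusting itself.
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        return anomalous

det = AdaptiveDetector()
baseline = [10.0, 11.0, 9.5, 10.2, 10.8, 9.9, 10.1, 10.3]
flags = [det.observe(x) for x in baseline]   # normal traffic, no alerts
spike = det.observe(500.0)                   # a huge burst stands out
```

The point of the sketch is the feedback loop: no human retunes the threshold, the stream itself does, which is what makes the approach viable when there is too much inbound for manual steps.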
Well, it's funny. In December at Hortonworks, it was right before the holidays so I'm not sure that many people saw it, we incubated a new project in Apache called Apache Metron. So it's an incubating project, and it's a cybersecurity framework. It makes use of a lot of these technologies: Kafka and NiFi for getting the data in; Storm for the rapid ingest, like 1.2 million events per second type speeds; Spark, definitely for some of the streaming but also the modeling use cases; HBase for immediate delivery of the network packets that are captured, so you can actually interrogate the system. So it's this cybersecurity acceleration layer, right, with eventing and dashboarding on top. Cisco and Hortonworks, probably over the prior 15 months, collaborated on an open-source project called OpenSOC, an open security operations center, and that basically got contributed to Apache as Metron. So that's a canonical new-generation application, right? It's complicated, but a lot of these use cases, whether it's cyber or connected car, have very similar patterns in what's required of the data architecture. And so there are these greenfield areas that we're really actively interested in helping to forge ahead and accelerate.

The automobile is a data platform, that's... It's basically an iPad with wheels, that's how I joke, right? It's amazing.

A question on this, it was Metron: what are the steps to go from a custom solution, to an accelerator, to something that's pretty repeatable, where in the software world we'd be at, like, 85 percent software, 15 percent professional services? What has to happen for that to evolve?

Well, it's an emerging framework. There's still a lot of work to be done there. There's definitely participation from a broader community of interest; there are a lot of people who want to solve that cyber issue for themselves, right?
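The Metron-style flow described above, telemetry ingested, enriched with context, and triaged against threat intelligence, might be sketched at toy scale like this. In the real architecture, Kafka, NiFi, Storm, Spark, and HBase do this work at over a million events per second; here the lookup tables, IP addresses, and field names are purely hypothetical stand-ins for illustration.

```python
# Hypothetical stand-ins for a threat-intel feed and a geo lookup table.
# In Metron these would be enrichment stores, not in-memory dicts.
THREAT_INTEL = {"203.0.113.9"}
GEO = {"203.0.113.9": "XX", "198.51.100.4": "US"}

def enrich(event: dict) -> dict:
    """Add geographic context to a raw telemetry event."""
    out = dict(event)
    out["geo"] = GEO.get(event["src_ip"], "unknown")
    return out

def triage(event: dict) -> dict:
    """Enrich an event, then flag it if its source appears in threat intel."""
    out = enrich(event)
    out["alert"] = event["src_ip"] in THREAT_INTEL
    return out

# A tiny stand-in for the event stream arriving off the ingest layer.
stream = [
    {"src_ip": "198.51.100.4", "bytes": 1200},
    {"src_ip": "203.0.113.9", "bytes": 48},
]
alerts = [e for e in map(triage, stream) if e["alert"]]
```

The shape is the interesting part: a fast enrich-and-score path per event, with the flagged events landing in a store where an analyst can interrogate them later, which is the role HBase plays in the description above.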
As they're sort of renovating how they think of their SOC, their security operations center, their security concerns. So it's definitely early, but I use that as an example. And I put it back to Arun Murthy and some of the Hortonworks folks: that application, I think, is a great architecture. We need to make it 10 to 100x easier to deploy, right? And so I get excited about the convergence of things like containers and others that will help us wire together assemblies of these types of applications for this new era. So there has to be a lot of investment in going from enabling the art of the possible, at petabyte scale, at extreme inbound rates, to making it repeatable. How do you make that architecture repeatable? So 2016-17, from my perspective, I just want to dial it in and try to make those types of architectures repeatable.

But is that... when you talk about containers, that would go across apps. Are you talking about... so you've got the app itself, but the process of deploying it and keeping it up to date is the hard part now?

Yeah, so if you look at the Metron architecture, and I'm just using that as an example, there's Storm logic, there's Kafka logic, there's HBase logic, so it's not one app, right? It's a composition of apps. It's a modern data application, in this case in the context of cybersecurity and threat detection. So there's a variety of apps all running concurrently: some are high speed, others are doing deep-dive analytics on a recurring basis, and that's the predictive analytics, that's the journey everybody wants to get to. And if you want to enable that, you have to enable these assemblies of multiple services and application logic to be easily packaged and deployed, not only for commercial off-the-shelf, right, but for anybody who's building it custom. We're seeing both, and that's great.
I think you basically throw gas on the fire if you can crack that knot of how to package one of these modern apps, if that makes sense.

Yeah, and in terms of commercial off-the-shelf apps, I think Mike Olson, five years ago, said this was going to be the year of the app. And it never happened. Would you agree that most of the activity today is still around custom apps? Is that fair?

Actually, I think there are very targeted applications; I wouldn't underestimate the active-archive and the data-enrichment and data-discovery apps. They're tighter in scope in what they accomplish, but they actually drive real value, and in some cases they can unlock new revenue streams. But the ultimate nirvana is a fully closed-loop system that can work across 10 years of history as well as give you analytics in the here and now, on what's actively streaming. That's a closed-loop system, right? And the cool thing is, a lot of the technologies are here. Now what we need to do, like I said, is drive that 10 to 100x easier factor.

Yeah, we kind of heard that in this morning's keynotes. The good news is we've solved a lot of the technical problems; the issue now is adoption and the learning curve for developers.

And the journey, as I like to call it, right? You can't just go for that brass ring as your first thing. You need to assemble the data and other things, and you can actually have apps along the way that drive value for the business. In that Webtrends example, I showed the journey of their multiple use cases that ultimately led to a lot of these innovative new ones, and that's all sourcing out of a consistent data architecture.

Okay, so that's sort of the question, or what we're trying to understand: the path to packaged apps. And it sounds like it's not a black-or-white thing; you take progressive steps.
The container part sounds like it's on the part of the lifecycle where you deploy and keep up to date. But before that, you still have to get to some sort of common data model for how everything puts data in and gets it out.

Well, when you say packaged apps, I would encourage you to look at the world of SaaS applications, right? You have the TrueCars of the world, the APIs, the Webtrends and others. Are they not packaged applications? Yes, they are. Are they not modern data applications? Yes, they are. They're new types of packaged apps. Exactly. So there's not one way to think about traditional packaged apps, and there are a lot more of them than we give credit for. And they're coming from places that aren't necessarily the technology industry. No, it's the business. It's the doers, right? The app is the business. The data is the business. That's what I mean, truly, by a modern data application. Right. So let's not look at it through a five-to-ten-years-ago lens. You're interacting with innovative data apps on your phone all the time, right?

All right, Shaun, we have to leave it there. Thanks very much for coming on. Good seeing you as always. Yep. All right, keep it right there, everybody. We'll be back. We're live from Manhattan. Right back.