from San Jose in the heart of Silicon Valley. It's theCUBE covering Big Data SV 2016. Now your hosts, John Furrier and George Gilbert. Okay, welcome back everyone. We are here live in Silicon Valley for day three of wall-to-wall coverage of Big Data Week, which comprises two events: Big Data SV, our event, the theCUBE event, and also Strata Hadoop, the big tent event across the street, and we are here live, theCUBE extracting the signal from the noise. I'm John Furrier with my co-host, George Gilbert, analyst at wikibon.com on Big Data. Our next guest is Scott Gnau, the CTO of Hortonworks, and we're gonna be in Dublin in two weeks for Hadoop Summit. Scott, welcome back to theCUBE. Good to see you. Good to be here, thanks. So, you guys and Cloudera, obviously the pioneers, and you got MapR in the mix too, the big three distributions. And at the center, Hadoop World, now called Strata Hadoop, and of course we call it Big Data SV. The ecosystem is certainly maturing in a big way. We've seen the big theme here about the ecosystem around Hadoop and how the expansion of analytics, with cloud and with Spark, the pressure to have solutions, real applications with machine learning underneath and tooling, and apps are still the focus, and that's still a priority for everybody. And the conversation around Hadoop is that Hadoop is relevant, but some people are saying Hadoop is not as big given all the other things going on, and that Spark has taken the lead on the front end, and you've got the cloud underneath. How do you respond to that? Because no one's saying Hadoop is dead, okay? They're saying Hadoop is gonna be a piece of a bigger ecosystem, and that seems to be some of the story there. I know you guys now have an emerging products group that's doing some very agile development, and you have your core data platform. The good news is YARN's still at the center of all the action. How do you guys make sense of this? 
Because you gotta be faster, you gotta adapt with the ecosystem, the pressure for digital transformation. What's your take on this narrative? There's some noise in there; what's the reality, what's the real story going on? The way I see it is that we've really reached a tipping point as it relates to Hadoop and the ecosystem, and that has a couple of implications. First, I think with the expanding volume and variety of data, and I actually call it variety squared, because you think about all of the new IoT use cases that are out there, and mobile use cases, where not only is the data variable, but it's variably variable, because the end user, you and I, can change the data, and the consumers of the data and the applications that consume the data aren't ever gonna know about that. So it requires just a whole new way of looking at tool sets, a whole new way of developing those tool sets and deploying them, so it's really gotten to a point where it's more about a platform conversation than it is about this tool or that tool or the other tool. I think when you combine the... But there are a lot of tools. And there are a lot of tools, so there are a couple of important things that go on with respect to that. I think to be successful, number one, it's really important to be able to bring analytic tools to the data and not have to move the data to the tools. When you think about it, not just in money but in efficiency, data movement is very expensive. Bandwidth is cheap, but moving the data, keeping track of where the data went, all that kind of stuff can be very difficult. So being able to collect and land the data easily and bring different analytic engines together for those analyses is kind of where it's at. It's become a platform. And I think in many analytic environments, from what I'm seeing and hearing in the industry, certainly at the show today, we've moved from "what the heck is Hadoop?" to "gee, do I need Hadoop?" to "where does Hadoop fit?" 
To "Hadoop's kind of the center of my analytics universe." And it's got that critical mass at this point where there's just so much data and there's so much happening there that it's become a platform. That progression you mentioned has certainly played out over the course of 10 years, and the past five in particular have accelerated that life cycle of how Hadoop has evolved. But I want to take that point about Hadoop being essential, because this has come up. There are significant costs involved in managing, say, a data warehouse or some other model-based platform for stuff that you guys call perishable data. So for instance, if I have IoT data coming in, I might not want to have to have a big data warehouse just to deal with stuff I need to refine through with Hadoop. So in that case, Hadoop is critical. It's actually the de facto standard and requirement. When you think about the process of how you need to consume those perishable insights, you don't have time to go through a traditional process and use traditional tools. The traditional process and tools are: find a data source, define the requirements, define some ETL, go land the data, then build an application. By the time you go through that cycle, you've missed the opportunity, right? Plus, since the data are variable and changing constantly, that would invalidate the first three steps of that process. So you need to think about things backwards: land the data, use the data to actually determine the requirements, find those requirements, and then build the analytics from it. So the narrative you were talking about, Hadoop, just to close that loop there, and kind of tying together what you're saying: platforms are hard, but the audience out there is getting confused between what a tool is and what a platform is. 
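The backwards workflow described here, land the data first and let the data reveal the requirements before building the analytic, can be sketched in a few lines. This is a rough schema-on-read illustration; the records and field names are made up, and it's not Hortonworks tooling:

```python
import json
from collections import Counter

# Hypothetical raw events, landed untouched with no upfront schema (schema-on-read).
raw_events = [
    '{"sensor": "t-101", "temp_c": 21.4, "ts": 1458000000}',
    '{"sensor": "t-102", "humidity": 0.53, "ts": 1458000005}',
    '{"sensor": "t-101", "temp_c": 22.0, "ts": 1458000010}',
]

# Step 1: land the data, then derive the "requirements" from it: discover
# which fields actually occur instead of defining a schema up front.
field_counts = Counter()
for line in raw_events:
    record = json.loads(line)
    field_counts.update(record.keys())

# Step 2: build the analytic from what was discovered.
print(field_counts.most_common())
```

The point of the pattern is that a traditional define-requirements-then-ETL cycle would have missed the `humidity` field entirely, since it only appears once the data arrives.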
So if you got a zillion tools, it depends on what your view of the elephant in the middle of the room is, which is: I'm a tool guy and I think I'm a platform, of course I may think something different. But ultimately the zillion tools, that's a marketplace. The platform underneath powers the tools. In some cases, people might have ML libraries underneath and whatnot; that's not really a platform, that's just a little bit of a platform, but it still has to tie into a broader platform. Is that right? Absolutely, and when you think about that center of gravity shift, where this Hadoop ecosystem, this Hadoop platform, and when I talk about the Hadoop platform, I'm really talking about data in motion and data at rest, which is of course how we positioned Hortonworks last fall, there's an expectation and a requirement for common and consistent security, common and consistent data governance, common and consistent workload management, operations management, all of those things that come with that and actually make the whole thing valuable and sustainable in the long term. And then, once you've got that platform, can I plug in those new, agilely developed tools and have them just perform and behave as part of the platform and inherit all of the security and governance and operational aspects? So let's talk about that, because I think this is an interesting addition to the Hortonworks plan. Having followed Hortonworks for many, many years since you started, I've seen the evolution. When you talk about data in flight, that really speaks to a whole other value purpose, and you have a prop here, I noticed, and I see the IoT word there. You talk about really IoT or an edge-type device, it could be sensor data, humans, whatnot, really high-volume data in motion, data in flight. That's a different paradigm than traditional. 
Yeah, I think there are two things. The first is it's a new paradigm as it relates to how do I manage this IoT opportunity, right? It creates some new opportunities that get solved, and this is why we got into the data-in-motion business with HDF. The notion is, first off, that you've got sensors and devices outside of your firewall. So you've got this jagged edge of security and a different kind of perimeter to go manage. That's an entirely new problem, or opportunity. The second thing is, in traditional, even traditional streaming, which is data in motion, but kind of traditional streaming or ETL kinds of technologies, it tends to be a bunch of data moving from one place to another place, right? In an IoT world, it could be point to point, bi-directional; it could be sensors talking to each other, and being able to manage that kind of a data flow is a significantly new issue. And as part of that, managing the bandwidth in collecting that data, because it's no longer fiber optics that you bought from your data center to your branch; you may be going over cellular networks or satellite networks that have different costs and different reliability. Being able to hold all that together. And then finally, after you've secured it and you've managed the workflow, being able to guarantee that chain of custody, or the provenance of the data, so that you know, when you're taking action on that information, that you can trace it back and that it's real and valid. So you wrap that together, and that's a platform opportunity as well, which has become HDF. We call the combination of that with the traditional Hadoop platform, HDP, connected data platforms. Now, why is that important? It's important because it really enables a new class of modern data applications that we think are gonna be game-changing for business, because if you think about data at rest and the analytics created from data at rest, really great. 
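The chain-of-custody idea just described, tracing a reading back through every hop it took, can be illustrated with a hash chain. This is purely a sketch of the concept; it is not HDF or Apache NiFi's provenance format, and every name in it is made up:

```python
import hashlib
import time

def provenance_event(prior_hash, hop, payload):
    """Record one hop in a data flow, chaining a hash so the
    lineage can be verified later (illustrative only)."""
    digest = hashlib.sha256((prior_hash + hop + payload).encode()).hexdigest()
    return {"hop": hop, "ts": time.time(), "payload": payload, "hash": digest}

# A reading travels sensor -> gateway -> core; each hop extends the chain.
chain = []
h = ""
for hop in ["sensor:t-101", "gateway:g-7", "core:hdfs"]:
    event = provenance_event(h, hop, "temp_c=21.4")
    chain.append(event)
    h = event["hash"]

def verify(chain, payload):
    """Recompute the chain; any tampered hop or payload breaks it."""
    h = ""
    for event in chain:
        expected = hashlib.sha256((h + event["hop"] + payload).encode()).hexdigest()
        if expected != event["hash"]:
            return False
        h = expected
    return True

print(verify(chain, "temp_c=21.4"))  # True
```

If any hop is altered, or the payload doesn't match what was recorded, verification fails, which is the "trace it back and know it's real and valid" property in miniature.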
I can build a model, I can understand what happened, I can use my understanding of what happened to build models to try to predict what will happen and make decisions for my business. Can I, though, in real time apply those same models to data in motion as they're streaming through this network of communicating devices? And the answer is yes, and I can apply those models in real time to actually take action in real time. Keying off that: so you build the models with historical and rich context, then you apply them to data in motion, meaning the real-time stuff, the perishable insights. But you wanna drive a decision. How do you do that? Where does that capability come from? Is that a traditional transactional database, or what is that decision? What's the engine of it? The way I would look at it is, it can be any kind of application. That application can be driven from, sure, it could be driven from an EDW, it could be driven from an application that exists on my mobile device, anywhere in between. I think it's really important to think about the world as containerized applications that can interact with these data streams. And so when I talk about modern data applications, there are some architectural considerations, but there are also some architectural dependencies, and certainly from a platform perspective, those dependencies are having access not only to a broad, rich array of data for model building, but the ability then also to execute against data that are streaming. And this is a fundamentally new thing, because today, when they're separate, you can build the model and then you create a rule and you apply the rules to the streams. But as you apply those rules to the streams, you change the behavior of the customers, the consumers, what they do, which can actually impact the model. So being able to have that real-time interaction is where there's some really big differentiation in combining the platforms. 
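The pattern described here, train on data at rest, then score records in motion as they arrive, reduces to something like the following. The "model" is deliberately trivial (a threshold over a historical mean), and all numbers and names are invented for illustration:

```python
# Train a trivial threshold "model" on historical (at-rest) data,
# then apply it to records as they stream by.
historical = [18.2, 19.1, 20.4, 21.0, 19.8]

def train(history):
    mean = sum(history) / len(history)
    return {"mean": mean, "limit": mean + 3.0}  # alert 3 degrees above normal

def score(model, reading):
    return "alert" if reading > model["limit"] else "ok"

model = train(historical)           # built from data at rest

stream = [20.1, 24.9, 19.5]         # data in motion
decisions = [score(model, r) for r in stream]
print(decisions)                    # ['ok', 'alert', 'ok']
```

The feedback problem mentioned above shows up as soon as acting on an "alert" changes future readings: the historical distribution the model was trained on no longer matches the stream, so the model has to be rebuilt from the newly landed data.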
Do you need to own the sort of life cycle of all those components? You generate the model from the rich historical view, you apply it against the real-time data that's coming in. But, as you say, as you apply it, you're gonna change the behavior of, let's say, the people, if it's outward-facing. But as the new data land from the new transactions into the platform, those models can continually rebuild, and the cycle feeds that decision back again to the streams, right? So it's like a feedback loop in a thermostat system. Okay, so from an architectural standpoint, what do you guys talk to customers about? What ultimately is the end game for your customers? Because you have customers that buy Hortonworks support and customers that use the free open source. You have multiple business-model constituents, but at the end of the day, there are apps coming out of it; there's something going on with the data. What do you look at as the key decision point for Hortonworks? As you look at the architecture, you have all these new evolving dynamics, data in motion, in flight, and you've got the core data platform, and you've got the push with Spark, and you've got the cloud power underneath. The pressure at the end of the game is coming from digital transformation. People need more apps faster. We're gonna have data engineers, we'll have data scientists; that's still gonna go on and go strong. We see that not stopping for sure, but the pressure to get the digital transformation stuff out the door is what everyone's focused on. Yeah, I think a big thing that we'll lead with, with new prospects, is here are the use cases and the business opportunities that we can go solve with this technology stack. And that is really appealing, right? Because to your point, and everybody's seeing it, right? There's all of this data coming from all different directions. 
It can be very hard to consume, it can be very hard to understand and manage, especially considering the new paradigm of how the data are created and how they vary so dramatically, combined with the budgetary pressure that everyone has in their business. So we'll open up with: here's how we can help you down this transformation. Oh, by the way, if you look at use case one, use case two, use case three, use case four, they all have some commonality in terms of platform. So here's how we want to look at the broader data architecture, so that not only can we go build out the use case that you're thinking about, but we're ready when you're ready to go build out the next three. And we're also ready to future-proof your business in terms of how we see data growing as we enter into the IoT space. And just, I know George has got a question, I can see him getting ready to go in there, but I wanna just bring up that point about future-proofing and also being use-case-driven. A lot of integration now comes to the table. That's another key conversation that's come up on theCUBE this year. Again, it's always kind of in, but more than ever you're hearing about integration. Coexistence, retrofit: these are the trending words if I did a tag cloud on theCUBE. It's Hadoop, retrofit, coexistence, integration. That means there's this progression going on in the technology, because when you get to the integration conversation, that means there are real stakes on the table that need to be addressed. Yeah, I mean, when we talk about future-proofing, I think there are a couple of aspects to it, right? One is having the right technology stack and tools and all those kinds of things. But the key thing there, you know, from now for the next five, 10, 15 years, as far as my crystal ball is good, right? 
I think real future-proofing means that you've got an ecosystem of coexisting and co-resident technologies that are integrated and that are flexible enough so that, as new analytics come along or as you change your business model, you can change your mind without ripping it all out and starting over. And that's really what future-proofing is about. So it's less, you know, where we were in the 80s and 90s with centralized systems. Take a little piece and leverage out of it, basically. Yeah, it's not that everything's coming into a centralized system anymore, but it's a recognition that there is going to be an ecosystem, and success and future-proofing in an ecosystem mean seamless integration, seamless operational capability, and the ability, frankly, to change your mind. The ecosystem is a very key word. Again, community drives everything. This is now community-driven. We talk about open community, and it's open community as it relates to the Apache Software Foundation and the development of software, but also open community as it relates to the ecosystem of partners and how they share and integrate and interact. And the role of corporate enterprises in this ecosystem is just off the charts. I mean, there's record, you know, participation; certainly IBM, you see what they're doing, and that's really the new normal. Yes. So let's come back to the apps, because I think everyone agrees that you, Cloudera, MapR have helped educate, you know, what is this new data platform. But what seems to be less clear is: where are we gonna get the applications from? What's the scope of an app? Is it, you know, telco churn? Is it something perhaps big, like cybersecurity? And, you know, that seems a little fuzzy, because we haven't really had broad end-to-end cybersecurity apps. So where do they come from? What do they look like? Who delivers them? I think they can be delivered from anywhere. They can be delivered by any vendor that participates in the ecosystem. 
So, and even we can help, you know, co-invest at times, potentially, in helping to drive that application development. Where they're really gonna come from is business use cases, business pain points, and competition in the marketplace. That's what drives everything that we do, right? Analytics are important, especially in a world that's highly competitive, where an analytic differentiation can maybe change the result by one percentage point, and that's significant, right? In highly competitive industries. Certainly, moving the dial in cybersecurity, there's a lot of interest there. There's a lot of community backing behind finding different kinds of solutions. Having the capability to store almost infinite amounts of log data, and with HDF having the ability to transmit and transfer and apply models to that in real time, I think there's an opportunity for that kind of application. So I think you'll see a number of horizontal applications that are applicable across any industry where there's kind of a common problem. And I think we're already starting to see some vertical applications in vertical spaces, whether it be fraud detection and prevention in financial services, whether it be integrated healthcare, whether it be highway safety, even autonomous cars. You can think about autonomous cars as ultimately an integrated data application. Let me ask one. So it sounds like some of these are traditional ISVs, and maybe some of them are SIs that become ISVs. Some will straddle, but if we have... And even customers themselves. We want to make the tooling easy enough that application development is not constrained to application developers specifically. I think there will be a whole range of different ISVs and folks creating these apps. Obviously on the show floor, we see a number of folks showing up with some of these things. 
I think also the goal is to actually make it easy enough that it could be developed pretty much by anyone. Let's talk about where the platform lives, because one of our analysts, David Floyer, who's our sort of master of infrastructure, has looked at edge computing and talked about how, with IoT, we can't bring everything back to the cloud. And there are gonna be things that live at the edge that are occasionally connected. Do we need a data platform for some of these things? An example might be a submarine or a truck, something that hasn't got a full high-bandwidth connection back to the center, but that's more than just a collection of a small number of sensors. Sure. I mean, this is one of the things that we recognized with HDF and the technology we're delivering today: some processing is gonna exist at the edge, some processing is gonna exist at the core, and sometimes even midway along the way. There may be tiers, right? I may have multiple sensors that report into an aggregator sensor that reports into another thing, and so on. The notion here is that the topologies may be very deep and sometimes very complex. So it's important to build out tooling that enables that flexibility for deployment and delivery. So, to your point: simple decisions that can be made at the edge, great. Let me push that logic out to the edge from whatever analytic I've run, let it run there, and then manage by exception as a way to optimize the bandwidth and the data communication. And maybe after I see a certain amount of summary information coming in from the edge, I look at that and say, wow, a change is needed. Let me go get some more detail, which may still be stored at the edge, up to whatever capacity is available there. Pull that in, run the analytic, and actually send some information back that retools how those sensors actually behave. 
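The manage-by-exception pattern just described can be sketched as: keep full detail at the edge (up to whatever capacity is available), and send only exceptions and periodic summaries over the constrained link. This is an illustration of the idea, not HDF code, and every value and name is made up:

```python
from collections import deque

LOCAL_CAPACITY = 1000
local_store = deque(maxlen=LOCAL_CAPACITY)   # detail retained at the edge
uplink = []                                  # what actually crosses the network

def at_edge(reading, limit=25.0):
    """Store everything locally; only exceptions go upstream immediately."""
    local_store.append(reading)
    if reading > limit:
        uplink.append(("exception", reading))

readings = [21.0, 22.5, 26.1, 23.0, 27.4]
for r in readings:
    at_edge(r)

# A periodic summary replaces raw transmission: one tuple instead of N readings.
uplink.append(("summary", len(local_store), sum(local_store) / len(local_store)))
print(uplink)
```

Here five readings cost the uplink only three messages; the core can still request the detailed readings later because `local_store` holds them, which is the "pull that in, run the analytic" step mentioned above.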
And that's where I get into the notion of point-to-point, bi-directional communication being extremely important, and being able to enable that kind of self-healing network and application. Scott, I want to get your thoughts in the last couple of seconds here of the segment: your thoughts on this year's show, and looking forward to Dublin, without revealing all the cards for what you guys got going on in store. Are we seeing enough machine learning? Are we seeing enough tech? Are we seeing more outcomes? Are you happy with what's happening in the industry right now? Things that cause you to pause? Are there things popping up that are surprising? What are your general thoughts about the state of the market right now? You know, like I said at the very beginning, I definitely think that we've moved from kind of the origins, to the curiosity, to the more mainstream, to the tipping point. I think that's a very important trend. And while that's happened, I think one of the things that we've collectively been able to do as a community and an industry is actually accelerate the rate and pace of change. And I think that's a first in technology, because normally, as with human beings, as things mature, they slow down. As technology matures, it gets a little bit slower, more robust. I think the amazing thing that we've been able to pull off, which is going to continue to drive this market, is that we've actually accelerated the rate and pace of change while doing all of that maturing along the way. Okay, great. A lot of stuff going on. The market's growing and it's big. Peter Burris yesterday presented: the market is just exploding, and no one really has that 10% yet. So huge opportunity. You guys are enabling that at Hortonworks. Thanks for sharing your insight. Looking forward to seeing you in Dublin in two weeks; theCUBE will be frothy with Guinness and Hadoop. It'll be drinking in the data in Ireland. 
Looking forward to being in Ireland for Hadoop Summit. Watch theCUBE in Dublin in two weeks. We have more coverage coming up here live in Silicon Valley for day three of theCUBE's coverage of Big Data Week, which is Big Data SV and Strata Hadoop. We'll be right back.