Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering Big Data SV 2016. Now your hosts, John Furrier and George Gilbert. Hey, welcome back everyone. We are here live in Silicon Valley for Big Data SV, Big Data Week, Strata Hadoop. This is theCUBE, SiliconANGLE's flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier with my co-host, George Gilbert. We have two great guests here: Matt Morgan, VP of Product and Alliance Marketing at Hortonworks, and Anna Young, who's the Director of Product Marketing, Emerging Products at Hortonworks, a new group bringing all the new, cool stuff. Welcome to theCUBE. Great to see you guys again. It's good to see you too, John. Okay, so I've got to get into the meat and the bone here with Hortonworks. As you guys know, obviously, this is the Hadoop show. And of course we'll be at Hadoop Summit in Dublin with theCUBE. So, tons and tons of Hadoop action going on. But also the ecosystem, Matt, is really booming. You're starting to see the ecosystem build around Hadoop, where there's a lot of stuff going on around Hadoop, right? And in real time, we see Spark certainly capturing everyone's attention. You're seeing the role of data and developers again, front and center, and the cloud kind of underneath. You guys are at the center of that with the leading distribution of Hadoop. So talk about some of the dynamics in the industry real quick around the relationship between Hadoop and the ecosystem. Okay, that's great. Great entry to this conversation, John. We are looking at a pretty significant shift. If you were to back up the clock about 18 months, there were pilot projects breaking out all over the Fortune 500. The type of projects we were implementing were point solutions for individual Fortune 500 companies that were just trying to get comfortable with the enterprise capabilities around Hadoop. I'm talking about security, ops and governance. They wanted to check those boxes. They understood the capabilities, but they weren't gonna bring a rogue system into their operation and create havoc. Today, we have moved so far past that. We're moving beyond pilot implementations. There's almost no pilot implementations now if I look across the Fortune 500. We're talking second- and third-stage implementations. So you take organizations that are in manufacturing, organizations that are in energy, organizations that are in retail. They're now building what we call modern data apps, which are not just analytic apps, but different types of applications than we've seen in the past that add business value by doing things that frankly we never saw possible just five years ago. And the application developers always wanted this, the data's at the center of the value proposition, so I'm just gonna get to the emerging products conversation, I have a specific question on that, but really the data layer is now the glue, it's the new middleware. That's what we're hearing and seeing, that's kind of our anecdotal positioning of it, but it goes beyond that. You guys have had a data platform for a while. You have a new approach, I wanna get into that, but really the key that we're seeing is this unlocking of the data value, and that comes from decoupling. So this notion of decoupling from different components that may or may not be in the enterprise.
So versus having something tied to it, I have to have this for that, all these contingencies caught up in a loop of requirements, versus just, I have the data, I have to make it frictionless. That's what customers are looking for. Is that something that you guys see and agree with? How do you guys look at this data layer as a glue, this middleware, and what's Hortonworks doing there? Okay, so let's talk a little bit about the different types of buyers. They all have different criteria, right? So we'll talk specifically about the line of business owners. So we see increasingly the lines of business coming to the table with the dream of having these different applications adding this enormous value. They don't want to discuss the complexity of it. They just want to have the application delivered, right? So when it goes back to the question of what type of input we are having in terms of building out that type of platform, we are being driven by the line of business to be able to abstract the complexity and deliver a complete solution, right? So you see that in the new platform that we announced on March 1st, this connected data platforms concept, where we can say, listen, there's a lot of complexity and plumbing that goes into building an end-to-end data architecture. We want to make that super simple, because we want to give you a single platform that can reach all the way from the edge, all the way to the data lake, for data that's in motion and data that's at rest. And we want to give this as a platform for your developers to, again, build these unbelievable apps that can add that business value, because we see today that every business is in fact a data business. There's not a single company today that can actually go through and close their books, plan their next quarter's strategy, or process their sales pipeline without the use of big data. And I want to get your thoughts on this as an emerging products person, because with IoT out there, you have all kinds of new use cases now hitting the table that are throwing off data. And it's a new kind of data, in flight as you mentioned. So you have the systems-of-record stuff. You put stuff into the databases, which has its own kind of opportunity. But now the new class of data is stuff that's moving around. It could be trickling in off IoT devices. It could be wearables. It could be something else. It could be machines. A variety of very rapidly flowing, flying data, data in flight. I love that word. So data in flight, that's a big thing now. So how do you guys fit that into the architecture, because you want to connect it but you also don't want to make it more complicated for the other data? Definitely. It's not more complicated, but I think you want to make it more frictionless, like you were saying. So definitely you want to make that data in motion more frictionless as it's coming in, and that's part of our announcement that we put out today about the log analytics optimization. A lot of the data in motion that's coming in from the edge is machine data, all sorts of machine data. If you just consider web servers, there's classic machine data, log files and things like that. But there's also things like your wearables, your Apple Watch, things like remote sensors, the IoT that you're talking about. All this data is in motion, but moving it all back to a centralized data lake, say, is not necessarily efficient and not necessarily very smart.
So you don't want to move it all in, you want to take what you care about. So this is really the concept of edge analytics. The edge analytics, data in motion, you manage it at the point where it's perishable, this whole concept of perishable data at the edge. You decide at the point of receiving it, do I want this or do I not? Do I send it or do I not? Do I pay the transport cost to move it all? Because moving it all, it is a deluge, and you don't want to be moving terabytes of data from the edge of your IoT network into the center. So that's definitely part of what we're announcing today, optimizing the analytics from the edge of the network. What is the specific announcement? What's the hard news on the announcement? Yeah, specifically for HDF, which is Hortonworks DataFlow, release 1.2, which is coming out tomorrow actually. There's an integration specifically with certain log analytics systems such as Splunk, Sumo Logic, Graylog, different log analytics systems that are very popular. A lot of the data that's being moved is very complex. It is very difficult to capture all of it. And HDP and HDF together allow you to capture all of that in a very cost-effective, very efficient way, using content-based routing, which is very intelligent edge-based routing. Let me drill into that a little bit, because Splunk sort of made its name with machine data, but it was primarily in the data center. And you're moving this sort of collection capability and analytics, it sounds like, closer to the edge, with connections to Splunk and similar products. So where is the data originating? Like, is it Splunk software now running on intelligent sort of devices, or sort of branch, I don't know if I'd say data centers, and your NiFi product, the sort of data-in-motion product, is that what's selectively bringing this data back to the center? Correct. So there's definitely an aspect of moving data from a traditional source like a web server or, say, a network device of some kind, where a Splunk product would sit; we would help aggregate that data and help transform or even kind of filter that data for what's really necessary before sending it on. There's also the aspect of Apache NiFi, because it has a very small footprint and can run on any JVM-capable device, so it could go out to smaller devices that, say, an existing log analytics system would not support. So it does open your world, technically, to the IoT side of things. So if someone's trying to get a mental picture of how this works, I mean, in the past the whole idea of machine data really existed just in the data center, so it's pretty much up in your company's cloud, virtual cloud, or in the public cloud. So help us draw the topology: at the very, very edge, where you say NiFi might be running on a JVM, then who's collecting and refining and filtering some of that data before it gets passed along up to the public cloud? Yeah, actually the Apache NiFi aspect of Hortonworks DataFlow, it's one of the components of Hortonworks DataFlow, would be providing that collection capability and that aggregation, as well as the content-based, I could almost call it content-based filtering or content-based routing. So it intelligently would know. The classic example of an IoT-type thing is I have a keep-alive signal and I have an I'm-on-fire signal, and you can prioritize between the two which one you're gonna send.
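To make that prioritization concrete, here is a minimal sketch of that kind of content-based routing at the edge. It is hypothetical, not HDF or NiFi code: the event names, the send() stub, the summary record, and the "urgent supersedes routine" rule are all assumptions made for illustration of the idea described above.

```python
# Hypothetical sketch of content-based routing at the edge, in the spirit of
# what a routing processor at the collection point would do. Event names,
# priorities, and the send() stub are invented; this is not HDF/NiFi code.

ALWAYS_FORWARD = {"fire_alarm", "overheat"}   # perishable, high-priority events
ROUTINE = {"keep_alive"}                      # low-value chatter

def route(events, send):
    """Decide, at the point of collection, which events are worth the transport cost."""
    urgent = [e for e in events if e["type"] in ALWAYS_FORWARD]
    if urgent:
        # An "I'm on fire" signal supersedes routine keep-alives in the same batch.
        for e in urgent:
            send(e)
        return
    # No urgent events: forward only a summary of the routine traffic.
    keep_alives = [e for e in events if e["type"] in ROUTINE]
    if keep_alives:
        send({"type": "keep_alive_summary",
              "count": len(keep_alives),
              "last_seen": keep_alives[-1]["ts"]})

# Example: a keep-alive followed five minutes later by a fire signal.
if __name__ == "__main__":
    batch = [{"type": "keep_alive", "ts": 1000},
             {"type": "fire_alarm", "ts": 1300}]
    route(batch, send=lambda e: print("forwarding", e))
```

The design choice the sketch illustrates is simply that the decision happens where the data is born, so only the events worth the transport cost ever leave the edge.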
If your keep-alive comes in earlier and your I'm-on-fire signal comes five minutes later, don't send the other stuff, it's not really worthwhile. So definitely there's an intelligence at the edge that you need. So I love the value proposition. You kind of have two separate components. How do they come together, and what are the use cases, Matt? I mean, because you're talking about putting this into practice. What are some of the use cases? I know you have a prop here. Yeah. Let's look at the data you have. So what is this, this is a use case, right? Yeah, so let's talk a little bit about the two classifications of use cases. So as Anna articulated, there's a drive around innovation, but there's a much cleaner, easier to consume drive in the early adoption stage around renovation. Renovation's pretty simple. You're taking architectures that are proprietary in nature, that are high cost, that only speak structured data, and you're replacing them with open solutions that speak all types of data. You're bringing your costs down by 90%, and as a result, organizations that embrace that are able to actually afford to keep online active data repositories north of 500 times larger than they could before. But what's really exciting about the renovation side is once you have that platform in place, you really can start to bring online these use cases that I think you were referring to. The concept that we like to convey is that there's a tie between renovation and innovation. Innovation allows you to seize the art of the possible. The conversation around innovation usually falls in three buckets. Data discovery, which is about exploring all of this data, whether it's perishable insights from data in motion or historical insights from rich data at rest. You have single view use cases, which are about creating a 360-degree view of an entity so you can make smarter decisions. A great example of a single view use case is Mercy Hospital. Mercy Hospital had three large Epic implementations. None of them spoke with each other. They couldn't capture the information. They couldn't correlate it. So the finance team had no opportunity to see exactly what the operations team was doing, and vice versa. So different dashboards, basically. Different dashboards. So this is just one example. This is a massive-scale example, but they now have the ability to bring all of that information together, so when they have a patient they can have better healthcare, they can have better context-sensitive records, and they can have better conversations on the finance side, all of it adds value. I have retailers with the same conversation. The same person walks into the retail store. They wanna know if that person was on the website yesterday and what they looked at, so they can make better offers. Single view use cases are super simple, easy to grasp. Predictive analytics. This is when you start to really have fun with big data. A good IoT use case for predictive analytics is Progressive. Progressive has Snapshot, which plugs into your car. It's an opt-in service. They have the ability to basically create dynamic actuarial tables where they can offer you better discounts, and this opt-in service is now a $2.6-plus billion business. And you have reference accounts on all of these three things. It sounds almost too good to be true. Yeah, these are amazing. But I think the net-net about the type of use cases we're talking about here is that they are being manifested by a different type of software application.
What we say internally at Hortonworks is: what happens after software eats the world? Well, data comes along and data becomes what defines things. We used to say software-defined networks, software-defined storage. We're now moving to this data-defined layer, where the data-defined world is manifested by these modern data applications. And Hortonworks' point of view is that the data-in-motion side and the data-at-rest side come together to facilitate this. And when you do both together, you really gain a different kind of intelligence. We call it actionable intelligence, where you can make decisions on actions that are in flight along with rich historical information that'll give you the context to be sure about the decision. So, on the board, I wanna ask a question on the board if you can pull that back up again. So the innovation I get, that's the art of the possible. You have use cases on that. That's, you know, you're selling the dream, but you're actually implementing it now with use cases. Data discovery, single view, predictive analytics. That's where everyone wants to get to. I totally understand the appeal of that. Renovation now, I'm trying to understand the renovation side. Are you saying that that's the impact of the innovation? So in other words, the renovation is what has to get done. I have a better prop for that. So, okay. Let me switch to this one. Well, it had 90% reduction in cost. Oh, you like that one. I like that one. It got my attention. Well, let me finish on this prop and then I'll switch. Yeah, 90% reduction in cost. That's an impact of the renovation. So I'm gonna renovate and actually lower my cost. That's like putting an addition on your house at like half the price. Well, you know, that's a great point, right? If you look across technology, usually the big-impact tech has always cost more than the legacy systems because it added more value. This is one of those unique cases where it's actually a lower cost at scale and a lower price point. When you're dealing with open solutions, you're able to add enormous value. And the technology behind Hadoop was created at Yahoo to catalog the contents of the internet. It was designed for that kind of scale. So yeah, it doesn't make sense. The traditional technology vendors usually charge more for the increase in data. So you have a 500x increase in data. Yes. 100% online and available, and then all the data types are available. Yeah, so let me hit the renovation part again, right? When we say renovation, we're talking about looking at your current data architectures, the legacy data architectures, rethinking those, driving new capabilities to store unstructured data, new capabilities to manage data in motion, new capabilities to have all of this online all the time, and bringing your cost curve way, way, way, way down. In fact, we see it all the time. So look at TrueCar, right? They were able to take their costs down by north of 90% by simply replacing a structured system, with high license costs at scale plus proprietary hardware, with a commodity hardware approach using Hadoop. They were able to bring that cost down dramatically. As a result of that, they keep more data online because the costs aren't there. You see organizations that embrace Hadoop that can literally have 500 times as much data online and available. We call this active archiving, where people used to turn to tape backups as the only economical way to store data. They can now keep it online and available. That's the type of capabilities you get. We have a few more minutes.
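As a rough illustration of the actionable intelligence idea Matt describes, an event in flight enriched with rich historical context at decision time, here is a minimal sketch. It borrows the retailer example from earlier in the conversation; the store names, fields, lookup table, and offer rule are all invented for illustration and are not anyone's actual application.

```python
# Hypothetical sketch of "actionable intelligence": an event in flight
# (a shopper walking into the store) is combined with historical context
# (what they browsed on the website yesterday) to act right now.
# Names, fields, and the offer rule are invented for illustration.

BROWSING_HISTORY = {  # stand-in for a data-at-rest store in the data lake
    "cust-7": ["running shoes", "trail socks"],
}

def on_store_entry(event, history=BROWSING_HISTORY):
    """Decide, while the shopper is still in the aisle, whether to push an offer."""
    recently_viewed = history.get(event["customer_id"], [])
    if recently_viewed:
        return {"action": "push_offer",
                "customer_id": event["customer_id"],
                "related_to": recently_viewed[0]}
    return {"action": "none", "customer_id": event["customer_id"]}

if __name__ == "__main__":
    print(on_store_entry({"customer_id": "cust-7", "store": "san-jose-01"}))
```

The point of the sketch is the shape of the decision, not the specifics: the in-flight event alone or the history alone is not enough, but the two together let you act in the moment.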
I want to get that other prop in here, open that up. Make sure you get that in there. So what is this? All right, so let's talk a little bit about this. This is something we call the placemat. The placemat helps us have a conversation with our customers around the different use cases across both innovation and renovation. Innovation is across the top line; these are the three use cases we just talked about. Data discovery, the single view, the predictive analytics. That's the Progressive, that's the Mercy, that's the retailer case studies. The bottom line is renovation. So these are the use cases that an IT person would love to talk about. These are use cases that they simply can't deliver on using a legacy approach. So this is taking the active archive strategy we just talked about, keeping data online at a much lower cost. This is about taking ETL workloads and transitioning them into Hadoop. This is about taking specific EDW architectures they may have and optimizing them to be able to store unstructured data and having that as part of their analytics. These are the type of conversations that we're having. We created this because it was a little abstract, John. People didn't know where to start. In our conversations, you can start anywhere on this line, but we actually created a nice scale of complexity across the top: how many more people do you have to get on board to bring this online? So anyway, it's a really popular view of the use cases. Following up on that, when you separate out renovate as being appealing to the IT-oriented user and innovate maybe more to the line of business, is one more repeatable than the other? Is it easier to package up and more frictionlessly deliver? Like ETL offload, it sounds like something you can explain and stamp out for every customer, except that their data warehouses are going to be different. Whereas on the innovate side, predictive analytics, you can build a model with maybe a recommender, except that all the inputs are different. So where does each of these fall on repeatability and ease of implementation for the two types of customers? Okay, it's a great question. Across the top, we talked about the innovation line. If you remember, I mentioned earlier that we're seeing the breakout of these modern data apps. Now these can be custom built by the end customer or off the shelf. The reason I articulate that is that's the barrier to value here. Right, and that's my question. A modern data app that is built either custom or off the shelf can add this type of value, but there's more effort in that, because there's an app that either has to be created or has to be brought online by a third party. Across the bottom, there's none of that complexity. You're taking architectures that exist, that you're spending money on, and you are either adding these open solutions, open and connected data platforms, or you're replacing them altogether. Typically in the ETL conversation, people are augmenting an enterprise data warehouse so they can get their processes underway. I mean, if you think about it, the tech is only one area of investment. There's internal processes. There's internal comfort. There's internal people that have been trained. This is an area that works its way through in the normal course of time. I'd love to get a copy of this placemat. It's really good to have, and it looks well done. Congratulations on that. That's a good layout, because it's complicated and it lays it out. Love the board. Love the reduction in costs.
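For readers trying to picture the single view use case on that placemat, records about the same entity coming from systems that never spoke to each other, here is a minimal sketch. The source names and fields are hypothetical, loosely echoing the Mercy and retailer examples, and are not any customer's actual schema.

```python
# Hypothetical sketch of a "single view" join: per-system records about the
# same entity (a patient, a shopper) merged into one 360-degree view.
# Source names and fields are invented for illustration.
from collections import defaultdict

def build_single_view(sources):
    """Merge per-system records into one view keyed on entity_id."""
    view = defaultdict(dict)
    for system_name, records in sources.items():
        for rec in records:
            # Namespace each attribute by the system it came from,
            # so finance and operations data sit side by side.
            entity = view[rec["entity_id"]]
            for key, value in rec.items():
                if key != "entity_id":
                    entity[f"{system_name}.{key}"] = value
    return dict(view)

if __name__ == "__main__":
    sources = {
        "clinical": [{"entity_id": "p123", "last_visit": "2016-03-01"}],
        "finance":  [{"entity_id": "p123", "open_balance": 250.0}],
        "web":      [{"entity_id": "p123", "viewed_yesterday": ["cardiology"]}],
    }
    print(build_single_view(sources))
```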
Thanks for sharing the input. And I want to get to your final comment here. What's going on with emerging products? What kind of stuff comes out of your group that we can expect to see? What cool stuff's coming? Obviously, the data-in-flight stuff, you've got that now. But what else is coming around the corner? I guess I'll tell you that it's the integration of the data in motion and the data at rest into the concept of a modern data app, so specific modern data apps for different purposes. Every company you can think of, if you think about these, if I'm a, sorry, I'm going to hold this up. If you're doing payment tracking, if you have payment tracking information as data at rest versus payment tracking information with both data at rest and data in motion together, as Matt is saying, you open up a whole new world of possibilities. So emerging products is about combining the two worlds. So any concept here, if you look here at predictive analytics especially, you want to know what's going on in the future. If you want to plan for it, you want to know what's happening, but you want to know what's happening in real time. Knowing what happened three months ago doesn't help you plan for the next 10 minutes. You guys are like a dedicated team, just kind of keeping an eye on what's popping up, what's growing out of the ecosystem, what use cases you might want to jump on and double down on. Is that kind of the concept? Yeah, definitely. We're looking at the far right edge, the transformational type of modern data apps, because that's where the strongest value is. And in some ways, delivering the simplicity that you need from a modern data app, because we take out the complexity by providing a modern data app that encompasses everything you need at once. So we make it easier for you. So definitely, it's making the complex simple. And that's what the emerging products group is looking at. All right, Matt, thanks so much. Hortonworks here, making it easier, one step at a time. Hadoop World, Strata Hadoop. This is theCUBE, live here in Silicon Valley. We'll be back with more after this short break.