From San Jose, in the heart of Silicon Valley, it's theCUBE, covering Big Data SV 2016. Now, your hosts, John Furrier and Peter Burris.

Welcome back. We are here live in Silicon Valley at Big Data SV. It's Big Data Week here in San Jose, with Strata + Hadoop World, Big Data SV, and theCUBE. This is our flagship program: we go out to the events and extract the signal from the noise. I'm John Furrier, with my co-host Peter Burris, Head of Research at Wikibon and SiliconANGLE Media. Our next guest is Wei Wang, Senior Director of Global Marketing at Hortonworks. Welcome back to theCUBE, great to see you.

Thank you.

So the data platform is the battleground in the industry right now, but it's also where customers are seeing the most value, because that's where the action is in terms of people building real solutions. The doers and the digital builders out there are looking at data platforms, and they want the most enabling, most reliable, and well-supported platforms, because they have apps to run on them. This has been a big part of the Hadoop ecosystem, but it's now expanding beyond Hadoop. So I want to get your thoughts on Hortonworks' current data platform. You guys are really looking at the IoT thing, which we talked about in New York. How is that connected to other data? What's the update on the platform side from Hortonworks?

Yeah, it's a great question, actually. As you guys know, data used to double in a century; now it doubles every couple of years. The leading industry analysts expect data to expand to 44 zettabytes. You and I had a conversation back in New York at Strata last time about the acquisition we had just made, Onyara. We are truly breaking into the field of what we call data in motion: looking at data that is in flight, with tools that are 100% open source, built on the Apache NiFi project. So yes, you're right, Hortonworks has been known for offering a Hadoop distribution.
We offer our Hortonworks Data Platform for managing what we call data at rest, basically historical insights. Now, with the new members of our product family, we offer not just data at rest but also data in motion. So think about those 44 zettabytes of data.

Let's make sure we get the camera on this. You have a chart here, so you're going to take us through an example of some of the data in motion. Let's get a shot of that. Okay, good.

Yeah, sounds good. So, as we talked about, data is really growing exponentially, to 44 zettabytes by the end of the decade. That's the prediction. The traditional platforms, as you mentioned, the platforms that corporations are familiar with, are just not capable of scaling up to that volume or to today's usage. So the race is truly on, right? All the corporations are now thinking about what next platform they can get deployed and up and running to provide actionable intelligence, so they can win the next generation of customers, provide the next generation of services, and gain new efficiencies. Think about the data flying by, the data in motion: it could be on your Fitbit, on your iPhone, or sensor data from an oil rig. Now almost 50% of data is what we call data in flight, or data in motion, and the other 50% is data at rest. So data scientists are being challenged to find point solutions that are custom built just to give them some insight into it.

So you had a lot of... I love the motion. Let's just double-click on that one thing. Are data in flight and data in motion the same thing, or part of the same thing?

Yeah, we call it data in motion. Data at rest is essentially systems of record.
You've got to store it somewhere, in whatever format you're interested in: structured databases, right? The database tables we're familiar with.

So you have a little box around the legacy big data platform. That's basically saying, okay, if you have legacy, you could probably use that. Is that what you guys are doing?

Absolutely. We're saying that you're going to be at a slight disadvantage, because you're looking only at data at rest, right? You were probably a pioneer five or six years ago and really jumped with both feet into the big data world. You have some kind of big data platform, a Hadoop platform you're using. But at the same time, the world has truly moved on. You have new challenges you have to face, and how are you going to solve them?

You know, I always say, you guys have words like data flow, which is a trademarked term, with Hortonworks DataFlow and Hortonworks Data Platform. And the joke is, just when you think you're complete, new data comes in.

That's right.

Fast data, or what we used to call the data ocean, the data lake. There's a lot of action going on, especially around IoT. So I want to get your thoughts on this. It's really a philosophical thing, and I want to find out where Hortonworks lands on it. Some people believe that you should couple the data platform with devices or systems, and some think you should decouple the data platform from devices and other subsystems. What are your thoughts on that?

I truly think it's all connected, right? That's why we now call it a connected data platform: a platform that has both data at rest and data in motion. Let me quickly finish up here so that I can show the chart. Connected data platform, right? It's our offering: we have offerings for data in motion and offerings for data at rest.
So let me tell you why it's important to have one platform that actually connects the two openly. I'll give you an example, right? Everybody knows Progressive Insurance. Progressive offers its drivers an opt-in Snapshot device that they can plug into their cars. Before, the sensor could gather the information, basically IoT information, on the fly. That's the data in motion part; it's streaming in. But because they didn't have a data at rest platform that allowed them to analyze it, they could only gather 25% of the data hosted in the traditional system, and it took them five to seven days to analyze the whole data set. So now you actually have a loop. Think about the data streaming in: you need to generate some actionable intelligence immediately. We call it perishable insight, right? Are we going to do something about this in the next four hours, six hours, 24 hours? Then you move the data into a data at rest platform to do long-term analysis, to do predictive analysis. And what's the insight? The insight you get is whether or not you want to give this driver a discount based on their driving behavior, because Progressive's data at rest system actually holds 10 million driver-miles of data. What comes out of it has to loop back into the data in motion part. So I think it should be a connected data platform that is completely open and that also has security, governance, and operations for you.

So in the picture, though, if I'm a customer, I say, okay, I love this idea; data in motion is super strategic because I have IoT and other things going on, whether it's mobile data, user data, social data, whatever it could be. If I have a legacy set of storage or systems, that's not going to change what I do on data in motion. It's kind of decoupled from the platform, if you will, yet connected.
So connected, yet decoupled, gives me freedom. Is that kind of what you're saying? If I'm a customer, I can do that?

Yeah, you're definitely right. And I also want to emphasize that it's a continuous feedback loop. What you get in data in motion can immediately feed into the data at rest system. Then, as we use Spark and other tools for predictive analytics, the outcome can immediately feed back into our data flow platform and give additional feedback to the end system, to your Fitbit as well as your iPad or iPhone, to give you some indication: given our prediction, you've got to slow down. You've got to do something. You've got to take action right now. So it's a continuous loop.

So we've learned a lot about how to utilize data in a variety of different ways and with a variety of different technologies, and Hadoop extended that pretty dramatically. We've also discovered that there are limitations to what some of these technologies can do, and that we have to complement them with new technologies, like those for data in motion. Are we also talking about new centers of expertise, or about the same group of people utilizing these technologies to solve different classes of business problems?

I have to be very frank: I think it's both. From what I see at trade shows, including this show and our own Hadoop Summit, it's the same type of people who are now really digging into the new technologies. They have to, right? That's their job. Their responsibilities have expanded, so they need to learn what data in motion is and how it applies to the traditional system. So even traditional DBAs now have new tricks up their sleeves that they can apply. At the same time, there's a brand new skill set being applied to these systems that allows you to do what was never possible before. So I would say it's a combination.
I've seen it, talking to people who have both the traditional skill set and the brand new skill set.

So your business model ties you intimately to your customers.

It has to. Absolutely.

Your business model is also helping to extend the characteristics and the technologies of the platforms that you're then distributing. How will data in motion attract new experts, new people, into that community, so that the community can solve problems even faster?

Yeah, that's a really good question, actually. I think there are two things about the community itself. One is really tied to the use cases. Think about corporations: right now, unlike five years ago, they are recruiting younger folks with new skill sets for completely open source projects, right? They are doing that. So they are bringing these people into the community, to be honest, before they've even graduated from college. On the other side, we have traditional folks who work, say, only with sensor data and don't have IoT expertise or even cybersecurity expertise. They come to a show like this week's, and to Hadoop Summit, to learn what is possible and what new use cases are enabled, and to augment the skills they currently have, making for a bigger, tighter, and more secure community. They're either bringing people into their own community, or they want to join a bigger and more sophisticated community like the one we have.

We have some CrowdChat comments from Madhu, one of our CUBE alumni in our community, who says we need to move the conversation from data at rest and data in motion to the value for the customer.
He goes on to say that enterprise customers want insights, and want to extract actionable intelligence to make decisions on the business front, leveraging data and machine learning to achieve this. One of the things you guys have in your platform is this notion of the data platform, then the data flow, and the intersection of the two is the actionable insights. Can you talk about that piece? Because that's where the action is; that's where the magic happens. And what's the impact for customers on the app side?

I think that what we call data in motion and data at rest enables modern data applications. A modern data application, as I mentioned, is either custom built or off the shelf, and it works almost like a little assembly line: you assemble it, and your connected data platform lets you pick and choose the functionality you need. Absolutely, this chat is correct: with a connected data platform, forget about talking about what data in motion and data at rest are. It's truly, as I mentioned before, about the specific use cases. Some use cases could be entirely on the data in motion side: I just have a variety of streaming data coming in, and I need perishable insight in the next 20 minutes to an hour. Forget about data at rest, because it doesn't quite apply to me; I'll worry about that tonight, when the data has flowed into my data at rest system. But right now, in order to get new, actionable insights, I need this.

You mentioned perishable data; I like that term. So that's a term you guys use for one of the use cases, the actionable insight. It's something real time, something in the moment. Is that what you mean by perishable?

Yes, like with a Fitbit or an iPhone. You're doing something and your heart rate is going too high. If that's streaming into a data in motion system, you've got to do something.
It's perishable because that intelligence won't apply to you, or won't be very useful to you, hours from now.

So to talk to Madhu, who's watching: Madhu, if I can break this down and put this package together, what we're saying here is that the environment consists of data at rest and data in motion. Those are the things that are happening, the external market forces in big data. IoT and other applications are where the action is from the customer's standpoint. Then there's the modern data application. Okay, so for the developers out there, the app builders, what does it mean for them?

Our philosophy, at least, is that within the technology we have, we're going to provide the most comprehensive, most open tool set for them to use, right? Just like with our Hortonworks Data Platform, everything we offer is 100% open source. Even on the data in motion side, Hortonworks DataFlow, built on Apache NiFi, is 100% open source. Our philosophy truly is that we're going to give developers all the tools out there that are open and useful.

So I'm going to come back to this notion of data in motion and data at rest, and the question that came in from the CrowdChat, which was a great question. Because you're right about perishable data. For me, when I'm exercising, which I do so very frequently, what my Fitbit is telling me right now is relevant to me right now, and it may not be relevant to me in two hours. But if I have a health event, then data that was perishable a second ago may in fact become extremely relevant, because the context has changed. So how do we ensure that what we called perishable, before we understood the role of the data, can also be employed in a new context, if I have to work with a physician or a healthcare provider to address a problem that emerged a couple of hours after I exercised?
Yeah, that's precisely why, when you guys asked me which one to lean into, I said it's an integrated system. It's a connected data platform, right? We gather the data from you and give you indications of what your action should be in the next couple of minutes; then the data automatically flows, through data flow, or Hortonworks DataFlow as a product, into our data at rest platform. So you can dig back in any time you want. Same thing with any kind of trigger event, say an oil rig fire. There's a fire, right? There wasn't a fire five seconds ago, so data that was not relevant to us 10 or 15 seconds ago has now become really relevant. So we go back in, look at the data that's already been collected back in the data center, do some analysis, and see what's happening.

Yeah, I think that's a really important point. Just one last thought on this: the idea, ultimately, is that the data will persist but the context will change. How the data gets employed, as a consequence of how different people come together to do different things, will be very fluid, but we have to make sure that the data moves to that moment or context in a way that allows us to apply, or utilize, multiple contexts on that data.

Absolutely.

Hortonworks had a secondary offering recently. It's a company that has been growing explosively, certainly in the post-Cloudera era. Cloudera was the solo player in the beginning; Hortonworks came on as a fast follower, and it was really a two-horse race between Hortonworks and Cloudera. So much has changed, and a lot of people have been speculating about the whole business model of Hadoop, but now that you guys have this connected platform, it goes well beyond Hadoop at this point.

Yeah.

We're seeing that.
I want to get your thoughts on that, so you can share that narrative with the customers who are watching, your customers in our CUBE community, because it's really going beyond Hadoop. That's one aspect of you guys, obviously a big player there, but this notion of having a connected platform is a big deal.

Absolutely, thank you for the question. You're right, we are a public company, so everything I say here you can probably find in our filings. We had over 800 customers as of the end of 2015. Our growth rate is phenomenal, and we continue to expand into a variety of industries. So absolutely, this is for customers who want to do what was never possible before, right? What I talked about with Progressive was never possible for them before. Now it's: how did I live without it? How did I actually live without the sensor data from my drivers for usage-based insurance? I couldn't imagine it before, but now I can't live without it. So for the customers out there, we continue to grow and expand. We already talked about our cybersecurity initiative; we are actually pioneering that initiative with Metron. So stay tuned: we're going to continue to expand our product and solution offerings, but the philosophy has not changed, right? We're completely open, we're all for the community, and we're very proud of that, just like Shaun Connolly said.

Yep. Well, you guys are great to work with, and we're looking forward to Dublin on April 14th. theCUBE will be there. We're flying to Ireland to have a few pints of Guinness, which is a touristy thing to do, but hopefully they have some good beer there, along with some great Hadoop conversations. Can you share any of the big announcements coming up at Hadoop Summit? Oh, I'm kidding, I'm sure you're not going to say.

No, I can't share anything, but we are going to have a very big announcement at Hadoop Summit in Dublin. Yes, please tune in.
I would definitely buy you a beer in Dublin, but no, I can't share anything.

It will be in an Irish accent, though. Coming up later today, we have a lot of great guests: CEOs, entrepreneurs, startups, and some public companies. Again, we're here live in Silicon Valley, extracting the signal from the noise at Big Data Week. Go to hashtag BigDataSV and hashtag BigDataWeek, and go to CrowdChat.net slash Strata Hadoop. We are here for Big Data SV in conjunction with Strata + Hadoop World. This is theCUBE; we'll be back with more after this short break.