Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. Welcome back to theCUBE's live coverage of DataWorks Summit here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Scott Gnau. He is the Chief Technology Officer at Hortonworks. Welcome back to theCUBE, Scott. Great to be here. It's always fun to have you on the show. So you have really spent your entire career in the data industry. I want to start off at 10,000 feet and just have you talk about where we are now in terms of customer attitudes, in terms of the industry, in terms of how customers feel, how they're dealing with their data and how they're thinking about their approach and their business strategy. Well, I have to say, 30-plus years ago, starting in the data field, it wasn't as exciting as it is today. Of course, I always found it very exciting. Exciting means nerve-wracking. Keep going. But you know, we've been predicting it. I remember even 10, 15 years ago, before big data was a thing, it was like, oh, all this data is going to come and it's going to be 10X what it is today, and we were wrong. It was more like 5,000X. And I think the really exciting part is that data used to be relegated, frankly, to big companies as a derivative work of ERP systems and so on and so forth. And while that was very interesting and certainly enabled a whole level of productivity for industry, when you compare that to all of the data flying around everywhere today, whether it be Twitter feeds or even live polls like we did in the opening session today, data is just being created everywhere. And the same thing applies to that data that applied to the ERP data of old. And that is, being able to harness, manage and understand that data is a new business-creating opportunity.
And we were with some analysts the other day, and one of the more quoted things that came out of that conversation was this: like railroads and shipping in the 1800s and oil in the 1900s, data really is the wealth creator of this century. And so that creates a very nerve-wracking environment. It also creates an environment of very agile and very important technological breakthroughs that enable those things to be turned into wealth. So thinking about that in terms of where we are at this point in time, on the main stage this morning someone likened it to the interstate highway system, which really revolutionized transportation but also commerce. I love that, actually. I may steal it in some of my own talks. It's good, as long as we know where you got it. Well, perhaps if data is the new oil, then the edge and containerized applications tapping microbursts of data across the Internet of Things are sort of like the new fracking. You're able to extract more of this precious resource. Hopefully not quite as damaging to the environment. I'm sorry, environmentalists, if I just offended you; I apologize. But I think all of those analogies are very true, and I particularly like the interstate one from this morning, because when I think about what we've done in our core HDP platform, and I know Arun was here talking about all the great advances that we've built into the core Hadoop platform. Very traditional: store data, analyze data, but also bring in new kinds of algorithms, rapid innovation and so on. That's really great, but that's only half of the story. In a device-connected world and a consumer-centric world, capturing data at the edge, moving and processing data at the edge, is the new normal.
And so just like the interstate highway system created new ways of commerce because we could move people and things more efficiently, moving data and processing data more efficiently is the second part of the opportunity that we have in this new deluge of data. And that's really where we've been with our Hortonworks DataFlow, saying that the complete package of managing data from origination at the edge, all the way through analytics, to a decision that's triggered back at the edge, is like the holy grail, right? And building a technology for that footprint is why I'm certainly excited today. It's not the caffeine, it's the opportunity of making all of that work. One of the key announcements for me at this show, that you guys made with HDP 3.0, was containerization of more of the capabilities of your distributed environment, so that these capabilities, in terms of capturing, analyzing and moving that data, can be pushed closer to the endpoints. Can you speak a bit, Scott, about this new containerization support within HDP 3.0, but really in your broader portfolio, and where you're going with that in terms of addressing edge applications, perhaps autonomous vehicles, or whatever you might put into a new smartphone or at the edge? Describe the potential of containerization to sort of break this ecosystem wide open. Yeah, I think there are a couple of aspects to containerization, and by the way, we're so excited about the cloud-first, containerized HDP 3.0 that we launched here today. There's a lot of great tech that our customers have been clamoring for that they can take advantage of, and it's really just the beginning, which again is part of the excitement of being in the technology space and certainly being part of Hortonworks. So containerization affords a couple of things, certainly agility, agility in deploying applications.
So for 30 years we've built these enterprise software stacks that were very integrated, hugely complicated systems that could bring together multiple different applications, different workloads, and manage all that in a multi-tenant kind of environment, and that was because we had to, right? Servers were getting bigger and more powerful, but not particularly well distributed. Obviously, in a containerized world, you turn that whole paradigm on its head and you say, you know what, I'm just going to collect the three microservices that I need to do this job; I can isolate them, I can have them run on serverless technology, I can allocate cloud servers to go run, and when they're done they go away and I don't pay for them anymore. So thinking about that from a software development, deployment and implementation perspective, there are huge implications, but the real value for customers is agility, right? I don't have to wait until next year to upgrade my enterprise software stack to take advantage of a new algorithm. I can simply isolate it inside of a container, have it run, have it go away, and get the answer, right? And so when I think about how a number of our keynotes this morning were talking about the exponential rate of change, this is really the new norm, because the only way we can do things faster is in fact to be able to provide this. And it's not just microservices, it's also orchestrating them through Kubernetes and so forth, so they can be quickly deployed as needed and then quickly deprovisioned when you don't need them anymore. Yeah, and then there's obviously the cost aspect, right? So if you're going to run a whole bunch of stuff, or even something as mundane as a really big merge join inside of Hive, let me spin up a thousand extra containers to go do that big thing and then have them go away when it's done. And only pay for it while I'm using it.
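The "spin up a thousand extra containers, then have them go away" pattern can be sketched in miniature. This is a generic illustration, not Hortonworks' actual YARN or Kubernetes machinery: `run_elastic_job` and `partial_aggregate` are invented names, and a short-lived thread pool stands in for ephemeral containers.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_aggregate(chunk):
    # One short-lived worker's share of a large job
    # (a stand-in for one container in this sketch).
    return sum(chunk)

def run_elastic_job(data, workers=4):
    """Provision workers, fan the job out, release everything when done."""
    # Split the input into one slice per worker.
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:  # "spin up"
        partials = list(pool.map(partial_aggregate, chunks))
    # Leaving the with-block tears the pool down. In a cloud setting,
    # this is the point where you stop paying for the extra capacity.
    return sum(partials)
```

The design point is the lifecycle, not the arithmetic: capacity exists only for the duration of the job, which is what makes the pay-while-using model work.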
And then you can possibly distribute those containers across different public clouds depending on what's most cost-effective at any point in time, Azure or AWS or whatever it might be. And I teased Arun that the only thing we haven't solved for is the speed of light, but we're working on it. Well, talking about this warp-speed change being the new norm, can you talk about some of the most exciting use cases you've seen in terms of the customers and clients that are using Hortonworks in the coolest ways? Well, obviously autonomous vehicles is one that captures all of our imaginations, because we understand how that works. And it's a perfect use case for this kind of technology. But the technology also applies in fraud detection and prevention, it applies in healthcare management and proactive, personalized medicine delivery, and in generating better outcomes for treatment. So, all across, in every aspect of our lives, including consumer well-being, increasingly, yeah, all across the board. And one of the things that really changed is a couple of things: a lot of bandwidth, so you can start to connect these things, and the devices themselves are particularly smart, so you no longer have to transfer all the data to a mainframe and then wait three weeks for your answer to come back. You can have analytic models running on an edge device, and think about it, that is really real-time, and that actually kind of solves for the speed of light, because you're not waiting for those things to go back and forth. So there are a lot of new opportunities, and those architectures really depend on some of the core tenets of, ultimately, containerization: stateless application deployment and delivery. And they also depend on the ability to create feedback loops, to do point-to-point and peer-to-peer kinds of communication between devices.
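The idea of an analytic model running on the edge device itself, so the decision fires locally in real time instead of round-tripping to a data center, can be sketched generically. This is not HDF, MiNiFi, or any real Hortonworks API; the tiny logistic model, its `WEIGHTS`, and the `edge_decision` threshold are all invented for illustration, as if the model were trained centrally and shipped to the device.

```python
import math

# Hypothetical coefficients, as if trained in the cloud
# and pushed down to the edge device.
WEIGHTS = [0.8, -0.5]
BIAS = 0.1

def edge_score(features):
    """Score a reading locally on the device; no round trip to the cloud."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))  # logistic output in (0, 1)

def edge_decision(features, threshold=0.5):
    # The decision is triggered right at the edge; only the (much
    # smaller) decision, not the raw data, needs to travel upstream.
    return "alert" if edge_score(features) >= threshold else "ok"
```

This is the feedback-loop shape described above: raw sensor data stays where it is generated, and only decisions or model updates cross the network.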
This is a whole new world of how data gets moved and how the decisions around data movement get made. And certainly, that's what we're excited about building with the core components. The other implication of all of this, and you know, we've known each other for a long time: data has gravity, data movement is expensive, it takes time, and frankly, you have to pay for the bandwidth and all that kind of stuff. So being able to play the data where it lies becomes a lot more interesting from an application-portability perspective. And with all of these new sensors, devices and applications out there, a lot more data is living its entire lifecycle in the cloud. And so being able to create that connective tissue, in the cloud and even on the edge, matters. On machine learning, let me just jump in for a second: one of the areas that we're focusing on increasingly at Wikibon, in terms of machine learning at the edge, is that more and more machine learning frameworks are coming into the browser world, in JavaScript, like TensorFlow.js. More of this inferencing and training is going to happen inside your browser, which blows a lot of people's minds. It may not be heavy-hitting machine learning, but it'll be good enough for a lot of the things that people do in their normal life. And you don't want a round trip back to the cloud; it's all happening right there in Chrome or whatever you happen to be using. Yeah, and so the point being, when I think about the early days of talking about scalability, I remember shipping my first one-terabyte database and then the first 10-terabyte database. Yeah, it doesn't sound very exciting now. When I think about scalability of the future, it's not going to be defined as petabytes or exabytes under management. It's really going to be defined as petabytes or exabytes affected across a grid of storage and processing devices. And that's a whole new technology paradigm.
And really, that's the driving force behind what we've been building and what we've been talking about at this conference. Excellent. So when you're talking about these things, how prepared are the companies themselves, and do they have the right kind of talent, to use the kinds of insights that you're able to extract and then act on them in real time? Because you're talking about how this is saving a lot of the waiting-around time. So is this really changing the way business gets done, and do companies have the talent to execute? Sure, it's changing the way business gets done. We showed a quote on stage this morning from the CEO of Marriott, right? So I think there are a couple of pieces. One is that businesses are increasingly data-driven, and business strategy is increasingly the data strategy. And so it starts from the top, with setting that strategy and understanding the value of that asset and how it needs to be leveraged to drive new business. So that's one piece, and obviously there are more and more folks coming to the realization that it is important. The other thing that's been helpful is that, as with any new technology, there's always kind of a start-up shortage of resources while people start to spool up and learn. The really good news is, for the past 10 years, I've been working with a number of different university groups. Parents are actually going to universities and demanding that the curriculum include data, processing, big data and all of these technologies, because they know that if their children are educated in that kind of world, number one, they're going to have a fun job to go to every day, because it's going to be something different every day, and number two, they're going to be employed for life. And so, frankly, the demand has actually created a catch-up in supply that we're seeing.
And of course, as tools start to get more mature and more integrated, they also become a little bit easier to use and a little bit easier to deploy, and so on. So it's a combination: I'm seeing a really good supply, and obviously we invest in education through the community, and then frankly, the education system itself and folks saying, this is really the hot job of the next century. I can be the new oil baron, or I can be the new railroad captain. It's actually creating more supply, which is also very helpful. Data is at the heart of what I call the new STEM cell, the science, technology, engineering and mathematics that you want to implant in the brains of the young as soon as possible. I hear you. Yeah, absolutely. Well, Scott, thanks so much for coming on, but first, we can't let you go without mentioning the fashion statement. You arrived on set wearing it. Yeah, I love it. It was quite a look. Well, I did it because then you couldn't see that I was sweating on my brow. I was nervous about this interview. You know, one of the things I love about your logo, and it sounds like I'm fawning: the elephant is a very intelligent animal. It is indeed. My wife's from Indonesia, and I remember going back one time, they had Asian elephants at one of these safari parks, and watching them perform; my son was very little then. The elephant is a very sensitive, intelligent animal. You don't realize it until you're up close. They pick up all manner of social cues. I think it's an awesome symbol for a company that's all about data-driven intelligence. The elephant never forgets, that's what we know. You didn't forget, because you've got a brain. Seriously, he or she has a brain, and it's data-driven. Thanks very much. Great, well, thanks for coming on theCUBE. I'm Rebecca Knight, for James Kobielus. We will have more coming up from DataWorks Summit just after this.