From San Jose, in the heart of Silicon Valley, it's theCUBE, covering Big Data SV 2016. Now your hosts, John Furrier and George Gilbert.
Okay, welcome back. We are here live in Silicon Valley for theCUBE. This is SiliconANGLE Media's flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, with my co-host George Gilbert, Big Data Analyst at wikibon.com. Our next guest is Martin Hall, Director of Marketing and Business Development, Big Data Services at Intel, formerly the founder of Karmasphere. If you know the history, they've been in the big data space for a very long time. An entrepreneur now working at the big company, the blue chip, the bellwether of the tech industry, Intel. Welcome to theCUBE.
Nice to see you, thanks for having me.
So obviously Intel's been in the news. Andy Grove has really been the highlight of what the industry's been talking about, and his impact on the entrepreneurial market was amazing. Good manager. He told it how it was. That's kind of like what's going on now in the big data world: people want to know what the hell's going on right now. Is Hadoop going to be a feature of the ecosystem? So I want to get your take. At Karmasphere, we covered you guys really young. We've been seven years at Hadoop World with theCUBE and we've watched the transitions. We've seen them grow and we've certainly watched the success of Karmasphere, and then you got acquisition by Intel, and obviously Intel made a big investment in Cloudera. So Intel tried their hand at their own distribution of Hadoop and then went with Cloudera. So Intel is in the middle of all the action and has certainly made a great investment in the cloud business as a company, but your role in the industry has been entrepreneurial, and now you're at the big company. What's your take? You've been on the front lines, and now you're with the big resource company in Intel. What's your perspective on where we are right now?
Well, a couple of things I'd like to pick up on from the opening. First of all, Andy Grove. He was an innovator. He was an innovator obviously in silicon, but he was an innovator beyond silicon as well. There's a lot of conversation internally about his passing, what it means, what his legacy is, and I've inherited some of that in my move to Intel. I understand a lot more now about Intel as a corporation than I did when I was an outsider. If somebody had asked me a year ago, "Do you ever see yourself working at a company like Intel?", my answer would have been no. I thought of Intel as a silicon company, and it's been a revelation to me, in the discussions I had before I joined the company and in my time in the company. This is a company that, in some respects, we've taken for granted as powering the digital infrastructure, but its role and its influence is huge, including in big data and analytics. So you mentioned what it's done with Cloudera, for example. That investment was on the back of its own initial foray with its own distribution of Hadoop, and then recognizing that a partnership would probably make more sense. But it's very interesting, as an entrepreneur and as an innovator, to go from 15 years of entrepreneurial startup experience into an organization like Intel and realize that it is an innovator, it is an entrepreneurial company, it is investing in and recognizes the central role of analytics in humanity and business transformation, and to start to see and be a part of what Intel can bring to the table.
It's a great company. You mentioned society; the corporate citizenship is off the charts. We had Angela on earlier, she's doing the women in tech, and they have a whole workshop, they have a real cadence of metrics. Is it as metrics driven internally at Intel as people talk about? I hear that all the time. I've never worked there, but I hear everything's data driven at Intel. Is that true?
It's certainly process and people. I mean, you don't make silicon right without having really honed processes internally, and those processes apply to everything that Intel does, how it goes about its business internally and externally. So there's the degree of professionalism, the quality of the people, the level of innovation across the board, and, as I said before, not just at the layer of silicon. Intel has this rich history of working in open source software and getting involved as something of a neutral Switzerland within the marketplace, and being able to make big moves that impact all of humanity and business organizations.
So I want to get your take on the industry, because one of the things we were saying yesterday in our opening, George and Peter Burris, who's our head of research at Wikibon, and I were talking about the three things that we are seeing right now: the maturation of the tech, the path to some sort of digital business model or application, and then the operationalizing of big data in the enterprise and the customer space. What's your take on those three things? I mean, the maturing of the technology: we had Jerry just talk about how MapReduce is kind of being sunset, with Hadoop less than 10 years old. You're seeing Spark rise up in place of MapReduce, yet the ecosystem is developing. What's your take on the tech maturity of the ecosystem here?
I think there's two vectors of that maturity. On the one hand, the data platforms themselves are maturing, but there's a high degree of innovation, and one of our challenges as an industry is that that degree of innovation across vendors, the ecosystem of vendors and open source projects, creates complexity for consumers of this technology.
So that complexity, I think, is daunting for enterprise organizations, and let's face it, we're still at the beginning of this transformation that enterprises are making, not just to being data driven; I think of it increasingly as analytics driven, because having a place to store and process the data is one thing, but you gotta perform some algorithms on that, and that's the function of analytics.
How about the path? So obviously analytics is low hanging fruit with some obvious business models, but the operationalizing then is the next question. I mean, where's the progress bar in your mind on some of those things, like operationalizing big data for the companies? We're in early days, we kind of see that, but what's your take on that?
I think there are several aspects of operationalization. So number one, IT folks who are worried about platforms have to worry about ease of deployment, they have to worry about management, they have to worry about orchestration, they have to worry about security. I think there's been enormous progress over the past several years on those data platforms in terms of that level of operationalization. But solutions are built from multiple components, and individual enterprise organizations are confronted with an enormous amount of complexity. I think our job in the industry is to really make it easier to consume the data, and especially for data scientists and developers to work with that data to construct the insights. I heard an interesting keynote this morning that was talking about the importance of applications and of surfacing insights in context in applications, and that to me was a fascinating observation that certainly resonates with us and the work that we're doing.
So you mentioned two vectors of simplification that were needed, the first one being how to consume all the components of these analytic tools and infrastructure. But the second one I wanted to key off on is related to this keynote: embedding the analytics in applications. One of our research themes that we've uncovered over a period of a year now is that we don't see broad-based packaged applications with predictive or prescriptive analytics growing up anytime soon. So how can Intel, as a steward of the community, help foster at least the growth of embedded analytics that can enhance what's deployed already?
Well, you're hitting on a key initiative that Intel has commenced over the last two years. So in addition to the investment we made in Cloudera, which was all about helping foster the acceleration of the development and deployment of enterprise-ready data management platforms, what we recognized was that ultimately, if you're gonna surface the value of predictive and prescriptive analytics in applications, we have to make it easier for data scientists and developers to use the data stored in those data management platforms. So at the end of last year we launched a new open source project called Trusted Analytics Platform, which is focused entirely on that bridging from the data stored in these now enterprise-grade, secure data management platforms, basically filling the gap between that data management platform and the surfacing of those insights and those analytic artifacts, if you like, within applications, and making it easier for data scientists to work in an environment with other people, other data scientists, and to be able to publish their insights as artifacts consumable by developers, which they can use through APIs and services. So it's all about accelerating that road to the surfacing of those insights in applications.
We work pretty closely with IBM and go to their conferences, and they highlight their work in that area, if I'm understanding it correctly, with notebooks, computational documents that are meant to be shared but that surface analytic capabilities in a much more consumable way. Is that something we would be looking forward to from Intel?
Yes, I mean, it's here. We've launched this as a community open source project, and it is all about the democratization of access, not just to the data but to analytics, and collaboration between those two key communities, data scientists and developers.
Because it's interesting that there's that McKinsey study that said we're gonna be two million people short of data scientists, and if you need them you're gonna have to come to us or IBM. It's kind of ironic, but it's like, well, we can make the tools better, but the tools today are still pretty fragmented. You have to do a lot of stitching together to get a tool chain, design time or runtime. Where do you see the industry now, and where do you see Intel helping us get to over the next 18 to 24 months?
Well, again, that's what we're trying to do with this Trusted Analytics Platform: create one integrated environment with that whole tool chain, which spans everything you need from ingestion through data preparation through the exploration and what the data scientists do, and then the consumability of the insights that they create. So sure, we'll continue to drive skills and address skills shortages, but as much as anything it's about making the existing technology more usable by the existing data scientists and developers.
Okay, so just to follow on with one more item on that: is it infrastructure that helps the existing tools plug together better, or new tools that were designed from the ground up to play better together?
That question is about what Trusted Analytics Platform is?
Yeah, yeah.
It's software. Think of it as a software layer. So the way I like to talk about this is, you've got infrastructure, right?
You've got infrastructure that goes all the way down to silicon and servers and software-defined infrastructure. On top of that you've got the data management layer, so Hadoop, for example, and what Cloudera do. But then at the analytics layer you've got all these tools and technologies, many of which are open source, where it's hard for people who are new to this environment to figure out: what do I need, where do I get it, how do I stitch it all together, and how do I make sure that we've got security that's consistent end to end and top to bottom? So there's that complexity issue again that confronts the data scientists who are new to this environment. It's our job to make that simpler, and making it simpler is about addressing the fact that there are many moving parts that you've got to make available in one place that the right constituencies can come together within.
Martin, I want to ask you a personal question. I saw you at Karmasphere, because again, you were early, doing some great stuff. There are a lot of new people coming into the industry who might not be married to the concept of Hadoop or might not know the history, certainly the 10 year anniversary. I saw the stuffed animal getting duplicated, Doug Cutting's, the original toy that was named Hadoop. They're coming in and they don't have the history. What would you share with those folks that are getting into the business about where we've come from, what you've learned and what you've seen? And how would you advise them to be good participants in the industry, knowing kind of what's happened, not knowing all the politics and the dogma and whatnot? How we got here is irrelevant. What have we learned, and how would you apply that advice to people entering the market now?
Well, I reflected a lot on my career in joining Intel, and on what I've been involved in over 25 years in software.
And I think that's perhaps more true of what we're seeing in this big data and analytics space than anything I've seen before, which is that it's all about community. It's all about ecosystems working together, whether it's ecosystems of developers and data scientists innovating in open source projects. It's also about individual vendors, I think, recognizing as a group that our job collectively, if we're gonna grow this market and address this complexity, is to collaborate. We want to compete, obviously, in the free market, but fostering collaboration is really, really important, and open source is one very, very important element of that.
Many people look back at the rise of the internet-centric companies and the open source movement, or the explosion of the open source movement beyond just the OS level, as scale and cost driven, in that you couldn't build a Google on Oracle software. Even if Oracle was scalable enough, it was not cost competitive. When we build out clouds now to handle the data volumes that are growing exponentially, and we build out the edge for IoT, where we're gonna filter out a lot of the information, are we gonna face a similar cost constraint in using the architecture that dominates 95% of the world right now, in terms of Intel in the cloud and, whatever, maybe Arduino on the edge? Does there have to be a price arbitrage to get closer to a different cost performance curve?
No, I mean, I don't think so. I think we have to focus a lot on the economics of open source. I listened to your previous guest talking a little bit about this: open source doesn't mean free. Clearly there are business models built around open source.
That's the community angle.
Right, that is the community angle.
But open source is not just about cost. It's also about facilitating innovation and enabling communities to look at the realities of integration and cutting across boundaries, and I think that's one of the hallmarks of the big data and analytics space. You look at enterprise organizations that have to look again at what it means to be a data and analytics driven organization. It's not just about technology boundaries. I mean, it's all about siloing, right? We're breaking down silos here, and that's about collaboration. It's about aggregation of data. It's about aggregation of technologies. It's about aggregation of analytics, but it's also about aggregation and cutting across organizational boundaries.
And also, as you mentioned with the community, that's where the fostering of innovation with collaboration actually not only breaks down the silos, it's a very efficient self-governing mechanism, while giving people the opportunity to do new stuff and be better, right? So it's the ultimate accelerant, because it's gonna keep things moving, and agile is perfect for this. When you think about agile development in a group environment, that just keeps things moving along. I think that's the heartbeat of innovation to me. I agree with you 100%. Final wrap up: I'd like to get your perspective on the show this year. I'll get your final word to share with the folks out there who are watching live, over 1,000 people watching right now, who aren't here in Silicon Valley at this event. What's the vibe? What is the theme? What's the smell in the air? Is it the big guys getting richer? The startups growing? Just encapsulate your perspective on what's happening at the show.
Well, I probably have a fairly unique perspective. I don't know if you know, I worked with O'Reilly closely to conceive Strata, and was on the original advisory board.
So I've been at pretty much every Strata event. I've seen the growth of this event, and I take the temperature every time I come. What's new? What's changed? And I think, I mean, we touched on it earlier. There's maturity now, and it was very much kind of a technology garage feel early on. Now it's very much. And then there was a phase when it was all about the startups, but now we've got maturity in the technology. We've got maturity of some of those startups. We've got larger companies recognizing that this is transformation, including for Intel. It transforms Intel's business, both internally in how it operates and externally in terms of how it serves its customers. And I think really that's the hallmark now. Everything is under this one tent of Strata. You've got technology innovation. You've got investments continuing. You've got early investments starting to mature, and you've got all the major vendors in the world participating, because they know this is where you come together to learn and do business.
And of course, theCUBE was present at the creation of the original Hadoop World, prior to Strata coming out. We haven't missed a Strata either. We love to come here and promote the event with theCUBE, and the community here is growing. And what I'm excited about, as you mentioned, is that it's growing and it's not just Hadoop anymore. There's a lot going on around Hadoop and things are blooming. So great, great time. Thank you for sharing your insight. Really appreciate it. We are here live for Big Data Week. This is Big Data SV, SiliconANGLE's event in conjunction with Strata Hadoop, happening right across the street. We're extracting the signal from the noise, bringing you more live coverage after the short break.