Live from San Jose, California, in the heart of Silicon Valley, it's theCUBE. Covering Hadoop Summit 2016, brought to you by Hortonworks. Now, here are your hosts, John Furrier and George Gilbert. Okay, welcome back everyone. We are here live in Silicon Valley in San Jose for Hadoop Summit 2016. This is SiliconANGLE Media's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, with my co-host George Gilbert, SiliconANGLE Wikibon analyst for Big Data. Our next guests are Dinesh Nirmal, Vice President of Next Generation Platforms, Big Data Analytics, and Nancy Hensley, Director of Offerings Management, IBM Analytics. Welcome to theCUBE. Good to see you again. Good to see you. Good to see you. Thank you for coming on. All right, so analytics, Next Gen. We're going to have a rapid fire question. So, what's Next Gen and how does that fit into the current analytics at IBM? Right. So, the Next Generation platform is the platform that we're building in IBM, which targets four personas, mainly. The data scientist persona, the data engineer, the CDO, the chief data officer, and the business or citizen analyst. Those are the four personas. So, it encompasses and completes the whole end-to-end picture, all the way from ingest to visualization. So, you can bring in the data, right? As a data engineer, you can wrangle the data, you can transform, cleanse the data. As a data scientist, you can take that data and build the model, right? As a business analyst, you can see the data on Watson Analytics. As a CDO, you can set the governance on top of all of it. So, that is a huge- And so, the four are data scientist, data engineer, business analyst and citizen? No, CDO, the chief data officer, right? He or she sets the overall governance on top of the data, which drives the roles of data scientist or data engineer, what they can do with the data. 
And Nancy, on the analytics side today, we're seeing the data engineer augment and add value to the data scientist. It seems more of a platform-like engineering role or architectural role, and the data scientist is becoming the primary front line. So, it should be, right? I think what you're seeing is the shift from products and tools to more of a platform way to approach the challenges that we have, which is, it's not just about the technology, it's not just about the tools, it's about having the ability to take analytics and embed them into your business, right? The goal that we've always had. And so, if you think about how we used to think about architecture and how we would make data available, we'd gather up the data, we knew that we were going to answer the questions that a department would have, what does finance want, what does marketing want, and we'd make it available through the various tools. It didn't enable a couple of things, right? It didn't really enable any sort of discovery of the what-if scenarios; that was done in a corner, not collaborating with anybody else. And it also didn't enable a level of personalization that a specific role might need. And so, now when we're looking at next generation, we've flipped this over. And we said, let's look at how people are sharing and looking for data and collaborating on data more at the role level. So, the level of personalization has gotten much different. When you think of it that way, the architecture is very different. And that's what- So, you're flipping it upside down. We're flipping it upside down. So, we just interviewed some great scientists from Arizona State who are doing cancer research with data, and they would support that thesis by saying that what worked for them was, one, access to more open data. The ability to start cataloging an observation space, which is a term that Jeff Jonas uses a lot at IBM. Jeff, if you're watching, we're borrowing and stealing your word, I love it. 
Having an observation space, then from there the discovery is not about the known questions but the unknown questions, as you said. Okay, I totally buy that. How the hell do you do it? What happens if I'm a customer? Do I throw everything away? Is there an augmentation? Is there a sequencing? Is it a rip and replace? How do I implement that nirvana, that dream? Right, so I just want to touch on one point that Nancy mentioned on collaboration. That's going to be a key point in the next generation platform. How do the different personas collaborate and work together? Because underlying data, they all have access- Not be siloed, you mean? Exactly, right? Both in terms of access to data and also with each other. Right, I mean- Sharing. Exactly, so you can create a project within the next generation platform and you could be a data scientist and you can share that data with the data engineer or other data scientists, right? So that collaboration piece is very critical. But to come back to your question, right? How do we do it? I mean, so today the data exists. I mean, data exists in a lot of data sources. How do we bring that into the data wrangling as a data engineer, right? So we are putting 100-plus connectors into the NGP, the next generation platform, where the data could be in HDFS, data could be on traditional data sources, data could be in S3, could be in Swift, could be in Riak, anywhere, right? So you can bring the data in, and then on top of that, we are bringing DataWorks on cloud, which will help you do the data wrangling. Then we have the predictive, the prescriptive advanced analytics piece where you can build the models. We have the machine learning piece there, the deep learning. And then how do you visualize the data for the business analyst to look at it? So it all comes together, it's all stitched together in a very, you know. An operating system. Exactly. And under the covers, Spark becomes the engine to execute. 
So it's a great story for us. And when you think about the gaps that we're addressing with this, it's like we said, it's collaboration, it's also self-service, it's really opening up access. The data is there, but the ability to go in and explore the different silos based on the questions you have or the way that you want to use the data, that is the big gap that exists. Taking that, making that more consumable, that is a key focus for the next generation. So this sounds like a very expansive vision. What can you do by embracing all those roles that others cannot? Take for example, perhaps lineage. I mean, some people say, oh, you know, we can track exactly what you do in terms of transforming the data. But you're taking it, you know, like from birth to death. Maybe that's not the greatest example. Tell us how there's a data plane and perhaps a control plane that you might uniquely be able to provide. Well, ideally, you want to be able to open up access without the user or the consumer of that data having to worry about lineage, right? It has to be tracked, it has to be there, but you want to mask that from the user, right? They don't need to know where the data is. They just want to know that it's good, it's trusted, and that they can trust the source, right? And now we're talking about open source data. You're talking about external data, data that you have inside of your enterprise. What we really are doing is putting a fabric, I guess is the best way to describe it, that's in between the systems that you've built up, external data that you want to bring in, and then all of the applications and the usage that wants to get access to that data and making that very consumable. 
So would it be fair to say that that fabric is the core of the value add, that no matter where you're bringing data from or adding analysis to, and whatever role you are for collaboration, you're maintaining the integrity of that fabric, which sounds like it's metadata more than the data itself. Metadata is cool. So that's a really good point, right? So if you look at a lot of the vendors, I mean, you can do the ETL piece, but what we bring as an enterprise vendor is core governance. When you talk about the blanket, right? As a CDO, how do I make sure the data is trusted, the people who are looking at it, so how do we make sure end to end that is covered? And that's where the governance comes in. So we are, for example, contributing heavily to Apache Atlas, which is the open metadata service, right? So we are doing that. So that is a key piece for us to make sure that end to end, that security layer is there, the governance layer is there. And you guys are investing heavily in the open source piece of this. Oh, yeah. And the other, can you just take a minute to explain that? I want people to understand the level of commitment to open source that you guys have. Right, so if you look at the community, right? I mean Spark. We have a Spark Technology Center in San Francisco where the sole focus is to contribute to the open source community. We have the Apache Atlas project, which is an open metadata service that we are heavily contributing to. You look at SystemML, we have pretty much contributed the whole thing to open source. So IBM is becoming, you know, our... So, a Linux moment. Right. IBM was heavily involved. Spark SQL. Spark SQL, SparkR. Right, R, we became a consortium member, you know, we are a platinum member in the R board. So all those things, we see that growth in open source, the community, we want to be part of it. 
Certainly the practitioners that we talk to, certainly the guys we just had on before you from Arizona State, open source has changed the game. And we believe that, certainly on the enterprise side, you guys do too, that that's happening. With that, Nancy, I want to ask you a question, as we were talking briefly last night during the open session here about how the world's changing, right? Across the board, from the media world we live in, to your world, and to customers. This shift has really accelerated. So it's not so much people are pivoting. I mean, you're pivoting in the sense of not because your business is hurting, but startups are pivoting because this value is shifting and people are getting a clear line of sight on some things. What is the big thing that you would share, that you see that's been different over the past year or so, where the value is becoming clearer, where everyone's vectoring into? What's that main value piece? It's hard to say one thing. I think people see the value of open source innovation, and then the integration of open source capability with existing capabilities. So what we've done with BigInsights is taken some of that open source capability and integrated it into the product. I think there are huge advantages in that because now you're not limited by the skills within your own organization. You can move at the speed of the community. I think there's been a big shift in thinking about how important self-service access to data is. And that is a big change from where we've come from in data warehousing, where it was so governed and so groomed. It wasn't in real time, you didn't know your queries. We really didn't have that ability to discover and explore. And that's really what the businesses that can disrupt and change their business models need; that's table stakes for them. 
Next, I want to ask you a question. Jeff Frick, the general editor, always says when we talk about our business, the answer to our problem is in a CUBE interview somewhere. We're interviewing such smart people. But that really kind of highlights what we're seeing as the number one thing that everyone's talking about, which is that the answer is in the data. Whether it's curing cancer or whatever business, you can't throw it away, you want to collect it. If you believe that the answer's in the data, you've got to have data, and then you've got to have the tools for it. How do the users get the value of the data? What are some techniques that you guys are seeing? Nancy, we can open this up to you as well. It's like, what are the things that people are doing now to maximize acceleration of value out of the data, finding their answers? Is it the tooling? Is it them just doing the work? No, I mean, for example, I mean, one example I'll give you is machine learning or deep learning, right? The data is there, you build a model, you want to predict and look at the score and see how well your model is doing. And so for example, if you look at the real estate market, how do you build a model to see how well the prices will be reflected over a long period of time? That's a great example. The data is there, you pull that in, you build a model, you add a feedback loop so that in real time the model is being trained and updated. So those kind of things, right? I mean, in any segment of the market, we can use it. So whether it's the airline industry, whether it's the healthcare industry, whether it's the real estate market. So vertical integration into the applications, or having analytics also in the application, but also working at this new fabric, right? It's about getting insight into the data to make intelligent decisions, right? So that's what it comes down to. How do we take the data, get insight into it, so you can make intelligent decisions on it. 
And then speeding that into a process where you can actually take action on it, because that's what we've got to make very consumable, because the faster you can do that, the faster you can iterate through those questions and get to discoveries and things you didn't know about, the better positioned your business is. So is the bottleneck technology or is the bottleneck the processes themselves and the people, or both? I think it's the ability to provide that collaboration with the power of the technology that we've got today. And I think that's one of the biggest problems that the next generation platform will solve. Because in silos, a lot of these products exist, right? You can do ETL using a vendor, you can do the predictive or prescriptive, but how do we make sure this all comes together, stitched together, right? So as an end user or as a data scientist, data engineer, any of those personas, how do I make sure that it all comes in one package? I mean, you want to create users that become heroes in their organizations, the Superman, if you will, but silos are the kryptonite for that. I mean, they kill the innovation. Exactly. Silos are not good. Right, and that's the differentiator for us, is that we are bringing the data together, so you can have a data lake, whether it's HDFS, you can have Spark, but we bring it all together. That doesn't exist today, and we are bringing that to the market, and we already have Data Science Experience. Well, we've got to wrap up, but I want to get one final question in. What are we going to see from you guys coming up? I know you have a show, IBM Insight, which is now called World of Watson. Watson, obviously, has great marketing. Everyone always says, that's great marketing. That's a great product, too. So we're going to hear more about Watson and how that's kind of become a brand for pretty much a lot of the analytics. World of Watson, obviously the show. It's a big event for you guys coming up in the fall. 
What's going to be leading up to that? What are some of the things we're going to hear? We're going to hear a lot more coming on the Next Generation platform. So we saw the Data Science Experience recently announced. We continue to focus on improving all the capabilities underneath it, like Hadoop and Spark and all the open source capabilities, but also more focus on more collaborative roles that will be coming out where we're changing the experience of using analytics. Great. And then the changing of the work environment, certainly the future of work, as people call it, applies to this area. I mean, you see change happening. So just one comment to add to what Nancy said, which is we have a lot of exciting stuff coming. We already announced Data Science Experience. We have the data engineer experience coming. We have that data lake piece coming along with the CDO, the chief data officer. Plus, we are doing a lot of interesting work on machine learning and deep learning. So there's a lot of good stuff coming. And it's a lake you won't drown in. Thank you. You can swim in it. Yeah, I have to wear my floaties. So guys, thanks so much for sharing the data and the insight here on theCUBE. We'd love to bring that forward, look forward to the event. We're going to be at the World of Watson. theCUBE will be there, from what we're hearing. So we're going to be there at Insight, now World of Watson. IBM here inside theCUBE, sharing the analytics story, flipping it upside down on its head. The world is changing: open data, collaboration, smarter data, getting the value out of the data. This is theCUBE, sharing the data here at Hadoop Summit 2016. We'll be right back with more after this short break.