Live from Las Vegas, Nevada, it's theCUBE at IBM Interconnect 2015, brought to you by headline sponsor IBM. Welcome back to IBM Interconnect, everybody. This is theCUBE. We're here at the Mandalay Bay. There's another whole group over at the MGM. A lot of the general sessions and keynotes are going on. Check out interconnectgo.com. It's the digital experience for IBM Interconnect. Really pleased to have Nancy Hensley here. She's the director of marketing for analytics platforms at IBM, and Joel Horwitz, who's the director of the portfolio for analytics at IBM. Folks, welcome to theCUBE. Great to see you. Thank you. Good to be here. Lots going on. We're really excited. We've been covering IBM's analytics business now for several years. I've often said that what IBM did is it brilliantly consolidated its analytics business, and it super-glued it to the big data meme, and really became a leader in that space. It took some time, but it's really worked out great. Absolutely. Congratulations. How do you feel? What's new? Great. I think repositioning things helps us really meet more of what the customers' agendas are, because nobody really goes around saying, I want a data warehouse or I want a BI tool or I want a database anymore. They have agendas like self-service analytics and self-provisioning of data and modernization of their architecture. This really positions us to help our clients meet these new needs because there have been so many disruptions to the data center. It's crazy. Everything's changed in the last five years. What's changed? Oh, wow. I used to call the data center that sanctity kingdom where all the data was safe and it was governed. There were all these types of control, but now the biggest pressure, I think, is self-service capability.
The cloud has really helped enable that, but even just self-provisioning of data, the ability to have access to analytics in a very short period of time and to do what-if analysis, so things like dashDB on the cloud help enable that with Watson Analytics, but those sorts of service levels and disruption to the sanctity of the data center are really new within the last few years, right? So, Joel, your responsibility is the portfolio. You've got portfolio in your title. Yeah. The big portfolio. Right. It's better to have no gaps. You know, better to have overlaps than gaps, and that's sort of the case for your portfolio. Yeah. No, I mean, as Nancy pointed out, I mean, we are seeing a lot of changes happening in and around, you know, analytics. I think before, you know, data was kind of this separate entity from the actual tools and the applications to get value out of data. So, it's, I think, really, as Nancy pointed out, really valuable to our clients to kind of start bringing that together into one solution. I think the other thing, I don't know if you were at Strata. I don't think you made Strata last week because of the storm. We were there. I didn't make it, right. Right. But, you know, I think there's still a lot of interesting, you know, changes happening with roles. So, the data scientist is still very much at play. So, as Nancy pointed out with the cloud, you know, part of that value that we bring is being able to access data kind of anywhere. So, you know, if it makes more sense to spin up a Hadoop cluster, for example, on Bluemix and then attach it to other, you know, services that we provide, you can do that. Or if it makes more sense to add it to your logical data warehouse, then you can bring Hadoop into that environment and start working with it there. So, you know, we're seeing still the emergence of the data scientist, the chief data officer, and bringing all of this together is really making it valuable for them. How much time we got?
Because we could be here for a while. That's good. I want to explore some of those things. So, it struck me, Nancy, when you're talking about self-service, and Joel's talking about the role of the data scientist, and I picked up some of the themes just listening to theCUBE last week and watching the tweet stream. The sort of self-serve of the business in analytics. Yeah, absolutely. The citizen data scientist, I'm sure you've used that term, but really that's the next wave, isn't it, of analytics, is empowering that business user to actually use the tools. It never happened in the old sort of EDW world. Well, it did in the shadow IT. We used to call them the shadow IT organizations, and they're really the ones that have kind of pushed us on the requirement side. Remember when we used to have just data marts? That was all the shadow, too. They were just trying to solve some specific problems, and they really needed IT to stay out of their way, and that's why they would basically build their own systems. And now we're seeing that wave again, but there's even less patience, because we've now become much more critically dependent on analytics, more so than ever, so the ability to provide self-service, and it's not just about the data warehouse, because you also have to figure out how you self-provision data, how do you give that capability, and that's really new to clients, because this used to be something that we very much controlled, and you just can't do that anymore. I'm going to pick up on the sort of data mart, because it created a data quality problem. You have issues now with people spinning up big data projects with no notion of governance and no notion of the edicts of the organization. You mentioned the chief data officer. Is that his or her job? What do you see happening there? Yeah, I mean, so data is becoming more and more prevalent, obviously, not just in the enterprise, but also externally, right?
We were looking, we did a crowd chat last week with you guys, and you know, we're going back and looking at that Twitter data. We have a really nice relationship with Twitter now. So, you know, as we look at how to extract insight, it's not just the transactional data we have living, you know, on premises, it's also the data all around us, whether it's coming from a mobile device, social, and other places. I think, you know, the other side of that is really about this kind of convergence of multiple technologies we're seeing, or multiple trends. So part of it is, like we brought up, the cloud, the other part of it is, you know, this insertion of more and more data. And then the third thing is, I really think, machine learning. So, you know, as we talk about, you know, how do we structure data and how do we make use out of it? I think one of the emerging technologies is really around machine learning to automate some of that process. Right. So I want to talk a little bit about, maybe go back to Strata a little bit. You were there, Joel, right? Yeah, I was not there, no. So you guys announced ODP as part of a big open consortium. There's a lot of discussion now going on in the industry about it. So, obviously, there's a lot of competition going on, right? So ODP gets announced. Co-opetition. Co-opetition. Yes, we live in interesting times. So the guys who didn't join ODP, Mike Olson in particular wrote a blog post saying, oh, this is fake open source, compared it to OSF. And I've had some back channels with Mike on that. I don't really buy the OSF analogy. But his point is, which I'm listening to, and I wonder if you can respond to it, is why do you need an open data platform? It's all MapReduce anyway. It's all standardized. So how do you respond to that? Yeah, I mean, these are two very different things, right?
One, open source and the Apache Software Foundation is really a way for developers and, you know, innovators to come together and develop on the platform. In fact, Cloudera in their keynote address, they showed, you know, this graph of the open source and Hadoop projects themselves, like growing kind of linearly over the last few years. And so that, you know, we obviously support that. But for our clients, right, in the industry, for them to really adopt Hadoop as kind of a foundation for analytics and advanced analytics, you really do need standards kind of around that to make our clients' lives easier. As an example, you know, I was pumping my gas at the gas station the other day, and I was looking down and it struck me, like, you have 87, 89, 91, right, octane levels. And, you know, if you were to go to every gas station, and they had different numbers there, they were trying to describe it differently, it'd be very confusing for you. And so, you know, the ODP in my mind is really a separate conversation from, you know, open source development. It's really about standards for an industry. If you look at the open data platform consortium, it doesn't just consist of Hadoop and technology vendors. It also includes other industry people. So customers, yeah, as well. Okay, so we should think of this as now. Am I right that you're no longer going to ship a separate IBM distro? Same thing with Pivotal? Is that right? Or that? Yeah, so we still have our own distribution. It still has a lot of, you know, value packed into it. It's now free. It's kind of decoupled from the BigInsights package. So we kind of split that up into four different modules. So we have the BigInsights kind of data scientist, where we introduced R support as well as machine learning, which is really interesting. We also, you know, have a smaller module called analyst that kind of focuses on Big SQL and BigSheets, kind of for more of the business analysts.
And then we've also added, you know, an enterprise kind of manager that allows you to kind of get more out of the cluster. So we've decoupled things. We're not giving up on our distribution. We're just, the core of the ODP is really, you know, the file system, MapReduce, and just these really core bits, but how people still build around that will still. So that core will find its way into your distro. Yeah, right. Exactly. And then it becomes the standard. And it's the same standard that Hortonworks is going to use, that Pivotal is going to use, etc. And then the other people, we hope others join in. Yeah, and it just makes it much more adaptable in most people's data centers, right? Because now we have more common standards across. I want to talk about, so you guys, IBM's always made the point that Hadoop is not big data, not sort of what is big data now? We define it. But having said that, Hadoop kind of got it all started. You know, it's sort of the mainspring of innovation, MapReduce. And, you know, which is changing so much now, right? I mean, real time. And it's really exciting. IoT. IoT is sort of the next big wave. But what's happening in the traditional enterprise data warehouse, we did a survey, Jeff Kelly did a survey last year, and we were astounded at the number of customers who said they've begun to move workloads, transition workloads from traditional EDW to new forms of big data, however you want to define that, Hadoop or not, but just new and different. And it was a huge number, like 65%. Another 30% said we will by the end of last year. Never see. And that was really just in the last year, right? Yes. Yeah, right. It flipped overnight. And so people just, they glommed on to it, said, wow, this is a cheaper way to do things. We're still keeping our really valuable stuff in the EDW, but we can offload so much. And maybe I think of tiering. What's going on there? What are you seeing in the customer?
So offload is one way that customers will embrace Hadoop. But I mean, Hadoop is also about the different types of analytics. And I think initially, when Hadoop came on the scene, maybe the data warehouse people got a little defensive, like it's not going away. It's not being replaced. And it obviously isn't. But I think what's happened in the last year is it's become an acceptable complement to most enterprises. And so even a year ago, we had a lot of our core data warehouse customers who said, Yeah, we're getting into that Hadoop thing. A year later, now it's on everybody's agenda. And it can serve multiple purposes, right? It allows you to do analytics that you couldn't do before. And now with SQL, you know, the fact that you can access that data via SQL, and what we're trying to do with making data very fluid with some of the capabilities. So even if you have the query originating on PureData for Analytics, you can access data that's in Hadoop, which makes everything much more fluid within the enterprise. So it's just more consumable to clients. And that allows them to not only do more analytics, like machine learning that they couldn't do before, but also they can move some of the workloads off that might even be cold data, or even use it as a landing zone. And those were some of the original use cases that we talked about when it came on to the scene. I think actually what's starting to shift more is now that customers have used those use cases from a cost perspective, they're starting to understand the benefits from an analytics perspective. Oh, I think that's so right on because, you know, Abhi Mehta says that the first wave of value in big data was the ROI, was reduction on investment. And I think that's true. Other people realize, wow, storage is so much cheaper. Right? Right? Why stick it in a big expensive container when I can put it into this open source platform and I'll get great. And then they realize it's not just about the storage. Yes, right.
There's this whole other analytic world that's opened up to us. So as an analyst, when I saw that huge number of people shifting, I said, Wow, oh, but luckily, Jeff asked another question in the survey, what are the most important tool sets that you require in your big data? And the number one and number two were the traditional EDW and data integration tools. Yeah. Oh, right. Wow. Right. We're still doing the reporting and analysis on our business. That's not going away, right? There's still the standard reports. But now there's much more discovery you can do, now that you have the ability of having unstructured data, right? That the data warehouse just couldn't handle. Yeah, you brought up the point that, you know, when it first hit the scene, you know, it's kind of this effect of Moore's law, where, you know, the cost of disk, right, in terms of storing data got so low. I think actually the next kind of phase is going to be kind of around Apache Spark. So it's almost like, you know, going from commoditizing kind of the disk to now we're talking about memory. So you know, I think Spark is actually going to take us to the next level, which will also help kind of speed up that data kind of movement, you know, and what we can do, iterating over data to get value as well. Yeah, because I mean, you think about Hadoop, it really is a batch platform. We get criticized for saying that, you know, all the time, because people don't want to market it as batch, but most of the use cases today are batch. And so everybody's trying to get to real time. That's like a whole new wave of innovation. So I wanted to ask you your perspectives on this. So we always look at all the startups, then you see Hortonworks goes public and you see, wow, pretty tiny. Cloudera just announced, you know, it passed the $100 million mark. These are small numbers in the grand scheme of, you know, companies like IBM. So you're a leader. We've quantified the market. IBM is number one.
And a lot of that is services. We understand that. But you're talking big business, large deals going down in big data. So we look at it and say, everybody loves to talk about disruption. We see the rich getting richer in this business. It's like, everybody looks back at, oh, look at what happened to DG and Apollo and all these guys, but that's not happening today. What's different? It's like companies like IBM realize the opportunity and you mobilize. How did that all happen internally? What? I mean, who knows data better than IBM, right? We built this company for the last 100 years around data going back to this business CMS, right? I mean, it's just how we built the world, right? And now we're just mobilizing in different ways to help our clients transition into this world of big data and IoT, but it's still really about the data. And I love there's this one quote out there. And it may be Hortonworks that has said it, that Hadoop wasn't a disruption to the data center. It was the data that was a disruption to the data center, right? Yeah, because like I said, we had this sanctity and, you know, it was nice and neat. And we had data that fit nicely in rows and columns. And now it's just coming at us from every direction. And not only do we have our systems of record that run our business, but now we have the systems of insight that allow us to look at our business. But now the systems of engagement that are all around us, that we have to deal with in new ways that we never had before. And so as we restructured how we look at the world with cloud, data, and engagement, that's how we saw it, right? And that's our whole strategy, and how we've organized our business units, how we organize our focus and development, is all around those three things. So you guys own the systems of record business. You always have, sort of your Z, your stronghold. The systems of engagement, IBM is demonstrating its mojo in social. I mean, you really are a leader there.
The new term that I've heard, and I think you guys coined it, is systems of insight. So you've got these three really interesting workloads, application areas, value creation areas, and you need a portfolio to be able to address those. The flip side of that, Joel, is the portfolio gets really complicated. Clients sort of struggle to try to understand it. How do you simplify that? What are you doing to sort of try to integrate that? Yeah, I think it all starts with what the client is trying to accomplish, right? So when you start at that point, and we've seen a number of examples where this has actually worked, which is when you start with, okay, what's the business kind of situation? What's the problem you're trying to address? And then you kind of define what are the kind of data sets and things available to you to try and understand what's going on. And that's when you kind of bring all of IBM's kind of suite of analytics, technology, and experience into one kind of package to get behind that and provide a real solution. So I think, you know, a lot of folks have made the mistake of starting kind of at the bottom and saying, well, you need to get all of your data into a lake, or you need to, you know, create a hub first, and you need to do all of this, you know, heavy lifting before you can ever even start to, you know, address the business. And I think where IBM has succeeded, you know, in the last few years with big data, has been starting at the other end and saying, okay, what are you looking to accomplish with your business? What's going to be transformational for your industry, right? So I think starting at that level and then working backwards, we have multiple deployment strategies. We're one of the, well, we are the leading hybrid solution out there, right? So it's not just cloud, it's not just on-premises, it's not just Hadoop, it's not just, you know, MPP databases, it's a full spectrum of... Collegal. Collegal.
That's a complicated situation for a lot of customers. And so something that you guys don't talk about a lot is the services piece of the business. 45% of the revenue in big data is services. You guys are the leading services company in the world. How do you work with your services organization to leverage that for customer outcomes? So in our new structure, they're actually integrated into our business units. So now we're working hand in hand from a client agenda perspective and all the way up through the industry. So, you know, you'll have client agendas like, I need to advance my analytics capabilities around, if you're in insurance, claims, right? So that's where we can now bring in the industry expertise even more so, with a lens from an industry perspective. Now on the back end, what we're focusing on is making these things very consumable. And so a perfect example of that is that Cloudant and dashDB integration, right? So clients who are building in Cloudant today and accumulating lots of really great data now in two clicks can just create a dashDB data warehouse and off you go, right? Now you can bring in new services like Watson Analytics on top of that, SPSS, and it's all within a few clicks. Very self-service, very, very consumable. So the domain expertise for that lives within the analytics group. Is that, we're all one big happy family? The industry expertise as well. And the industry, right? We're shifting very hard towards the industry from a go-to-market. But that's within the analytics group. That's huge. I didn't realize that. I just got the one-minute sign. So I got to ask you, Joel, about what you mentioned, data lake. Yeah, John Furrier has been asking everybody, data lake or data ocean. What's the better metaphor? Well, I mean, personally, I find that, you know, a funny metaphor.
I think I used it in an analyst briefing yesterday where I said, you know, the idea of dumping all of your data into a lake and then going fishing is really the wrong idea. And in the ocean it's even harder to catch things, right? So it's like big game fishing. Yeah, I think about it the other way. Say, well, what type of fish do I want to catch? What's the best lure, right? And if I want salmon, then I'll go to the ocean, right? But if I want trout, I better go to a river, right? So I think we need to kind of start questioning the idea of, like, hauling data into a lake because, you know, I don't see it. If you want drinking water, you go to the lake. You want currents, you go to the ocean, you want to surf the waves. So a lot of diversity. You need a big portfolio to take care of that. Joel, Nancy, thanks very much for coming on theCUBE. Thanks for having us. All right, keep it right there, everybody. We'll be back with our next guest. This is theCUBE. We're live from IBM Interconnect in Vegas. We'll be right back.