Live from Las Vegas, extracting the signal from the noise. It's the Cube, covering IBM Insight 2015. Brought to you by IBM. Now your hosts, Dave Vellante and Paul Gillin.

Welcome back to IBM Insight, everybody. This is the Cube. This is day two of IBM Insight for the Cube. We've seen the keynotes, we've had wall-to-wall coverage. The Cube goes out to the events and we extract the signal from the noise. Beth Smith is here. She's the General Manager of IBM Analytics and a Cube alum. Beth, welcome back to the Cube, good to see you again.

Thank you, good to see you guys.

So you had the keynote this morning. Had some really interesting conversations with SETI, talking about what you're doing with IBM and NASA and SETI and all the extraterrestrial life out there that you guys are finding or hoping to find. And you got a new role. Last year you were the General Manager of the Information Management Group, and you were kind of promoted to a wider scope. Talk about what that's all about.

Well, you know, in this Insight economy, customers really need the capabilities of their analytics engines with their databases, whether it's Hadoop or relational or content of various forms, and then they need governance throughout that. And so we knew that that meant we needed to put all that together, integrated in a platform. And so my role has expanded to have that scope of capability.

And what form does that platform take?

What form? It's delivered through cloud and on-prem, and then we also have some physical appliance options as well. But it's really a platform that's focused on how do we help clients along their journey and, in doing that, how do we help them embrace open source? How do we help them embrace hybrid and what that means, and do it in a trusted way?

So why is hybrid an important thing for them to embrace?

Well, you know, clients of all sizes have assets in their data. No question about it.
Enterprise data that's important to them, that's in their systems, that's ingrained in what they do, and they want their new systems, whatever those might be, to be able to integrate with that. So you could think hybrid cloud: running something on the cloud, integrating with data that's on-prem. And they want to be able to do that in a way that they're not moving the data to the analytics, but rather the analytics to the data. So that's one thing that makes hybrid really important. The other thing is it's really a world of hybrid data. You know, we think about unstructured from a Hadoop standpoint plus structured relational data. And so it's about how do people do analytics in such a way to be able to get the insight from both without having to know which form the data's in.

So you guys talk about systems of insight, we talk about systems of intelligence, and fundamental to our premise around systems of intelligence is that notion of hybrid data. We haven't used that term to the point, but it's the idea, and we saw this yesterday actually with a bank from South Africa extending IMS and COBOL into analytics. It was like, really? But the point being, you're taking transactional systems, which tend to be structured data, and then bringing analytics together to create new business capabilities. So is that the vision, and how does that actually turn into products?

So it is the vision, but it's not just a vision as in the future. It's real today. So let me give you an example. With PureData for Analytics, so that would be your structured relational data, we have embedded capability to be able to do queries across that with Hadoop without the application having to know where data is, on either the structured side or the unstructured side. And that you can do today.

Okay, what about open source? Let's talk about the role of open source generally, but specifically within your organization. We all know IBM is famous for its billion dollar Linux investment.
You guys have made a lot of noise around Spark. What does all that do for your organization and for your customers?

Well, at the core of our platform, it's about being built on Spark. And it is about leveraging that. So we made the announcements that we did this summer to say, listen, this is important. We see Spark as the analytics operating system. And so we believe for customers to be able to have the analytic workloads that they need in the future, we need to embrace Spark. We need to innovate within the community and contribute back. And we need to then bring our analytics capability to it. So it's a fundamental element of our platform. We already have 15 of our products, between analytics and commerce, that are leveraging Spark today. And we're getting tremendous benefit from what we can see even in our own development.

Is Spark the layer that virtualizes the database underneath so that customers don't have to worry about whether it's a structured or unstructured database?

Well, I think one of the real advantages of Spark is this universal access to data. I think that's why it's going to flourish and be this key element of analytic environments going forward. But the other piece that I think is powerful is the programming model aspect and the fact that it makes analytics easy. So let me just give you an example. Our DataWorks cloud service, which is about preparing and cleansing data, was 40 million lines of code. And then when we went to adopt and use Spark, it went down to five million lines of code. And that's because of the power of that analytics operating system.

So people are complaining about complexity in the big data analytics business. How are they using Spark? Are they using Spark as a way to simplify that complexity, build out sort of end-to-end microservices? Are they using it as an adjunct maybe to their, whatever, Hadoop infrastructure? How are you seeing people deploy that?

So a little bit of both.
So Spark came about from a Hadoop background. It came about to solve the challenges of MapReduce. But as it has evolved and innovation has happened, it's really become this basis for, like I said, this universal data access. So now I think a lot more of the adoption is driven by people that are trying to innovate fast around analytics. And it gives them the power to be able to do that.

So when you talk about information management, you know, your previous role, you talk about governance, you talk about, you know, trust. How does Watson fit into that piece of the puzzle?

So there's a couple of things. One is the cognitive capabilities of Watson, which we are putting into various places in the platform. And then the other thing is the platform becomes sort of like a feeder for Watson, a way to help prepare some of your data for Watson. Let me give you an example of embedding cognitive capabilities. Within our enterprise content management suite of products, we have a product called Datacap. It's for capture. And if you think about mortgage applications or insurance claims or cross-border shipments, things that have a lot of different forms and documents, and they're in different formats, they're not all neat with little fields and that sort of thing, well, it's complicated to then get those organized. And so Datacap now has embedded natural language processing, embedded machine learning, embedded advanced imaging, such that you can scan in a stack of forms, 1040s, other things all mixed together, and Datacap will determine what form it is and then also identify the content within the form, and then it will learn and train itself so that it can now do analytics on the kind of documents you're scanning in. That's just one example of combining cognitive capabilities.

So is this a way to address the data quality problem? Are you seeing a big improvement in data quality at the companies that adopt this approach?

Absolutely.
I mean, just think about what I just described and imagine the manual process that people go through to be able to say, okay, not just the content of the document, but even labeling and filing the document so that you know that this one's now a 1040 and this other one's an application and this other one's something else.

So it learns the content of the document and the structure of the document so that it can classify data appropriately by figuring out what the document is.

Right, but it can be handwritten notes, it can be text, it can be photos. You know, for many, many years now in enterprise content management, there's been the ability to scan in forms, and from those forms, scanners would recognize fields and they could then store that as either metadata or the actual content. But this is now about learning more about it, and it not being a structured form.

So in the information management world where you lived in a previous role, the challenge was always, before this whole big data meme hit, how do I defensively delete information? And then all of a sudden information became this great asset, the data's the new oil and we had to mine it, et cetera, et cetera. So how are you approaching that balance between information as a liability and information as an asset? Are you able to talk about classification? Are you able to automate classification, maybe at the point of use or creation of data? And how has that affected your ability to both delete data and information, like work in process in a pharmaceutical company, for example, that you don't want hanging around and that could be a smoking gun, versus maintaining that data? Are you using analytics to figure out that balance?

Yeah, so I say that if you're not able to get the insights out of the data, then it really does become a burden to you at the end of the day, right? You've got to protect it and store it, et cetera.
And so in addition to the example I was talking about with Datacap, we have capabilities around BigIntegrate and BigQuality that are based on a Hadoop stack, that will actually cleanse and prepare a massive amount of data and automatically define the lineage of it. So now you know what you can trust in the data as a part of it.

You mentioned trust as part of your strategy. What does that mean?

Well, you need to know where your data came from, the lineage of it. You need to know that you can trust those aspects so that when you get the insights, you believe in the insight, right? You know, think about how many times people say, well, I don't believe the data, so consequently I'm not going to believe the insight. So it's about trusting that, and it's about capabilities that do that for you and catalog it, so then it's easy for different people in different roles of the business to be able to understand what's in the data and communicate with the data.

And the last question I have is cloud. Can you give us your perspectives on cloud, how it relates to trust? What are you seeing there?

Well, I think we're just on the edge of seeing even more explosion around cloud and how different applications, again, will come together kind of in a hybrid world around what enterprises are going to be doing. You see some examples with our relationship with Box, for example, and the fact that they're quite a trusted cloud container for documents and content. And now with our enterprise content management capability, you've got the Datacap piece I was talking about earlier, and you've also got the enterprise hybrid connection to be able to connect those documents with your workflows within the business.

So, last question, I said. The real last question. Any surprises in your new role, any pleasant surprises, unpleasant surprises? You don't have to tell us those, but anything that's shocked you?
Well, no, other than the fact that, I think, customers are moving to, or have an appetite for, embracing cloud and open source much more quickly than in prior eras of technology disruption. And it's about helping them do that and helping them use those sources of innovation for how they transform themselves.

All right, Beth Smith, thanks very much for coming on the Cube. We'll have to leave it there, and we really appreciate your time.

Great, good to see you guys.

All right, keep it right there, we'll be back with our next guest. This is the Cube, we're live from IBM Insight 2015 at Mandalay Bay, we'll be right back.