Edge 2012 is their technology innovation around storage, storage infrastructure, storage solutions, cloud, big data, all the real enablement out of the storage business. I'm John Furrier, the founder of SiliconANGLE.com, and I'm joined by my co-host. I'm Dave Vellante at Wikibon.org, and we're here with Inhi Cho Suh, who's an IBM Vice President in the Information Management Group. Inhi, welcome to theCUBE. Thank you. I'm excited to be here. Yeah, well, we're glad to have you. We're going to talk about big data, John. Our favorite topic. They say you're the big data lady at IBM. I am. So we're going to talk big data. So, first of all, for the folks out there who know us, we love big data. So, honestly, we're going to talk about IBM all week, today and tomorrow, but you guys have a ton of systems experience, a ton of database experience, going way back on the invention side. I just tweeted out that the inventor of the disk drive is a friend of mine's father, Reynold Johnson, and you guys are no strangers to tech. You've got Watson out there doing big data on a large scale in the marketplace, good marketing. But big data really is disruptive, and disruptive in a way that we haven't seen since the PC revolution, which really changed the productivity equation. It changed a lot of the value propositions in the marketplace, and it has to do with a lot of technology change. So, what is your view of big data right now, in terms of what is big data? When you talk about big data, it's kind of one of those terms. Depending on what view you look at, it's different. What is it? Help us tease that out. So, you know, big data is a hot topic for everybody, and I would say, for the first time, it kind of extends outside of just the IT arena, but I would couch it very simply as, we talk about four Vs: volume, variety, velocity, and actually veracity.
You know, one of the things that I think companies are struggling with is, what portion of the data is the truth? And you're spending a ton of money investing in your existing infrastructure. Well, you forgot a V. Gartner forgot a V that Dave Vellante at Wikibon has. So, those are four Vs. So, Gartner has three Vs. You added veracity. That's right. The fourth V is value. Absolutely. To me, when I talk to people, that's the biggest challenge: how do I get value? How do I monetize this data? So, we've got five Vs now. So, veracity, what is that? Well, veracity is truth. I also talk about three Cs, okay? So, any big data platform or technology set of capabilities should collect, connect, and create value. Collect, connect, create. Which is similar to your value, same as yours, your fourth V. Nice. So, big data, honestly, from your perspective around this, there's a lot of debate around existing infrastructure in the old data warehouse and business intelligence marketplace. Very mature industry. A lot of business being done by big players, brute force, OLTP, a lot of critical systems running in these environments. But with big data, the new open source stuff like Hadoop is creeping up, with HBase and other things, where it's not ready for prime time yet on some of those critical things, but you're seeing batch meet real time and meet some new use cases. Absolutely. What's your view of all this, data warehouse, Hadoop, business intelligence, kind of this new collection of solutions? So, we've done over 200 different engagements around big data in terms of use cases. And one of the things I've noticed is actually that, in the majority of cases, clients are mixing traditional approaches, right? Data warehousing, master data management, with the new techniques, whether it's Hadoop, new applications, mobile devices, cloud. And the reason for it is they want to augment what they have inside the enterprise with data that exists outside the enterprise.
And that's a key part of being able to really operate, right? Because they want to automate what they're doing inside their enterprises and keep up with their clients. So, when you talk about veracity, are we further away from the single version of the truth than we've ever been? Does big data complicate that even further? I mean, the promise of the single version of the truth was there, certainly for reporting, but it hasn't been there for predictive analytics, and it's sort of been a rear-view-mirror BI and data warehousing world. Now, big data has this promise of finally fulfilling that dream. What are your thoughts on that? Veracity, you know, could actually be an all-day conversation. Because you're spending so much on your infrastructure to have a master set of information, right? So that it's cleansed, so that you know you have the right people, the right scoring in terms of how critical data sets are. And then you start to marry that with data in the public that is written in different tones and sentiments, right? That is only partially there. We actually think of it a little bit differently. You would think, oh, well, do you need to cleanse everything before you put it in? No. We actually think about just very simply connecting the dots. So, recognize that the data externally will always have, let's say, a lower degree of certainty than the data you have internally. But what it helps you do is get better context than maybe you've been able to have before. Because it's a new set of data that allows you to derive new insights for new services and new offerings. So it's very hard to actually get those insights, right? You can get the data, you know, the data's there. And we've talked to a lot of Hadoop practitioners. Actually, John is a Hadoop practitioner. So you have all this data, and then there's being able to analyze it. You know, a lot of choices. Do I bring it into a SQL database? Do I, you know, what do I do?
And you guys are actually, you know, building connectors. How do you see that all shaking out? It seems like it's very early days. It is very early days. A lot of folks are using, let's say, Apache Hadoop capabilities to pre-process data before they put it into a targeted, let's say, data warehouse, where they know something that they want to do, right? Where they understand the schemas. In other cases, what they're actually doing is taking the existing sources of structured data they already have and applying big data capabilities to them because they want to analyze the logs. And the logs could be network logs about IT failures and operations, or the logs could be about customer data, what the customers are doing, because it's questions that they hadn't considered searching against before. So you, go ahead, John. So, I want to get your opinion on something that we're seeing a lot in the market. Obviously, big data is all the rage. Everyone has, quote, big data mandates to do something around big data. And the early adopters kind of have specific use cases that they know, but most, even verticals like pharma and healthcare, don't really know yet where these use cases are, because they have existing businesses to run. The question I want to ask you is, what do you see around the ease of use of big data? Because there are two modes. There's the PhD, the master's in computer science, the systems guy who has to architect it all. And then there's the analyst, who is the one crafting these use cases. And a term we came up with at our last CUBE event was that a lot of these big data use cases are like tailored suits. These use cases are great. You tailor them up and they fit perfectly, but they're not that flexible. Yet the customers want maximum flexibility in their enterprises. So it's about balancing flexibility while tailoring those use cases.
So there's an emergence of a class of talent that's a little bit high end right now, yet the people who are going to be architecting these solutions could be analysts and business people. So the direction, I think, is what you're talking about, meaning the applications for the various users to actually consume. Today it's not very easy to just go to Apache and download Hadoop and start a search. You're going to need at least six or seven highly skilled guys to be able to do that. Yeah, good luck finding them too right now. They work for Facebook, Google, and IBM or somewhere else. Complicated situation for a lot of people. It is. I would say IBM and several other vendors are really focused on how to make this a lot more consumable. And we're doing this on multiple levels. One is around the applications that we develop, like BigSheets, which is a spreadsheet-style way of visualizing massive amounts of unstructured data. And we have also set up a Big Data University, which is free online. And we've had 20,000 students actually enroll. And you can go through and start to learn the basics of programming. One of the things I think you'll start to see more and more is, once these applications and tools and use cases actually become more and more available, you're going to have mass democratization of BI capabilities across the entire organization. Is that going to impact the developer community first? Obviously the developers have to build the apps. Or is that going to be more of the ops guys? Oh, I think it's definitely the development audience. Because when I think about big data, you've got to think about the big data platform in terms of what types of applications you want to build before you begin to think about the operations. Some of the operations pieces will be relevant today, but until the application use cases are actually built, it's hard to manage just the ops.
But when you talk about the democratization of data, you're talking about putting it in the hands of business people, right? Absolutely. Regardless of skill, even if you're not trained. Doesn't the industry have to do that in order for big data to truly have the impact of its vision, which is this massive increase in productivity? I agree with you. We actually did a unique case study for the Academy Awards where they asked us to run a live simulation for four hours and look at all Twitter feeds, Tumblr feeds, and a historical look at Facebook logs, historical data logs, and marry, as the awards were going on, different trailers and the response to the trailers based on the cast, music, and content. During the four hours you could actually see where the audience was peaking, where it was not peaking, what was positive, what was negative, and then based on that you could actually modify how you were going to release your set of trailers and/or advertising around the movie premiere. So that's a very active example that resonates for anyone in the consumer audience, because who doesn't love a good movie? So you bring up a good point. First of all, I think I remember seeing that. It's really awesome cutting-edge work. It's phenomenal. It's hard to do in real time, too, but it brings up a good point around the use case of user experience, right? So there are a couple of things that you're bringing out. The application there is essentially using data to change the user experience, the application being the web. The other one's mobile. Can you talk about those new user experiences? Because as users get more socially involved with their channels and the way they use the web, it's not just searching on Google and searching web pages anymore, it's a lot different. So talk about your view of the use cases, specifically mobile too.
One of the interesting things, I think, with big data is it's spurring a lot of requirements not just around the infrastructure and storage capabilities but around new applications, especially mobile. And you'll see a huge increase in NoSQL as an emerging area. I wouldn't say it's "no SQL." I would say it's "not only SQL." And part of it is the high availability and access. Now, one of the challenges with mobile applications and mobile data is you've got to be able to store subsets of that data for a certain amount of time. And some of those new development tools and programming models don't actually support all the retention requirements for the data. So, you know, this whole area of mobile has huge opportunities. Because high availability is really going to put the pressure on the mission-critical applications, but also on the retention issues around latency and data. So something that's five months old can be really important data. And so can five minutes or five seconds. So how do you guys view that solution? So a core part of what we have within IBM is called our InfoSphere family. And within InfoSphere we have not only the Streams and big data capabilities from Hadoop, but we also have data protection, data life cycle, and data governance capabilities with Optim, which is data archiving and test data management, as well as Guardium around data privacy and protection. So we're really looking at the end-to-end life cycle of the information. You believe that the statement you mentioned, about the use cases being real for customers right now, holds? Oh yeah, absolutely. I do think the use cases are real. Now, what's interesting about how clients are really spending their time on big data is they're doing all the things that they were doing before, but in a more creative way, with higher degrees of return than they were getting before.
So brand sentiment, managing public data that's available and marrying a subset of that with internal data to understand micro-segmentation of their existing purchasing audience and life change moments, because life change moments trigger purchases, whether you're in the insurance sector, i.e. you just got married, had a baby, or bought a car, or you're ready to buy consumer goods. Life-changing moments have their kind of elements. You saw that news story about the pregnancy test, right? Did you see that big data problem? Oh no. A woman went in to get a pregnancy test, and then, because of Twitter, the father got notified. It's so real time. Oh my gosh. Did you see that? But the story was even more amazing. The father snapped, and basically said, "My daughter's not pregnant, and why would you even send that to me?" And then it turned out a week later he found out that his daughter was pregnant. So big data predicted it. It was amazing. We call that data exhaust, right? There's all this loose data that's gesture data, and people are producing data with their mobile phones. Oh yeah. Some people call data the new oil, but unlike typical natural resources... It doesn't run out. It doesn't run out, right? From a supply standpoint it's always available, and if anything it's being created at a faster rate. So I wonder if we could talk about the IBM business model and the economic model around big data. We've been having a lot of discussions on theCUBE about the Red Hat of Hadoop, and Pat Gelsinger was on last week. He said there is no Red Hat of Hadoop. There's a lot of open source. We also had Peter Goldmacher on last week. Peter is the Cowen big data analyst, and he put forth, I think, a very interesting premise. He said, listen, the big data practitioners, the people applying big data, are going to make way more money than the people selling big data solutions. I believe that.
Given that, that sort of open source, Red Hat of Hadoop, maybe, maybe not, and that practitioners are going to make more money, how does IBM go to market and what's the business model? Is it to really solve those problems, help those practitioners, create value? Maybe, as Tim O'Reilly would say, create more value than you extract. What's the business model there? Well, for our CEO, Ginni Rometty, one of her key focus areas for all of IBM is to make sure that IBM becomes essential to every client, which means that we're really focused on clients' outcomes. And as a result of that, we're even changing some of the ways in which we engage with clients, to say, hey, could we negotiate the value based on what IBM delivers relative to the value you ultimately get on the other end as a result? Yeah, as the return, versus traditional pricing of, here's a product priced at a license or a fee. So we're entering a new time period, I think, with big data that allows us, for the first time, to not only harness the new technology but also potentially create new business models. I mean, if you even think about Rolls-Royce, they're trying to figure out ways to price purchases based on thrust, power thrust, rather than just the engine components and pieces. So in that model, you would share some of the risk and share some of the upside? Absolutely. I'll make a prediction. I'll predict that the vast majority of clients will not go for that deal, because they have so much value to create that you'd make a killing on that. No, it's also hard to predict, and it's not necessarily consistent depending on the client, because as you said from the very beginning, the use cases vary. Yeah, and it's different. So this is a channel opportunity in our mind, because the client-server revolution spawned massive consulting dollars around deployments and the use cases.
So what we're watching is the balance there, and it's not a hardware product anymore, it's more integrated, so it's not apples to apples, but it's close. It's like you have enablement technology with big data and then the ecosystem around it. It'll be interesting. My final question for you is a little bit different. It's more customer specific. So when you're in front of a CIO or an executive at a large enterprise and they ask you, give me the bottom line about big data. What do I need to do with my data? We have a huge investment in data warehouses and database management, all that stuff, governance. What should I really be working on right now? What are my top priorities around data? How should I think about it, and what should be on my roadmap? What do you tell a CIO when those conversations happen? I would say, first and foremost, try to read up on what's happening. I think that becomes really important, because I would tell you that for several clients I've engaged with, the purchaser, or the inquiry that came in, was not necessarily through the CIO or CTO office. It came in through the line exec. So at one of our key clients, Vestas, which is in the windmill and alternative energy space, the SVP, senior vice president of supply chain, came in and said, hey, we need a better way to manage our assets and how we deploy and determine the site to place a windmill, right? Because these things are like the size of the Hoover Dam. You can't move one once you place it. So think about the implication for IT of predicting and modeling the placement of these things. If your IT side isn't addressing the big data aspects of calculating weather patterns, or understanding public data, or understanding geospatial data or meter data, or, in the medical arena, images, then you end up being behind what the business may actually be requiring today, and they may be more proactive than you are.
Yeah, I think that's consistent with what we're hearing too about the business line managers who have responsibility and want to get faster solutions to the market in terms of top line and cost savings. So it's really more of a business driver, but that's challenging because IT also has to be the enabler of that. So that's an interesting challenge. Inhi Cho Suh, up-and-coming mover and shaker within the IBM big data community. Are you the youngest vice president in the history of IBM? Is that right? No, but definitely one of the younger ones. It's actually kind of a serious question, but you're not the prototypical IBM vice president. Great to see. Thank you very much for coming inside theCUBE and sharing some of your knowledge with us. Great conversation, appreciate it, thank you. My pleasure, thank you. Okay, we'll be back with our next guest after this break to talk more about big data, cloud, IBM, storage, and innovation. We'll be right back with SiliconANGLE.tv.