Okay, we're back. This is Dave Vellante of Wikibon.org, and we're here live at IBM's IOD conference. This is the second day for us. We're going deep. We went eight hours yesterday. We're going well into the evening today. I know a lot of you hanging with us on the East Coast really appreciate that. We're here at the Mandalay Bay in Vegas at the IOD conference, which has historically been about information management, governance, how you get rid of data, how you manage information risk, and it's really become a big data conference. I like that move that IBM has made. Think big. That's really the theme of this conference. IBM is really putting forth its vision, its technology roadmaps, its best practices, and its interactions with a lot of customers. There's business being done at this event, and this is a big data week for us. We'll be at Strata tomorrow. We're taking theCUBE to Strata. The team is down there setting up; my colleague John Furrier is there with the rest of our team, and I'm here with my co-host, Jeff Kelly from Wikibon.org, big data analyst, and we're joined by another great guest, Anjul Bhambhri, vice president of Big Data for IBM. Thanks so much for joining us. Another person from IBM with Big Data in their title. Yeah, so when did you actually get Big Data in your title? When did that come about? So this was two years ago. Two years ago, so you were early to the game. Almost. This happened right around that time, and I've been in data forever, right? I've spent my whole career in relational databases, and so Big Data seemed like a natural evolution. But it's nice to have Big Data in your title, because people say, oh, I deal with Big Data, and I say, I'm the VP of Big Data, how about that? We can partner.
So you started your career in RDBMS. So back in the day, if I had a data problem, I would buy a box, a UNIX box from IBM or whomever, and I might buy some licenses for an RDBMS and maybe some storage, and I would put all my data into that box. That changed a few years ago, didn't it? Yes, yes. So we've seen a natural progression here, right? You're absolutely right. If we step back, it was all about the application request being serviced by an operational data store, right? So we saw CRM applications, ERP applications, and there's an operational data store behind each, and then people started doing ETL out of these and building warehouses, right? You want to do business intelligence on those warehouses. But that was still a rear-view look, right? That you were getting out of the business. Yeah, what happened last week, last year, last month? Yes, last month, you know, last quarter. And then we started seeing people do descriptive analytics on the warehouse, right? The mean, median, aggregate, those kinds of things. They were pulling data from multiple operational data stores, but, you know, three, four, five terabytes of data were called warehouses. And then of course the sizes of the warehouses started growing, right? More and more structured data started going into the warehouses. So at that point, once they reached, say, 50 terabytes, you had to do analytics inside the database. So the model building and the analytics started getting pushed inside the database, right? In-database analytics. But as the data continued to grow, and here we are still talking more and more structured data, it didn't make sense to put all the data in the warehouse. So that's when things like Hadoop started emerging.
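As a loose illustration of the in-database analytics described above, here is a minimal Python sketch using the built-in sqlite3 module: the descriptive aggregate (counts and averages) is pushed into the database with SQL rather than pulling all the rows out first. The table and values are hypothetical.

```python
import sqlite3

# Toy stand-in for a warehouse fact table (names and values hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 300.0), ("west", 50.0), ("west", 150.0)],
)

# In-database analytics: the aggregate runs where the data lives, and only
# the small result set comes back to the application.
rows = conn.execute(
    "SELECT region, COUNT(*), AVG(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 2, 200.0), ('west', 2, 100.0)]
```

The same push-down idea is what lets a 50-terabyte warehouse stay usable: only aggregates travel, not raw rows.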
So people started copying or archiving data into a Hadoop-based system, for example. They may have, say, three, four, five years of data in the warehouse, but 10, 15, 20 years of data copied or archived in a Hadoop-based system. And then the next step was all the semi-structured and unstructured data, and Hadoop became this system that you can feed anything and it'll eat it, right? So it became the poly-structured data store. And so that's how things have been evolving. But what we've seen in the last couple of years is that it started with people coming from the warehouse and using Hadoop as a way to copy or archive data, but as unstructured and semi-structured data also got added, then it was like, okay, now we need to do analytics on this data, right? So now we are really seeing this analytics sandbox, which is where a lot of the data first lands, and then businesses figure out, right? What is useful? Let me explore this data, let me see what this data is telling me, and maybe there's some data from that that needs to be moved into the warehouse, right? Because you're not going to just take some petabytes of data, without knowing what the value of this data is, and put it in a warehouse. So you do ad hoc analytics on this sandbox, you figure out what's useful, and it kind of gets compacted at that point, right?
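The analytics-sandbox pattern described above, where poly-structured data lands first and is explored before anything is promoted to the warehouse, can be sketched roughly like this; the records and the parse-or-count rule are made up for illustration.

```python
import json

# Hypothetical landing zone: every record arrives raw; some parse as JSON,
# some are free text with no structure.
landing_zone = [
    '{"customer": "a1", "spend": 120.5}',
    "free-text log line with no structure",
    '{"customer": "b2", "spend": 75.0}',
]

def explore(raw_records):
    """Ad hoc sandbox pass: keep what parses as useful, count the noise."""
    useful, noise = [], 0
    for record in raw_records:
        try:
            useful.append(json.loads(record))
        except json.JSONDecodeError:
            noise += 1
    return useful, noise

# Only the compacted, useful subset would be promoted into the warehouse.
useful, noise = explore(landing_zone)
```

In a real sandbox the "what is useful" rule is discovered iteratively, but the shape is the same: land everything, explore, promote a compacted subset.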
So even if you started with petabytes of data, once it's all filtered and analyzed, you may realize that there are some terabytes of data that you really care about. And if your BI and analytics that's happening in the warehouse needs to continue, you could move this data inside the warehouse, or you could have new kinds of analytic applications that are just running on your analytics sandbox. But that means that you can't just stay with what the open source provides, right? The open source community has done, I think, a great job in terms of building the Hadoop MapReduce kind of capabilities. What IBM has done is provide all kinds of connectors so that you can ingest data quickly into a Hadoop-based file system. Then you want to run analytics on this data using MapReduce capabilities so that you filter out and figure out what's useful. Then you want to index this, and you want to be able to build facets on these indexes, right? So that as you're searching and accessing, it can be done quickly. And then there is the aspect of where you are going to build your predictive models, right? So now you have this huge amount of data, and your predictive models are now being built not on sample data or a subset of data, but really on all the pieces of data that are available to you. So suddenly you have models that are tested against more observations, right? Or rows of data, in the relational world. And you can have more variables that you are using to build these models. So the quality of your models starts improving. And obviously, as you are able to improve the quality of the models, that's really giving a competitive edge to the business, right? The other thing that is really unique is that you can run these models frequently, right? Because now you have a lot of commodity storage available to you.
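The filter-then-aggregate step described above can be sketched, very loosely, as an in-process map/reduce; a real Hadoop job distributes the same map and reduce logic across a cluster, but the shape is the same. The event names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw events; "spam" stands in for records judged not useful.
events = [
    ("clickstream", 1), ("spam", 1), ("clickstream", 1),
    ("sensor", 1), ("spam", 1), ("clickstream", 1),
]

# Map + filter phase: drop the noise before any aggregation happens.
mapped = [(key, value) for key, value in events if key != "spam"]

# Reduce phase: aggregate by key, as a combiner/reducer would.
counts = defaultdict(int)
for key, value in mapped:
    counts[key] += value

print(dict(counts))  # {'clickstream': 3, 'sensor': 1}
```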
So the cost of storing these models or the outcomes of these models is not something that you worry about. Before, people would run these models infrequently, right? But now you can run them on a more regular basis, and each run is going to improve the quality, right? And that is what is going to give you that competitive advantage. It's not that you just put a Hadoop cluster out there and suddenly start getting more competitive advantage. There is all this work that needs to be done. There's a couple of things that you said that really caught my attention. So IBM is certainly fond of saying, and I've heard it a lot at this conference, that Hadoop is not big data. At the same time, what MapReduce did was profound. I mean, being able to leave the data where it is and bring the five megabytes of code to petabytes of data, that really was enabling. And that really did enable the elimination of sampling. So that's one observation I would make based on what you said. The other is that IBM has, I think rightly, said, well, great, we get that, but it's not big data. That's not the whole story. Your analytics story is very strong, and you've merged that together with the big data messaging in a very effective way. It's really, I think, changed the perception of IBM. Talk about that a little bit, the way in which you brought the analytics mojo to the big data messaging. Yeah, because just ingesting all this data and saying we can handle petabytes of data, what does it mean to handle that data, right? So yes, you can store this data. With Hadoop MapReduce, and being able to run MapReduce on commodity hardware, it makes it easier, without your cost going through the roof, to ingest all kinds of data. But the real value in this data is when you extract the information out of it.
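The point about eliminating sampling can be illustrated with a toy comparison over a synthetic population: an estimate built from a small sample carries sampling error that an estimate built over every available observation does not. All the numbers here are made up for illustration.

```python
import random
import statistics

random.seed(7)  # deterministic for illustration

# Synthetic "full" dataset standing in for all available observations.
population = [random.gauss(100.0, 15.0) for _ in range(100_000)]
sample = random.sample(population, 100)  # the old sampled approach

full_mean = statistics.fmean(population)  # built on every observation
sample_mean = statistics.fmean(sample)    # built on a small subset

# The sampled estimate deviates from the full-data estimate; with the full
# dataset, there is no sampling error to reason about at all.
sampling_error = abs(sample_mean - full_mean)
```

The same logic carries over to model building: more observations and more variables tighten the estimates, which is the quality improvement described above.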
When you are extracting information, you can stitch different pieces of information together to really see what picture is emerging. You know, I was talking to a CIO the other day, and he was asking me, so what is different now, right? And without getting into it technically, because there are a lot of differences, I said, see, before, you had the pieces of the puzzle and you were saying, okay, let's put them together to see what picture emerges, right? So analytics was: here are the pieces of the puzzle, let's see what pictures emerge. Now, I said, you neither have the pieces nor do you have the picture. And by that what I mean is that you have this gamut of data. From this, you have to first extract the pieces, right? You have to extract those pieces of information. Now those pieces of information then have to be put together, and you will get the picture. So analytics has gone to the level that you extract the pieces of the puzzle, then you put them together and see what the picture is. To me that is a big shift, right? You're not starting even with the pieces. You're starting with just this data, right? And that, I think, is what makes it so interesting, and where the creativity comes in, you know, where your creative juices should be flowing now, right? Which pieces are you going to extract from here, and then what picture will you put together? And you can use the same pieces to put together different kinds of pictures, right? So this is where businesses really get that competitive advantage, right? Before, what they knew about the customer was maybe the name, the gender of the customer. Now, when you analyze tweets, social media, Facebook, blogs, you get different pieces of information, right? Maybe there is no direct information there which says, oh, this customer is a working mom, right?
But when you pull different pieces of information together, then maybe you get the outcome: oh, this individual is a working mom, right? It's an inference. It's an inference, you have deciphered that, right? It's not a direct thing that you got somewhere, a phrase like working mom; you have deciphered it. And that's built on models, and they learn, so you get better and better the more data you have. Exactly. So that's interesting, because I think certainly there's the experimentation you can do now. The idea in the past was you had to figure out the questions you wanted to ask your data before you started the modeling and putting together your warehouse, and now it's, I don't even know what questions to ask sometimes. But you've got to balance that experimentation, of course, with let's attack real business problems. Absolutely. We've talked to a couple of guests today about that issue. How do you balance that? And how do you help customers understand? Well, you know, certainly you need to have a business problem to get started, because you don't want to just do a science experiment and have it go nowhere. But at the same time, you've got to foster that level of experimentation. Yeah, so we've worked with many industry verticals here, right? And I would say that if you were talking to a lot of the businesses maybe last year, it was still, okay, let's do some science experiment here. But as people are seeing that there is a lot of information, a lot of understanding that you can get about your customers, it's really moving very quickly beyond the science experiment, right? And the easiest way to get started is just to get started, right? That's the first step, right?
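The working-mom inference discussed above is, in practice, a learned model over many weak signals; as a deliberately simplified stand-in, here is a toy rule that infers the attribute only when multiple hypothetical signals agree. Every signal name here is invented for illustration.

```python
# Hypothetical weak signals stitched from different sources; none of them
# says "working parent" directly.
signals = {
    "posts_during_commute_hours": True,
    "mentions_daycare": True,
    "purchases_school_supplies": False,
}

def infer_working_parent(sigs):
    """Toy rule: infer the attribute when two or more signals support it."""
    return sum(1 for present in sigs.values() if present) >= 2

is_working_parent = infer_working_parent(signals)
print(is_working_parent)  # True
```

A real system would weight and learn these signals rather than hard-code a threshold, which is exactly the "models that learn" point made in the conversation.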
I don't think that businesses can just sit back and say, I'm going to wait for somebody else to do it. Because if they wait for somebody else to do it, they may not be around; there may not be an opportunity for them, because the businesses that do it are going to leapfrog so far ahead of the others that it will become very difficult to catch up with them. So, say you are in telco, and you've been understanding your customers and their satisfaction based on the data that is in your warehouse. You have to look at the fact that the volumes of data are increasing, right? Before, there were maybe some call data records that you had; now, with the number of smart devices that people have, you may be dealing with six, seven, eight, nine billion call data records every day. And you don't want to just say, oh, I'm going to ignore this. So you have to have a strategy to deal with this volume of data. You have to have a strategy to deal with the different types of data, because this data is not all structured. It may be semi-structured or unstructured data. And this data just keeps coming at you. So your data platform has to be able to deal with not just structured data but all these different kinds of data, and be able to scale out in terms of processing power, memory, and storage. And because you don't know what your workload is going to be, it's hard to say what you should build on a per-second basis, like for telco, how many call data records you are going to handle per second, right? But whatever happens, you have to be able to deal with it. Otherwise, you lose out. So in terms of getting started, build a big data platform. It doesn't have to be huge; it could be a smaller sandbox. And then bring in the kinds of data sources that you know will give you some value.
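The telco sizing question raised above, what per-second rate a platform must absorb, follows straight from arithmetic on the daily volume; here is a back-of-envelope sketch with an assumed daily count and an assumed peak factor.

```python
# Assumed daily volume, from the six-to-nine-billion range mentioned above.
cdrs_per_day = 8_000_000_000
seconds_per_day = 24 * 60 * 60  # 86,400

# Sustained average ingest rate the platform must absorb.
sustained_rate = cdrs_per_day / seconds_per_day

# Traffic is bursty; a 3x peak factor is a pure assumption for illustration.
peak_rate = sustained_rate * 3

print(round(sustained_rate))  # ~92,593 records per second on average
```

Even the average rate is nearly 100,000 records per second, which is why scale-out in processing, memory, and storage matters before the workload is fully known.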
I have yet to see a customer that brought in data they were not looking at before and did not get insights they were not getting before, right? I have not seen that with any customer, right? And I'm saying 100% that you are going to gain new insights. Because if there's new data, how can you not get new insights? And, you know, there can be noise in that data, so filtering has to be done; not 100% of it is going to be useful, but it's not that 0% of it is going to be, right? There'll be nuggets in there. I'm sorry, Anjul, we're getting the hook, but I appreciate you coming on theCUBE. Great insights, and we'll definitely follow up on some other initiatives that we talked about. Appreciate you coming on. Thanks very much for the insights. Keep it right there. We'll be right back with our next guest from IOD in Vegas. This is theCUBE. Thank you.