Hi, everybody. This is Jeff Kelly coming to you from Wikibon World Headquarters in Marlborough, Massachusetts. Of course, it's no secret to anyone that big data is really upending the traditional world of data warehousing. There are new capabilities like Hadoop and other things that are bringing new processing, storage, and analytic capabilities to the enterprise. Joining me today to talk about this evolving landscape and how big data is impacting data warehousing is Frank Fillmore. He is the president and founder of the Fillmore Group, a consultancy specializing in IBM and information management. Welcome to the Cube, Frank.

Oh, thank you for having me today. I appreciate the opportunity.

Absolutely. So Frank, you've been around for a while; you founded the Fillmore Group back in 1987. So I'm curious, have you seen anything as disruptive to the data warehousing market as the current interest in all things big data?

Well, I have to be honest, I've been in this business almost 35 years, so I've seen a whole bunch of the next new hot, cool, disruptive technologies. I go back to the days when 4GLs, fourth-generation languages, were going to take over the world. But I really do think big data is going to have a lot of traction. One of the things I see that is different about big data is the pervasiveness. It's not something that is starting in the IT community and spreading out; it's something that is really within the community at large, in the business world and even in the person-to-person world, and is growing from there. It's much more organic.

That's an interesting point. So what are the implications of that?
Does that mean that your clients, the folks you work with, data warehouse administrators, DBAs, and others, are finding that big data is coming to them? Are they getting requests from management, from the business side, to start working with big data? And if that's the case, how is that different from some of the other trends you've seen, where maybe it started in the IT world?

There's a lot of what we call management by airline magazine. Somebody sits on a flight and reads the American Airlines magazine stuffed into the back of the seat in front of him, sees something about big data, and comes back and asks the IT department, the CIO or the CTO, what are we doing about big data? And that becomes the motivating force. But I think what's happening is that we have matured transaction processing. We know how to do that. We can do ATM transactions. We can sell stuff. We've done that in e-commerce now. We're selling orders of magnitude more goods, with year-over-year increases, in a variety of different ways: from your phone, from your computer, and from your tablet. So what we're finding now is that companies are looking for new opportunities away from the mature ones, that is, engaging customers at a level of intimacy that they maybe had not had before. So instead of waiting for a customer to come to you and say, I want to buy a book, and you sell it to me, which was the original Amazon play, it's much more about reaching out to people and saying, would you like to buy, or have you considered? And that's where I think there is a major change.

Yeah, absolutely. That certainly resonates. So what is the impact of trying to bring in the technologies that support those processes around reaching out to customers and the other things that big data allows you to do? How is that impacting the job of a data warehouse administrator who's running the existing data management infrastructure there?
They've got to make sure the data warehouse is up all the time. It's mission critical. And now they're being asked to bring in new capabilities. How are they tackling that? What are some of the challenges, and then maybe some of the ways they're adapting to these challenges?

Good question. So first of all, I don't think that the traditional data warehouse is going to go away, again drawing on my experience, which is probably longer than I'd like to consider at this point. I remember in the 1980s when predictions were afoot as to when the last mainframe was going to be unplugged. And I remember that the over-under was 1992, as predicted by Computerworld. And as we all know, there are still a lot of mainframe computers out there. A lot of big businesses are running on them and have been doing so very successfully for a while. So I don't think we're in a situation where the data warehouse is any more at risk than the mainframe was. I think there's a place for data warehouses for accumulating transactional data and providing it in an easily digestible format on a regularly scheduled basis. I think that's going to be the role of the data warehouse. So if I want to know our quarterly reports, our daily reports, our sales, our KPIs, those types of things are still going to happen. What's going to be different is that we have looked primarily within the enterprise to build the data warehouse, and I say primarily. We have taken our transactional data and we have modeled that and massaged that, and we have put it into a form which people can use to make business decisions. And that whole process is what's being upended now with big data, because we had a great deal of control over the metadata, the data that describes the data. What we're doing now in big data is determining the metadata from the data content itself, on the fly, as the data is coming across a Twitter feed or Facebook pages or any of a number of other streaming sets of data.
We are inferring what the data means by its context. And that's what's fundamentally different, because before we would impose context on the data based on what we knew about where it came from or the system that it was a part of or something like that.

So in the past, when you would model a data warehouse, you would basically understand and know the questions you wanted to ask ahead of time. Now we're trying to take advantage of these new technologies, the scalability, the ability to store large volumes of data that's coming in in near real time, and actually make decisions much faster. And again, that involves, as you said, not necessarily applying any kind of schema to the data ahead of time, kind of schema on read, if you will. So again, what is the real impact? Or I should say, what are the challenges of bringing that type of technology into the existing environment? You mentioned they're not going to replace existing data warehouses, and I think that's accurate. But obviously they play together. So in terms of integrating the different technologies that already exist in the enterprise, what are some of the challenges, and what are some of the techniques you're seeing your clients use to make that a seamless process? Or as seamless as possible, I should say.

Yeah, well, I understand, as easy as we can. We've all heard about all the Vs, the velocity and veracity and volume of data, that comprise big data and differentiate it, set it apart: orders-of-magnitude differences in all of those things. How fast we get the data. We work with customers whose data warehouses are refreshed once a day. And we're looking at streaming feeds that need to be evaluated in thousandths of a second in order to make a business decision in terms of stock purchase arbitrage and those types of applications. So that's one of the differentiators. So what we're seeing is that we need to use different models to manage the data.
And that's one of the reasons that something like Hadoop and MapReduce is so powerful or so attractive: we're using commodity hardware. Instead of putting data into maybe one very expensive server, we're unleashing a whole bunch of inexpensive servers simultaneously to try to address that volume and velocity problem.

So you're out there in the trenches. Share with us some success stories from your clients who are starting to really leverage this technology and are actually delivering some business value with those initial big data deployments.

Well, the one story I'd like to tell from back in the early days of data warehousing and data mining is the old diapers and beer story, about the guy who went into the convenience store after 6 p.m. to buy diapers. And there was a 70% correlation with a second product. What was the second product? It was beer. And this was supposed to be one of the transformational stories in data warehousing, that nobody ever thought to ask the question, what do you buy with disposable diapers after 6 p.m.? But those were the types of answers that you were getting from data mining. They're answers to the questions that you didn't know enough to ask. So one of the stories that I use is the recent resignation of Pope Benedict as a business opportunity. And I'd say that it was, first of all, a black swan. No one had resigned the papacy in 600 years, so nobody could anticipate it. But if I were a travel agency and I was monitoring Twitter feeds and looking for certain hashtags, I could tell by sentiment analysis that people were interested in this event, and I would try to have, within 24 hours, a customized travel itinerary in the inbox of the people who had been tweeting about the resignation of the Pope, sending them to Rome for the conclave and the installation of the new Pope.
So those are the types of things that would never have been considered before, because by the time we gathered the data and scrubbed the data and put it into the data warehouse and were able to derive some actionable fruits from all that, a week would have passed and the opportunity would have been lost. So that's again one of the big differentiators in big data. We're going to be able to take advantage of here-and-now opportunities that the traditional data warehouse is just too slow and not agile enough to react to.

So what is your take on the actual technology being provided by the vendor community right now? Obviously you're very familiar with IBM's portfolio of data management products, both the more traditional, I guess you would say, data management tools and databases, and in the big data space, where they've been pushing very hard with their BigInsights platform and their PureData System appliances. What is your take on the technology itself that's being offered by the vendor community? Is it up to, I guess, enterprise standards? Is it easy enough or simple enough for traditional enterprises to adopt? What is your take, both on IBM specifically and generally on the industry as well?

Well, it's a good question. I had just attended some IBM announcements at the beginning of the month at Almaden Labs on April 3rd, and one of IBM's new offerings is their PureData System for Hadoop. It is an integrated platform with all of the software necessary to access or initiate a big data analysis: IBM's BigInsights software, which is their Hadoop distribution, and then all the piece parts around it, Hive and HBase and Jaql and JSON, all the different pieces that you need in order to develop MapReduce applications and make them successful.
So one of the questions that I asked one of the IBM executives was: okay, if the value prop is that I can get a whole bunch of commodity hardware together and make this work very cheaply, and you're going to sell an enterprise-level, IBM battle-tested, mission-critical type of server, the PureData System, for this purpose, for Hadoop, doesn't that subvert the whole value prop? And the answer I got was very insightful: we're there for phase two. Phase one is the sandbox: you go out and buy some old servers, or you redeploy some old servers that are no longer being used, and you install all this software and you make it work and you get some answers that you hadn't anticipated before. You see an end-to-end proof of concept. But you have learned from this experience how difficult it is to get all this software to work together and to get all of it configured. And now you say, okay, we're ready to deploy to the enterprise and we're going to start depending on this to make business decisions the same way that we depend on our data warehouse. It's got to be backed up. It's got to be available 24 by 7. We can't say it's going to be down for a week because one of our commodity servers took a hard drive hit and we didn't have a replacement lying around. So how do we do that? Well, we buy the appliance from IBM, we have all the software pre-installed, we have all the hardware integrated, and we're ready to run.

Yeah, those are very interesting comments. I particularly liked the point you made about when they're doing these kinds of experiments, early initial deployments, with some of the early adopters: as you said, they're learning, wow, this is a lot more difficult maybe than we thought. You think of Hadoop as inexpensive, running on inexpensive commodity hardware, but that doesn't include the expense or the time it takes to fashion that together and to actually make it work optimally.
So yeah, a very interesting take from IBM on what they're doing in that space. So, last question: I wonder if you could give some advice to CIOs out there who are maybe just starting with big data. They're just starting to even think about it and haven't even done those initial experiments or initial deployments yet. If you could give one or two pieces of advice to those CIOs, whether it's from a technology point of view or a people-and-process point of view, what would it be?

Good question. I would say, drawing on the experience from the data warehouse, back when data warehouses started to really take hold within the enterprise in the mid-80s and then certainly the early 90s, one of the biggest causes of failure that we saw was that people tried to boil the ocean. I know of a number of different large institutional customers that I worked with who tried to do the comprehensive, be-all and end-all data warehouse. They were going to model every piece of data that they had in all their transactional systems and put it into the enterprise-sized data warehouse, and then they were going to be able to get the answers that people needed in order to run the business. That was the build-it-and-they-will-come model, and a lot of those failed because people ran out of time and money and patience. If you have a development cycle that's going to last 18 months to two years and you're going to be sinking a lot of money into this effort and you don't see any payback, you may be an ex-CIO sooner than you'd like. So my advice to the folks that are getting ready to embark on a big data journey is the same thing that we came to realize in the data warehousing world, which is: start small, and reduce the time to value to something that is manageable, a business cycle, a quarter.
You should be able to have something up and running, especially if you take advantage of some of the IBM offerings that are prepackaged and have a very quick time to value, and be able to get some return on your investment very quickly so you can demonstrate to senior management: this does have value. This is the type of information, these are the data points that we can deliver to you that we were never able to before. Do you want us to expand on this? And that's the way that I think you're going to get buy-in and support and ultimately success.

Okay, great, some great advice there. So Frank Fillmore from the Fillmore Group, thank you so much for joining us. We really appreciate it, some great insights, and hopefully you can join us again and we can continue the conversation.

I'd like to do that. This was very enjoyable. You have a good afternoon.

You too. Thanks, everybody.
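
The schema-on-read idea discussed in the interview, where the metadata is determined from the data content itself rather than modeled ahead of time, can be illustrated with a small sketch. This is only an illustration under stated assumptions, not anything described by the speakers: the sample records, the field names, and the `infer_schema` helper are all hypothetical, and a real deployment would read from a streaming source rather than an in-memory list.

```python
import json
from collections import defaultdict

# Hypothetical raw feed: JSON records arrive with no schema agreed in advance.
RAW_FEED = [
    '{"user": "alice", "text": "Heading to Rome #conclave", "retweets": 12}',
    '{"user": "bob", "text": "Quarterly numbers look good", "geo": [41.9, 12.5]}',
    '{"user": "carol", "retweets": "3"}',
]


def infer_schema(lines):
    """Discover field names and the value types seen for each, at read time."""
    schema = defaultdict(set)
    for line in lines:
        record = json.loads(line)
        for field, value in record.items():
            schema[field].add(type(value).__name__)
    return dict(schema)


schema = infer_schema(RAW_FEED)
for field in sorted(schema):
    print(field, sorted(schema[field]))
```

In this sketch the reader, not the loader, learns that `retweets` sometimes arrives as a number and sometimes as a string, and that `geo` appears only on some records. That is the contrast with a modeled data warehouse, where those decisions are made before any data is loaded.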