at Wikibon headquarters in Marlboro, Massachusetts, and as we've said a number of times, big data is big business. And I'm here with my co-host, Jeff Kelly, and Jeff, one of the things that you said to me early this year is that you really wanted to dig into the use cases of big data. I mean, at SiliconANGLE and Wikibon, we love to talk about the technologies and the companies and the startups and the whales and the barracudas, but really what you were saying, Jeff, is you wanted to dig into how people are actually applying big data technology to create a business capability, and that's what we're gonna talk about today, isn't it? Absolutely. I think one of the premises we've laid down was that big data practitioners are gonna build a lot more value than the big data vendors themselves, because really there's just so much potential in all that data that's being created, both in general and internal data centers and outside on the World Wide Web and in other places. So, I absolutely want to dig into some of the use cases, what people are actually doing with big data. So today we're joined by Eric Sall, who's vice president of product marketing in the information management group at IBM. Eric's been, of course, playing a pivotal role at IBM in terms of building out their big data platform and working with customers in terms of understanding the best use cases. So, Eric, welcome to theCUBE. Thanks for coming on. Thanks, Jeff. So, before we dig into the use cases that you're seeing most commonly, why don't we start with one of the issues I think we see with a lot of companies that are getting started with big data, which is that sometimes they invest in the technology first, before they identify the initial use cases. What's your experience when you see companies doing that? What should come first, really, technology or the use case?
Yeah, well, I think what comes first is really the use case. There's a lot of excitement about big data technologies because of what they enable. But the fact is that it's not the technology that is exciting, it's what you can do with the technology, and people understand that viscerally: with all this data coming from all these different places, in increasing volumes, coming at you faster than ever, there's so much to work with. So, really what you have to focus on first is what am I trying to accomplish, and then you apply the big data technologies to solve that problem. But Eric, we need Hadoop. Do you hear that a lot from customers? We got to get on this Hadoop thing. Yeah, yeah. Well, people do say that, and some people even equate big data with Hadoop. Hadoop is obviously a very important part of big data, but big data is much more than that. It has to do with your structured data as well, your warehouse. It has to do with streaming analytics. It has to do with predictive analytics. It has to do with content. So all of this stuff comes together to solve some new kinds of problems. So when you go into a new customer situation, they're starting to think about big data. They say things like Dave just mentioned: well, we think we need Hadoop. You say, well, wait a minute, let's talk about use cases first. How do you first approach that conversation? Is it, you know, let's focus on a vertical application, or is it more of a horizontal use case you're looking at? How do you actually start to identify those initial use cases on the customer side? Yeah, what I do first is try to get them to talk. You know, what are the problems you're trying to solve? And sometimes they go down a vertical path, but sometimes they go down a horizontal path, and either path is fine because it's really about understanding their problem and then how these technologies can help solve that problem.
So that's the way we think about it. And there's a difference in big data also from traditional technologies because in a traditional technology, the way it's worked in the past is you have your, the problem you're trying to solve and you structure a schema and a database and you kind of like have it all figured out upfront. These are what my users are trying to do. And so therefore here's what I'm delivering. But in big data, it's a little bit more interactive and creative and iterative. And so it's more about, okay, so what's the kind of business initiative you're trying to solve? What kind of data do you have? What do you want to know? What are some of the things you'd like to explore and understand better? And so it's a little bit more creative and it's a little bit more unstructured, not only in the data, but also in what you're trying to do with that data. Last year we did a survey and we asked users, big data practitioners, what's the biggest challenge that you faced? And one that came back, a theme that came back was we're trying to figure out what to do with all this data. We're trying to figure out how to monetize it. Are we beyond that or is that still a big theme? It's still a big theme. People are still trying to figure out, in some cases they've got a lot of data and they know there's value in it, like telecommunications companies, for example. They have a lot of information about who's calling whom. They have a lot of information around usage of their cell towers and drop calls and how many people are signing up at what times of the year and all that kind of stuff. They've got tons of data in these call data records and what they're looking for is how can I use this data to serve my customers better, to retain them better, to offer new kinds of pricing plans which might make them be more connected to me as a company. So we're seeing that very strongly in telecommunications. 
We're seeing similar thinking in financial services and banking, in retail, insurance. All of these companies are trying to do similar things: use the data that they have to improve that customer relationship, to optimize operations, to reduce risk. All of those things are possible to solve by just more creative thinking about the data you already have. All right, so let's dive right in. So I understand you've identified five use cases over the course of time, working with some of your clients. These are some initial use cases you found to be most effective. So what's the first big data use case you're seeing? So these use cases came from real analysis of lots of engagements with clients, and we really started to see these patterns. So the first one is this notion of big data exploration, because virtually any company of any size, even smaller companies, has lots of information of different types from different sources in different places. Some of this data is structured. Some of it might be unstructured. Some of it might be within the walls of their company. Some of it might be coming from social media or from some other vendor, where they don't actually own the data but they have access to the data. And what they're trying to do is solve certain problems. So the idea here is to explore that data, to go across all these boundaries and try to get at an answer. So an example of a client that's doing this is an aircraft manufacturer. If you stop and think for a second, they've got a very complicated business. Their product has got thousands of pieces to it, sub-assemblies, and by the way, airplanes now have a lot of software too. So it's a complicated hardware and software product. And not only that, but it's customized for every customer. So now you've got this complicated product that's customized for every customer and a problem happens. How do you know where that problem came from?
Is that problem gonna recur? How serious is that problem? Does it affect other clients as well? This is literally a life and death matter for them. So what this client is doing is they've created a maintenance war room based on exploring all these different types of data: service reports from the field, how things were built originally, anecdotal data, all this stuff coming together, and they're doing it to try to quickly solve maintenance problems. And they think they're saving about $36 million a year just based on the ability to bring together this information and solve problems more quickly. Yeah, that's an interesting example, because a lot of people, when they think about, for instance, airlines, think about the booking system, but the real mission critical system is the maintenance piece. Yeah, if you miss a booking, that's all right. They're gonna fill up the flight anyway. But the maintenance piece, like you said, you're talking about lives here. That's right, absolutely. And there are a lot of different moving parts, as you mentioned. I mean, you've got to bring data together from both the software and how the hardware's performing. There's a lot of different areas there. But the idea that struck me, that you mentioned, is bringing in data from different areas and breaking down those data silos, which of course is a problem we've had in the enterprise for years. We've taken different approaches, tried breaking that down, sort of bringing it all together in an enterprise data warehouse. Sometimes MDM initiatives try to focus on that. Do you think big data is really gonna solve ultimately that data silo problem in the enterprise? Well, it's a big part of the solution. You know, it's funny, because big data is part of the problem too, I would say, because the data is growing in such leaps and bounds, and there's no stopping it. You're not gonna stop the data growth.
You're not gonna stop the fact that there's more types of data coming in. The real thing is what can I do with this data? How can I get value from it? So it's not just adding to the cost of your infrastructure, but it's actually giving you value in your business. So take that example that you gave. How did IBM specifically help the client solve that problem? So in that case, the client is using our InfoSphere Data Explorer product, which is able to interact with lots of different types of information. It has connectors to all sorts of different sources, different enterprise applications as well as raw files, and it's able to put it together and create a clustering of related information so you can more easily get the meaning out of the data. So it's not just accessing it, but it's actually, how am I gonna interpret that information and find the answer I'm looking for? So the outcome is really making people more productive. I mean, the old cliche, you can't take the humans out, humans are the last mile in big data. Is that right? Yeah, but that sounds like a very soft benefit, right? People being more productive, so they can go home a little earlier or something like that. But this is not a soft thing. This is a very hard thing, because if they've got a serious maintenance problem, that plane can't be in the air. So their clients are losing revenue every single day when that's happening, and they're actually losing future business if they can't get the planes back in the air. So it's compressing time to resolution. That's right, and this is very serious business to them. Yeah, yeah. So let's move on to use case number two, getting that 360 degree view of your customer. Talk about that a little bit. Yeah, so of course the 360 view of the customer is not exactly a new idea. That's been going on for a couple of decades, right?
But what's new is the fact that clients are now looking at, how can I get closer to that customer by leveraging everything I could possibly know about them? Because whether you're selling to a consumer or you're selling to a business, what's more important than knowing that customer really well? And knowing what they want, knowing what they're thinking, knowing what products they have and what products they're interested in, et cetera. So this is about extending that 360 view to incorporate new sources. It might be things like what they're saying in social media. It might be what they're saying to your customer support center, which might be captured in call records and text that was not easily accessed in the past. And how can you take all this information, put together a better profile of what that customer really wants, and serve them better? So an example: there's a client that we have that is a medical equipment manufacturer. So they're a B2B type of client. But they have a call center. And so their clients call into the call center. Sometimes it's with problems and complaints. Sometimes it's to check the status of an order. Sometimes they wanna ask a question about a capability or about another product. And so the fact is that when you have a person from your company talking to the client, that's called a customer-facing interaction. You want to maximize every customer interaction for everything it can do to make that customer more loyal, to increase the revenue per customer, to make sure they don't leave, et cetera. So what they've done is they've implemented an initiative they call one more question. And the idea here is to take that interaction, solve the client's problem, answer the question, and then ask one more question that's highly customized to everything you can know about that client. Not only what question they asked, but what products do they own? What problems have they had? Have they paid their bill?
All this type of stuff, right? And so the idea here is that you've got them on the phone, that's a golden opportunity to ask the right question. But it has to be the right question. If they called in complaining, you don't wanna try to sell them something else, right? Or if they haven't been paying their bills, you probably don't wanna sell them anything else, right? You wanna ask them to pay their bill. On the other hand, if maybe you've introduced a new product and this client hasn't bought it yet, but it goes really well with the product they have. So you might wanna say, hey, do you know about this new announcement we made that would go really well with product A? We've got now product B. But you can only do that if you armed that customer-facing employee with all the possible information so that they can maximize that golden opportunity. So what's the tech and the solution behind this one? Well, this is using a combination of capabilities, master data management, also uses the data exploration types of capabilities as well, information integration. These are all parts of the solution here. And again, the reason to do this is to really arm that customer-facing employee with just the right information they need at that moment when they need it. It's at the point of impact. And that can relate or be applicable to both B2B, as you mentioned in that example, but of course, B2C. Are there, when you're dealing with customers, are there significant differences in the way you approach that issue if it's a B2B versus a B2C player? You know, it's really pretty similar. A customer is a customer. Of course, the actual content will be different, right? And the types of interactions you might have. With a B2B, it's a little bit more complicated because you might be interacting with a purchasing agent or you might be interacting with an actual user or an IT person or whatever. So it's got to be customized to the role that that particular person plays. 
But conceptually, it's pretty similar in the sense that you're trying to maximize that client relationship in whatever form it is and do it in a customized way, because then you can get a better benefit. All right, so let's move on to use case number three. Okay, well, the third use case that we've noticed as a pattern is really extending security and intelligence systems. Now, as we know, security is becoming a bigger issue in this world in a lot of ways. And of course, this can be things like perimeter security, like somebody breaking into your building or your house. It can also be IT security type things, and it could be things like threat and fraud as well. And now, of course, the particular situation here is you're dealing with a bad guy, and bad guys don't cooperate. They don't try to do things in a way that makes it easy for you to figure out what they're doing. In fact, they try to do the opposite. So the idea here is that companies that are trying to extend their security systems are trying to use all available data. So that increasingly includes video content, but it's also things like comparing video content with financial content, with email traffic, with social media. It's all these different types of information that can be pulled together to get a better picture and to find more connections and patterns in a very unclear situation. So it's about making a security solution better by leveraging more types of information. So a lot of the investment in security, as you know, has been, logically, on keeping the bad guys out. And as you pointed out, it used to be the bad guys would get in, they'd spread a virus, make a lot of noise, take credit for it, but now they don't want you to know that they're in there. I saw a stat the other day that after an infiltration, it takes on average over 400 days for a corporation to recognize that they've been infiltrated.
So the security model seems to be shifting from, you know, 80% of the spending being on keeping the bad guy out, to, okay, is he in or she in? And then how can we find that person? Are your technologies and your solutions involved in sort of extending that probe and widening the scope of security? Right, so yes. And you know, again, we're not talking just about IT security, but actually physical security as well. But yes, it's always changing. I mean, that's the bad guys not cooperating, right? They're getting smarter, they're getting more clever, they're coming from all around the world. And what you really want to do is outsmart them, or at least understand what's happening and try to be able to combat it as quickly as possible, and to be able to investigate it after the fact too. So an arms race, you're saying. It really is an arms race. And again, this is the reality of security, right? It's never gonna stop. It's about getting better and better and better. Okay, we seem to have a pattern here. How do you guys do that? Well, I think in the case of security extension, it's very often about dealing with more sources of data. When you talk about big data, very often people talk about this volume, variety, velocity, right? And in the case of security data, a lot of it has to do with the variety element. It has to do with things like video and audio information, which can be very, very helpful. And it has to do with taking in machine data too, like from a security system. You know, what door did they come in? Something like that can be useful information, right? And in the case of security, it's also about velocity, because the fact is that you want to detect that somebody's trying to get into the building as it's happening, not the next day when they've already been in and done what they've done and left, right?
So it's really about trying to be more real time in detecting all sorts of unknown security breaches which might be happening. So you mentioned video. That's an interesting type of data that's now being brought to bear in terms of security. What are some of the things you're seeing companies doing out there with the video? I imagine it relates to physical security. What are some of the interesting things they're doing with that? And what are the technology challenges you face as a vendor in terms of developing technologies that can really harness and make sense of that data? Right, well, you know, every source of information has its own formats, and usually there's lots of formats and lots of different challenges. With video, of course, you have to get not just at the metadata, but you're trying to get at the meaning of the video as well. And of course, if you've got kids, you know there's an increased amount of video online. They're constantly creating video. Everybody's a producer of video now. So there's a lot of stuff online too. So it's not just about security, it's also about things like a consumer products company understanding what people are saying about their products, right? And companies are really trying to embrace that video content as an actual source of understanding of their customer and of opportunities that are out there. So let's move to number four, operations analytics. Tell us what that's about. Yeah, so one of the other really fascinating things that's going on right now is that there's this Internet of Things, where there's a lot of intelligence out there, intelligent devices. These can be IT systems, like with logs and click streams and stuff like that, but it can also be equipment like an oil rig or an oil platform, and everything's got data that it's throwing off now.
Electronic meters is another example. There's intelligence in cars now as well that is coming out, data's coming out of that. So everything is throwing off this data and the opportunity there is to use that information to make better decisions around whatever you're trying to optimize. So an example of this would be like the smarter buildings type of thought, right? So you may have read about there's a lot of interest in new buildings that are much more energy efficient, that are optimized in every way, that are smarter around, you know, detecting upcoming problems so you can do predictive maintenance and deal with a problem before it actually happens. And all this is about using this kind of machine to machine data and use it to get smarter around what kinds of processes you can put in place in that particular, you know, building system. And so it's detecting things like, gee, if the boiler room is overheating, maybe there's some problem going on there, right? Or if there's a leakage in the water system or if there's an unusual anomaly, a pattern of usage that shouldn't be happening in the middle of the night, maybe there's a security problem, right? That you can, so all of this machine data can be used to get better information as well. So what's IBM's play with the sort of Internet of Things, industrial internet, whatever you wanna call it? What do you guys do in there? Well, we're doing a ton of work in our Smarter Planet initiative, and especially in the area that we call Smarter Cities. You know, there's a tremendous amount of growth going on around the world, especially in cities. And a lot of these cities have a real problem because they've got, in some cases, tens of millions of people moving into these cities and they don't have the infrastructure to support it. So for example, if you've got, you know, 10 million additional people moving into your city, how do you move them around? Like, you know, what do you do? How many roads do you have to build? 
Like, what do you do to the trains and the bus routes? And then there's power plants and water systems and all that stuff. It's huge, as you know, in China, right? In China, in Africa, and all throughout Asia actually too. And the reality there is that very often these cities have limitations, because you can't build twice as many roads. There's no room to put them, right? And you can't put twice as many cars on the road, because they'll all stop moving because there's a car in front of them all the time, right? So it's about being smarter around these systems, like transportation systems, water systems, and all the other city services. So we're seeing a tremendous amount of interest in this across various cities around the world, like Stockholm, Singapore, Beijing, places all around the world that are really trying to be a lot smarter about how they do city planning and how they maximize the use of the infrastructure they do have in place. And talk a little bit about the software technologies and other technologies behind this. Right, so in this case it would be a combination of real-time analytics, streaming analytics, where you're taking data off of, say, maybe you're measuring cars going by a particular place so that you can adjust the lights to make them go more quickly, if there's more traffic on that road and less on the crossing road, that kind of thing. They're also doing things around variable fares for tolls and stuff like that. But the idea here is streaming analytics, where you're taking data in real-time, analyzing in real-time, taking action in real-time. It's often in combination with a technology like Hadoop, where you might take the streaming information, get value out of that information, and then put it into Hadoop for later analysis so you can do longer-term thinking around, okay, so we're seeing this increasing problem on these roads. Let's figure out a solution there, you know, that kind of thing.
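The pattern described here (count cars in real time, adjust the light, then keep the events for later analysis) can be illustrated with a minimal Python sketch. It is not IBM's implementation: the class name `IntersectionMonitor`, the 60-second window, the 20% to 80% green-time bounds, and the in-memory `archive` list standing in for a long-term store such as Hadoop are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class IntersectionMonitor:
    """Hypothetical sliding-window counter for one intersection."""

    window_seconds: float = 60.0
    # Events still inside the real-time window: (timestamp, road name).
    recent: deque = field(default_factory=deque)
    # Every event ever seen; a stand-in for a long-term store such as Hadoop.
    archive: list = field(default_factory=list)

    def observe(self, timestamp: float, road: str) -> None:
        """Record one car passing a sensor, then drop events outside the window."""
        event = (timestamp, road)
        self.recent.append(event)
        self.archive.append(event)
        while self.recent and self.recent[0][0] <= timestamp - self.window_seconds:
            self.recent.popleft()

    def green_share(self, road: str) -> float:
        """Fraction of the light cycle to give `road`, clamped to 20%..80%."""
        total = len(self.recent)
        if total == 0:
            return 0.5  # no recent traffic: split the cycle evenly
        on_road = sum(1 for _, r in self.recent if r == road)
        return min(0.8, max(0.2, on_road / total))


monitor = IntersectionMonitor()
for t, road in [(0, "main"), (10, "main"), (20, "main"), (30, "cross")]:
    monitor.observe(t, road)
print(monitor.green_share("main"))  # busier road gets the larger share
```

The real-time decision only ever looks at the short window, while the archive keeps everything, which is what makes the longer-term questions (which roads are getting steadily worse) answerable later.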
So real-time analysis, before you persist it, letting the machines make decisions, the traffic light is a simple but good example, and then persisting it, identifying patterns with massive amounts of data, as opposed to maybe taking bits and pieces of samples, take the entire data set, developing those patterns, and then feeding that back in real-time. One of the interesting things that Singapore's doing is that they're trying to measure people's full travel trip. And, you know, usually when somebody's going to work, they drive their car to a parking lot and then they get on a train and then maybe they walk the rest of the way. So if they just measure how many people are going to this parking lot, they're not really understanding your trip. So they've got a card that's sort of like a frequent flyer card type of thing, but it's like a debit card, and you can pay for any of these traffic services with that one card, and therefore they're getting data that gives a better understanding of exactly what you're trying to do, so that they can improve the entire system. So it's really about finding efficiencies, whether it's moving people around or moving traffic around or finding better ways to run the infrastructure that supports these massive populations. That's right. Interesting. So let's move on to the final example. You've got data warehouse augmentation, which is maybe not quite as grand an idea as some of the things around smarter cities, but it's certainly going to appeal to CIOs I know out there that have been struggling with this issue. Well, people want to know, is my data warehouse a dinosaur? Yeah, that's right, they do. And I think the other thing is they want to know, am I supposed to take all of this information coming at me and put it in my warehouse? Because my warehouse will be where all my money goes and it'll be the world's biggest warehouse, right? So companies are really struggling with that.
What do they do with all this data? Do they persist everything or not? And what about this data where they don't really know how valuable it is, right? So like some of the social media data, for example. Everybody's interested in how they can tap into social media. But the fact is you're not going to want to store all the Twitter feeds that you can find. I mean, that's just ridiculous, right? So what many companies are doing is something like a landing zone for information, where they'll bring in some information with this idea of exploring it, trying to figure out, is there value in this? What kind of value is there? Do I want to save some of this data? Which part of the data do I want to save? So that's this landing and exploratory zone. And another thing that's related to this would be a queryable archive, where a lot of companies are also saying, look, I'm going to keep in my data warehouse the most current data, because that's the thing I run my business on. But I sometimes want to go back to the previous data. In fact, I would like to go back several years sometimes in certain queries, but I don't really want to make all that data slow down my warehouse. So instead, what they want to do is complement their warehouse with this queryable archive, typically in a Hadoop cluster. And now that stuff is still searchable, it's still usable for queries, but it's not affecting your daily operations in the same way. So it's a way of improving performance and also reducing the cost of that infrastructure. Because it's resident in a distributed manner, is that right? Yeah, and it's because it can be on cheaper hardware. It can be on a cluster. A Hadoop cluster is typically off-the-shelf hardware that is less costly. And it's the old function shipping: you're shipping five megabytes of code versus a terabyte of data. That's right. So it's kind of a new archive, really.
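The queryable-archive idea (recent data in the warehouse, older data in a cheaper store that can still be queried) can be sketched as a small routing function in Python. This is only an illustration under stated assumptions: the function name `plan_query`, the one-year `hot_days` cutoff, and the store labels `"warehouse"` and `"archive"` are hypothetical and do not come from any IBM product.

```python
from datetime import date, timedelta


def plan_query(start: date, end: date, today: date,
               hot_days: int = 365) -> list[tuple[str, date, date]]:
    """Split a date-range query between the warehouse and the archive.

    Data from the last `hot_days` days is assumed to live in the warehouse;
    anything older is assumed to live in the queryable archive.
    """
    if start > end:
        raise ValueError("start must not be after end")
    cutoff = today - timedelta(days=hot_days)  # first day still in the warehouse
    plan = []
    if start < cutoff:
        # Older slice goes to the archive, ending the day before the cutoff.
        plan.append(("archive", start, min(end, cutoff - timedelta(days=1))))
    if end >= cutoff:
        # Current slice stays on the warehouse.
        plan.append(("warehouse", max(start, cutoff), end))
    return plan


for store, lo, hi in plan_query(date(2010, 1, 1), date(2013, 5, 1),
                                today=date(2013, 6, 1)):
    print(store, lo, hi)
```

A query that only touches recent dates never reaches the archive, so day-to-day warehouse performance is unaffected, while a multi-year query is still answerable by combining both slices.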
In the old world, when you archive something, that's pretty much the last time you're gonna see that data, for the most part. This is where you can archive that data, but actually still have access to it to do some analysis and still get value out of it. That's right. And it's really a new architecture for your data warehouse. It's almost like a new way of thinking about your warehouse, where you have different zones where you're doing different types of things, and they all add up to your information infrastructure. So there's a zone you're using for reporting and analytics. There's a zone you're using for exploration. There's an archive zone. There's a zone where you're doing data ingestion. There might be a real-time zone. And you've got all of these in your architecture because you've got various problems you're trying to solve and you want to use the best approach for whichever problem. Eric, my last question is, can you talk about the role of services in these use cases? What role do services play? I mean, IBM's obviously a big services company. I've often said it's a secret weapon, but it's no secret. It's a weapon. It's like the big weapon that everybody knows about. Talk about the role that services plays here and how it fits. Well, services plays a big role here, because whenever you have newer technology where people are trying to solve new types of problems, they don't have all the skills in-house. And really what they're looking for often is advice based on what other clients are doing. And that's where our services arm really comes in, because we've got deep expertise in virtually every industry. And so we have real experience with similar clients trying to solve similar problems in your industry. And so people really like to tap into that. But I think there's also the notion that a lot of companies are under pressure. The world has changed around them. Everything's moving faster.
Brands that used to be built over decades can now be destroyed really quickly. If you make a bad decision, all of a sudden everybody's complaining about you online and your brand is hit so quickly. And customer relationships are more fleeting. And the customers have more power. They're more demanding. They have a mobile device where they're interacting with you. So a lot of our clients really see a need for transformation, where they want to think about how they could use big data and analytics to improve their competitive position, their fundamental competitive position as a company. In that case, the services and the expertise that we bring with our Global Business Services is very relevant, because it has to do with understanding where that industry is going, not just understanding what the technology is. So I have one last question myself. If you could speak directly to CIOs, and they're struggling with this issue of how do I get started, not necessarily with the technology, but, like we've been talking about, with the use cases: what one piece of advice, if you had to pick one, would you give CIOs who are trying to identify that first killer use case that's going to catapult their whole big data project? Yeah, well, I would definitely say start with whatever business need you have. Right, so the idea is, what is the fundamental thing that your CEO is worried about? And then think about what information you have that could be utilized to help that CEO make a better decision or drive that initiative more effectively. And then focus your big data effort around contributing to that important business initiative. Because that's the way to make things happen. It's the way to come up with the right focus for what you want to do. And it's also the way to get the budget and senior sponsorship you need in a big data initiative. All right, Eric, listen, thanks very much. I appreciate you coming in.
A lot of meat on the bone with IBM's story, you say deep industry expertise and really appreciate your insight. So thanks for coming on theCUBE. And thanks for watching everybody. This is Dave Vellante with Jeff Kelly and we'll see you next time.