Live from the Mandalay Bay Convention Center in Las Vegas, Nevada, it's theCUBE at IBM Insight 2014. Here are your hosts, John Furrier and Dave Vellante.

Okay, welcome back everyone. We are here live in Las Vegas for IBM Insight, inside the social lounge at Insight Go. This is theCUBE special presentation, live on the ground. I'm John Furrier with Dave Vellante. Our next guest is Franz Dill, partner at PKL Knowledge Partners. He's a big data guy going way back. Welcome to theCUBE. Nice to be here. Procter & Gamble, I mean, you've wrangled some data in the past. I've been there, yeah. So you've got some experience. I want to get your take on it, because Bob Picciano was just on for IBM. He's old school; he's seen that whole transition: databases, client-server, operationalizing, process improvement, and now the chaos and inflection point of today. Right. What's your take on the current situation? My take is that it's something we've done for a very long time, but it hasn't been very well implemented into the corporate structure, if you will. In other words, you could come up with solutions to problems. There was enough data, though never the big data we have today, but it was very difficult to get it implemented into the infrastructure of the company. So the word engagement is coming up a lot. I didn't have a chance to talk to Bob Picciano about this because he had to run; they've got some big-time announcements coming tomorrow, from what I've heard rumored. But engagement used to mean systems of engagement versus systems of record, a kind of data-warehousing mindset. Now we are here in the engagement, social media lounge, and engagement means actually talking to people. Right, exactly.
So now you're bridging active data from people, humans, into systems, and data changes data, what Bob was talking about. As a data person out in the field, this is a complicated nuance that people are trying to get their arms around. What we have today, though, is people who are really interested in engagement. In a sense we didn't have that 20 years ago. In other words, you would get the executive, you'd say, we want to help you solve your problems, and they'd say fine, we'll talk to you next month about that. They weren't very interested in engaging you with the data they had. Today, almost everyone knows that it's a data problem, and almost everyone knows that data is being gathered constantly by their devices, by manufacturing systems, et cetera. In a large company you've got many different kinds of data being gathered at once. Executives are very interested in how you take that data and turn it into value for them. So at least today everybody is interested in talking. It's not to say that they know how to talk yet, that they know how to have the conversation, but they want to talk. There's no doubt about that. In the early 90s, for example, I worked with P&G executives, and maybe one out of ten executives would have a device on their desk, an actual desktop, and would be willing to look at the data in something other than printed sheets. And P&G, they were very much data-driven. Very much so. And P&G is a good example. They were very early, they were data oriented, they understood that data would drive their business, especially in the world of marketing. So very much so, but even there it was hard to convince people to engage, getting back to your word again.
One of the toughest things for these P&G guys, you know, from my experience on the consumer side back then, is those age-old questions that come down from the top: hey, how are we doing? How's our positioning? How's the brand doing? What's the product doing? Are our customers happy? Does the advertising work? What advertising works best? Should we be doing different advertisements? In the case of new initiatives, new products being put out, how well are they doing? There was a lot of work going on looking at, okay, we need to know as early as possible whether this brand was doing well enough to keep it moving. Okay, so this issue of speed, which was talked about this morning, was a very important issue. But at that time, it was very difficult to get that data. You bought it from Nielsen, you got it from different services, for example. It took weeks, months to get that data in place. Huge lag. Very much lag. And now people see you can get that data pretty quickly. Okay, so you lived in an environment where you were getting peppered with questions, you knew there was lag, you had to wait for the reports to come out; it's like that Office Space movie, you know, the TPS reports, and it's just like, oh my god. Right, right. When is it going to come in? What month? Now we're experiencing an amazing inflection point where, for the first time in the history of the world, you can measure everything. You can measure everything. So how do you deal with that? Do you just jump in? As a partner you've seen the game; all these guys are jumping in; chaos. I think the other word that was brought up today was curated. There still has to be the role of a curator, and there are different kinds of curators. There are people that know the data very well: they know how it's been gathered, where it comes from, how new or stale it is. But there's also the analytics person that needs to be involved.
And it was very nice seeing things like Watson cognitive, for example, and Watson Analytics this morning, which is an attempt, and it seems to be a good attempt, to understand how that curation, that analytical curation process, works. The other thing, which I didn't see in that particular presentation, is that it works differently for different people. If I'm curating the data and the results of the analytics for an analyst, that's a different thing from curating data for, say, an executive who's responsible for a particular marketing operation. So that has to look different. We saw something this morning where you could look at the word count, the word map, for example, for how many people were actually looking at something. Those are the kinds of interfaces that work very well with executives. They're easy to scan, they're easy to understand; you don't have to start talking gobbledygook. It's really translating the data into a story. Right. Into a relevant, targeted match with the audience. Well, I was going to get to the story, because that's another thing that Procter & Gamble and many other companies have started to pull in: how do I tell this as a story, get it from data all the way to what should we do, what should we have as a result of this analysis? Dave and I talk about this all the time because we're immersed in this media world, we have our CrowdChat venture going on, and we're data geeks. We love looking at the data; we've been doing a lot of development on that side with our team there.
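The word-map interface described above boils down to a word-frequency count. A minimal sketch in Python; the stopword list and sample text are illustrative, not from the interview:

```python
from collections import Counter
import re

def word_map(text, top_n=5, stopwords=frozenset({"the", "a", "of", "and", "to"})):
    """Count word frequencies for a simple word-map style summary.

    Returns the top_n (word, count) pairs, most frequent first.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(top_n)

sample = ("Data drives the brand. Data drives marketing. "
          "The brand needs data, and marketing needs speed.")
print(word_map(sample, top_n=3))  # [('data', 3), ('drives', 2), ('brand', 2)]
```

A real word-map front end would size each word by its count; the counting step underneath is just this.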
One of the things that's interesting is that the web and the content market of the internet, going back to Web 1.0, have all the same properties as this new digital convergence, which is not so much about content or marketing; it's really about the engagement piece. So what we're seeing is some very interesting similarities between content, or metadata, behavioral data, semantic, contextual data, and that matching. So the idea of providing executives with storytelling is a content problem. Exactly, and when you think of marketing, the way marketing works, this whole notion of creating stories, storyboards, is a very natural connection to the marketing world, because they think in terms of: what were the five steps that got me here? Show them to me in a very simple format and direct me through them, and let's talk about each of the key pieces, and let's talk about where things could go wrong, or don't work the way we expected them to, or where the data was incorrect. You also mentioned semantics; that's another huge issue for a large company, which is: what do the words we use within a company really mean? They mean different things. To give an example from Procter & Gamble, the word Tide is a detergent. It's also a geological and ocean term, for example, something as simple as that. Rain, grass, and high tide, what does that mean? So this whole notion of how do I understand, what's the ontology of that understanding of the data, especially when we're working with unstructured data, how do we utilize it, how do we understand it? Talking to a medical analytics system versus a system for a large marketing company or a large manufacturing company is a very different thing. Yeah, I want to get your take on this. I want to go out and spin up into the ether a little bit on some concepts, just riff with you. The notion of a data operating system is interesting.
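The Tide example is essentially a word-sense disambiguation problem. A toy sketch of the idea, assuming hand-picked cue words for each sense; a real system would use an ontology or a trained model rather than these made-up lists:

```python
# Toy word-sense disambiguation for an ambiguous term like "tide".
# The sense inventories below are illustrative assumptions, not a real ontology.
SENSES = {
    "detergent": {"laundry", "wash", "stain", "clean", "clothes"},
    "ocean": {"sea", "moon", "coast", "wave", "high", "low"},
}

def disambiguate(context_words):
    """Pick the sense whose cue words overlap the surrounding context most."""
    scores = {sense: len(cues & set(context_words)) for sense, cues in SENSES.items()}
    return max(scores, key=scores.get)

print(disambiguate(["remove", "the", "stain", "from", "clothes"]))  # detergent
print(disambiguate(["high", "tide", "on", "the", "coast"]))         # ocean
```

The point Franz makes holds regardless of technique: a medical system and a marketing system need different sense inventories for the same words.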
What you just mentioned is essentially that data is the key resource to dynamically provision an application benefit, for lack of a better description. Meaning, if I want operational analytics, I can take data and make it work. If I want robotics, I can make machine learning work for that. So these are applications of data. If these are things that are going to be applications of data, if that thesis is true, then data is a programmable resource. And if that's the case, it should be a fabric or a layer. So that's not your traditional, you know, siloed disk, storing data and mining it, pulling stuff in and out. That is an ever-evolving organism. Yeah, and the question often comes up, and I've been on many projects where we would show up and ask, do we have the data to do this? And it turns out we either don't have it, it's not in the right form, or it's been gathered in a way that's now ancient and can't be used. I mean, there are many reasons why data won't work against the problem you're attempting to solve. And those are the kinds of things you run into frequently. This creates more chaos, which I love; chaos theory kind of plays in with network theory and all that stuff. Kirk Borne was on earlier, we were talking about this, and he had an interesting point. I like what he said; he actually said it as he was walking off, because we kind of kept the conversation rolling, like a behind-the-scenes moment. He said that, you know, back when he was doing all this computational, algorithmic stuff in his astrophysics work, the software packages would throw out all the skewed data points outside the mean. Right. And now we're trying to bring back the long tail. Absolutely. Those are discovery points that now are absolutely explorable, one, in real time, and two, are providing amazing insights. Right. So what's your take on that? Do you agree with it? Oh yes, I absolutely agree.
In fact, that was almost the methodology, you know, 20 years ago: throw out all those data points. There must be something wrong with that data, so why should we even keep it? And it turns out there's a richness in that data; the data outside the mean is of a lot of value. And it makes lots of sense. In fact, we even talked about things like: how do we value those particular items of data as they sit outside, looking like outliers? But you're right. That's a big change that's occurred. So Dave Vellante wants to get a word in. I've been dominating. Dave, welcome to the conversation. Thank you. So you started off the conversation talking about how a lot of analytics practitioners have been doing big data for a long time. We hear it all the time: big data, big deal, I'm dealing with petabytes. But things are different. Certainly there's technology like Hadoop and NoSQL that is different. You've been talking about some of the cultural changes, but I wonder if we could boil it down. Is it different? What's different, specifically? It's different for several reasons. One, we have the tools, the analytical tools and certainly the parallel processing tools, to be able to look at much more data than we could before. So that's one. The second thing is we have the data. We simply have the data that we can work against. In the past, we very often did not have the data. There were holes in it. We just couldn't do the kinds of things we can do today. So that's another piece that certainly is very different from before. There's also an understanding, and it gets back to the engagement notion, from everyone from new hires up to the executives of the company, that there's value in lots of data.
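The old practice being described, dropping points outside the mean, versus the newer one, keeping them for inspection, can be sketched with a simple k-sigma split. The threshold and readings here are illustrative assumptions:

```python
import statistics

def split_outliers(values, k=2.0):
    """Split values into (inliers, outliers) using a k-sigma rule.

    Older pipelines simply discarded the outlier list; the point here
    is to keep it and inspect it separately, since the long tail is
    often where the discovery value sits.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    inliers = [v for v in values if abs(v - mean) <= k * sd]
    outliers = [v for v in values if abs(v - mean) > k * sd]
    return inliers, outliers

readings = [10, 11, 9, 10, 12, 10, 11, 95]  # 95 is the interesting point
inliers, outliers = split_outliers(readings)
print(outliers)  # [95]
```

Real outlier detection is usually more robust than a k-sigma rule (the mean itself is pulled by the outlier), but the keep-versus-discard decision is the same.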
In lots of, not only lots, but also volatile data, data that we would have thrown away in the past, data that we never looked at very closely. There's value in being able to take a closer look at that. And so that's another thing that's radically changed just in the last few years. Technology-wise, you know, machines are faster and some of the methodologies are better, but they haven't changed as radically as that notion that we can now engage with the data, if you will, in ways we couldn't in the past. So culturally, I'm accepting the data more. Of course, I'm worried, right? Because who's the consumer of the data? It's the P&L manager. And if that insight conflicts with an initiative that I want to drive, what am I going to do? I'm going to attack the data. I'm going to show a different data source. I'm going to confuse my executives, and they're going to say, what's the truth here? Are we further away from the single version of the truth? Has that changed at all? No, that in fact hasn't changed, but people understand it now; in the past they didn't need to deal with it. In the early days, we basically worked directly with the executives that would listen to us, that would engage with us. We worked with the data, we got results, but some of the next level of management wasn't that interested in working with it. And so, like you said, they would find ways to work around what the data was saying. They aren't able to do that the way they could in the past. So it would seem that processes have to be in place to test the data. How has that changed? The processes work to understand that business is a process, using models like business process models, for example. That's a relatively new technology. It's probably been around 20 years, but it's relatively new in the way it's being used inside corporations. So they can say, okay, not only do I have an answer, but this answer exists in a business process model.
And if you want to attack whether or not the result is correct or incorrect, you need to go into the business process model and show us why it doesn't work in terms of the way the business process model works. Sometimes even building the business process model gets you a lot of value, just understanding it, going through it. And sometimes you'll show it to an executive and they'll say, I didn't know that's the way our business worked. So there are revelations that occur. Once you get agreement on the business process model, then you can go back and say, I want to gather data here, here, here, and here. And then I'm going to be able to talk about: what are the changes in the data? What's the streaming data analysis? Can we run it again? Can we run clustering or regression or other methodologies against it? But the key is you have to know how it exists in the context of a business model. So I can see that being an interesting discussion too: getting agreement on what that business process model should actually look like, because traditionally there's some schema, some mental model people have, and that dictates the business process model. Now, when we talk to practitioners, they say that the number one challenge they have, and tool set they use, is data integration tool sets for their big data initiatives. The second one is their existing data warehouse. And that has traditionally dictated the business process model, but I'm inferring from your comments that it's changing. How is it changing? I wouldn't call it changing, but I think people are recognizing the fact that you have to understand the business process model as well. What you mentioned about dealing with the data and understanding it is certainly a big part of what your business process work can do, but it's also good to step back from the model and say, this is what it looks like. So from our perspective, that's what we did.
We built hundreds, thousands of models that basically looked at different parts of the business: how they interacted, what steps were taken, what resources we needed at each step, what data was needed at each step, how we could do a better process if we had better data at a given point. So you could ask a lot of questions that were basically driven by data. So I see it coming from two different directions toward the same result. How about the data sources? I mean, P&G is renowned for listening to customers, focus groups, surveys, and the like. Samples. Structured, unstructured. There's all sorts of stuff. We did unstructured work 23 years ago with approaches called content modeling, for example. It was an early unstructured-data, text-based analysis technique. So we've been doing this for a very long time. So right, P&G does that. P&G now has a system called One Consumer Place, which gathers lots of information from consumers about what they like best or don't like about various products. Again, these things are getting very large, very big. They're also very volatile. As the world changes, as something new trends, the way consumers think about a product may change very radically as well. So we certainly try to innovatively look at all the capabilities, like Twitter, like other methodologies, to ask: how do they change what our data looks like now, and how will they change what our data looks like in the future? So I like to ask mathematicians this question. We've had some big data practitioners tell us, particularly in the financial services industry, that sampling is dead. I think of John Obameta, who said sampling is dead. And at the same time we've had Nate Silver, who takes a small sample of, you know, electoral results or polls and nails the election every time. That's a sample. Well, I don't think it's dead.
To be honest, I mean, I've been involved with a lot of sampling-based techniques within the company. At Procter & Gamble, there's lots of that. We used to have rooms full of statisticians. We don't have as many of them as we used to, but we have that today. And basically the idea is that sampling has its place, just as big data methodologies do, where you're looking for outliers that really don't fit a sample but show you unique insight into something you're attempting to achieve. Well, I don't think it's dead. I think it's been put in its place, perhaps. I think it's perhaps been pushed back a little too far, but you can use sampling, and you can use other non-causal techniques, for example Bayesian techniques, et cetera, that can be very valuable in their own place. Everybody talks about the skills shortage in big data. And it just seems to me, when we ask data scientists what skill sets they need, they're skill sets that already exist. It's just a mash-up of those skill sets. It seems to me a huge opportunity for organizations to train up their existing staff. Absolutely. And we did that. Before I retired, we got people young and tried to train them in our business, because it's a very key point to know the business. It gets back to the business process model. You have to know enough about the business process model you're talking about to be able to deal with that model. But you also need the mathematical skills. What I always say is that the mathematics can be taught, or there are degree programs out there. It's a lot harder to teach the business skills. So we tended to look for people that were willing to learn the business skills but certainly knew the mathematics. As John was saying, the storytelling. Some questions from the CrowdChat. First of all, we've got an amazing CrowdChat going on with influencer Brian Fanzo, and Carla has run an amazing CrowdChat.
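Franz's point that sampling still has its place can be illustrated with a quick simulation: a simple random sample estimates the population mean, and the error shrinks as the sample grows. The population below is synthetic; the numbers are illustrative only:

```python
import random
import statistics

def sample_estimate(population, n, seed=0):
    """Estimate the population mean from a simple random sample of size n."""
    rng = random.Random(seed)
    return statistics.mean(rng.sample(population, n))

# Synthetic "measurements": mean 100, standard deviation 15.
rng = random.Random(42)
population = [rng.gauss(100, 15) for _ in range(100_000)]
true_mean = statistics.mean(population)

# A modest sample already gets close; a larger one gets closer.
for n in (100, 1_000, 10_000):
    est = sample_estimate(population, n)
    print(n, round(abs(est - true_mean), 3))
```

This is the classic trade-off the interview circles around: a well-drawn sample answers "what's the average?" cheaply, while the full data set is what surfaces the outliers a sample would miss.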
Carla Gentry, @data_nerd. The question from the crowd is very simple, from Tim Crawford: are folks too constrained by the business process model when thinking about data? They are. To be honest, just as I say that the business process model is important, you have to understand that it's not rigid and, in fact, even worse, it's not correct. In other words, you can interview a lot of people and discover a business process model, and then you can discover afterwards that it is not the model that is really operational in that company. But the way I look at it, it's a great place to start; it makes you think about all the parameters, and it can be pushed back on. So in a sense, absolutely, I agree with it. But frequently it's not done in corporations, and it should be done. And what do you see as something that will move it to a better place? Is there like a checksum process you can apply? I mean, how do you avoid the pitfall of falling into that hole of being stuck, of "we defined the process, we're done"? Well, you need to use what we call knowledge-gathering techniques. And there are a lot of techniques where, basically, if I had three people involved in the process, I need to talk to all three of them, separately, and figure out: are they telling me the same story or a different story? And frequently, I'd say almost always, you get a somewhat different story. So there are means of doing that. The other thing you need to do: all business processes change over time. You need to go back, revisit the business, and understand what's going on and whether or not things are still working the way you expected them to. So again, I'm not saying they give you a total truth, but it's a good place to start, and then you work your data into that. Franz, we appreciate you taking the time here.
If you could go back and advise the Procter & Gamble guys, get transported right into the boardroom there with all the executives, how would you describe what engagement means today? Engagement is getting the data that you need to do your job. Whether you're an executive or a new hire, you need to be focused on the right data to do your job, and frequently people do not have that; it's not the data they need to do their job. So it's focus, it's enough data, and it's enough analytical techniques to support that. So the word engagement is changing from systems of engagement to a broader definition, right? Yeah, I'd say it's broader, yeah. Okay, we are here inside theCUBE for the social program that IBM is doing. This is theCUBE special presentation, Insight Go, broadcasting here in the social lounge with all the influencers. IBM is doing a great job really putting together a great set of influencers with a lot of organic sharing; it's like an unconference within the conference. We've got a CrowdChat going on right now with the influencers. We're here broadcasting live for two days. It's theCUBE. We'll be right back with our next guest after this short break.