Okay, we're back here live. Day two wrap-up for theCUBE, our flagship program. We go out to the events and extract the signal from the noise. This is SiliconANGLE and Wikibon's exclusive coverage of IBM's Information on Demand. I'm John Furrier, the founder of SiliconANGLE. I'm joined by my co-host Dave Vellante. We have, Dave, a very special guest on theCUBE: Jeff Jonas, CUBE alumni, tech athlete, and literally an athlete. Welcome back to theCUBE, tech alumni again, CUBE alumni. So tell us, what's up with you? Running in triathlons? Tech athlete and real athlete. Well, we last saw you in June. You set this ridiculous goal, which was jaw-dropping, and you almost got there. I do tend to set really high goals, and if you get part of the way there, it's still cool. It's still a good place. You know, even if you, like, botched it. So I botched my goal. So wait, what was the original goal? I was trying to do five Ironmans in five countries in August. That's what I said. That's what Dan, I said that's... It didn't pan out. It was... Did it work out for you? Did it work out for you? No, it didn't pan out. Ironman one, at the start of August, I ended up with an upper respiratory chest infection. So instead of getting a bit of exercise in, I ended up in bed for like 10 days. Two weekends later, I tried to do two Ironmans in two days in two countries. I don't think anybody's ever tried that. I was the first to attempt that. I finished the Ironman in Sweden, drove four hours, got to Denmark. They'd given my hotel away. We had to find a new hotel. It was a real fiasco. I started the second Ironman in Denmark. I only finished half of it. I was the first to attempt two Ironmans in two days in two countries, that I know of. And I made it halfway through. I was fighting little demons. Then the next week I did Ironman Canada, and the next week I did Ironman Japan. So what I did accomplish was three Ironmans on three continents in two weeks.
And imagine if you'd gotten a hotel that evening and you didn't get sick. I needed more... There were more problems than that. So that's a bad one. I had the wrong currency. I couldn't pay the cab driver. It's a mess, man. So how many Ironman meetings did you do this week with customers? How many keynotes did you do? I've been busy. You've been busy right around. So you've talked about big data with us many times, but in particular, you sent us links to look at on two topics. One was in the Wall Street Journal, an article that asked, who's the tweeter? And the other one's fantasy analytics. So just for the folks out there, there's an article in the Wall Street Journal where you said, trying to find who the tweeters are is really hard. What did you mean by that? And what gave you that question? Who asked the question? Well, I get a lot of companies coming to me saying, we really want to know if the person tweeting is our customer or not. Especially if they're saying something negative. In fact, there was a deal getting stirred up where a bank in Europe wanted to do that, and there were some people that wanted to sell some stuff I've invented to help them do that. And you've got a customer and salespeople all quite excited about doing this. And I said, you shouldn't do it. I look like the enemy. Because the yield is going to be so low. They had a country that they wanted to do this in. And I said, well, unless they tweet differently there and they're putting their email address in their profile, it ain't gonna happen. And I actually went and surveyed it and figured out that the percent that you could map to a real human, let alone somebody in their company, let alone somebody that said something really good or really bad, would be so low it wouldn't be worth any money. So that's the trend that these big VC-backed companies are doing, social sales, they call it. Where people say, hey, I do lead gen, I do email marketing, I've got a database of names.
I'll just go to Twitter and get all those names. Is that where they were coming from? They're trying to get more of a profile? They wanted to know... first of all, there's lots of things you can do with Twitter data, like just see trending and sentiment. But if you're trying to map it to your customer... they wanted to know, if they had some customer saying something negative about their bank, which customer it was, so they could get in there and interact and communicate with them. Did they have the person's Twitter handles in their customer records? No, they just wanted to do it off the Twitter handle name. What if they did? What if they had the Twitter handle in their customer records? Well, then it's easy. Right. How many companies actually do that? Right, that's my point. Well, they're not. So my advice in that Wall Street Journal blog post is, the easiest way to solve the problem is to hack it and communicate with them. Just send them a note, get them to talk to you, get their email address, give them something. Now you know who they are, now you've learned the Twitter handle. That's the short path. All the other stuff is pixie dust. So, false positives, right? Yeah, some people don't even use their real name as a Twitter handle. I mean, people put data in their handle. Yeah, but even with a real name, you put that in a country. I mean, then there's one with the real name with the number four behind it, another one with the real name with the number 77 behind it. I mean, that doesn't help. Okay, so that said though, let me tell you, you can do ricochet shots. So on Instagram, you might find a Twitter handle. And on Instagram, you might also find an email address. You know, so there's secondary data, but it's still the long route. And the short route is just to ask them, communicate with them. And then you might not get the full answer too. You might get 30% accuracy. You might have a pile of bad data to deal with, right?
That's the whole, the point of having good data is to have almost close to 100%, right? Well, you wanna get as much data as you can right when you get it. Although, I know you're taking me on this other little path that says one of the things I'm seeing in big data is errors in the data start to become your friend. Have you ever heard me comment on this? Have you ever done that? No, no, no, no. Okay, there's three really interesting things that are happening in big data, but one of them is bad data starts to become your friend. The spelling errors, the transposition errors. It turns out you wanna remember that. You wanna remember that natural variability. The best example I have of this is when you search Google and it says, did you mean this? It's not looking in a dictionary. It's remembering people's errors. If it didn't remember the errors, it would not be so smart. And I've seen this in big data as well. So machine learning takes advantage of stuff like that, with patterns and information, right? That's one of the kind of expectation-maximization kind of concepts that are driving the data science world. How much are you seeing machine language, machine learning, get into mainstream hands? Are you seeing that being still more geeky? Computer science? Are you seeing tools come in there? Well, I think the big leap is gonna be this Watson technology. I think there's some exciting things that are gonna come from that. It certainly is a great show. I mean, it puts on an amazing demonstration of what's possible. Yeah, I'll be excited to see how far that goes. I heard about some things today that I thought were pretty exciting. You talked a little bit about today. I don't know if they're public, but I saw, I got a few whispers in my ears and I'm like, is that on the roadmap for what's coming? Again, I don't know if it's public, but I remember just thinking that is real interesting. But can you share it to warm you up? Come on, come on, come on, tell us.
It's theCUBE, we're live. Don't worry. Don't worry. Okay, as long as you don't tweet this. Okay, I'll just put it on Facebook. We're at another meeting. You can't tweet this. No tweeting. I put it up on Facebook. You didn't say no Facebook. They put in explicit instructions when you do these non-disclosures. So they weren't accurate on that. So the next topic I want to ask you about is fantasy analytics. You wrote a blog post called fantasy analytics. Let's get into that. What did you mean by this? And this is interesting because you talk about the observation space. You talked about that before. Analytics is the hottest area. People want analytics. I got the gist from the conversation, you were kind of saying, hey, you need to get more information. What did you mean by this? Well, you know, I'm a slow-motion blogger. I blog once or twice a year, only when I feel like I'm starting to repeat myself in the real world. I go blog about it so I don't have to keep repeating myself. I was running into this recurring thing: over half the organizations I go see, I say, what do you want to do with analytics? And they go, this. And I go, what data do you have to do that? What data are you gonna use with those analytics to get a result? And they tell me what data they have and I just look at them. I'm thinking to myself, are you smoking crack? And so what I did is I took a few different customer examples. I kind of blended them. So it's a mixed example, but the example kind of goes like this. You talk to the customer and you say, tell me about your organization. They go, our group, you know, we protect the supply chain. That's our business. Okay, great. What's your goal? Finding bombs. I'm like, finding bombs. I love projects like that. I wanna help. Tell me about your observation space. And they go, well, we've got who sends it. We've got who ships it. We've got, you know, who drives it on the boat. And we've got the manifest about what they said's in it. I go, what else do you got?
They go, that's it. I'm like, no one writes bomb on the manifest. You'll never find a bomb. That's crazy. Not even a room full of divine beings could hover over that data and compute that. It makes me crazy. Fantasy analytics. Great line. They said, only idiots do that. We have to worry about them. They'll run. That's right. Yeah. The only people writing bomb on the manifest are the idiots. And they run out of gas on the way to the operation and take a wet match to the fuse. So they fix themselves. But in that blog post though, I did two... Oh, I'll just go ahead and go. Yeah, okay. I did two little bonus sections at the end, you know. One is assessing an observation space. In short, it's the method that I use when an organization says, we have these goals and we have this data: how do I kick the tires on that data to even get an idea whether it's fantasy analytics or not? But at more than half the organizations I go see, what is required is they need to widen their observation space. They need to extend it. And it might be data they very well have in their own organization. They just haven't conceived that they would need that much diversity in their data to get to that analytic result. And I'll give you one example. One government came to me and said, we wanna get a sense of what technology trends are emerging fast that we might need to get policy around. And I go, well, what data do you wanna use for that? They go, Twitter. I go, this is awesome. What other data you got? They go, we'll use Twitter. I'm like, any other data? They're like, no, just Twitter. I'm like, damn you, fantasy analytics. What about Wikipedia? They go, okay, fine, Wikipedia too. And I go, no, actually, the edit tab. Have you ever seen the edit tab in Wikipedia? It shows you everybody that's edited it. It shows, is it one person that's been talking about it by themselves?
Or is it 50 people who have all been talking about it and have done thousands of edits and they're all arguing with each other, and how fast has that happened? That's an interesting observation, you know, one more example of widening an observation space. If you wanna see a new emerging technology, you might see a bunch of people talking about it. Anyway, that's an example. What's the strategy for folks to deal with observation space? Because basically what you're saying is, you gotta get more data. You gotta get more questions answered and get some more data before you can get, or basically get, an observation space. What's the biggest problem people have right now that you see with your customer base? Do they even know what their observation space is, or do they not have the data for the space? Are they not thinking about it properly? What's the main thing that you see? I see a few different things. I see some organizations feel like they have to inventory their data, and the problem with that is, in the grand scheme of things, it's an ever-widening observation space. If you're building systems for the exact set of data that you have today, as soon as tomorrow they introduce one new data set, you can't be re-engineering all your infrastructure. So if you're not creating systems that allow you to integrate a new and ever-changing observation space, you have a very brittle environment. So that's part of it. And then part of it is just imagination and curiosity. And I think this is where data scientists are gonna play a big role: using their imagination to dream up what other data the company already has, the company could collect, the company could buy. So when people don't have enough data, they don't have a wide enough observation space, to use your term, where should they start? How do they? Well, I have a little section at the bottom of my fantasy analytics post. It's just called widening observation spaces.
And I made a little, the sequence with which I think about what you would start with first, second, and third. I put a little thought into that. I don't remember the exact order now, so I'll just refer you to that. So one of those. Let's go to, let's talk about that. So you say, there's a lot of ways to think about it. Data that improves the ability to count or relate entities. For example, a source that may contain new identifiers like an email address. Okay, that makes sense. Right, so there's an email address. Great, that's an example, back to what everybody's looking for. Hey, right back to the beginning. Oh, there's a purpose here. Okay, so the second one. Bring me back around to the top. Data that brings more facts. Okay, where, what, when, how? One of those, I think, is if you're trying to find lies, you wanna take claims that people make and you wanna find observations that would disagree with those. Like one of the questions I get, you know, people will challenge you and go, yeah, what if the piece of data's a lie? What about that? Okay, what about that? Yeah, and the answer is, no, your mama. Okay, the answer to that is, what you wanna do is get a second piece of data that would contradict it. So your neighbor tells you they've never been to France. They keep telling you, they tell you they've never been out of the country. That's a lie. How would you ever know? If that's the only data you get, you'll never know. But then you go to the park with the wife and the husband and their kid for the birthday party, and the wife's had two beers and she goes, ever since my husband lived in France, he hates French food. Well, there you go, that's how you find lies. So one of the kinds of data that you would consider interesting, if you're trying to find fraud or threat and corruption in your workforce, is secondary data that disagrees with claims. Huh. So look, I'm a little low on sleep. Don't hate me.
So Merv Adrian tweeted earlier the quote, if I give you the sausage and the grinder, can you make the pig? First in-speech applause of the day. So what is that all about? You know, I was on the stage today with David Becker from Pew Charitable Trusts. We were talking about how my G2 technology is helping modernize voter registration in America. And one of the things that we're doing is anonymizing features like social security number, driver's license and date of birth. But then sometimes when you tell people you're anonymizing, they say, what do you mean? You say, well, I one-way hash it. And they go, whoa, that doesn't help any more. What does that mean? Then I use this analogy. I say, if I took a pig and a grinder and made a sausage, and I gave you the sausage and the grinder, could you make a pig? They go, aha, that's the one-way hash, that's anonymization. Wait a minute. Can you guys hear that? Can you hear that? Yeah, I can hear that. That is the sound of organizations suffering. Can you hear this? No, I can't hear it. That's the sound of organizations suffering because they don't have enough analytics and they're having a hard time competing. This is one of those moments where myself and the 440,000 IBMers bear this burden. See you later, Jeff. You don't see that every day, Dave. The mic. Hey. Okay, that's a wrap from theCUBE. We're here with Jeff Jonas live in theCUBE. You don't see that every day. We'll be right back with a wrap-up after this short break.
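Editor's sketch of the sausage-and-grinder analogy above: a one-way hash lets two record holders check whether they hold the same social security number or date of birth without exchanging the raw values. This is only an illustration under stated assumptions, not a description of how G2 or the voter registration project works. The key, the field names, and the normalization rule are all hypothetical, and a keyed hash (HMAC) is used here because plain hashes of short identifiers are easy to guess by brute force.

```python
import hashlib
import hmac

# Hypothetical secret key held by the parties doing the matching.
# This is an assumption for the sketch, not something from the interview.
SECRET_KEY = b"example-key-not-for-production"


def anonymize(value: str) -> str:
    """Return a keyed one-way hash (HMAC-SHA256) of a normalized identifier.

    The "grinder" is the hash function: the same input always produces the
    same "sausage", but the output does not reveal the input.
    """
    # Normalize so trivial formatting differences still match (assumed rule).
    normalized = "".join(value.split()).lower()
    digest = hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Two hypothetical records from different sources, hashed before sharing.
record_a = {"ssn": anonymize("123-45-6789"), "dob": anonymize("1970-01-31")}
record_b = {"ssn": anonymize(" 123-45-6789 "), "dob": anonymize("1970-01-31")}
record_c = {"ssn": anonymize("987-65-4321"), "dob": anonymize("1970-01-31")}

# Matching happens on the hashed values only.
same_person_ab = record_a == record_b
same_person_ac = record_a == record_c

print(same_person_ab, same_person_ac)
```

The point of the design is that equality survives the hashing while the original values do not travel with the record, which is what the pig, grinder and sausage line is getting at.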