Live from the Mandalay Bay Convention Center in Las Vegas, Nevada, it's theCUBE at IBM Insight 2014. Here are your hosts, John Furrier and Dave Vellante. Okay, welcome back everyone. We are here live in Las Vegas for IBM Insight. This is theCUBE, our flagship program, where we go out to the events and extract the signal from the noise. We're really looking for the positive, not the false positive, but the positive data points and predictions here at IBM Insight. Of course it's theCUBE. I'm John Furrier, with my co-host Dave Vellante, and our special guest, Jeff Jonas, IBM Fellow and chief scientist at IBM, CUBE alumni, who's been here many times. We always talk about some awesome stuff. Welcome back, Jeff. Great to see you. So what's going on? So Dave wanted to get on this Ironman thing. So give us the updates. So you've done some Ironmans. Last time you were on last year, you ripped out the costume. Are you ready? And you ran off to CUBE with the IBM where? Well let's set the table. So last time your objective was to run five in August. And you got sick and you only were able to accomplish three. Swim, bike and run, not just run. But yeah, full distance Ironman. You wanted to do five in one month. I tried to do five. If I was able to do five in that month, I would have become one of just a few to have done every Ironman in the world. But I got a chest infection for one of them and then I tried to do Ironman Sweden Saturday and Ironman Copenhagen Sunday and that didn't work out so well for me. Okay, so you made it halfway through Copenhagen. You got three in August, three and a half. They only count, we round down I guess. Okay, but you've reached a new milestone this year. In August, when I finished Ironman Copenhagen, the second time trying to finish it, I became one of three to do them all in the world. Okay. And then, but to stay in that club, you have to do all the new ones. And they added four Ironmans in two weeks. And each one- There's a club of two. 
Well, it was a club of three. There was a club of three. And then they added four new Ironmans and each one was a continent change. Continent, continent, continent in 15 days. And in the middle of that, on the middle weekend, to stay in the club, you have to do Ironman Mallorca, Spain, an island in Spain, Saturday. And then you have to do Ironman Chattanooga, Tennessee, Sunday. And it just seemed impossible. And I kept looking at the website trying to figure out logistics and I really couldn't figure it out. And then finally I had an epiphany. And one of the other three and I did it. So we did Ironman Mallorca, Spain, Saturday and Ironman Chattanooga Sunday. So Mallorca is not easy to get to. I imagine it's a little, maybe easy to get out of, but still not easy to connect to Tennessee. So how did you pull that off? It was expensive. Yeah. It's called a G4. They just said they're waiting for us. Hey, the boys club of two, who spends a lot of dough. So you guys have some. And you slept on the way over? I got two and a half hours of sleep, is my estimate. And then, you know, literally we walked our bikes on the plane. No box. I'm just, you come off the one race course. And on a G4, you don't have to check luggage, just carry it on. You just walk your bike on the plane. We get off the other end, driver waiting for us, takes us to the race start, check in the bike and sort out your bags. You've got to get all your stuff together and sort it into bags for your transition areas. And by the time I get to the race start, within 20 minutes, the next gun goes off and I'm doing another Ironman. You're in the water. Yeah. It didn't kill me. Well, I'm still in the club. I'm still in the club. You're in the club. It's a club of two. It's two people in there, one dropped out because they didn't do the new Ironman. That's, that's the kind of thing. We're going to afford the flight. 
He'd become a friend and we were going to put him on the flight with us for free. We were going to, but you have to finish that first race in time to get to the plane, to make the trip. And his time would have been a little longer. He's got some steel in him. So he's a little slower. So he wouldn't have made the flight. He got dropped. Literally. But next year, next year he'll be back in the club. He's, this guy's done 170 or 180 Ironmans. He's got the world record. That third guy. So he's back. He'll be back in the club next year. Yeah. You don't want to shut him up. Anyway. No, no, no. This guy's a legend. I'm like the newcomer. Congratulations guys. So what are you working on now? So last year we talked about a lot of the big data stuff, but what's new with you? What are you working on? I saw some tweets earlier about asteroid predictions. Just share with us. One of you, what's the geospatial stuff? Share with the folks that context. Well, my main thing is this project I've codenamed G2. And sometimes I refer to it as sensemaking. Because that's what the goal is: how fast can you make sense of all this data across the enterprise? And then can you make a really high quality decision in 200 milliseconds while it's still happening? That's the purpose of this thing. Well, one of the kinds of data that you want to be able to incorporate is geospatial data, which I'm obsessed with. But to do really high speed geospatial work, I had to craft something else called a space-time box, which figures out where things are in space and time. And no matter where you are in this box, you get credit for being in the box. But when you do that, you can take really big problems that are geospatial and reduce them. When I go explain to people what I've done in the astronomy field, astronomers look at me and go, wow, you must have just invented some new kind of math. And I go, no, I explain how you use space-time boxes. 
And they're just like, no. And the thing we've done recently is there's 600,000 known asteroids. None of them hit Earth. But it turns out now and then they hit each other. So I ask astronomers, I go, well, why don't you just compute when they hit each other? And they go, no, that's multi-body orbit math. It's a 600,000 factorial problem, 20 million hours of compute. And I go, but yeah, but with space-time boxes, blah, blah, blah, blah, blah, blah. And they go, no. Ha! Ha! You're mind-blowing. Oh my God. It's so simple. It's embarrassing, really. So in less than 3,000 hours. He's a mother-wife who solves the problem. Well, Good Will Hunting kind of scene. I almost want to tell you how it worked. I will in a second. But listen, we took a 20 million compute hour problem to less than 3,000 hours and we did it the wasteful way. And we've created a forecast for the next 25 years of how every asteroid interacts with every asteroid close enough to be interesting to astronomers. And we're going to produce a paper on this so astronomers know where to look and when, so they can see, you know, when they're going to hit each other. This is important, by the way, because if an asteroid is going not where you thought and it's coming towards Earth, I mean Earth is where we keep our stuff. So this is important. When they hit each other, doesn't it change the whole equation? Yeah, right, but there's no forecast to say they might hit each other. So where you thought it was going, it's not going there because it found a neighbor. So you've got to figure out the collisions first and then redo the math. Yeah, but it'll take me two minutes to explain how I did it. You want to hear? It's kind of funny, really, because it's so simple, everyone can get it. What we did is you go to asteroid number one and you say, where are you going to be tomorrow at noon? And there's some math that astronomers use and we use that. 
And that math comes back and goes, this asteroid is going to be right here, exactly here. And what we do is go, well, that's pretty exact, but what zip code is that? You know, it's a big space-time box. Now you ask the same asteroid, where are you going to be the next day? And it goes, oh, right here. And we're like, yeah, yeah, yeah, yeah. What zip code is that? So we just ask every asteroid where they're going to be once a day for 25 years. And then we just take all that exactness and put them in zip codes. Well, it turns out on average, I think it's only 2,000 a day, end up in the same zip. So now you have a much smaller number. So then you go back to the two asteroids and you go, hey, where are you going to be at 1 a.m., 2 a.m., 3 a.m.? And it goes, oh, here, here, here. And what we do is go, yeah, yeah, yeah. But what street is that? It's just a smaller space-time box. By the time you do this, the number of asteroids that are on the same street in the same hour is so small. Then you call the big math. What was happening is they were just calling a bunch of big math. We just figure, if you're not on the same street on the same day, no reason to call the big math. It's so simple. It is embarrassing. So that's the prediction. So what do they say, like, thanks for embarrassing me? Oh, they loved it. Come on. No, I spent five hours with some astronomers and I asked them a thousand questions. Most of them were very stupid. And at the end I said, well, the way I would use my information theory in a G2 kind of world, this is how I'd apply it to your domain. And they said, you've just named five things we haven't heard in our field that we think would move the needle. And one of them was this ability to do really high-speed co-location analysis. So you could use your expensive algorithms narrowly. And we went after that and did it and it's been validated. 
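The zip code and street idea described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the G2 code or the astronomers' orbit math: `box_key`, `candidate_pairs`, `linear_position`, the straight-line motion, the cell sizes, and the sample objects are all hypothetical. The sketch also only compares objects that land in the same box, so two objects sitting close together on opposite sides of a box boundary would be missed unless neighboring boxes were also checked.

```python
import math
from collections import defaultdict
from itertools import combinations


def box_key(position, time, cell_size, time_step):
    """Quantize a 3-D position and a time into a space-time box id."""
    x, y, z = position
    return (
        math.floor(x / cell_size),
        math.floor(y / cell_size),
        math.floor(z / cell_size),
        math.floor(time / time_step),
    )


def candidate_pairs(objects, position_at, times, cell_size, time_step):
    """Return pairs of object ids that share a box at any sampled time."""
    boxes = defaultdict(set)
    for obj in objects:
        for t in times:
            key = box_key(position_at(obj, t), t, cell_size, time_step)
            boxes[key].add(obj)
    pairs = set()
    for members in boxes.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def linear_position(tracks):
    """Stand-in for the real orbit math: straight-line motion (assumption)."""
    def position_at(obj, t):
        (x, y, z), (vx, vy, vz) = tracks[obj]
        return (x + vx * t, y + vy * t, z + vz * t)
    return position_at


# Hypothetical objects: id -> (start position, velocity), arbitrary units.
tracks = {
    "a": ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    "b": ((10.0, 0.2, 0.0), (-1.0, 0.0, 0.0)),
    "c": ((1000.0, 1000.0, 1000.0), (0.0, 0.0, 0.0)),
    "d": ((50.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
}
position_at = linear_position(tracks)

# Stage 1, the "zip code": big boxes, one sample per day.
days = [float(d) for d in range(10)]
coarse = candidate_pairs(tracks, position_at, days, cell_size=100.0, time_step=1.0)

# Stage 2, the "street": small boxes, finer time steps, survivors only.
survivors = {obj for pair in coarse for obj in pair}
steps = [i * 0.25 for i in range(40)]
fine = candidate_pairs(survivors, position_at, steps, cell_size=2.0, time_step=0.25) & coarse

print(len(coarse), sorted(fine))
```

Only the pairs left in `fine` would be handed to the expensive calculation, which is the point made in the interview: the cheap bucketing decides where the costly math is worth running.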
You know, we can show them, these two asteroids are close on this day, and they go and look at their own numbers, do it the way they do it. And they go, you're right. So when do we dinosaurs get wiped out? Actually, here's what I really think everyone needs to know. If the team and I that worked on this save Earth, you're all going to owe us. You don't have to pay taxes. That's that movie, yeah. So basically what you've done is you've made better predictions because you say, okay, we only care about the ones that are in the same zip code, therefore, are they on the same street? So you don't apply the heavy lifting unless you know how near they are. Yeah, in hindsight, this is a whole class of things about trying to better focus finite resources. And in this case, the resource is an expensive algorithm. So it's a finite resource. So when you do context computing and we get enough data together, and one aspect of that is generalizing things into space-time boxes to find out things that are near each other, you can use the expensive resources more narrowly. So talk about what else you're working on. You mentioned money laundering, false positive engine, back to this better decision, because that's what everyone wants. This is the human aspect of the cognitive, is getting more outcome actionable, using resources properly. And whether it's putting a good sales guy in front of the right lead at the right time or the right computation around the right data set, whatever that is. So talk about some of the projects you worked on that kind of bring this to light. The one I've been spending the most time on has been helping banks deal with the large number of false positives that they're getting from their anti-money laundering engines. And I want to tell you how this looks to me through my lens, okay? Here's how it looks. They have these engines and they feed them all the bank transactions and these engines pop out leads. So these engines go, hey, oh, I've got a lead. 
And an analyst goes, yay, yay, a lead. They run off, they work on it for an hour and they go, that's no lead. And they put it in a pile. You need a ladder, okay? And they put it in the that's-not-a-lead pile. And then the machine goes, it's okay, I've got a lead. And the analyst goes, hey, three years later, these people are cutters. I mean, they're just like, ah, no. So the thing is just spitting out false leads. That's why it's like a false positive engine. I mean, you can look at it close enough, you're like, what is that doing? Sounds like social sales, Dave. It's a false positive engine. You need to just use random. So what I'm using G2 for, when it's not working on asteroids, or by the way, what's neat about G2 is you can do them both at the same time. Oh, yeah. But here's what you do: you tell G2 everything you tell that engine, and then you take all the leads that come out and you just put a splitter on it and tell G2 that. And then you take the cases that they've worked in the past and what the outcomes were. It's like another color puzzle piece. And you add that. So with that much richness, you have a more complete picture as it makes new cases. So the engine goes, here's what I think, and what we're doing with G2, like a corrective lens, is we're going, well, yeah, we see what you think. But did you know in that case, there's three transactions, you just saw them three months ago, an analyst saw them and didn't escalate them. Why would you spend more time on those? So like we're annotating it. And then we're taking a transaction that's so far down and smaller that they would have missed it, and saying the counterparty, they're so damn interesting because they were escalated in the past, you should look at them. So what it's doing is it's really reducing the amount of time it takes per case, but it's increasing the quality of work at the same time. So they're getting fewer false positives, fewer false negatives. It makes a really big difference. 
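The corrective lens idea can be illustrated with a small Python sketch. It is not how G2 works internally, which the interview does not describe; `Lead`, `annotate_leads`, the outcome labels, and the three-bucket ordering are all hypothetical choices made for illustration. The sketch only shows the two behaviors mentioned above: a lead whose counterparty was escalated before gets pulled up even when the engine scored it low, and a lead whose history was reviewed and never escalated gets pushed down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lead:
    lead_id: str
    counterparty: str
    engine_score: float  # score from the existing engine, higher = more suspicious


def annotate_leads(leads, past_outcomes):
    """Reorder engine leads using past analyst outcomes per counterparty.

    past_outcomes maps a counterparty to a list of labels such as
    "escalated" or "dismissed" (hypothetical labels for this sketch).
    """
    ranked = []
    for lead in leads:
        history = past_outcomes.get(lead.counterparty, [])
        if "escalated" in history:
            bucket, note = 0, "counterparty was escalated before, look closer"
        elif history:
            bucket, note = 2, "prior cases were reviewed and not escalated"
        else:
            bucket, note = 1, "no prior case history"
        # Sort by bucket first, then by engine score (highest first).
        ranked.append((bucket, -lead.engine_score, lead.lead_id, note))
    ranked.sort()
    return [(lead_id, note) for _, _, lead_id, note in ranked]


# Hypothetical leads and case history.
leads = [
    Lead("L1", "acme", 0.9),
    Lead("L2", "bolt", 0.2),
    Lead("L3", "core", 0.5),
]
past_outcomes = {
    "acme": ["dismissed", "dismissed", "dismissed"],
    "bolt": ["escalated"],
}

ranked_leads = annotate_leads(leads, past_outcomes)
for lead_id, note in ranked_leads:
    print(lead_id, note)
```

The existing engine's output is kept as an input and is never replaced here, which matches the splitter arrangement described in the interview, where the two views can be compared side by side.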
So is the crappy lead generator retired or is it an input? Well, at the moment, you know, they trust, it's love hate, but it's what we know. And we just finished installing it. So it's like, it's shiny still. Yeah, so it's. It hasn't paid back yet. The payback hasn't hit. The pile's not big enough. You know, the thing is it's easiest to go to an organization and say, look, where have you been making your investments? Where's your momentum? And saying, we want to contribute to your momentum. So if they go, look, we just implemented this shiny thing and it's generating these leads and they're a bit heavy, you know, too many. Then we just go, okay, we're going to snap it on like a helmet. We're just going to, you know, we'll just listen to it. And by the way, you already trust it. You already know what it is doing. And now we're going to tell you kind of what we think. So you can compare the two minute by minute, day by day. So it's pretty safe. It creates low risk. So anyway. All right, so what else? What about, we talked about money laundering. That's money laundering. That's the purpose of that. Catch bad guys. That's one of my favorite things to do. You know, bad guys. Asteroids are like bad guys. They don't have any privacy. Any of them could be evil. Really evil. So I've got to ask you about the good guys and bad guys. Want to have a theatrical pause for a moment? Just take it all in. Yeah. Boy, that's great. The box continuum has got my mind blown. I really lost every question I possibly had. And then a street. And I'm like, God, I can't stop thinking about the little seized up motors. Oh my God, you froze theCUBE. He froze theCUBE. But what about? That's a first. Reboot. John and Dave were speechless. It's never happened before. It has never happened. Statistically unproven. Before. 6,000 interviews. One in 6,000. It's a skewed data point. Okay, so let's get back to the space-time continuum. 
So how does the G2 asteroid work fit in with the money laundering? Is there a correlation with the tech? Or is it similar? I mean, is there a similar crossover? Well, you know, just to go back for a moment, and it'll only take a second, back to that first CUBE interview, you know, it came up. IBM kind of liked one of my inventions. They bought my company 10 years ago. And that technology is called Identity Insight today. And it's the industrial strength thing that fuses data together. And then six years ago, they said, if you had a big idea, we'd fund it. And it was at that moment when I dreamt up G2. I'm like, knowing what I know now, if I could only invent one more thing in my life, what would I spend the rest of my time working on? And one of the unique properties of G2, whether it's, last year I talked about modernizing voter registration in America, where 25% of the country is now on this. I am so proud of what this is doing for democracy and access to the rolls and cleaning up the election rolls. But imagine that, and then tomorrow, everyone's going to hear about what Singapore's doing with G2 to help protect the Malacca Straits, with half the world's oil supply, 30% of the world's commodities, and how they're using space-time boxes to help them focus their finite resources, to anti-money laundering, to asteroids. What's unique about G2 is you could feed it all of that data at the same time. And in one engine, the configuration about how it's assembling its view of the puzzle is the same. It's so different. I'm just saying, I don't know of anything else. When I meet people at a customer or a bank and they go, we have about 2,000 people, we're just going to build it. And then I go, well, it'll do all those things, it'll do it all at the same time with the same configuration. The conversation kind of goes to a lull. 
It's not like somebody goes to the board and goes, well, let me show you how we'll do it around here and build it ourselves, because it's hard to fathom. So last year, you sat down on theCUBE, I wrote this down if I wasn't expecting your interview, it'd be as awesome as last year. Customers, well, if it's the customers now, some organizations feel they have to build systems based on their data. I put this word, data space, existing data. And the problem with that, you said, was it widens the data space or observation space. If new data sets then come in, then it pressures them to re-engineer their organization or infrastructure. So this notion of, okay, observation space, cool. They think they're done, they set the table and then all of a sudden, new data comes in and it's like, oh shit. Now we've got to re-engineer. Yeah, so where are we with that? Oh yeah, so, and that's actually one of the points: if you started the system in voter registration, which has nothing to do with vessels, and then you went to anti-money laundering, which has nothing to do with vessels. And then you said we want to help the Singaporeans do maritime. You can add new sources of data, new kinds of entities and new kinds of features. Like vessels have call signs, ship names, they have lengths, drafts, how deep they are. You can describe those things to G2 while the plane is in the air and start feeding that data to it and it'll start fusing it with the other data that it has. Now I don't know why voting, asteroids and vessels would have anything to do with each other, but I came up with one example. If you discovered that an asteroid was going to land in a certain place in the ocean and you could predict the ship was going to be there and you knew who was going to be on the ship and you knew they were a voter in the US, you would predict they're not going to make the election. And so you have to null that vote. Voter fraud, solved. 
You might want to mail in your vote, have them vote before they take the trip, right? Okay, so let me talk about another observation space. Dave and I, which relates to our media business, our observation space is the audience out there. And now they're all connected with social devices. So the observation space is the social data. So that's a hot topic right now. And now with the social web, the active data is not clickstream. It's really a different kind of organism if you will. It's OAuth, it's identity based. So you have real people connected with the web using social data, and the exhaust stream or whatever you want to call it, connected. So that's social data. So how do you see that? If you have an observation space called the social sphere, a lot of people are trying to figure out how to take G2-like concepts, apply a crowd chat and figure this stuff out. Yeah, to me it's just another color puzzle piece. It's like the human is the sensor, right? The human is producing sensor data. So as everyone's talking more and more about the internet of things, it turns out the human with some instrumentation is just one of the nodes in the internet of things. Thing one and thing two. I read that book to my kids. So that's your point. So they're throwing off data. And one of the things I mentioned yesterday is the internet of things is going to continue to add more and more data and just babble and babble. So one of the, you know what I'm saying? Yeah, chatter, chatter, you know. Chatter, chatter. The thing is, you don't know if something's relevant or not. If I were just to give you a random puzzle piece, your ability to know whether it was random or not is exceptionally difficult unless it has a picture of the gun and the baby on it, okay? All the evidence on one observation. 
So what you have to do, and the essence of context computing, is taking this stuff that's happening out in the social ether as puzzle pieces and evaluating its importance and to whom. You're taking the features off of that or what was expressed, whether it's a Twitter handle and where they were when, and you're saying what else is happening there then? What else have they done? Do they have very? It's very around them, it's a space problem. That's right, and by the way, that's the definition of context: better understanding something by taking into account the things around it. So when you see the word bat, you look at the words around it, I don't know what the word bat means. Now you're taking, and geospatial data is super sexy, I think, to analytics, and now when you see something happening somewhere, you can say what else is happening there now, and you're going to use that to tell you whether that thing is important or not. It's not about staring at the transaction. It's where it fits in the stuff around it. Hence my obsession with context computing. I think the geospatial thing really puts that puzzle piece into a three-dimensional puzzle because now, if I say, hey, having a great time at IBM Insight, man, I can really use a beer. Okay, beer, Insight, I'm here, one puzzle piece, me, and what's going on, I'm in the lounge here, maybe someone brings me a beer. Better yet, a great use. It's our cell. Is there a limit to how much data. How'd you know my favorite drink? Is that you can watch in the observation space? You still have a leaf in your hair from last night. That's funny. Is there a limit to how much data you can ingest? You know, I think about, they can't fly the height of the disk head any lower, Moore's law, we can't make the microprocessor go, is there a limit to how much data you can ingest? You see a lot of data. 
Well, here's the thing: you'll want to be able to absorb and remember as much as you can, and then there's a point where, computationally or storage-wise, you're unable to, and you have to let it fall on the ground. Now, I'm going to give you this in a biology example. There's a lot of background noise going on way out there right now, and people listening right now might have a hum of an air conditioner in their office. They weren't hearing it. It was in their observation space. They could have been collecting and processing it, but they're not. And what's happened is, over time, they've assembled enough context of the world to be able to blank that out. And that's an example of taking a sensory perception and dropping it to the floor and not doing any compute over it. And the more that we are able to bring data together from diverse sources and get a picture of the world, the better ability we'll have for computers to do the same thing, figure out what to drop to the floor. Versus if we heard a bang right now that was real loud, we would become selectively curious. We would actually want to go get more information about it. That's one of my future features of G2, it's called selective curiosity. More later. Show a little leg. Come on. Selective curiosity. Okay, I'll tell you briefly, selective curiosity. There's a lot of maybes in data. I mean, the more data you have, there's a lot of maybes. So imagine that you get a maybe like, they might be roommates. Now your first question would be, should you spend a lot of energy figuring that out? So what you're going to do is say, would anybody care? Maybe you would say, give them a $100 ad or send the army. So in these cases where you find a maybe and you'd say, but if it was true, would anybody care? And the answer is yes. One of the things I've got coming in my G2 roadmap is G2 becomes selectively curious. And that means it figures out what it wishes it knew. 
And then it goes out and Googles it, it goes out and maybe asks a data service or maybe it asks a Jeopardy champion. Whoa. Can we do about a boom? All right, stage exit. We're going to hope you now, let's go. I wonder why you said that. Okay, Jeff, we've got to roll. We're getting the hook sign here. Always great to have you on. Give us the final prediction. What's going to happen next? The world is going to continue to become a safer place. You're going to continue to live older, longer and happier than any time in the history of mankind, despite what we see on the news. Awesome, we're here and helping get rid of all these viruses, Ebola, all this other end security. Like the positives, like the positives. Do you want to, West Africa, stay away, you know. Okay, we're here inside theCUBE with Jeff Jonas. Always entertaining, but really super amazing in terms of what he's working on in the G2. And we had this chat three years ago. Amazing conference, all becoming reality. Exciting to chat with you again. Here live in Las Vegas, we're bringing you all the data, sharing with you, extracting the signal from the noise. I'm John Furrier with Dave Vellante. We'll be right back after this short break.