Hi, everybody, we're back. This is Dave Vellante with Stu Miniman. Jeff Jonas is here. Jeff is a longtime Cube friend, and we really appreciate you coming on, Jeff. I know you've been on a whirlwind tour and you just did an Ironman, so really appreciate you hanging in there for the Cube. Thanks. Yeah, that was, what, 48 hours ago. I was in Cairns, Australia, racing. So how do you feel? I'm beat up. How many have you done? That was my 30th full-distance Ironman. I have a bucket list. My bucket list is to do them all once. If you go to the Ironman website, there are about 32; they come and go, but I've got five left, and I've scheduled them this year. I hope to do four Ironmen in August and then one in September, and the best I can tell, I'll be the third person on earth who can look at the Ironman webpage, at all these triathlons around the world, and say I've done them all. You're gonna do four in August? Yeah, in four countries on three continents. It's impossible. Plus, I'm probably gonna work. I love my job, so I work insane hours. Those will only be 80-hour weeks. I'll work 80-hour weeks and put the races in between, just because you gotta get it done. It's a bucket list thing. Yeah, but couldn't you space it out a little bit? Why do you gotta do them all in August? Well, I don't pick the schedule, you know? I was closing in on it, and the Ironman organization keeps finding new venues, so it's like whack-a-mole. I'm like, what, Japan? And I look, and it's right in the middle. August only had three, which gave me a little break, and then they added Japan, and I'm like, damn it. Whack! So when do you have time to train? Oh, it's tough.
At my girlfriend's house I've got a stationary bike with a flat surface. I put my laptop on it and work while I'm pedaling, and my girlfriend and I try to chat some too, so I call it quality time. When I travel, I go out and run the streets of whatever city I'm in. I don't actually train that much, so I'm not that fast. What really happens is I just do enough of them a year: I'll do seven this year, and I did six last year. The first one's really, really hard. Last year I swam, what, six times, and only on race day. I had no training swims, zero. You were saying that leading up to the one in Australia, you were limited in how much time you had in the water. Yeah, I'd been on the road for a month and had done really no swimming. I tried to put in two swims. I wanted to swim 30 minutes each, but my arms didn't feel good, so I could only swim four and a half minutes. It really gives you an upset stomach when you've got a 2.4-mile swim and you can only swim four and a half minutes because you just don't feel like you could swim any further. And then you've got saltwater crocodiles to think about. They weren't really in the water. I don't know, I didn't see any. And then you had a bike that you weren't familiar with. I borrowed a bike and had only ridden it about three blocks. And I hurt my calf the day before the race, so I wasn't even sure. I was having a hard time walking. So it was a really nerve-wracking race, and it took me a long time to finish, but once you're there, you just go, I'm going to finish. It was mind over matter. And you're really going to do four in August? That's my plan. We'll see what happens. I set high goals. I tend to set goals beyond what I can achieve, and if I get halfway there, that's pretty cool. That's awesome, man. It's still a good place to be, even if it's one in August. So those are my goals. Well, congratulations. We're always amazed to hear your stories. It's impressive.
It's humbling. I was complaining the other day that I have to do three Cube gigs next week. Big deal. All right, good, enough of that. Tell us about your keynote. What's going on, and what was the reaction from the audience? We're here at Edge, second year in a row. It was exciting for me, because the keynote today was the first time it was broadly, publicly known that my G2 invention had a role in helping modernize voter registration around the last election. Hundreds of thousands of people ended up registered and voted who otherwise would not have. Congratulations. Yeah, it's really cool. One of the things I like about my job is that every now and then I just turn around and look at the effects of creating systems like this, and the effects are just amazing. We worked with the Pew Charitable Trusts on this; they led all of this election research focused on the election rolls. The goal was to increase the quality of the election rolls and to let states better understand who's moved, who's eligible to vote, and who may have voted. We're a very mobile country, and people just don't know that when they move, their registration doesn't follow them. That means our rolls are incomplete. So at the end of the process, our election rolls are more credible, there's more confidence in our election process, and more people have access. So it was really exciting. It was fun, and I got good feedback. I highlighted my G2 invention, and this is just the tip of the iceberg of what G2 is going to do. So talk more about G2. So G2 is a technology that I dreamt up. I looked at my body of, well, OK, let me step back. An executive at IBM says to me, hey, if you had a big idea, we'd fund it. And I'm thinking to myself, I've built 100 things in my life.
Knowing what I know now, if I could only build one more thing in my whole life, what would it be? And I went, oh, it'd be this. Yeah, this would be cool. It'd be useful, and I'm trying to be useful these days. So I went and showed my boss and said, hey, how about I build IBM this? And he's like, wow, you'd build that for IBM? And I'm like, yeah. Funny story what happens next, but basically I ended up with people and we started building it. I spent the first year on paper and the next year and a half building it, basically still in secret, like a skunkworks project. For the first two and a half years the world didn't even know about it. It's designed to take very diverse data sets and integrate them. It's kind of like: how do puzzle pieces find each other in the puzzle? When you go from puzzle pieces to puzzles, you get whole pictures, and when you get whole pictures, you make higher-quality predictions. And, man, the range of organizations that are going to benefit from this is really across the spectrum. It's everywhere. My own personal goal is that G2 will one day be seen as maybe the first real context-aware computing, like for real. I mean, someone will have other things that are bigger and better, but I hope it'll be seen as the first of its kind. It's general purpose. There's the work we're doing with the Pew Charitable Trusts, I'm also doing work in the maritime domain to protect shipping lanes, and I'm doing work in anti-money laundering with financial institutions. You could use one G2 to do them all, all at the same time, in the same schema, with the same algorithms, because it's actually the same problem. So the alpha geeks say that problem is really hard to solve: taking different data sets and making them make sense, and actually getting the quality to where you need it to be so that you can trust what comes out. So how did you do that? Yeah.
Well, the first principle is you want the data to find the data. Let me tell you exactly what this means. For the moment, let's pretend you're an organization. Just you, right here, an organization. Every time a new piece of data arrives, you just learned something. For any organization, as soon as an employee changes their emergency contact phone number, that organization just learned something. If somebody enrolls in the loyalty club program, they just learned something. So every time a piece of data lands, that is the question, if you want to be able to sense and respond. So the very first thing G2 does is this little game called Data Finds Data. It takes the features of a new piece of data that's just arrived and asks, how does this relate to the previous observations I've seen? That's like taking a puzzle piece to the puzzle to see where it fits. And I've been doing these puzzle projects. Sure. I've done two with kids and two with adults. I've shown them in many settings; I've shown the first puzzle project, where the kids were putting a puzzle together. But I haven't shown the fourth one, which was done with four drunk adults. The main thing I learned from that one is that drunk people are sometimes unreasonably optimistic, because one of the people on the project would take a puzzle piece, go, I think it fits, and use their fist to pound it in there. That was a big lesson. But anyway, it turns out there's a very general way to figure out how one piece of data relates to others, and I've generalized that problem. The better a job you do figuring out how data relates, the easier it is to figure out what is relevant to whom. So general-purpose context accumulation says: give me the features from your observation space and I'll weave them all together. Then it becomes very domain-specific how you benefit from it.
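To make "data finds data" concrete, here is a minimal Python sketch. It is not G2's algorithm, which the interview does not describe; it assumes records arrive as flat dictionaries, and the class name, the `min_shared` threshold, and the sample records are all hypothetical. Each arriving record is treated as a question: its features are looked up against everything seen so far before the record itself is remembered.

```python
from collections import Counter, defaultdict


class DataFindsData:
    """Toy incremental resolver: every arriving record is treated as a query."""

    def __init__(self, min_shared=2):
        self.min_shared = min_shared   # hypothetical threshold: shared features needed to join
        self.index = defaultdict(set)  # (feature, value) -> ids of entities that showed it
        self.entities = {}             # entity id -> records accumulated so far
        self.next_id = 0

    def observe(self, record):
        # Ask the question the new record poses: which earlier observations
        # share any of its features?
        votes = Counter()
        for feature in record.items():
            for entity_id in self.index.get(feature, ()):
                votes[entity_id] += 1

        # Join the best-supported entity, or start a new one if support is too thin.
        best = votes.most_common(1)
        if best and best[0][1] >= self.min_shared:
            entity_id = best[0][0]
        else:
            entity_id = self.next_id
            self.next_id += 1
            self.entities[entity_id] = []

        # Remember the record and index its features for future arrivals.
        self.entities[entity_id].append(record)
        for feature in record.items():
            self.index[feature].add(entity_id)
        return entity_id


resolver = DataFindsData()
first = resolver.observe({"name": "Jane Doe", "dob": "1970-01-02", "phone": "555-0100"})
second = resolver.observe({"name": "Jane Doe", "dob": "1970-01-02", "email": "jd@example.com"})
third = resolver.observe({"name": "John Roe", "dob": "1980-05-06"})
```

Here the second record finds the first through two shared features and joins it, while the third finds nothing and starts a new entity. A real context-accumulation engine would also have to handle a record that links two previously separate entities, and would score fuzzy rather than exact feature matches; this sketch does neither.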
See, what a lot of organizations are doing is, oh, we want to do social media and sentiment, so let's build an algorithm just to study that. Then somebody else goes, oh, we want to study fraud, so let's build an algorithm just for that. Well, what if you could have an algorithm that allows you to commingle that data and benefit horizontally across it? Talk about variety and orthogonal data. It turns out the quality of predictions you get goes up when you mix diverse data together. And by the way, that's how you find errors in data. Better yet, it's how you find lies. Can you say more about that? Yeah, talk more about that. The question is, how do you find a lie? So if I told you I was 30, well, you'd look at me. That'd be your second data point. If I said I'm 37, you'd be like, you're lying, just look at you. OK, fine. But what if I said I was 48? I wouldn't believe you. Yeah, well, I'm from Vegas. It's hot here. I'm just dehydrated. Or it's what 30 Ironmans do to you. OK, so look, if I told you I was 48 and a half, that's a lie: I'm going to be 49 next weekend. But how would you know it's a lie? The only way you know it's a lie is if you have a secondary data point, something to contrast it with. So this is a really interesting thing about these sense-making algorithms. I think the way G2 will probably present itself from IBM to the world is under the brand InfoSphere Sensemaking. But it turns out that errors in data, natural variability, you actually want them. You don't want to polish every puzzle piece to perfection. I'm not talking about MDM, where you get one chance to onboard somebody, and you'd better have the account balance right and the address right, because that's your chance to get it right. He's talking about inferring something.
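The point that a lie only becomes visible against a second data point can be shown with a small Python sketch. It assumes each observation is a hypothetical `(source, attribute, value)` triple already attributed to one person; the function name, source names, and values are illustrative and not part of any IBM product.

```python
from collections import defaultdict


def find_conflicts(observations):
    """Group (source, attribute, value) observations about one entity and
    return only the attributes whose observed values disagree.

    An attribute reported once has nothing to be contrasted with, so it can
    never show up as a conflict.
    """
    by_attribute = defaultdict(lambda: defaultdict(set))
    for source, attribute, value in observations:
        by_attribute[attribute][value].add(source)
    return {
        attribute: {value: set(sources) for value, sources in values.items()}
        for attribute, values in by_attribute.items()
        if len(values) > 1
    }


# Hypothetical observations of one person from three sources.
conflicts = find_conflicts([
    ("self_report", "birth_year", 1975),
    ("dmv", "birth_year", 1964),
    ("loyalty_club", "home_city", "Las Vegas"),
])
```

The self-reported birth year is only flagged because the DMV record contradicts it; with the self-report alone, the function would return nothing. Mixing in more diverse sources is what creates the chance to contrast.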
Right. Now you've got data you own and control, those gold records, but you've also got data you don't own and control. It's data you're getting externally, and you're trying to bring it together, and it might have lies in it. Well, natural variability, spelling errors, transposition errors are really your friend. My favorite example is when you search Google and it says, did you mean this? It's not looking in a dictionary; it's remembering everybody's errors. If it didn't remember the errors, it wouldn't be so smart. And I've got a little personal story on this. My youngest son, his name's Dane. He's born, and I get his date of birth wrong. I forgot his date of birth. I convince Mom, and then we teach him his date of birth, and it's wrong, and everywhere we register his date of birth it's wrong. It's off from his real date of birth by a couple of days until he's five. I order his birth certificate because I'm going to take him to Mexico, and I'm quite depressed because I can see that his birthday is wrong. So you had to go to your kid. Bad daddy, this is a bad daddy story. Okay, fine, this is a bad daddy story. I go to my kid and I'm like, hey, I got your birthday wrong. I know we taught you your birthday. He looks pretty defeated, but I had it all set up, I had a PR line on it. I'm like, look, it turns out you're a little older than we thought. For a kid, that's just great. Five days? Yeah, he was two days older. You're two days older than we thought. He just thought that was fabulous. But listen, imagine that. Any smart system would have seen his one date of birth everywhere. The very first time I introduce this new date of birth, it has never been seen across any channel, so of course it looks wrong. So what would you do? Any good system would snuff it out. Well, then what? I present it again, but guess what? The system doesn't remember that the evidence is building up, because you got rid of it.
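The "did you mean" example and the birth-date story share one design choice: keep every variant value instead of cleansing away the one you have never seen before. A minimal Python sketch, assuming one attribute of one already-resolved person; the class and method names and the dates are hypothetical.

```python
from collections import Counter


class AttributeMemory:
    """Keeps every observed value of one attribute, including outliers.

    A naive cleanser would drop a value it has never seen before; here the
    dissenting value is retained, so repeated sightings can accumulate.
    """

    def __init__(self):
        self.counts = Counter()

    def observe(self, value):
        self.counts[value] += 1

    def preferred(self):
        # Current best guess: the most frequently observed value.
        return self.counts.most_common(1)[0][0]

    def dissent(self):
        # Every value other than the preferred one, with its support.
        top = self.preferred()
        return {v: n for v, n in self.counts.items() if v != top}


dob = AttributeMemory()
for _ in range(3):
    dob.observe("2001-06-12")   # the date everyone was told (hypothetical)
dob.observe("2001-06-10")       # first sighting of the birth certificate date
preferred_before = dob.preferred()
dissent_before = dob.dissent()
for _ in range(3):
    dob.observe("2001-06-10")   # the new date keeps showing up
preferred_after = dob.preferred()
```

After the first sighting, the old date still wins and the new one survives only as dissent. Because it was not thrown away, each later sighting adds to it until it overtakes the old value.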
So it turns out that in smart sense-making systems, you have to let dissent fester. And that's actually helpful in finding lies and deceit in data. So how did you apply G2 to voter registration? What was the observation space, and how did that all manifest itself? Yes. States have their voter rolls, and they want to make sure that if you've moved from one state to another and you're on two voter rolls, they can tell whether it's the same person or not. But the problem is each state might only have a name, a date of birth, and a driver's license number, and each state's driver's license number is different. So now you only have a name that might be similar and a date of birth. Well, you can't just use name and date of birth; that doesn't give you a quality output. So then what do you do? If you just try to do record matching, those records are incompatible with each other. You'd have to have a human call everybody? Too many maybes. So in this case, because states have access to DMV data, in the example from the keynote, Maryland with the Maryland DMV, suddenly you learn a Social Security number. And in Virginia, from the Virginia DMV record, you also learn a Social Security number. Now, between both states, you realize it's the same or a similar name, the same date of birth, and the same Social Security number. Now, at machine speed, you can make a really high-quality estimate and make a recommendation to a state, and the state finds it a very efficient process to mail something to those people to ask what their real intent is and whether they moved. So that's a form of context accumulation. It's often not a straight line between two pieces of data. It's a few other secondary pieces of data that let you see the picture.
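The voter-roll example can be sketched in Python to show why the DMV join matters. This is a minimal illustration, not the matching logic used in the project: the field names, the 0.85 name-similarity threshold, and the records (including the obviously fake SSNs) are hypothetical, and name similarity uses the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    # Ratio in [0, 1]; tolerant of small spelling variations.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def enrich_with_dmv(voter, dmv_by_license):
    # Each state joins its own voter record to its own DMV record by that
    # state's license number; the DMV record contributes the SSN.
    dmv = dmv_by_license.get(voter["license"])
    return {**voter, "ssn": dmv["ssn"] if dmv else None}


def compare(a, b, name_threshold=0.85):
    """Classify two voter records from different states."""
    if a["dob"] != b["dob"] or name_similarity(a["name"], b["name"]) < name_threshold:
        return "different"
    if a.get("ssn") is not None and a.get("ssn") == b.get("ssn"):
        return "likely same person"
    return "maybe"  # name and DOB alone: too many of these to call everyone


# Hypothetical records; license numbers differ by state, so they cannot be compared.
md_voter = {"name": "Jonathan Smith", "dob": "1970-03-04", "license": "MD-123"}
va_voter = {"name": "Jonathan Smyth", "dob": "1970-03-04", "license": "VA-987"}
without_ssn = compare(md_voter, va_voter)
with_ssn = compare(
    enrich_with_dmv(md_voter, {"MD-123": {"ssn": "000-00-0001"}}),
    enrich_with_dmv(va_voter, {"VA-987": {"ssn": "000-00-0001"}}),
)
```

On name and date of birth alone the pair lands in the "maybe" pile; once each state's own DMV record contributes a Social Security number, the same pair becomes a confident recommendation. That is the "few other secondary pieces of data" connecting two records that do not link directly.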
An interesting thing about these puzzle projects: I've done four of them, and in every case, with less than 50% of the whole observation space, say if I hide 50% of the pieces and have you work on the puzzle, you can make an extraordinarily accurate prediction about what it is you're seeing. I find that inspiring, because nobody's ever going to have all the data. Yeah, that's good news for all kinds of folks. Can you tell us more about the money laundering work? This is relatively new for you, and you're applying G2 to that problem? Yeah. I carefully pick my battles these days because I'm a curious person, you know? I stir up all kinds of stuff, but I've got to be careful what I stir up. So I advise in lots of areas, but now and then I actually pick a real horse to ride. I spent three or four years on this voter registration work, really deep-diving into it. Well, it turns out that financial institutions buy software that detects money laundering and produces leads. The problem the financial institutions are facing is that it produces a lead. Here's a machine, it makes a lead. It goes, hey, I got a lead, and it gives you the lead. And you go, wow, I got a lead. You chase it down, and maybe an hour later you're like, well, it turns out that's not a lead. But the machine finds another one: here's another lead. And it's doing this to maybe hundreds of people, giving all of them leads. So I go in there and talk to these people, and guess what? How long have you been working here? Three years. When's the last time you got to ring the bell? All these leads, aren't you excited? Oh, there's another lead. These are leads? Well, guess what?
It's been years, you know, that I've been working here. It's hard to keep the morale up. So it's really like a false positive engine. I don't know, maybe it'd be better to go random. I'm probably exaggerating; it's probably not that bad. It's like throwing darts at the Wall Street Journal, right? Yeah. It's like, well, there's one. So I deep-dived into this. I've been meeting with analysts, and I'm really putting enormous amounts of time into this. So far I've personally written a 180-page technical document, including pseudocode and schemas, for my technical team to build something that I think is really going to significantly move the needle. It's just a use of G2; the spec really just describes how to prepare the data space so you can feed it into G2 and get a really cool result. The end result is that organizations are going to get higher-quality cases. When you open up a case, the transactions chosen for you to look at, and the order of those transactions, will be way more interesting than what an analyst could have stumbled into. So the quality of casework is going to go up, and it'll even take less time per case: higher quality per case, less time. And that's a really interesting phenomenon about context-accumulating systems. We're going to see fewer false positives and fewer false negatives at the same time. Today what people battle is, you move the needle over here and you get bit on this side; you move the needle over there and you get bit on that side. Like I said, it's whack-a-mole. Yeah. So that's going to change. In general, the quality of predictions is going to go up. You're going to catch more of the false negatives, the things you're missing, and you're going to find more of the false positives and get them out of the way so you don't waste your energy. Awesome.
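The "move the needle and get bit on the other side" problem can be shown with a short Python sketch. It assumes alerts fire when a score meets a threshold; the labels and both score lists are invented toy data chosen to illustrate the mechanism, not results from G2 or any anti-money-laundering product.

```python
def error_counts(scores, labels, threshold):
    """Return (false_positives, false_negatives) when alerting on score >= threshold."""
    false_positives = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    false_negatives = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return false_positives, false_negatives


# Hypothetical toy data: True marks a case that really is suspicious.
labels = [True, False, True, False, True, False]
thin_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]      # score built from one narrow data set
context_scores = [0.9, 0.2, 0.8, 0.3, 0.7, 0.1]   # hypothetical score after adding context

for threshold in (0.5, 0.75):
    print(threshold, error_counts(thin_scores, labels, threshold))
print("with context", error_counts(context_scores, labels, 0.5))
```

With the thin score, raising the threshold from 0.5 to 0.75 turns 2 false positives and 1 false negative into 1 and 2: the errors only move around. Lowering both at once requires a score that separates the cases better, which is the claim made here for context-accumulating systems.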
Jeff, I love talking to you. It's always so refreshing. And this is tired Jeff Jonas. Oh God, I'm on three hours of sleep. Imagine when he's really on. Really appreciate you coming by and hanging with us here. I know it was kind of an end-of-the-day thing, and we can't thank you enough. It's always fun to be here. It's fun to talk to you. You got me all pumped up. How am I going to sleep after a talk like this? How am I going to sleep tonight? Imagine if John Furrier was here. We'd go for it. He can go toe-to-toe with you for an hour. It's your fault if I can't sleep tonight, getting me all frothed up. So, Edge. Last word on Edge: we're seeing this thing grow, and it's great that you're here. It's not just a storage show. Stu, you want to get a word in? Yeah, yeah, be nice. Thanks, Dave. No, great conversation, Jeff. You set high bars for yourself. Where's IBM? Where's the high bar IBM is setting for some of those big, audacious problems out there? Well, I'll tell you what. After IBM bought my company, I roamed the labs. I traveled around the world, and I would poke my head into a lab and share with them what I did, because that was the new thing IBM had bought. And then they would show me the stuff they'd been working on. I saw lots of really neat stuff, but two things really stood out that were really close to my space. One was this thing called InfoSphere Streams. It's a super-low-latency decision pipelining engine. I'll tell you what G2 is to Streams. If InfoSphere Streams is the nervous system that connects your eyeballs and ears to the hippocampus, where you weave the data together to see how things relate, then G2 is like the hippocampus. So anyway, I had a lot of affinity for that, and I've designed my G2 thing to run in it.
And the other one: before it won the game Jeopardy, I was talking to Dave Ferrucci, the principal investigator, and he was telling me what he did, and I was not hearing the same stuff I always hear in this space. The way the problem was attacked was different, and it was different in a way that set off all of my spidey senses. You know, after all the hundreds of things I've built over the years, I went, that's a really innovative way to attack that. So I have really high hopes for that as well. For Watson, yeah. Yeah, for Watson. I'm also really fascinated with Flash, and I'm excited to see IBM starting to make some really big bets on it. I like all my systems to run on it. It's just yummy. It's made for guys like you. Oh, all right. Get in my belly. That's like Austin Powers, you know? That guy in Austin Powers, yeah. Get in my belly. Well, when you're on Flash, it makes the velocity better. Yeah. All right. So Flash is Fat Bastard, is what you're saying. Yeah, that was the guy. Get in my belly. Sometimes I think of G2 like that. It's really about what your observation space is. And by the way, most companies could just do a better job making sense of what they already know themselves. Okay, fine, they might want to go get a social media feed, but they've got the blue puzzle pieces over here and the red puzzle pieces over there, and they're not even relating them, and they're in the same building. You get Enterprise Amnesia. I don't know if you've heard this story. I've told it a couple of times publicly. My team did this work for a large retailer with our software, and we found that two out of every thousand people they were hiring had already been arrested for stealing from them at the same store. This is Enterprise Amnesia, right?
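The retailer story is about relating two data sets the company already held. A minimal Python sketch, assuming both sides carry a name and a date of birth; the field names and records are hypothetical, and the exact join on a normalized key is a deliberate simplification of the fuzzier matching a sense-making engine would do.

```python
import unicodedata


def normalize_name(name):
    # Strip accents, fold case, and collapse runs of whitespace.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.casefold().split())


def already_known(applicants, incidents):
    """Return applicants whose (name, dob) key already appears in incident records."""
    incident_keys = {(normalize_name(r["name"]), r["dob"]) for r in incidents}
    return [a for a in applicants if (normalize_name(a["name"]), a["dob"]) in incident_keys]


# Hypothetical records held by two departments of the same company.
applicants = [
    {"name": "José  Alvarez", "dob": "1985-02-03"},
    {"name": "Ann Lee", "dob": "1990-01-01"},
]
incidents = [{"name": "jose alvarez", "dob": "1985-02-03"}]
flagged = already_known(applicants, incidents)
```

Nothing here needs outside data: the hiring records are the blue pieces, the incident records are the red pieces, and the only step missing was relating them.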
They just didn't realize they had both pieces. So when it comes to get in my belly, you don't even have to look far. It's not like you've got to go looking for exotic data. Well, databases today are intentionally kept small, right, because they're so slow on spinning disks, so Flash changes that. Yeah, Flash is sexy. I'm like, give me Flash or give me death. Storage is sexy, as John Furrier says. Flash is sexier. All right, Jeff, hey, we've got to run. Thank you very much. Really appreciate it. Good luck. Yeah, thanks. We'll hopefully see you on the other side of August. All right, Jeff Jonas. Keep it right there, everybody. We'll be right back with our next guest. This is theCUBE, we're live at Edge.