Okay, we're back live here at Information on Demand, IBM's premier conference, live in Las Vegas. This is theCUBE, SiliconANGLE.com, SiliconANGLE.tv. Exclusive coverage of IBM Information on Demand. This is theCUBE, our flagship program. We go out to the event, extract the signal from the noise, connect the dots, find the big ideas, find the big data, and share that with you. I'm John Furrier, the founder of SiliconANGLE.com. I'm joined by my co-host. I'm Dave Vellante of Wikibon.org, and we're here with Jeff Jonas, who's the chief data scientist at IBM. A repeat guest. John, we had Jeff on at IBM Edge. Jeff, welcome back to theCUBE. Thank you. Great to see you again. We love to have you on theCUBE. Of course, last time I did not sit in on the interview because I made room for John MacArthur, who's an amazing interviewer. He does the Peer Insights for Wikibon. But we love big data, we love your work. So, big fan of your blog, your work as an entrepreneur, now at IBM. So, great interview by the way. It was a great interview on theCUBE last time at IBM Edge. Thank you. So, we talked about the big data angle. You talked about the puzzle pieces and collecting that signal from the noise. Big data is the big thing here. It's the show, it's information management. It's now morphed into data science, data from a development standpoint. What's the update on your front between the last interview and now? Is there anything you want to share as an update that we can jump in on with questions? You know, I've been obsessing over geospatial data. So, over the last year and a half or so, I've been working on something called space time boxes that take latitudes and longitudes, like where you are right now and when, and I'm putting you in a box. And it depends on what kind of thing you are. If you're a ship, I put you in a bigger box, and if you're a person, I put you in a littler box.
And if you're moving really fast, you have to account for that and use bigger boxes. And when you do this, the kinds of predictions that you can make around where things are going to be when, and where things co-locate, are really interesting. So I've been doing a lot of that. That's fun. And then I've been applying anonymization on top of that. So I'm using some of my privacy by design principles, because geospatial data is really going to be super food, and how are we going to protect it? So talk about data science. Because, you know, this is the rage. We had the database guys on from the IBM group. And, you know, it's only a few years ago that databases weren't a sexy area. It was, hey, what do you do? I mean, no one's really fallen out of it. Now, databases are big. It's fun to be a database geek. And now data science is a big thing. So share with folks your view of data science as a career, as a role in an organization or in life. Well, I think, whether you're a person or an organization, one of the goals you might have is taking your observation space and being able to make the most of it. So I'm thinking about what an organization gets its hands on. It's just an observation space. Some of it's structured, some's unstructured, some's geospatial, some's social. And it's your ability to figure out how the pieces relate to each other. So when I think about big data, by the way, I mean, some people just see a big pile of data as a pile of puzzle pieces, and that has a very different kind of utility than information that's assembled, like puzzle pieces into a puzzle. So the more you figure out how one piece of data relates to the rest of what you know, the better the chance you're going to make a higher quality prediction. So I've become obsessed with that, and I think what we are seeing in the markets, and in technologies, is more technologies dealing with aggregating and pooling larger sets of data.
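To make the space time box idea described above concrete, here is a minimal sketch, not Jonas's actual implementation. It assumes a box is simply a grid cell in latitude, longitude, and time. The 610-metre edge and 15-minute slot for a person echo figures mentioned later in the interview; the ship sizes, the rule for growing boxes with speed, and every name in the code (`BOX_SIZES`, `SpaceTimeBox`, `space_time_box`) are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone

# Illustrative sizes only: (box edge in metres, time slot in seconds).
BOX_SIZES = {"person": (610.0, 15 * 60), "ship": (10_000.0, 60 * 60)}
METRES_PER_DEGREE = 111_320.0  # rough length of one degree of latitude


@dataclass(frozen=True)
class SpaceTimeBox:
    kind: str
    edge_m: float  # edge actually used, after the speed adjustment
    row: int       # latitude band
    col: int       # longitude cell within that band
    slot: int      # time slot index


def space_time_box(kind, lat, lon, when, speed_mps=0.0):
    """Quantize one observation into a box keyed by place and time."""
    edge_m, slot_s = BOX_SIZES[kind]
    # Assumed rule: fast movers get a box at least as wide as one slot of travel.
    edge_m = max(edge_m, speed_mps * slot_s)
    lat_step = edge_m / METRES_PER_DEGREE
    row = math.floor(lat / lat_step)
    # Longitude degrees shrink with latitude; use the band's base latitude so
    # every point in the same band shares one grid.
    shrink = max(math.cos(math.radians(row * lat_step)), 0.01)
    col = math.floor(lon / (lat_step / shrink))
    slot = int(when.timestamp() // slot_s)
    return SpaceTimeBox(kind, edge_m, row, col, slot)


if __name__ == "__main__":
    noon = datetime(2012, 10, 22, 12, 0, tzinfo=timezone.utc)
    print(space_time_box("person", 36.10, -115.17, noon))
    print(space_time_box("ship", 36.10, -115.17, noon, speed_mps=10.0))
```

Because the boxes are frozen dataclasses, two observations that fall in the same box compare equal and hash the same, so co-location reduces to looking for shared keys in a dictionary or set.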
We had the Watson guys on earlier talking about meta reasoning, learning machines, all that's kind of happening on the compute side and semantic analysis. One of the things you mentioned and you talk about is collective intelligence, getting little signals and piecing them together. What's going on with collective intelligence that's now different than it was just even a few years ago? What's changed? What's changed? I don't know how to answer that, what's changed? Maybe more compute power? I don't know that compute power's been the problem. I'll tell you what, I've got this blog post coming. It's getting cooked right now. It's called Fantasy Analytics. I'm a slow-motion blogger a couple of years. Only when I've got something important to say. But I cannot tell you how many organizations have this, I'll call it a delusion, about what is computable just because they have a big pile of data. It's like just because there's a big pile of data, they think there's gold in the hills. And here's my example for you. I've got to change names to protect the innocent. I'm talking to this one organization that cares about how things move around the world, the global supply chain. So I'm talking to them and I go, what are you trying to do? And they go, we're trying to find bombs. I'm like, that's awesome. I love working on problems like that. What do you got? And they go, well, we've got who sends it. I'm like, well, that's great news. And we've got who's going to get it. Oh, good. And we've got who's driving it, flying it or whatever. And I'm like, excellent. And they go, we've got the manifest. You know, when you ship it, it's what you write down about what's in it. I go, that's great, what else do you got? And they go, we don't have anything else. And I just look at them and go, you will never find a bomb, because nobody writes "bomb" on the manifest.
I cannot tell you how many organizations have an appetite for what they think they can compute that is out of proportion to their observation space, which is incomplete. And what this is going to lead to is organizations and data scientists really starting to think about how to shift their observation space and looking more horizontally. We have to widen our observation spaces in ways that maybe we weren't planning before. So you have this very narrow observation space, which for most people is a logical place to start though, isn't it? What data do I have internally? It's like looking under the light for your keys, though. Yes, go on. Right, okay. Exactly. Okay, so great. I know it's not here. So then how do you even decide as an organization where to look next? Well, that's what I'm going to blog about: how do you think about widening that observation space? You know, sometimes when you put a puzzle together at home, you've got a bunch of red and white pieces, but they're not coming together. There comes a point where you don't just get the next piece out of the box, you've actually changed your interest. You're like, I'd like to find all the red and white pieces, because I'd like to bring some closure to this. So now you pull the box over and you're looking for just the red and white pieces. You've just changed your collection interest. So for a business, when you've got a bunch of data piling together, if you've got a whole bunch of maybes, just because the names match but there are no other features that relate, then you might say, well, where else in the organization or externally can I acquire data that would work like glue and resolve all that ambiguity? So it'll increase our quality of prediction. So watching what is accruing under your eyes helps you better choose what to go fetch next. But I'll tell you, my big prediction is going to be that geospatial data is going to change it all.
Just an example of that, because obviously we have mobile phones. We have GPS on, people know what we're doing. There's no big data. Yeah, but it's the analysis, I'm getting to know. So there's somebody that made his data available publicly and he said, get to know me. I just hope to show the world what can be done. Well, I'm getting to know the guy. I've computed everywhere the guy's hovered for more than 15 minutes in 610-kilometer, excuse me, 610-meter boxes. A 610-meter box, if he's hovered there for more than 15 minutes. So I've taken roughly 36,000 of this guy's records over six months. And every time this guy's like tick, tick, tick, tick, every time he's stopped and hovered, I'm like, he's hanging out. We live in habit trails, man. You know the little hamster habit trails? We don't know it. But if I were to pattern-of-life you, okay, you are living a habit trail life, man. Rarely do you ever get out of there. It's like a deer, they have that little track. It's the trail, right? Right, and it is your trail. Yeah, for us, it's Las Vegas, Orlando, San Francisco, New York. theCUBE has a trail, it's your habit trail. This guy's only hovered in 130 places in six months. He's only hovered in 63 places more than three times in six months. And one of the things I just did about three weeks ago was I cut the guy in half. It's like a magic trick. I actually snipped this guy in half and took the first three months of him and said, well, that's him. Then I took the second three months and said, it's a cash cell phone named Sue. That's all you know. But just taking the pattern of life, what the habit trail looks like, it takes, I mean, a day and a half or less of watching the second device to go, that's not Sue. It's Groundhog Day, watching it over and over again. It is so like that.
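The "cut the guy in half" experiment above can be illustrated with a deliberately naive sketch. Jonas does not say how he compares the two halves, so the Jaccard overlap used here is a stand-in technique, and the function names and the `(unix_seconds, box_id)` ping format are assumptions made for the example. Only the 15-minute hover threshold comes from the interview.

```python
HOVER_SECONDS = 15 * 60  # "more than 15 minutes", from the interview


def hover_boxes(pings):
    """Return the boxes where a device stayed put longer than HOVER_SECONDS.

    Each ping is (unix_seconds, box_id), and pings must be sorted by time.
    """
    hovered = set()
    current, entered = None, 0.0
    for when, box in pings:
        if box != current:
            current, entered = box, when   # moved into a new box
        elif when - entered > HOVER_SECONDS:
            hovered.add(box)               # stayed long enough to count
    return hovered


def trail_overlap(a, b):
    """Jaccard similarity of two hover sets: 1.0 identical, 0.0 disjoint."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    known = hover_boxes([(0, "home"), (1000, "home"), (1100, "office"), (2100, "office")])
    sue = hover_boxes([(0, "office"), (1000, "office"), (1100, "home"), (2100, "home")])
    print(trail_overlap(known, sue))
```

A high overlap between the known device and the anonymous one is the signal that "Sue" is walking the same habit trail. A real system would presumably also weigh time of day and visit frequency, which this sketch ignores.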
And think of the predictions that we're going to make for traffic systems, and the predictions we're going to make on figuring out, you know, not sending me a hair care ad when I'm next to a beauty salon. Like, why? So the geospatial kind of services we're going to deliver are going to be extraordinary. So let's talk about social data. We had the social guys on earlier talking about, you know, social business and instrumenting everything, odds and compliance. But with real time, you get to have a trail concept. But now we have to think about the observation space. Think Twitter, right? So Twitter's different than LinkedIn and other social networks, where it's a full global observation space, if you will. You can look at some different patterns and you have a lot of different diverse connections, gestures, a lot of noise, but a lot of signal kind of in there. How do you look at something like a Twitter and say, hmm, how do I attack that space? Well, I'll tell you, a lot of people are just working on the general trending, like just trending words and trending places. And my interest is, when you see something that's being said, how does it relate to one of your customers? How does it relate to somebody you know? And actually, I've gone out and I took a bank I was doing some work with and I looked for everybody that said anything positive or negative about them. And then I said, what are the odds you're going to be able to link that Twitter person to a customer? I don't know if you guys have ever tried this. We work on it all the time, believe me. Always a bit tough. Tell me, can we just agree it's not easy, right? Because you've got a Twitter handle, FlamieDog67, and that's it! Like, how are you going to link that to a customer? Yeah, what habit trail is that person on? And who's his friend? Right, you can't even, well, just because somebody's following you, talk about a loose definition of a friend. Yeah, exactly. Okay, you can't call that a friend.
Thousands of friends, what's that, it's the Dunbar number. Yeah, but that brings up a good point. It's like, how do you make sense of all this and apply the data science? And, you know, we struggle. I think Ed just showed you our product that we were developing with age-based playing with the Twitter data. And one of the things that we found is that by using predictive analytics, we can have a first cut at predicting trend data. Yeah, so trending's fine. But if you're trying to put a laser beam on the forehead, to give somebody just the perfect data at just the perfect time, and to know whether it's your customer saying something negative about you, there isn't even science and math that's going to fix that. The only thing that fixes that is increasing your observation space. Maybe the Twitter handle also has a Tumblr account. Well, then what you've got to do is look at the Tumblr account to find out that there's also a link to an Instagram. And maybe the Instagram has a link to an email address and some other identifiers, right? And the question is, how do you link diverse orthogonal data together? And that's what's needed to get really atomic level prediction. Yeah. Person level, vehicle level prediction. Talk about the privacy by design, as you mentioned that earlier. So we're living in an age of privacy. And I've always said this: at the supermarket, I always put the discount card in there and I get some money back on my groceries. They know exactly what I'm eating. Talk about habit trail. You know, they know exactly what I'm doing. I give that up for the benefit of cash. So I don't mind, right? But people have mobile phones and there are always privacy concerns, like don't give away your data, but people actually are okay with giving up their data. How does that factor in, is that part of the design process? Do you look at that saying, hey, that's cool or? Well, no matter what we engineer in, I think the trend is consumers are just giving it all away. As soon as you put up an irresistible service.
Like, I'm a bit of a privacy guy, so I turned GPS off on my phone. But I've got an iPhone. Have you ever turned your GPS off on your iPhone? Yeah. Do you know what it says to you? You've seen the warning? You know, if you turn it off, we can't find it later. You're like, jeez, do I really want to turn it off? And we're going to find it getting more and more irresistible. And so that's the consumer and the business trend. Businesses are going to create more irresistible services. Consumers are going to keep gobbling them up. It's a big festival and we're all running together. Then the question is for the engineers, me, my peers, other companies building other stuff: are we really thinking about what we're building? Are we building it in a way where later we're going to wake up in the bed that we've made and go, wait, we didn't... or is it the toothpaste-out-of-the-tube kind of phenomenon? So for the last maybe, I don't know, eight or 10 years, I've become a student of privacy and I've been trying to build more privacy features in, and I'm kind of proud of that. Yeah, I'll tell you a funny thing that has changed. Up until, it seems like 120 days, what, four months ago. Up to four months ago, when I would go talk to companies or governments, I'd be like, hey, you can do all these analytics and you can do privacy. But over the last four months, it's actually noticeably changed, where they start the conversation by saying, we want to do more with analytics, but we're going to need to address these privacy things. Like, I hear them saying it first. It's a really significant shift that I've seen. I don't know. Especially this crowd, right? Yeah, yeah. The IM crowd. That's what you mean by design. They need to address it. It's a must-have, at least a feature. Right, but now it's the people that are normally just saying, look, we trust ourselves. We just want more CPU, right?
Or more, now they're saying, wait, the kind of data that we want to play with, we want our policy and legal office in our organization to give us permission to use this. We need to do a better job protecting these privacy policies. And that's out of their mouths. The customer is now asking for more privacy-protecting things in the technology, which is an interesting shift. And now, is it a small bump and it'll be gone next month? I don't know, but it's noticeable. I mean, governments are reaching out, asking about it. Right, right. Rockefeller, a couple of weeks ago, saying, you know, Experian and Equifax, but that's such a small piece of the problem. And guys like you are going to have to address the bigger piece, and your customers, don't you think? Well, I think, and that's one of the many hats that I wear, I'm trying to raise consumer awareness and I'm trying to talk about responsible innovation to get more engineers thinking about it. Like, if you're an engineer and you've never talked to a privacy advocate and you don't really have this notion of the Fourth Amendment and the freedom from search and seizure, you don't really have an appreciation for that. There are little design decisions that you make along the way where you wouldn't know whether to turn left or right, but it turns out if you turn right, it's going to be way better for the future than left. I'll just give you one example to be practical. Some years ago, like 10, I was hanging out in Washington, D.C. and I was talking to an organization that had watch lists, you know, the people they're looking for. For some of the records on the watch list, they didn't know who gave them the name or why. Like, okay, so now you meet them, I'm like, what, do you shoot? Do you interview them? Are they already innocent? I mean, how would you know? Where did they come from? Right, they don't know where it came from.
And one of the things that led me to was this: in everything I create now, there's full attribution. Every single piece of data knows where it came from. Yeah, yeah, we were just talking about that earlier today. So we were saying, how about data DNA? There may be a birth certificate. This was born on this machine, this application. This creates this notion of, well, people talk about lineage and things like that, but data mashups are now a really big concept, where you say, hey, I've got data observing over here, this observation space here, I want to just blend them together, but I've got to keep track of it. Is there a database of data tracking? So that's a metadata problem? Yeah, so it's metadata that you want to drag around with all the data. And I've been building systems that do that, and it turns out to be super essential. And you create that upon creation of the data? Yeah, it's mandatory in every system that I've created. You cannot give it a piece of data without it. If you don't tell it something about where it came from, it has no way to process it. So then you can do data tethering. It means if upstream they do an add, change, or delete, say somebody fixes something on your credit file, and that data's already rippled down somewhere, right? The question is, is it current in the ecosystem, and how can you keep it current in the ecosystem if you hold a piece of data and you don't know where it came from? So you're actually designing technology to put the toothpaste back in the tube? Well, I don't know, that's probably a pretty big reach. So how about this irresistible application concept? That's interesting, because you think about it, people want utility from technology, right? And we were saying earlier, technology should be invisible. Okay, so that's the case. What's irresistible as a feature? Is it personalization? Does all this observation and data science create better personalization?
Do you think it optimizes your life? I mean, like in most rooms, I could just say, how many people use a free email service, and most of the hands go up, and you're like, how many of you read the terms of use? And almost nobody raises their hand. And if you were to read the terms of use, it likely says the data's ours. If you delete your account, the data's still ours. But guess what? Is it currently being used in some harsh way? No. Is it basically benevolent? Yes. Are you enjoying it and do you love it? It's irresistible, yes. And we're just seeing more and more irresistible services. You know, I have my identity stolen every single year. In fact, somebody stole my identity and used it in Vegas, and because it's a card I rarely use, the bank couldn't see that I wasn't really there, because I wasn't transacting with the card. For eight months, they were spending about $500 to $800 a month and I didn't even notice, because it goes to my accounting guy and I don't see it. But I'll tell you what, if a bank offered a service that said, almost no more identity theft, because we're going to know where you are when, and we're going to know what habit trail you're in. Sign me up. Yeah, sign you up. Because then if somebody's even in Vegas using it, but they're not on my habit trail, guess what? Not me, I'm not hanging out over there. Never. I've been here 25 years, I don't hang out over there. Yeah, that's the kind of data you can get. So what are the exciting things you're working on now? Well, the space time box stuff is exciting, and anonymizing it is exciting. My latest invention, G2, which I've been working on, you know, I worked on it a year on paper alone, just getting the blueprints right. I worked on it two and a half years in secret. My engineering team reported to a secretary in the marketing department. Even the antibodies couldn't find me. Ah, you got it. Yeah, yeah, don't tell anyone. It's going across the ground on the building.
Builder, builder, yeah. Total Skunkworks, it's a Skunkworks project. That's my project, G2 Skunkworks. But that has now seen the light of day. There's an embedded, lightly configured version. It's the full piece of code, with the lightly configured version living inside of SPSS Modeler. So everybody that already uses the predictive analytics product that we have already has access now, with version 15, to my latest stuff. It's really exciting. The number of downloads per month of this, relative to the full body of my prior work, is just one more reason I'm working for IBM. So version 15 of SPSS? Yeah, SPSS Modeler version 15, it's got this for free. You can put up to 10 million things in it and it'll figure out when they're the same or related. And it's pre-configured for people, companies, and vehicles. And if you want to add vessels and routers, you can do it while the plane's in the air. And you don't have to change any rules. I came up with this entirely new way to do work in maritime and do work in genealogy and do work in picket and do it all at the same time on the same computer with the same config. Okay, so now it's not officially a Skunkworks, it's actually out in the open. It lives in the wild now. And it's a project now within IBM, so all the engineers kind of have a manager now. Yeah, they're like, oh, there's a process. You're killing me. You're in a box. That's in a box now. That's in a box. The IBM box. It's in the box. Performance reviews, all the big company stuff. But you eventually have to grow things up. All the stuff that I've done that's become widely commercially available, you have to institutionalize it and make it industrial strength, and that doesn't come with a team I'm feeding Spam under the door. The hell shall work? We had Pauline Nist on today. She said IBM provides adult supervision to the industry. Yeah, well, I definitely have a team that is being adult supervised.
So what do you think about IOD this year, or the show? Obviously, we're looking at a banner right there. There's this big data, data everywhere. It's awful exciting to see so much energy around analytics and information in general. And for me, the show, there are so many people here that are so interesting. There are so many interesting things, just being able to sit and talk to them. What's the coolest thing you've seen? Oh, gosh. Or one of the cool things you saw, it doesn't have to be the coolest. I'll tell you one. After IBM acquired my company, I traveled and visited all these labs, and I remember one thing, and it's commercially available today, and I still think it's really super sexy and it's really going to see a big uptake: this InfoSphere Streams, this real-time streaming, super low latency. When I saw it in the lab, before it was anywhere near prime time, I remember looking at that going, oh, that's interesting. In fact, I designed my G2 thing to live up inside of it, so when you compile one of these InfoSphere Streams fabrics, you can compile my little G2 thinking machine inside of it. And it's about low latency. And I think as a trend, what's going to happen is, the world got networked. So it was first 2.0, okay, great, the power's on the edge. And now with all the data, the question is going to be, how do we harmonize the data and how do we figure out what's related to what? And if you and I have access to the same observations and we're competing, then the very next thing you want to compete on is latency. If I can make a better decision, or even an equal decision, sometimes even a slightly worse decision, but I can make a decision faster than you, I win. So what about people bringing their own data to the table? So let's take that forward. Okay, I've got the same data, I've got the same observation space, and I want to make faster decisions on the latency side, totally love that. Now I have data.
I know something about my customer or my environment, but I want to bring that to the observation space, kind of blend it in. So two parts. One is, you know, back in '96, I did a project where we had 4,200 daily feeds. You can't even inventory all of them. So one of the principles I had to learn is, you need to be able to introduce new kinds of data sources and new kinds of entities and features while the plane's in the air. So we're already there. But the next question is, if you want to take your data to the game, are you going to put it up in the cloud, in somebody else's system, when it's your data? That's a pretty tricky notion. So you're either going to choose to bring it down, bring big piles of the big data down close and weave in your special sauce to get the whole picture, or it's going to be these little tricks you can do with anonymization, to anonymize the features and then still discover how the puzzle pieces relate. And that's a main area of my work right now. Yeah, I mean, it seems to be multi-dimensional, the way the whole data space is evolving, almost like same spaces. So the last question, as we're getting short on time here: what's your vision for the next five to 10 years? Next 10 years out, shoot the arrow forward, five to 10 years, big data, G2 is now out there, you move on to the geospatial and all your next big things. How do you see the scene within IOD, IBM and beyond changing? I think we're going to see a better integration between the ability to do real-time sense and respond, which basically tells you when to blink, duck, stop the cyber transaction, let the transaction go through or pick it out on the webpage, and integrating that part of the ecosystem with the part where you deeply reflect over what you've been learning. When you sit on the couch at the end of the day and you're just thinking about what you know, you're not reading something, you're just kind of data mining on yourself.
That is an extremely important feedback loop. You come out with things there. I didn't realize it till I did my second puzzle project with kids, which I trapped on a boat, and I got them away from Child Protective Services. Is that a joke? But I didn't realize how important deeply reflecting on what you already know is, and I think when we see this integration of being able to sense and respond with the ability to deeply reflect, discover new emerging things and task yourself to make sense of it the next time you see it, I think that's going to lead to a next round of smarter, higher quality predictions. So that's the concept that people are talking about, getting more out of the data. In that example, you're reflecting, taking the reflection into the learning environment system and getting more insight. And I'll tell you one thing that I'm totally sure about. I did this one blog post called Data Beats Math and it basically says, if I give you a puzzle piece, a piece of data, you are not going to get more out of it by using more math and bigger computers staring at one puzzle piece. It's really going to be your ability to fuse, to incrementally build on what you're observing, using it like puzzle pieces to expand and establish pictures. Well, we're here with Jeff Jonas, IBM chief data scientist, with his G2 project, a Skunkworks now integrated into SPSS version 15. So it's baked in there, congratulations. A little entrepreneurial Skunkworks to kind of keep the brain going. So you keep your habit trail in the startup world, right? So I can't wait to hear about your next Skunkworks. I'm sure you already have a Skunkworks going on right now. So congratulations. Thanks for coming back on theCUBE. I really appreciate it. Thank you. Now this is theCUBE. We'll be right back with our next guest after this short break. This is Jeff Jonas here at IBM IOD Live. I'm still going to angle with you. This is Jeff Jonas theCUBE.
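As a companion to the full-attribution and data-tethering ideas discussed earlier in the interview, here is a minimal sketch under stated assumptions. It is not how G2 or any IBM product works. `AttributedStore` and its methods are hypothetical names, and the only behaviours modelled are the two Jonas describes: a record with no stated origin is refused, and an upstream add, change, or delete can be applied downstream because each record is keyed by where it came from.

```python
class AttributedStore:
    """Toy record store in which every record must carry its origin."""

    def __init__(self):
        # (source, source_key) -> payload; the key is the attribution.
        self._records = {}

    def upsert(self, payload, source, source_key):
        """Apply an upstream add or change. Records without an origin are refused."""
        if not source or source_key is None:
            raise ValueError("record rejected: it does not say where it came from")
        self._records[(source, source_key)] = dict(payload)

    def delete(self, source, source_key):
        """Apply an upstream delete so stale data does not linger downstream."""
        self._records.pop((source, source_key), None)

    def from_source(self, source):
        """Everything this store currently holds from one upstream source."""
        return {k[1]: v for k, v in self._records.items() if k[0] == source}


if __name__ == "__main__":
    store = AttributedStore()
    store.upsert({"name": "J. Smith", "status": "late"}, "credit_bureau", 42)
    # Upstream fixes the credit file; the tether lets the fix ripple down.
    store.upsert({"name": "J. Smith", "status": "current"}, "credit_bureau", 42)
    print(store.from_source("credit_bureau"))
```

Keying on the source and the source's own identifier is the design choice that makes tethering possible: a correction or deletion arriving from upstream lands on exactly the record it supersedes.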