Okay, we're back here live in Las Vegas for Information on Demand, IBM's premier conference. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE. I'm joined by my co-host Dave Vellante. And our next guest is Jake Porway, founder and executive director of DataKind. Welcome to theCUBE. Thank you so much for having me. So you're a celebrity up there on stage, showing the greatness of big data and social good. But you know, you're a geek at heart like us. A couple of years ago, we were nobodies. Now you're famous. Data scientists? We were nobodies. We love having data scientists. We had Hilary Mason on last week. Oh, she's fantastic. If anyone's a celebrity, it's her. She's a rock star. She's so good. Dave and I always perk up on theCUBE because she's so awesome. She's such a geek. She's so articulate. Absolutely. Now she's a VC, so she influences money. Yeah, she's got everything. So let's get into it. Just give the folks a quick update on what you were doing on stage and some of the things you were talking about here at IBM. Absolutely. So I was mostly talking about new ways that we can use big data for the greater good. There are all these great applications you're seeing of businesses figuring out how to help you watch movies better or find a bar that you wanna go to. But the coolest thing to me about the "big" in big data is that it's expansive. It no longer just touches Wall Street and Silicon Valley and the places you traditionally see data. You're seeing it now where everyone's having their data moment, whether it's journalism or nonprofits or the environment. So there are all these great new opportunities to use new data sources to solve really big problems. 
And so I spoke a little bit about how our nonprofit, DataKind, has been using data science to tackle some of the biggest issues in the world through these sorts of collaborations. So talk about the social media, social movement. It's dynamic. You've got crowdfunding. You've got all the stuff going on with people connected to the web. You're seeing new kinds of patterns emerging from folks online. It's developing with tagging, let's say hashtags, people dialing in. We love that. So just getting involved. And so it's like harnessing that loose, federated personnel asset. Not everyone has their own IT department. Right, it's tough. Never mind big data scientists. So just kind of give the folks a view of what it's like. Obviously there's a tsunami of potential people to work on something important. Everyone has that fleeting moment of, I feel good, I want to donate, or I'll give you a few weeks of my time, or we've seen Kaggle and other environments. You can count the stars with open source. Sure, absolutely. So give us a taste. How do you harness it? What approaches did you take to harness the crowd in a way that makes it frictionless to get involved but sticky to stay in? Absolutely. Well, one of the things that's really nice about it is that the community's already doing a lot of this themselves. I mean, you see people taking part in Kaggle competitions or going to hackathons, and really what was missing there was that connection to real-world problems. Because if you just leave an excited data scientist on his own to solve a problem, he's going to solve his own problem, which is usually parking his car or finding a bar to drink at. And so really the trick that we worked on was actually less about data and more about translation: finding a way for data scientists to speak the language of the people who are trying to solve the big problems. So let's say you sit down with a clean water NGO. They don't know what big data is. 
I mean, it's scary to people in the business world, much less people outside of that sector. So we've been working very hard to be that middle piece that says, hey, this is what data can do for you. It's not so scary. We're going to show you, we're going to take you through those first steps. And to those data scientists, similarly, we can say, hey, you know, it's not just about using the coolest new technology. It doesn't matter if you get to use D3 to visualize something. What you really want to be doing is thinking about how you're using this data to solve a big problem. And this is how you guys fit together. I've always said that the best data scientist is one that's invisible to the end user. That's a great point. At the end of the day, your example highlights it. They kind of want to get to point B. Exactly. Clean water. They don't want to know the maps and the nuances. So how do you enable that? I mean, what do you do? Do you sit down and have meetings with them? Is it visualization? Well, you know, it's funny. I thought the biggest thing we'd be doing was data related, that we'd bring all this data science in. But frankly, the biggest aspect is actually the framing of the problem, really finding the question. As any good data scientist will tell you, it's not so much about the data as it is about the question you start with. So we spend a lot of time bringing data scientists to sit down with organizations to really understand what they need. We had one group that was a healthcare network, and they said, we've got all this data, let's build a visualization. That sounds really exciting, and you might just jump in and do that. But we had data scientists sit down with them first and say, whoa, why? What are you going to do with that visualization? Everyone wants a dashboard, but dashboards are useless unless you do something with them, unless they drive some action. 
So after they got to talking, the organization started to say, oh, I guess what we really want is just an alert, like an email saying that something's changing in our programs, that this many people are now sick. So frankly, maybe we don't need the dashboard at all. So a lot of the process is actually just getting to the real question before you even bring the data in. And to me, that's actually one of the most exciting parts of the process. So Jake, another data scientist that we had on was Jeff Hammerbacher, right? He's very famous for having said, the best minds of my generation are trying to figure out how to get people to click on ads. That was actually the inspiration for DataKind. We're sitting here. I was gonna say, he would love what you're doing, right? He does love what you're doing, I'm sure. But are you as negative on that whole clicking-the-ads thing, or do you think there's some good that can come out of that? Well, it's a very necessary tool to have. I mean, you know, business is a very important aspect. I don't like people who go one side or the other, right? Who vilify business. We'll often have people come to us to work at DataKind saying, you know, I do evil during the day. So you're good. No, I think that's not the right way. It's not selling out. Oh, I sold out, I hate that term. It's like, people need to make money. Absolutely, and there's a great place for that, and big data has a huge role in that. So I wouldn't say that it's evil. It's just that we'd be selling ourselves short if that's all we focused on. I mean, to me it's like, I've lived in Silicon Valley for 13 years, but this sounds very typical Silicon Valley, like they're not thinking big enough, right? I mean, if you think about it, the thing with big data is that creativity is really the power, right? Unleashing the creativity. 
So, do you agree with that? And what would you say to folks out there asking, hey, how do you get someone to think big so they can ask these questions? They've never had a chance to ask these questions before. So how do you get them to open up and think big? Yeah, that's such a good question. And actually, it's one of the things you'll hear a lot about when people describe a data scientist, right? They have to be creative. They've got to think outside the box. It's really hard to do that, I think, especially because right now big data is still treated as a mystery. A lot of people black-box what they're doing, you know what they'll say. But she consultants. What's that? But she consultants. They make money clicking ads. We respect our consultant friends. But yeah, exactly. They're the evil consultants. Yeah, you hear things like, right, people will go, I can't tell you the secret sauce. It's an Excel spreadsheet. Yeah, exactly. But they'll come out and say, no, it's a model. It's a model. Right, but they'll be happy to say, oh, well, we unleashed the insights and released the power. But what does that mean practically? And that, to me, is what helps people think bigger: showing them practical examples and saying, here was a problem we had, and here was how we tackled it. And you know what? The other part of the story, before we get to the gold at the end of the rainbow, is that we failed four or five times. We tried three models that didn't work. And then we tried this, and you know what? The data wasn't there. I think the more you hear those case studies and those stories, the more you get people to think outside of the box, outside of the black box, and say, ah, now when I see a problem, I can think of four or five other ways I've heard of this being tackled before. Now I can think bigger and more creatively about that. What do you think of social data's potential to drive predictions? We talk about prediction markets all the time. 
We had Nate Silver on a couple of shows ago. What's your take on the potential of social data to predict things? I mean, it's almost immeasurable, right? It's data about us. It's the digital representation of almost everything we do, everywhere I check in or everything I say. So the potential for prediction is huge. And we've seen this in so many examples. Like, as you said, Nate Silver has, I don't know if he actually looked at this, but I don't know. He's using traditional sort of polling data, right? So it's still fuzzy, the social data. Fair enough. I was gonna say, actually, there's a group, Crimson Hexagon, that looks at social data and Twitter data to do predictions of big events, and they have... Like iPhone announcements and things like that. Yeah, things like that. That's probably very predictable, but... Sure, sure. Okay, Crimson Hexagon. Yeah, a great guy, Gary King, out of Harvard started that. And they've been looking at prediction markets for politics, so elections. They've been incredibly accurate in election predictions. And then you're seeing, and I used an example in my keynote today, people who are tracking social media for mentions of flu symptoms and flu conditions, similar to what Google Flu Trends did. There you can see that when people talk about the flu, they often have the flu, and that's a great leading predictor of where flu outbreaks are happening. Yeah, so I remember Nate was kind of negative on that. You weren't there, you were watching. No, but I remember I kind of tweeted it. Yeah, what was his negativity about it? Nate basically said that he thinks it may hold promise down the road, but there's just not enough there based on his investigation so far. No, he was talking about tooling. The tooling available for... Well, what I'm calling the general-purpose data scientist. Okay, not the... right now, I think we're pioneering, you guys in particular. There's a general purpose, and he considers himself a general-purpose data scientist. 
Give me the data that's available that I can touch and I'll synthesize it. I love that we haven't even defined data scientists yet, and now there are subclasses. No, I mean... Yeah, he's old school. No, no, I mean... He's old school, right? But he's like, you know, okay, he's playing, he's a data jockey, he's a gym rat, as Hammerbacher said. He's a data nerd, like us, right? So he's in there with the data, but he's not coding or exploring data sources, you know what I'm saying? So was he saying there aren't enough tools available for, say, an analyst? Yeah, yeah, off-the-shelf tooling to make things quicker. Tooling, and he's also talking about the maturity of the data set. I mean, again, he's an old-school data scientist. He's using polling data from multiple polls. Sure. You know, aggregating them, analyzing them, building a model, and... Yeah, we... It's so funny. It just goes to show you how much headroom is left in the business, a lot of room. Yeah. I remember we used to call them statisticians. Yeah, right. But yeah, no, I think that's a great point. There's probably a lot of maturity left. And in fairness, I don't think Nate necessarily refers to himself as a data scientist. I think he would probably consider himself more of a statistician. There's actually a great post by Cosma Shalizi, who's a statistician out of Carnegie Mellon, and frankly, it's about data scientists versus statisticians. Not that this is necessarily what anyone cares about, but I'm just going to say, I'm thinking of going back to calling myself a statistician. So really, statistics lost an opportunity, basically, to say, hey, there is a profession for taking large amounts of data and drawing conclusions from them in a principled way, so that you don't introduce bias and so that you can take care of missing data. And they missed that opportunity to get on the boat. 
And now you see all these people who suddenly have data dropped into their laps, and everyone's talking about big data and how to deal with it. But no one's talking about the fact that these people need to know statistics. Statistics has been there for exactly that purpose. And we're still just sort of talking about how to build programs. I was moderating a panel at GE, which had a big customer event. Jeff M. L. was on the panel, along with all these executive luminaries, talking about the impact of big data on their organizations. The keynote speaker was Florian Zettelmeyer, director of the data analytics program at Kellogg. He gave his talk, and one of his epic sound bites was this. He said, I talk to customers all the time, and they ask me, what should I do about data scientists? He goes, would you hire a CFO who says, I don't know finance, but I have someone on my team who does? His point was that analytics has to be ingrained in the career you're in. That's like hiring a CFO who doesn't know finance but has someone on his team who does. So his point was that if you wanna be successful on the data front across your organization, it should be a discipline. What's your take on that? I think absolutely. Is the question just whether data science and analytics should be a discipline, or whether you need to have some... No, how do you get folks there? Because you're working with people who are focused on a business outcome, not in the weeds on technology. So you're exposed to, I'd say, the first generation of... what is that animal again? What's it look like? It's data, using data to be successful. Over time, that's gonna change, and it'll be more ingrained in organizations. How do you talk to people who are maybe not scared of data? Or what's your advice to folks out there who are like, here's how you get into it? Oh yeah, fantastic. 
Thankfully there are tons of ways to get involved in this now, whether it's Coursera courses on data science or new universities that have data science programs starting up. But one practical piece of advice I'd give, before getting into the larger idea of curriculum, is to just go out there and start meeting the community. Come to things like this. I see meetups around New York City, like the data science meetup, where people actually come and just talk about statistics. DataGotham was a big event. DataGotham was a huge event, an awesome event. I really love that one. For those who aren't familiar, it's a data science convening in New York that highlights all the really cool things going on by practitioners of data science, kind of for us, by us. It's not under the banner of a company or a product. It's almost like a conclave. Yeah, a conclave. I'm not sure what a conclave is, but I agree. Sounds religious. So yeah, that's what happens when the pope dies. White smoke comes out: okay, we authorized that standard. I do. Yeah, right, yeah, you don't want white smoke coming out of your shirt, so that's not good. Yeah, so I think that's it. Just get out there and get familiar with it. Know what you don't know first. What's your take on the IBM experience here so far? What's your take on the IBM personnel? Obviously, Watson's been a big marketing home run for them, as well as great technology, and now it's turning into kind of a productization across the group. Totally. What's your take on the IBMers here? IBM, so far, has been a huge, amazing experience. I mean, first off, the conference is gigantic. So the number of topics it can span is just mind-boggling, right? They have all these tracks, everything from cloud computing to social media data. It's super impressive. But I've always been sort of an IBM fanboy from the early days because some of the research and development that they do is so groundbreaking. 
They're not just a vendor, right? They're not mailing it in on the R&D side. Yeah, not at all, right? I mean, the fact that Watson came out of this, that's amazing. Watson's gonna be, like, you know, picking the next president, something like that. So that's amazing. And then of course, near and dear to my heart, they've got Smarter Planet, right? That's using big data for the greater good. It's very in line with the things that I believe in. So where are we with the social good market right now? A lot of people can get connected. What do you see it evolving into? A full-on community like open source? Is it gonna be much more like its own little society? I don't know. I hope so. I hope we get past this point of seeing a division between social good and industry. I mentioned in my talk today that everything sort of went through its computing moment in the 90s, and now every discipline is going through its data moment, which is a great quote that my pal Mark Hansen says all the time. And it's true. I don't want there to be this division in the future of social good versus not. It's just that everyone is going to reach this greater level of data literacy. And I think what we're gonna see is that, just like everyone has a computer in their business, everyone's gonna have some data capabilities. And I would love to see that community talking about it more. Like I said, more case studies. What are we learning? How are we failing? How do we do this better as a community? Okay, we just got in, you're confirmed for 5:30. Okay, we'll get the guests coming in. We've got the CrowdChat going on. We'll get you to hear what's coming on. Yeah, we're gonna get the cue. We gotta move faster here. So thanks for coming on theCUBE. Really appreciate it, Jake. We're big fans of your work. Congratulations. Thank you. Great to see you on stage. And again, thanks for the social good that you're doing. 
It's just, I believe that we're gonna see stuff that we've never seen before, mind-blowing. I think so too. And if you want to follow along with that mind-blowing stuff, you can always go to datakind.org. Follow along with our newsletter or get involved. Join us. We feel the passion. We have open source content. We built CrowdChat for that purpose, a free product for groups to collaborate. This is theCUBE. Keep watching, extracting the signal from the noise. We'll be right back with our next guest, live from Las Vegas at IBM IOD, right after this short break.