 Hello and welcome to the Big Data Deep Dive with theCUBE here on EMC-TV. I'm Richard Schlesinger and I'm here with Tech Industry Entrepreneur and Wikibon Analyst Dave Vellante and SiliconANGLE CEO and Editor-in-Chief John Furrier. We are discussing big data for a better world. How big data analysis can impact everything from farming methods in Uganda to policing patterns in New York City. So welcome to you guys, the CUBE guys, I appreciate you being with us. It's always fun to hear from you. Are people paying attention to the less fortunate, I mean to, you know, big data is a powerful tool but it's an expensive tool. Is there an awareness of that, that non-profits could use this technology, that technology for good is something to pay attention to? I mean I see definitely traction in this area and it starts with government 2.0 initiative we see with Obama which was, you know, an open government although, you know, people criticize that. It really has been a good effort and what you've seen with big data is, is the ability to use technology in a way that you don't have to be a super geek to do that. So we're seeing the role of a data scientist and other roles where people who understand solutions to problems can now use big data to solve those. So you don't have to be a big company and so you can get things done faster and you can ask new questions and solve those new problems. And you can do that on issues that improve the quality of life for people. Sure. I mean, well this concept of open government is a good one but of course it's selective. Some parts of government want to share their data, others, you know, like the budget office might and the NSA might not. And I think that is beginning to permeate to non-profits and others but this notion of crowd sourcing is what's fundamental to that concept. Because you get more people involved, more minds, you know, million heads rather than one. Right? 
Well, the other thing that's happening as part of this trend is that the barrier to doing technology ventures has come down, because with open source technology the barriers to starting something and creating a new solution are much lower, and now with big data the barrier to actually causing change is very low. So it doesn't take a lot. It takes a little energy and you have a lot of people out there who want to do good and big data is a great path for helping people. And it takes data. And there's a lot of that. I wanted to talk about it because we found a group of data scientists who spend their weekends mining data for non-profits and NGOs. They call themselves DataKind or, the words that I love, generous geeks. So take a look at what they did recently on one of their free weekends. This weekend we're doing our first ever data dive. We decided to go on this mission and try to build this bridge between the data science community and non-profits, NGOs, international organizations and say what are your data problems and how could you use our help? Even if you know it will increase your yields, it does not make sense to invest in it against other needs. For example, education of your children, maybe even feeding your children. We were hoping that we could have more brains involved in looking at our data and deciding what kind of practices are happening in the NYPD. Are they in fact discriminatory like we're hearing or are they not? Scientists have a great need right now to do analysis of data and there just aren't a lot of resources at those non-profits or at those government agencies to do those things. What I'm really optimistic about is that they'll actually be able to help us figure out ways that we can use new tools and technology to get data out to the rest of the world faster and easier. Our primary goal is that by the end of the weekend each one of these organizations has learned something new about the data that they brought here.
Data scientists are completely turned on by data. The better, the cleaner, the newer, the fresher the data, the sexier it is. It's sexy if you can actually make a difference and that's what this data is. It's collected by people who want to make a difference for that goal. I'm a data scientist. I'm a graduate student in statistics. I'm a statistician. I'm a data analyst for a hedge fund. I am a database engineer. I'm an epidemiologist. I'm a data scientist at an online dating site. These happen to be the skills that I have and a lot of these other folks have, and we'd like to put them to good use. Here's in doing the analysis-y stuff. We have a lot of data on the Wikipedia. I also have a PDF that explains what the actual value is not on our smartphones. You guys have a starting point on needing to criminalize the data. We need to get an SCP client on your computer and I actually realized I had one. Give me two minutes. Not gonna know what that means. Yeah, yeah. I'm 100% committed till midnight. That's the minimum cutoff for me. Closer to the end of the night, you feel more excited than you were at the start of the day and so now we just want to stay here longer. We have so much cool stuff. I'm looking forward to today's presentation because there's so much stuff that happened over the course of the last 24 hours that I don't even know about. We tried to do the latitude-longitude conversions in PostGIS. The problem really lies at the data provider level. There seems to be some kind of seasonal trend. The process is to assess it and the right data has been set up. The very significant thing that we're coming away with in terms of the actual work that was done here was a very clean package that is a building block for a lot of the work that we'll do in the future. It's gonna make it easier for my mother or my next door neighbor to lend to a woman so she could buy a sewing machine in Sub-Saharan Africa.
Moving forward with the data, I'll feel stronger about the claims that I'm making and the kind of analyses that I'm running on the data. This organization is amazing. I think that's amazing that through big data I can connect with people that I'll never meet, I'll never see, and I can make an impact on their lives. It's really exciting to be able to do something that we think will actually improve the world. If you can be a little part of the machinery that effects change, I think that's kind of cool. I think my generation will have a good, sort of an important role in that. The idea of really being able to take these raw materials that are the data and bring them all the way through the process to where you can explain to someone exactly what's happening. I mean, that's the most satisfying thing in the world to be able to say, like, I took this thing that used to be a mystery, whether it was small or large, and now isn't. That's a profound thing that you've done, and I think that being able to do that is a tremendous contribution to society. Thank you guys so much. It was awesome. I love watching that piece because it shows how data can be sort of democratized, and it can be used for everything. It doesn't have to be just for big business, for big enterprise, it can be for real good. Is there a sense in the alpha geek community, if you will, that there's a necessity to do this kind of work? I mean, is there a social conscience? Well, I think the open-source movement really underscores that social conscience. And I know I could speak from the standpoint of when we started Wikibon, we had two major inspirations. One was Jimmy Wales, we met in Boston, and he had started Wikipedia. And the other was Don Tapscott, who wrote a book called Wikinomics. And the fundamental premise of both of those initiatives was really share everything, collaboration, get the crowd involved, and make it open, and make it shareable.
And I know, John, when you started SiliconANGLE, you had a similar philosophy. Yeah, I mean, there's an old expression in the social web, make things available, and that brings more people in. People call it freemium on the business side. But I think the alpha geeks see things like this, because it's always a bottom-up kind of organic growth in new technologies. And whether you're going back to the early days of Steve Jobs until when he took over Apple, think differently. The geeks can see possibilities in technologies. So here, you see geeks seeing the impact on education, impact on healthcare, impact on government. So there's a lot of involvement. I think really it's early, and not a lot of people know about it. More evangelization needs to be done around it. I wonder how much awareness there is in the nonprofit and the NGO community of the possibilities that... Well, the interview at Strata was amazing with Virginia. She really is applying big data, because what she's showing here is that... Virginia, tell us who Virginia is. Virginia is with... Virginia Carlson. Virginia Carlson, yeah. I can help you. Virginia Carlson. Many interviews we've done over 800 days. But she was one of my favorites at Strata. She's talking about how she's using big data to change the environment around providing care. And what's interesting is that all the data is available, but no one's ever rolled it up and actually looked at what's happening. So this highlights a big trend that we're seeing where people actually can instrument their business or environment and actually look at actual data. This eliminates panels and surveys and guessing to go to actual data and actually do the right thing. Virginia Carlson works in the city of Chicago on what issues? Well, the issue is how the government spends its money to provide care to people who don't have a lot of means. And the money's wasted because people are guessing based upon some sort of old statistical data.
She's using big data for us to look at real-time data, look at specific population and trend data and bring supplies and care to the people who need it the most at the right time. And it used to be very difficult to do without big data. Now with big data, she's actually doing it. But she's having some problems, right? She's having some problems because not all the people want to continue that kind of funding. So, you know, this is where the alpha geeks see the possibility. This is truly a game-changing situation where it will completely change how the government disburses the money. Well, what's interesting about this interview that we're about to play is that she also talks about the kind of data that she can get now and how difficult it is. So I'll let you tee up. Introduce Virginia Carlson. Virginia Carlson, she's doing amazing things. We support her. Watch this video and think this is possible in every aspect of our life and in business, to instrument actually what's going on and optimize for that. It's a great video. Take a watch. Okay, we're back live at Strata. We are in the afternoon program here at the Cube, SiliconANGLE.tv's flagship telecast where we go out to the top events in tech and explore and get the signal from the noise. Extract that and share that with you. And we're joined here with Virginia Carlson who's from Chicago, Metropolitan Chicago Information Center. You guys are a non-profit and you do a lot of work with data. So first welcome to the Cube. Thank you. And let's talk about data. So tell us your impressions of Strata and what's going on in the data world from your angle. Well, this is my second time at Strata. I was here last year and as I think a lot of people felt last year, it's so good to be together with the tribe where you can sit down and almost immediately get into a conversation about sort of universe versus sampling and everyone at the table understands what you're talking about. So it's fabulous to be here. What's happening?
I mean, what's old and what's new? Because there's all this talk about data. Warehouse and business intelligence, same story, new wine, different bottle, kind of whatever metaphor you want to use. But we're seeing new trends like predictive analytics and real time or whatever that means on whatever parsed definition. So what's your angle on this? Where are we? I'll talk a little bit about where we sit. MCIC, the Metropolitan Chicago Information Center, sits as a sort of funnel between big data and historical big data and the common good, public good organizations on the ground that would have needed those data. They still need those data. For example, anything from a local American Indian healthcare center that needs to understand where to open a new clinic to a larger philanthropic organization like the MacArthur Foundation who wants to know whether or not its local community efforts are making a difference. So we try to do what we call the data intermediary piece, curate the data, analyze it, visualize it, give them the findings. Tell us from your perspective, because you have to go out and scour sources, find sources, because that's the drug. You need sources of data. So tell us, what's it like out there? How did you find sources? Are they rolling in now? Are there intermediaries? Are you brokering the data? Are you a data broker? How does someone get sources? And what scrappiness do you need? Street smarts you need? The historical perspective, we were founded 22 years ago to do 3,000 household survey projects every year because there wasn't enough data to do local planning and policy development. That went away in 2002 as more administrative and operational data became available from governments and as there was less... worse response rates from surveys basically.
So turning to open data and sources at that point, 2002 you had to FOIA most of it, now there's the big open government movement and now sort of folks selling their souls, if you will, selling their private data to Facebook, Twitter, LinkedIn and all the rest of that. So from our perspective, big data is, I want to say a double-edged sword but it may be a single-edged sword in that as more and more data are collected by private sector companies, there is less available data for social service organizations that need public data to do public planning. Are you saying there's data hoarding going on? Are people hoarding the data? Do they want to just control it? There should be a show called Data Hoarders. That's actually a good cable show, we should run that. Data hoarders, data hoarders. Facebook, you're hoarding data. You know, I mean, that's their business model. They need to monetize. What they're doing is selling you back your own data in the form of services and advertisements and that's how they're making money but what it's doing is it's sort of... Choking. Choking, there's a good word. Other sources of data that people might use that might be more public, less confidential and that drives things up for us and the Open Gov movement in particular is all about getting operational and administrative data from the federal and local governments. So how does someone get involved? The average person who cares about this because there are a lot of people who do care. If so much data is being made private, confidential, folks are giving away their data to Facebook, LinkedIn, Twitter, all the sort of big three, how can those data be used for social good? For example, if we could get Google search results on folks in different census tracts and what they're looking for as their vision is fading and use that to help the centers for, you know, the Guild for the Blind, figure out what sorts of vision impairments are going on.
So my call, the conversation that needs to happen is with privacy and confidentiality, how can we get the private sector data into the hands of local people trying to work on common good problems? Well, let's explore that a little bit. What responsibility do you think companies like Facebook have and how do we get to that point? How do we get companies like that to share that data, to make it available? What role do they have to play? What would your message be to them if they're watching right now? Well, my dream would be that they would see this as a philanthropic opportunity. There are other ways to do that. A number of them have sort of a philanthropic arm where they'll lend out their data scientists for problems, but I'd like to suggest to them that their data is just as valuable as the skills that data scientists have. And we should begin a conversation around how and under what conditions privacy and confidentiality can be preserved at the same time that they start thinking about sort of letting the data free. I mean, if data wants to be free, as they say, let's use it for public good. Virginia, thanks for coming inside theCUBE. We personally care about this societal benefit. Dave and I were talking last night about how society can benefit from big data. The stuff that you're doing and your work is phenomenal. It's exactly the kind of use case that the, I call them commercial vendors, don't necessarily talk about, as they're not in that business of actually helping human beings, but in the healthcare example and/or doing planning around making society a better place, big data can completely streamline and bring so much more operational efficiency to stuff that's already existing, that data. So I personally believe in what you're doing. Thank you for sharing with us. Keep in touch with us. Let us know how we can get a hold of you because we want to promote your work.
You know, it's interesting because she's, this is sort of the flip side of open government, which everybody thinks is a good idea, but she's suffering a little bit because of the open government and the concentration on open government. Why is that not the kind of information that she needs? She needs data that's real time, you know, that guides her to the solution now so that she can cut the waste and let's face it, we all know that there's a lot of waste and so she's being very bold with this initiative. She wants what a lot of people consider dry, statistical census type data. Absolutely and not the sample. She wants the whole corpus of data as we were talking about earlier. More is better. No sampling. Just give me everything. She kept talking to you. Tell us a little bit about what she said when the camera was turned off. Well, I don't want to really kind of go into too much detail and get her in trouble, but what she essentially was saying was she wants to provide that kind of data mashup because she's identified the ability to use real time information sets of data from silos and put them all into one together to provide real time value, but ultimately it's not just about the solutions, it's an obvious benefit, but there are people that are in her way and there's nothing to do with helping people. It's about the money. People who have the old way of controlling the money through initiatives and causes that were statistically being supported by bad data. So the bad data was driving a lot of the distribution of the money to help the needy. In this case, she has a real solution to show real data, to solve real problems in real time and it's being blocked. It always comes down to this one thing, information is power, right? She's threatening the status quo, she's disrupting it in a very positive way and this is why you're seeing an Occupy Wall Street movement, that's why you're seeing Twitter rise up. The crowdsourcing information is real time and it's powerful. 
And that's really, I mean, that's where the rubber meets the road, if you will pardon the cliche. In big data, it's all about the power. Disruption, it's all about disruption. Well, thank you both. I mean, this has just been great. It's sort of a different way of looking at big data. We of course thank you boys, John and Dave, for your great insights and really, really good knowledge on all this stuff. And we have more installments of the Big Data Deep Dive coming up. So stay tuned to the conversation with my new best friends from the Cube right here on EMC-TV.