Hello and Happy Holidays, I'm Kristen Folletti and thanks for joining us at News Desk on SiliconANGLE TV. As 2012 comes to a close, we're spending some time reflecting on the stories this year that have had a significant impact in the tech world. Joining us once again to look at the year in review and to provide his analysis on the future of big data is Wikibon analyst Jeff Kelly. Welcome back to the program and thanks again for joining us, Jeff.
Well, thanks for having me, Kris, good to be here.
We saw a lot of buzz surrounding big data and the election earlier this year, and big data even got its own book, called The Human Face of Big Data. In an interview on November 20th, you spoke on how the idea of big data in the mainstream loses the formal term big data and is often presented to the public as analytics or something else. Should we be concerned with the rebranding of big data or do you see this as a positive sign?
No, I wouldn't be concerned. I think in fact it may be a positive sign for a number of reasons. Namely, one of the issues when you're talking about big data in kind of a mainstream sense is that people who are not in the industry tend to think big data, kind of Big Brother, a lot of privacy concerns, things like that. So potentially not using that particular term might actually be a positive, just in terms of the connotations and the ideas it brings to mind. But I think further we're going to see that term at some point really not be applicable anymore, because pretty much everything's going to be big data. When you talk about data, it's all going to be large volumes of data, multiple varieties of data coming from multiple sources, coming in very quickly, high velocity. So I think big data is just going to become the norm and we can actually drop the big from that term, and it's really just going to be data and the analytics and services that are provided on top of it.
In sharing some of his predictions about the evolution of big data in 2013, SiliconANGLE founding editor Mark "Rizzn" Hopkins said he expects to see an upswing in data journalism, specifically saying there's a literacy process that's going to need to take place: data science is not infallible and is subject to some biases if it's not handled properly. What's your take on data journalism and how do you think we'll see it develop over the next few years?
Well, I certainly think you're going to see an uptick in more data-intensive or data-focused journalism. Right now, I mean, the Guardian in the UK is one media outlet that's kind of ahead of most, I'd say, along with the New York Times, in terms of bringing data science and analytics to bear on their reporting. There's a couple of things to consider, however, in terms of expanding that to more news organizations. One, as you mentioned, is that the skills needed to analyze and parse data are not necessarily part of the average journalist's tool set right now. So there's going to be a learning curve there. The other issue is, especially when it comes to journalism focused on kind of public policy issues, a lot of the data concerned here is housed within government agencies. And government agencies at this point, while some are certainly advancing their data management practices, others are not. So traditionally, especially on the local level when you're doing local news stories, it's not uncommon for journalists to sit with boxes of documents and files and go through them by hand to kind of tabulate data. If data is not in a format where it can be analyzed using technology, then obviously that's going to stall or slow down the process of expanding data-focused journalism. So on the one hand, certainly journalists need to improve their skills.
On the other hand, the data sources, the data they want to analyze, need to be put into a format in which they can be analyzed. And that's going to take some time, particularly in government agencies and generally on the state and local level.
So then do you think we'll start to see more data scientists in the newsroom?
You know, I think so. I don't know if they'll use that term necessarily. You know, I think, not unlike how the term big data is going to become a thing of the past when everything is big data, I think as data science and analytics are applied more and more to journalism, it's just going to become a standard way of reporting. So you might not see the term data scientist, you know, particularly mentioned or used in the newsroom. But I think some of those skills will be prerequisite for a lot of journalists. That's in addition to some of the more specialized journalists who focus specifically on understanding database technology and how to really analyze data at a very granular level; there are already some journalists in newsrooms around the country who operate in that database-focused role to kind of help the rest of the newsroom. But those skills are going to expand to your more average journalists as the years progress.
On November 20th, we spoke with you about Europe's proposed Right to Be Forgotten Act, an act that has been the subject of intense debate, with many people arguing it's simply not practical in the age of the Internet for any data to be reliably expunged from history. Can you briefly summarize the act for us and explain the implications such an act would have for big data?
Well, the idea simply is that, you know, a consumer, a citizen in the European Union, should have the ability to essentially erase their digital footprint. The problem is it's not practical. There's really no practical way to do that.
Data points that make up a person's overall profile are housed in disparate sources among disparate organizations and enterprises. And in many cases, wittingly or not, citizens have given their consent to have this data collected. They may not even realize it; it's often buried in fine print and user agreements and things for social networking and social media sites. But in fact, you know, they've essentially consented to allow their data to live beyond their control. So practically speaking, it would be very difficult to implement such a policy. And, you know, frankly, I think people are starting to understand that not only is it not possible, but there are actually benefits when your personal data is out there. And not just in the way that, you know, companies advertise to you in a more personalized way, but in other services that they can bring to bear to improve, you know, the user experience with various companies and organizations.
In your opinion, do you think people want to be forgotten or has a data footprint sort of become a part of us all?
Well, as I said, I think it's mixed right now. I think, you know, the so-called millennial generation, I don't think they even really think about it too much. You know, I think it's just part of their lives, sharing their data with their friends, you know, their social networks and beyond. I think for older generations, you know, my generation included, that acceptance is not as widespread. But I think as the years progress, it's just going to become part of, you know, our mainstream experience online.
Can you discuss the differences between commercial uses versus real-world applications of big data?
Sure.
So, you know, I think there are certainly commercial opportunities for organizations to exploit big data, whether it's in retail or marketing, trying to better target and segment customers for marketing campaigns to drive more revenue, or financial services looking to analyze financial and stock data, for instance, to make better, you know, asset purchases and sales. Whatever it might be, there's any number of commercial use cases. I think when it comes to more socially focused use cases, you know, that's when you're starting to talk about what the government can do to improve its services, whether it's, you know, making it easier to interact with various government agencies by personalizing service online based on analyzing user data. We've all, you know, been through the experience of waiting at the DMV. Well, you know, that's never fun. If you could do a lot of those types of activities online and make it easier to do, do it via smartphone and do it in a personalized way, you know, those are some opportunities there for government agencies and social services to make their interaction with the public a little bit easier and more beneficial.
We spoke this year about Google being under scrutiny for promoting its own services in search results. How could big data be applied to an open, noncommercial search engine? And would that be better, in your opinion?
Well, I mean, think about it as a user. If you're, you know, going to Google or any other search engine, assuming you don't have a specific agenda, you're looking for the best results for your search. You're not necessarily interested in having the search engine or the operator of the search engine deliver results that are more beneficial to them. So I think it certainly can be useful to have a kind of content-neutral or, you know, a neutral approach to returning search results based on just what the data is telling them.
On the other hand, you can understand why someone like Google would want to promote their own services within their search engine. So the question is balance there. And, you know, frankly, it's probably a legal question as well. You know, there are currently legal cases outstanding that are going to determine this question. Is it okay for Google to, you know, promote their own services higher up in search results when organically perhaps they wouldn't appear there? And if so, to what level? So, you know, you could argue on the one hand that, yeah, a perfectly neutral search engine, where the analytics and the data behind the search algorithms are 100% neutral and just deliver the best results based on what the data is telling them, certainly could be useful for a lot of people. But on the other hand, you know, a lot of people are involved in the Google ecosystem: they use Gmail, they use Google Plus (well, not that many are using Google Plus, but that's increasing a little bit), you know, Google Drive for documents and other things. So if you're one of those users, perhaps you prefer your results to be a little bit biased towards Google because that's part of your ecosystem. So, you know, it can go either way. It'll be interesting to see how these court cases play out. And that will really determine where this issue goes.
We found out earlier this year that Facebook has been internally allowing a select number of marketers to see data that divulges information about the brands' consumers, including interests such as their favorite bands or TV shows. In your opinion, should social media sites like Facebook and Twitter be able to charge for access to such user data or should it be public property?
Well, that's a good question.
I think they should be able to charge for it because, you know, they're providing a service to consumers and in exchange, the consumers are essentially giving them the data. Now, the problem is some consumers don't realize they're doing that. You know, I don't think there's a problem with, you know, Facebook doing that. In fact, it's critical to their success going forward. I mean, part of the reason for the botched IPO and the simply lower stock price was all around, well, how's Facebook going to monetize all this user data? And in order to do that, they've got to essentially create services that are attractive to advertisers based on the data they've collected from users. I mean, there's really no way around that. So, you know, in that sense, they're selling data. So again, they're selling personal data. But it's important to note that it's not personal in the sense that, you know, they necessarily know that it's Jeff Kelly's Facebook feed, and here's all the personal information about him: who his spouse is, how many children he has, where he lives, all that kind of information. It's anonymized, aggregate personal data to allow advertisers to kind of segment users and target the types of users they want to target with their advertising and marketing campaigns. So in that sense, yeah, I think it's perfectly reasonable for Facebook and others to do that. And I don't see any way around doing that if they want to be financially successful and viable.
What kinds of issues can a data footprint have on our privacy?
Well, I mean, I think, you know, we've all seen instances where someone has posted a, you know, picture on a social network like Facebook or Twitter or wherever, maybe they're in college, and a few years down the road, they're not so proud of that picture. And, you know, a potential employer could get that because, you know, guess what, that picture is not going anywhere. That's part of the digital footprint.
And even if you take it down from Facebook or wherever, you know, it could have been copied by someone else, it could be posted elsewhere. So, you know, you really need to be careful what you post online because your digital footprint really is permanent.
Jeff, I'm going to stop you again so that we can continue more on this discussion tomorrow. But thank you once again for your time.
My pleasure.
So for part three of our holiday News Desk segment with Wikibon analyst Jeff Kelly, be sure to join us tomorrow at News Desk on SiliconANGLE TV.