So I'll be extremely brief. I have the honor and pleasure of working with a lot of smart students, researchers, and other folks in the position I have, so I get to introduce all of them, and I will just jump up in between each one. So if you're playing civic media bingo, now is the time to play; you will win very quickly. There are a lot of cool projects touching a lot of things that I know people in the room are interested in. So we're gonna go ahead and get started with Nathan Matias, who is now a PhD student here at the Center for Civic Media. Nathan?

Hi, I'm Nathan, and I do data analysis, design, and experiments on cooperation online. I'll be sharing NewsPad, a project that I started at Microsoft Research, and Eventful, which my colleagues carried on. Both of these projects are related to this idea of building community through reporting.

Imagine the attention curve of news online, all the way from Kim and Kanye to an individual tweet. We know that news organizations and Wikipedia can cover the left end of the curve really well, and that curation tools can cover stories that at least have a few tweets about them. But there's a huge need for work and tools that support things that haven't yet been tweeted about, and things in the middle of that curve: things like the Capitol Hill Garage Sale, this profoundly important community-building event in Seattle that doesn't really get reported by the mainstream news, even in Seattle.

To start out, Shelly Farnham and Emma Spiro developed new algorithms for detecting community- and neighborhood-based Twitter activity, and looked at the different shapes and structures of networks in different communities to find ways to help people collaboratively create news. We know that cooperative systems like Wikinews can help at the left end of the curve, but curation tools tend to be single-user and aren't very collaborative, which is where we came up with the idea for NewsPad. Imagine: once an event happens,
maybe there's some sparse media. Someone creates a NewsPad post that anyone can read and edit live. People can also recruit others to contribute to this post, and important sites within the community can embed the article and the editing tools into the different parts of the community, to create a positive feedback loop that brings more readers and more contributors to the post.

Here's how it works. We help novice creators create a news article by helping them generate a listicle-style news title and then structure their article in a way that's easy to contribute to. We also structure participation by inviting contributors to list their names and take concrete tasks, like ask for improvements, offer to help, or read the article, and we ask people to check into the article to say they're willing to help even if they're not sure how yet. And then, using social recruitment, we have every single section in the article linked to the ability to ask your friends on Twitter, or ask people via email, to contribute to the article. Over the last summer we started with small prototypes using actual print cards in the neighborhood, and then moved on to cover a wide variety of events in Seattle and elsewhere, including last year's Civic Media conference.

Building on this idea from NewsPad, and really with the idea of trying to streamline the recruitment and collaborative editing process, Andres and Elena designed a system to structure different tasks that people can take to contribute to a news story. So they broke individual stories into concrete tasks that different people can complete. The idea is that you go to Eventful and request an event to be covered, saying what type it is and how long it's going to last. It will then create a checklist of different tasks that need to be completed to write the story. Those tasks can be sent to community members via your community mailing list, and they can also be sent to TaskRabbit, using dynamically allocated labor pools to fill in the gaps. So the TaskRabbiters go to
the event. They have a checklist of things they need to do: interview people, get information, take video. That's then passed to a curator, who looks at the tasks, approves the work, sends more tasks back as the story evolves, and then puts it into the story. Here's a screenshot of the system, which you can see now on Eventful, that gives people a mission-based view of the tasks they need to complete as a group to write the story. And then that story can be fed into systems like NewsPad, into a common story, and shared back out to the community in that positive feedback loop that I described previously. NewsPad has been open-sourced by Microsoft, and I'm doing ongoing work on it, and you can contact Andres and Elena for more information about their projects. Thanks.

Thanks, Nathan. If you want to think about how to make your projects more collaborative, talk to Nathan. He's an expert at that. So next up we have Catherine D'Ignazio, who I've had the pleasure of working with over the last couple of years. Catherine?

Hi, I'm back. So if journalism is based on the who, what, where, when, why, and how, I have been working on the "where" of the news for the past couple of years at the Center for Civic Media. So one really exciting announcement I have to make is that Rahul Bhargava and I have recently released a technology called CLIFF, which is an open-source geoparsing technology, and it's called CLIFF because it's based on a technology called CLAVIN. So you might be like, what's geoparsing? Geoparsing is the process of extracting place data from unstructured text like news articles. What that helps us do is look at news stories at scale and come to conclusions about long-term patterns, trends, and biases in media attention across news sources. So, for example, your research might be inquiring into what kind of coverage towns in Greater Boston are getting, or a geographic media analysis of the Boston Marathon bombings, or an interactive tool that helps media researchers search for stories with geographic dimensions. But
the challenge that I wanted to take up for the culmination of my research here is around an idea that Ethan Zuckerman talks about at the end of his book Rewire: how can we engineer serendipity? So the very quick argument for why this is important is as follows. The internet arrived with a lot of hopes and dreams for civic engagement and cosmopolitanism: increased access to information, increased tolerance for diversity, global citizenship. And today we do have unprecedented access to information. However, researchers like Ethan and others show us how homophily, the tendency for us to group ourselves with those most like us, is alive and well on the internet, as it is in physical space. This is combined with greedy corporate data collection and personalization algorithms that guess what you want and filter accordingly. So the internet technologies that most enable our informational selves, powerful search algorithms like Google, social sharing sites like Facebook, or recommendation systems like Amazon, are the ones that reinforce our homophily through personalization, by giving us what we want, what our friends want, and what people like us want.

So I want to show you my very small experiment in the space, because it's a really big area of possibility. My project is called Terra Incognita: 1,000 Cities of the World, and it's a project I'm working on with Matt Stempeck from the center. It's a serendipitous global news recommendation system and game. It's based around exploring news about what we have deemed the 1,000 most important cities in the world. It's an extension for the Chrome browser.
It's live in the web store now. So when a user opens a new tab in her browser, Terra Incognita presents her with a map of one of these thousand cities, along with recommended news stories about that place. So here's a quick tour of the user interface. The main components are that it shows you a city that you have not yet read about, and then there are a couple of options for interaction: there's the equivalent of the "I'm Feeling Lucky" button, and then several options for scanning news headlines and looking at news recommendations. So it looks sort of like this for the city of Xiamen, China. You can see that there are five things to read, and those five things take you to articles from Foreign Policy, articles from Wikipedia, government websites, and an article from Global Voices about protesters. Sorry. Oh my goodness. Okay.

So in any case, the one interesting thing about this project was that we needed to source many, many news recommendations, and we needed them for the 1,000 cities in the world. We sourced them from multiple locations. We took data from Instapaper and from other users' browsing history in the system, we ran a crowdsourcing campaign, we used human curators, and we used live, real-time APIs like Bitly. We prioritized alternative and local media news sources over mainstream media, and even so, I have to say, there are huge numbers of blind spots. One of the takeaways from this project is that it is actually still really difficult to get information in English about certain places in the world.

So I want to share with you some of the preliminary user study results that have come out. I haven't published these quite yet, but what they show is that 62.5 percent of users show an increase in geographic diversity of news reading after installing Terra Incognita. And a really important secondary finding is that 55 percent of people are actually reading more news in general, just any kind of news; it doesn't necessarily have to have a geographic dimension or
not. So, really encouraging and interesting results. Our next steps: we do want to complete the study, do another round of development, and talk to all of you about fun, serendipitous ways to engage with getting outside of ourselves in terms of finding information. So I'd love to talk further with you all. You can find me in the fall in the Emerson Department of Journalism, where I'm gonna be a professor of civic media and data visualization. There's my contact info. Thanks so much.

If you are trying to make your project more fun, you should talk to Catherine. I consistently rely on her for suggestions when I have something that is too boring and needs some help. So next up is Ed Platt, who is essentially my invaluable right hand at the Center for Civic Media. So, Ed.

Hi, everybody. So every month, billions of people are watching and posting online videos, and they're a great way to communicate across borders. I can post something now and it can be available around the world to billions of people in five minutes. So when YouTube posted lists of the top ten trending videos per nation for over 50 nations, we got really excited at the Center for Civic Media, and what grew out of that is What We Watch, a collaboration I did with Rahul Bhargava and Ethan Zuckerman.

Video is particularly good at looking at culture. You can capture things like music and dance that you can't capture otherwise. So take P-Square's video for "Personally": they're a Nigerian group, but they're doing a tribute to Michael Jackson, an American pop star. Video is also great at crossing boundaries because it can capture spectacles.
You can watch a hundred once-in-a-lifetime events that have been captured, one after the other. Here's an example of an impala running from a cheetah, jumping into and out of a tourist car, running through the windows. And if there's been a trend, so not just if someone has watched a video, but if lots of people have watched a video in a particular geographic region, you know that two things are true: there's some reason that that video got in front of all those people, and after they're done watching it, they have a shared common experience. So it's great for looking at culture. Most Americans might not understand the lyrics to the Korean "Gangnam Style," but you will see it in the dance clubs here.

So one of the things we did was build an online explorer that lets you see not just which videos trend in a particular area, but how videos are connecting different nations. You start with a map of the world, and you can click a particular nation to see the videos that are particularly interesting to them, and also a heat map. The heat map shows you how many trending videos they have in common with other nations around the world. If you click one of those nations, you then see the videos that those two nations have in common, and this can be really fun. If you click two nations that you think have nothing in common, you'll be surprised to see they do in fact share things in common, and it's fun to look at what those are. If you want to look at a particular video, like Ylvis's "The Fox," you can see where around the world that video trended and how many times, and you can also look, over a range of countries, at how a video spreads over time.

So one other thing we looked at was the data. What does this tell us? And
we looked at 57 countries over about seven months, and the main thing we saw was a very decentralized, highly connected mesh. So every country pretty much shares at least one video trend with every other country, and it's decentralized: the videos that two countries share are different from the videos that two other countries share. So this shows that there are no particular trendsetter nations here, and that videos are definitely crossing boundaries. But if you zoom in on the differences, interestingly enough, you actually see structures that reflect pre-existing cultural and geographic boundaries. So you see a large cluster of the Western world, then you see clusters for the Arab world, sub-Saharan Africa, Russia and Ukraine, et cetera. And one thing that we see here is that countries that are central to clusters, like Australia, Canada, and New Zealand, have high migration, as do the bridges, like the United Arab Emirates, which connects the Arab world to the rest of the world.

And here is the most widely shared video in our data set. It turns out not to be a cute cat, but a cute baby that was listening to its mother sing and cycling through a range of emotions: crying, singing, crying again. And that hit 54 countries in our data set. So you can use this now at whatwewatch.mediameter.org, and I'm happy to chat with you either here or on Twitter. Thank you.

So like I said, Ed and I work together to support almost all of these projects, so he's been invaluable in that way, in addition.
He's also one of the best people I know at explaining complicated mathematical concepts in simple terms, I think because he's just experienced, so that's incredibly helpful to me. So next up we have Chelsea Barabas.

Hello. So I'm going to share with you a little bit about my current research for my master's thesis. In 1987, The New York Times took a poll asking Americans if they still believed it was possible to be born poor, to work hard, and then end up rich. At the time, 57 percent of Americans said yes. Two decades later, amidst the worst financial crisis that we've ever seen in our lifetimes, in my lifetime, that number had actually increased to 72 percent. So even though our politics in this country are pretty polarized, it seems like there's one belief that most Americans still hold: the belief in the American meritocracy.

Access to educational opportunity is at the foundation of this idea. However, in recent times we've seen the cost of a college degree skyrocket while the return on that investment has steadily diminished. So today the average college graduate graduates with thirty-three thousand dollars of debt. Meanwhile, unemployment and underemployment are at all-time highs. What's worse, though, is that these hardships aren't evenly distributed across the population. So today African-American college graduates have double the unemployment rate of the rest of their peers who graduate from college. Also, recent research has shown disturbing trends of downward mobility for people who graduate from college and who came from the bottom quintile of household income.

So amidst this dire background, it's interesting that we've seen a surge in optimism over the potential for higher education to be re-democratized with the rise of the internet, and for new opportunities to emerge for us to access riches and stake out a path for ourselves. So I'm talking about the stories of the guys who leave college in pursuit of their
next big idea, and also teach themselves to code and other really great skills using the internet. But I'm also talking about the stories of the underdog: the kids from rural and remote places in the world who are now able to access resources from some of the most elite academic institutions in the world, and then get into them.

At the heart of this story is really the tech industry, and perhaps rightfully so. There have been groups of programmers and hackers who've created a lot of really vibrant resources and online communities for people to share advice and help each other in practical and applied problem-solving. This has led a lot of leaders within tech to describe tech as the new frontier of meritocracy in the world. However, there's been a growing concern about the lack of diversity within the tech sector. As many of you probably know, Google just recently released its workforce statistics, which show that still today the workforce within tech is largely white and largely male. That is showing us that even in today's world of open and free online resources, the protagonist of the meritocracy story is still the same as it's always been.

At the same time, though, there are some really interesting experiments going on right now, specifically within tech, trying to broaden this growing sector to be more accessible for more people.
So there are interesting platforms emerging that are trying to build better partnerships with companies who really want to recruit people with more marketable skill sets, developing curriculum that is more practical and applied, that people can actually do while they're working full-time, as well as organizations that are trying to develop cohorts for groups that are right now underrepresented in the sector, trying to create a more vibrant social support network for people to learn outside of a formalized classroom setting.

So this summer I'm focusing my research on understanding: what are these new and emerging learning pathways that people are trying to take in order to actually become software engineers or web developers within the tech industry? To do this, I'm partnering with a recent recipient of a Knight news grant, Code2040. I'm going to be spending the summer with them and the 27 fellows that they're bringing to San Francisco and Silicon Valley, and I'm going to be learning from them: how are they taking that first leap of applying what they've learned to their very first job within the tech sector? These guys are really amazing.
I've already spent three weeks with them so far, and they come from a really wide range of backgrounds. Some of them are computer science majors at Stanford, and others are working part-time and going to community college and have really taught themselves to code on their own. So I'm going to be working with them to understand some of their experiences as they're learning in their internships. I'm also going to be talking to thought leaders and HR professionals within the field, to understand a little bit more about how they're thinking about diversity in their recruitment processes and how they're thinking about this idea of meritocracy in their work.

So here are some of the key research themes. Running out of time; I've got 12 seconds. So the meat of it is really wanting to understand how this ecosystem of higher education is changing, and what the social and economic factors are that are shaping who has access to these pathways and who can translate those into opportunities in the future. So, I'm out of time. Would love to talk about this more. Thanks.

So it's hard to overstate how much we value the sort of sociological and anthropological focus and lens that folks like Chelsea can bring to not only this project, but a variety of projects. So that said, we're about to get nerdy, so put on your hats, and I'd like to welcome up Sands and Ali to speak about their work.

Ali Hashmi, Center for Civic Media, MIT Media Lab grad student.
I'm Sands Fish. I work at the Center for Civic Media and the Berkman Center for Internet & Society.

So in our world, data is ubiquitous, and it is sometimes hard to make sense of it. It is very much like this exhibit, "Book from the Sky," which remains inaccessible to us without a narrative. So our goal was to provide narrative insight into large amounts of data, in contextual frames comprising keywords. This is essentially a discourse-based approach, because it treats media and data as a form of representation. Our algorithm essentially discovers clusters of topics comprising keywords, using a noun-phrase approach, and the intuition behind this is very simple: that we can tease out topical themes through statistical means.

So we're lucky enough to be working with this amazing platform called Media Cloud, which has a massive amount of online news media. It's been collecting it for five or six years now. It's just a really massive source, and a great system that allows you to basically target a search: use keywords, use a date range, and maybe a particular type of media source. So we've broken things out into mainstream media, left-leaning blogs, right-leaning blogs, and a number of other categories, so you can focus your search. But what you get, even with a keyword search, just like in Google, is just a massive amount of documents. So if you're a citizen journalist, if you're a researcher, any of the people that have access to this system, you still have a really hard time teasing out what the frames are, what the conversations are.

So we have been running our algorithm on this to try and tease some of these out on the NSA and net neutrality controversies, and what you see, even in this really early work, is some very clear themes. So this first one that we have here focuses much more on the government. We have Obama, we have policy, freedom is in there, and technology a bit. So these are not very specific frames.
They're conversations that are going on, which you can use as a starting point for learning what's going on and how to proceed with your research. The second one we have here is much more about technology. So you see internet, let's see, neutrality, access, company, data. This is much more the technical frame. This is much more about what technology is involved, and the vocabulary used by people who are in the know for this particular conversation. The third one stands out a lot more in terms of what the conversation to the public is. So you can see vocabulary here that's being used to describe this to people who are perhaps laypeople. So the big one is Netflix. Everybody's heard this refrain in the net neutrality debate that says your Netflix is going to be slow. And this, along with some of the other companies that are up here, so you see Apple, you see Google in there, subscriber: this is a very distinct conversation that's going on.

So essentially, this tool will be accessible through the Media Cloud dashboard, and Ed and Rahul are going to talk about that in the next section.

All right, we'd love to talk to anybody that's interested in this, so come find us.

All right, so I'm back, with Ed this time. The next presentation is connected enough that we're going to jump right into it. So as you heard about the Media Cloud platform, it's something that actually was developed by the Berkman Center over at Harvard, and of course Ethan has worked with them for a long period of time. So Media Cloud is this giant database of news articles that have been collected, for some of them, over 10 years. So we have huge amounts of online news reporting, and that's enabled a ton of really fundamental and interesting media studies research. You see these projects coming out of the Berkman Center as deep dives along a couple of dimensions. You see things like trying to understand how the Russian blogosphere is working. You see things
like the great paper that Erhardt and Matt and Ethan, of course, worked on about understanding how the Trayvon Martin story moved from a local story to something that Obama was commenting on. You see a lot of hugely deep dives coming out of this, and that's great, and it's made possible by the database that exists in Media Cloud. That said, it's also a high burden, right? That type of research is very hard to do, and the technologies behind it also have a steep learning curve. So after a while we had enough requests that we said, all right, we need to fix this problem. This is a great touch point for us at the Center for Civic Media to collaborate with our friends at Berkman, and the fruit of that collaboration is a platform that we're calling Media Meter. Media Meter is, of course, built on top of Media Cloud, and is a little bit closer to front-facing tools that people can use to actually do some of this research themselves. So Ed is going to describe some of what it does, and then I'll tell you about where we're going with it.

All right, so Media Meter is, like Rahul said, a suite of tools and a framework. The jumping-off point is something we've designed called Media Meter Dashboard, and that does two things. It allows you to construct queries: it allows you to say what keywords you're interested in, what date range, and which media sources, if you want to look at a particular subset of the media sources. And it lets you see results from widgets that represent all of the Media Meter tools, all on the same screen at the same time, so you can look at the same data in many different ways to see in which ways it's interesting. Like I said, you can query a single topic; you can actually query two topics to compare as well.

So I'll jump in and talk about the individual tools. We have three that we've built to start with. The first one is called Mentions, and it very simply looks for mentions of your subject in the media online, and it tells you
where it was mentioned, and it gives you the sentence it was mentioned in. You can then click that and read the entire article if you want more context.

Pulse is built on top of some of the work from Erhardt Graeff on Attention Plotter, and it lets you track the ebb and flow of the number of mentions of the subjects you're interested in over time. These examples are soccer versus football, by the way. This is one of my favorite examples: you can see 52 little bumps, so it's something that everyone talks about every week, except for over the holidays. This is actually searching for "denies" in mainstream media.

The next widget is called Frequency. It was developed by an undergraduate who worked with us at the Center for Civic Media named Deborah Chen, and it lets you see what other words are mentioned when someone is talking about a particular subject. And if you have two queries, it lets you see whether a word is mentioned in both, or whether it's mentioned in one or the other. Like I said, this is a framework, and we're currently extending it, and a lot of the work is not just what we've done but what's coming, and Rahul is going to tell you more about that.

So the dashboard tool is available in limited beta right now, and I think the URL is coming up in a second. But this is just the start of the summer trajectory into the fall. Each of these little widgets that you see here is going to turn into a full-fledged tool to help scaffold the media analysis tasks that we are trying to help people do. So for instance, Pulse is going to turn into a deeper tool, which we've been calling Attention Plotter before, where you can integrate other things like closed-caption mentions and your own CSVs, and actually dig into understanding how the whole media ecosystem is talking about something. Frequency will be a more interactive tool that you can click around on. Mentions will let you download the list of stories so you can do your own research on it. So
each of those is getting fleshed out more, and we're extending it over the summer into the fall as well. So the work that Catherine D'Ignazio mentioned about CLIFF, which is the tool we've built for geoparsing and looking at what place a story is about: that is actually going to turn into a tool where you'll be able to have the same kind of access, where hopefully every story in that database will be able to tell you what place it's about and what places are mentioned in it, so you could do queries based on that as well. Similarly, the great work that you just heard from Ali and Sands is going to be integrated in some way, so that if you find something really interesting, maybe you can do a topic detection on that and then get the results and see if that helps you figure stuff out. And like I said, that's the suite that we're building out. So if you're curious about that and want to try it, please hit the website and sign up to get access. Right now it's limited beta, because we're trying to make sure the infrastructure works before we open it up, but if you get on that waiting list, we'll pull you off of it. And come talk to either Ed or myself about it a bit more. Thank you very much.

So that was a little heady, a little bit research-focused. Next up we're gonna get something that's a little more practical, so I'll invite Tal up to tell us about that.

Thanks, Rahul. So just a quick show of hands: how many of you are interested in what the police are doing? Okay, I thought so. So this all started around the Boston Marathon bombings. I'm a ham radio operator, and I was listening on the scanners, and there was a lot of information out there that was obviously reported before the TV stations reported it, but also a lot of information that nobody reported on. And I was wondering why that happens. I called up a friend of mine at the Boston Globe and I said, how do you deal with this information overload?
How do you handle all of these channels, all that data? And she didn't know how to answer it, but she forwarded me to another person, who didn't know how to answer it, and finally I got to the guy that's sitting next to the scanner. He's just alone there, and if something happens, he alerts someone. But there's no proactive approach to this, at least there and in many other news organizations. When something really big happens, they ask for the information from the police, and we'll touch on a couple of problems with that shortly.

So this is what the newsroom looked like in 1940, and this is what it looks like today: a lot of differences. This is one of the tools that reporters used, and this is the tool used today: also a lot of differences. This is the police scanner that was introduced in 1976, and this is the one in use today. It's cheaper and faster, but otherwise it lets you listen to one channel at a time, live, and not much more than that.

So the question is, how do we allow access to this information? The primary point of
importance is being aware of what's happening around you, and also ownership. It's kind of weird that when you want to investigate the police, you have to go to the police and ask them for these recordings, which were public domain when they were live, just because nobody's keeping records. Some ham operators and some journalists built their own tools, but there's nothing that's really oriented towards journalists. Excuse me. Also, there are commercial tools out there. The police are obviously using something to record the conversations, but they are expensive, and you need technicians to operate them.

Then there is the question of access. So let's say you got the data. The police sent you 17 CDs: channels 1 through 5, hours 1 through 4; channels 5 through 6, hours 1 through 4. Who has the time to go through all that data? The thesis here is that if you had access to this immediately, you would utilize it more, you'd be more effective, and you'd be able to get more news out of it.

And then how do you analyze it once you have it? It needs to be a group activity. There needs to be collaboration on this within the newsroom, across newsrooms in the same network, and sometimes across competitors. For example, in the Boston Marathon bombings, you could hear the radio channels here in Boston but not in New York, and the Boston local outlets could have used help from other reporters to analyze all this data live or semi-live.

So the solution is an open-source hardware and software platform that very cheaply allows you to create a TiVo-like mechanism that lets the newsroom, first of all, record everything. That's the first stage; that's what we're working on now. Second of all, it allows for collaboration on the analysis of the data. And in the future, it will transcribe it and allow you to search not only by channel and time, but also with other types of analytics: how often do the Boston police say "15 Main Street," when is the last time they said "15 Main Street," and why?
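
The kind of search described here can be sketched in a few lines, assuming the recordings have already been transcribed into timestamped text segments. This is only an illustration: the `Segment` type, the channel names, and the sample transcripts are hypothetical, and none of it reflects the project's actual code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Segment:
    """One transcribed stretch of scanner audio (hypothetical structure)."""
    channel: str
    start: datetime
    text: str


def mentions(segments, phrase, channel=None):
    """Return segments whose transcript contains the phrase, oldest first."""
    needle = phrase.lower()
    hits = [
        s for s in segments
        if needle in s.text.lower() and (channel is None or s.channel == channel)
    ]
    return sorted(hits, key=lambda s: s.start)


def count_and_last(segments, phrase, channel=None):
    """How often was the phrase said, and when was it said last?"""
    hits = mentions(segments, phrase, channel)
    last = hits[-1].start if hits else None
    return len(hits), last


# Made-up sample archive, for illustration only.
archive = [
    Segment("dispatch-1", datetime(2014, 6, 1, 9, 15), "Unit 12 respond to 15 Main Street"),
    Segment("dispatch-2", datetime(2014, 6, 1, 9, 40), "Traffic stop on Elm Street"),
    Segment("dispatch-1", datetime(2014, 6, 3, 22, 5), "Back at 15 main street, second call"),
]

how_often, last_time = count_and_last(archive, "15 Main Street")
```

A plain substring match like this is the simplest possible choice; it answers "how often" and "when last," while the "why" still takes a reporter listening to the matching recordings.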
For example, let me show you a quick overview of what we have now. It's an overly engineered display, but it's just a proof of concept to help you understand a little of what's behind the scenes and where this might go. Also, we are looking for people to collaborate on this. If your newsroom is interested in exploring this, talk to us. We want to make sure this tool gets used and actually helps journalists get their work done. If we build this in a way that requires a new employee to sit next to the tool, obviously nobody's going to use it. So we're trying to learn how you access the scanner today and how we can replace it. Really quickly, this is from the lab. This is what a single channel looks like in lab conditions; this is my daughter's baby monitor. The horizontal axis is frequency and the vertical axis is time, and you can see that she coughed five times and then twice later on. This is what it looks like in real life. These are police channels. You can see that there's chatter on four channels and a really quick call on another channel. What happens right now is that you're living at the top bar and hearing it live, and if you're listening to one channel you're definitely missing another. Obviously we need to turn this into a proper display, and that's a question we need you to answer: what should it look like? But if you see this, and imagine you can click on any point in time, 30 minutes ago or an hour ago, and listen to what happened there, just that, I think, would be a great tool. Looking forward to hearing from you.
Thank you very much. So that's a great example of some of the things we can try to do to actually work on the real problems that some of the folks in this room are struggling with. One of the other pleasures I have is working with all these folks who have talents in a lot of different domains. For instance, Ali is well trained as both a computer scientist and a journalist. So he and Julia are going to come up and speak about one of the more journalism-focused projects they've been working on. Hi everyone, I'm Julia Belluz. I'm a health reporter and a Knight Science Journalism fellow here at MIT. Ali Hashmi, Center for Civic Media. So today we're talking about this question: can big data save health journalism? The problem, as you all probably know, is that health reporters like me don't always do a great job of reflecting science or the things that actually impact public health. You've seen the stories on chocolate and wine and coffee: they're your friends one week and your foes the next. But what actually kills Americans? You in the audience, tell me: what do you think are the top causes of mortality in America? Anything else? Heart disease. Okay, we have a smart audience today. According to the latest CDC figures, it's cancer, heart disease, and chronic respiratory diseases like emphysema and bronchitis. But how are these things actually reported in the media?
So Ali and I just did this quick search of the New York Times for the keyword cancer in health stories from the last 12 months: 4,200 mentions. So cancer's mentioned a lot. But emphysema, one of the key chronic respiratory conditions that are among the leading causes of death in America: four results in the last year. So we started to think, as journalists, could we use this data in a way that might actually help reporters see these gaps and blind spots in their coverage, or see places where they might pay more attention, and issues of health that might actually matter to their audiences? So Ali and I created this, the Health Gap. It's a prototype that we're building now. From the drop-down menu at the top you can choose diseases, and then the mortality in the population is visualized and the mentions in the New York Times are visualized as well. You can see very clearly that there's a gap, for example, between deaths from chronic lower respiratory diseases and the tiny number of stories in the New York Times that mention chronic respiratory conditions. We have found similar trends in other countries as well. Here is an example from India, where we're actually looking at disease burden, and Unital diseases, which have a huge disease burden in India, are not adequately covered in the leading newspaper, the Times of India. We're using sources like Media Cloud, LexisNexis, and other big data sources to consolidate data across different media sources. But we want to build up from here.
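The comparison behind a tool like this can be sketched as follows. This is a minimal illustration, not the Health Gap prototype's code: the function name `coverage_gap` and the idea of comparing each cause's share of deaths with its share of mentions are assumptions made for the example, and any numbers passed in would have to come from real mortality tables and archive searches.

```python
def coverage_gap(deaths, mentions):
    """Compare each cause's share of deaths with its share of news mentions.

    deaths:   dict mapping cause -> number of deaths
    mentions: dict mapping cause -> number of news mentions
    Returns a list of (cause, death_share, mention_share, gap) tuples,
    sorted so the most under-covered causes (largest gap) come first.
    """
    total_deaths = sum(deaths.values())
    if total_deaths == 0:
        raise ValueError("deaths must contain at least one non-zero count")
    # Only count mentions for causes that also appear in the mortality data.
    total_mentions = sum(mentions.get(cause, 0) for cause in deaths)

    rows = []
    for cause, count in deaths.items():
        death_share = count / total_deaths
        if total_mentions:
            mention_share = mentions.get(cause, 0) / total_mentions
        else:
            mention_share = 0.0
        rows.append((cause, death_share, mention_share, death_share - mention_share))

    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

A positive gap means a cause accounts for a larger share of deaths than of coverage; swapping mortality for another indicator only changes what goes into the first argument.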
We're ambitious people. We think there's a lot of opportunity to gather different types of data and help inform journalists and editors, and help them see where there might be, again, these blind spots and gaps. So we want to expand the focus beyond the countries we're looking at now. We only had data from the leading newspapers in India, Canada, and the US, but we hope to include other countries and add more media outlets. Most importantly, we want to create a feature that makes this tool interactive, so that reporters and journalists from any news outlet can download their data and then compare it to key indicators like mortality and DALYs, disability-adjusted life years, which measure the disease burden in the population. One other thing that would be really interesting would be getting research investments, so seeing where there are gaps between where a population is investing its resources in research and where journalists might be missing opportunities to report on that research, which their citizens are investing in anyway. So that's where we're going with this. Right now you can see the latest version at healthgap.brownbag.me. Please keep in touch and send us any comments or questions at our contact info there. Thanks so much. Thanks. So that's one of a handful of examples you've seen of taking the data that we've been able to acquire or create in lots of different places and using it to drive action, in this case journalistic action. The next speaker, William Lee, is another example of using data to drive some action. William? Great. Thanks, Rahul. Good morning.
My name is William Lee, and I'm a PhD student in computer science here at MIT. I want to talk to you today about a particular project, Bill Tracer, that came out of Ethan's class on the future of news and civic media, and share some thoughts on analyzing open government data. So this is part of page one of the 848 pages of the Dodd-Frank Wall Street Reform and Consumer Protection Act, and as smart as this audience is, it's probably the case that most of us did not read all 848 pages. So can we use some kind of data science, machine learning, or text analysis to help us understand government systems and processes? That's what I want to walk you through today. This is another bill related to the financial crisis, the Troubled Asset Relief Program, or TARP. It was passed on October 3rd, 2008, after the collapse of AIG, to bail out the banks with 700 billion dollars. What I've done here is represent each of the 39 sections of this bill as a dot, using just the text of those sections. Then I can trace back through all the bills introduced in this particular Congress, and every time there's a sufficiently matching section of text in another bill, I make another dot as well. And so we can start to walk backwards.
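The talk does not say how a "sufficiently matching" section is detected. One common way to do it, shown here purely as an assumed stand-in, is to compare sections by their overlapping word sequences (shingles) and keep pairs whose Jaccard similarity clears a threshold. The function names, the shingle length `k`, and the threshold are all hypothetical choices for this sketch.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Short sections become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def trace_matches(final_sections, earlier_bills, threshold=0.5, k=5):
    """Find sections of earlier bills that match sections of a final bill.

    final_sections: dict mapping section id -> section text
    earlier_bills:  dict mapping bill id -> {section id -> section text}
    Returns (final_section_id, bill_id, earlier_section_id, score) tuples
    for every pair whose similarity is at least the threshold.
    """
    final_sets = {sid: shingles(text, k) for sid, text in final_sections.items()}
    matches = []
    for bill_id, sections in earlier_bills.items():
        for earlier_id, text in sections.items():
            earlier_set = shingles(text, k)
            for final_id, final_set in final_sets.items():
                score = jaccard(final_set, earlier_set)
                if score >= threshold:
                    matches.append((final_id, bill_id, earlier_id, round(score, 3)))
    return matches
```

Each returned tuple corresponds to one extra dot in the visualization: a section of the final bill that also appears, closely enough, in an earlier bill.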
You might recall there was a failed vote in the House of Representatives, and the Dow Jones dropped 780 points. This is what was included in that bill. I can keep tracing back all the way and see what was included in each particular bill, and then we can see what finally got into TARP. What you can see here is that an energy bill got in, a stalled alternative minimum tax bill got into TARP as well, and finally the reauthorization of the Secure Rural Schools and Community Self-Determination Act of 2000 was added to get enough votes for TARP, or the Emergency Economic Stabilization Act, to pass. We can do the same thing for other bills related to the financial crisis. In some cases bills have lots of new content. In the stimulus bill, after a new presidential administration, perhaps not surprisingly most of the content is new, except something about electronic health records. This is the Housing and Economic Recovery Act, where the federal government took over Fannie Mae and Freddie Mac, and you can see that some of these regulations were considered for fairly long periods of time. And returning to Dodd-Frank, this is Dodd-Frank. Here you can see there are certain regulations related to swaps through the antitrust savings clause, and it gives us a sense of what was considered in Congress and for what periods of time. As a small extension of this, it's possible to look forward in time and take a look at, say, a bill that was passed by the House or the Senate but didn't make it to the president's desk. This is one particular bill in 2007, and you can see what parts got into the Housing and Economic Recovery Act. In general I'm interested in extending backward and forward across multiple Congresses, or even into things like model legislation, white papers, or policy papers, to really see where bill ideas or policy ideas
are born. So I wanted to spend the last couple of minutes telling you a little about what I hope to work on, and I'm really hoping to get your feedback during the rest of the day or by getting in touch afterwards. In government data, in the beginning there were all these PDFs. Or maybe before that, governments started to put data online, but it was difficult to parse and difficult to understand. Then, as a result of the efforts of journalists, open government advocates, civic hackers, and open source projects, a lot of the leaders of which are in this room, there's been a lot of success in obtaining, cleaning, parsing, and structuring government data. As a small tangent, I think for me as a PhD student in computer science there's an opportunity to work on what I'm calling automatic structure recovery, possibly to help with this process. But beyond that, now that this data is available and open, there are potentially opportunities for new kinds of data science analyses and visualizations for understanding government, whether it's tracing the trajectory of policies in Congress with this Bill Tracer project, or mapping the cross-citation structure of the United States Code, or another project we worked on last year to try to unmask the authorship of unsigned per curiam Supreme Court opinions. So these are some things that might be possible with the data available today. I'd be very interested in continuing the discussion with you. Thanks very much for your time. This stuff is so cool. I've had like 50 ideas already, and we still have three speakers left. So next up I'd like to invite Yu Wang, from whom I've not only learned a ton about the Chinese context, but I'm also consistently impressed by his ability to get stuff done. Hello everybody. I will spend some time introducing how crowdsourcing works in our project, the NGO 2.0 project.
We work with Chinese NGOs, or non-profit organizations, and we provide solutions to their communication needs, technology needs, and resource needs. Today I will showcase two of the ten projects under this umbrella: our crowdsourced map and the field guide to software for NGOs. This is our map. Every NGO can register on the map, publish their events or projects, and update their information, and they can also find corporations, with corporate social responsibility information and best practices for corporate and NGO partnerships. The interesting thing is that when an NGO uses our map for the very first time, they start by locating themselves on the map and seeing what's happening in their surroundings. One NGO from Hunan province found that there was another NGO right next door working in the same area, but they had never heard of it before. So this map may provide opportunities for participation or collaboration, or at least let NGOs get to know each other. And not only can NGOs find other NGOs; NGOs can also find corporations for potential funding, corporations can find NGOs if they want to do some corporate social responsibility work, and small companies can search for the big players, such as China Mobile, to see what they are doing on social responsibility. In the current version of the map we allow NGOs to mark their collaborative relationships, so they can see a network graph showing their impact or influence in the country. Another thing is our workshop.
This is our 10th workshop on Web 2.0 technologies for Chinese NGOs, held in Guizhou province, and this is our curriculum. We teach them how to build social media strategies, how to use information and communication technologies, and how to use cloud-based tools for their organizational management. In the last part we have our non-profit toolbox. This toolbox, the field guide to software for NGOs, is a crowdsourced platform where anybody can write an article about software that NGOs could use for non-profit purposes. Currently it has more than 130 articles. During the workshop we ask our participants to select a maximum of six articles to build their own tool set, the one most useful for their own needs. Here is the result of what one organization did. They picked these five tools for their use. They include Microsoft OneNote for collaborative note-taking, and YY, which is interesting because it's a voice communication platform initially used by online gamers in games such as Warcraft, but the NGOs started to use it for team training and things like that. MikeCRM is a free online customer relationship management tool to manage their volunteers and funders, WeChat is a mobile app for communicating within a group, and finally there's our NGO map. So this is how crowdsourcing works with us. You can follow our Twitter or ask questions, and I can answer you all if you have Weibo accounts.
You can also follow that. Thanks. So next up I want to invite Ed Bice, who is working on the Meedan project, which I think not only I but a lot of other folks in this room are following. Thanks. Hey everybody, I am Ed Bice. I work at Meedan with a bunch of really remarkable people in San Francisco, Vancouver, Cairo, Tucson, Brazil, and most other corners of the world as well. We're thrilled to be a Knight Prototype award winner for our Checkdesk project. I'm not going to talk to you about Checkdesk today, though. I'm going to talk to you about a really fascinating project that, like many of the fascinating and wonderful things in my life, is downstream of an email from Ethan, so I have to acknowledge that. Paul Salopek is a Pulitzer Prize-winning journalist. In this era of the five-minute half-life of a tweet, we're doing a digital project with Paul that spans seven years and traces a 21,000-mile arc around the world. Paul is literally walking the path of human migration. This will take seven years and cover 21,000 miles, and Meedan is along for the ride. We are following Paul, and every 100 miles along this route Paul files a dispatch. When Paul files these dispatches, he's sometimes stopping in some of the most remote and desolate places in the world. It turns out that much of the world is empty and desolate, but along the way almost everyone he encounters is connected to the internet through satellite phones or mobile devices. Along the way there is a parallel conversation happening on the internet. So we are aggregating, curating, and translating these thin sections through the global social media. Imagine this as modern archaeology: we're taking samples every 100 miles along the path of human migration. So how are we doing this?
First, we're working with Todd Mostak and MapD to geo-bound and temporally bound an initial data set of social media. The workflow is that we get a message from Paul or Patrick, and they say, okay, here's the lat-long of our next coordinate, where we're going to be in two days. We then build this data set, and then, like any good scrappy nonprofit, we figured we should make use of tools we already have, so we hacked the Checkdesk platform. You see here, this is Checkdesk, which is basically an annotation framework that allows journalists to create a change log verifying social media. We're using it to create translations and an embeddable object with those translations, and we also hacked it to create translation notes. You see here the content, the translation, and then translation notes that provide some context. We sort from ten thousand pieces of social media down to fifty or a hundred and translate those. The editorial prerogative is anything that's poignant, mundane, inspiring, or trivial. We just want to provide what we think is a thin section across these sample points. So why are we doing this, big picture? I'm going to provide some backward context and then some forward context. This is Mona Seif's Facebook page. It's really important: if you want to understand Egypt, you should be able to understand this Facebook page. Mona's brother is the blogger and technologist Alaa. You see the profile picture here, three jailed Egyptian activists. Can't read it.
We've been working on this at Meedan since, I think, 2008, when we did our first social media translation project, and here's a... And then in 2009 we did a Farsi translation project, translating tweets. I mean, this is so poetic: we place our feet in our own footsteps, and we set like the sun. This was in Farsi and never would have reached outside that language community. We also did a lot of work around the Arab Spring, obviously, having an office in Cairo. The whole project started with a partnership with IBM Foundation. We created one of the first cross-lingual websites way back in the day. So, where we're headed with this project, the forward context. It's a seven-year project. National Geographic is supporting Paul in his walk, and we're bringing in resources behind the technical side of this. Think about where the web is going to be in seven years. As Paul traces this arc of human migration, we're going to trace this arc of creating the cross-lingual web. I want to close with this, and then I want to read a tweet that's going to be sent about something that I said, so that I'll actually say it and it can be accurately tweeted. This is a beautiful quote, and it goes to the heart of what we've been talking about over these last two days, which is the importance of the connected web. This was translated by my colleague Anas Qtiesh, @Anas. There is a parallel that we haven't talked about at all in these two days, and the parallel is language. Language is just as important as connectivity. Language is an access issue. We're isolated. The World Wide Web is a thousand siloed webs right now, and it won't be open until translation is baked into the structure of the web.
So we need to work with Facebook, we need to work with Twitter, we need to work with Mozilla to bake this in so that communities can collaborate to translate important content. Thanks very much. Great. So last up we have Dan, Harlow, and Allison talking about the amazing hackathon that happened right before the conference started. So yeah, hi, I'm Dan Sinker. I'm the director of the Knight-Mozilla OpenNews project. We're helping to build and strengthen the community that is coding in and around journalism. One of the things we've done for the last three years in advance of this conference, thanks to the support of the Center and the Knight Foundation, is run a hack weekend, two days, Saturday and Sunday. We grab folks who are going to be coming to the conference and give them the opportunity to build and collaborate together. This year we had six really great teams, and I wanted to give Allison Hurt and Harlow Holmes a chance to talk about two of the projects that came out of this weekend's hack weekend. Hi, I'm Allison. When Russia annexed Crimea, there was a bit of discussion and reports in the news about how the Russian version of Google Maps, Google.ru, showed a very solid line separating Ukraine from Crimea, indicating that, you know, Russia owns this now. But other versions of the map, like the US version or the Ukrainian version, showed a dotted line, indicating more of a dispute. So for this hack day we were exploring that and looking at other disputed territories. We started with a data set of disputed territories around the world, narrowed that down to ones that actually showed something interesting on different versions of Google Maps, and built a site. So here are a couple of examples. We see it most often.
I mean, most of the time disputed areas are just represented as dotted lines across all the versions of Google Maps. But we were seeing, particularly in cases like India, China, and Russia where there were disputes, that the Chinese version might be totally different from the Indian version, not necessarily reflecting the dotted line that says this is a dispute. It's a solid line: this is our view, our version of the facts. It's kind of fascinating. Granted, I'm coming from a US perspective, and Google Maps is from a US perspective, but there's the notion that what is fact is different depending on where you are. And in closing, one thing that's fascinating in the Chinese version of Google Maps is that there's the very solid line of China's borders, but also a dotted line going down past the Philippines and around Taiwan, definitely saying, you know, this is ours, and declaring ownership, which is just fascinating. Anyway, the project URL is here, and the code is up on GitHub. Hi, I'm Harlow. I'm going to tell you about a project that we did. It's called Keebler. We built an ad hoc network. Big whoop, right? However, this is actually a story about remediation, so I'm going to talk about something that inspired me to do this. Back in the aughts, there was a technology called Wizzy Digital Courier.
It came out of South Africa and parts of Zimbabwe, and its purpose was to connect schools when the Internet, with a capital I, went down, as it often did. The way it worked was that there was a router in a truck. The truck would drive up to a school, the school knew how to associate with the router, and it would put all of its email and other data onto it. The router would then send the school's network the email and correspondence that was meant for it, drive off to another school, and lather, rinse, repeat. This was actually a really great way of shuttling information around when the grid went down. So the goal was: how can we, creatively using tools at our disposal, remediate such a network? Keebler is a play on words, if you get it. Imagine a router like this that sits up in a tree, with all these elves inside that are actually passing your data around. So, back to remediation. On this box is a piece of really awesome software called OpenWrt, which is pretty much a Unix OS for a router. What we did then was combine that with the idea of a bulletin board. So, you know how there are public messages? All of the messages are encrypted. If you know Adam Langley from Google's project called Pond, this might sound familiar to you. All the messages are encrypted: the ones that are for you, you get; the ones that aren't for you, well, you can't really do anything with them. Then we threw a third thing on top of that, which was Git, well, SVN or whatever, and other packages that allowed for syncing instantaneously, so all of the stuff that was in the repository was shared and synced immediately between mobile clients that were connected to our network. This was really fun and really easy to do. Here's the team. The five of us here are current or former Knight-Mozilla fellows.
However, I want to give a shout-out to Hazen, who is an interloper, an independent journalist from Turkey, who, while she was not a coder, looked at how easily we mobilized and put together this project in such little time and thought, you know, I could bring this home. With just a little knowledge, we can share how easy something like this can be. Thank you. So, awesome. Now that your brains are full, thanks very much, and Ethan's going to bring us to a close. Wait a second, don't go anywhere. First of all, can we get a round of applause for the amazing folks who just presented? Look, this job that I have here is about as much fun as you can have and be employed by an academic institution, and it's because we get to work with people like this all the time. It's absolutely wonderful. And as Rahul has been really good about saying, you've got the opportunity to work with them as well. Everybody who presented here needs help on these projects, needs help bringing them further. Please find people over lunch and after the conference, and push things further. Let's also get a round of applause for the guy who not only put this session together but has a hand in something like 80% of those projects, Rahul Bhargava, who's just done an incredible job of really kicking into an extra gear the research we're doing here. I couldn't do this job without him. It's amazing to have the chance to work with him. Everybody we've got