Thanks, Nate. Hi everyone. It is a pleasure to be here. I have been invited all the way from Accra, Ghana, in West Africa, and this is my first time in Portland. My talk is titled "When Data Collection Meets Non-Technical CSOs in Low-Income Areas." I know it's a lot.
But before I dive into that: I work with Open Knowledge International and School of Data. Open Knowledge International used to be the Open Knowledge Foundation. We are a not-for-profit tech organization that works with civil society organizations to help them see the value of open data in their work, and also to help them build the skills and the tools they need to do that work. We also helped set up the School of Data project, which is an initiative to train journalists and civil society organizations to use data in their work. My official title is Africa Lead, but I like to call myself more of a data plumber, because I zip around the African continent working with civil society organizations to build their open data capacity. There's a lot of learning that comes with that, and this is what has led to this specific talk.
When we talk about open data, there's this glamorous feel to it: data journalism, which has value. But what does that really look like on the ground, particularly when you're talking about low-income areas? And how can we talk about the realities and the lessons, and have a real conversation about how to move this forward? If you want to be nerdy or more mathematical, my talk can be described this way: CSOs plus data collection, minus income, minus technical.
I would like to acknowledge the Open Data for Development network, which funds some of the work that we do and which is leading to some of the lessons that we are learning, and obviously School of Data. You should check them out if you don't know about them. Okay.
Before I go on, it's important to define "civil society organization," because we tend to throw the term around a lot without necessarily having a shared idea of what we're talking about. Some people think it refers to NGOs. But based on a definition from a really good report that the World Economic Forum put out in 2013, civil society is far more than a mere sector dominated by NGOs. These can be organized or unorganized groups, and they span both online and offline spaces. So you can be talking about media organizations, about individuals who decide to advocate for specific needs, about faith-based organizations, about research groups. They exist in a space where they are not necessarily government entities, and sometimes not necessarily citizens either, but they have a crucial role to play. That's the broad definition I'll be using.
When I joined the open data movement, or when I first heard about it, this was the typical value chain that I got to hear about. There's somebody who produces data, typically the government. Then there are entities in between, intermediaries, who work with this data or who get it to user groups, which can be citizens. The intermediaries tended to be civil society organizations, who are not necessarily data producers, by the way, working with this data to turn it into insights for advocacy or some other value for its users.
But this is actually evolving, particularly in low-income areas, where civil society organizations are moving into a data production role. This is for two reasons. First, most of the time government does not have the capacity to produce the data sets that are relevant to civil society organizations for their work, whether for economic reasons or simply a lack of willingness to do so.
The second reason is that citizens, or the public, do not necessarily trust some of the data sets that governments are producing, which requires civil society organizations to serve as data producers to help verify or compare the data sets that exist. That is a great thing, but the challenge it presents is whether civil society organizations have the capacity to assume this role. Over the past year we've worked on projects that have presented this question to us, and we're learning lessons about what it takes. So this talk is going to share some of the things we're seeing, not because we have a clear picture, but to expose the process. I would love to hear from other people about your experiences, if you have any, and what you think is the way forward.
Particular attention should be paid to low-income areas. The way I describe that is: areas that are limited in terms of resources. There are definitions in terms of money or socioeconomic status, but a low-income area does not necessarily have to be a poor region as defined by any World Bank term. It can also be a specific area in a rich or wealthy country that lacks resources, so the lessons can apply across contexts.
The first experience I'm going to talk about is our engagement with an organization called the Women Environmental Program, based in Abuja, Nigeria. They work on women's health, environmental issues and service delivery, and they've been around for 20 years now. The value of this organization is that they engage a lot with the community and tend to understand the core issues. We first engaged with them in March 2016, through a project that they applied for and got a grant for.
On our first call, John, one of the program managers, said: "David, we are going to develop a structured questionnaire, print it in several copies for the data collectors, take it to the field and administer it. When it's completed, we're going to return the questionnaires and they will manually input them into the computer for analysis." And I'm like, "Wow, John, great!" They are well-intentioned. Data collection is something they see as valuable. They buy into the whole open data movement and they want to do this.
But there's a problem with this. It is so oversimplified. To them it's something you can easily do, and in my mind, this is what I'm thinking. Obviously, you have to specify the data collection goals, which they are good at doing because they understand the community. Then you have to design the methodology, design the questionnaire, recruit the data collectors, train the data collectors, collect the data, store the data, verify the data, clean the data, anonymize the data, document the data and publish the data. But you can't tell them this, right? You can expose it to them slowly, but you cannot tell them this.
So the first thing we did was present them with a document that, fortunately, we had created at School of Data. It walks people through mobile data collection and it's easy to read. I'll show you that later in the talk. We gave them a week and a half to read through it. It gives a picture of the process and of what you should be thinking about as an organization if you want to do data collection. Once they were done with that, the good thing was that they were already engaging with the National Statistics Office, which had taken on the role of helping them think about the methodology: which communities do you work with, and how do you make sure that the data you're collecting actually captures what you want?
It wasn't going to be a representative data collection process, but it was supposed to highlight some areas that a bigger entity like the National Statistics Office could later focus on and tap into.
Then we went on to designing the questionnaire. So I said, "John, can you send me the questionnaire that you're working on?" He did; he sent me something via email. And I asked, "John, can you share it with me via Google Docs?" Big mistake, because they knew about it, but they didn't typically use something like Google Docs. They didn't have a collaborative system in place. They were used to sending documents back and forth, and that's one of the things we should pay attention to as we go on with this talk. I kept talking to them, trying to insist that it's important to use something like Google Docs because it helps, but that still wasn't working. We did this for weeks, until I actually went to Nigeria and showed them the value of doing this.
After almost a month of working back and forth, we were able to design a questionnaire that thinks through the flow of how you go about asking questions and how you skip. Technically this is called a schema, but you don't tell them it's called a schema. You try to walk them through it. The reason I keep repeating this, and the reason I say "non-technical," is that we as technical people see the value of using technical words and technical processes. But when you have non-technical individuals and institutions being exposed to a process, throwing these words at them may not be the best approach. Once they get to the end and have started to see the value, then you can throw them in.
But to run quickly through this entire process: there was a tool that really saved them a lot and ended up making the process a little more exciting. This is Kobo Toolbox. Has anyone heard of Kobo Toolbox? Okay, so has anyone heard of OpenDataKit?
Okay, so OpenDataKit is a platform that helps with mobile data collection. It was built by a team, I believe, from Harvard and some humanitarian organizations. The whole idea is to make it easy for people to collect data using mobile devices, and it does a really great job of helping you design that. It uses some really easy, fun Excel-based formats to do this. But OpenDataKit can also be a little bit technical. So Kobo Toolbox has been built on top of OpenDataKit, and it makes it really easy to build your questionnaires, deploy them and collect data. I'll show that real quickly. That saved us a lot in terms of collecting data, storing data, and being able to clean and publish it, and I'll go into that later on.
But long story short, and I can show this real quickly, we managed to go through the process with them. The value was in working with them to understand what they wanted to achieve, and in doing a lot of the technical work, which we've documented over here. This is on GitHub, although we never told them about GitHub. We worked with them to document the entire process: what are the project objectives? And a very important step was verification. The idea of verification does not necessarily exist for them, so it was about letting them understand why it's important to verify your data. As you can see, we've documented the process: how many responses came from each community, how many came from each ward (a ward is, to some extent, a district), and how many came from each data collector. Thinking through validation in these terms is easier for them. Then we documented the things that we changed, such as words that were spelled wrongly. We documented everything because, as civil society organizations move into the role of data collection, the data should be able to be verified.
If you present this to the statistics office, seeing that this is something that should be published, having that documentation is important, and teaching them how to do this in a non-technical way is very important. So we have all of this documented, so that when they present it, people can see what they collected initially and what has changed. All of that is here. The long-term goal is that they will learn how to use GitHub, but that's not something we're putting to them yet.
This is the clean data that we eventually ended up with. And again, we used OpenRefine. That was mostly done by us and not by them, because they feel it's also a little too technical. So we clean the data a little and show them what has been done, again through the verification and cleaning process. Once everything is fine, we still keep a record: in OpenRefine you have a history, in a JSON format, that still exists for anyone technical to work with. Now that the clean data is available, they were able to go ahead and publish reports and a brief detailing what they've learned, which is very valuable to them. And whenever anyone asks them, "Can you show me what you've seen from the report?", because the data is already available, they can quickly generate something that is of value.
I'm talking about all these snippets, and I'll talk about some of the lessons we are learning later on, but it's important to pay attention. One of the key things is that data collection is complicated. When you're doing it in low-income areas, it becomes extra complicated. When you're doing it with non-technical CSOs, it becomes extra, extra complicated. But it's not impossible. We need to think about the best way to approach this, because civil society organizations are increasingly going to play a role, and we need to embrace them rather than push them away.
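The verification and change-documentation steps described above (responses per community, per ward and per data collector, plus a record of every spelling fix) can be sketched in a few lines of Python. This is only an illustration: the field names, the sample rows and the 60% threshold are assumptions, not details from the project, and the actual cleaning was done in OpenRefine rather than in code like this.

```python
from collections import Counter

# Hypothetical responses; field names and values are illustrative only.
responses = [
    {"community": "Community A", "ward": "Ward 1", "collector": "collector_01"},
    {"community": "Comunity A", "ward": "Ward 1", "collector": "collector_02"},
    {"community": "Community B", "ward": "Ward 2", "collector": "collector_01"},
    {"community": "Community B", "ward": "Ward 2", "collector": "collector_01"},
]


def apply_corrections(rows, field, corrections):
    """Return cleaned copies of the rows plus a log of every change made."""
    cleaned, log = [], []
    for index, row in enumerate(rows):
        new_row = dict(row)  # never modify the raw data in place
        old = row[field]
        if old in corrections:
            new_row[field] = corrections[old]
            log.append({"row": index, "field": field,
                        "old": old, "new": corrections[old]})
        cleaned.append(new_row)
    return cleaned, log


def tally(rows, field):
    """Count how many responses share each value of `field`."""
    return Counter(row[field] for row in rows)


def flag_large_shares(counts, max_share=0.6):
    """Return keys that account for more than `max_share` of all responses."""
    total = sum(counts.values())
    return sorted(key for key, n in counts.items() if n / total > max_share)


cleaned, change_log = apply_corrections(
    responses, "community", {"Comunity A": "Community A"}
)
by_community = tally(cleaned, "community")
by_ward = tally(cleaned, "ward")
by_collector = tally(cleaned, "collector")
needs_review = flag_large_shares(by_collector)

print(by_community)
print(by_collector)
print(change_log)
print(needs_review)
```

Keeping the raw rows untouched and writing every correction to a log is what lets someone else, such as a statistics office, see what was collected initially and what changed.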
The next quick example I want to talk about has to do with Advocates for Community Alternatives. As opposed to the Women Environmental Program, they're based in Accra, Ghana, they're less than a year old, they have a core team of four people, and they are more tech savvy. The reason I say that is that they understand what Google Docs is, they use technology, and they had already been working on their projects. What they do is work with communities that have access to natural resources like gold and oil. When that happens for a community, you have mining companies coming in and saying, "We want to buy your land." Advocates for Community Alternatives, ACA, works with the communities to understand the value of their land and to provide them with alternatives that have better development value. So they are saying: don't sell your land, because maybe you can plant better crops that will bring you money. And they talk about the potential long-term detriment to the environment.
But in order to do this, they have been collecting baseline data on community members: what land do you have, how much money are you making from the crops you're planting? And they do this on a six-month basis. One of the things they mentioned was, "If we're going to do data collection, we need a way of tracking individual household members and how their livelihood is improving." And then my technical brain comes on and says, "Oh, you can use an identifier for each person." That is great when you are in typically high-income areas, where people may have access to national identifiers like passports or IDs, or even email. Obviously you're not sharing that email, but in the back of your system you have a way of linking data sets that have been captured at different times.
When you have somebody who lives in an area that doesn't have addresses, who doesn't have access to email, or who is regularly moving, that becomes a problem. And we didn't have an answer to that. We were saying, okay, maybe we can link their name and date of birth to the household where they are, but that wasn't really working. That raised the question: are we really thinking about ways to deal with scenarios like this? When we talk about data collection we make some assumptions, and we had just realized that we didn't have an answer to this problem: what is a unique identifier for an individual who needs to be tracked in your system over a period of time, but doesn't necessarily reside in a permanent place? We're still looking for solutions, if anybody has one.
But for them, by the time we got in there, we had learned more about data collection in low-income areas. They used Kobo Toolbox as well, and we've been able to build a process that allows them to collect data in an easy way, and they really love it.
So these are the two examples I wanted to share. There are more. One example that I'm not going to spend much time on: I was doing some training in Sudan, and a journalist said, "Hi, I know we're working with government data, but I don't trust the government. Can I go collect my own data?" And we said, "No, not yet." But increasingly you're going to have individuals who want to be able to do this, and how can we ensure that they do it in a way that makes sense for the context they are in?
So one of the lessons we are learning is that we need to build simpler tools for low-income contexts.
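To make the identifier problem from the ACA example concrete, here is a minimal Python sketch of the idea mentioned above: linking a person across survey rounds by name, date of birth and household. Everything in it is an assumption for illustration (the field names, the sample people, the household labels and the choice of a truncated SHA-256 hash); it is not the process that was built for ACA. It also shows why the approach "wasn't really working": as soon as someone moves to a different household, the key changes and the link is lost.

```python
import hashlib
import unicodedata


def normalize(text):
    """Lowercase, strip accents and collapse spaces so spelling variants match."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def person_key(name, birth_date, household):
    """Build a pseudonymous key from name, date of birth and household label."""
    raw = "|".join([normalize(name), birth_date, normalize(household)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def link_rounds(earlier, later):
    """Return the income change for each person found in both survey rounds."""
    baseline = {
        person_key(r["name"], r["birth_date"], r["household"]): r["income"]
        for r in earlier
    }
    changes = {}
    for r in later:
        key = person_key(r["name"], r["birth_date"], r["household"])
        if key in baseline:
            changes[key] = r["income"] - baseline[key]
    return changes


# Hypothetical records from two rounds collected six months apart.
round_1 = [{"name": "Ama  Mensah", "birth_date": "1985-03-02",
            "household": "HH-07", "income": 120}]
round_2 = [{"name": "ama mensah", "birth_date": "1985-03-02",
            "household": "hh-07", "income": 150}]

changes = link_rounds(round_1, round_2)

# The same person after moving to another household gets a different key,
# so this record would no longer link to the baseline.
moved_key = person_key("Ama Mensah", "1985-03-02", "HH-12")

print(changes)
print(moved_key in changes)
```

Hashing keeps names out of the linked data set, but it does nothing for the harder problems raised here: people who move, people with no recorded date of birth, and two people who share a name and birth date.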
An example of a simpler tool that I'm seeing is Kobo Toolbox, which has helped reduce the headache of going through that really complicated process. It makes it easy for individuals to create a questionnaire on a platform, use a simple mobile phone that doesn't need a lot of storage, collect data offline, and then, once they get access to an internet connection, send it to a dashboard.
Most CSOs cannot go it alone when it comes to data collection. It is so complicated that we need to think about potential partnerships: working with telcos, media groups, national statistics offices and civic tech hubs. There are a lot of these hubs popping up all across the African region, and they have the capabilities to either train or provide access to tools and resources that make it easier. And academic institutions, to think about questions like the unique identifier that I just brought up. So that's one way we can approach it.
Internet. Access to affordable, reliable internet is a big deal. This is a report that recently came out from the Alliance for Affordable Internet, and as you can see, in 2015, getting access to one gigabyte of mobile data cost about 15% of an individual's income. That's ridiculous. That's not affordable. The reason this is important is that for CSOs to tap into a lot of the tools they can use for data collection, they need to be able to afford internet. If they can't, it becomes extra complicated, and the collaborative process, GitHub, all these things don't really make sense for them.
So, the way forward. For us at School of Data and Open Knowledge, we're trying to build more tools that are contextualized to data collection for these civil society organizations. We have one tool on mobile data collection, which I shared with the Women Environmental Program in the beginning. We're building more modules, and we're interested in working with more individuals and organizations to build them, so that it's easy for people to learn about the process and jump into it.
We're embedding experts in organizations for three to four months to work with them. The topics tend to differ, but data collection can be one of them: teaching them the process at a slow pace, doing the work for them, but also showing them the value and how it's done. And then obviously, as I mentioned, making internet affordable and accessible. The Alliance for Affordable Internet is doing this, and there is also a cool product called BRCK, a device that makes it easy to connect several devices to a hub that gives you access to the internet.
And I think the last thing I want to share is: document and share the experiences on the ground. Most of the time we ignore the low-income context and, unintentionally, make it seem that what we build applies everywhere. We need to really think about this context and build tools and share skills that will apply to it.
So I'll end with this. This is a quote from the World Economic Forum report that I was talking about: civil society has a unique role in fostering innovations. It has the ability to experiment, move faster than governments, and act as an agent of change. Let's empower them to do this, thinking about the context they are in, because in this era they are also going to play an increasing role in producing data. Thank you.