Hi, hello everyone. Good afternoon and welcome to the Berkman Center's luncheon series. My name's Felipe Heusser. I'm a fellow here at the Berkman Center and it's my great pleasure to introduce you to Tim Davies. Tim is a good friend. He's also an affiliate of the Berkman Center. He's a former fellow, holds a Master of Science from the Oxford Internet Institute, and is a PhD candidate at the University of Southampton. But more than all this, Tim is also a great guy and a very enthusiastic advocate for open government, particularly when it comes to civic participation and access to public data.
Tim's talk today is going to be about open government and open data more generally. It's a topic that has captured the attention not only of academic audiences such as this one, but also of civil society organizations, politicians, international organizations, and even private companies, all of which claim to benefit more or less from more open and accessible public data. So probably one of the reasons why this sort of weird coalition of transparency and open data advocates is so diverse, and at the same time so big, is the announced impacts or potential outcomes we are supposed to expect from open data implementation: linking open data to transparency, to civic participation, to accountability, to improving public services, and even to some kind of economic growth. Among all these apparently virtuous correlations, Tim will be addressing today perhaps one of the most interesting ones, and that is to ask whether open data efforts have reconfigured, or can reconfigure, power relationships in the political space. As Tim has mentioned in the past, the gap between the promise and the reality of open data remains wide. That's why it's great, Tim, to have you here and to have the chance to address some of these promises within your family of the Berkman community.
So before giving the floor to Tim, let me also remind you that this lunch series is being streamed and will also be recorded. There's also a conversation taking place on Twitter, so you're encouraged to use our hashtag at Berkman. You also have Tim's Twitter account, so you can direct comments to him as well. Questions will be addressed after Tim's talk, which will last about 20 to 25 minutes. But you can also send your questions via Twitter, and Sarah will be picking up those questions and comments during this session. So welcome everybody. Tim, all yours. Thank you very much.
Thank you so much, Felipe. In this talk I want to try and synthesise a number of different ideas I've been working on, but really use this as an opportunity for discussion with you. So I will, as Felipe says, spend about 20 to 25 minutes going through some thoughts and recent work on open data, politics, power and infrastructures, but I'm really keen to have your questions and discussion around this.
I want to start off with a few thanks: firstly to the Berkman Center for hosting me as a fellow over the last year, when I was working on many of the ideas in this presentation, and to the New Economic Models in the Digital Economy project, who gave a grant to support my time here. Also, very importantly, I've had the pleasure over the last couple of years of working with the World Wide Web Foundation on three projects which feature in this presentation, work that was funded by Canada's International Development Research Centre. I'm building really on the insights of many in this research community, for whom I've been the research coordinator, and lots of this learning is as much down to them as to my own research. But the ideas in here are mainly those that I'm working on as part of my PhD work at the University of Southampton's Web Science Doctoral Training Centre, which has also provided a really good environment over the last few years for exploring this work.
So to give you a brief overview of what I'm going to try and cover: I'm going to tell you a bit about the framing of this discussion, look at a brief history of open data and its evolution, and then look at a standard model of how discourses of open data tend to operate, before trying to question and unpack that and ending on an idea of information infrastructures, civic information infrastructures, and the potential that gives us to reimagine what an open data agenda might be.
Over the last five years I've been following the spread of open data policy and practice, initially in the United Kingdom and increasingly across country contexts around the world. And I've been approaching that not as an advocate of open data per se, but as a practitioner and researcher interested in the quality and the inclusiveness of civic engagement and governance. So I'm taking as a starting point that it's both normatively and consequentially desirable that those people who are affected by decisions have a role in shaping and taking those decisions. The empirical question then is: are there configurations of practice around open data that either are equally normatively desirable, that are just good in and of themselves, or that provide consequences that lead to the kinds of civic engagement we need to see, that empower people and so on?
As a result, in light of the grand claims made for open data, I'm increasingly engaging with a series of what I put as null hypotheses: statements that we might want to seek to disprove, either empirically or practically. Posing the challenges of open data research in terms of these null hypotheses moves us on from the way much open data research happens right now, which is anecdotal evidence, stories of occasional impacts that data has had, to ask: is this scaling up?
Is this delivering on the promise that justifies the kinds of investments? And crucially, not only can we then look empirically at this, but we can take an action-oriented approach and ask whether we can see interventions that are more likely to falsify these statements, to secure the kinds of change that can deliver on them.
So to just give a brief overview of methods, I'm going to draw on four projects in what follows. The first is the Open Data Barometer, which is an expert survey of over 75 countries looking at the policy, practice, implementation and impact of open data, drawing on secondary data and survey data. Secondly, and it doesn't have a good logo because this is my own PhD work, a policy analysis of six countries based on documentary evidence from those different contexts, trying to represent a range of social, economic and political settings where open data is playing out. Thirdly, all those people you saw on the slide before: the Open Data in Developing Countries research network project, which has supported case studies in 12 different developing countries to look at how open data is being used in particular domains, ranging from budget data to higher education governance and so on. And fourthly, I'm drawing on my own participant observations as an action researcher working on open data standardization processes, so things like the International Aid Transparency Initiative and, most recently, the development of a data standard for public contracting, where we were working on the technical development, and that gives rise to much of the thinking on infrastructure.
So let's go to a history of what's happened in open data. There are very many different roots to the open data movement, different groups who've had an interest in open data, and that's one of the things that makes it so fascinating to look at. One of the key roots of open data can be found in the civic technology movement, and you may recognize some of the people on this slide here.
This is a photo from one of the key meetings in the evolution of open government data, in Sebastopol in late 2007, that brought together over 30 open government advocates to develop, as they put it, a more robust understanding of why open government data is essential to democracy, and to set out a set of fundamental principles for open government data. This meeting combined traditional ideas of open government, access to information, to records and so on, with ideas from civic technology, the idea that we can build tools, platforms and services to improve government. Many of the people at this gathering were building civic tools but were finding they were frustrated by the lack of access to government data. The data they needed was either not being published at all, or was only available for a charge, or under terms that restricted your ability to reuse it. Things like the postcode database in the UK, which turns an address into a local authority area and lets you build stuff, weren't easily available. So you've got people here with an interest both in opening up government and in very specific data sets being open: not data in general, but specific data.
Yet, as people like Tom Steinberg have noted, advocating for very specific data sets doesn't have much policy pull, much policy currency. You need a grander picture, a grander piece of advocacy to take to officials, to parliamentarians and to government. And so this movement becomes broader and becomes an advocacy for all government data to be made open. It's a broader claim, but it's more likely to succeed than picking off data sets one by one.
That strand of civic technology thinking gets joined by another strand, particularly in Europe, around public sector information: large information industries who rely on government data to build products and services, ranging from transport services to weather services, mapping services and so on.
Now, from the 2000s onwards, and even earlier, you get a comparison between the US regime, where federal data is freely available, and European regimes, where data tends to be traded and government treats it as a revenue source. This leads to an argument that the restrictions on the sharing of state data amount to a double charge on citizens for this data: you're paying through taxes to collect it, and then you're paying again in the services. It also leads to an argument that not providing data openly creates a deadweight loss to the economy. People aren't able to innovate on top of it, you get an underuse of data, and as these products are valuable further down the value chain, you're harming economies as a whole. This argument finds some support in civil society, but particularly among both those large public sector information industries and the emerging small and medium enterprises, startups and others who are looking for low-cost access to data to innovate on top of, and who want that data to be available. It's interesting to note, though, that the interests of those large industries are not necessarily in data being free. They're in the data being uniformly licensed under consistent terms, so that it's predictable how you're going to access it. So these groups differ in their exact interests. The civic technologists maybe need free access to build what they're building: different interests here.
And lastly, of the strands I'll touch on in this abridged history, you've got a political crisis happening and a search for political legitimacy. Now there are many more strands I could draw upon here. But the trigger, in many ways, for the open data policies as we see them today might be found in the late 2000s, when in the US President Obama is coming to power following concerns about state secrecy in the last era of Bush's government, and in the United Kingdom there's an expenses scandal over parliamentarians' expenses, including spending money on duck houses, as the photo there shows. This creates an environment in which governments are grappling for policy narratives that can stem democratic disaffection. They draw upon these ideas of open data portals that have emerged in a few cities already, and that gets launched in 2009 in the US and in 2010 in the UK, with many other countries following with data.gov-style websites.
In this proliferation of activities, these different strands with their different interests converge around what I would call the standard model of open data, and it's that that I now want to turn to explore a little more. So how is it that those groups come to work together? In many ways it happens as a result of a big tent idea of open data, defined around three simple properties. This slide is just to show briefly that in the Open Data Barometer we look at how rapidly open data spread around the world: in 2013, 50 percent of the countries in our sample have some sort of open data initiative or policy. This isn't just restricted to the US, UK and Europe. It really spreads globally, again enabled by this standard model framing.
And that model really says that open data is about three things. It's about data being proactively published, being machine-readable and being legally reusable. And that gets applied in three core ways. Firstly, it gets applied by saying proactively published means put it on a data portal: set up a website where data is published. Secondly, it's interpreted by saying machine-readable means care about the format the data is in, the container of the data. It doesn't necessarily look inside the data to ask how it is structured and shaped internally. It says we prefer CSV files to PDF files, and those are the sorts of metrics that are being used to measure the implementation of open data.
And then it says we want legal reusability, and that comes from the explicit application of a license. So these three things become what's written into policies around the world as part of what open data is, or at least they are in the policy advocacy around the world. I'll come to whether it's actually in the policies in a moment.
It's worth noting that this creates a fairly binary definition of what is open data and what is not. Because the licenses say this must be free for anyone to reuse without restrictions, any data that's personally identifying, or even quite closely derived from personally identifying data, cannot be placed in the open. Now if you think about many of the systems on which government policy is made inside the state, those systems rely on personally identified data or the analysis of it. So this by definition means the data that makes it outside of government as open data cannot be the same data on which policy is being made. And in setting up that binary, we perhaps restrict our space to have a conversation about the data that policy is made on and what access should be allowed to it.
The last element in this standard model is the theory of change it works on. It's a use-centric theory of change. It assumes that data comes outside of government, there are some sort of intermediaries or technical applications, and then some action will be taken that results in an impact. It's kind of a domino effect: we have those things, we set it up and it falls down. Now in the Open Data in Developing Countries project we've elaborated that model and found there are many, many layers in that domino chain, and in many countries and contexts a few of those elements are missing or absent, and that can really frustrate attempts to achieve impact under this sort of theory. It's also notable that the kind of behavior change, the way that impact will happen, is underspecified in that standard model.
Is it going to happen through competitive politics, with media being the users who take data and get people to change their voting behavior and exercise influence politically? Or is it going to happen through co-production, or markets for public services, making a more efficient way of people selecting schools, hospitals and so on? By focusing on this narrow modeling of what open data is, these deeper questions about what kind of change we are creating go unanswered.
However, before exploring that point more, I want to jump back and just look empirically. Is that standard model in the advocacy translating into practice around the world? At this point there are fewer pictures and more graphs, but the slides will be available, I think, on the site, so if they're not terribly visible they will be there later.
The Open Data Barometer project I mentioned lets us look globally at what kinds of data are being made available and whether they're being published as open data. In general we see that open data policies around the world are implemented through administrative action rather than through law. So the data that is published is at the grant of government. The power to choose what's shared remains with governments in that common framing, and there are only a few exceptions where there is a right to data enshrined in law. The UK has made an amendment, and this morning I was reading a draft of a Philippines FOI bill, not yet law, that tries to go in this direction, but in general this has been administrative action. If you look at the blue bars here, this is based on a sample of over a thousand data sets, each assessed on a 10-point scale that we weight together into an openness score, as it were.
You see that the data sets that have been opened are much more likely to be those from departments with established data handling practices: census, education, environment. But the kinds of data that our civic activists were looking for, things like mapping data and spending data, are further down that list of actually having been provided as open data. And some of the things you need for the hardest edge of accountability, like company registers, land ownership data and so on, are amongst the least likely to be available and open around the world.
The red bars then indicate how many of the data sets in each category meet the standard model idea of open data: being machine-readable, proactively published and free. Given that about 50 percent of the countries in this sample have some sort of open data policy, it's quite surprising that only a very limited number of those data sets actually meet that definition, and that got me curious to explore what's going on there. Is it that it's just early days for open data and people are struggling to publish, or is there something else in how the advocacy is getting translated into policy?
You can read into this kind of graph many of the daily power struggles that take place around opening up data in government. So statistical agencies are up there with good availability of data but very low open data publication, because they're broadly independent from government. Those administrative policies don't bite upon them, and their culture is perhaps to interpret, analyze and give people analyzed information, not raw data. The mapping agencies, where they do provide open data, share it quite actively under these open terms, but are often very concerned about losing their ability to charge for data, because then they become a line item in a budget rather than revenue generating and able to control their own futures. So in this shift to open data there are many different power struggles playing out.
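As a rough illustration of the kind of test those red bars apply, the standard model properties (proactively published, machine-readable, legally reusable, and free of charge) can be written as a simple check on a data set's metadata. This is a minimal sketch and not the Open Data Barometer's actual methodology: the `DatasetRecord` fields and the lists of formats and licences are assumptions invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lists, chosen only for illustration; real assessments
# would need much more careful definitions.
MACHINE_READABLE_FORMATS = {"csv", "json", "xml"}
OPEN_LICENCES = {"cc0", "cc-by", "ogl"}


@dataclass
class DatasetRecord:
    """Assumed metadata about one published data set."""
    title: str
    published_online: bool      # proactively published, e.g. on a portal
    file_format: str            # the "container" of the data
    licence: Optional[str]      # None when no licence is stated
    price: float = 0.0          # charge for access, 0 means free


def meets_standard_model(dataset: DatasetRecord) -> bool:
    """Return True only if every standard-model property holds."""
    machine_readable = dataset.file_format.lower() in MACHINE_READABLE_FORMATS
    legally_reusable = (
        dataset.licence is not None
        and dataset.licence.lower() in OPEN_LICENCES
    )
    free_of_charge = dataset.price == 0
    return (
        dataset.published_online
        and machine_readable
        and legally_reusable
        and free_of_charge
    )


budget = DatasetRecord("Budget", True, "CSV", "CC-BY")
land_register = DatasetRecord("Land register", True, "pdf", None, 25.0)
```

Note that a check like this only looks at the container and the terms of publication. It says nothing about what is inside the data set, which is the gap the talk turns to next.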
And importantly, as I mentioned earlier, we need to look not only at the format of the data but inside the data sets. This is just one of the many reports in the Open Data in Developing Countries project, from two partners in Brazil who look at whether the Brazilian transparency law on spending and budget transparency is actually being applied in practice at the content level. So there are data sets for most state capitals, but when you look inside them you find there are many things missing around the nature of spending, the sub-functions, who the beneficiaries are, that in law are supposed to be provided. By just looking at the container rather than the content, we're missing whether this is really opening up access to the kinds of things citizens need for scrutiny and accountability.
This unequal application of elements of the standard model comes out when I look into the policies in each country. What I did was to look at policy timelines for each country, looking at how their agendas have emerged over this period (obviously these are initiatives starting at different points in time), and then I looked into the different components that governments said they were going to implement and the different goals they set out for their policies.
I'm not going to go into the goals work too much here, but one key thing that's evident in it is that the initial launch of policies, particularly in the US, UK and India, is framed in democratic terms, while the later development of those documents is framed much more in economic terms, of what business can do with data. That shift is quite notable, and something I'm keen to monitor in the more recent entrants to the open data landscape.
But when I look across the policies and what components they create, that one standard model piece, the portal, is up there: every country's got a portal. But actually these other terms, like specifying a license explicitly or not charging for data, are only in just over half of the cases that I look at. Now obviously this needs to scale up to see more countries, but it seems that the advocacy, that standard model, isn't actually what governments are doing in their open data policies. They're blending open data policies into existing agendas.
The area that caught my interest most was these points at the bottom here, around an emphasis on high-value data sets, picking which data sets we want to make sure are provided as open, and using open data agendas as a chance to rethink data infrastructures, to rethink the way government data operates. So we've had maybe many, many years of government programs building up the data systems of the state, and in this open data era we're seeing a move to rethinking those government data infrastructures and trying to reshape them. That's particularly articulated in Denmark, in their Good Basic Data program, and in the UK's agenda for a National Information Infrastructure. What I want to argue is that if there are important power struggles to be had over which data sets get released, and that has significant consequences, then there are even more important power struggles, debates and discussions to have around how the new infrastructures of state data handling get created. How those are designed will
lock in practices for the future around what kinds of policy options are open, who can engage in that policy making and so on, and what spaces there are to shape these infrastructures.
So infrastructures themselves are things that are generally invisible in our day-to-day life. We don't think about them in the everyday. In their classic text, Bowker and Star put this as: they only become visible during breakdown, or perhaps, as in the pictures, during their construction. When infrastructures are working, they're embedded and sunk inside the other social structures of our world. They set the frameworks within which actions take place, and their malleability tends to be pretty limited because they're linked to many other things that they interact with. But at the times when they're being built and created, there are chances to rethink them, not from a blank slate, but potentially within some new paradigms.
Open data itself involves many different infrastructures, from the open data portals that are making it easy to publish data outside of departmental websites, but that fail to facilitate the linking of policies and practices to data sets, through to the data standards that are rendering different data sets compatible. I'm going to focus in particular on those data standards in the last few bits of this talk, and there is the obligatory xkcd cartoon at this point. Open data brings with it a wide range of standards. You've got the Google Transit Feed Specification, or General Transit Feed Specification, that lets your mobile phone pick up when the next bus is due, through to data standards for budgets. The International Aid Transparency Initiative is one of the most successful global open data standards. It's taken five years to develop. And recently there are projects around open data standards for contracting. Each of those standards defines what can and can't be rendered interoperable, what can and can't be rendered known about the aspects of the state
that they're presenting. So when we look inside the data set, we see it's not just which data sets are released, it's what's inside them, how they're shaped, who finds it easy to use that data given the structure it is in, and so forth.
The xkcd insight here is key: when you talk about standardization, standards tend to proliferate. But in the open government space we're seeing a very interesting role of governments in setting standards. You have things like the Open Government Partnership, which is encouraging common adoption of standards, and things like the G8's Open Data Charter, which again edges towards encouraging governments to be more interoperable. So that standards proliferation might also be met by a political process that encourages a rationalization of standards, again linked to this flowering open data agenda.
So often these open data standards are treated purely as a technical issue, not something that critical scholars or policy makers or others should engage with, just there to make some data sets talk to each other. But as Urs Gasser and John Palfrey have noted in their work on interoperability, the questions over what we make interoperable and what we make non-interoperable, what diversity we allow and what uniformity is created, can have really quite considerable consequences.
I'll use one example to explore that, from the open contracting work I've been doing recently. In this project we were faced with the challenge of creating a data standard for contract disclosure, and we started by looking at what governments already published. Right now many governments do have disclosure of tender information, in order to get people to bid for government contracts, and they have logs of contracts that have been issued, but the tying together of those is often not there. And when you engage with the users, the people who want to access this data, particularly from a civil society anti-corruption point of view, the tying
together of those two is really important. You want to know how the tender relates to the contract. So we could have had a standard emerge that says, well, here's a way of standardizing your tender data, here's a way of standardizing your contract data, without making that connection. But the technical intervention of standard setting, of infrastructure building, provides the opportunity to try and link those two things together, to standardize not the contract but the contracting process.
The decisions that are being made there are interesting to explore for the tensions within them, and some of those tensions get down to the technical level, down to the level of: do we standardize in a flat format, a spreadsheet format like CSV, or a structured format like JSON, which allows extensibility and allows people to describe the world of data with their own concepts and ideas as well as sharing in those concepts provided by government? There's a real balancing act in the development of these standards to do that.
What we found from the International Aid Transparency Initiative experience is that these standards don't just become things that shape the data government puts out to the world. They shape the way government potentially handles its data internally over time. They feed back into government. They're a shaping of systems, not only of an outside space. So that standard model theory of change, which says the impacts are kind of linear, about the release of data and what people will do with it, misses much of what's going on, which we term in the Open Data in Developing Countries work the ripple effect, where these changes shift back inside government in how it is working with and handling data. So in the open contracting model we try to provide a way of releasing data at each stage of a process, narrowing the gap between the thing that government needs to tell the world in order to do its business, to say there is a tender here, there is a contract being issued, and the data that people want for transparency
and accountability purposes. This contrasts with the classic data dumping approach that says what we're doing is getting data outside government and putting it online. Instead it says we're rethinking the way data is shared, with open by default meaning not just an imagined data set being open by default, but the systems being opened up by default.
Now of course we've only just launched this particular project, so it's a question of how far this works in practice and how it gets adopted. But it's at least one attempt to engage with those null hypotheses, saying that open data impacts have not necessarily scaled, by thinking about how we design the kind of impacts that can scale. Because when you play out that standard model and try to think how it would scale, we don't see that the potential is necessarily strongly there.
So coming back to these: I'm not sure we can necessarily disprove these statements yet. I don't think we have the empirical evidence to say that we're seeing widespread civic engagement from open data, and I think if we conceive of open data in that narrow way we probably won't see those impacts. But there is an opportunity to see the opening of data as part of a broader process that involves a rethinking of government's information infrastructures in the direction of openness, and that creates greater opportunities for civic change.
Equally, however, I think there are many threats to that civic potential. The shift from framing open data in government policies as a civic thing to being something purely around economic growth and commercial reuse means the partners, the people who are working with government on this standardization, are more likely to be private sector actors. And if the data sets are shaped in a direction that serves their needs only and doesn't serve civic needs, we may end up really with a lock-in of infrastructures that don't serve the public good, and I think we need to be really critically conscious of that.
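To make the contracting example concrete, here is a minimal Python sketch of the difference between publishing tenders and contracts as two unconnected flat lists and tying them together as records of one contracting process. The field names (`process_id`, `tender`, `contracts`) and the sample records are hypothetical, invented for illustration, and are not taken from the actual contracting data standard.

```python
import json
from collections import defaultdict

# Two flat lists, as they might come out of separate spreadsheets.
# All identifiers and values are made up for this example.
tenders = [
    {"process_id": "P-001", "title": "School meals", "estimated_value": 100000},
    {"process_id": "P-002", "title": "Road repairs", "estimated_value": 250000},
]
contracts = [
    {"process_id": "P-001", "supplier": "Acme Catering", "value": 120000},
    {"process_id": "P-003", "supplier": "Unknown Ltd", "value": 90000},
]


def build_processes(tenders, contracts):
    """Group tender and contract rows into one nested record per process."""
    processes = defaultdict(lambda: {"tender": None, "contracts": []})
    for tender in tenders:
        processes[tender["process_id"]]["tender"] = tender
    for contract in contracts:
        processes[contract["process_id"]]["contracts"].append(contract)
    return dict(processes)


def unlinked(processes):
    """Return ids of processes missing either a tender or any contract."""
    return sorted(
        pid
        for pid, record in processes.items()
        if record["tender"] is None or not record["contracts"]
    )


processes = build_processes(tenders, contracts)

if __name__ == "__main__":
    # A structured format such as JSON can carry the nested record directly.
    print(json.dumps(processes["P-001"], indent=2))
    print("Processes with a missing link:", unlinked(processes))
```

Once the two lists share a process identifier, questions such as "which contracts were issued with no published tender?" become a simple lookup, which is the kind of connection the anti-corruption users were asking for. The nesting is also where a structured format leaves room for publishers to extend a record with their own fields, something that is harder to express in a single flat spreadsheet.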
So I'm going to end with three thoughts around reimagining what open data might be, but what I'm really keen to do is get your thoughts and reflections on where you think things might head.
The first of these is the need to move from decontextualized data, taken from departments and thrust onto data portals, to putting data in context. That means two things. One, it means a cultural change inside government to handle data close to the departments, the agencies, the spaces where it's generated, and share it from there rather than abstracted into portals. And secondly, it means focusing on data sets that need to be released together. So the UK Labour Party just released a report yesterday on digital strategy that argues the publication of spending data alone in the UK played into a narrative of public spending cuts, and that it was not useful without data on public sector performance alongside it. So they're arguing that we need to link those two things together, and think about the contextualization of the data set in the wider politics and the contextualization of data points with those that need to go with them.
Secondly, I want to argue we need to move from what I call epiphenomenal data, data that's just coincidental to the action that's taking place, to much more active data: the data that is shared being part of the process of government, not just an afterthought that's shared to let you maybe hold to account or observe after the fact. The contracting example is one example of that. You could imagine that the contract is not legally issued until the data is published that says it has been issued. Then the gap between the act and the data narrows, and we get a greater ability to have accountability. We historically had public registers, public records, that were part of enacting things into being, and our data isn't yet doing that. Our data is still after the fact in the way we're operating.
And lastly is this call to move from an idea of raw data now to designing inclusive
information infrastructures. Think about who's a party to that process of design, who gets to be involved in shaping these systems for the future, because if we don't think critically about that, we may end up with infrastructures that don't really serve our civic need in future. So on that note, I'm going to draw those thoughts to an end with, I think, 23 minutes of input, and yeah, really welcome questions and discussion. So I think there's a mic that will go around, yes, and we can just take a couple of questions and see.
Hello. There's a switch on there. It's on. So the standardization process is clearly key to what you're talking about here, it seems to me. The way I understand technical standardization in other areas is that different companies providing different technologies push their own standard. They of course argue that it's in the general interest, but they're pushing their own agenda, in the sense that obviously it improves their sales, but it's also about how they get an edge with the whole hinterland, the entire operations that go into producing the technology. And maybe the people who would speak for the general public interest are not at the table. Now it seems to me what you're describing here is slightly different, is it not? There isn't much technical interest in here. I mean, if it's CSV or PDF, well, I guess Adobe would like it to be PDF, but that doesn't really matter. So also it's very unclear what the political interest is here. So how does that work? How does the dance around standardization work here?
Yeah, so I think that's a key question and one that's not being explored anywhere near enough. I think we have got the political commitments coming first, often. So things like the International Aid Transparency Initiative were governments committing to share data better with each other. So it's often around governments identifying a data sharing need, or the open
government movement's kind of political pressure creating a demand for disclosure of budgets or disclosure of contracts and so on, after which comes a desire to standardize, to create some sort of standard way of sharing that. And I think you're right, there are not necessarily commercial partners immediately in that conversation saying this is about interoperability of our product. It's relatively niche. Now, there are obviously people providing data systems to governments, but the market for things like contract management platforms is fairly diverse, I think, around the world. Often governments build their own systems or highly customize something. So there's not a big commercial interest in shaping that standard. It's much more happening, I think, at a very ad hoc level of de facto standardization by whoever the first movers are. So things like the public transit one, where perhaps there is more commercial interest: that was the city of Portland and Google collaborating on a standard, which then, through the market power of the intermediary, becomes the de facto standard. I think that is one thing we will see: intermediaries in this space have a lot of power in de facto standardization by saying, we will make that data visible in our platform if it's in this standard. But a lot more of it is open and collaborative communities drafting stuff, but in relatively under-resourced and kind of ad hoc ways. Yeah, some more questions.
I'm particularly interested in cities, and I'm interested in how interested cities are in being responsive to this movement and opening up their data, making it available to the public. The public is running with it and asking questions of cities, often ones that cities aren't able to respond to. So I'm really interested in your last comment. Are you looking at how this backward ripple effect of data is changing cities? Are they in a muddle? Are they reorganizing because of this data pressure, or what? What will happen?
So I have not looked at cities closely and specifically,
but a number of the case studies in the research network have, and I think what we're probably seeing is that at the level of technical implementers, or people who are directly involved in the open data policy, there's this really interesting two-way communication going on. It's not just about getting data out there; it's creating this space for collaboration between those parties. But that's almost accidental, or because it's the culture of those practitioners, who are from an open source background, sort of adopt that culture, spend time with those other actors outside the city, and there's a positive relationship there. It's not necessarily, in any of the spaces I've seen, being designed for at the broader level. The person who has data on parks and public spaces, who doesn't see themselves as part of the open data initiative, isn't being encouraged to engage with people around their data. So I think there's a real gap there in the wider cultural change that says this isn't just about opening up a one-way flow, this is about opening up a two-way flow. Because the shift in relationship, from an information asymmetry that says government has this data, but come and tell us your thoughts and consult with us and engage with us, while we're still holding the data ourselves, to let's share this data, let's talk about this as a shared problem, I think is potentially a powerful civic shift. I don't know where the mic's gone, but, ah, there's a comment just behind you there before we go over to the other side of the room.
Hi, Nathan, from the Berkman Center. One of the things I've observed is the power of the permalink in open data, which wasn't directly referenced, but it kind of goes into the idea of openness as practice, or as a stream and not a drop. So as a phenomenon, I saw with the New York State Legislature that masseuses, massage therapists, physical therapists organized around some obscure piece of legislation related to their licensing once they could find a
permalink to the bill. And then inversely, there's been resistance against permalinks, because the entrenched data managers are protecting their business model. So in the case of New York State, there's an institution that sells legislative data, and their system actively denied permalinks through crazy URL obfuscation. So I wonder about your thoughts on the value of, I guess, the permalink, and the stream versus the feed. And I guess sometimes I think maybe permalinks could have a downside related to, you know, the threat of social media and out-of-contextness, so maybe if you've seen that as well.
Yeah, so I think that's an absolutely key point on the way the technical infrastructures, the web infrastructures, of this play out. So in the open contracting work, we've given governments a kind of five-star scheme to improve their data publication that says: if all you can do is put your documents online, do that, but ideally get to the point where you've got a permanent URL for each stage of the contracting process, because it becomes a referenceable public object at that point. Until then, it's download this file, comb through it, find the line that should be there that describes this thing, and it's not a public object that we can discuss. I think then the contextual thing is a question of design, and again of seeing this not just as data but as redesigning governance processes. Say it's a contract, for example. Again, something we do in the open contracting standard is talk about these releases of data, but also a record that sums up the state of the contract as of now. So, you know, on the page where you're providing the snapshot, give people a view of the overview picture of what's happening in the context. And give people the ability to talk back. So one of the things that's missing, for example, from the International Aid Transparency Initiative standard is information on, if you do think that this data highlights some corruption, who do you talk back to? And by the time this
data has traveled through lots of intermediaries, you're so far from knowing its context that you can't get back. So I think, yeah, designing context into our data and using things like permalinks are really crucial. So do you want to go along the back and maybe take a couple of questions? Yes, please do.
So we've got a question from the live stream. This is from Andrea: do you have a definition of civic hacking? If yes, what role is it supposed to play in open data?
That's a jolly good question. I don't know that I do have a working definition of civic hacking. I think if I were to be creating one, I would want it to be broad enough not just to be about things that involve code, but about things that involve all sorts of engagement with information about local communities, and rethinking, reimagining how those spaces should work. I'm sure other people will have good definitions of civic hacking, so if anyone does want to come in with one, please feel free. There was someone at the back here who had a question, just behind you there.
Hi. So I'm really interested in this shift that you observed from a rhetoric of democracy to a rhetoric of economy. I think that's fascinating, and I'd love to hear more about it. I'm wondering if one of the dangers there is that, kind of like free became a business model, open is becoming a business model that governments can essentially use. And are we sort of moving towards this neoliberal approach to openness as good for the economy, and what does that do for the overall health of democracy?
Yeah, so I think the key person I would definitely recommend looking at is Joanne Bates, who's written a great PhD thesis on the UK's open data policy, with that critique of its neoliberal turn, arguing that actually this is enabling marketization of public services, and that focusing on government data is essentially a subsidy to all sorts of
private industries. I think on some of that policy shift: there's this big tent argument in many open data communities that says we wouldn't have got anywhere if we didn't have those different strands of thinking that I mentioned earlier on in the history coming together. One of those groups on their own wouldn't have got this sort of policy change; therefore we've got to keep all these groups together, and the interests of one serve the interests of the other. I guess my argument tends to be that that only goes so far. At some point you have to think about what design of policy serves the civic interest and what serves the economic interest, and that comes down to that level of how you prioritize what data gets released and how you shape the data that is released. Many data sets are useful for both public and private kinds of good, but it then matters how that data is shaped and who's getting involved. So one of the challenges in, say, the UK context is that there's an Open Data User Group discussing what data should be shared, but it's predominantly private sector involvement in that group. As for getting public sector parties or civic, public interest parties to participate, we don't have those sorts of civil society organizations established enough to get involved in those debates and discussions. So I don't know that that got to the heart of your question, but let's follow up on that more. Yes, come in.
Hi, Anna Coe from the Human Rights Program. You sort of suggested that, by definition, the Open Definition excludes personally identifiable information, but that doesn't seem to be a feeling that's necessarily shared across the open data community. For example, it seems like public registries are a new frontier. I mean, you have the famous example of a public gun registry being digitized and then mapped, so you know if your neighbour has a gun, or if he's a cop or something. So how do you have the kind of nuanced discussion around creating the
type of inclusive information infrastructures that you talked about when the mantra is so strongly open everything, you know, open is good? How do we have these more nuanced discussions going forward?
So I think you're right to point out that, yes, it's a misstatement to say the Open Definition excludes personal data. There's often a contested idea of what is personal or public record, and the registers example is a good one, where previously we kind of managed that by two-way visibility: you could see the person who was requesting the register and that person could see the register, or there was a scope within which it was shared. The kind of boundaryless nature of open data means these registers quickly cross over into other uses. And I guess one way of dealing with that is to allow ourselves a broader language around openness that doesn't just say it's a binary, it's open or closed, but says we've got different sets of responses that we're designing and thinking critically about, working within an open by default framework where we say we should be erring towards openness. But at the moment what we see is that it's either open by default or it's not open, rather than: it's open by default; or it's being shared under these conditions; or it's not being shared and we understand the reasons why; or it's not being shared and we should be using RTI legislation or other things to get it into the open. So, yeah, I've just been having a Twitter discussion this morning with a number of people around the need for much better language in this space. And I think some of that ties into the fact that the language has been appropriated for many different things. So open language has been appropriated to talk about data sharing and private data sharing and other things, which has really clouded the picture. Let's come to the question at the back, and then the two questions down here. Or come here first, and then we'll come to you.
I'd like to drill down a little bit more on sort
of the language of economics and the public-private dichotomy here, with a specific local example in the transit space. There's a company called Bridj. Are you familiar? I'm not. Okay. They're a pop-up public transportation company. They've observed that public transportation runs fixed routes that don't change as fast as the flows of urban population, have ingested all the open transportation data they can, and are seeking to run, you know, on-demand jitneys, small buses for hire. And as you can imagine, they're running into interesting regulatory problems, et cetera. There are cities in the area that are saying yes, please come, and cities like Cambridge where they're worried about very congested bus stops and adding to congestion. There are two issues here. One, they are cherry picking from actual public transportation, which isn't really an open data problem. But the other one is this, and it was a fascinating interaction. Apparently the Bridj folks are reasonably good people, and they offered the city of Cambridge all their data to help them regulate this: we're a pop-up company, you want our real-time data so you can regulate us in real time, have at it, it's what we owe you as a civic obligation. And of course the city didn't know what to do with that, because the skills and infrastructure to handle real-time data are not something the city has; it thinks of itself as a data producer, not a data consumer. So I'm wondering about your thoughts on that. In the regulatory space, you would sort of want to flip things around and have the city be the ingester of open data. Are you seeing any trends towards that, or any way of balancing the skill sets and infrastructures at that level?
That's a really interesting example. I
don't... again, at the city level I haven't seen much, but I think the key idea there, that governments need to think about two-way flows of data and information, is a really key one. Governments are not often set up to take that in. A slightly tangential example: there was the release of the public transport database in the UK, where stops were found to not be in the right place. This is told as a story of open data showing errors and helping cities correct them, but as far as I know, very few governments managed to reabsorb all those corrections that were made on OpenStreetMap and actually now put out corrected data, because there's no mechanism for those feedback loops. And I think thinking about how those are created and designed is really important, as is thinking about those flows of data into government being perhaps open data flows, not just private flows. So the one case we've got in the Open Data in Developing Countries research network is in India, looking at extractive industries regulation, and saying: here's a setting in which government demands massive amounts of data off the private sector in order to regulate, but what's happening is it's demanding that data, it's collecting that data, and then it's not really knowing what to do with it. It's just handing it up to the next level of government. If you were to make those flows about the private sector being mandated to disclose this data, which government then absorbs, you create a space for other actors to also say, we could use that too, to think about planning transport in the city or planning around extractive industries. So I think that's another really interesting area to explore: what are all those places where governments are already collecting data off the private sector in closed ways, to republish it later on, but where what they collect at source could be more shared? Let's come to the question at the back, because I know you've been waiting.
I have a question on the intermediaries, the sort of emerging
intermediaries. My work involves power and money, only the money power is held by institutional investors, and I'm particularly focused on civil society institutional investors. In Suffolk County alone, the top 65 collectively hold over 54 billion dollars. That's a lot of money sloshing around Boston that nobody knows how to access, but a few of us do. It's open information. It's not machine readable, but it's in the IRS Form 990 holdings. So I'm involved with a project that is trying to look at the way in which civic education can map, monitor and manage that data in ways that are inclusive, that bring citizens in, so that they can say: hey, wow, look, there's all this money around, where is it going? Why do we have to have a gas tax when we've got pooled assets that could be leveraged for creation of infrastructure? So the first part of the question is, I think it's not just the government, it's also civil society. And in the corporate world right now there's a huge amount of effort going on in terms of standardizing corporate disclosure, but widening that in terms of integrated reporting, so it's not just financial data, it's environmental, social, governance and economic data as well. That means, I think, that the platforms get created in terms of tapping into data that's already there, some of which is crap because it's just PDFs that have been thrown up on a website somewhere, that some poor person has had to scan and put up there. My focus is on the emerging educational infrastructure that will be needed in terms of those intermediaries that are not necessarily dealing with code but are dealing with citizens who need to rediscover the voice and power they already have. So what have you thought about in terms of characteristics of those entities, which can be similar, I suppose, to consciousness raising groups, for those of us women in the room of a certain age, to help both reveal what's there, engage those who
may feel alienated, and empower plain people to understand the data, to digest it, to learn about whether it's reliable or not, whether it's, you know, phony? And then what are the implications for improving the quality of civic life?
Yeah, so I think it's absolutely key to look at those intermediary patterns. I'm grappling for a better term than ecosystem, but I think I'm going to have to use that term. It's not just the individuals, it's how they fit into a wider ecosystem. And I think there's something really interesting about the design of the data originally: how easy is it to use without having to code? Then, what do those intermediary platforms look like? Because any attempt to take data and mediate it changes what possible things you can do with it. So, you know, if you've got direct access, you've theoretically got more ability to explore and discover, and more power as a result. When you have that intermediary, they gain some power in that value chain. And then, what kinds of organizations can engage at the grassroots? I should have put these out earlier: there are some findings from the Open Data in Developing Countries project. I'll leave some of the flyers here, and the links will be on the slides. But one of the findings there was about the need to really capacity-build with existing civil society organizations and those existing partners who have the relationships in communities, and not to do that by saying, look, here's some cool shiny tools, next week change your practice to use all this data, but to recognize it's going to be a long process. It's going to be a number of years' worth of process. Say, for example, to budget monitoring groups in developing countries: you've been combing through those documents every year to produce an annual report. If we give you the data tomorrow, you've still got to produce that annual report, and now you've got to learn to use the data. You're not going to use it in year one,
it's going to be year two, three, four before you can use this stuff effectively. Exactly. And I think we don't have good enough methods for building those communities of practice, and a lot more work is needed, and I think a lot of bridging of cultures of data practitioners and community practitioners is needed around that. So, question down here. And is there anything else on the tweets? Okay, great.
I can just talk really loud. It's fine. Yeah, it's for the recording. Oh. So I come from sort of the hospital outcomes, healthcare data world. Could you go back one slide really quick? To me it's one of the fields that's going to take the longest amount of time to go from epiphenomenal to active data, right? Because for most of the things that we're trying to study, the reasons why people come into hospitals, 90% of the reasons why they come into hospitals are the social determinants of health, and how you measure those is legion. There are so many different ways that you could go about doing that. And I'm just simply curious, at this moment in time, where is the talk of standardization within the health world, like between countries, internationally, things like that? Or is there a dialogue at all at this moment?
Yeah, so that's one I definitely have to throw out to see if anyone else knows the answer, as I've not focused on health very specifically. I know one of the challenges in the health space is the distinction between data sharing and open data, and that's one of the areas that's become particularly clouded: what is part of the public record, and what's patient data that needs to be kept more tightly managed. Have you got a thought or a response? Hand the mic back.
Yeah, just, so a massive global problem is patient consent, and this pops up in all kinds of places. One where there's an awful lot of money behind the problem is genomics research, and there is a working group which has created a model policy for that. And we, me and a researcher at
the Berkman Center, have shown how you can do that essentially in a way that will then map and be computable across all jurisdictions. So patient consent, our guess is, is the most complicated problem, sort of, in essence, and highly regulated across many places. So my guess will be that, through the prism of that problem, much of the rest will come to be standardized.
Any other thoughts or reflections?
So there is one more question from the live stream. This is from Cristiano Therrien, and he's asking, and you may have kind of addressed this with the two-way discussion: did your research notice differences in open standard policies between national and municipal relationships and data sets?
So I don't think I've gone into enough depth to look at that yet. There are some good case studies from the ODDC project looking at cities, which that question would get me to go back to and analyze again. Yeah, no, I haven't looked enough, and that's a really interesting question to explore more. So I think we've come to the half hour, so if anyone's got any other thoughts, I'm happy to chat more, and really keen to get your thoughts and reflections on how we do kind of shift this discourse to talk more about the civic potential and how we really build for that. But thank you for your thoughts and inputs.