Hello everyone, I'm Phil. This is Makayla. We are developers from the Government Digital Service. You might know us, first... and just keep the noise down over there, please. That's good. Right. You might know us for things such as computers that don't respond. It's always the way. There we go. That's how slides go.

You might know us for things such as GOV.UK. A show of hands: who knows about this? There's a few people, maybe about a third of the room. This was us celebrating its third birthday a few months ago. This is what some of us look like. You might also know us for data.gov.uk. Who knows about this? A similar number of people. Today we're not going to talk about either of those things. We are going to talk about what we're working on, which is registers.

To get into that, I'm going to start by saying let's talk about countries. Does this look familiar to anyone? A few nods. I pulled this from a data set from Companies House. Companies House is a UK organisation that deals with setting up and disbanding companies. In that data, there's a field for countries. These are some of the values that you get for Scotland in that field. There's Scotland written several different ways, and things like that. I could have taken it from any number of data sets. I'm not picking on Companies House here. The point is that friction caused by badly curated data makes it hard to build digital services and do analytical work. A lot of what the Government Digital Service does is around building digital services. Good data, clean data, matters to us.

If we want to fix this, we need to decide which form is correct. To do that, we need a canonical list of valid names. Let's try to find one. I went on GOV.UK and there's a page for these application fees. There's a list of countries and territories. Then I go to getting married abroad. Here I need to choose which country or territory I want to get married in. It's not actually the same list.
Here's a PDF of ISO country codes that I found, which has a list. Again, it's a slightly different list, which also seems to think that the internet is a foreign country. These lists don't even agree on the number of countries, let alone the right spelling for these countries.

It turns out that one of the lists published on GOV.UK was authoritative. One of them had weight behind it. It was published by the Foreign and Commonwealth Office, which promotes the United Kingdom's interests overseas and supports our citizens and businesses. They lead on the government's list of approved geographical names. They've been publishing a range of country-related lists on GOV.UK for some time. It used to be published like this, which was a page rendered from a CSV file. You'd always have to remember to check whether or not there was a new version available. The same list was also published by another related group called the Permanent Committee on Geographical Names. There was another URL for the same data. There were several versions of the truth.

This is what we've built. This is the new country register. Let me see if I can get that away. It's curated by the FCO. It's provided by the Foreign and Commonwealth Office. This is the list of countries according to the UK. There is no other list now. This is the canonical one. You always know you've got the latest information, and I'll show you why that is in a moment. But this is what registers do.

Each record in this register is a country. If you go to this URL at the bottom, this is the record for The Gambia. Within that record, there are fields for useful things, such as the word you should use to refer to citizens of that country, the official name for that country, and then the standard short form for that country. In this case, we've got The Islamic Republic of The Gambia and The Gambia. There are date fields too, because countries change over time. The record for East Germany has an end date.
The record for the Federal Republic of Germany has a start date: the next day.

The data is available in different formats, because people who consume data have different needs. Here's the record for Germany in JSON format, and here's a bunch of countries available in CSV format. JSON and CSV are just two that I've shown you, but we also provide YAML, we provide tab-separated values, and we provide RDF in the form of Turtle.

The country register is being put to use on the e-petition service. Another show of hands. Has anyone heard of this? It's worth talking about a little bit. The e-petition service is where any UK resident or British citizen can sign a petition, and you can see there's a few popular petitions listed here. When you sign a petition, you need to tell the petition service where you are. You get asked for your location in a box here. This used to be a hard-coded list that they had within their app somewhere, and now they have a nightly job which loads the latest data from the country register, and therefore it will always be up to date. There are some interesting things as well, which you can ask me about afterwards, where you can get data about petitions and see signatories broken down by country.

The way we have created these registers is by talking to lots of people, working out what their needs are, and trying to come up with characteristics of what makes a good register, and we've written them down in a blog post. I won't go through all of them, and I've talked about a couple of them already, but one is that registers are live data, and I'm going to dig into that a bit more now. It's not a file that gets updated once a year and that gets progressively more out of date throughout the year. It's a service you can go to all the time and see what the latest authoritative information is. So the page for a country here is the latest data. You can also sit and watch for updates.
As new changes come into the register, there's a single URL you can go to and see new entries being made. You can see all the entries all the way back to the beginning of the register. In principle, if you wanted to download the whole register, you could just go to the entries page and replay every single entry in order. This gives you all sorts of other properties too. Not only can you have a download that you can keep up to date, you can also have a data set that is historically aware, and you can say what was true at a given point in time rather than what is true today. When a record gets changed, if a country gets renamed, a new entry is added to the end, but the old entry is still available.

So Phil has described something that already exists. The country register is out there. You can access that. I'm now going to talk to you about the things we want to do next, some of which I can't go into too much detail on, and that's because they are mainly still ideas on post-it notes on our wall in the office.

So what's next for registers? Well, we want to build more of them. At the moment, we have just countries. There are many, many other data sets in government that are all held and maintained quite differently. They may already be open, but how accessible are they? If you're a service that needs three of these data sets, one that's only available in CSV, one that's only available as HTML, that's quite difficult, and it's difficult if you have to build bespoke software to access each of these data sets. So we want to build more of these as registers so that they're available in a more standardised way. It should help everyone to build better services, cheaper and faster, across the whole of government. So we've started to think about registers for local authorities, schools and businesses, but this is just the beginning. Hopefully there will be more.
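The replay idea described above, where every change is a new entry at the end of the log and the current records fall out of applying the entries in order, can be sketched in a few lines. This is a minimal illustration, not the register software: the `Entry` fields, the `replay` function and the sample codes `XX` and `YY` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    entry_number: int  # position in the append-only log (hypothetical field)
    timestamp: str     # ISO 8601 date on which the entry was appended
    key: str           # record identifier, for example a country code
    item: dict         # the field values of the record as of this entry


def replay(entries, as_of=None):
    """Rebuild the records by applying entries in log order.

    If `as_of` is given, entries appended after that date are ignored,
    which yields the state of the register at that point in time.
    """
    records = {}
    for entry in sorted(entries, key=lambda e: e.entry_number):
        if as_of is not None and entry.timestamp > as_of:
            continue  # ISO dates compare correctly as strings
        records[entry.key] = entry.item  # a later entry supersedes an earlier one
    return records


# Made-up sample log: record XX is renamed by a later entry.
log = [
    Entry(1, "2016-01-01", "XX", {"name": "Old Name"}),
    Entry(2, "2016-01-01", "YY", {"name": "Other"}),
    Entry(3, "2016-06-01", "XX", {"name": "New Name"}),
]

current = replay(log)                      # what is true today
earlier = replay(log, as_of="2016-03-01")  # what was true in March 2016
```

Because nothing is ever overwritten in the log, the same list of entries answers both questions: the latest view and the view at any earlier date.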
Data in individual registers is also linked, so data in one register can reference data from another register. This reduces duplication and room for error. Linking the data is possible because registers use standard names for their fields and standard data types, and each entry in the register that you saw earlier has a unique and stable identifier. An example might be a register for food hygiene ratings, where a local authority goes to a restaurant and gives it a rating on how hygienic its food preparation is. We might need a register for that, but they might want to link to the register of companies so that they don't have to maintain their own version of a restaurant company and maintain that twice.

So technical standards are definitely something we require to link between the registers, but more importantly, for these links to work, there needs to be trust between the organisations and services that are producing these lists. The registrar, or authority creating the food hygiene ratings register, needs to be able to trust the data in the register for companies, otherwise they won't want to use it, and they'll duplicate it. Trusting a register means that you must be able to trust that the list will be kept up to date, as Phil discussed earlier. It's not going to disappear and it's not going to change shape either. We must also be able to trust that the authority we said created that list genuinely did create it, and that what they said this data means is what it means.

So what we want to do now is build a layer of proof, what we're calling proof mechanisms, on top of our registers. I'm now going to describe three of these proofs that we think there's a need for. Each of these proofs will allow us to verify a certain property of a register. So the first of those is what we call an audit proof.
It allows us to verify that a single entry in the register genuinely does come from that register, so it was created by the Food Standards Agency, for example. And if someone has given me an entry from a register, I can prove that it hasn't been tampered with, that it genuinely is that copy. You should then be able to take this entry that someone has given to you and give it to someone else, and they should be able to prove that it is a genuine entry in the register and that they can trust what you are saying.

For example, here's a picture of the window of my local pub. The sticker on the right is what a food hygiene rating might look like in the UK. So this is telling me as a customer that my local pub restaurant has a food hygiene rating of five out of five according to our local authority, according to the Food Standards Agency. But when I look at that, how do I know that it's specific to that restaurant? There's nothing on that sticker specific to that restaurant. They could have actually bought this on eBay, because that has been possible at one point. But what if the food hygiene rating sticker looked more like this? So it's specific to whatever the restaurant is. There's a machine-readable fingerprint on it that allows someone, a potential customer of a restaurant, to check that and say, okay, that genuinely did come from the Food Standards Agency. It's not just what my restaurant wants me to think they have. That's that one.

So the next one is a consistency proof. This one allows us to check whether a registrar or authority has rewritten history. So if an inspector issues a new report for a restaurant, they want to know that the history of this register is still as it was, that the history hasn't been rewritten and that the registrar is not going back on their word. On the list of properties of the register earlier, one of them was that a register is append-only.
So you never, ever lose anything. Even if you update a record, it's always appended. You never lose that. This proof allows you to verify that property of the register.

And finally, we have what is called the register proof. So this would allow you to download the entire register, store it somewhere else, take it away, do something, use it. But you could then later prove to someone else that its entire content is a genuine register and it hasn't been tampered with since you've had it.

So how are we going to provide these proofs? We plan to use something called a Merkle tree. It looks a bit like this. It's a cryptographic concept, which allows us to verify these three properties that I just mentioned quite efficiently and securely. I'll come back to this in a minute. But I'm going to mention a project by Google called Certificate Transparency, who we are working with, using a lot of their ideas. They are using Merkle trees at the moment in something that they've called a verifiable log.

So, Certificate Transparency: what is this? The aim of the Certificate Transparency project for Google is to fix some of the flaws in the SSL certificate system, so the cryptographic system that underlies all HTTPS connections. So how are they going to do this? Well, they've got what they call a verifiable log, so it's a list of all issued SSL certificates. It makes it possible to identify any certificates that have been mistakenly issued by a certificate authority or maliciously acquired. So, for example, if you were Google, you could look up the list of authorities that have issued certificates for Gmail, or we as GDS could check the certificates issued for the country register. So, although we're not actually storing certificates in our registers, there are quite a lot of ideas around transparency in that Certificate Transparency project that we can borrow and steal.
So, now I'm going to explain how Certificate Transparency's concept of a verifiable log, which we are borrowing, works. So, suppose that we're building a register of restaurant inspections. First, we collect some data. We've got five entries here in that register of inspections for five different cafes. We then compute the secure hashes for each of these individual entries. You've got here A, B, C, D, and E. We then combine these into a Merkle tree. So, G is a hash of the concatenation of A and B. H is a hash of C and D, and you go all the way up to M, which is the Merkle root hash. So, this root hash here, this M, is actually the register proof that I mentioned just a minute ago. If you were to download this whole register and later want to prove that it's genuine, all you'd have to do is take your individual raw entries, rebuild this root hash and show that they're the same thing, and that is your register proof. As entries are added to the register, the log grows, so it's equal in size to the register.

Now, I'm not going to go into too much detail about how audit and consistency proofs actually work, because it's a bit of a rabbit hole. If you want to go into lots of detail on that, there's a link down the bottom to a blog that will give you more information. There's one thing I'm going to mention about the verifiable log. The structure of the verifiable log allows you to do an audit proof or a consistency proof without downloading the whole register, which could be huge. For example, if we're looking at just one entry, the last entry, Roy's Rolls, and you want to verify that this is an entry in the register, you need to know all the siblings along the path from the entry in the register to the Merkle root hash at the top. In this case, that's just K.
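The tree on the slide, and the audit proof for the last entry, can be sketched as follows. This is a minimal illustration that assumes the hashing scheme from RFC 6962 (the Certificate Transparency specification): SHA-256, a `0x00` prefix on leaves and a `0x01` prefix on interior nodes. The talk does not say which scheme registers will use, so treat that as an assumption. The function names are made up for this example, and the first four entries are placeholders; only Roy's Rolls is named in the talk.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962 style: leaves are hashed with a 0x00 prefix.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes hash the concatenation of their children, prefixed 0x01.
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    """Largest power of two strictly smaller than n (for n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def root_hash(entries):
    """Merkle root over a non-empty list of raw entries (the 'M' on the slide)."""
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = split_point(len(entries))
    return node_hash(root_hash(entries[:k]), root_hash(entries[k:]))


def audit_path(index, entries):
    """Sibling hashes on the way from the leaf at `index` up to the root."""
    if len(entries) == 1:
        return []
    k = split_point(len(entries))
    if index < k:
        return audit_path(index, entries[:k]) + [root_hash(entries[k:])]
    return audit_path(index - k, entries[k:]) + [root_hash(entries[:k])]


def root_from_path(entry, index, size, path):
    """Recompute the root from one entry and its audit path only."""
    if size == 1:
        if path:
            raise ValueError("audit path is too long")
        return leaf_hash(entry)
    k = split_point(size)
    *rest, sibling = path  # raises ValueError if the path is too short
    if index < k:
        return node_hash(root_from_path(entry, index, k, rest), sibling)
    return node_hash(sibling, root_from_path(entry, index - k, size - k, rest))


# Five entries, as on the slide; the first four names are placeholders.
entries = [b"cafe A", b"cafe B", b"cafe C", b"cafe D", b"Roy's Rolls"]
root = root_hash(entries)      # M
path = audit_path(4, entries)  # one hash: K, which covers the first four entries
is_genuine = root_from_path(b"Roy's Rolls", 4, len(entries), path) == root
```

Someone holding only the entry, its position, the tree size and the published root can run `root_from_path` to check the entry without downloading the other four. For the first entry the path would be three hashes (B, H and E) rather than one.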
For a consistency proof, say we've got the original tree here and we add one more entry, so F, Prima Donna there. F is the hash of the new entry. All you need to prove that this tree and the previous tree are consistent is to prove that the first five entries in this tree are exactly the same as the first five entries in the previous tree. To do that, you just need to know the values of K and E, because K and E cover the whole of the first tree and they cover the first five entries in the second tree. There is more detail than that; if you want to see it, go to that link. I think that's there.

I'm going to mention one further proof that we might consider implementing. This is something that a verifiable log, which comes from Certificate Transparency, can't give us. This proof is a record proof: is the data I have fresh? If we go back to the example of the food hygiene rating in my local restaurant, the audit proof I mentioned earlier is useful, but being able to prove that this is a genuine entry in the register is not useful if there is a more recent version where this entry has been downgraded from five stars to one star. The record proof verifies that this is the most recent, up-to-date entry and there hasn't been a later inspection that's changed that, but as I say, the verifiable log can't give us that information.

If we're going to implement the record proof, how do we think we might do it? There is another implementation of a Merkle tree, from Certificate Transparency again. This is called a verifiable map rather than a verifiable log. It's similar in that it's, again, another Merkle tree. However, this Merkle tree, rather than growing in size with the size of the register, is a constant size, and it's absolutely huge. If you have a hashing algorithm of, say, SHA-256, so that your unique hash is always 256 bits long, then the size of your tree is 2 to the power of 256, so it's, like, inconceivably big.
But the thing that makes it really efficient is that, because your entries are in leaf positions addressed by that hash, most of the leaves are empty values and it's really sparse. So that's what makes calculating a record proof, proving that you've got the latest entry, feasible with a verifiable map.

And this is where I ask for some feedback. We want to know if you can think of any times where you've checked this kind of integrity of your data, or times when you wanted to be able to check the integrity of your data but you've not been able to. Would you like to be able to check that you have the latest entry, or that this is genuinely an entry, or that history hasn't been rewritten, for any type of data? Please let us know what you think. You can provide any feedback generally at this beta banner at the top there, the feedback banner. Or please come and find us later for drinks, or on Twitter. You can hunt us down. Thank you very much. Any time for questions?

This is all great. Fantastic idea. One registry I'm dealing with is the WHO list of clinical trial registers. And it's compiled from one from every group of countries. I wonder if you have any plans for collaborating where you've got a few people, a few organisations, putting stuff into a single register, or allowing your register to interoperate with something comparable in another organisation or international body. Or is this only something that can work as a self-contained register authored by a single person?

Shall we take this one? I think the main answer is I don't know. The models we've been working with have been around central government, where there often is a single identifiable authority. While I recognise the problem you're dealing with, it's not one we've had to look at yet.

Those both have to come in at the same time. Otherwise Germany doesn't exist.
Do you have any system to deal with the fact that you might need to, for example, remove East Germany and put Germany in, and those kind of have to be an atomic transaction to work?

I mean, other than the fact that we wouldn't publish the register with the intervening entry there, and I think that's the main way we're dealing with it at the moment. I think a lot of this is still in flux. We're still evolving and iterating our model, and that is a very interesting question that we need to think about.

The country register is the UK's view of existing countries. Different countries have different views of what countries exist. So there may be disparities. Are you making any attempts to sort of map that to other countries?

So I think the most interesting answer I've got for you there is, if I go way back in the slides, and it's stopped responding again... So the name of this record, you can see, is GM, and that is the ISO country code, so we're using standards to deal with that. There are issues with that which I can talk to you about afterwards, but where there are existing data standards we want to use them. The country field in this, GM, is the ISO country code.

I certainly hope that that sort of analysis would be possible, and ideally without us having to even be involved. As long as we've published the data then anyone can take it, and I'm not aware of the OpenRefine API, so maybe we can talk about that afterwards.

At the moment the model we're working with is that all registers will be flat mappings of fields to values, and if there is any hierarchy it will be through a link, as Makayla was talking about, from one register to another register.
I think what we're going for here is definitely a lower bar than the full linked data thing. We're not looking to link to arbitrary data out on the internet; these registers link only to other registers within the UK government, so I think that's the main difference. I would add to that, I'm sure other people are more expert in linked data than I am and can do an interesting analysis. That said, we do make the data available as Turtle if people want to use it that way.

And I think that's all the time we've got for questions, but thank you very much.