All right. So hi, everyone. My name's Hong, and I'm the head of the Open Government Products team at GovTech. Today I'm going to be talking to you about zero-knowledge secret sharing and how we share data in the Singapore government. Well, not how we do it yet, but how we're trying to fix it. So basically, sharing data within the government is really, really hard. I know there's a lot of talk about privacy and data sharing, but as someone who has worked in the government and has tried to do data projects, it turns out that it's really non-trivial to get access to even the really basic data you need for a project. To give you an example, we were working on a project where, I think, we were trying to identify single mothers and what their profiles were, so we could figure out what a subsidy scheme for them could look like. And these are just some of the steps that you have to go through. First, you have to ask about the data. You have to ask what data they have. You have to explain why you need it. You have to go through a whole churn of not this, not that. And perhaps one of my favorite steps is that data transfer in the Singapore government goes by taxi, where you spend an inordinate amount of time literally cabbing back and forth. As a manager, I spend a lot of time approving taxi claims for people to cab back and forth with hard drives, because that's the fastest way to transfer data around the Singapore government. This is not just me talking out my ass. We literally did a user study. Our UX team went to talk to a whole bunch of users who had tried to do a data project, and this is what the timeline looks like. So you start out and you're really happy, right? You have a project hypothesis. It's like, I'm a smart, modern, data-driven person. I'm going to help transform how we do something in Singapore.
So I'm going to get some data. I'm going to do a project. I'm going to figure this out and optimize it. All right. And so you start figuring out, where can I get this data? And that's where you start getting sad. You find out there are multiple, multiple parties you have to go through to find the data, and you just ding-dong back and forth. And this is before you've even started. This is, like, an undefined period of time. After you've ding-donged for a little while and figured out which agency the data actually comes from, you now need to ask them for the data. And this, you would think, is a one-step thing, right? Here, I'm a government agency. I would like to get some data to do this project. Please and thank you. This turns out to take between two and five months. So if you're looking at this: you write an email to ask for data. The agency sends you a data request form. You fill it in. You negotiate about which fields they can or cannot share. They send you a data dictionary, because obviously they don't actually explain what the various columns mean. So you need to go back and forth. And this actually goes back and forth multiple, multiple times until you finally get approval to get the data. And God help you if you change teams, because if you change teams, you have to go through it all over again. Once you've done that, you need to actually get approval. So far this is just filling in the form, not even getting approval yet. So you fill in the form. You've got to get it signed by your policy director, and then they've got to get it signed by their agency. You've got to wait for the data to be released. And then there's the actual transfer of data. As you can see, even those two steps take three to six months. And if you count the number of cab rides in there, it's quite a lot.
And after you've gone through this whole process, after spending a year getting all the data, you then have about two weeks to two months of analysis, and then your project is done. So if you're wondering why we don't digitize more in the government, this is why. So we are trying to fix that problem. Total time is about 12 months just to get started on a project. And so, sure, we should just build a platform. This is like your startup pitch deck. Why not build a platform? We have a central platform where all agencies can upload their data. People can search, preview, and download it. Seems pretty straightforward. You just have a file server. People put stuff up. People take stuff down. That's the problem. Now you have a single software system which holds all of the Singapore government's data. If something were to happen to that, and of course nothing ever happens, obviously, because we're very, very good with security, that would be horrendous. You would basically leak all kinds of private things. Not even talking about military secrets, just things like medical records, or people's indebtedness, and things like that. All kinds of stuff which people treat really seriously. All right, so how do we get around this? We clearly can't go back to this. This is security through bureaucracy. It's stronger than security through obscurity. So what do we do? That's where zero knowledge comes in. This is how we're proposing to work. This is how we are building one of our data sharing platforms. And take note, you don't have to do this for everything. I just want to caveat: this is not for every single last data set, because it adds a bit of a burden. But it is how you more easily share something securely without creating a big, juicy hacking target when the data is fairly sensitive. So this starts out, basically, if you look at it, with a web app. And a user just uploads a file using the web app.
It's pretty straightforward. But instead of just uploading the file straight to the server, what we do is, on the client, in the browser itself, before it ever touches the server, we encrypt the data. The encrypted data goes to the server, and the encryption key stays only with the user. We show it on screen. We tell you, please write this down. Take a picture with your phone. Don't put it on a Post-it or leave it in the office. But keep this thing safe. The person who wants access to the data can go to our search index, search around, and find the data sets they want. They download the big blob of data. Then, out of band, instead of taking a taxi back and forth, they just need the uploader to send over the encryption key. And then they can decrypt the file and get access to the data. So you can see you're using the server to transfer the big bulk of the file, and then you use an out-of-band communication for the key. Out of band is a fancy way of saying I have not described the mechanism specifically. Basically, you can show a picture to someone. You can write them a Post-it note. Some way other than the main server, so that even if the main server is compromised, there's nothing to be found. If someone hacked our main server, all they'd get is a bunch of gibberish files, and they can't do anything with them because all the encryption keys are stored offline. And that's the key principle of zero knowledge: the server has zero knowledge about what it's storing. Even though we're the team who built the service, we have no idea what's in the data there. We don't know and we don't want to know, because if we don't know, then even if someone compromises us, there's nothing to leak. And that's the key principle. It's not about trying to harden and harden and get something super hyper secure and pinky promise that you'll never get hacked. It's about making sure that there's nothing to steal.
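As a rough illustration of the upload flow just described, here is a minimal Python sketch. It is not the browser code the team runs. Everything in it is an assumption made for illustration: the function names `encrypt_on_client` and `decrypt_on_client`, the `server_storage` dictionary standing in for the server, and especially the cipher, which is a toy keystream built from SHAKE-256 plus an HMAC tag so that the example needs only the standard library. A real client would use a vetted authenticated cipher such as AES-GCM instead.

```python
import hashlib
import hmac
import secrets


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHAKE-256 keystream (toy stand-in for a real cipher)."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_on_client(plaintext: bytes) -> tuple[bytes, bytes]:
    """Return (key, blob). Only the blob is meant to leave the client."""
    key = secrets.token_bytes(32)      # stays with the uploader
    nonce = secrets.token_bytes(16)
    ciphertext = keystream_xor(key, nonce, plaintext)
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return key, nonce + ciphertext + tag


def decrypt_on_client(key: bytes, blob: bytes) -> bytes:
    """Check the tag, then undo the keystream. Runs on the recipient's machine."""
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered blob")
    return keystream_xor(key, nonce, ciphertext)


# Hypothetical "server": it stores blobs and is never given a key.
server_storage: dict[str, bytes] = {}

key, blob = encrypt_on_client(b"name,debt\nAlice,1200\n")
server_storage["dataset-1"] = blob              # upload
downloaded = server_storage["dataset-1"]        # recipient downloads the blob
recovered = decrypt_on_client(key, downloaded)  # key was passed out of band
```

The thing to notice is that `server_storage` only ever holds the opaque blob. The key exists on the uploader's side and, later, on the recipient's side, and never passes through the server.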
But this does cause a problem, and some of you might have spotted it: the key itself is a single point of failure. So let me give you an example. Hypothetically, imagine some government officer's boyfriend was a sociopath and decided to leak age data to the public. That would still be a problem here, because the person who has access to it actually has the key. You can get the key and you can decrypt the data. And that's a problem, right? And even something more benign than that: let's say we had some really, really important data set about people's indebtedness or something like that, and someone just lost their password. You just lose this key. Whoops. Now the Singapore government has just lost all that data, and it's just gone, because it's an encrypted blob and by design you can't get access to it anymore. So what do you do? Our solution is to take the key and split it into multiple parts. The way I think about it is that you take the key, you split it into, let's say, five parts, and any three parts of this key can regenerate the original key. Does this make sense? I take the key, I cut it into five portions, but I don't need all five to regenerate the key. I just need any three of the five. This seems like a really complicated math magic trick, but it turns out to be really, really simple. It's based on nothing more than secondary school math. This is known as Shamir's secret sharing. I would highly recommend reading the paper. It's literally a two-page paper, that's it, and the intuition behind it is remarkably simple. Any two points define a line. You learned this, right? Two points define a line. If I have a quadratic equation, so x to the power of two, I need three points to define it, right?
If I have a cubic equation, x to the power of three, I need four points, and so on and so forth. So let's say I want a key where I need at least three people to regenerate the original key, right? Let's say the secret is here. What I do is I define a quadratic equation which passes through this point, which is my secret. And then I just sample any other points on the quadratic curve. You can see that any three of these points will uniquely define the quadratic equation. So say I get x equals 1, y equals negative 2; x equals 1.5, y equals negative 2.25; and so on and so forth. And then I just solve simultaneous equations. I literally just solve simultaneous equations to figure out what the quadratic equation is, and therefore, when x equals zero, what's y? And that's my secret. So the intuition's really simple, right? It's literally simultaneous equations to solve a quadratic equation. It takes a bit of time to wrap your head around, but intuitively you can imagine how you scale this to any number of people and any number of parts needed. So let's say I wanted to have a key with 10,000 parts where any 9,000 of them could regenerate the original key. Well, I just need a polynomial of degree 9,000 minus one, and then I just sample 10,000 points anywhere on it. It's trivial to generate because you can get as many points as you want. And the fact that it's so easy is beautiful, because if anyone loses their key, well, I just throw it away. I just get another point from my curve and hand it to them, and there you go. So, a few things that are important about this. Firstly, no single person has the power to access the data. You could imagine, in this case, giving one key to, let's say, the Minister of Health. You give the other key to the Director of Medical Services in the hospital.
You give another key to, I don't know, the chairman of some bank or whoever you want, and any one of them, if they decide to go rogue or do something funny, can't access the data on their own. Which is great, because it's like that nuclear codes thing where you need two keys to unlock the door, except this is an arbitrary number of keys needed to unlock it cryptographically. The second major benefit out of this is that I don't care if someone loses a key. Let's say I had five keys and someone loses one. Well, as long as three people don't simultaneously lose their keys, my data is still safe. And so you could imagine, if you wanted to play it safe, you could do something like: I give seven people a key and three people are needed to access the data. So even if three people lose their keys once in a while, that's fine. I just issue new ones to them. I just keep track of who has them, and once in a while I check to make sure that not everyone has forgotten everything. So it gives you both security and durability, which is the important problem we are trying to solve here. Security and durability in one. So this works fine if you have, I don't know, three to five people you need to get keys from, right? When I want to access this secret data, I just go to the three or five key holders and get the parts from them. But let's say I have many, many people. Say I want to get data from a lot of people, like thousands of people. Then this is not workable anymore. I can't actually get parts from a thousand people. That's a pain in the ass. And a big example of when you run into this is when you're collecting data from the public. You know, forms are a part of life, right? Especially with the government.
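To make the key-splitting idea from this part of the talk concrete, the sketch below is a small Python version of Shamir's scheme under a few assumptions. The talk draws the curve over ordinary real numbers; Shamir's paper does the same arithmetic modulo a prime, and the sketch follows the paper. The prime `PRIME` and the function names `split_secret` and `interpolate_at` are illustrative choices, and the 3-of-7 setup simply reuses the example numbers mentioned above. The secret is the polynomial's value at x = 0, each share is one other point on the curve, and Lagrange interpolation plays the role of solving the simultaneous equations.

```python
import secrets

# Assumed field size: the Mersenne prime 2**127 - 1. Secrets must be smaller.
PRIME = 2**127 - 1


def split_secret(secret: int, threshold: int, num_shares: int) -> list[tuple[int, int]]:
    """Hide `secret` at x = 0 of a random polynomial of degree threshold - 1
    and return the points at x = 1..num_shares as the shares."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def interpolate_at(shares: list[tuple[int, int]], x: int) -> int:
    """Evaluate at `x` the unique polynomial through `shares` (Lagrange form)."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (x - xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, -1, PRIME) is the modular inverse, i.e. division in the field
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


secret = secrets.randbelow(PRIME)  # e.g. a file encryption key read as an integer
shares = split_secret(secret, threshold=3, num_shares=7)

# Any three of the seven shares recover the secret (the value at x = 0).
recovered = interpolate_at([shares[0], shares[3], shares[6]], 0)

# Re-issuing a lost share: three holders compute a fresh point at x = 8.
replacement = (8, interpolate_at(shares[:3], 8))
recovered_again = interpolate_at([shares[5], replacement, shares[1]], 0)
```

One design point worth noting: whoever runs `interpolate_at` on three shares could just as well evaluate it at 0, so issuing a replacement share is as sensitive an operation as unlocking the data.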
And so let's say you're applying for a flat, or you're applying for leave, or you're applying for medical assistance or whatever: you fill in a form. And all these forms have personal data on them. The government can't ask you to encrypt every single one of your forms, and it can't go around to every single person and ask for their keys. That doesn't make any sense. So how do we deal with this? Well, we extended it to work with asymmetric key encryption. So how many of you here know what asymmetric encryption is? Okay, does anyone not know what asymmetric encryption is and need a quick explanation? All right. For those of you who don't know, basically, instead of having one key which both locks and unlocks a file, one key can only lock it and the other key can only unlock it. Does that make sense? That's the key thing. Imagine a physical lock where I had one key which could only lock it, and that key could not open it again no matter how many times I tried. And another key can unlock a closed lock, but it cannot lock it. That's the concept in asymmetric encryption. So in this case, when someone creates a form, let's say I'm a government officer and I want to collect data on, let's say, children's eating habits, right? I want to check whether or not they're having a good diet. And I don't want that to just be shared with everyone, obviously. So I create a form and I ask you, you know, how much does your kid eat in a day? Does he have a lot of sugar? Does he exercise? Blah, blah, blah. The browser, instead of just encrypting the data, generates a key pair: one encryption key to lock the data, and one decryption key to unlock the data. The form itself is stored on the server. So I put up my web form that asks people all the questions.
And with these questions, the key is stored with it. The encryption key, the locking key, but not the unlocking key. Next, let's say a user comes in. A parent comes, they access the form, and they fill in their kid's dietary information, right? Well, when they press submit, instead of just sending that form response unencrypted to the server, they use the encryption key that was paired with that form to lock it. They encrypt it and send the encrypted form response to the server. Now, every single person who accesses the form has access to the encryption key, and so everyone can encrypt their data and put it on the server in a secure form. Yes? An encryption key per form? Per form. Not per submission, per form, yes. So everyone accessing that same form has the same encryption key, but that's okay, because of the property of asymmetric encryption: even though we have the same locking key, that doesn't mean I can access your data. So two of us, let's say two parents, both submit the form, they both encrypt it, but they cannot decrypt each other's data even if they were to hack the server, because that's how asymmetric encryption works. And once all the parents have submitted their data, or even incrementally as it comes in, I, as the government officer who's trying to collect this, just download the data blob, decrypt it, and get the responses. So we have now collected data en masse from a whole bunch of members of the public, and the government officer has access to the data, but me as the central guy, the guy who built the service, or Amazon, who let's say runs the servers, or the ISP who might monitor all your web traffic, can't see anything. And that's the point of zero knowledge. So this is with asymmetric key encryption. This is something we've actually implemented.
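The form flow can be sketched in the same spirit. The talk does not say which public-key scheme is used, so the sketch below swaps in a toy ElGamal-style construction (an ephemeral Diffie-Hellman exchange plus a hash-derived keystream) purely because it fits in standard-library Python. The modulus `P`, the generator `G`, the function names, and the `server` dictionary are all illustrative assumptions. The group is far too small and structured to be secure, and there is no integrity check, so a real system would rely on a vetted library instead.

```python
import hashlib
import secrets

# Toy parameters (assumptions): NOT secure, chosen only to keep the sketch short.
P = 2**127 - 1
G = 3


def generate_form_keypair() -> tuple[int, int]:
    """Return (public_key, private_key). The public key is the 'locking' key."""
    private = secrets.randbelow(P - 2) + 1
    return pow(G, private, P), private


def _xor_stream(shared: int, data: bytes) -> bytes:
    """XOR data with a keystream derived from the shared secret."""
    stream = hashlib.shake_256(shared.to_bytes(16, "big")).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_response(form_public: int, response: bytes) -> tuple[int, bytes]:
    """Run by each submitter: lock a response using only the form's public key."""
    ephemeral = secrets.randbelow(P - 2) + 1       # fresh per submission
    shared = pow(form_public, ephemeral, P)
    return pow(G, ephemeral, P), _xor_stream(shared, response)


def decrypt_response(form_private: int, submission: tuple[int, bytes]) -> bytes:
    """Run by the form owner: unlock a response using the private key."""
    ephemeral_public, ciphertext = submission
    shared = pow(ephemeral_public, form_private, P)
    return _xor_stream(shared, ciphertext)


# The officer creates the form; only the public key goes to the server.
form_public, form_private = generate_form_keypair()
server = {"form_public_key": form_public, "submissions": []}

# Two parents submit using the same public key.
for answer in (b"sugar: a lot", b"sugar: very little"):
    server["submissions"].append(encrypt_response(server["form_public_key"], answer))

# The officer downloads the encrypted submissions and decrypts them locally.
responses = [decrypt_response(form_private, s) for s in server["submissions"]]
```

Everything the hypothetical `server` holds here (the public key and the submissions) can be read by the operator without revealing any answer, because decryption needs `form_private`, which stays with the form's creator.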
So let's lay this out here, for the first bit, where we are. We started from here, which is a big bureaucratic process, right? Then you have this idea, let's build a platform, but the platform is insecure because you have a single point of failure, a big hacking target. All right, so let's go zero knowledge, which helps us be a bit more secure. So you go for the basic zero-knowledge setup. The basic zero-knowledge setup works reasonably well, but now your key becomes the single point of failure instead of the server. All right, so what do you do there? You take the key and split it into multiple parts. Splitting the key into multiple parts gives you durability as well as security, because the key is no longer a single point of failure. After you've split the key into multiple parts, you now want to get data from a whole bunch of people. The way you do that is you go to asymmetric, public key cryptography, and that lets you do that. So you have durability, security, and collection from a whole bunch of people, and with all of this we're building a platform to share data in the government. So that's the path we traced. So where are we now? We are in the midst of building this. This is what we call Vault. Basically, it implements that data sharing flow. These are obviously just dummy data to show you the idea. But to the user, despite all the complexity we put in, it's remarkably low complexity. You just upload your data when you need to, or you browse around, find the data set you want, and download it. Then you go contact the person, however you like, and get the key from them separately. But if the Vault server is hacked, yeah, you can look at all the listings of what data is there, but the files themselves will be useless, at least for the super secret stuff.
For the non-classified, non-super-secret stuff, sure, we can put it on the server so that you don't have to go fuss about finding keys, but the secret stuff is safe. Yes? Does Vault use the asymmetric encryption only, or does it also use... Does it also use what? Shamir's. Yes, so Vault does not use the asymmetric encryption yet; we're implementing it right now. But we have implemented the Shamir's secret sharing split key. This is still under construction, so we have tested it out with some small use cases but haven't scaled it up yet. It's not a general thing in the government yet; it's what we hope it can become. Forms SG is the other one where, ironically, we have implemented the asymmetric encryption but have not implemented the Shamir's secret sharing yet. So they're both sort of in half states of achieving this, but they are both working towards each other. So data sets have the M-of-N key split? Yes, because you don't want to lose that, and forms have the other because most times it's only one person creating a form, and you don't want to create that much friction in having to manage those keys. So that's why we staggered it in that manner, but both are working towards having both features, obviously. Yeah, and so Forms SG is our government form builder. Basically, it's like Google Forms but for the government. So instead of spending months and months trying to hire a vendor and negotiate the specs and go back and forth and get funding and all this sort of thing, you just go and drag and drop. Ta-da, you have a form. And the reason why the government can't just use Google Forms is security. If all of the Singapore government's transactions were in Google Forms, that would be a little bit worrying, but by using this, we're fine. So you have all these forms, but each of them is individually encrypted and individually secure.
And so yeah, the biggest use case of this so far was, I think, balloting for National Day tickets. If any of you balloted for tickets for National Day, that was built with Forms SG. So ta-da. Yeah, so that's who we are. We are the Open Government Products team here at GovTech. We're a team of software engineers, product managers, and UX designers, and we build products for the public good. And we are recruiting, so if you ever want to find a job, come look for us. Thank you. Okay, I think we have time for one question. Anyone has any? Yeah, please. Hi. So, okay, when you implement secret sharing, right, is it a prerequisite that you have everybody present to contribute their part of the key? Because, I mean, I've been thinking about this, and the thing is, is it possible for one person to enter a key, have it stored securely, and have somebody else enter a key subsequently? Right, right. So you're talking about doing it incrementally, asynchronously, right? Yeah. So instead of having everyone there at the same time like nuclear codes, you'd have, all right, stage one unlocked, stage two unlocked, stage three unlocked. Well, it's possible, and it is something we have considered, but the level of complexity to implement that process is fairly high. So we're starting with the base case first. The problem with this incremental thing is that you then need to encrypt the stored keys and, basically, we don't trust ourselves not to screw it up. So we're going to start with the simple thing first. But yes, it makes sense and it could work. You could imagine this is sort of similar to how, for WhatsApp, you know, they do the double ratchet protocol, that sort of incremental thing. You could imagine doing something similar in order to accumulate the key. So it makes sense, but I have not thought through the full implications. And yeah, there we go. Other questions?
Yeah, okay, well, I'll be around if anyone wants to talk to me, but thank you so much.