Hi, I'm Chris DeBell, a software engineer at Open Government Products. OGP is a unit inside GovTech Singapore, and we build tech to solve public good problems. Today I'm going to share a project that I've been working on called Vault, and in particular how I implemented authorization with an open source library called Casbin. Before I go into the technical details, I'll give you a brief introduction on why we decided to build Vault and what exactly Vault is.
When you talk to public officers, one big frustration you'll hear is that sharing data within the government is hard. Why is it hard? The current process is that they have to find some common contact or cold-email someone to find out who has what data and whether it's appropriate for them. Then they have to email for approval.
The problem with that first step is discoverability. Because there is no standardized, digitized data dictionary, they don't know what exactly is available in the universe of government data or who to approach to find it.
If they do manage to find out who has the data and approach them, they then have to write emails to get approval from their superiors, which then has to go on to the superiors of the other agency, and there's this back-and-forth email chain. That's the request process. It usually requires bilateral or multilateral agreements and different kinds of forms and processes. There's absolutely no standardization: some of it is paper, some of it is digital, and the requester does not know exactly what information they have to provide. If they do manage to get the approval, they can then try to get the data.
In the data transfer part, there's a long lead time. Why? Because they have to get the data over an encrypted thumb drive. Someone has to copy it onto the thumb drive from some kind of database, and a lot of them are legacy databases, and then physically pass the encrypted thumb drive to the requester. So it involves someone driving or cycling down. With modern systems you're used to getting data programmatically through APIs, but with an encrypted thumb drive that's just not possible.
The last issue is accountability. Because of all these different processes, you need to trust that the agency of the person getting the data has processes in place to ensure that there's no data leak, and that if there is a leak, they can find out who exactly is accountable for it.
To solve these four main issues, we decided to design a shared central repository and accompanying service that helps government officers discover and access confidential data securely. From my description so far, you can see that the system has two core aspects: your data sets, and the ability to request these data sets. That's what I'll focus on in this talk as well.
Just some visuals for you. This is Vault as it currently stands, and this is what a government officer sees when they log in. The first picture is the homepage. You can see a card view of the latest data sets uploaded by an agency. There's also a list view, and there's search functionality, so you can type in whatever keyword you want. It searches the data sets on their title and description, and we might also want to extend it to include the columns in the data sets. When you find the data set you want, you click into it. At the bottom, in this picture over here, is a data dictionary.
It's within the individual data set page, and it tackles the discoverability issue. You can see that we've listed all the columns with their title and description, so what exactly these values represent, and also whether each is raw or derived. That's especially useful for data analysts, for example, who need to know whether it's aggregate data or raw granular data. This really helps people decide whether a data set is relevant for them. Otherwise they would just request it without knowing whether it's actually going to suit their needs.
If the user decides that they do want to request a data set, they can go into the data request form and add the data set. What we envision is that officers are going to request data on a per-project or per-assignment basis, so they're going to need to add multiple data sets, and we support that as well. Then they just provide some justification as text.
After the request has been submitted, the approval goes off to the relevant parties. One of them is the requester's own superior within their agency. Approval requests are also sent to each of the data-set-owning agencies. Because we allow requesters to request multiple data sets, we need to send the approval to each agency that owns any of them, since all of them have to approve the request for it to go through. This is what the approvers will see.
Finally, once the request has been approved, the user can get the data on demand. They can just go into the system and download it whenever they want. If they delete or accidentally modify the data set once they've brought it out of the system, they can always log back in and get it again without going through the whole process. We also expire the request, so the person cannot access the data indefinitely.
From what I've described so far, you can see that there is a need for some kind of permission set to govern who can do what. Even if I'm a legitimate government officer, you don't want me running amok in this system and accessing the activity of all your government officers and all the data sets. We do that through a combination of authentication and authorization.
Authentication is concerned with whether I am a legitimate user. Who am I? I'm Chris Bell at open.gov.sg when I log in, and I can access the system because I'm a legitimate government officer. But even though I'm allowed into the system, I can only do a few things. For example, I can read the data set description, which is the data set page where I can see the dictionary and the sample set preview. I can also read my own requests and create my own requests. But because I'm not an admin user, I cannot create my own data set, so I cannot write to a data set. I cannot approve other people's requests, and I obviously cannot read other people's requests either. So authorization is really about restricting users' actions to their assigned scope.
How can we implement authorization? The most basic implementation is an access control list. An access control list is just a giant set of rules: a list of permissions that you attach to all the objects or resources inside your system. You specify which user is granted access to which object and which action they can perform. Within Vault, we only have read and write, but you can obviously differentiate further to include delete as well. Each entry has a subject, a resource, and the associated action. For example, this little green man, if he can write to these four approvals, will have the permissions for those four approvals.
And this little man below, if he can read these four data sets, will have the permissions for that. What happens with an access control list is that it grows multiplicatively with the number of users and resources you have. I just have eight users and eight resources here, but it already looks like a complete mess. And it just grows and grows, because you need to specify every single combination of user, resource, and action that's possible in the system.
What is an alternative? There's another model called role-based access control, where you group your users into the different types of roles that you have in the system. The good thing is that you can give your users different roles, and they inherit the permissions from all those roles. For example, here I have type A and type B, with four users in each group. I place the users inside a group, associate role A and role B with these permissions, and then all my users inherit those specific permissions. So inheritance is what supports the role-based model of access control. Another nice thing is that you can compose more widely encompassing roles from your more granular roles. For example, if I want some kind of super admin that can do what A and B do, I just say C equals A and B, and it inherits all those permissions.
What else can we do? I found that RBAC alone wasn't enough. For example, say I have an admin user and I want them to be able to see the approvals from their organization. If I just have an admin-type user and give it access to the approvals, they will be able to see the approvals from every single organization, and that's not how it should work. This is quite similar to how AWS has your environments and you can only see yours, not those of other users. So you have domains, and here they are your organizational domains.
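As a rough illustration of roles, role composition, and organizational domains, here is a minimal sketch in plain Python. It does not use Casbin, and every name in it (`role_a`, `role_c`, `org_a`, `alice`, `carol`, `ellen`, the permission pairs) is a hypothetical placeholder rather than anything from Vault.

```python
from typing import Dict, Set, Tuple

# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS: Dict[str, Set[Tuple[str, str]]] = {
    "role_a": {("approvals", "read")},
    "role_b": {("datasets", "read")},
    "role_c": set(),  # grants nothing directly; see ROLE_PARENTS
}

# role_c is composed from role_a and role_b ("C equals A and B").
ROLE_PARENTS: Dict[str, Set[str]] = {"role_c": {"role_a", "role_b"}}

# Role assignments are scoped to a domain (here, an organization).
USER_ROLES: Dict[Tuple[str, str], Set[str]] = {
    ("alice", "org_a"): {"role_a"},
    ("carol", "org_a"): {"role_c"},
    ("ellen", "org_b"): {"role_a"},
}


def permissions_of(role: str) -> Set[Tuple[str, str]]:
    """Collect a role's own permissions plus everything it inherits."""
    perms = set(ROLE_PERMISSIONS.get(role, set()))
    for parent in ROLE_PARENTS.get(role, set()):
        perms |= permissions_of(parent)
    return perms


def is_allowed(user: str, domain: str, resource_type: str, action: str) -> bool:
    """A user may act only through roles held in the requested domain."""
    for role in USER_ROLES.get((user, domain), set()):
        if (resource_type, action) in permissions_of(role):
            return True
    return False


print(is_allowed("alice", "org_a", "approvals", "read"))  # True
print(is_allowed("alice", "org_b", "approvals", "read"))  # False
```

Because assignments are keyed by user and domain together, Alice's role in organization A gives her nothing in organization B, which is the isolation that plain RBAC was missing.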
If I want to add a particular resource and only let Alice and Bob see it, I just add the permissions within organization A for user type A. The nice thing is that I can put my resources into different domains. I can share them across organizations A and B or isolate them within organization A.
Another little extension that I did was to group my resources. The same way I group my users, I group my resources into different types, and again that simplifies things. From the whole giant mess of arrows I had earlier, I now have just three very clean arrows per domain.
Just to do a quick comparison: with an ACL for 10 users and 10 resources, I need at least 100 rules. 100 rules assumes that each user has only one permission for each resource; anything more than that pushes it above 100. With RBAC with domains and resource groups, for the same configuration, I only need around 24 rules.
Let me explain why it's only 24. I have four rules that assign permissions to the roles within each domain. If you look at lines 1, 2, 4 and 5, those are my policy definitions. I have my role, which is of type A. The first two lines are for the domain organization A, and they allow reading and writing the resource group of approvals within organization A. Lines 4 and 5 are the same role A, but for read and write within organization B. So that's four rules. Then there are 10 rules for assigning the users to one role each, which assumes that a user can have only one role. Lines 7 to 10 show four examples of how I do it: I say that Alice belongs to the domain organization A and has role A, and Ellen has role A inside the domain of organization B. Finally, I have 10 rules to assign the resources to groups. Similarly, approval one belongs to the approvals of organization A.
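The rule counts above can be checked with a small sketch. The slide with the actual rules is not part of this transcript, so the rule tuples below are a hypothetical reconstruction in the `p` / `g` / `g2` style the talk describes; the user and resource names and the even split between two organizations are assumptions.

```python
from itertools import product

users = [f"user_{i}" for i in range(1, 11)]
resources = [f"approval_{i}" for i in range(1, 11)]

# ACL: one entry per (user, resource) pair, assuming a single action each.
acl_rules = [(u, r, "read") for u, r in product(users, resources)]

# RBAC with domains and resource groups.
# 4 policy rules: role_a may read and write the approvals group of each org.
policy_rules = [
    ("p", "role_a", "org_a", "approvals_org_a", "read"),
    ("p", "role_a", "org_a", "approvals_org_a", "write"),
    ("p", "role_a", "org_b", "approvals_org_b", "read"),
    ("p", "role_a", "org_b", "approvals_org_b", "write"),
]

# 10 rules assigning each user one role in one domain
# (assumption: the first five users sit in org_a, the rest in org_b).
user_role_rules = [
    ("g", u, "role_a", "org_a" if i < 5 else "org_b")
    for i, u in enumerate(users)
]

# 10 rules assigning each resource to one resource group.
resource_group_rules = [
    ("g2", r, "approvals_org_a" if i < 5 else "approvals_org_b")
    for i, r in enumerate(resources)
]

rbac_rules = policy_rules + user_role_rules + resource_group_rules

print(len(acl_rules))   # 100
print(len(rbac_rules))  # 24
```

Adding an eleventh user costs the ACL ten more entries but costs the grouped model only one `g` rule.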
Right, so that's RBAC implemented with the open source library Casbin. What's good about it is that it's extensible and flexible. I can assign multiple roles to a user, and I can assign resources to multiple domains. For example, Alice used to belong only to organization A. If I now want to give her role A inside organization B, all I have to do is add one line. On line 8, the highlighted one, you see that Alice now has role A inside the domain organization B, and she inherits all the relevant permissions. I don't have to say that she needs to be able to access these 1000 data sets or these 2000 requests. It's very simple.
I'm going to give a little bit of detail on how exactly you can use Casbin. The code that I showed before was your set of rules, which lives inside some kind of database. This is the configuration that helps Casbin decide, based on those rules, whether someone is allowed to perform an action.
The first three things I have here are definitions. The label r is a request definition, the label p is a policy definition, and the g's are role definitions. Basically, the definitions give me bindings. For example, if you send a request to my API endpoint, like a POST to /datasets/ followed by the data set ID, the middleware transforms that request into an r, the request definition. If I'm the one making the request, the subject, the first field, would be my email, so Christopher at open.gov. The domain would be my organization, so OGP. The object would be the data set ID that I get from the endpoint. And the action would be based on the HTTP request method, so a POST is equal to a write. That's the transformation that the middleware I wrote handles.
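The middleware itself is not shown in the talk, so the following is only a sketch of the transformation it describes: turning an authenticated HTTP request into the four request fields (subject, domain, object, action). The function name, the `CasbinRequest` type, the example email, and every method mapping other than POST to write are assumptions made for illustration.

```python
from typing import NamedTuple


class CasbinRequest(NamedTuple):
    """Mirrors a request definition with four fields: sub, dom, obj, act."""
    sub: str  # who is asking (the user's email)
    dom: str  # the user's organization
    obj: str  # the resource ID taken from the URL
    act: str  # "read" or "write"


# POST -> write comes from the talk; the other entries are assumptions.
METHOD_TO_ACTION = {"GET": "read", "POST": "write", "PUT": "write", "PATCH": "write"}


def to_casbin_request(email: str, organization: str, method: str, path: str) -> CasbinRequest:
    """Build the (sub, dom, obj, act) tuple for a path like /datasets/<id>."""
    segments = [s for s in path.split("/") if s]
    if len(segments) != 2:
        raise ValueError(f"expected /<collection>/<id>, got {path!r}")
    action = METHOD_TO_ACTION.get(method.upper())
    if action is None:
        raise ValueError(f"unsupported HTTP method: {method}")
    return CasbinRequest(sub=email, dom=organization, obj=segments[1], act=action)


request = to_casbin_request("officer@example.gov", "OGP", "POST", "/datasets/42")
print(request.obj, request.act)  # 42 write
```

The resulting tuple is what would then be handed to the enforcer for a decision.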
Similarly, your policy definition and role definition are just bindings as well.
Going on to the policy effect: it defines whether the request should be approved or not. You'll notice the p.eft that's underlined over there. eft stands for effect, and in this case it's allow, because if you don't specify an effect in your policy definition, the default is allow. The other option is deny, but you have to specify that explicitly. I won't go into detail for now, but allowing deny means you can do something conditional. You can say that the policy effect is: if there's a policy that allows this request and no other policy that denies it, then allow the action to go through. So your policy effect can get very complex, but in this case I've kept it simple because that's all the system requires.
The next big part is the matcher. Basically, what Casbin does is a pattern match. It's like a pattern-matching game. For example, say a request comes in saying that admin at data.gov.sg, who belongs to the domain Open Government Products, wants to read the approval with the ID 123. Then I call the enforce method of the Casbin API, and based on the matcher it tries to match the request against the policies in the database.
I'll briefly show you how this process goes. The first clause you see up there is g of request.subject, policy.subject and request.domain. So it goes into the database and tries to see whether there is such a rule. The request subject, you can see, is admin at data.gov, and the request domain is domain_ogp. If you look inside the database, you can see that on line three there is such a rule, and from it you can see that the second field, the policy subject, is going to be role reader ogp.
Sorry, this is on line four. Going on to the second clause of the matcher, it says that request.domain must be equal to policy.domain. What's the request domain? That's the second field, domain_ogp. Inside the database, there is a policy on line one whose second field is also domain_ogp, so the request domain does match the policy domain.
The third clause says that there should be a g2 role definition linking request.object and policy.object. The request object is the approval with the ID 123, and in the database policy, the policy object is approvals_ogp. On line six there is such a role definition: it says that g2 holds for approvals_123 and approvals_ogp. So you can see that it goes across all these rules and tries to match them based on the inheritance. It's not immediately obvious. It's kind of a chained pattern-matching game, but that's basically how Casbin matches the different rules to see whether an action can be performed.
One last thing that was interesting for me is that, for our system, we couldn't just use organizations as domains. The reason is that when you make a request, it belongs to the individual and not the organization. Say I'm in OGP and I create a request. I don't want to attach the permission to OGP, because then all my fellow OGP team members could see my request as well. But I also cannot keep the request as a single resource item, because I need to associate the approvals with that particular request, and I need to add permissions for the relevant admins to view it. What I eventually came up with is that requests function as their own little domains. Making a request a domain means it functions pretty much the way organization domains do.
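To make the matcher walkthrough concrete, here is a toy re-implementation of the chained match in plain Python. It is not Casbin's code: it only follows direct links (Casbin also resolves role hierarchies), the rule values are hypothetical stand-ins for the slide, and the final action comparison is an assumption, since the walkthrough covers only the first three clauses.

```python
from typing import List, Tuple

# Hypothetical rules shaped like the ones in the walkthrough.
POLICIES: List[Tuple[str, str, str, str]] = [
    # (p.sub, p.dom, p.obj, p.act)
    ("role_reader_ogp", "domain_ogp", "approvals_ogp", "read"),
]
ROLE_LINKS: List[Tuple[str, str, str]] = [
    # g: (user, role, domain)
    ("admin@data.gov.sg", "role_reader_ogp", "domain_ogp"),
]
RESOURCE_LINKS: List[Tuple[str, str]] = [
    # g2: (resource, resource group)
    ("approvals_123", "approvals_ogp"),
]


def has_role(sub: str, role: str, dom: str) -> bool:
    """Clause 1: g(r.sub, p.sub, r.dom), direct links only."""
    return sub == role or (sub, role, dom) in ROLE_LINKS


def in_group(obj: str, group: str) -> bool:
    """Clause 3: g2(r.obj, p.obj), direct links only."""
    return obj == group or (obj, group) in RESOURCE_LINKS


def enforce(sub: str, dom: str, obj: str, act: str) -> bool:
    """Allow if any policy matches every clause (allow-on-any-match effect)."""
    for p_sub, p_dom, p_obj, p_act in POLICIES:
        if (
            has_role(sub, p_sub, dom)      # clause 1
            and dom == p_dom               # clause 2: r.dom == p.dom
            and in_group(obj, p_obj)       # clause 3
            and act == p_act               # assumed clause: r.act == p.act
        ):
            return True
    return False


print(enforce("admin@data.gov.sg", "domain_ogp", "approvals_123", "read"))   # True
print(enforce("admin@data.gov.sg", "domain_ogp", "approvals_123", "write"))  # False
```

Returning on the first full match corresponds to the simple policy effect described earlier, where one allowing policy is enough and no deny rules are used.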
I create a reader role for my request domain and give it a read permission. I create a writer role and associate it with a write permission for the particular request. Finally, I add a resource group, which is the approvals for this request, and give it a read permission. Being the creator of the request, I then get the following role definitions: my own account can be a reader for this domain and also a writer for this domain. That means I can read and write my own request, and I can read all the approvals associated with it, because I need to find out their status.
What happens when an approval needs to be created? Previously, in this example, you saw a role definition that says approval 123 belongs to the organization OGP. That policy ensures that the OGP admins can read and modify the approval. But the requester, who may be from a different organization, also needs to be able to read these approvals. That's why there's this extra role definition here that puts approval 123 inside the domain of request 123. It means that as the requester, based on this policy on line three over here, I can read all the approvals associated with my request.
Then, for the admin users to be able to read the request: if I'm an admin user and a request has come to me for approval, I need permission to see that request. But because it's in its own domain, I need to inherit the correct permissions. So I say that all the admins of OGP can be readers of that particular request inside that request domain.
This was a pattern that I hadn't seen used before in other implementations of Casbin. It's just something I realized needed to be done because of the particularities of our system.
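One way to picture the request-as-domain pattern is as the set of rules that would be written whenever a new request is created. The sketch below generates such rules as plain tuples. The naming scheme (`domain_request_<id>`, `reader_request_<id>`, and so on), the function name, and the choice to attach the approvals read permission to the reader role are all assumptions; the talk does not show the exact rules.

```python
from typing import List, Sequence, Tuple

Rule = Tuple[str, ...]


def rules_for_new_request(
    request_id: str,
    creator: str,
    approval_ids: Sequence[str],
    approver_admin_roles: Sequence[str],
) -> List[Rule]:
    """Build the rules that make one data request its own small domain."""
    dom = f"domain_request_{request_id}"
    request_obj = f"request_{request_id}"
    reader = f"reader_request_{request_id}"
    writer = f"writer_request_{request_id}"
    approvals_group = f"approvals_request_{request_id}"

    rules: List[Rule] = [
        # Reader and writer roles for the request itself.
        ("p", reader, dom, request_obj, "read"),
        ("p", writer, dom, request_obj, "write"),
        # Readers may also read the approvals grouped under this request.
        ("p", reader, dom, approvals_group, "read"),
        # The creator is both reader and writer inside the request domain.
        ("g", creator, reader, dom),
        ("g", creator, writer, dom),
    ]
    # Each approval joins the request's approvals group.
    rules += [("g2", approval, approvals_group) for approval in approval_ids]
    # Admins of the approving organizations become readers of the request.
    rules += [("g", role, reader, dom) for role in approver_admin_roles]
    return rules


for rule in rules_for_new_request(
    "123", "officer@example.gov", ["approvals_123"], ["role_admin_ogp"]
):
    print(", ".join(rule))
```

An approval such as `approvals_123` can still belong to its organization's approvals group through a separate `g2` rule, which is how both the OGP admins and the requester end up able to read it.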
I think this embodies very well how OGP works. We use a lot of open source code, after verifying that it's well maintained and used by other people, so that bugs get found. I don't think it would have been really possible for me to implement authorization the way I did if there wasn't already this kind of pattern-matching library to help me. It would have been a lot more difficult. And a lot of the open source software out there is really good. Casbin, for example, has support not just for Node, but also for Go, PHP, Python, and .NET, and a lot of other companies like Cisco use it as well. So I thought this was a good example of how we use open source code in our development. Thank you. If you have any questions about this or the work that OGP does, feel free to ask.