Good afternoon everyone. My name is Sudarshan. This session is about the UID project: what we are doing in terms of reporting and analytics, and some of the APIs that we are planning to launch. The original presenter was Pramod, the chief architect; he is unwell and has not been able to make it. Sanjay Jain, the product manager, is on the way but slightly delayed. There are two broad sections. The first covers the whole BI and reporting analytics framework we have, the kind of data we collect, and how we use data within UIDAI. The second is about authentication and the kinds of applications we see around Aadhaar data and Aadhaar services; that is something Sanjay will cover, and if he is late we can always take it up later.

So let's start with reporting and analytics, specifically in the UIDAI context. I've seen how many organizations treat data: reports are something they understand, but it's largely an ad hoc exercise, where they look at the data only when they have a review with a senior official. From a UIDAI perspective, we wanted to make this a systematic process. When I talk about data here, there are two types. One is the kind of data that gets published as part of a census, which you can then do a lot of statistical analysis on. But as an organization we also wanted to use a lot of data on essentially a real-time basis for operational processes. Typically in corporates, data comes in regularly and the managers and the whole field force work off it. We wanted to bring that discipline into UIDAI to drive a lot of the operational work. The reason lies in the ecosystem UIDAI has built: UIDAI itself is a very lean organization.
It's hardly about 150 people in UIDAI itself, but in project size this is obviously very, very large, and we work with a lot of partners. First are what we call registrars. These are bodies like the state governments, in Karnataka's case the Karnataka state government, or banks and institutions like SBI and LIC and other non-state registrars. They in turn hire the enrollment agencies that actually enroll people on the field. So if somebody has an Aadhaar number, they typically went to an enrollment station run by an agency, and the registrars are the ones who hired those agencies; UIDAI is not directly on the field enrolling anyone.

This scenario brings in a lot of complexity, because we have the overall UIDAI objectives, then a big layer of registrars in the middle, and then the agencies on the field. We want to make sure that the whole ecosystem, for such a large and logistically complex operation, is aligned in the same direction, working and being monitored in a consistent manner. That's where we saw the need: we can't have a purely post-mortem system where you get data late and only then figure out whether things are working. What we wanted was to be able to drive a lot of the field operations very, very quickly. So here are some of the ways we use analytics and reporting, specifically from the perspective of running the field operations. We have a reporting and analytics portal, an in-house portal that is open to all the ecosystem partners.
It's all login based, so when you log in you see data specific to you. It covers all the key performance and operational parameters, and on a daily basis you can know the status of where things are, right from enrollment numbers to quality and productivity. Name the metric and the portal will provide it.

Separately we also have what we call the NOC, the Network Operations Center. This is typically what you would imagine: a big room with a lot of screen panels and several people continuously watching them. It monitors the data center and much of the processing, because we have data for more than a million enrollments coming in on a daily basis, and we need to make sure this ship runs smoothly with no bottlenecks. It provides real-time tracking: screens and panels showing where the bottlenecks are and how many packets are at which processing stage, with breakdowns and deep dives so that you know exactly what is happening.

So one side is monitoring operations on the field, which is what the analytics portal does for the whole ecosystem of registrars. For example, the Karnataka state government can log in and see, for each of its enrollment agencies, how many enrollments they have done, down to the operator level and the station level, so they know exactly what is happening. And like I said, it's such a diverse ecosystem, with state players, non-state players and a lot of private-sector players, that we can't run it purely on traditional methods; information and data were the only way we saw of running this organization. We provide end-to-end visibility all through.
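As a rough illustration of the stage-level monitoring the NOC screens show, here is a minimal sketch in Python. The stage names and the backlog threshold are invented for the example; the actual UIDAI pipeline stages are not spelled out in this talk.

```python
from collections import Counter

# Hypothetical processing stages a packet moves through, in pipeline order.
STAGES = ["received", "quality_check", "dedup", "manual_review", "letter_print"]

def stage_counts(packets):
    """Count how many packets sit at each stage."""
    c = Counter(p["stage"] for p in packets)
    return {s: c.get(s, 0) for s in STAGES}

def bottlenecks(counts, threshold):
    """Flag stages whose backlog exceeds a threshold."""
    return [s for s, n in counts.items() if n > threshold]

packets = [{"id": 1, "stage": "dedup"},
           {"id": 2, "stage": "dedup"},
           {"id": 3, "stage": "received"}]
counts = stage_counts(packets)
# With threshold 1, only "dedup" (2 packets) is flagged as a bottleneck.
```

A real NOC view would refresh these counts continuously and render them per panel; the aggregation itself is as simple as this.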
That's part of driving the whole transparency piece. Anybody who logs in can see a packet right from the point where it started its journey, who came in and enrolled at what point in time, through processing, to the time the letter is delivered. In fact, on the website there's a link called Check Aadhaar Status: at a packet level, a resident or anybody can enter the basic details and see in real time where the packet is, whether it's at a particular stage of processing, already printed and on the way, or delivered. So you have complete end-to-end visibility, within the ecosystem and outside it.

This matters because it's very easy for things to get confused in such a large ecosystem. Each partner has a different incentive, a different reason for being part of the program, and there is a big chance of conflicts, from the contractual to the operational. Hence you need a single source of truth, and data is typically the most objective form of truth, something nobody can question.

The other interesting part is that we use real-time feedback. Data is updated essentially nightly: at midnight the data is refreshed, so today you can look at everything that happened through last night. Whatever data we have is published on a near-real-time basis. In a lot of government programs, there is a big gap between when the data actually comes in and when it is used by the organization itself. We wanted to cut that time down and drive operations using essentially real-time data fed back to the whole ecosystem.
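The Check Aadhaar Status view just described, which shows a packet's journey end to end, can be sketched as a timeline over status events. The status names and dates here are illustrative, not the actual UIDAI vocabulary.

```python
from datetime import datetime

def packet_status(events):
    """Return the latest status plus the full timeline, oldest event first."""
    timeline = sorted(events, key=lambda e: e["at"])
    return timeline[-1]["status"], timeline

# Events may arrive out of order from different systems; sorting restores the journey.
events = [
    {"status": "processed", "at": datetime(2012, 3, 9, 14, 0)},
    {"status": "enrolled",  "at": datetime(2012, 3, 1, 10, 0)},
    {"status": "printed",   "at": datetime(2012, 3, 12, 9, 0)},
]
latest, timeline = packet_status(events)
# latest -> "printed"; timeline runs enrolled -> processed -> printed
```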
We also have fraud-detection modules, because there is obviously going to be a natural tendency to try to beat the system. One set works at the enrollment end: based on a number of tools we have built, the fraud-detection module scans each packet and takes a call on whether it could be a potential fraud, and if so it goes through a separate set of processes.

Then, from a transparency perspective: one part is the operational data that we share with the ecosystem, as I described. Similarly, outside the ecosystem, we provide a lot of the data as public data sets. It's all anonymized and rolled up, aggregated at certain levels depending on the data set, and it will soon be available as live data feeds: you will be able to subscribe to a feed and know when the data has been updated.

There is also an analytics white paper online that goes into much more detail about what we've been doing and the architecture we've taken. I don't want to cover that in too much depth here beyond a couple of slides, but the white paper describes what we do from a data and analytics perspective and what kind of structure we have. It also offers a template for how other government programs could set up their own BI and reporting analytics modules. It was shared with a lot of ministries and departments and drew a lot of interest, so soon we'll be presenting to some ministries on how they can use these systems for their own operations. This is just a small snapshot; I wouldn't say we do any high-end modeling and analytics as yet.
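As a rough sketch of how a packet-scanning fraud module can work, here is a minimal rule-based scorer in Python. The rules, weights and threshold are invented for illustration; the actual UIDAI checks are deliberately not disclosed in the talk.

```python
def fraud_score(packet, rules):
    """Sum the weights of every rule the packet trips."""
    return sum(weight for check, weight in rules if check(packet))

# Illustrative rules only, not UIDAI's real checks.
RULES = [
    (lambda p: p["enroll_seconds"] < 60, 2),         # implausibly fast enrollment
    (lambda p: p["photo_quality"] < 0.3, 1),         # very poor capture quality
    (lambda p: p["operator_daily_count"] > 200, 2),  # abnormal operator volume
]

def route(packet, threshold=2):
    """Send suspicious packets through a separate review process."""
    return "manual_review" if fraud_score(packet, RULES) >= threshold else "normal"
```

A packet enrolled in 40 seconds would score 2 and be routed to manual review, while an ordinary packet scores 0 and flows through normally.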
What we started with was making sure the base system is in place: a proper reporting system, basic visualization, and a lot of the data. One also needs to understand that the audience here is largely people who have not been exposed to this kind of data, or to the day-to-day use of data in the first place. We could make it as complicated as we want, but what we wanted was to inculcate the habit of looking at data on a regular basis, which is why it was very important to stick to very basic data and very basic visualization, and we also provide a lot of canned reports. We don't expect people to go in and build custom data sets in real time; we provide something very simple that they can print out, put on a file and review. All of this is login based, so when you log in, the data and downloads are specific to the dimensions you have access to.

A question from the audience: is the data available directly to the public? Like I said, we are publishing anonymized and rolled-up data sets for the public. The ecosystem partners don't get access to the raw data either; even within the reporting infrastructure, we work only with certain slices of data meant for reporting. We don't have access to the raw individual data; it's data sets aggregated at certain levels, to be used for specific reports only.

Are these data sets already online, or are you planning to put them online? When do you intend to? Unfortunately, all I can say right now is soon. A lot of work has already begun in this space; the data sets are ready and we are working on the process, and processes are not easy to put a date on.
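The login-scoped access just described can be sketched as a simple row filter keyed on the registrar dimension a login carries. The field names and sample rows are hypothetical, purely for illustration.

```python
def scoped_rows(rows, user):
    """Return only the rows belonging to the user's registrar dimension."""
    return [r for r in rows if r["registrar"] == user["registrar"]]

rows = [
    {"registrar": "Karnataka", "agency": "A1", "enrollments": 1200},
    {"registrar": "Karnataka", "agency": "A2", "enrollments": 800},
    {"registrar": "LIC",       "agency": "B1", "enrollments": 500},
]
user = {"login": "ka_admin", "registrar": "Karnataka"}
# scoped_rows(rows, user) yields only the two Karnataka rows.
```

In a real portal this filtering happens server-side against the warehouse, so a registrar can never download another registrar's slice.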
One good thing about the design is that the BI and reporting modules were part of the overall architecture from day one, which is why today we have all the modules in place. It spans the standard stages, starting with data capture, and in fact we capture a lot of operational metadata. I don't know how many of you have done enrollments on the field, but while somebody is enrolling we record, for example, how much time the operator spends on each screen: as he moves from taking your demographic information to the biometrics, the right slap and the left slap, we are able to capture timestamps on each of those steps. The reason we do that is to help improve operator productivity. When we go back to people with insights on where they are typically slower and how they can improve, that is an incentive for somebody to look at data, because the faster and smarter they work, the more enrollments they do on the field.

Then we have data acquisition, the module through which all the source systems, such as enrollment, authentication and other sources, feed data into the organization. Then the data storage, or data warehouse, modules. There is of course the production system, where all the servers and live processing run, but a copy is created from it: from a reporting perspective we do not get access to production data, so there is a separate set of databases available specifically for reporting, and that is what we work off. Data distribution is then done in multiple ways: we have data marts and data sets, and some of the curated data is used to publish this information.
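The per-screen timestamp capture described above can be sketched like this: record when the operator enters each screen, then derive the time spent on each. Screen names and timings are illustrative.

```python
from datetime import datetime

def screen_durations(events):
    """events: (screen_name, entered_at) pairs in order.
    Returns seconds spent per screen; the final, still-open screen is omitted."""
    out = {}
    for (name, t0), (_, t1) in zip(events, events[1:]):
        out[name] = out.get(name, 0) + (t1 - t0).total_seconds()
    return out

events = [
    ("demographics", datetime(2012, 3, 1, 10, 0, 0)),
    ("fingerprints", datetime(2012, 3, 1, 10, 2, 30)),
    ("iris",         datetime(2012, 3, 1, 10, 4, 0)),
]
# screen_durations(events) -> {"demographics": 150.0, "fingerprints": 90.0}
```

Aggregated over thousands of enrollments, durations like these are what let the analytics team tell an operator which screen is slowing them down.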
Data access is how we finally share this information onward, through the delivery platform, the portal that we spoke about. Beyond standard dashboards, we also offer a lot of self-service capability. The reason this is important is, as I said, that we have a very lean team: we can't do the analysis for each and every partner of ours. So what we tried to build is a self-service system, where people can go look at all the data, figure out how they want to slice and dice it, and download data sets for themselves. A lot of ad hoc reports can also be generated, and then there are the canned reports, because most of our partners prefer things served ready-made.

At a very high level, the key principles behind the BI and reporting modules: we are completely open source, all through; for the BI modules we use the Pentaho open source suite, and we stick to that. The system is built to be scalable, given that the data is only going to grow into a huge data set; it is very modular in nature and can scale very quickly. And since we spoke a lot about operational data: we have access not just to processed data, in terms of how many people have enrolled, but also to the real-time operational data, and the two are handled as separate modules. For big-data file handling and storage we use pretty much the current cutting-edge technologies.
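As a sketch of the kind of rolled-up, anonymized data set that leaves the warehouse, here is a minimal aggregation with small-cell suppression in Python. The field names and the suppression threshold are assumptions for illustration, not UIDAI's actual rules.

```python
from collections import defaultdict

def rollup(rows, min_cell=5):
    """Aggregate individual rows to (state, district) counts and drop
    cells small enough to risk re-identifying individuals."""
    agg = defaultdict(int)
    for r in rows:
        agg[(r["state"], r["district"])] += 1
    return {k: v for k, v in agg.items() if v >= min_cell}

rows = ([{"state": "KA", "district": "Mysore"}] * 6 +
        [{"state": "KA", "district": "Udupi"}] * 2)
published = rollup(rows)
# Only the Mysore cell (6 rows) survives; the Udupi cell (2 rows) is suppressed.
```

The published data set thus contains only aggregates that clear the threshold, never individual records.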
Data security and privacy are extremely important for us, so we have standard physical and electronic security. From a privacy perspective, within UIDAI there are of course many safeguards, but on the reporting and analytics side specifically we have a data council that reviews all the data that is shared and ensures no personally identifiable information appears in any of the reporting data sets; on top of that, a large amount of anonymization and aggregation is done, so that only specific rolled-up data goes out.

Now for the second part, about Aadhaar itself. I guess most of you are aware of this, but we are building the largest biometric system in the world. The key point is that we are trying to provide online identity to all residents, based on one identity: you get one number, and that number stays with you for a lifetime. From an open-data perspective, our data is accessible via the online authentication service, which we cover in the next slides, and via what was just described, the BI data accessed internally and externally through our data portal.

Aadhaar authentication addresses the following question. When a person enrolls with us, they tell us who they are, and we ensure they are unique; authentication is then about confirming that identity, so that when they claim it somewhere, somebody can validate it. One of the key principles is that we respond only with a yes or a no; we do not give out any personal information during this process. The other part is that it is online. We support multi-factor authentication using biometrics, PIN and OTP, and of course combinations of these, over a range of protocols and devices. The API spec is public; we encourage you to go look at it, use it, and send us feedback. It's all on our website, and I think this presentation compresses it a bit too much.
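To make the yes/no principle concrete, here is a minimal sketch of a 1:1 authentication check in Python. The record store, the field names and the OTP factor shown are stand-ins for illustration; this is not the real Aadhaar API or its matching logic.

```python
# Stand-in record store keyed by Aadhaar number; in reality this is the
# central database, and matching (especially biometric) is far richer.
RECORDS = {
    "999912345678": {"otp": "443312", "name": "ASHA"},
}

def authenticate(aadhaar_no, factors):
    """1:1 match: the number selects one record, the submitted factors are
    matched against it, and the answer is only 'yes' or 'no', never data."""
    rec = RECORDS.get(aadhaar_no)
    if rec is None:
        return "no"
    ok = all(rec.get(k) == v for k, v in factors.items())
    return "yes" if ok else "no"

# authenticate("999912345678", {"otp": "443312"}) -> "yes"
# authenticate("999912345678", {"otp": "000000"}) -> "no"
```

The design point is that the response carries no stored data: even a failed match reveals nothing beyond a single bit.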
In any case, a simple search for Aadhaar authentication will find it. The way it works is that you give us your Aadhaar number, so we are not searching your biometrics against the entire system: it is a 1:1 match. When you provide the Aadhaar number, we pull up the corresponding record; you then send us additional data, biometric or demographic; we match it and respond with a yes or a no. It's a very simple system, eminently parallelizable and scalable. We expect to support about 100 million authentications a day, over roughly a 10-hour period; that's the kind of volume we are scaling up for.

A lot of the applications are quite obvious. Government welfare programs would use it; financial inclusion and the banking systems are using it; there is a KYC requirement for telecom providers, banks and other service providers, and we expect to participate in that. KYC, sorry, I'm so used to these acronyms: it is the know-your-customer requirement, where service providers are required to know who they are servicing, and in particular it can be used from a law-enforcement perspective if they want to go back and trace somebody. This requirement applies to various service providers, and we expect that eventually the Aadhaar number will end up unifying these various systems and providing the interface. There is a white paper we have published on authentication, and a document will come out shortly on the effectiveness of biometrics in authentication. That's it; at this point I'll take questions.

A question: you hold a lot of sensitive data, and you must obviously have security systems, but are there any laws that prevent misuse of this data? Overall, India does not have significant data protection or privacy laws; some provisions exist, but from the UIDAI perspective we have ourselves put a lot of protection on this data. Basically, resident
data is not visible to anybody other than people inside UIDAI, and only for purposes such as quality assurance or an investigation. From the external perspective, the only data we give out is anonymized, with no personal information in it, and the authentication API returns only a yes or a no, nothing else.

No other APIs you're building could expose anything? There are APIs for different purposes, but the authentication API provides only a yes or no, and in any case none of the APIs provides any data outside the system. Nor can just anybody query it: to use the authentication API there is an authorization framework, and you have to be a registered user agency to access it.

Can you give us an update on the legal framework protecting this data, and its status? Obviously UIDAI is not responsible for the privacy laws that exist in India, but they affect your work, so what do you know of the status? Basically, there is no overarching data privacy law; everything comes under the IT Act and the provisions that flow from it for service providers, though this is not my field. There is a privacy bill in parliament which I don't think has moved for a while, and UIDAI also proposed a bill that would give protection to UIDAI data specifically; that too is still in parliament, neither approved nor rejected.

There are two concerns with this. One is leakage from within the UIDAI database: a rogue employee with access to data could leak it. I work with government data, and I know how this goes when it comes to pulling data out of a system: there is a framework in place that says data can only be accessed in a certain manner, but that framework also prevents day-to-day
operations, and therefore lower-level employees find various ways to get around it for their own convenience. I work with the Karnataka government and I've seen how this happens. There is a public directory at one particular IP address that has data for everything, and you're supposed to just not acknowledge it in public. That's how it works: you want data from the data center, you call them, they put it in this public folder, and you download it from there. This happens in reality, because the rules that govern access are so tight that they prevent normal data operations, and I'm concerned this may or may not be the case with UIDAI. That's one problem. The second is that even though the public API gives only a yes-or-no answer, that is an extremely powerful tool, because it lets me correlate private databases. I have a private database with an Aadhaar number in it, somebody else has one, so we merge our databases, because the Aadhaar number is guaranteed to be accurate. And that is a problem. It's not your problem, it's a privacy problem, and there should be a privacy law to prevent or regulate this kind of database merger.

I actually completely agree with you on the second point, the need for an overarching privacy law; it is a requirement, UIDAI or not, and in some sense UIDAI becomes a sort of lightning rod for all criticism related to the absence of one, so we are a little sensitive about this. The fact is that yes, we do need a privacy law that protects residents' data in general. But on the other side, a lot of our systems today are not hooked up, and that leads to a lot of inconvenience to residents, so we end up playing a balancing act: you don't want to prohibit people from merging two databases, because that can actually be used, for example, to find fraud in the PDS. When the number of ration cards in a state is more than the number of residents, someone is going to have to go
look through databases to figure out who's real and who's not. So somewhere in there is a balancing act that has to be worked out, and it's not just about private databases. There is also the question of maturity in society with regard to privacy: today you could stand at a street corner handing out forms without saying who you are, and people will fill in all kinds of information. So we have to balance two extremes, one where people are completely free with their information, and one where we worry about people being tracked, and so on. Somewhere in there is a balancing act; UIDAI is a player in it, but we don't control it.

A non-privacy question, to move away from that. One important part of UID data is the address, and there is no standard way of writing addresses anywhere in India, so how do you manage that? One thing that is standardized is the PIN code. Has UIDAI come up with a way of doing identification based on PIN codes, how does that work, and would it be available to the public, since you've already taken the trouble?

Okay. There are two databases that exist in the country today. One is with the Registrar General of India: they have the entire administrative hierarchy of India in codified form, so you have codes for the states, districts, sub-districts, down to the villages and cities; nothing below the level of a city shows up in that system. The second database is the postal data, the PIN code hierarchy, which doesn't exactly merge with the first: they have postal circles rather than states, so a state can be divided across circles, or a PIN code can cross a state border; it just happens that way. The two don't map cleanly to each other, but we have worked with the postal department to get this mapping, and in fact the enrollment client enforces it: you type in your PIN code, and it only shows you the
villages, towns and cities that it maps to. So at all codified levels we have a separate field that ensures this data stays consistent. Anything below a city, which depending on where you live could be a ward, a neighborhood, or in Bangalore a layout, is not codified; those are free-form entry right now. That's basically the best we could do.

As for releasing it to the public, I think we will try to get that out, because it really is public data. It just comes from two sources, neither of which is UIDAI: one is the RGI and one is the postal department, and we have worked with them to bring it together, so I don't see why we couldn't release it.

The developer portal of the UIDAI: is it open only to registered users? I think you have to register at the portal, but I don't know the details. So what does it take to get registered access to the API? Right now, from the developer portal you can get the API spec itself, and there is developer access to the API, which is not the same as production access; beyond that you have to go through the process of becoming an authorized user agency. If I were to develop libraries for accessing the UIDAI authentication API in various languages and frameworks, what level of access do I need? Obviously I need to test with live data to know that my library works, but I'm not actually interested in the data, only in making sure this intermediate library works. Just go ahead and register, let us know what you need, and we will work with you to help you out.

Another question: there seems to be no concept of the person being authenticated approving who can authenticate them. That is true. Has that been thought about or debated? Is it a valid idea? Because it would solve a lot of the privacy concerns. I won't commit to that, but it's an
interesting thought, because there are two issues with it. Think of it from a scale perspective: it means that when you come in and specify your authentication preferences, we have to hold a sort of preference profile for every person in the country, which becomes a very large-scale exercise in itself. And second, a lot of the time you don't necessarily know who is authenticating: you swipe a credit card at a shop and the charge shows up under a different name, because there is a lot of federation going on. Or would you go by a class of providers, where you only want banks to authenticate you but not telecom companies? There may be ways to look at it. It's an interesting idea; I'm not sure of its practicality, but the technology is going to improve, so it's worth looking at.

Are there applications already enabled and using this? So far we have only one field application, which is for financial inclusion. You might have read about NREGA payments being made on the basis of Aadhaar: we have enabled business correspondents, working with certain banks, through whom people can withdraw funds from their bank accounts using Aadhaar authentication, and a few states are doing this. The other one coming up is in Mysore, where Indian Oil is running a pilot.

Two questions. The data council sounds very interesting; I would like to know what kind of cases have come before it. And second, what kind of servers is the data stored on? In government data centers there are rules governing how servers can be used; if the data must be non-modifiable for 10 years, do you have those kinds of servers? On the first: the data council is specifically about approving which data sets can be made public, so there are really no cases per se that come before it. On the second: basically, once we decide that a data set is public, it will eventually be stored in a place where you can access it, so really the actual
storage and servers are a separate matter; it's the rules governing the data set that matter more, in terms of how long the data lives, and I don't know those details right now. The council operates at a high level: we spoke about the reporting that comes out and is accessed by the ecosystem, and the council makes sure the data is sufficiently anonymized, that the basic rules are followed, and finally decides which of these metrics is published to the ecosystem from a measurement and operations standpoint. That's the front end; the back end, the actual storage, is what I was asking about. We haven't hit that yet.

One last question. Actually, I don't have a question, I have something to add to what he said, so if anybody else has a question they can ask. Okay, one last question: if a citizen has some problem with the way their record is captured, like with a credit card company, where sometimes a mistake is made and you are stuck with it, suppose something goes wrong, how can an Indian citizen rectify it? What is the process? We have an update process which is not yet rolled out to the field; it will be rolled out in the next refresh cycle, in a couple of months. In that cycle we will roll out a process by which any resident can come and update their data, and that will take care of it.
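Going back to the address question from earlier: the PIN-code-constrained entry that the enrollment client enforces can be sketched as a simple lookup, where the PIN code keys into the merged RGI and postal mapping and the form offers only localities belonging to that PIN. The mapping entries below are invented examples, not the real directory.

```python
# Invented PIN -> locality entries; the real mapping comes from the RGI
# administrative hierarchy merged with the postal PIN code data.
PIN_TO_LOCALITIES = {
    "560001": ["Bangalore"],
    "571401": ["Mandya", "Keragodu"],
}

def allowed_localities(pincode):
    """Localities the client may offer for a given PIN code."""
    return PIN_TO_LOCALITIES.get(pincode, [])

def validate_address(pincode, locality):
    """Accept the address only if the locality belongs to the PIN code."""
    return locality in allowed_localities(pincode)
```

Constraining the choice at entry time, rather than validating free text afterwards, is what keeps the codified levels of the address consistent.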