My name is Pramod Verma and I was the chief architect for Aadhaar. With me is Raghu, who has been a principal architect on the Aadhaar project. We are going to talk mostly about the technology aspects today. It is a very large topic: this is one of the largest projects in the world, and it is going to be the largest identity system in the world. We cannot cover all the benefits and everything else today, so we will try to stick with the technology part, because that is what you are here for, and we can share how we built the system from scratch to reach 200 million people; we have 200 million people in the database now and it is growing fast.

Let me very quickly give you the context of where we are. We are a country of 1.2 billion people and about six lakh villages, and some of the numbers matter: 60 percent live under 2 dollars a day, only 3 percent pay income tax, so all of you are probably part of that privileged 3 percent, and less than 20 percent have any real access to banking, so you can see the disparity. Interestingly, we have 800 million mobile connections; that is the one thing that really took off, and everybody figured it out, so I think we underestimated people's ability to use technology, which is a good thing; people just use it once they see value in it. Another interesting thing is that about 200 to 300 million people are moving around the country, a lot of migrants, including many of us who are not originally from Bangalore but live and work here, which means we constantly have to establish our identity.

Our government spends about 300,000 crore rupees, tens of billions of dollars, every year on direct subsidies, and all of those subsidies require identity: NREGA wages, Janani Suraksha Yojana for women, all kinds of old-age pensions, and so on. A lot of money goes out, and it is effectively paying a set of people who do not necessarily have unique identities; across so many programs, roughly 30 to 40 percent leaks through ghost identities, duplicate identities and non-existent identities, and it is a very common thing. So the vision was to create a common national identity, because we do not have one that works for everyone. We have the PAN card, but only about 10 percent of the country has a PAN card and only 3 percent pay tax; the election (voter) ID is the most common document, but it is not really usable as a general identity card.
The idea was to make it biometric-backed, because the purpose was to solve uniqueness. That is the challenge: how do you uniquely distinguish one person from another? Names and addresses don't really help. The strategy was to create an identity platform; it was very critical that the program did not take on all the features of identity applications. The applications are left to every one of you: several of you are writing applications in the cloud, people may be writing pension or school scholarship programs, attendance systems, banking and financial systems, payment systems, and all of them require establishing an identity and verifying an identity, which today we do all the time by handing over copies of PAN cards and so on. So we wanted to create the platform with an open API, accessible to many applications. We still don't have critical mass, we only have 200 million people, but once it reaches critical mass, your application can ask online, "what is your Aadhaar number, I will verify you right now", and that becomes very valuable.

The system is broken into two primary modules. One is enrollment, which happens once in a person's life: you come to one of the touch points on the ground and give your demographic data. We collect very minimal data, deliberately, for privacy and other obvious reasons; the purpose was only to create an identity system, so we limited ourselves to name, date of birth, gender and address, and optionally a mobile number and email. Mobile numbers are quite common; email we see very, very rarely. And biometrics: all ten fingers and both irises, so it is multimodal biometrics. We take that enrollment request, which is encrypted digital data, and run it through what is called deduplication. That is really the largest challenge: taking your identity request and running it against the entire database to ensure that you are indeed unique. If you are unique, we issue you a new Aadhaar number; if you are not, we politely tell you that you are already in the system, that you probably already have a number, and that standing in line several times does not make sense.

Authentication, on the other hand, is something you use all the time. Once you have an Aadhaar number, when you walk into a bank, the Aadhaar number is today accepted as meeting the KYC, know-your-customer, requirement for opening an account, and you can get a SIM card, and so on. People who have no other identity find it very valuable to go with Aadhaar and say "I am who I say I am, I am Pramod", and the question is how the system verifies that identity claim; that is what authentication is all about. Authentication is a very lightweight open API that the applications you write can use to verify someone's identity claim. It can be on the internet, it can be on a device, it can be a fingerprint, it can be OTP-based; we allow multiple factors.
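(To make the shape of that concrete: a minimal, hypothetical sketch of how a relying application, say a bank's KYC screen, might call a yes/no verification API of this kind. The class and field names are illustrative, not the actual Aadhaar API.)

```java
// Hypothetical client-side sketch of a yes/no identity verification call.
// The real API is an encrypted, digitally signed request over HTTPS and
// returns no personal data -- only whether the claim matched.
public final class AuthClient {

    /** What the relying application supplies: the claimed number plus one factor. */
    public record AuthRequest(String aadhaarNumber,       // mandatory: makes it a 1:1 match
                              byte[] fingerprintTemplate, // optional biometric factor
                              String otp) {}              // optional OTP factor

    /** What comes back: a yes/no and a transaction id for the audit trail. */
    public record AuthResponse(boolean verified, String transactionId) {}

    public interface Transport {                          // stands in for the HTTPS call
        AuthResponse send(AuthRequest request);
    }

    private final Transport transport;

    public AuthClient(Transport transport) { this.transport = transport; }

    public boolean verifyClaim(AuthRequest request) {
        AuthResponse response = transport.send(request);
        return response.verified();                       // the caller learns nothing else
    }
}
```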
Now let me move quickly to the technology side; forty minutes is very little time to talk about a system like this. When we started, since this is such a large project, we wanted to ensure that as an architecture team we put together a set of design principles by which we would design the whole system. Some of them: design for scale, for obvious reasons. We have 1.2 billion people, and in February, when we were doing deduplication with 18 crore, that is 180 million, people in the database, we were doing about 180 trillion biometric matches every day. That is a lot of compute, and the bigger the database grows, the more compute you have to do to decide whether someone is unique; of course there are smart filters you apply to reduce the matching set to a much smaller number, but the compute is still very large. Authentication, on the other hand: if even 10 percent of the country authenticates every day, that is 100 million authentications, 100 million API calls a day, and maybe 200 to 500 million API calls eventually. So scale was very, very important.

The next principle was open architecture. For a government project it was critical to bring in openness, for multiple reasons. It is a very large program, we wanted to be vendor neutral, and we did not want to lock into proprietary technologies, so we stuck with open systems. These principles were written down almost like a Bible; for everything we chose, we would look back and ask, does it adhere to these principles? One was open standards to ensure interoperability: we used existing standards wherever we could, and where we could not find them we created them, so we have a set of standards for biometric data capture and biometric device interoperability which nobody in the world had done so far, because most of the time people just pick a vendor and go with that technology. We had to define those APIs and publish them on our website, it is all on the website, so that multiple developers and multiple vendors can comply with them, and we have a choice to make rather than being stuck with one technology vendor.

Then use of open source. Open source is a very powerful thing, and that has been reiterated in all the talks here; it gives us flexibility and openness, and it also helps a lot with security, which is the next topic. Especially in a government program, it is very important to know that what is going into the system is completely open, with millions of developers like you looking into it and making sure there is no malware or hidden code. And data security is what matters most to us: it is biometric and personal data, so the entire system, from the moment of capture in the field in the villages, in memory and on disk throughout, uses PKI encryption; everything is encrypted data. That imposed some limitations on compute, because we cannot just crawl through the data and do text searches when it is all heavily encrypted, but we had to balance security and data privacy against what is convenient for the architecture.
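(As an illustration of that capture-time encryption: a minimal hybrid-encryption sketch, assuming the common pattern of protecting the packet with a fresh AES key and wrapping that key with the receiving authority's RSA public key. The actual packet format and key management are not shown and are assumptions.)

```java
// Sketch: encrypt an enrollment packet at the point of capture.
// A fresh AES-GCM key protects the (large) packet; the AES key itself is
// wrapped with an RSA public key so only the data centre can unwrap it.
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.PublicKey;
import java.security.SecureRandom;

public final class PacketEncryptor {

    public record EncryptedPacket(byte[] wrappedKey, byte[] iv, byte[] ciphertext) {}

    public static EncryptedPacket encrypt(byte[] packet, PublicKey recipientPublicKey) throws Exception {
        // 1. Fresh symmetric key for this packet.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey aesKey = keyGen.generateKey();

        // 2. Encrypt the packet itself with AES-GCM (authenticated encryption).
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, aesKey, new GCMParameterSpec(128, iv));
        byte[] ciphertext = aes.doFinal(packet);

        // 3. Wrap the AES key with the recipient's RSA public key.
        Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        rsa.init(Cipher.ENCRYPT_MODE, recipientPublicKey);
        byte[] wrappedKey = rsa.doFinal(aesKey.getEncoded());

        return new EncryptedPacket(wrappedKey, iv, ciphertext);
    }
}
```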
Of these principles, design for scale is the one I want to dwell on today, since this conference is about scale. One of our architecture principles was scale-out: we were not going to get into a scale-up architecture anywhere. That means I am not going to buy a single big component or a single big database and expect the database to solve my computing problem; that is a bad idea, and in twenty-plus years of doing architecture I have learned it is just not the right approach. Compute has to live outside the data store. We had to make sure every piece of the puzzle scaled out, so that we could add more nodes and more servers rather than scale by buying a big box; the problem with the big box is that once you buy into it, the annual maintenance contracts and everything else mean you cannot get off it. So it is very critical to use commodity hardware. We run completely on commodity hardware, simple 64-bit Linux blade servers; we buy cheap blade servers, plug them in, and the architecture scales very nearly linearly, including the data store.

The fourth point is very critical. As Ashok just said, you always end up seeing the bottleneck at the database, so it was critical for us to ensure that from day one the application was designed for sharding and linear, horizontal scalability. Next, no single point of bottleneck: as an architect you have to look at every component and ask, if this breaks, will the system stop working, and how do I make sure every component has redundancy: data redundancy and compute redundancy, which are two different things that a lot of people mix up. Data stores should be dumb stores; whether it is an RDBMS, Hadoop or any other storage, use it as a dumb store and take the compute out of it. That actually makes things easier, because you are not expecting the database to solve all your business problems through SQL; you are building that yourself, outside, from first principles.

And things fail; that is something we will reiterate a hundred more times. Everything fails, from software in production to blade servers: we have a few thousand blades, twenty thousand-plus cores, and they just burn; servers burn, storage dies. You have to expect failure, so you architect components that are built, not to fail, but to handle failure, and that handling has to be in the application; you cannot expect a vendor or a product to somehow give you failure handling, it is a bad idea to think that way. Then asynchronous processing: separate things as much as possible into multiple steps, repeatable, retryable pieces of the puzzle. And idempotency, which Ashok also talked about, is very key: all our components are built to be idempotent, which means if something fails, no problem, we can restart from where we left off and retry, and we use compensating transactions to undo things if something has gone wrong; that is handled at the application level.
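(A minimal sketch of that idempotent, retryable-stage idea, with illustrative names; in a real deployment the completed-ID bookkeeping would live in a durable store, not in memory.)

```java
// Idempotent stage: a crashed node can simply re-run the same work item
// without double-processing it, and a failure triggers a compensating action
// before the queue retries.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public final class IdempotentStage {
    private final Set<String> completed = ConcurrentHashMap.newKeySet();

    public void handle(String packetId, Runnable work, Runnable compensate) {
        if (completed.contains(packetId)) return;    // replayed after a failure: no-op
        try {
            work.run();                              // the actual stage logic
            completed.add(packetId);                 // durable store in practice, not memory
        } catch (RuntimeException e) {
            compensate.run();                        // undo partial effects, then let the queue retry
            throw e;
        }
    }
}
```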
Just to give you the volumes before I hand over to Raghu: our target was 600 to 800 million UIDs, Aadhaar numbers, in four years, and we are cruising fairly smoothly towards that. We wanted to hit 1 million enrollments a day within a year, and we did: since October last year we have been doing 1 million enrollments a day, so every day we get 1 million people to process and give an Aadhaar number to. As the database grows, the compute keeps going up, so we had to keep scaling without an architectural rewrite; that was very critical. In February-March we were doing about 200 trillion biometric matches every day. Interestingly, each enrollment packet we get is about five megabytes, because of the biometrics, the encryption and a lot of metadata, and that size was very tricky for us. You are collecting this data in villages and you have to transport that much data, and it is not an online transfer; you cannot put in a data card and expect hundreds of 5 MB packets a day to flow into our data center, the connectivity just is not there at that scale. So we had to do all kinds of offline data collection, while still ensuring security; the data is heavily encrypted and signed, but it takes time to reach us. A million a day means about 5 terabytes of incremental data, and every day we process about 30 terabytes of I/O in the data center. And of course there are lifecycle updates, address changes as you move, which continue forever, and new births will probably continue forever; our population may peak, but we have about 70,000 births a day or something like that.

On top of that there is a lot of process data. Architecturally we were extremely heavy on metadata: we collected so much metadata about the system from enrollment that we knew which operator was doing what, how much time each operator spent on which part of the screen, by village, by person, by machine. It is lovely for people who love data, because there is no personal data in it, it is just process data: how many people tend to correct their name several times, how much time people spend in the demographic fields versus the biometric capture. Everything is captured, we can measure it, and we give continuous feedback through our portal to our operators, so every operator knows how they are doing in comparison with other operators, and the moment you tell people how they compare, they naturally tend to improve.

Authentication, on the other hand, is a very different problem. Enrollment is a large 5 MB encrypted file compared with everyone to find duplicates; that is a throughput problem, and nobody is waiting on it, because we do not issue the Aadhaar number in real time; we collect the data and you get your Aadhaar some days later, sometimes with delays, so it is a batch, throughput-oriented problem. Authentication is a yes-or-no live call. It is an online service we offer to authentication user agencies such as banks, who subscribe to the service, and when a resident comes and says "I am Pramod" and puts a fingerprint down, the answer is simply whether that resident's identity claim is right. You can never ask "what is his name" or "what is his address"; there is only a yes or a no.
So if Pramod claims "I am Pramod and I live in Bangalore", the bank can verify that claim; it is a one-way verification. And it is a sub-second problem: we have to respond in well under a second, and we are currently sized for 100 million authentications a day. All of them require a guaranteed audit, which was a little tricky: the system has to do guaranteed write commits at that volume, because of non-repudiation; you cannot later deny that an authentication happened, so digital signatures are needed. Our API is out there, you can look at it; we have at least a few thousand developers building applications on these APIs and a Google group to discuss them. Getting guaranteed audit is tricky, and it is a multi-data-center architecture, which means that if a meteor strikes one data center, you still have to guarantee the data exists in the other, so you have to do near-real-time cross-data-center preservation of the audit trail; we will talk about part of how we implemented that. The request size is very small: unlike enrollment, authentication is only about 4 KB including the digital signature, so it is very lightweight and stateless. Statelessness was critical because it lets us load-balance across any of the data centers. But because of the volume, the number of audit records just grows crazily, so we needed data structures that allow for that.

We also have analytics: all the metadata I talked about goes into our analytics module on Hive, and we will talk about some of that. The Hadoop system doing analytics has no personal data; it is completely anonymized, but we get summary measures, for example how many people in a given age group are doing authentication, so we understand the system at a summary level. And these are the open APIs we have published: the core authentication API, the best finger detection API, one-time PIN (OTP), so if you are on the internet you do not have to use biometrics, you can use an OTP if your mobile number is in the system, and the biometric device APIs and biometric SDK APIs. These are all published on our website and a lot of developers are working with them. I will let Raghu talk about some of the implementation, and then we will come back and conclude.

All right, thanks. I will try to translate some of what we discussed as principles into running code, and in doing that I will try to make it relevant, so that each of you can take back the learnings we had and hopefully apply some of them to the programming patterns and challenges you face. Starting from the principles: on the enrollment and authentication sides, if you translate them into kinds of workloads, enrollment is an asynchronous, very batch-oriented workload where no user is waiting on it; authentication, on the other hand, is a workload where someone is waiting, so it is a very synchronous workload. You take these workloads, see how to translate the principles we established into patterns, and map those to technologies we can use. On the principles side, we wanted very lightweight application development, which means plain old Java objects: we wanted to write each of our components as plain classes and be able to manage and run them in a lightweight, custom application container.
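(A minimal sketch of that style: plain Java classes wired together and run by Spring from a simple main(), rather than deployed into a heavyweight application server. The component names are illustrative.)

```java
// Plain-old-Java-object components, assembled by a Spring application context
// and run from a simple main() -- no J2EE application server required.
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

public final class LightweightContainer {

    /** A plain class with no framework dependencies. */
    public static class EnrollmentValidator {
        public boolean validate(byte[] packet) { return packet != null && packet.length > 0; }
    }

    /** Another plain class, depending on the first only through its constructor. */
    public static class EnrollmentStage {
        private final EnrollmentValidator validator;
        public EnrollmentStage(EnrollmentValidator validator) { this.validator = validator; }
        public void handle(byte[] packet) {
            if (validator.validate(packet)) {
                // ... hand off to the next stage ...
            }
        }
    }

    @Configuration
    static class AppConfig {
        @Bean EnrollmentValidator validator() { return new EnrollmentValidator(); }
        @Bean EnrollmentStage stage(EnrollmentValidator v) { return new EnrollmentStage(v); }
    }

    public static void main(String[] args) {
        try (var context = new AnnotationConfigApplicationContext(AppConfig.class)) {
            context.getBean(EnrollmentStage.class).handle(new byte[] {1, 2, 3});
        }
    }
}
```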
Now, the workloads we had, and the kind of system we were trying to build, did not fit conventional enterprise applications, and contrary to what many of the vendors were trying to sell us, the big app servers, we decided to build our application containers ourselves and keep them very lightweight. With respect to the choice of technology, we said we do not even need J2EE: we picked Java, but we built our servers on Java Standard Edition, and to manage the runtime we used Spring. We also needed an HTTP gateway, because for the authentication API the last mile is HTTP, and out of multiple choices we picked Tomcat.

On compute patterns, we decided to move away from the conventional design pattern of pulling data up into the application layer; instead, where it was relevant, we moved compute to the data layer, data locality patterns, and we wanted to be able to distribute compute across a number of machines and to make the choice of whether it is cheaper to move processing or cheaper to move data. To distribute compute across processes, and within a process, we used SEDA, staged event-driven architecture, with a combination of Mule and RabbitMQ: RabbitMQ gives us the messaging layer, and Mule gives us a programming model that lets us sequence a number of steps across machines, or run several of those steps in parallel within the same JVM process.

Then master-worker. Hadoop, on top of raw storage, gives you master-worker with the job tracker and task tracker setup, but it suffers from high latencies, because it is constantly writing intermediate data to disk and shuffling data between the map and reduce phases. For workloads like authentication, where our total response time has to be in the order of milliseconds, we could not afford that latency, so we wanted another master-worker pattern that was not disk- and I/O-bound, and for that we used GridGain.

Then data access types: we had different kinds of data access needs. The first is high-throughput streaming reads: during the deduplication process, about four terabytes of data coming into the system is read about four to five times a day, which is 25 to 30 terabytes of I/O per day. That very high-throughput streaming is used in the biometric dedup and on the analytics side of the system, where we used HDFS, and for analytics, Hive. The next is high volume with moderate latency, which is the workflow: enrollment goes through a number of stages and we checkpoint the intermediate state, and the data after Aadhaar generation goes into a data store that is ACID-compliant, reliable, durable, and gives you transactionality, for which we use MySQL; for authentication we use a cache database built on HBase. The next category is very high volume and very low latency: needs like authentication and demographic dedup, where as an additional step in the process you can dedup a resident based on key demographic information.
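(Going back to the SEDA stages for a moment: a minimal sketch of one such stage using the RabbitMQ Java client directly, with illustrative queue names. In the real system the stage sequencing was driven through Mule on top of RabbitMQ.)

```java
// One SEDA stage: consume a work item from an input queue, do the stage's
// processing, and hand the result to the next stage's queue.
import com.rabbitmq.client.*;

public final class DedupStage {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                      // assumption: local broker
        try (Connection conn = factory.newConnection();
             Channel ch = conn.createChannel()) {
            ch.queueDeclare("enrolment.validated", true, false, false, null);
            ch.queueDeclare("enrolment.deduped", true, false, false, null);

            DeliverCallback onMessage = (tag, delivery) -> {
                byte[] packet = delivery.getBody();
                byte[] result = process(packet);           // the stage's own logic
                ch.basicPublish("", "enrolment.deduped", null, result);
                ch.basicAck(delivery.getEnvelope().getDeliveryTag(), false); // ack only after hand-off
            };
            ch.basicConsume("enrolment.validated", false, onMessage, tag -> {});
            Thread.currentThread().join();                 // keep the worker alive
        }
    }

    private static byte[] process(byte[] packet) { return packet; } // placeholder
}
```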
Those demographic needs also include search capability and a KYC kind of API, and that layer is built on top of MongoDB as the data store, with Solr for the secondary indexes. I will just pause here for a minute so you can digest the various technologies, what they were mapped to and what purpose each was chosen for, and take any questions before I move further.

We did a benchmark of 10 million authentication requests in 10 hours, so a million an hour. That was a benchmark; live authentication traffic today is only a few thousand a day, because the focus so far has been on the enrollment side, which is at 200 million, and the authentication ecosystem is still picking up. Let me repeat the next question: where is the personal information stored? The master data, my full record for example, is encrypted and stored on the file system; the demographic extracts of it, the UID record after Aadhaar generation, are stored in the MySQL database.

One of the challenges, aside from compute, was the variety of data stores and the data consistency issues that come with them. Take the enrollment packet, which is about 5 MB in size: while it is being processed there is a requirement for very high streaming read throughput, and after that we need to archive the data for seven years. In systems like Hadoop, the challenge is that you also dedicate a lot of compute along with the storage: when we sized it, storing a petabyte of archive data would have needed around 200 servers, and we could not afford to deploy that much compute for data that is rarely read. So we created an archive store on NFS, which gives moderate read throughput and fairly high read latency. For in-process or hot data, the data is stored in HDFS, which gives us high read throughput at some cost in latency, and on top of HDFS, for selected data like biometric templates, we use HBase, which gives us high read throughput at acceptable latency.

Then the MySQL data store. Our own experience is that it is not a debate of NoSQL versus SQL, or relational versus unstructured. We have had multiple instances of possible data loss from the NoSQL stores, even though they claim to keep multiple copies, and there are two pieces of data we cannot afford to lose: the raw packet, which we store on NFS, and the Aadhaar record after it has been generated. That record, along with other information like the address, is stored in MySQL; any RDBMS would have been fine, but we use MySQL. It gives relatively low latency for indexed reads and, very importantly, ACID properties for storing the data. Mongo we use for search and similar access patterns, again at lower latency, and Solr again for fast lookups. In all of this, while we did create multiple data stores, we have to deal with data consistency issues: what is live in MySQL, say the status of an enrollment, may not necessarily be immediately reflected in the Mongo layer.
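(A minimal sketch, with illustrative names, of that system-of-record versus derived-store split: the relational store is committed first, and the search stores are fed asynchronously, so their view can briefly lag.)

```java
// The Aadhaar-style record is committed to the relational store first; the
// search/secondary stores are updated asynchronously, so readers of those
// stores must tolerate slightly stale status.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public final class RecordWriter {
    interface RelationalStore { void commit(String uid, String record); }   // MySQL: ACID, source of truth
    interface SearchStore     { void index(String uid, String record); }    // Mongo/Solr: derived, eventually consistent

    private final RelationalStore primary;
    private final BlockingQueue<String[]> toIndex = new LinkedBlockingQueue<>();

    RecordWriter(RelationalStore primary, SearchStore search) {
        this.primary = primary;
        Thread indexer = new Thread(() -> {
            try {
                while (true) {
                    String[] r = toIndex.take();
                    search.index(r[0], r[1]);               // may lag the primary commit
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        indexer.setDaemon(true);
        indexer.start();
    }

    public void save(String uid, String record) {
        primary.commit(uid, record);                        // durable first
        toIndex.offer(new String[] {uid, record});          // then propagate to derived stores
    }
}
```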
That lag is something to watch out for. On the architecture side, as I said, there are two distinct workloads. The enrollment side uses SEDA-style messaging, giving us the ability to scale across a number of nodes. We maintain soft state, and that soft state is what enables recovery: any node can go down, and when it comes back it can simply retry those transactions, supported through idempotency. The authentication side is essentially an HTTP gateway, a lot simpler from a workload standpoint, but it has to provide low-latency processing. In both, all the systems are continuously emitting a lot of interesting events; these can be monitoring events or BI events, and we take these events and deliver them to the relevant data stores. Our analytics data is fed through these events emitted by the system, and the same events are also used for real-time monitoring of the system, which takes me to my next slide.

To operationalize all of this, this is a live screen grab of the NOC, the network operations center, and its operations monitoring tool. We have an SLA monitoring dashboard that tells us what it takes for the system to keep doing a million Aadhaars a day. Behind this sit about 20,000 cores, but at a business SLA level, if you look at the right corner, on that day it had generated 5,74,000 Aadhaars up to that point in time, along with the throughput, the various stages in the queue and where the bottlenecks are. The point I want to leave you with is that all this big data is nice, but to operationalize it you really need monitoring systems; you cannot throw people at the problem, and at the very least, to see the health of your system, you need these large monitoring systems.
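(A minimal sketch of that event fan-out, with illustrative names: every component emits events, and each registered sink, real-time monitoring, the analytics pipeline and so on, gets a copy without being able to stall the emitter.)

```java
// Fan out system events to whichever sinks care about them.
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public final class EventBus {
    public record Event(String type, String source, long timestamp, String payload) {}
    public interface Sink { void accept(Event e); }

    private final List<Sink> sinks = new CopyOnWriteArrayList<>();

    public void register(Sink sink) { sinks.add(sink); }

    public void emit(Event e) {
        for (Sink s : sinks) {
            try {
                s.accept(e);                 // monitoring, BI/analytics, ...
            } catch (RuntimeException ex) {
                // a slow or broken sink must not stall the emitting component
            }
        }
    }
}
```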
So, for the learnings, I will hand it back to Pramod.

Just a quick summary slide. The first learning, as Raghu said, is that NoSQL versus SQL is the wrong way to argue. There is a purpose for each of them; you have to choose the technology for that purpose, understand where its value comes from, and recognize that the maturity curves are very different. We have had some bad experiences losing data, but it is fine to choose data stores knowing what their flaws are and working around them: you can work around the flaws of HBase or Hadoop in the application layer, with recovery and retries and so on, and you get great scale, no problem at all; but some data you simply cannot afford to lose, and that learning is very critical.

Let me walk through a few more; we could pick many. One is: make everything API-based components. It sounds obvious to a lot of developers, but when we write something we often do not know exactly what we are writing, so I write everything as components, because some things get rewritten; we probably rewrote some components four or five times, threw them away and wrote them again. The problem then is regression, so you have to make sure everything is small and componentized, write an API, and write a test suite for that API, knowing that you will rewrite it multiple times.

And everything fails. This is from experience, and it does not matter what you choose: MySQL comes down, blade servers burn, storage just scrapes out all over the place, and when you have large compute, thousands of machines, and we already have 2.5 petabytes of data, it is bound to happen. The question you have to ask as an application developer is: do you depend on some technology to somehow recover your data? Probably not; that is a bad idea. You have to plan for it in advance, so architecturally you have to think through how the system recovers, retries and self-heals, and build it that way.

Security and privacy are not an afterthought. For us especially, this project called for them very heavily, and a lot of you are probably building cloud services with resident data; you will reach a situation where government rules and compliance come in, and you have to think about it from day one. Scalability does not come from one product, and it does not come for free either; for every component you write, you have to think about how it is going to scale. The good thing is that a lot of open source technology is available today to make your life easier, much better than 10 or 15 years ago, but you still have to think about how you architect for scale. If you assume homogeneity, that everything comes from one product or one vendor, you have a problem; you have to assume everything is heterogeneous, that you will have a bunch of different technologies, and the question then is how you make them interoperate, so design for interoperability knowingly. And use commodity computing: if you are on the cloud, you are already on commodity computing; we were not on the public cloud, we are on a private cloud in our own data center, so we had to procure assuming we get the best available product from the market at any given time.
That essentially means we do not assume any particular vendor or product at all; we buy whatever is cheapest and available at the time. I think that is pretty much it, so thank you very much. I don't know whether we have time for questions; it is lunchtime, so only if you are not hungry.

A question from the audience: you mentioned that the amount of data stored per person is about 5 MB, but the API lookups are only 4 KB; if I am doing, say, an iris scan, how do you get that data? Well, that is how biometrics works. The biometric images captured for your fingers, the two four-finger slaps and the two thumbs, are high-quality, very heavy images. The biometric system converts them into a unique kind of signature called minutiae; you extract the minutiae and store that, and the minutiae are much smaller. Does that go into MySQL? No. The actual data is encrypted and stored; minutiae can be re-extracted at any time. They are used only for matching: during deduplication they go into the deduplication engines' databases, and authentication is served off HBase. Authentication is always based on your Aadhaar number; in our authentication API the Aadhaar number is a mandatory field, so it is a one-to-one match: you say "my Aadhaar number is this, and I am Pramod", you put your fingerprint down, and you verify your claim.

A follow-up question: what did you do about identity theft? Knowing how the security side works, it is obviously important. We can chat over lunch too if you want, but there are a lot of aspects: the biometric itself, the fuzzy matching, how multimodal biometrics helps, as well as a constant fraud detection engine. We run fraud detection behind the scenes, both inline and offline, constantly, to catch it. Interestingly, in our country we so far see a different pattern: mostly what we call trivial duplicates rather than people trying to beat the system. You open an enrollment counter and everybody stands in line; then you open another enrollment center and the same person stands in line again, because they think there is some government scheme going on and they do not want to miss out on what their friends are getting. Deliberate attacks might come in the future as the system evolves, so it is an evolving thing and you constantly have to catch it.

Another question, related to enrollment and the biometric data: what is the percentage of failures, and how exactly do you define a failure? You take the ten fingerprints and the iris scans; I have read in the newspaper that in some cases some of the images were not captured correctly, so would the entire record be considered a failure, or how does that work? I definitely do not have time to go deep into biometrics here.
But to answer your question, we have a white paper published on our website, uid.gov.in, on the use of biometrics in our system. It is very detailed: it tells you exactly how we are using biometrics, what accuracy numbers we are getting, and how we handle what is called failure to capture, which is what you are talking about. The multimodality helps there: with both iris and fingerprints, we can deduplicate on one of them if the other is not available, and we do have people with no fingers, for example, so we have to deal with all of that. The idea is not to refuse enrollment; the goal of the system is inclusion, so we have to give an identity to everyone, even if it means exception processing. Do read that white paper, there is a lot more biometric detail in it.

The next question: I wanted to know a little more about the deployment monitoring system. We are building a recommendation engine for early job seekers in India; if you imagine the number of people who graduate every year and seek job opportunities right out of college, that is the scale we are looking at, and we have recently started growing exponentially, so I realized deployment monitoring is one crucial factor in determining the health of your system. I would like to know more about the monitoring infrastructure you are using.

That is a very good question, and it goes back to what Raghu said. If, as an application developer, you assume that the operators monitoring your system 24x7 are intelligent enough to deal with your application, you are wrong; that just cannot work. The system has to surface what is going on very quickly, and ideally not even expect people to fix the issue: every one of our components, for example, retries by itself if it fails, so there is no need for anyone to lose sleep unless something is repeatedly failing. On the monitoring side, the screen you saw is built on HTML5, very simple and browser-based. On the application nodes we have what we call an in-memory event collector; we wrote that code ourselves, we did not use an existing product, and we have a pattern for it which we will publish soon. It collects all the events in memory, accepting that it is okay to lose a few events, and this in-memory collector, roughly a singleton per node, publishes to a dashboard server through a REST API we built; the dashboard refreshes off that REST server. For infrastructure monitoring we use Nagios.
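(A minimal sketch of that in-memory collector idea, with a hypothetical dashboard endpoint: counters are accumulated in memory and periodically pushed to a REST endpoint, and losing an occasional flush is acceptable.)

```java
// In-memory event collector: one instance per node accumulates counters and
// periodically POSTs a snapshot to the dashboard server. Monitoring must
// never slow down the application, so failures are simply dropped.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public final class EventCollector {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();
    private final HttpClient http = HttpClient.newHttpClient();
    private final URI dashboard;  // e.g. http://dashboard.internal/metrics (hypothetical)

    public EventCollector(URI dashboard) {
        this.dashboard = dashboard;
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::flush, 10, 10, TimeUnit.SECONDS);
    }

    /** Called from application code; cheap enough to sprinkle everywhere. */
    public void increment(String event) {
        counters.computeIfAbsent(event, k -> new LongAdder()).increment();
    }

    private void flush() {
        StringBuilder json = new StringBuilder("{");
        counters.forEach((k, v) -> json.append('"').append(k).append("\":").append(v.sum()).append(','));
        if (json.length() > 1) json.setLength(json.length() - 1);
        json.append('}');
        try {
            HttpRequest request = HttpRequest.newBuilder(dashboard)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(json.toString()))
                    .build();
            http.send(request, HttpResponse.BodyHandlers.discarding());
        } catch (Exception e) {
            // Deliberately ignored: it is okay to lose a few monitoring events.
        }
    }
}
```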
But if you look at this screen, a business person, our chairman Nandan Nilekani for example, can walk in and see exactly what is going on; there is no technology terminology on it. We hold reviews right in front of it, asking why a particular rate is so low; every term on the screen is business terminology, how many enrollments, how many are held up, and so on, and each of those boxes can be drilled into. The screens themselves had to be custom-built; Nagios we use for all the standard infrastructure, the network components, MySQL and so on.

What is our availability target for the system? The availability target for authentication is four nines, probably heading towards five nines, though right now we are at maybe 98 percent on authentication. Enrollment, because it is a batch-oriented system, gives us a bit more luxury: the system can go down and come back up, and as long as we get a throughput of one million a day and have no backlog, that is all we care about; that is probably around 96 to 97 percent.

Next question: over the next couple of years you can expect better technologies for many of these things; how will those get plugged in, how will the system evolve? That goes back to what I said earlier. First of all, we have tried to build everything as components, very simple and mostly API-based. Even where there was no natural API we wrote one; for example, we wrote a persistence layer, you can catch up with Raghu during lunch about it, which allowed us to shift from MySQL to Mongo completely transparently, without any code change. So, from all the experience we have, we believe we have built the system in small, loosely coupled components, so that we can rewrite pieces and evolve. It is very tricky to leave things fully open, because we had to solve a business problem today, so a certain completeness is required, but loose coupling and APIs are probably the two things to keep in mind.

Another question: if a person dies, how long is the data retained? The current policy is to retain it forever, and after 100 years the data will be huge, right? We do not know what the policy will be 100 years from now, but for now it is only small data; 1 billion is not a big number, you are all talking big data here and nobody seems to blink at 1 billion these days. Supposedly the world's population peaks at around 11 billion, so it is not so bad; we will probably figure it out 50 years later, and it will be good to watch. On a related point: our purpose is very minimalistic, and it was extremely important to keep it dead simple to get the system right. That means we do not want to know your relationships, whether you are married, how many wives you have; we do not care about any of that. We care about you as an individual identity that can be authenticated. Applications, on the other hand, a banking application or a healthcare application, will care about those things; the identity system does not have to know, for now. We do have a way for somebody to report a death, but death reporting in this country is not clean at all, so for now we just mark that a death was reported, and if that number still gets authenticated, we know. We do not think cutting down the number of database records is going to make any difference, because the architecture is a pure key lookup, so there is very little difference; we are not losing sleep over it, let me put it that way.
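(Going back to the persistence layer mentioned a moment ago: a minimal sketch, with illustrative names, of the kind of store-agnostic interface that lets an application swap MySQL for Mongo behind it without touching calling code.)

```java
// Application code depends only on this contract, so the backing
// implementation (MySQL, MongoDB, ...) can be swapped without changing callers.
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public interface ResidentStore {
    record ResidentRecord(String uid, String name, String address) {}

    void save(ResidentRecord record);
    Optional<ResidentRecord> findByUid(String uid);
}

/** One possible backing implementation; a JDBC- or Mongo-backed one would
 *  implement the same interface. This in-memory version is just for tests. */
final class InMemoryResidentStore implements ResidentStore {
    private final Map<String, ResidentRecord> rows = new ConcurrentHashMap<>();

    @Override public void save(ResidentRecord record) { rows.put(record.uid(), record); }
    @Override public Optional<ResidentRecord> findByUid(String uid) {
        return Optional.ofNullable(rows.get(uid));
    }
}
```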
All right, one more question: what are the types of private usage of this information? Obviously government agencies will be using it, but what about private use? I really encourage you to visit the UIDAI website; we have some nice white papers articulating the usage. For example, the career site he talked about, or HR systems: at the end of the day it is identity, and the Aadhaar number should be thought of as a root identity. When you are born you get an identity, and every other identity is what you could call a derived identity: the PAN card gives you an identity for one domain, a driver's license gives you another derived identity. Today we have nothing unifying them, and hence the problems of duplication and fakes. Aadhaar is not just another identity; use it as a root identity on which other identities can work well. You can think of any system today, school admissions, pension programs, anything where people constantly have to prove who they are; that is a constant problem, and for migrant labor, those 300 million people moving around where nobody knows them, it is very tricky: they cannot establish their identity, they get hassled by the police, and so on. Identity is simply who you are, that is it, and that is what we do. By the way, this system has zero knowledge of the transactions you conduct in your life; all we do is identity, and you can verify it online. Very simple. Thank you, Pramod; thank you, Raghunath.