Hello all. In this part, we will look at what we did at Zeotap to comply with the GDPR regulations. The bulk of this development started in 2017, and while I will touch upon particular aspects developed after GDPR came into force, 70 to 80% of what I cover here is what we built before the May 25, 2018 deadline to call ourselves compliant with the GDPR regulations. Many companies at that time used a bunch of Excel sheets with a catalog of items to figure out which security and privacy controls were in place. We decided instead to invest in a more product-centric approach: we took all the GDPR regulatory requirements, reframed them as product requirements, and invested in building a reusable tech stack that we could reuse for multiple other data pipelines and data-driven products down the line. I will go over that journey in this part of the talk.

Coming to the requirements of GDPR: if I distill all the GDPR prose from a product-centric viewpoint, this is largely what it falls into. First, you need a module that can help in managing sensitive data. Fundamentally, you need to identify what the sensitive data is and how exactly you want to manage it. Management could be as simple as blacklisting all sensitive data: if it is healthcare data about a person, I will not even take it into my system. We will talk about the rules and actions later, but sensitive data management is the first and foremost thing. Then, we covered consent at length in the previous part, and consent management is absolutely fundamental for GDPR compliance. Next is how you manage personally identifiable information. It could be something as direct as a social security number, email, or phone number, or something implicitly identifying based on a combination of other criteria: an IP address, a device identifier, and a mobile identifier together can act as PII for a particular person. Next is user information. You need a specific entity to manage the whole of the end user's information: the user is stitched to identifiers, and GDPR grants a bunch of user rights, such as the user asking to be deleted from the system, or asking "what data do you have about me, give it to me within the deadline", downloading all that data, or asking for it to be ported. To serve all these rights you need to manage your end users' information. Then access management is another product capability we identified. It is not explicitly spelled out in GDPR, but we wanted it because if you go through a downstream audit, you need clear access management policies showing who is accessing the data. Then come the auditing requirements. Finally, based on our data-as-a-service business, we identified a couple of additional requirements that could help with privacy. For example, privacy is now much more formalized: there is something called re-identifiability, where by combining multiple cohorts of various kinds of data you can re-identify a particular person.
We wanted to minimize that risk, so we set up something like a minimum cohort size below which data cannot be pushed out of the system. Data retention is handled via TTLs: for example, if you are collecting cookies as identifiers, they carry a TTL you need to adhere to, and the user himself can give guidance such as "do not retain my data beyond 90 days", which we need to comply with. From a security perspective there are the obvious items: when do you use one-way hashes, when do you use SHA-256 versus MD5, and what encryption semantics need to be in place. For all of these requirements, you need modules that can potentially run on the client side, on your server side, or while pushing data downstream, as per the downstream system's requirements. Then, as I mentioned, if you have an existing set of historical data assets that you need to make compliance-ready, you always need to invest in a one-time cleanup of the data. You would have identified multiple things based on the seven or eight line items I listed above, but cleaning up your existing data sets is a process on its own: you need to run them through the cleanup process once, based on all these requirements.

From the product requirements, the conceptual model divides nicely into three layers, and whatever I showcase in the blue box becomes a first-class citizen in your technical architecture; call them entities. You have the rules layer, which governs what is going to happen; the processing layer, which can be reused across multiple data pipelines; and the logical layer, which holds the data assets in various forms: the plain data asset, the user data asset, the audit assets, and the consent asset. All of these logical entities become first-class citizens; the processing entities, the deletion compliance processor and the TTL processor, become first-class entities; and the policies must be extensible: if tomorrow some law changes, you need a system where you can extend or morph your policy to adhere to it. This is how it was conceptually segregated, and we will see from a tech perspective how these all play together.

Coming to the technology end of things, this is going to be a bottom-up presentation: I will talk about the individual components we invested in and what was developed, then build up to what you could call a reference architecture. It is a little old now, developed in 2018, but it still holds good; we will start from the bottom with the different components and build up to the final combined architecture. Another thing: I am going to cover data as a service first, because that is where we invested first, and in the following part I will cover what changes we had to make to comply for the SaaS (software as a service) business as well. The fundamental differentiation: data as a service works on third-party data, while the SaaS business works on the customer's first-party data.
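Before diving into the components, here is a minimal Scala sketch of how those first-class entities from the conceptual model could be represented; all of the names (Policy, ComplianceProcessor, DataAsset, and so on) are my illustrative assumptions, not Zeotap's actual types:

```scala
// Minimal sketch of the three conceptual layers; all names are illustrative.

// Rules layer: governs what is going to happen.
sealed trait Action
case object Drop extends Action
case object Alert extends Action
case object Nullify extends Action

trait Policy {
  def evaluate(record: Map[String, String]): Option[Action] // None = compliant
}

// Logical layer: the assets the system operates on, each a first-class entity.
sealed trait Asset
final case class DataAsset(id: String, records: Seq[Map[String, String]]) extends Asset
final case class UserAsset(userId: String, linkedIds: Set[String]) extends Asset
final case class ConsentAsset(userId: String, purposes: Set[String]) extends Asset
final case class AuditAsset(entries: Seq[String]) extends Asset

// Processing layer: reusable processors wired into any data pipeline.
trait ComplianceProcessor {
  def process(asset: DataAsset, policies: Seq[Policy]): (DataAsset, AuditAsset)
}
```

The point of the separation is exactly what the layering suggests: policies can change (a new law, a new threshold) without touching the processors, and the processors can be reused across pipelines without knowing which assets they will see.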
Now, first and foremost, let me reposition this at 2017. Today there are very specialized companies doing data catalogs, and you have Forrester Wave, Gartner, and G2 references for all of them, but in 2017 there were only one or two open-source data catalogs, not very mature, so we ended up building something that helped us achieve our use cases internally. On the right-hand side I have put the tech stack on which we built it: a plain RDBMS and Elasticsearch, exposed as a microservice and a library. It had capabilities around registration, onboarding, and updates during processing, and it evolved to accommodate extensions such as quality metrics and the verification semantics that can be added. The last evolution was making the catalog reusable as a SaaS component, where people can bring in their own catalog as well. That is roughly the journey, and the initial tech stack is what I have put on the right-hand side.

Coming to why you need a catalog: if you are doing anything in compliance, you have to invest in some level of data asset inventory control. That is absolutely fundamental to any compliance journey, whether you are going for an ISO certification or any other certification, or you want your data to comply with data protection regulations such as CCPA, GDPR, or HIPAA; you need to invest in this. A couple of basic things it starts with. Where does your data come from: who gives the data, which partner, which region is it coming from, and what are the categories of data? Data categories can be myriad: even within people-centric data it could be demographics, interest and intent, the apps they use on their mobile, their browsing history, or their healthcare data, and the sensitivity of each of these categories is different, so you need to understand which data category is coming in. The second thing is pretty obvious: forget compliance, even to run a data pipeline system efficiently you need to know what the data contains; you cannot blindly process arbitrary data. You need to know the schema, the field types, and the cardinality. Cardinality is the number of values a particular field can take: gender might be a three-cardinality item, whereas a zip code or a geolocation is a very high-cardinality item. And you need some level of expected values: for example, if tomorrow you bring in a policy saying "I do not want to process any minors' data", you need to restrict your expected values, which can act as a validation when the data flows into your system. These are a couple of the things we created. The next thing is how you describe the data. Some data is raw; some is inferred through heuristics, a machine learning pipeline, or some analysis you have run on top of it. You can call them inferred, calculated, or derived attributes, whatever you like, but you need to know the provenance of that particular data: whether it is in raw format or inferred format.
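As a rough illustration of what one catalog registration record could carry, here is a minimal Scala sketch; the field names (partner, region, cardinality, expectedValues, and so on) are my assumptions about such a record, not Zeotap's actual schema:

```scala
// Hypothetical data catalog registration record; field names are assumptions.

sealed trait Provenance
case object Raw extends Provenance       // delivered as-is by the partner
case object Inferred extends Provenance  // derived via heuristics / ML

final case class FieldSpec(
  name: String,
  fieldType: String,                   // e.g. "string", "int"
  cardinality: Long,                   // e.g. 3 for gender, very high for zip code
  expectedValues: Option[Set[String]], // acts as validation, e.g. excludes minor age bands
  provenance: Provenance
)

final case class CatalogEntry(
  assetId: String,
  partner: String,    // who gives the data
  region: String,     // where it comes from, e.g. "EU"
  category: String,   // e.g. "demographics", "intent", "healthcare"
  schema: Seq[FieldSpec],
  registeredAt: java.time.Instant
)

// Validation at ingestion: reject values outside the expected set (if declared).
def validate(spec: FieldSpec, value: String): Boolean =
  spec.expectedValues.forall(_.contains(value))
```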
The next item is when you got the data. This becomes very important if you are running a data exchange: you need to know when you sent the data and which version of the data you sent, and for that the timestamping is very important. Old-school data warehousing people will know the concept of bi-temporal timestamping, where every data point carries a validity period, a start-valid date and an end-valid date. That is still very relevant today if you are running any sort of data exchange within the company. For versioning itself there are now stellar tools like dbt, which bring a GitHub style of semantics into data as well.

Where a data point came from, the lineage, is something quite unique to Zeotap. The reason we invested in lineage even before compliance is that we had a unique problem: we had a hundred-plus data partners giving us data, each contributing to the knowledge of a specific user. For example, one data partner knows about a user's intent while another knows only about his demographics, and we need to know which data partner contributed which knowledge, because downstream, when we use the data, we need to give attribution back to that particular source of knowledge. That is why lineage became important to us. The second part of lineage is conflict resolution. One data partner can say this particular user is male, based on whatever derivation strategy they have, a direct mapping which could be probabilistic or deterministic, while another data partner says the user is female. What is the correct answer? That is the conflict resolution semantics, and it has a whole machine-learned pipeline behind the scenes in terms of how it operates and how we attribute the data. It also has a sort of priority queue within the system where we give compliance the priority. For example, a data partner could be rated very low quality in my data science pipeline due to various aspects (it is a moving spectrum), but if that partner says a particular user is 16 years old while all the other high-quality data partners say he is 21, then, since by our compliance rules I have decided not to process any minors' data for the third-party business, I do not care that it is a low-quality data partner: I give priority to compliance and I just nuke the data as it comes in. I block it at the first level, so it never moves further into the system. The attribute level is pretty obvious: for each attribute we need to know who contributed that particular piece; as I mentioned, one partner can contribute intent and another interest. So that is the data catalog and lineage system that was built.
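Here is a minimal sketch of that compliance-first conflict resolution idea, assuming an invented rule (block any record asserting an age under 18, regardless of the partner's quality score) and invented names throughout:

```scala
// Hypothetical conflict resolution where compliance outranks partner quality.

final case class Claim(partner: String, qualityScore: Double, attribute: String, value: String)

// Compliance check runs first: if ANY partner claims the user is a minor,
// the record is blocked at the first level, whatever that partner's quality.
def violatesMinorRule(claims: Seq[Claim]): Boolean =
  claims.exists(c => c.attribute == "age" && c.value.toIntOption.exists(_ < 18))

// Only if compliance passes do we resolve conflicts by partner quality.
def resolve(claims: Seq[Claim]): Option[Claim] =
  if (violatesMinorRule(claims)) None // nuke: data never enters the system
  else claims.sortBy(-_.qualityScore).headOption

val claims = Seq(
  Claim("partnerA", 0.9, "age", "21"),
  Claim("partnerB", 0.2, "age", "16") // low quality, but still blocks the user
)
assert(resolve(claims).isEmpty)
```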
Then comes the evolution. It is an evolving story: when I talk about SaaS, all of this will fall into place in terms of why it evolved the way it did, and it can be taken as an evolution path for any data catalog. Basically, what you start with is: what are my data assets, what is the schema, what do they contain? Second comes lineage; it may not be a priority for all companies, but with governance rules picking up, lineage is becoming very important, along with the raw-or-inferred distinction I just talked about. Third is where the data is stored. Data can be stored in multiple locations based on its version: you could have one copy in your data lake, another copy in an HBase-style system, and a truncated third version in a very fast lookup store, so you need to know whether a given data asset is split and stored across various places. Then come the higher-order problems. What purpose are you using the data for? You might have mapped these purposes up front, and there are two aspects to it. Internally you might frame purposes in various ways: this asset is for analytics, exclusively for management reporting; or it serves analytics both for management reporting and for marketing; or its purpose is mainly billing my customers. You need to understand, for each of your data assets, the purpose of its usage. The reason purpose becomes important is that you need to backtrack it when it comes to the consent and preference management system: you need connectivity in the sense of "this data is currently used for all these purposes; when the consent flows in, I need to cross-check the two sets of purposes and figure out whether I can really use the data or not".

Then the next question is: who owns it? There can be multiple kinds of ownership. There will always be a creator of the data, and ultimately, as per any of the government regulations, the customer is the real owner of the data; but ownership within your organization is about who controls access to which data points, and the owner has the additional responsibility of assigning correct roles. For example, in line with the data mesh architecture many organizations are moving to, if I am running the sales department I take ownership: all this sales data is mine, this is the customer data attached to my sales data, and I approve access for the marketing team or the finance team. That is the ownership concept; it would take more time to cover fully, but this should help in understanding what ownership of data within an organization means. "Who uses it" is a separate thing, which can be controlled by access control systems such as role-based access control. And the last thing, which is very important and I would say one of the highest levels of evolution if you can achieve it: even within Zeotap this is achieved only in bits and parts, with various rules helping, but not as one combined holistic system where I can go into a metadata system holding all the data assets and get a single-window view saying these are the roles and these are the rules governing the thousands of data assets strewn across the organization. So this is the evolutionary path along which data catalogs are growing, and there are some fantastic companies investing in these areas that now give you out-of-the-box tools to achieve it.
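Coming back to the purpose step above, that purpose-versus-consent cross-check can be pictured with a tiny sketch; the purpose labels and function are hypothetical, purely to show the intersection logic described:

```scala
// Hypothetical purpose-vs-consent cross-check.

final case class AssetPurposes(assetId: String, purposes: Set[String])
final case class UserConsent(userId: String, consentedPurposes: Set[String])

// A purpose is usable for this user only if the user consented to it.
def usablePurposes(asset: AssetPurposes, consent: UserConsent): Set[String] =
  asset.purposes.intersect(consent.consentedPurposes)

val asset   = AssetPurposes("asset-42", Set("analytics", "marketing", "billing"))
val consent = UserConsent("user-7", Set("analytics", "billing"))
// "marketing" drops out: the catalog's declared purposes are filtered by consent.
assert(usablePurposes(asset, consent) == Set("analytics", "billing"))
```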
Now, within Zeotap, what we spent time on is how we manage the policies. A policy is nothing but a rule. Say there is a rule that says a schema field is supposed to come to me in SHA-256 and it is not arriving in that format: I say drop that data set, because I took a sample of ten items, they failed, and so I drop it. That "drop" is the action we take. So the policy tells me what exactly I need to look out for, and the action is what I do based on the policy evaluation; this is how we structured the system. As for the granularity a policy operates on: it could be schema-level granularity, which applies to the complete data asset, or value-level granularity, which applies to each individual value; the retention policy again operates on the data asset as a whole. For actions we had a few out of the box: drop the data, alert on it, and nullify the data. We did not have masking for data as a service at that time, given our use cases, so we just nullified the value because we did not want that data in any form. We also added some hierarchy support, in the sense of "if this policy applies, also check this other policy; if this action is taken, also take this follow-up action". Many architects will cleanly see why this separation was done and why the hierarchy exists; it derives from basic system design principles. The right-hand-side items will come back again and again: an RDBMS and Elasticsearch powering this microservice, plus libraries. We currently have libraries in Golang, Java, and Scala; most of the data pipelines are on Scala, and the backend services are on Java and Golang, so that is where we spent the library effort. For changing the policies and actions, which I will come to in the next slide, we created CRUD APIs for the domain experts: this could be your legal people, who see a change in the regulations and want to change something, or the product experts who want to tweak something. In the table I have shown how the policy management is more or less stored; it gives you the relation between the various items.

Now, coming to the compliance catalog: you have a policy, so what is this compliance catalog? Compliance is the final granularity applied to the policy; any functional programming expert will immediately see what I am talking about here. Compliance is a runtime parameter, and remember that compliance has three dimensions: a vertical dimension, an organizational dimension, and a regulatory dimension. Whether it is GDPR or CCPA matters: GDPR might say you need to act within 72 hours, whereas India's Data Protection Bill might say 48 hours, so that becomes a runtime parameter or threshold. If you look at GDPR, the sensitive-data list is different, and the PDP, or rather let me call it the Data Protection Bill, which is the current name, has a slightly different sensitive list, so the blacklists and whitelists change. How do you accommodate your system for this extensibility? That is why we have a separate catalog for the whole compliance system. What I have put in point three there is f(policy): a function of the policy and of the runtime parameter that is supplied, which gives you the action to be actuated.
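A minimal sketch of that f(policy, runtime parameter) idea, with made-up regulation parameters (the 72-hour GDPR window versus an assumed 48-hour Data Protection Bill window, and differing blacklists; none of these values are authoritative):

```scala
// Hypothetical compliance catalog: the same policy, parameterised per regulation.

final case class ComplianceParams(
  regulation: String,
  deletionDeadlineHours: Int,     // e.g. 72 for GDPR; 48 assumed here for the DPB
  blacklistedFields: Set[String]  // the sensitive list differs per regulation
)

sealed trait Action
case object Drop extends Action
case object Allow extends Action

// f(policy, parameter) -> action: the policy logic is fixed,
// the compliance catalog supplies the runtime parameters.
def blacklistPolicy(params: ComplianceParams)(field: String): Action =
  if (params.blacklistedFields.contains(field)) Drop else Allow

val gdpr = ComplianceParams("GDPR", 72, Set("device_ip_address", "ethnicity", "religion"))
val dpb  = ComplianceParams("DPB", 48, Set("ethnicity", "religion")) // assumed list

assert(blacklistPolicy(gdpr)("device_ip_address") == Drop)
assert(blacklistPolicy(dpb)("device_ip_address") == Allow) // different list, different outcome
```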
I just gave an example of how the blacklist looks: for data as a service, the IP address is blacklisted, anything with ethnicity is blacklisted, anything with religion is blacklisted; we do not want to process any of this for third-party data access. Now, what do we mean by "use"? It serves both the backend and the UI. This device_ip_address entry, just as an example, which we do not process, may not mean anything to a person looking at it in the UI; he needs to understand it in better parlance, so for him "IP address of the user" or some friendlier terminology might better explain what it is. That is why I mentioned that this also has an extension towards creating your own catalog, which helped us a lot for our SaaS business downstream. So that is largely the compliance catalog. We have covered the policy, the data catalog and lineage, the policy management, and the compliance catalog; now this is all put together to deliver the compliance processing. Everything I have talked about so far is more or less your governance layer. Of course governance can include multiple other things, like access control as we saw in the catalog evolution, but largely, under the governance umbrella, we are storing the data catalog, the policy store, and the compliance catalog. The path catalog was, I would say, a bit unique to us, but now that many data processing pipelines are going more and more event-driven, it is reusable for anything: the path catalog is nothing but a repository of registered paths where a data asset can land, which acts as the trigger for the downstream processing. Since it was linked to the data assets we operate on, it fell under the governance bucket as well. The config service you could debate whether it belongs under governance, but we put it there because we wanted to govern the configurations we inject. By configurations I mean: you might be running a Spark processor, as in our case, or tomorrow a Flink processor, or a plain Kubernetes cluster on which you operate your data pipeline, so this mainly held the processing configurations.

Now, how exactly does the processing pipeline work? A data set lands on a registered path from the path catalog and the trigger fires. The system injects the schema-level policy, which operates on the whole data asset; it processes the asset, takes a particular action, and spews out audit logs based on the action taken. Then it iterates over each of the data points within the data asset, where the value-level policy is actuated: each record is processed with the value-level policy, again actions are taken and audit logs emitted, and at the end you get a compliant data set. This is a Spark pipeline; we have reused the same semantics downstream for some other pipelines with a different processing layer, but what I am showcasing here is primarily the Spark pipeline.
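Here is a condensed sketch of that two-stage flow in Spark; the policy types and the overall shape are my own simplification of what is described above, not Zeotap's actual code:

```scala
// Hypothetical two-stage compliance processing: schema-level first, then value-level.
import org.apache.spark.sql.DataFrame

final case class AuditLog(policyId: String, action: String, at: java.time.Instant)

trait SchemaPolicy { def check(df: DataFrame): Either[AuditLog, DataFrame] } // whole asset
trait ValuePolicy  { def apply(df: DataFrame): (DataFrame, Seq[AuditLog]) }  // per record

def runPipeline(
    df: DataFrame,
    schemaPolicies: Seq[SchemaPolicy],
    valuePolicies: Seq[ValuePolicy]): (Option[DataFrame], Seq[AuditLog]) = {

  // Stage 1: any schema-level policy can drop the whole asset.
  val afterSchema = schemaPolicies.foldLeft[Either[Seq[AuditLog], DataFrame]](Right(df)) {
    case (Right(d), p) => p.check(d).left.map(Seq(_))
    case (left, _)     => left
  }

  // Stage 2: value-level policies transform record by record, emitting audit logs.
  afterSchema match {
    case Left(audits) => (None, audits) // asset dropped; only the audit trail remains
    case Right(d) =>
      val (compliant, audits) = valuePolicies.foldLeft((d, Seq.empty[AuditLog])) {
        case ((cur, logs), p) =>
          val (next, more) = p.apply(cur)
          (next, logs ++ more)
      }
      (Some(compliant), audits)
  }
}
```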
Coming to the next item: the deletion, or the opt-out, needs separate treatment, because as I mentioned the data asset could be in multiple places. It could be in your hot store, in your archival store, or in the basic data lake, and if it is in the data lake you need to ensure the deletion is much more transactional in nature, so that you can call yourself compliant: if somebody has opted out, we have genuinely carried out the opt-out as well. (Just ignore the right-hand side; I added that slide for later and missed deleting it. We will concentrate on the left-hand side, which is the data-as-a-service business angle.) Zeotap had three different sources of opt-out or consent data. The first is global, meaning Zeotap itself: we have a website, plus iOS and Android apps in the app stores, via which a user can go and give an opt-out signal. The second is our data partners, who give data to our systems and can send us their opt-out lists. The third is the consumers, the systems we send data to; we send to approximately 70-odd systems at this point, including Facebook, Google, and the like, and they can come and tell us "do not process this particular data". Those are the three sources via which opt-out data comes into Zeotap's systems. For the global one: we have a privacy app and a website, a classic backend-API architecture into which all this opt-out data flows, and mind you, with GDPR we have up to 72 hours to comply and delete it across all of these systems. From data partners and consumers it is similar to an ingestion mechanism: it could be a cloud exchange, we also provide an API, and for some data partners we host an SFTP via which they send this at, say, a daily frequency. So the opt-out data arrives much like data ingestion.

As I mentioned, the handling is the important part, so let me explain. Global means we got an opt-out signal from the Zeotap website or the privacy app; that means I cannot hold that user anymore across any of my data assets, so I go and do a blanket nuke of that particular user. Partner-level means: I have 100 data partners, and two of them say this user has opted out of their systems. If you remember, we had already built the lineage system, so we know where that particular partner's data has percolated, and we go and nuke only those assets which came from that data partner; we do not nuke the entire user. The entire user is nuked only if that partner was the sole contributor of his data, in which case the user naturally goes away; but if the data came from four data partners and only two have opted out, those two contributions are taken off and the other two remain. Consumer-level operates similarly, just from the other side of the equation: the consumer says "I cannot consume this user's data anymore", which means we filter that user out before pushing data to that consumer. So there could be consumers A, B, and C; C says this user has opted out of its system, which means A and B still receive the data but C does not. This is all rules-driven in terms of how the flow works downstream, but those are the three kinds of handling.
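A small sketch of those three opt-out scopes, using lineage to decide what to nuke; GlobalOptOut, PartnerOptOut, ConsumerOptOut and the contribution map are invented names for illustration only:

```scala
// Hypothetical three-scope opt-out handling driven by lineage.

sealed trait OptOut
final case class GlobalOptOut(userId: String) extends OptOut
final case class PartnerOptOut(userId: String, partner: String) extends OptOut
final case class ConsumerOptOut(userId: String, consumer: String) extends OptOut

final case class UserRecord(
  userId: String,
  contributions: Map[String, Set[String]], // partner -> attributes contributed (lineage)
  blockedConsumers: Set[String]            // consumers this user must be filtered from
)

def handle(rec: UserRecord, optOut: OptOut): Option[UserRecord] = optOut match {
  case GlobalOptOut(_) =>
    None // blanket nuke: user removed from all data assets
  case PartnerOptOut(_, partner) =>
    val remaining = rec.contributions - partner // nuke only that partner's contributions
    if (remaining.isEmpty) None                 // sole contributor: whole user goes away
    else Some(rec.copy(contributions = remaining))
  case ConsumerOptOut(_, consumer) =>
    // Keep the data, but filter it out before pushing to that consumer.
    Some(rec.copy(blockedConsumers = rec.blockedConsumers + consumer))
}
```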
And this is where the whole mastered preference-and-consent object comes into play: it carries the purpose, the IDs, and all the meta information about what has happened to that particular user. Except at the global level: for a global opt-out there are no orchestration semantics, just plain deletion semantics. The consent object carries the marker ID = global, which means that whenever the data processing pipelines next run, they check all the mastered consent IDs and nuke that user from all my data assets. The TCF is a specific framework created by the IAB, the ad-tech consortium, and we adhere to it because our primary business was ad tech. I will skip the right-hand side; that covers the changes made for the SaaS side.

Now coming to the consent data flow: this is not a sample, this is the production flow of the data-as-a-service business. As I mentioned, you have the various data partners and the global consent. The first thing is to enrich the IDs. From the DaaS business perspective, my IDs were primarily the mobile identifiers, which are cookies and mobile ad IDs, plus the email and phone number hashes, and we had to cross-link all these identifiers. Say data partner A gives only MAIDs with some opt-out information, whereas the same person came to Zeotap, gave his email ID, and said "opt me out"; I need the linkages between these various IDs. I am not going to cover how the entire linkage semantics work here, but this ID enrichment is essentially ID-linkage creation happening within Zeotap; it is one of our primary businesses, so that happens there. Then we standardize the format of the whole consent data and create two things. One is the consent object, used mainly for deletion; as I mentioned, deletion is blanket, so you just need to say "delete globally" or "delete for these particular data partners". The second, if you remember the third scenario, is that during processing I need to take care not to push certain data to certain consumers, and that is where the processing semantics come in: it is more like a tag saying, for this consumer, these sets of IDs are "do send" or "do not send", a very simple yes/no tag used by the consent processing library. All of this consent data is stored as objects, with a historical archive as well, except in the case of a global opt-out. So that is how the consent data flow works.
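A rough sketch of those two standardized artifacts, the deletion-oriented consent object and the per-consumer send/do-not-send tag; all field names are assumptions:

```scala
// Hypothetical standardized consent artifacts after ID enrichment.

sealed trait Scope
case object Global extends Scope
final case class Partners(ids: Set[String]) extends Scope

// 1) Consent object: drives deletion (blanket, or per data partner via lineage).
final case class ConsentObject(
  linkedIds: Set[String], // cookies, MAIDs, hashed email/phone after enrichment
  scope: Scope,
  purpose: String,
  receivedAt: java.time.Instant
)

// 2) Consumer tag: drives processing-time filtering before pushing downstream.
final case class ConsumerTag(consumer: String, userId: String, doSend: Boolean)

// At push time, drop any user tagged "do not send" for this consumer.
def filterForConsumer(batch: Seq[String], tags: Seq[ConsumerTag], consumer: String): Seq[String] = {
  val blocked = tags.collect { case ConsumerTag(`consumer`, uid, false) => uid }.toSet
  batch.filterNot(blocked.contains)
}
```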
Coming to the other major requirement of GDPR, if you remember, there is the user rights management, all the "right to" clauses: the user's right to erasure, the right to ask for their data, the right to be forgotten, and a bunch of others; similar rights are coming in the Data Protection Bill as well. For us, from the data-as-a-service perspective, the primary identifiers were the MAIDs, cookies, hashed email, and hashed phone number. We used the same architecture as the opt-out API collection: we simply expanded that application to provide the API layer, and the website and mobile apps are available through which any user can check for this data and exercise their user rights. Because there can be a heavy inflow of "check whether my user data is available" requests and the like, we used Bloom filters heavily (I am just alluding to the tech stack in use), and all the identifiers across the board sit in a fast lookup DB for quick access. This user rights management, if you think about it, is more like a transactional thing: I need to respond to the user as quickly as possible whenever he asks, so it is served by a fast lookup DB at this point in time, while the ID asset, the profile asset, and the other assets are taken care of by the downstream processing pipelines. Those are the internal flows.

Coming to a very important thing, something that immensely helped us, even in creating audit artifacts when we were going for the ISO certification: we decided to use a common format for the audit log, and the format had to be simple enough that you can load it as an external table into BigQuery, Athena, or whatever, and run your queries. We developed libraries for distributed systems like Spark as well as for all the microservices doing any sort of data asset management, and they all use these log semantics. All the logs, wherever they originate, say from a distributed pipeline, go via YARN into a Pub/Sub or a Kafka, and at the end flow into S3. Since we are on GCS now, it is no longer S3; at that time it was S3, now it is all in GCS. A couple of examples of the log grammar I have put here: the violation type; the product code (we have a bunch of products in Zeotap); the stage in the data flow at which the violation happened; the action we took; and the timestamps of when the violation was found and when the action was taken. These can differ, because everything has some delay: the data set could have come to you in the morning but your action was taken only in the afternoon, meaning the violation technically happened as soon as the data entered your system while the action happened later; so you record that the violation was identified at this point and the action was taken at that point. It had a couple of other metadata fields as well, which I do not remember off the top of my head, but a bulk of other metadata is available along with the logs, and we defined a proper grammar on which, as I said, we could run an SQL query once the logs were loaded as an external table.
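To illustrate, here is what such a common audit log record and a query over it might look like; the exact field names, table name, and the BigQuery-style SQL are my assumptions, not Zeotap's actual grammar:

```scala
// Hypothetical audit log record matching the grammar described above.
final case class ComplianceAuditLog(
  violationType: String,  // e.g. "BLACKLISTED_FIELD"
  productCode: String,    // which product's pipeline emitted it
  stage: String,          // where in the data flow the violation occurred
  action: String,         // e.g. "DROP", "NULLIFY", "ALERT"
  violationFoundAt: java.time.Instant,
  actionTakenAt: java.time.Instant // can lag violationFoundAt (morning vs afternoon)
)

// Once loaded as an external table, ordinary SQL works (BigQuery-style shown):
val exampleQuery =
  """SELECT product_code, violation_type, COUNT(*) AS violations,
    |       AVG(TIMESTAMP_DIFF(action_taken_at, violation_found_at, MINUTE)) AS avg_lag_min
    |FROM compliance_audit_logs
    |GROUP BY product_code, violation_type
    |ORDER BY violations DESC""".stripMargin
```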
When we put this all together, this is how my data-as-a-service pipeline looks from a compliance perspective. In the center are the compliance services, and on the right-hand side are the various products running their own data pipelines: the targeting product, the Connect product, insights for audience planning; all of these consume the compliance services. On the left-hand side are all my ingestion pipelines, where data flows into my system towards my storages, and on the right are all the data-driven applications written on top of it. You can correlate this to the initial slide where I showed the logical model of compliance, the processing layer, and the storage layer: the policies and rules sit in the storage layer, and then the compliance services, the TTL processor, the compliance processor, and the routing processor are orchestrated using a compliance workflow manager. For the global opt-out you have the mobile app and the web app, then a bunch of API layers helping it: internal APIs for the admins as well as external APIs for the end users. Then there is the catalog, the audit management, and the user data services, the various microservices supporting it all. This architecture has been as-is since 2017; 2018 was the first release, and there have been a couple of iterations, but largely it has remained the same. So that was the whole product we put together within Zeotap from a GDPR and compliance perspective to achieve all of this.

Going forward, there are other important aspects. As I said, these processors are not the whole story; you need a good amount of infra and security items within your systems to adhere to compliance. If you remember, in part one I showed a circle with the security aspect, the governance aspect, and the privacy aspect going around as a full circle; this part comes from cyber security and related disciplines. One important thing is data sovereignty: it has to be delivered by infra, nobody else can give it to you; you cannot achieve data sovereignty through any kind of compliance processing or opt-out processing. At Zeotap, EU data remains within the EU; there is no cross-border data transfer. We simply took the decision that we will never do any cross-border data transfer, so data sovereignty is complete: whatever is EU data stays in the EU, and US data stays in the US. Then there is access and rights control, which is a huge topic; read up on zero-trust security and the principle of least privilege, both heavily used here. Much of the access to the production systems is based on a scheme where we grant a three-hour access window for anybody to debug and the like. The InfoSec team can really help in figuring out how to actuate this, but you need your infra and security teams to execute it on the ground. A couple of other things: the ID and profile data sets are always separated, mainly to ensure a minimal blast radius, in the sense that if somebody gains inadvertent access to the ID data set, they still do not have access to the profile data set, and vice versa, so they cannot link the two and figure out whose profile it is. In between sits an anonymous ID which only Zeotap can resolve programmatically to combine the two identifiers. This was an architectural decision taken partly from a compliance perspective and partly for the overall manageability of the data assets. And the profile data is not exactly pseudonymised; let me correct the term: it is an anonymized random ID that sits with the profile data, while all the other IDs with direct linkability to the user sit in a separate store.
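A minimal sketch of that separation; the store shapes and the anonymous linking ID generation are illustrative assumptions about the blast-radius idea, not the actual design:

```scala
// Hypothetical split of identifiers and profile data via an anonymous link ID.
import java.util.UUID

// Store 1: identifiers with direct linkability to a user.
final case class IdRecord(anonId: String, hashedEmail: String, maid: String)

// Store 2: profile data keyed only by the anonymous random ID.
final case class ProfileRecord(anonId: String, attributes: Map[String, String])

// Only code holding BOTH stores (the internal join) can re-link them;
// leaking either store alone reveals neither identity-with-profile nor profile-with-identity.
def split(hashedEmail: String, maid: String, attrs: Map[String, String]): (IdRecord, ProfileRecord) = {
  val anonId = UUID.randomUUID().toString // random, carries no user information
  (IdRecord(anonId, hashedEmail, maid), ProfileRecord(anonId, attrs))
}
```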
For the third-party data-as-a-service business, Zeotap accepts only hashed emails and phone numbers; we do not take raw email IDs or phone numbers, and if some partner is planning to send raw values, the first step from the raw landing zone is to convert them to the hashed format. All downstream processing happens on the hashed data, and the raw asset is cleared once the conversion is done, so it has a very small lifetime: converting, say, a million emails to the hashed format takes hardly half an hour to one hour on any cloud system. And with that, I will move on to part three of the talk: the reusability of these assets for SaaS, and also their reusability for the upcoming PDP, or rather the Data Protection Bill, as I keep repeating, since that is its current name; they will serve the Data Protection Bill as well.