So let's start — this time with a formal introduction, because last week I didn't do much of one. My name is Satish KS. I head engineering here at Zeotap, and my responsibilities include engineering delivery and strategy as well as aspects of technical architecture. I come with around 18 years of experience in IT and have been fortunate to spend the last eight years in the big data stack. I'm passionate about tech evolution from both the business and the core tech perspective. Another thing I thought I'd mention in this forum: I'm also passionate about my martial art, Tai Chi. Okay, that is my LinkedIn profile. Now, an introduction to Zeotap. We are a Berlin-based startup. We were founded in 2014, but operations started in 2015. We are a customer intelligence platform which enables brands to better understand their customers. We do this by providing three things: a 360-degree view of customer data, identity resolution, and — in ad tech there is a jargon called activation — we also enable activation. So this is a SaaS plus DaaS offering. We were originally a data-as-a-service company, because we were only dealing with third-party data, but given the tech stack we built, we repurposed it for a SaaS-style offering where brands can directly manage their first-party data and optionally use third-party data for further enrichment. Every data company has a theme; we are people-centric data collectors, and the data can be broadly categorized into identity data sets and profile data sets. Another style of categorization: first is deterministic — what we actually collect based on deterministic identifiers; second, based on our patented algorithms, we make inferences and derivations, which we stamp as probabilistic data and supply to our customers as well. And as I said, we are fully privacy and GDPR compliant.
We cater to close to 150 data partners, both ingress and egress. The main stacks we are integrated with, in any company, are the ad tech layer and the martech layer — that is where Zeotap sits. And GDPR by now is no longer tribal knowledge. Given the PDP bill and everything, everybody has a good, or at least decent, level of awareness, but I've still jotted down a couple of points from our early memories, because this is a two-year-old law: the EU data protection and privacy regulation became enforceable on May 25th, 2018. When it came, it had an effect across the whole organization — the business teams, the data sourcing team, product and engineering, legal, as well as security and IT. Every team had some impact from this law coming into play in Europe. A couple of broad things it talks about. One is personal data: data collected about any person which is identifiable to them, where that person is an EU citizen. Second is user rights: what rights does an EU citizen have over the data any organization collects about them? Then access rights, from both the user perspective and the internal organization perspective: what access rights and security controls need to be there? It also gives broad-level recommendations. It doesn't go as far as saying "this is the encryption algorithm you have to use" or "this is the technology you have to use", but it broadly recommends how you should handle your data, how you should transport it, how you should store it — those kinds of recommendations. And of course there are audit requirements; with any compliance regime, any law, any regulation, there are always audit requirements. And the scariest of all: it carried heavy penalties, which could be make-or-break for a startup.
If you're not compliant, you might as well shut shop and go away — that is the kind of penalties it had. It defined two categories of companies handling data, and Zeotap fell into both: we were both a data processor and a data controller. We became a data controller because internally we were stitching the device IDs of a user with their profile. They may not be directly stitched together, but we had an inference mechanism — one of our patents covers how telecom identities are stitched to the profile of a person — and by that means we also became a data controller, so we had to be compliant from both angles. Moving on: as I told you, we took a product-ish approach. Whenever these kinds of regulations come in, the first thing you'll get from your security and compliance team is a huge Excel sheet asking you to categorize all the data sets and put everything into the picture. Don't fall into that trap. Building a product is sometimes much faster than running an operation like that, and that's exactly what we did at Zeotap. When we read through GDPR through the legal lens as well as the business and product lenses, we distilled it into a bunch of product requirements, which is what I have listed here. One of the prime requirements of GDPR is handling sensitive data: data about a person's ethnicity, data about their health, their actual PII, which I have categorized separately. How do we manage that? All of this comes under sensitive data management. The second thing: by default, consent is effectively opt-out. You have to be explicit about the consent you collect; it shouldn't be an implicit consent where you take a single consent for everything as a blanket user constraint. Instead, GDPR talks about explicit consent management.
The third thing is managing PII — personally identifiable information like IP address, name, email, and phone numbers. Then user information management: what product do you give out for managing user information? The user has a bunch of rights, like the right to be forgotten, the right to erasure, the right to portability, as well as the right to understand what data you hold about them and what processing you do on it. And access management — I'm not going to cover that in much detail in this presentation, because it was moved entirely to the IT and infra folks, who put access management in place per the minimum-privilege access policy, wherever applicable. Then the auditing requirements came in as product requirements. A couple of the remaining requirements are sometimes very company-specific. For example, Zeotap has a use case where we create cohorts from profile data — a kind of segmentation or audience creation. Here we have additional requirements to protect ourselves, such as: a cohort must have a minimum size of 10K; below that, don't export the cohort anywhere outside. And TTLs: if it's a cookie, what TTL do I apply so that I don't keep a very stale cookie in my system? If it's a MAID, what TTL do I apply? Some companies will always have these kinds of custom requirements, and we had them too, so they also went into the product. Then of course PII management, and from the security perspective they'll say: if your data is at rest, or if you're transporting data, use these levels of security — you need layer-4 security, you need transport-level security, and you also need data encryption; and if it is encryption, these are the standards to use. All these kinds of rules come from the security teams as well.
And of course, when you're becoming compliant, you need to scan through all your existing data sets. This one-time requirement — a one-time bootstrap or data cleanup, as it's called — should apply to pretty much any company, so that also became a product requirement, handled in a fairly operational manner. After the product was built, we ran it first over all the existing data sets, and it became a kind of internal testbed for us to figure out whether what we had built was working or not. Moving on: if we split the product into a conceptual model, compliance always acts on a couple of entities, right? Those entities have to be given first-class citizenship across your architecture, and that's exactly where we are at. The primary entities are the user and the data sets themselves. In terms of processing, you need some kind of compliance processing which can permeate all the layers of processing in your system, across all your products as well. Then everything we talked about on the previous slide — if I want to block sensitive data, validate a sensitive-data drop rate, keep a hashing encryption, or validate the hash length — everything becomes a policy or a rule set, so that becomes another first-class entity in your product. Another thing, based on the earlier requirement: consent — if it's opted out, you need deletion. Deletion is not simple in really large data sets, especially when you're storing across multiple data layers — you might have data in GCS buckets, in BigQuery, in some other database — so the whole deletion-processor workflow itself needs focused handling, I would say. The same applies to TTLs. And the top four I have mentioned are consent, audit, user, and the data set.
So I've layered it as: what are the logical entities, what are the processing entities, and what are the rule-based entities. Based on this, we moved ahead and started creating the tech architecture. I'm going to present the architecture bottom-up: we'll see the various pieces we actually put together and finally how they combine to achieve the necessary use cases. The first thing — without which you should be genuinely afraid to call yourself compliant — is a clean data inventory in the form of a data catalog, plus a lineage system. Lineage is a bit specific to Zeotap; we had a use case for lineage even before compliance, and I'll explain it. You need to know who is giving you data: which partner, what region it is coming from, what categories of data — whether they give you only identity information, or apps data, or interest-based data, or URL-browsing data. All this categorization has to be there. Then you also need to be aware of what the data contains. What is the registered schema of a data partner? What are the field types — numeric, text, string, or values matching some regex? What is the cardinality? Zip code can be a very high-cardinality field, whereas age can be bucketed into a four- or five-cardinality field, and gender might be just three cardinalities. What are the expected values or expected regexes, and so on? That is another primary thing you need. And how do you describe the data: is it a raw data set, did you infer it, is it derived by yourself — and if derived, how? Then the next thing is the version. Data is always flowing from system to system, right?
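As a rough illustration, a catalog entry covering the items above — partner, region, categories, provenance, schema fields with types, cardinality, and expected regexes — could be sketched like this. The field names are my assumptions, not Zeotap's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FieldSpec:
    name: str
    dtype: str                          # "enum", "text", "numeric", ...
    cardinality: Optional[int] = None   # e.g. gender -> 3, age bucket -> 5
    pattern: Optional[str] = None       # expected regex for raw values

@dataclass
class CatalogEntry:
    partner: str
    region: str
    categories: list                    # e.g. ["identity", "interest"]
    provenance: str                     # "raw", "inferred", or "derived"
    version: int
    fields: list = field(default_factory=list)

entry = CatalogEntry(
    partner="partner_a", region="EU",
    categories=["identity", "interest"], provenance="raw", version=3,
    fields=[
        FieldSpec("gender", "enum", cardinality=3),
        FieldSpec("age_bucket", "enum", cardinality=5),
        FieldSpec("zip", "text", pattern=r"^\d{5}$"),
    ],
)
```

In the talk's architecture, entries like this would live in the RDBMS/Elasticsearch-backed catalog and be queried by both microservices and the Spark libraries.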
A downstream system might be acting on a completely different version from what the upstream system is currently handling. So the whole data flow should be aware of which version it is acting upon at any point in time — the version and timestamp of the data set are very important. The last point is lineage. Why did we need lineage? As I mentioned, we collect data from multiple third-party sources. Suppose a user's interest data is coming from one data partner and their income data from another. How do we attribute that? I need to know, at each attribute level, which data partner contributed the knowledge of that attribute, right? This is something we have, and it came in handy when we had to do opt-out management downstream as well. Then the second important thing is conflict resolution. Partner A can say a particular entity is age 20, whereas another data partner can say the same entity is age 17. 20 is compliant; 17 is a minor — I cannot legally hold that data. So what do I do? I give the benefit of the doubt to compliance and say: I won't even process this entity; let me drop the whole record, because I'm in doubt. And this has other implications as well. There is a whole layer of data quality — a patented rule set, I would say — which helps me resolve conflicts, and it works like a priority queue: compliance takes the highest priority, quality takes the next, and so on. These are a couple of the pieces we had to build to achieve this — some were already built, and some we were building at the time. It is built over an RDBMS and Elasticsearch stack, and it has microservice and library support; the library support is mainly for the Spark processing side.
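The compliance-first priority queue described here can be sketched in a few lines. This is a toy illustration of the idea — the partner priorities and the "drop if any source claims a minor" rule are my simplifications, not the actual patented rule set:

```python
# Quality-based tiebreak: lower number = more trusted partner (illustrative).
PARTNER_PRIORITY = {"partner_a": 1, "partner_b": 2}

def resolve_age(claims):
    """claims: list of (partner, age) pairs for one entity.
    Compliance outranks quality: if any source claims the entity is a
    minor, the benefit of the doubt goes to compliance and the whole
    record is dropped."""
    if any(age < 18 for _, age in claims):
        return ("drop", None)
    # Otherwise fall back to a quality rule: highest-priority partner wins.
    partner, age = min(claims, key=lambda c: PARTNER_PRIORITY.get(c[0], 99))
    return ("keep", age)
```

For example, `resolve_age([("partner_a", 20), ("partner_b", 17)])` drops the record even though the more trusted partner says the user is an adult — compliance sits above quality in the queue.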
You will see these two points — microservice and library support — repeating again and again. The catalog is updated when a data partner comes in and their data set is onboarded, as well as during processing: as data processing happens across the layers, updates flow into the catalog too. The classic example is the per-data-point knowledge attribution, which is an outcome of processing. Now, when we were building this — I'm talking about late 2017 and early 2018 — there were no cloud-native tools, or tools like Apache Atlas, that could provide this capability pretty much out of the box. We are now looking at adopting one of the specialist tool sets and migrating to it, so that management and downstream involvement become much easier on all these aspects. It has also evolved to accommodate more items, like quality stats and verifications. So that is the first bit of technology in the whole architecture. Okay, the second entity is policy management itself. A policy is nothing but a rule. For example: for this data partner, this is the registered schema — if more fields come in, raise an alert. If the email is not coming in hash format, or the hash length is less than 42 so it is not a SHA hash, drop that record or raise an alert to the data sourcing team. Those kinds of things. So it's a combination of two things: the actual rules, and the actions. We have an out-of-the-box set of actions, and it is extensible with custom actions as well — drop action, alert action, null action, and so on. I'll take you through some examples in the next slides to help understand this. It also has hierarchical support for policies: "if this policy applies, then this one also has to be applied" — that kind of hierarchy is supported too.
And again, it comes with the same kind of tech stack I described on the previous slide. One additional thing: if you think about it, the policies and actions are not defined by engineers — they are defined by domain experts. So we need to give CRUD — create, read, update, delete — via APIs to power users. They could be your legal team, an account management team, or even your product team. That is the one addition over and above the catalog; the catalog is more or less a self-service import thing. Okay, I've put a flattened sample of how the policy table looks in the RDBMS; I couldn't include the Elasticsearch version because it came out a bit bigger than I expected. Now, the next thing — you need to understand this concept a bit more. It has two use cases. One: I separate the actual runtime parameters and thresholds for the rules from the rules themselves. This means that tomorrow, for CCPA, if I have to change to a different set of runtime parameters, I can store them against the CCPA laws. If PDP tomorrow needs different thresholds and runtime parameters, I can change those too. In database parlance this is normalization, which gives more flexibility and evolutionary control as well. So the classic equation is: the function of the policy, plus the function of the parameters which the compliance catalog provides, gives you the action. Let's take two examples. Suppose the policy is a schema policy about age, the parameter age_threshold is 18, and the action executed is drop — the drop action is executed on the record. Another policy is the device IP address policy: whether it is present for this data partner is the parameter — just a boolean, present or absent.
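The "policy plus parameters gives action" equation can be made concrete with a small sketch. The separation is the point: the rule logic stays fixed, and a new regulation only swaps the parameter set. Parameter names and values here are illustrative:

```python
# Runtime parameters live apart from the rules, keyed by regulation,
# so supporting a new law means adding a parameter set, not new code.
PARAMS = {
    "gdpr": {"age_threshold": 18, "null_ip": True},
    "ccpa": {"age_threshold": 16, "null_ip": True},  # hypothetical values
}

def apply_policies(record, law="gdpr"):
    p = PARAMS[law]
    # Schema policy on age: below threshold -> drop action (return None).
    if record.get("age") is not None and record["age"] < p["age_threshold"]:
        return None
    # Device IP policy: present -> null action (replace with None).
    if p["null_ip"] and record.get("ip") is not None:
        record = {**record, "ip": None}
    return record
```

So `apply_policies({"age": 17, "ip": "1.2.3.4"})` triggers the drop action under GDPR parameters, while an adult's record passes through with the IP nulled.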
Present means replace it with null — you take the null action on top of it. Another major flexibility we gained from separating out this catalog: the data layer has many encoded parameters. By encoded parameters I mean, for example, interests. Since we cater to ad tech and have many ad tech data partners, they give us interests in the IAB-style coding — IAB_1, IAB_2, IAB_3. Like this, you can have custom jargon coming into your data platform, but it may not make sense to your backend or to the UI and user-facing teams, so you need some translation. This catalog helps there: although I have named it the compliance catalog, it is a little broader than compliance. It holds the blacklists, whitelists, and all these translations, so the backend and UI can consume them as well. So from these three slides, let's see how a Spark processing pipeline works. If you look at it, everything I've talked about across these three slides falls, in some way, under your data governance layer. I have a couple of other services shown here; I'll take a couple of seconds to explain them. The config service is nothing but the actual Spark job properties and Spark config itself — each workflow, each item being processed, needs a separate set of properties, and those are all maintained by the config service. The data catalog and the policy store we just talked about. The path catalog is something interesting: it exists for our trigger-based mechanisms. To give you the problem statement: all my data partners just drop data into various GCS paths — the cloud buckets. The path catalog tracks them, and it helps us trigger the workflows at the right times.
And once a workflow is triggered, it also knows the next workflow to be triggered, and registers its output as an available path. So it is a kind of metadata on the pipeline itself — that is the path catalog. The last one is the compliance catalog, which we again talked about. What happens here? Say you have a data partner, and a data set from that partner comes in. First, using the governance mechanism, the pipeline figures out the schema-level policies, does the processing, takes the actions, and writes the relevant audit logs. Then it goes into a loop over the value-level policies. Value-level policies, if you think about it, have to be applied at the record or row level, so for each record it evaluates the value-level policies and takes the actions. After that is done, you get a compliant data set. That is how these three pieces together are actuated into an actual workflow. We have multiple pipelines like this, but what I'm showing is a sample of how one Spark process works here. Okay, next comes privacy opt-out and consent. Opt-out means the user is saying: take me out of your system. And there are nuanced opt-outs as well as nuanced consents. You can opt out, and there is the question of which purposes you are opting out from — it could be a blanket opt-out or a purpose-level opt-out. The same applies to giving consent: blanket consent, or purpose-driven consent. Zeotap has three modes of collecting consent. Given we are also a data controller, we are obliged to provide a privacy website as well as an app — that is what I have labelled the Zeotap consent, which flows into my API, into my backend system, and into the pipelines. Second is the data partner — whoever is giving me data into the system.
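The two-pass shape of that workflow — schema-level policies over the data set first, then value-level policies per record — could be sketched roughly like this (the function and policy names are mine, and a real implementation would be a Spark job, not a plain loop):

```python
def run_compliance_pass(dataset, schema_policies, value_policies, audit):
    """Pass 1: dataset-level schema policies; pass 2: per-record value
    policies. Each value policy returns the (possibly modified) record,
    or None to signal a drop action. Violations are appended to audit."""
    for check in schema_policies:
        ok, msg = check(dataset)
        if not ok:
            audit.append(("schema_violation", msg))
    compliant = []
    for rec in dataset["records"]:
        for policy in value_policies:
            rec = policy(rec)
            if rec is None:
                audit.append(("record_dropped", policy.__name__))
                break
        if rec is not None:
            compliant.append(rec)
    return compliant

# Example value-level policy: drop records of minors.
def drop_minors(rec):
    return None if rec.get("age", 99) < 18 else rec

audit = []
data = {"records": [{"age": 17}, {"age": 30}]}
clean = run_compliance_pass(data, [], [drop_minors], audit)
```

The output `clean` is the compliant data set, and `audit` carries exactly the trail the compliance logger would persist.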
Now this is again a multi-fold mechanism, because some data partners give us a cloud transfer and some give us SFTP — so receiving opt-out and consent data is similar to ingestion. There is also an HTTP real-time API they hit when they're doing syncing. All these mechanisms are available for partners as well as consumers. And based on how consent comes into the system, I have three modes of handling it. If it comes via Zeotap, it is always global, because the user is directly telling us: opt me out of your system, opt me out of Zeotap. That means I have to go and nuke that entity across all my data sets, and wherever I have shared data downstream, I have to notify them to take that person out of their systems as well, because I have given out the data points. At the data-partner level — remember the lineage I talked about — I only nuke the knowledge that data partner specifically gave me, including the identities. If they gave me an email and some profile data, I take off that email and those data sets. If they gave me an email and a couple of other identifiers, and only those identifiers, I take just those out of my system. The third is the consumer level, which is even more interesting. By consumer I mean our channel partners — Google, Facebook, Instagram, and these folks. Once the user opts out of Google, Google comes and tells me: this person has opted out of my system. When I am sending processed data to Google, I have to filter that person out. I can still send this user to the other channels, because they haven't opted out of those; but they have opted out of Google, so I have to filter them from the Google feed alone. That is why we have three kinds of handling: global, partner-level, and consumer-level.
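Since there are no process slides for this, here is a minimal sketch of the three opt-out scopes as a single dispatcher. The store layout and attribute-level `source` tag (standing in for the lineage) are my assumptions:

```python
def handle_opt_out(scope, entity, store, partner=None, consumer=None):
    """Hypothetical sketch of the three opt-out scopes described above."""
    if scope == "global":
        # Zeotap-direct opt-out: nuke the entity everywhere and queue
        # notifications to every downstream consumer we shared it with.
        store["profiles"].pop(entity, None)
        store["notify_downstream"].append(entity)
    elif scope == "partner":
        # Attribute-level lineage lets us remove only the knowledge
        # this one partner contributed.
        profile = store["profiles"].get(entity, {})
        store["profiles"][entity] = {
            attr: v for attr, v in profile.items() if v["source"] != partner
        }
    elif scope == "consumer":
        # Profile stays intact; exports to that one channel are filtered.
        store["export_blocklist"].setdefault(consumer, set()).add(entity)
```

A partner-level opt-out strips only that partner's attributes, while a consumer-level one just blocks the entity from a single channel's export feed.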
And all of that again runs through various processes. It's very difficult to put all the process slides here, so I'm giving you a verbal explanation — I hope that explains things; if you have any doubts, ask me after the session. What happens from this consent, I'll show you on the next slide: there is a consent object — a construct which has the identities the consent has to be applied to, and the purposes of the consent. The latest development in the past year has been that we also became TCF compliant. TCF is the Transparency and Consent Framework, which came from the IAB — the Interactive Advertising Bureau consortium — and it helps manage consent at a blanket level across cookies and MAIDs, the mobile identifiers, because those are the identities Zeotap primarily works on: browser cookies and mobile identifiers. Next, this is how the consent pipeline looks. As I said, I'm getting consent from data partner A — probably the second one I should have shown is a data consumer B — and from Zeotap itself. And what is meant by this ID enrichment? If Zeotap is the source, we can get consent against an email ID, a cookie, or a MAID. Since I have to do a blanket nuke, I have to figure out all the other IDs linked to that particular email ID. First I have to hash the email ID — the user is writing to us at privacy.zeotap.com, so I convert the email to a hash. Then, based on the hash, I enrich all the identifiers linked to that hash, and convert everything to the standard format. The same applies for a data partner: if they give me an ID, I figure out all the IDs that data partner has given me in the past for that entity, link them all together, and create the consent object and the consent tags.
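The hash-then-enrich step and the resulting consent object might look roughly like this. The identity graph here is a toy stand-in for the identity-resolution layer, and the object's field names are my assumptions:

```python
import hashlib

# Toy identity graph: hashed email -> other linked identifiers.
# In production this lookup would come from identity resolution.
ID_GRAPH = {}

def build_consent_object(email, purposes, granted):
    """Normalize and hash the email, enrich with all linked IDs, and
    emit a standard-format consent object with its purpose tags."""
    email_hash = hashlib.sha256(email.strip().lower().encode()).hexdigest()
    return {
        "identities": [email_hash] + ID_GRAPH.get(email_hash, []),
        "purposes": purposes,   # the consent "tags"
        "granted": granted,     # boolean gate for downstream processing
    }
```

A `granted=False` object is what would feed the blanket nuke across every linked cookie and MAID, rather than just the email the user wrote in with.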
Tags are nothing but the purposes we talked about. You have a yes/no, and if it's a yes, which purposes. Currently, for a "no", we are not supporting granular purposes — for a "no" we just go ahead and nuke; we don't do any further processing there. That is why I mentioned deletion as one of the activities. The other is processing: whether an identity can be processed or not is given by the consent-processor library as a boolean to any downstream processing. So that is the consent data flow — a sample of how consent flows; actually, it's not just a sample, this is more or less how production looks too. Next comes user management. As I said, since we are a controller, there is the privacy website and privacy app — how do we manage those? The primary keys here, as I have been alluding to for the past 15-20 minutes, are mainly the MAIDs — the mobile advertising IDs, which are the IDFA for Apple and the advertising ID (AAID) for Google. Then you have the browser-based cookies, the email hash, and the phone hash. These are the four key identifiers Zeotap collects. We don't collect IP addresses as of today; we don't collect people's actual names; we don't collect SSNs and the like — those are on our blacklist. For Zeotap, these are the identifiers we work with. We had to create a mobile app and a website, and they talk to the backend API, which is based on the Java Play framework — something like Dropwizard and those frameworks. And given the size of our data sets, we heavily use Bloom filters, so that identifiers can be quickly checked for presence or absence.
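To make the Bloom filter point concrete, here is a minimal self-contained implementation of the idea — constant-space membership checks with no false negatives. Sizing and hash count here are arbitrary; a real deployment would derive them from the expected identifier volume and target false-positive rate:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch for fast identifier presence checks."""
    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions by salting a single hash function.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

The payoff is that a "definitely not present" answer never touches the backing store (Aerospike, in the talk's setup); only possible hits pay the database round trip.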
And all the identifiers are currently stored in a DB called Aerospike, which is, I would say, a fast transactional OLTP read-write database — that is what is used for all my identifiers for this particular purpose as well. Okay, the next item. We covered a bunch of items on the product slide and the requirements slides; the next one, which is again very important, is audit. The decision we took for audit is: we treat the logging itself as the audit. We take all the logs and store them in buckets so they can be loaded into the OLAP database of your choice — Redshift, BigQuery, Snowflake, or plain Hive and Presto — and you do your analytics and analysis there. The only key point is that we didn't want unstructured logs; we wanted a common log format across the organization, especially for GDPR compliance. So we extracted it out as the compliance logger, available as both a library and a microservice. The log grammar includes the items I've listed there: the violation type, which product produced the log, the data-flow stage, the action taken, the timestamp when the action was taken, the timestamp when the violation actually occurred, and a couple of other pieces of metadata around the log. This service is used pretty much across our layers — Spark as well as the backend layers — and we aggregate the logs into a single place. It is stored in, if I remember correctly, month-wise buckets, because the compliance log is not that heavy compared to, say, application logs. These can be loaded, and you can analyze what actions happened, how many consents came in, and so on — some basic analysis and forecasting. Okay, putting it all together. As I said, compliance is a first-class citizen.
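The log grammar listed above can be captured in a small helper. The exact field names are my guesses at the grammar, not the actual library's schema:

```python
import json
import time

def compliance_log(violation_type, product, stage, action,
                   violated_at=None, **metadata):
    """One shared log grammar emitted by both Spark jobs and backend
    services, so audit logs land in buckets in a shape any OLAP store
    (BigQuery, Redshift, Hive/Presto) can load later."""
    return json.dumps({
        "violation_type": violation_type,   # e.g. "schema", "sensitive_data"
        "product": product,                 # which product produced the log
        "data_flow_stage": stage,           # ingestion, processing, egress...
        "action_taken": action,             # "drop", "null", "alert", ...
        "violated_at": violated_at,         # when the violation occurred
        "logged_at": time.time(),           # when the action was taken
        "metadata": metadata,
    })
```

Because every layer emits the same structure, a single query over the aggregated buckets answers questions like "how many drop actions did ingestion take this month".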
This is a complete compliance service. It provides capabilities over blacklist management, sensitive data management, the user data services, and audit management. The compliance workflow is what triggers the Spark workflows and determines which layer has to apply which policies and rule sets. On the left-hand side are my ingestion pipelines, and these are the egress pipelines. This is the only slide I had from before — one I used to show in our outside deck — so I just reused it; it's a pretty old slide and I didn't rework it. If you look at the top layer, there are the actual apps Zeotap provides for opt-out, and there is the user and consent API layer. And this admin layer is what I talked about for the policies and catalog management, which is not exposed to this web app — the admin layer is used only by the power users, via APIs. And this is representative of the various stores we have talked about so far, which together achieve the whole set of use cases. That is how, from a 50,000-foot view, the architecture looks for this whole product. Next: any architecture has a couple of non-functional requirements — how it scales, how resilient it is, whether it's extensible, and so on. This is from our own experience. When we first deployed this particular system, we had close to 80 workflows; now we have more than 400, and we are still scaling on a day-to-day basis. Every workflow follows a defined process for how it is initiated in the Spark library and how it takes care of things. Second, we also did an AWS-to-GCP migration, and it was a fairly simple lift-and-shift for us — for the compliance pieces as well.
Then, when CCPA came into play, we could tweak the policies accordingly for the US — specifically the California region — to extend the system and accommodate those aspects as well. Of course there were some changes; I don't remember everything off the top of my head, but as far as I remember, we could accommodate everything without engineering effort — just testing and sanity effort covered it. So that is largely the pipeline summary of how things have been deployed. The next item is infra and security validations; I thought I'd briefly touch on this. At Zeotap, all the infra is split region-wise. We are very, very careful about data sovereignty: whatever data is in the EU stays in the EU — data storage as well as processing. The same applies for the US, and the same for India. On access rights and controls, we have a couple of certifications which I'll touch on. As I mentioned earlier in the talk, access is based on minimum privilege across all the data sets, and we have a chief data officer as well as a security officer who do a quarterly audit, figure out if there are any exceptions, and pass on recommendations, which the infra team takes up. Also, we don't mix the ID and the profile except during runtime processing — the ID and the profile are always pseudonymized. In the sense that during ingestion itself, if I get an email ID or a cookie ID, a pseudonymized ID — which we internally call the ZUID, or Zeotap unique ID — is assigned to each user profile entity. At any point in time, if you want to do analysis — say, general age-bucket analysis, general interest-bucket analysis, or fill-rate analysis — you operate on this identifier, which doesn't reveal the actual identifiers of these data sets. Access to the real identifiers is allowed only for the runtime applications.
Then another security recommendation is data-at-rest encryption. Pretty much all the data at rest, be it in GCS buckets or anywhere else, is encrypted, and we also use BigQuery, where data at rest is again encrypted. And the emails and phone numbers are hashed. This has some nuances: we support all three hashes. If a data partner says they can consume all three hashes, we go ahead and use MD5, SHA-1, and SHA-256, and the same applies for phone numbers. And we use upper case, lower case, and ignore case. So across all these hash combinations there are nine plus six, fifteen total combinations available in our systems. It may not always give the same fill rate, because of the contracts we enter into with each data partner. These are a couple of other items I thought would be useful for this forum, so I presented them as well.

And the hashes are all validated: every hash has a length parameter, so you can easily validate it, and emails and phone numbers have regexes against which you can validate. So these are all identifiers you can validate, unlike a cookie, which can be any random UUID. Whereas for these identifiers, which are PII, you have some levels of validation on top as well.

GDPR doesn't mandate any certification out of the box, but if you have all these certifications, it strengthens your stance in saying, okay, we can believe this company is compliant. So we have held ISO 27001 certification for the past three years, and we did a re-certification this year as well. The same applies for the CSA STAR certificate from BSI; I don't remember exactly, it's the British Standards Institution or something like that. Then the ePrivacy Seal, which I think comes from one of the EU forums; we have that as well. I don't know what will be applicable for the PDP down the line, whether there will be some other ISO standard or whatever, but given all the stack we have built, we are very confident that we could sail through the PDP bill as well.
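The hashing and validation scheme just described can be sketched as follows: generate every supported hash/case combination for an identifier, classify an incoming hash by its hex length, and sanity-check raw emails with a regex. This is an illustrative sketch, not the production code; in particular I am reading "ignore case" as hashing the value as received, and the email regex is a deliberately simple stand-in.

```python
import hashlib
import re

HASHES = ("md5", "sha1", "sha256")
CASES = ("upper", "lower", "as_is")          # "ignore case" read as: hash as received
HEX_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}  # digest length identifies the algo
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")   # simplistic illustrative regex

def hash_variants(value: str) -> dict:
    """All supported hash/case combinations for one identifier (3 x 3 = 9 for email)."""
    out = {}
    for case in CASES:
        if case == "upper":
            v = value.upper()
        elif case == "lower":
            v = value.lower()
        else:
            v = value
        for algo in HASHES:
            out[(algo, case)] = hashlib.new(algo, v.encode()).hexdigest()
    return out

def classify_hash(h: str):
    """Validate an incoming hash by length; returns the algorithm name or None."""
    if not re.fullmatch(r"[0-9a-fA-F]+", h):
        return None
    return HEX_LENGTHS.get(len(h))

def looks_like_email(value: str) -> bool:
    """Regex pre-validation of a raw email before it is hashed or stored."""
    return EMAIL_RE.fullmatch(value) is not None
```

This is exactly the property mentioned in the talk: unlike an opaque cookie UUID, a hashed email or phone number carries enough structure (fixed digest lengths, known input formats) to reject malformed partner data at ingestion.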
As for current additional developments: when I was going through the slides, you may have noticed a couple of manual processes which can induce human errors. For example, when a data partner is onboarded, there could be manual errors in declaring that the schema has such-and-such items, whereas additional items come in. Of course there are some validations present there, but human errors can still creep in. So current development is going on around NLP- and ML-based scanning of the datasets and applications. Of course, there are major cost challenges in productionizing this, but it is active development, serving both the data quality and the compliance aspects, across the data layers. And yeah, that ends my slideshow. We can move on to the question and answer session. Hopefully there were some useful pointers for everybody. Thank you.

Satish, same question to you. Has there been any update over the past year, and is there anything you'd like to say to companies that are trying to comply with the upcoming data protection bill in India?

Hey, Anubhash, thank you. So one of the changes which has happened in the architecture, or which has gone into production: if you look at the whole talk, it was mainly heuristics-based, and I had put a comment about an ML-based approach. What we have built on top of that is something around privacy-enhancing techniques, which help with any data transfer between point A and point B. So that is one additional thing. And of course, the other addition is that the certifications have gone up a lot; beyond ISO, we now have SOC 2 and other additional certifications. Specifically with respect to the PDP, nothing has been done yet, because I am also waiting for the law to be passed.
We have the draft of the bill, and once the law is passed, we don't foresee any major changes being required at this point in time. The privacy-enhancing technique is mainly for a set of business use cases where a value exchange can be done between two different teams on the customer's end. So that was one of the cases. Exactly what the technique is, and other details, are beyond the scope at this point, but we could talk about it later.