Several months back, when we started looking at the PDP bill and our clients also started asking about it, one of the first questions we had was: what does it even mean to comply? Is there a list of questions that we should answer? We went around looking for a comprehensive list, and the closest thing we could find was one of the documents put out by Ikigai. What we immediately did was turn that into a machine-readable checklist. It is fairly comprehensive, and it links each question back to the relevant section of the PDP bill itself.

The first thing we did was organize it, taking it from PDF form into JSON and CSV form. The second thing was to tag each question with whether it is an architectural question, a security question, a process question, and so on. We then turned it into a usable JSON with all of these elements listed out. It is MIT licensed and modeled after the GDPR checklist, which, in our own product, we make it a matter of routine to include with every deployment. The intention is the same: every company and every product is going to need a checklist like this. Some of the aspects that Satish talked about are already identified there as a set of questions. This alone won't be enough, so the checklist is designed to be extensible: if other people have their own checklists, or want to modify this one, they can. It's a bunch of Python code along with the original links. It is fine for us to use this: we have formal permission from Ikigai itself to use the original content, so that is not a problem. You can see a glimpse of it rendered as markdown, with some of the mappings and the categorization.

Beyond that, we layered some workflows around it so that we can share it. We are prototyping some of these workflows as part of our product down the line, so you're seeing some of that here. The interesting thing is that, from a context standpoint, whatever Satish is doing, Scribble as a company has to replicate at a slightly smaller scale across multiple companies. So a lot of the language, the tooling, the metadata handling, and the other things Satish was talking about are things we also have. As part of that infrastructure we have started building this compliance stack as well, and one of the first things we implemented was a workflow version of the same checklist. That part is not open source, but it lets you annotate, attach documents, link to data sets, leave comments, assign responsibilities, and so on. I won't go into those details, but the idea is that the experience Satish had is pushing us in roughly the same direction: the hierarchy, the stack that he had, and also the rules.

One other potential open source project, which we have not put out yet but which I am happy to help open source if anyone in the community is interested, is a classifier. Say you have a crawler that goes off and picks up the available metadata, plus snippets of the data sets, wherever they live in your S3 buckets and your databases. The step beyond that is to classify them based on a variety of considerations: quality considerations, security or compliance considerations, and so on.
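To make that idea concrete, here is a minimal sketch of what specifying such consideration-based rules might look like. The rule fields, tags, and helper function are hypothetical illustrations, not the actual module's API.

```python
# Hypothetical sketch: consideration-based tagging rules as plain data.
# Each rule maps a pattern over column names to a tag such as
# "pdp:personal-data" (compliance), "security:secret", or "quality:free-text".
import re

RULES = [
    {"tag": "pdp:personal-data",
     "column_pattern": re.compile(r"(name|email|phone|aadhaar|pan)", re.I)},
    {"tag": "security:secret",
     "column_pattern": re.compile(r"(password|token|api_key)", re.I)},
    {"tag": "quality:free-text",
     "column_pattern": re.compile(r"(comment|notes|description)", re.I)},
]

def tag_columns(columns):
    """Return {column: [tags]} for every column that matches at least one rule."""
    tags = {}
    for col in columns:
        matched = [r["tag"] for r in RULES if r["column_pattern"].search(col)]
        if matched:
            tags[col] = matched
    return tags

# Example: tag a sampled schema snippet from a crawled data set
print(tag_columns(["customer_email", "order_id", "delivery_notes"]))
```

Because the rules are just data, new considerations can be added without changing the tagging engine itself.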
There is a separate Python module we built that allows these rules to be specified as a starting point, and then goes and tags all of the available data sets in an automated fashion. It is still work in progress; we haven't worked on it for the past four or five months because we got too busy. But if somebody in the community is interested in picking it up and looking at it, I would be happy to share it.

We have a third open source project in mind as well, which is about metadata standardization. Some of these tools are in principle reusable across companies, but they are not reusable unless there is standardization of the underlying representation: how you represent the schemas and things like that. We would be interested in seeing if anybody else wants to work on that metadata standardization as well. So let me leave it here. You can get in touch with me, or you can find it on my GitHub; I will post an update with the link to the repository.

That anchors a lot of the things we have discussed here. The assessment, and even the data classification, is something we have built as part of our platform for running a privacy program as well. So I think that is something we can look at together and see how it can be integrated to make it more robust and extend it. Also, I think this fits with the privacy-by-design framework we had been discussing, so let me push that framework out; a lot of things will get linked together there.

Absolutely, that is the idea. If you have community-level broad processes and a bunch of tooling that address these problems, it will make it much easier for people to comply.

Agreed. Venkat, this is fantastic. Ikigai, yes. I just have a broad doubt on the metadata standardization: are you looking at a vertical-specific standardization? Can you give a practical example of how that would work?

So, on data enrichment: many of the workflows you're describing are things we also do for customers, except that ours is a somewhat more generalizable and customizable platform, so that you can keep making copies of it across customers. In all cases, what happens is that we access a lot of raw data and also generate a lot of derived data sets, and for auditability purposes we generate a lot of metadata: when was this generated, how was it generated, what do we know about the statistical characteristics of a given column, maybe even visualizations, and so on. Every day we are adding more to that metadata, at the data set level, the column level, the process level, and so on. The challenge I see is that when I go into any customer environment, they have data sets produced by a variety of processes, sometimes in Spark, sometimes in SQL, sometimes in R. If a company wants to implement any kind of metadata post-processing and handling, there has to be something common across all of these different systems. Because we control our own deployment, we are able to impose metadata standards on all of the data sets that we generate, but we can't link them to any other data set in the customer's environment.

Understood, understood.
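As an illustration of the kind of standardization being discussed, here is a minimal sketch of a common metadata record that a Spark, SQL, or R pipeline could all emit in the same shape. The schema and field names are assumptions for illustration, not an existing standard.

```python
# Hypothetical common metadata record, independent of the producing system.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ColumnMetadata:
    name: str
    dtype: str
    stats: Dict[str, float] = field(default_factory=dict)  # e.g. null fraction, distinct count
    tags: List[str] = field(default_factory=list)          # e.g. classifier output

@dataclass
class DatasetMetadata:
    dataset_id: str
    generated_at: str            # when was this generated? (ISO timestamp)
    generated_by: str            # how was this generated? (process-level lineage)
    source_system: str           # "spark" | "sql" | "r" | ...
    columns: List[ColumnMetadata] = field(default_factory=list)

# Any pipeline, regardless of engine, would emit the same record shape:
record = DatasetMetadata(
    dataset_id="s3://bucket/derived/orders_daily",
    generated_at="2020-02-01T06:00:00Z",
    generated_by="spark:jobs/orders_rollup.py",
    source_system="spark",
    columns=[ColumnMetadata(name="customer_email", dtype="string",
                            stats={"null_fraction": 0.02},
                            tags=["pdp:personal-data"])],
)
```

With a common record like this, the post-processing, compliance, and classification tooling only has to understand one representation, regardless of which system produced the data set.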
And the PDP classifier, currently, is it heuristic-based, or do you have some intelligence behind it?

So the first thing it does, say you have a hundred thousand paths, is to cluster them, deciding whether a subset of those files should be treated together, so that you can reduce how much you actually need to process. Then it extracts samples and applies a set of heuristics to say this is sensitive or this is not sensitive. But the way it is written, it is extensible, so you can plug in ever more sophisticated algorithms there.

Understood, understood. Great.

We didn't have the time to develop it further; that was the main challenge we had.

That is always a challenge. But when I look at the numbers for a single customer, our system alone has generated a hundred thousand data sets. We have a strong handle on those, but they have a million other data sets in their environment that nobody has any idea about.
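For reference, here is a minimal sketch of the cluster-then-sample-then-tag flow described above. The grouping rule and the heuristic are deliberately simple stand-ins, and all names here are hypothetical rather than the actual classifier.

```python
# Hypothetical sketch: cluster paths, sample one file per group, apply an
# extensible sensitivity heuristic.
import os
import re
from collections import defaultdict

def cluster_paths(paths):
    """Group files that likely belong to the same data set (same directory here)."""
    groups = defaultdict(list)
    for p in paths:
        groups[os.path.dirname(p)].append(p)
    return groups

SENSITIVE = re.compile(r"(email|phone|aadhaar|pan|dob)", re.I)

def heuristic_is_sensitive(sample_text):
    """Default heuristic; could be swapped for a more sophisticated model later."""
    return bool(SENSITIVE.search(sample_text))

def classify(paths, read_sample, is_sensitive=heuristic_is_sensitive):
    """Return {group: 'sensitive' | 'not-sensitive'}, inspecting one snippet per group."""
    results = {}
    for group, members in cluster_paths(paths).items():
        sample = read_sample(members[0])   # only a snippet, not the full data set
        results[group] = "sensitive" if is_sensitive(sample) else "not-sensitive"
    return results

# Example with an in-memory stand-in for reading snippets from S3 or a database
snippets = {"dw/users/part-0.csv": "name,email,phone",
            "dw/logs/part-0.csv": "ts,level,msg"}
print(classify(list(snippets), read_sample=snippets.get))
```

The clustering step is what keeps a hundred thousand paths tractable: groups, not individual files, are sampled and tagged, and the heuristic is just a default that a more sophisticated classifier can replace.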