Hello all, this is Satish KS, part of the Privacy Mode Fellowship Program driven by the privacy community of Hasgeek. This community works on data privacy, with a focus on engaging with policy to improve the overall privacy ecosystem from both the consumer's and the provider's perspective. The upcoming data protection bill puts a spotlight on data privacy across all domains, with the compliance needs that come with it. This fellowship program is set in that context, and as part of it I'm going to present my company Zeotap's journey with the GDPR regulations, more from a product and tech perspective, and what we have done towards GDPR compliance. Zeotap runs primarily a data business, both data as a service and a data platform offered as SaaS.
This is going to be a three-part presentation. Part one is an introduction to data privacy, GDPR and the regulations. Part two is more like a reference study of what we did within Zeotap. Part three is about how those learnings could be cross-applied to the upcoming data protection bill in India.
A brief intro: we are mainly going to talk about GDPR and compliance for a data business, and I am currently the CTO of the company. With that, let's start with what a data business is and, at a high level, what every data business more or less does. They source data, they do multiple kinds of processing on top of it, which I have broadly defined as refining data, and they deliver data for various use cases. This could be within the company as well as between companies.
In sourcing, the challenge is that there are going to be varied sources of data on which you have to run the integrations. Refining is a huge area where you do lots of standardization, cleaning, enrichment and intelligence addition, both heuristic and AI/ML driven. Then you might do a couple of your own enrichments based on your quality checks, profiling and so on, and temporality is the time dimension of the data on which we act. So these are the various processing needs. Finally, you deliver the data. This enriched data, or enriched intelligence, is delivered via integrations, which could be a cloud exchange or an API mode, and you take care of the reliability of the data, you guarantee the SLAs and you take care of the various modes of data exchange. When I say modes, you could send the data in real-time mode or in batch mode, which are the primary modes people generally talk about, and there is also how you organize the data as a whole. It could be something as simple as a CSV format or something as complex as a binary format for specific use cases.
So largely this is the landscape in terms of how data flows and thereby how your data business operates. If you put the compliance lens on it, there are two primary categories of data at this point in time, as per the regulations in any country, both upcoming and already in force. The first is people-centric data, that is, any data, which could be device data or product data or anything else, that has a link to a real-life person behind the scenes. For example, if you have a mobile and I have your mobile identifier, that means I have a device identifier which links to the activity of a real-world person behind the scenes. Direct identifiers could be your first name, last name, email and so on.
The second theme, broadly, from a categorization perspective is non-personal data, which could be, for example, weather data or city traffic data. It is not tied to any particular person but to some other theme. So this is a broad classification, and a data flow could be purely people-centric, purely non-people-centric, or a combination of the two. This is primarily how you can look at the data business landscape.
Then a brief introduction to GDPR. This law came into the EU and became active on May 25th, 2018. It is the EU data protection and data privacy law. When it was passed, it did not affect only certain parts of the organization; it had an effect across all the legs of the organization, be it your marketing teams, your HR teams, the legal team, the infra and info-sec teams, as well as the product and tech teams. So it had a profound effect across many businesses in Europe.
What does it talk about? First, this law is about personal data and its handling: what the rights are when a company is collecting personal data, where the person in question is any person in the EU member countries. That is the scope of this personal data. It gives a couple of rights to the user, which we'll talk about down the line, like the right to erasure, the right to portability, the right to transfer and the right to know what data a company is storing about them. The second thing is that it prescribes a couple of things in terms of the access rights which are to be given. The third is how you are going to handle the data. Here too, it does not say that you have to encrypt it, or that you have to use symmetric or asymmetric encryption. It only lays down, at the level of principles, how you have to handle the data.
Then, if you're doing data transfer, either cross-border or between companies or as part of any second-party data exchange, what are the things you should take care of? These are things which are handled in the GDPR law. Then finally, the audit requirements. The audit requirements are again stated at the level of principles. There is no single certification which says, oh, this is GDPR certified. It is a combination of multiple things, which we'll see when we go further into the presentation, that could help a company say, yes, now we are GDPR ready, and this is because of these reasons, which we have put in place across the IT organization, the legal organization, as well as the product and engineering organization. And the last thing, which is very important, is that it talks about penalties for data breaches. The penalties are pretty huge, and especially if you're a startup operating in any of these regions, a single penalty arising out of a suit against your business could more or less shut down shop for you. That is the kind of impact it could have on the organization.
In terms of the roles which GDPR talks about, it primarily classifies companies as data processors and data controllers. At a very fundamental level, you could define a data processor as someone who does not own the data. The data transits through some processing stages and moves on. When I say he is not responsible for the data, he does have certain responsibilities; it is just that he is not the primary owner of the data. The second role is the data controller. The data controller becomes more or less the owner of the data, because he is procuring the data himself. And some companies can be both a data processor and a data controller, depending on how they operate as a business.
So you have to look into the law to figure out whether you are a data controller, a data processor, or a combination of both. That is a brief introduction to GDPR.
Now, just to understand what is happening and how the landscape is changing, I have put in a couple of slides. There are many sayings going around, like data is the new oil, and many companies are investing in their data strategy. So what exactly is a value exchange in terms of data? On one side you have the business context: a couple of business teams, marketing teams, sales teams and other teams which need data in specific formats, with specific insights and analytics derived from it. They give the business context to the owners of the data. The owners of the data could be product and data teams within the organization, or in another organization from which the data has to be taken, or it could be at a sub-domain level. For example, you're a conglomerate and you want to cross-apply your lending data to your investment banking data for something like credit scoring. That is again a business context in which a value exchange happens.
Prior to any regulations, what was happening? A team, say a marketing team, would say, hey, I want to work on a customer acquisition scenario, this is my business case, and I probably need the last two years of data to calculate certain attributes, say something like an RFM model. The data teams either provide it as raw data or, if they have data science and other capabilities, they provide additional capability and enriched data. So this was the original scenario. Now, with regulations coming in, the flow which we saw in the previous slide does not happen so easily. When the actual data exchange comes up, there is a new entity; in many companies, in the EU at least, this is called a data protection officer.
He needs to assess the impact of sending this data from this particular layer to that particular layer. When the request goes to him, a bunch of questions come up, and mind you, he is responsible for protecting the company from any data suit. So he has an immense responsibility. I have put a very small box here which lists what he needs to be aware of: the policies, the business context and requirements, the regulations, the standards, the laws which are applicable, and the rules he needs to operate on. It's a pretty humongous task which is sitting on him.
So he'll come back to the product team. When I say product team, the team asks a simple question: can I share this data? And there is going to be a bunch of exchanges. Do you have consent? Do you capture customer preferences? Do you have some sort of privacy protection in case you are collecting raw PII? What controls have you put in place for data security? Do you have a data retention policy? Suppose I have a particular identifier which the client says I shouldn't store for more than 90 days; do we have such policies in place? Many a time, in the current scenario where companies are not yet on top of all the data regulation policies, the answers would be, oh yes, probably, I need to check with this guy, or, can I get back to you in a month's time?
So while this exchange is happening, the value exchange is completely curtailed, because there is this team on the right-hand side waiting for the value exchange, which is not happening because of all this. Both sides are important. You need to adhere to the DPO because he plays the trust and protection role for the organization, and the other team on the right-hand side is trying to bring value to the organization. So this is the problem we are trying to solve: how exactly we could reduce this friction.
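As a small illustration of the retention question above (an identifier that must not be stored for more than 90 days), the following is a minimal sketch and not anything described in the talk. The `StoredIdentifier` type, its field names and the idea of attaching the limit to each record are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class StoredIdentifier:
    """Hypothetical record: an identifier plus the retention limit agreed for it."""
    value: str
    collected_on: date
    max_retention_days: int  # assumed per-client policy, e.g. 90


def is_expired(record: StoredIdentifier, today: date) -> bool:
    """True once the record has been held longer than its retention limit."""
    return today - record.collected_on > timedelta(days=record.max_retention_days)


def purge_expired(records, today: date):
    """Return only the records that may still be kept on the given day."""
    return [r for r in records if not is_expired(r, today)]
```

With a check like this in place, the answer to "do we have a retention policy?" can be read off the data itself instead of being chased across teams.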
Now, a couple of other pieces of jargon, which were always there but are now getting a good amount of attention. When you go to any website, you will see accept cookies and reject cookies. That's the baseline. But if you choose manage my cookies, it takes you to a different screen which throws up a bunch of items: can I use your data for analytics? Can I use it for marketing purposes? Can I use it for targeting purposes? Can I use it for statistical surveys? You have a bunch of questions there, and for each one you can click and say, yes, you can do this, or no, you cannot do this. There are certain standard frameworks. This is called cookie consent, and there are other ways of collecting consent; it depends on the company's strategy. This is not in the scope of my talk, but it is useful for understanding how exactly the trust between the customer and the company is established. A very clean cookie consent, or a consent strategy, becomes more like the facade, the front part of it. I need not explicitly say cookie consent; it is a consent strategy and a marketing preference strategy. Marketing preference is a slightly more advanced level of consent.
Now, what is consent? As per GDPR, and even the upcoming data protection bill, in the past consent was more or less implicit. If the customer wanted to opt out of something, he could go and opt out. The default in many companies was opt-in: he was opted in for any activities which the company could derive out of his data points. That has been completely flipped. What they're saying is that consent should be explicit, in the sense that when I go to the website, I need to be explicitly aware of everything I am consenting to. That is why the definition is expanded to say that it should be informed, it should be active, and it should be unbundled.
I cannot bundle multiple things, say stats, marketing, targeting and machine learning, together into a single consent; the consent should be granular. So granularity is another requirement, and it has become mandatory. Consent collection, logging, and being able to ensure non-repudiation are legal requirements. That is how consent has changed.
Coming to the other thing, marketing preference is, I would say, one level deeper: each company tries to understand which personalization strategies it can apply to a particular consumer. Say I'm a consumer and another person is a consumer. I don't like receiving too many emails, so I can go to my marketing preferences, as part of the same consent box which is shown to me, and say, don't send me emails, but I'm okay if you reach out via, say, YouTube ads. That is a marketing preference, and it personalizes the way the company is going to reach out to me for any offers or promotions, and even how they set up their website. This is not mandatory. It more or less acts as a support to the consent. It does not have direct legal implications, but it supports the consent. Consent, however, has direct legal implications and is a must-have for any organization embarking on any sort of compliance journey.
Now, double-clicking on how exactly consent, or for that matter marketing preference, is managed purely from a data pipeline perspective, you could bucket it into three major flows. Remember, consent is again data, right? You're collecting it from various places; maybe somebody was on a phone call over IVR and pressed the one button, and that is a data point saying he has given this consent. So the same source, refine and deliver applies to consent as well.
And that's exactly what I have put here in terms of how consent has to be managed internally. First you have all the collection touch points. It could be multiple customer touch points, say the cookie, connected TVs, the call center as I just mentioned, and email. In the last three years, thanks to all the regulations, there is very nice software coming up for setting up and administering the whole consent collection. It has to have a transaction audit, because consent is a transactional thing. In database parlance, it has to be more or less ACID compliant: if I give a consent, it has to be reliably recorded. If not, I need to get it recorded from the customer again, and I need to have a history of that. That is why I mention a transactional audit capability. Then, optionally, for consent collection there are frameworks; for example, in Europe there is something called TCF, which is driven by the IAB. The IAB is a consortium of ad tech players, and they have a framework like that. Then you could additionally have taxonomy support. If you remember, in the previous slide I talked about informed consent. So how do you inform? That means you are creating a taxonomy for the information exchange. Then you can have standardization. Say, for example, you have invested in CMP software but your customer care is operating on somewhat old-school software. How do you standardize the consent coming in from these two? Then you can optionally have preference management, which is the marketing preference.
Now, once you get all this consent, in the refinement of the consent what you probably have to invest in is how you're going to create a singular view of all of it. Why the singular view is important we'll see in the third bucket. This is fundamentally a data play, and there isn't any out-of-the-box software available at this point in time.
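The collection bucket described above (many touch points, a purpose taxonomy, standardization of older sources, and a transactional audit history) could be sketched roughly as follows. This is only an illustration under stated assumptions: the purpose names, the `ConsentLog` class and the IVR key mapping are hypothetical, and an in-memory list stands in for what would really need a durable, ACID-compliant store.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical purpose taxonomy; a real one would follow the company's consent strategy.
PURPOSES = frozenset({"analytics", "marketing", "targeting", "statistics"})


@dataclass(frozen=True)
class ConsentEvent:
    """One immutable consent decision for one purpose (granular, unbundled)."""
    user_id: str
    purpose: str
    granted: bool
    source: str          # touch point, e.g. "cookie_banner", "ivr", "email"
    recorded_at: datetime


class ConsentLog:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self):
        self._events = []

    def record(self, user_id, purpose, granted, source, recorded_at=None):
        if purpose not in PURPOSES:
            raise ValueError(f"unknown purpose: {purpose!r}")
        when = recorded_at or datetime.now(timezone.utc)
        event = ConsentEvent(user_id, purpose, granted, source, when)
        self._events.append(event)
        return event

    def history(self, user_id):
        """Full audit trail for one user, in the order it was recorded."""
        return tuple(e for e in self._events if e.user_id == user_id)


def from_ivr_keypress(log, user_id, key, purpose):
    """Standardize an assumed legacy IVR input: key "1" means consent given."""
    return log.record(user_id, purpose, key == "1", source="ivr")
```

Keeping every decision as a separate immutable event is one way to retain the history needed for audit; the current state can always be derived from the log afterwards.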
There are certain customer data platforms and CMPs for which this is a downstream roadmap item, and there are data warehouses in which in-house teams themselves invest to achieve this mastering. So how do we define consent mastering? You are taking all the consent points captured from the various customer touch points and creating a singular view of what the customer has given us consent for. Then, optionally, once you have mastered your data, the analytics and the risk assessment can obviously be put on top of it: if a particular piece of data is flowing out, what are the risks and the concerns I really need to be aware of?
Coming to the last bucket, this is the delivery. How do you make consent available? If you look at any data-driven system, even in a, I would say, medium-sized company, you could have upwards of 100 data pipelines flowing from your primary data warehouse, and if a company has directly invested in a data mesh architecture, it's the same. So there could be hundreds of data pipelines flowing out. Now consider the case when the downstream is not a transactional consumer but a business consumer which is going to use the data for something beyond what the consumer is transacting on the system. If the data is flowing within my session to serve me something, say a financial transaction, a purchase or a subscription, that is a transactional thing. Anything we do with the data beyond that is a downstream system, and you could run anything on top of it. The mastered consent has to be available across all these systems; otherwise they cannot know what they are allowed to do with the data. If you remember, the friction-causing factor was the DPO asking whether this is available, whether that is available.
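Consent mastering as defined above, a singular view built from consent points captured at different touch points, might look something like the sketch below. The rule that the most recent decision wins for each user and purpose is an assumption made for this example; the talk does not say how conflicting consents are resolved. The `ConsentEvent` shape is likewise hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ConsentEvent:
    """Hypothetical consent decision captured at one touch point."""
    user_id: str
    purpose: str
    granted: bool
    source: str
    recorded_at: datetime


def master_consent(events):
    """Collapse events from all touch points into one view per user.

    Assumed rule: for each (user, purpose) the latest event wins.
    Returns {user_id: {purpose: granted}}.
    """
    latest = {}
    for event in sorted(events, key=lambda e: e.recorded_at):
        latest[(event.user_id, event.purpose)] = event  # later events overwrite

    view = {}
    for (user_id, purpose), event in latest.items():
        view.setdefault(user_id, {})[purpose] = event.granted
    return view
```

The output is deliberately a plain mapping, since it is the mastered view, not the raw events, that downstream pipelines need to consult.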
So this orchestration acts more like a filter layer. It says, hey, on this data flow I can send only this set of users, because through my customer care center a user has given me a marketing preference saying use me for targeting, and through the cookie consent he has allowed all the targeting cookies. I combine these two rules, and from an orchestration perspective this consent acts as a filter, so the data can flow from system A to system B. Again, there isn't much software here which is available out of the box. There are a couple of data platforms, not new, they have been around for almost five years now, which are investing in these orchestration layers as well.
Coming to the other aspect, privacy. Here we are talking about data privacy, which is different from data security. Data security helps in achieving data privacy from one particular angle, but privacy goes a little deeper than plain data security. So I just wanted to give a primer on what exactly privacy management is in a data context. This framework has largely been taken from data governance frameworks; if I remember correctly, I picked it from LINDDUN and NIST, from the recommendations these frameworks give for starting a privacy management journey. First and foremost, for data governance or privacy or anything you want to do with data going forward, you need to identify what your data assets are. DPIA is the data protection, or data privacy, impact assessment. Third is discovery semantics on top of the data. Primarily, all of these can be achieved by investing in a very good data catalog and some kind of audit tool on top of it. Then you have to plan your policies and procedures.
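To make the orchestration idea from this passage concrete (consent acting as a filter on a data flow from system A to system B), here is a minimal sketch. The row format, the shape of the mastered consent view and the function name `filter_by_consent` are assumptions for illustration, not the design of any particular platform.

```python
def filter_by_consent(rows, mastered_consent, required_purposes):
    """Keep only rows whose user has granted every purpose the flow needs.

    rows: iterable of dicts that carry a "user_id" key (assumed row format).
    mastered_consent: {user_id: {purpose: granted}} (assumed mastered view).
    required_purposes: purposes the downstream system will use the data for.
    """
    required = set(required_purposes)
    allowed = []
    for row in rows:
        decisions = mastered_consent.get(row["user_id"], {})
        granted = {purpose for purpose, ok in decisions.items() if ok}
        if required <= granted:  # users with no recorded consent are dropped
            allowed.append(row)
    return allowed


# Example: a targeting pipeline that needs both targeting and marketing consent.
consent_view = {
    "u1": {"targeting": True, "marketing": True},
    "u2": {"targeting": True, "marketing": False},
}
rows = [{"user_id": "u1"}, {"user_id": "u2"}, {"user_id": "u3"}]
sendable = filter_by_consent(rows, consent_view, ["targeting", "marketing"])
```

Treating a missing consent record as a refusal matches the explicit, opt-in reading of consent discussed in the talk, though a real system would need to decide this per regulation and per purpose.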
Say, for example, you have to create a consent strategy to collect consent data. What are the policies and what is the procedure? Do you want to go with websites, how many websites do you have, do you want a similar level of consent across all the websites, and if you want to offer offline consent management, how are you going to do that? That is just the collection part. In terms of the value exchange, parts two and three are very important, so how are you going to manage them? Third is roles and permissions, and the trainings across your organization on how people should handle data. Why do they talk about training? We always talk about the customer as the end user, but if you look at an HR scenario, even for employee data they have to collect consent going forward and know how to manage it. So training across the organization becomes important. Tooling is an investment. You need to look at the various software which has mushroomed in the last three years across cybersecurity, data security, privacy enhancing techniques, data catalogs and so on, and benchmark what exactly works for you.
Then you need to work on the control and protection semantics: what is the data security tech which is needed? Of course, if you go completely secure, you may not even be able to realize many business cases. So security is more about how adequately you are protected, and many times it is about compensating controls rather than a direct control. Saying, oh no, I'll never give access to the data, doesn't work, because data is one of the primary fuels on which the business runs.
Then, if you're talking about data security, identity and access come in, along with response and recovery of the data. Privacy enhancing techniques are a highly researched area; in the past two years there have been lots of papers and lots of improvement in this area. We'll talk about privacy enhancing techniques going forward as well.
Then you need to have audits in place, both internal and external. That reinforces your commitment to privacy management as a company. As I mentioned in one of the intro slides, neither GDPR nor, probably, the upcoming data protection bill of India is going to say exactly, these are the audits you need to complete and give back to me as certifications. The onus is on the company to figure it out: if I am handling personal data in this manner, say for healthcare, I had probably better get a HIPAA certification. If I am in the auto industry, I should probably get a TISAX certification. On a general basis, for any kind of data, I'll take the ISO 27000 series certification. So that is a management decision which is part of the planning, and it has to be executed as well.
Then there are the other things which any security or privacy expert in the company will talk about: how the monitoring is conducted and what the review guidance should be. This is a cycle, and it keeps on happening. Over a period of time, say the company changes its business and you are suddenly acquiring more biometrics. That means the data assets change and you have to plan the same thing again. This cycle goes on, right? You could plan it for your business on a quarterly, annual or biannual basis. So this is a cycle in terms of how you constantly manage your privacy.
As I said, privacy enhancing tech has caught many researchers' attention, I would say more or less in the past two years. It is largely classified into two areas on the X axis. One is input privacy: when data is flowing into the system, what techniques can be put in place to control privacy? The other is output privacy: when data is flowing out of the system, what can be controlled? In any data flow, if you look at it, there are always two semantics: you either pull the data or you push the data. That is what the Y axis shows. At the top of the Y axis it is talking about pushing the data, where you publish or export, as opposed to giving an API through which consumers are able to access things programmatically. I won't delve deep into this; each of these is a research area. Differential privacy, for instance, is a research area about adding statistical noise so that a piece of data cannot be identified with a certain individual. t-closeness and the others are again, I would say, research areas where a couple of companies have even come up with products, which we are also evaluating at this point in time.
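The idea of adding statistical noise can be illustrated with the Laplace mechanism, a standard differential privacy technique for numeric queries. The sketch below is a minimal example under stated assumptions, not the approach used by any product mentioned in the talk: it protects a simple counting query, whose sensitivity is 1 because adding or removing one person changes the count by at most 1, and the function name `noisy_count` is hypothetical.

```python
import random


def noisy_count(true_count, epsilon, rng=random):
    """Return the count plus Laplace noise with scale 1 / epsilon.

    A counting query has sensitivity 1, so Laplace(0, 1/epsilon) noise
    gives epsilon-differential privacy for this single query.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rate = epsilon  # rate of the exponential = 1 / scale
    # The difference of two i.i.d. exponential samples is Laplace-distributed.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


# Example: publish how many users consented to targeting, with epsilon = 0.5.
rng = random.Random(42)  # seeded only to make the example repeatable
published = noisy_count(1200, epsilon=0.5, rng=rng)
```

Any single published value is off by a random amount, so the presence of one individual cannot be inferred from it with confidence, while averages over many queries still stay close to the truth. Repeated queries consume privacy budget, which a real deployment would have to track.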
So this is mainly an FYI; I didn't want to explain these slides in detail. But broadly, if you understand the data flow in terms of input and output privacy, you may not invest in all four legs initially, but you could choose which leg is most important for your current business use cases and look at the software which is coming up.
Lastly, compliance management is not a singular thing. As I told you, there are three different angles, especially from the data end of things. You have the data security angle. Then privacy is a very upcoming area. In the past, privacy was more about protecting your data with perimeter security, encryption and so on. Now there are privacy enhancing techniques, as I showed in the previous slide, which could actually make your data completely anonymized and still achieve the value exchange for the downstream systems. So that is the other leg. The third leg is the data governance leg. All three work in tandem to achieve real compliance in any company. And of course, whatever you put in place, you had better have external audit scrutiny to reinforce it. Internally, it makes you confident; it is like a developer writing many test cases. It reinforces that we are going in the right direction. You also learn a lot from the external audit, and that can sometimes act as feedback and a backlog for this whole cycle we have been talking about.
With that I will end part one, the introduction, in which I wanted to set the context. In part two we'll talk about Zeotap's journey, specifically from a product and tech angle, in terms of what we did to achieve GDPR compliance. Thank you.