Okay, great. Welcome everyone to the second edition of the data governance meetup. Today we have a talk and a birds of a feather session. In the talk, we have Tophik Ali, who is a principal cyber security engineer, and he's going to talk about data governance on unstructured data. Tophik, over to you. Thanks, Ajit. Good afternoon, everyone, and good evening to some of you. A quick word on me: I have over a decade of experience in the field of cyber security, and I'm very passionate about bridging offensive and defensive practices, which include security by design, privacy by design, threat modeling, et cetera. I'm a very active member in a community called Null, which I'm not sure if you guys know well, and our recent project was jobs.null.community, which powers free recruitment for employers and job seekers both. You can always find me on LinkedIn and Twitter; my DMs and messages are always open, so if you have any questions, please feel free to drop a line. Before I get into the presentation, I have some ground rules. If you have any questions, please feel free to raise them in the middle of the conversation. If I don't see a raised hand, feel free to leave a message in the chat and I will have a look at it. Everything I talk about here is purely out of experience, something we've learned over years of practicing different streams like privacy by design and security by design. So let's quickly jump in. Now, the key takeaways I'm expecting the audience will benefit from by the end: first, we will talk a lot about unstructured data and the privacy implications around it. We will also talk about privacy by design blind spots, and I will give you a number of examples of how these areas are often missed when you're coming up with a data governance framework, especially around privacy by design. And finally, agile practices for implementing privacy controls.
I will talk about this as we move through the slides. A lot of what I will talk about will also be out of experience and you may not find it on the slides, but wherever it is important, I have made my best effort to put it on the slide; if not, we can always have a chat offline later. Before we get into the unstructured data world, here is a very high level definition of what data governance is all about, for an audience that is largely into governance, privacy, compliance, and similar practices. So what is data governance? It's essentially a set of principles and practices that ensure high quality through the complete life cycle of your data. Now, if your data sits in one particular system, one particular database, or one particular location, it's fairly easy to come up with a data governance framework and then implement rules that achieve the purpose for which you set up a data governance framework in your organization. But the real problem starts when you don't know where your data sits and you don't know what the structure of that data is. As a result, you cannot implement a uniform set of rules to make sure that you have data governance or privacy by design implementations for systems within your organization. Again, this is a very high level data governance framework that is generally followed if an organization has a data governance function; you answer typical questions like why, who, what, and when: why do you need the data, who is going to have access to it, when will they access it, and what are the means for accessing it? But let's break this complex diagram down into some very simple examples. That takes me to the blind spots. Again, a lot of what I'm going to talk about is something we've developed over years of experience.
When I started the journey of helping different teams within the organization I work for with privacy by design, especially when GDPR was very much in focus and we operate in European geographies where GDPR is mandatory, it was a fun exercise to start with: to see every single system that stores customer data or any information that is deemed private. In doing so, I realized it's very easy to identify data that sits in a structured manner, because then you can implement uniform controls; the real challenge was unstructured data, and I'm going to give you some examples in a bit. The other blind spot I discovered during this process was the inventory of systems that actually store, process, and transmit data. Once you have a look at these examples, you will realize that these are systems we don't even consider when we talk about data governance, data privacy, or privacy by design. I will also touch upon some examples on masking, tokenizing, anonymizing, and encrypting data. If I were to summarize all the blind spots we've come across through our practice of privacy by design: you cannot protect something you don't know exists, right? I mean, it's common sense that if you are responsible for protecting an organization from external threats, you cannot protect that organization if you don't know what its exposure on the internet is. That's just my way of summarizing the different blind spots that we have. Is that clear so far? Any questions? Am I going too fast or too slow? Please let me know and I can adjust accordingly. Rajatthai, can you guys hear me? Okay, excellent. So let's move on and talk about what unstructured data is.
A very simple definition is that if data does not have a predefined data model or schema, for me it's unstructured data, and I'm sure that's a fairly general definition for any kind of unstructured data. Now, some examples: sensitive data that is exchanged over emails, documents, and text files. It could be anything; it depends on your function in a given organization. For example, if you are somebody who supports a product and you often deal with a lot of customer questions, then you are exchanging a lot of data over emails and documents, and it's very difficult to put a structure around it. Sensitive data stored in public S3 buckets: you can find several examples on the internet of how people have managed to find sensitive data about a given organization in a public S3 bucket which does not have the correct ACLs configured and as a result is accessible to anyone. Data shared on social media is very interesting. A lot of my examples in this presentation are relevant to my organization, and the reason I say that is because these days we often exchange a lot of information with our suppliers. For example, if you want to tweet at your electricity provider, you go ahead and tweet, and in doing so you might end up giving them a lot of information, which is something that in principle you should avoid. If you're tagging your friends, or tagging images which may have location information on them, et cetera, all of this is data shared on social media. Then there is data shared indirectly with suppliers or partners.
Now, you may know that your organization is sharing information directly, because there is some form of legal binding between you and your supplier, but what about the data that you share indirectly? What that means is that you start with a certain initiative, saying this is the data that is going to be consumed as part of my product life cycle, but in doing so you end up sharing other information as well, which makes your data richer and richer from a privacy point of view. Facebook is a classic example of this: it's not just you who gives information to Facebook, it is also all the people who advertise their products on Facebook, and Facebook is giving them your information. As a result, you see a lot of tailored experience in terms of pointed ads that come up on your Facebook pages. So let's break this down. If you think that your data is only stored within an RDBMS, think twice. Here I'm going to give you some real examples, as well as samples of how these things could actually have privacy implications. In my experience, it often happens that developers are never exposed to the entire life cycle of the product. What that means is that as a developer, if I'm writing code for a certain product, I have no visibility into how these systems are exposed to the internet, what kind of logging is configured for these applications, what kind of analytical trackers are implemented, or what other activities are carried out by the business that works on these kinds of products. Because developer visibility is minimized, it leads to blind spots in a lot of areas, and we are going to talk about some of them. So, poorly designed APIs are a very common way of leaking sensitive information about your customers, or anything that is deemed potentially sensitive by your organization.
Query strings: there are some very interesting examples that I will give you in a bit. Data analytics systems: I'm not quite sure how many people in the audience are versed with full-blown data analytics systems that can actually record an entire user session in the browser and give your business the opportunity to replay that entire session, then come back and give you a more tailored experience. This area, in my experience, has been a big privacy nightmare if not implemented correctly. Again, something developers are often blindsided by is the kind of logs that get stored within web servers, proxies, load balancers, error handlers, web application firewalls, et cetera. Everything I'm going to talk about from now on is related to this slide, and the way I've structured it is that I've come up with a case study which we will keep referring to throughout the presentation; please feel free to raise questions if you have any doubts or want to add something. So this is what my case study looks like. The business requirement is very simple: as a developer, I need to design an API that will return all the attributes that I have about a given user after successful authentication. Consider a system like Facebook again. You always have these websites that offer login with Facebook, login with Google, et cetera. When you click on login with Facebook, it returns a very specific data set, and it asks for your approval before it is shared with the actual vendor or website where you're using login with Facebook. Now, the other requirement is that there could be N partners consuming this API, and each partner needs access to different attributes within that data set.
So for example, if I store 100 attributes about a given user when they register on my website, partner one needs access to only five, partner two needs access to six, and so on. That's the business requirement; let's look at possible ways to design an API like that. The first one: let's say I create one API that sends all the user details upon successful authentication, and every partner can choose whatever fields they want. What I mean here is that I have 100 attributes about a given user, and because there is an NDA, a non-disclosure agreement, signed with the vendor, in my head I have no problem giving them all 100 attributes; the vendor will in turn end up using only those that they require. Now, when you're doing something like this, there are some benefits, the important one being time to market. If you tell your business that you have one API that all the partners, suppliers, or vendors they work with can consume, your time to market is extremely short; you have less overhead and a quicker turnaround, and even if you have to make the slightest modification, you make it in one place and it's applicable to everybody. The biggest challenge in doing this is that it's not data privacy friendly. Let me give you an example. Say you have one vendor that only needs five fields from the response of your API, but because you're giving them all 100, there is nothing that stops the vendor from iterating through your database and storing all 100 attributes about those users. What they do with that is a different discussion entirely, but this is a potential area of concern if you are a very privacy-minded organization.
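To make this first option concrete, here is a minimal Python sketch of the "one API, full payload" design. The names (USER_STORE, get_user_details) and the tiny record are illustrative assumptions, not from any real system; the point is that nothing in the code enforces the partner's five-field contract.

```python
# Hypothetical sketch of the "one API, full payload" design from the case
# study. Names and data are illustrative only.

USER_STORE = {
    "u123": {
        "name": "Jane Doe",
        "email": "jane@example.com",
        "phone": "+44-700-000-0000",
        "dob": "1990-01-01",
        "address": "1 High Street, London",
        # ...imagine ~100 attributes in total
    }
}

def get_user_details(user_id: str) -> dict:
    """Returns EVERY attribute we hold -- the partner is trusted to
    discard what it does not need, which nothing actually enforces."""
    return USER_STORE[user_id]

# A partner that signed up for only name and email still receives the
# full record; filtering happens (or doesn't) entirely on their side.
payload = get_user_details("u123")
print(sorted(payload.keys()))
```

The contract lives only in a legal document, not in the code path, which is exactly the privacy problem described above.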
The other way to solve this problem is to create multiple versions of the same API, one for each partner, and send them only what they have signed up for. This is a very good approach, because even if I have 100 attributes and one partner needs five, I only send them five; if the second partner needs six, I only send them six, which means I have full control over what I share with each of these vendors. The biggest problem with this approach is time to market: if your business follows a lot of agile practices and onboards new vendors onto your platform very frequently, this is going to be time consuming, because with every new partner they onboard, you have to create a new version of the API and map the fields that are required. Of course, this also means bigger overhead and longer turnaround times, which your business may not necessarily appreciate. Now, considering the different technologies available in the market, the third approach could be to create one API but build multiple views on top of it, one for each partner. I'm not sure how many people in the audience are developers, but if you know technologies like GraphQL, or if you know Facebook's Graph API, this is exactly what they do: they create one API, they create multiple views on top of it, and then depending on the partner ID, they only send the data that is required. Now, I'm not sure if you guys are well versed with the whole Cambridge Analytica and Facebook scandal. Facebook takes this approach for all their APIs: they have one version of the API and they build multiple views on top of it, and in that case they allowed Cambridge Analytica to pull a lot of information about Facebook users which they would not give other people access to.
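This third approach can be sketched minimally like so, assuming a simple allowlist keyed by partner ID (GraphQL and Facebook's Graph API do this far more flexibly); all names here are illustrative, not a real API.

```python
# Minimal sketch of "one API, per-partner views" via an allowlist.
# Names and data are illustrative assumptions only.

USER = {
    "name": "Jane Doe", "email": "jane@example.com",
    "phone": "+44-700-000-0000", "dob": "1990-01-01",
}

PARTNER_VIEWS = {
    "partner_1": {"name", "email"},          # contracted for 2 fields
    "partner_2": {"name", "phone", "dob"},   # contracted for 3 fields
}

def get_user_view(partner_id: str) -> dict:
    # Only contracted fields ever leave the API; the allowlist itself is
    # what governance and oversight then have to keep reviewing.
    allowed = PARTNER_VIEWS[partner_id]
    return {k: v for k, v in USER.items() if k in allowed}

print(get_user_view("partner_1"))
```

Note that the privacy guarantee is only as good as the allowlist, which is where the governance challenge discussed next comes in.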
The pros: it's a very efficient process, it's very scalable, and it's also very data privacy friendly, because you're giving access only to the data that is required. The only challenge, and this is something you learn from the Facebook and Cambridge Analytica scandal, is governance and oversight. What this means is that there is no way for you to know what data is accessed by a given vendor unless you have the right checks and balances in place, where you review the responses sent to individual partners on a periodic basis. So the cons are governance and oversight. But hey, governance and oversight are fairly easy problems to solve when you know that the underlying platform you have built is foolproof, or you know for sure that it does the job it was designed for, which is allowing you to create views and exchange only the information that is required. So this is the first blind spot we have often come across: you design APIs, but these APIs send back information which was not intended for sharing and was not part of your agreement with the vendor or supplier. Any questions on this one? Okay, all right, no questions. This one is personally my favorite, so let's talk about it. There are a lot of functionalities in your applications which require you to deep link. Let me give you an example. You have an online service in which, if you click "I forgot my password", they email you a password reset link with a token. When you click on that link, it takes you to the password reset page. If you look carefully here, there is a code, which is nothing but a password reset token.
If this token is valid, it'll allow you to reset the password. This is how password reset functionalities are typically designed: you send an email to your customer, the customer clicks on the link and lands on the page, and if the token is valid, you allow the user to reset the password. Now, what actually happens at the back? This is often an area where developers don't realize the privacy implications. The browser security model simply states that anything you send in the query string (a) will always be visible, and (b) will always be logged by a web server. So let's say, for example, you have two parameters that take a username and a password. If you ever do a GET with that request, whatever is in the query string of that URL will eventually get recorded as part of your web server logs, which means that whoever has access to your web server logs could also potentially have access to the password reset codes. Now, there are two important problems with this approach. First, when the page loads, the entire link, along with the code, will be submitted to any trackers implemented on the page. What does this mean? If your organization, or the product you're building, has a requirement to track user experience, say you want to see how many users actually use the password reset functionality, you put a tracker on the page. Every time somebody visits, it collects certain information available in your DOM and submits it to the tracker. This tracker may not necessarily be designed by you: the most common and most widely used marketing trackers on the internet are Google Analytics, DoubleClick, or Criteo, and there are a ton of these providers on the internet.
These trackers also have the ability to inject JavaScript into your browser, if that's what you've designed them for; again, this is an area where you have to be very cautious about what they are capable of injecting back into your pages. The other common oversight I've seen is that on this page, when you click on the Save button, whatever is in your browser URL gets submitted as the referrer to the next page. To explain this in detail: you have a link on the left which allows you to reset the password; when you click on it, you get to the page on the right; and when you click on the Save button, whatever page you go to next, this link automatically becomes the referrer. From a privacy point of view, there are a number of things that could potentially go wrong here. Let's say your password reset functionality was not implemented correctly, meaning that once the user clicks on the link, you don't expire the token, or the code in this case. The other mistake could be that because your business is very keen on giving users the best experience, these codes live for longer than 24 hours, which means I request a password reset and the link stays active for the next 24 hours, even if the user has already clicked on it. That link carries data that is then submitted to a tracker, so anybody who has access to the tracker will potentially have access to your reset links as well. As a developer or as an organization, anything that leaves your vicinity, your boundary of systems, is out of your control. What happens to that data is very difficult to govern, and as a result it becomes very difficult to implement privacy-related design controls on such pages.
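A small Python sketch of the mitigations this implies; the names and the 15-minute TTL are assumptions for illustration. The idea: a typical access-log line will happily record a token sent in the query string, so treat the reset code as single-use and short-lived, and store only its hash server-side (setting a strict Referrer-Policy on the page helps with the referrer leak too).

```python
# Why reset tokens in a query string leak, and a sketched mitigation.
# access_log_line mimics a simplified common-log-format entry.

import time, secrets, hashlib

def access_log_line(method: str, path: str) -> str:
    # A web server writes the full request path, token included.
    return f'127.0.0.1 - - "{method} {path} HTTP/1.1" 200'

token = secrets.token_urlsafe(16)
print(access_log_line("GET", f"/reset?code={token}"))  # token is in the log

# Mitigation sketch: store only a hash of a short-lived, single-use
# token, and never log or persist the raw value.
ISSUED = {hashlib.sha256(token.encode()).hexdigest(): time.time() + 900}

def redeem(raw_token: str) -> bool:
    digest = hashlib.sha256(raw_token.encode()).hexdigest()
    expiry = ISSUED.pop(digest, None)   # pop => single use
    return expiry is not None and time.time() < expiry

assert redeem(token) is True
assert redeem(token) is False  # second click fails: token already burned
```

Even if the link leaks via logs or a tracker, a burned or expired token is worthless to whoever sees it.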
So the blind spot is that data sent in the query string will always be present in the web server logs, or leaked via the referrer and submitted to trackers implemented on the page, if the request, of course, is visible in clear text. I am not sure how many developers are well versed with systems like AppDynamics, Quantum Metric, FullStory, or IBM Tealeaf, for example. These are systems that allow you to implement tracking on your web page that records the entire user session; they then convert this session into some sort of HTTP request and post this information to the backend. From a business point of view, they never want to lose an ounce of business just because the user had an unintended or unexpected experience on a page. Let me give you a very simple example. It's very common in the airline industry, or for that matter on any e-commerce platform, that if a user puts something into the cart, goes all the way to the payment page, and for whatever reason is not able to make a successful payment, your business would definitely want to know, because they don't want to lose the revenue from that user. When somebody on the business side identifies these kinds of events, they will try to tailor the experience so that the next time the user logs in to the application, they can continue the journey from where they stopped. These kinds of systems are very useful and powerful, but at the same time they are a massive privacy nightmare. Now, I've taken this video from an app called FullStory that follows a similar pattern, so it will give you a look and feel of what these kinds of systems are.
What you see on the left is a simple page in which a tool like FullStory is implemented, and you will see that as the user enters information on the left, it gets recorded as an HTTP session on the right. There's a lot of information supplied by the user on the left, and on the right you will see that as and when the user modifies any information, the tracker implemented on the website is actually collecting all of it in real time. I've also given a link in case anyone wants to go back and have a look later, but from a privacy point of view these systems are a big nightmare if not implemented correctly: you can see here that it not only captures passwords, but also your card numbers and a bunch of other things, recorded almost in real time. Now, why are we discussing these systems? Because of this. I will let you read these links after the session, but at a very high level I can tell you what these issues have been; my team back at work tested a similar system and we came across some very interesting observations. Both these incidents were very good examples of how unintended data can get captured by the actual vendor or the owner of the website, in this case, say, Facebook or Flipkart or Amazon or whoever it is, and then what do they do with this data at the back? While all of these systems have one thing in common, in that they actively talk about a lot of privacy controls in their application, at the end of the day you can only trust them for what they say.
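Beyond trusting the tool's stated controls, teams can reduce what there is to leak in the first place by redacting sensitive fields before a captured payload leaves the page. Real session-replay tools ship their own exclusion mechanisms; the field names and redact() helper below are illustrative assumptions only, sketched in Python for readability.

```python
# Sketch of client-side redaction before a session-replay capture is
# posted anywhere. Field names and helper are illustrative assumptions.

SENSITIVE_FIELDS = {"password", "card_number", "cvv"}

def redact(captured_form: dict) -> dict:
    """Replace sensitive values before the payload ever leaves the page."""
    return {
        k: ("***REDACTED***" if k in SENSITIVE_FIELDS else v)
        for k, v in captured_form.items()
    }

captured = {"email": "jane@example.com", "password": "hunter2",
            "card_number": "4111111111111111"}
safe_payload = redact(captured)
print(safe_payload)
```

The design choice here is to treat the replay vendor as untrusted: whatever their controls, a value that never left the browser cannot be mishandled on their side.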
From my experience, what I've realized is that yes, it's good to trust and have these kinds of implementations, but it's also very important to verify the claims that are made. Verification could be as simple as: you set up your app, integrate the tracker, and then do an assessment or review from both sides; while you're actively submitting data from the app, you see what's happening on the other side, and then you find ways and means of compromising that piece of information, or getting unauthorized access to it, and that's how you will be able to tell how foolproof that technology or solution is. I would highly encourage you to read these articles, because they give some very interesting insight into how different organizations have implemented it. The first example, if I recall well, is related to Air Canada, where they had a similar technology implemented in their app and a privacy researcher identified gaping holes in the way it was implemented. The Wired article talks about how organizations like Google and Apple are working very closely with their developers to ensure that if any such technologies are implemented within an app, they have to be fully declared, and they have to follow accessibility rules and guidelines to make sure the user is aware of what data is captured from their screens. Moving on, this is another area where developers are often blindsided. For example, they come up with designs for GET or POST requests, or any RESTful API for that matter, but when they send data in the query string, they don't realize that it potentially gets logged everywhere SSL is intercepted or offloaded. What that means is that you have perimeter technologies.
For example, with F5 or Akamai or any other web application firewall service provider or vendor in the market, you need to offload your SSL onto those devices so that they can carefully examine the request for potential anomalies, and if everything seems good, they forward the request to the actual origin server. In doing so, if you pass anything that is potentially sensitive, you will end up exposing all of that information in the logs of those devices. I've listed some more technologies: for example, very big organizations which give their employees internet access through proxies often SSL-intercept the traffic, so that they know everything going in and out of the system. Again, if you're passing anything sensitive, think twice, because you know it is eventually going to get logged. Getting logged is one part; the bigger problem is that if any of these implementations have any sort of authentication or authorization issues, it becomes a nightmare, because now you can actually extract that request from the proxy and replay it on behalf of the user. Similarly, you have layer 7 or application load balancers these days, and with the advent of cloud web application firewalls, these are very interesting devices. If your product, consumed by different people on the internet, is under any kind of attack and you have an application firewall, you will always be able to see what the attacker is actually doing. You will be able to go through the packets, know what the attack is, and thereby answer questions like when the attack started, what vulnerability was being exploited, what was accessed, what was exfiltrated, or what was touched. These are the kinds of questions you will always be able to answer with application-firewall-like devices.
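Since query strings end up in all of these intermediary logs, one defensive pattern is to scrub known-sensitive parameters in your own logging pipeline before a line is written. A sketch using only the Python standard library; the parameter names in SENSITIVE_PARAMS are assumptions, and real deployments would do this inside the device or proxy's log configuration.

```python
# Sketch: scrub sensitive query parameters before they hit proxy/WAF
# logs. Parameter names are illustrative assumptions.

from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

SENSITIVE_PARAMS = {"token", "code", "password", "ssn"}

def scrub_url(url: str) -> str:
    parts = urlsplit(url)
    qs = [(k, "FILTERED" if k in SENSITIVE_PARAMS else v)
          for k, v in parse_qsl(parts.query)]
    return urlunsplit(parts._replace(query=urlencode(qs)))

print(scrub_url("https://example.com/reset?code=abc123&lang=en"))
# the code value is replaced before logging; lang survives untouched
```

Scrubbing at log time is a safety net, not a fix; the cleaner design, as argued above, is to keep sensitive values out of the query string entirely.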
Other examples are, of course, credential-capturing attacks: if your customer base is compromised through phishing campaigns, they are actually submitting their credentials to the adversary. Consider an example like the British Airways hack that happened a couple of months ago, where somebody injected a malicious JavaScript on the payment page itself, and what that script essentially did was log every single keystroke of the user. So you as a customer go to the website and enter your details to book a ticket through the platform, and while you do so, this piece of script embedded on the page is collecting all that information and sharing it with whoever the attacker behind the scenes is. These kinds of attacks became very well known, and the crew behind them was termed Magecart; I think they compromised a number of e-commerce websites through vulnerabilities like cross-site scripting, injecting scripts that harvest a lot of information as users type data into these websites. Before I get to masking, tokenizing, and anonymizing, any questions so far? Okay, all right. Let's have a quick deep dive into masking, tokenizing, anonymizing, and encrypting. From a data privacy point of view these are very important terms, and I can tell you this from experience. Let's say your organization wants to host a hackathon, and as part of that hackathon they want to share some data with the participants, and the goal is for them to come up with a creative app, service, or feature on your product.
They can generate revenue out of these kinds of features and the analytics they capture from the website. A similar situation could be that you have a data science team responsible for using different kinds of machine learning algorithms to give insights to your business teams, so they can make educated decisions about forecasted sales or forecasted inventory, or, if they potentially opt for a certain kind of service, what cost benefit they would get. Now these are various use cases. The most common mistake I've seen in my experience is that people very easily interchange the definitions of masking, tokenizing, anonymizing, and encrypting data. For example, I often talk to a lot of developers, and for them, encryption seems to be the silver bullet that will solve all the problems, but that's not true. Encryption does not solve all the problems, and most importantly, encryption may not even be the requirement for that particular use case; you're just over-engineering something and as a result making it complicated. Similarly, I've often seen people interchange the definitions of masking and tokenizing. I will give you some examples as we move along in the slides. Now, the most common errors I've witnessed from a data masking point of view: the first is consistency. What consistency means is, let's say you have a website and you also have a mobile app, or let me take a step back: you have one API that is consumed on both the web and the app side. You decide to mask certain sensitive information; for example, if the website allows you to store credit card information, the website only shows the last four digits of the credit card number.
Whereas on your mobile app, for whatever reason, either you don't mask at all or you decide to mask the first four characters instead of the last four, right? These inconsistencies can introduce a lot of problems. The reason I say that is — let's take an example (yes, I think I should be done in another five minutes, okay) — Amazon allows you to see the last four digits of your credit card number, and Apple back in the day used those same last four digits to verify a customer. So you can understand that something Amazon treats as safe to display is effectively a secret for Apple. As a result, inconsistent masking creates a lot of other privacy issues, maybe outside of a product or even within your product. Other common mistakes: masking patterns, where you use different character sets in different places; masking only on the client side, so when somebody does a right-click and Inspect Element or View Source, they can actually see what the real text is; and unmasked data in transaction logs — for example, PCI asks you to log certain kinds of financial transactions for audit purposes, and if that data is available in clear text, that's another issue from a privacy point of view. I have given some links which we found very useful at the place where I work: whatever your technology stack is, it will have options for doing masking, implementing custom field masks, or encrypting the columns that you consider sensitive.

On tokenization: I would always question any business requirement that asks you to capture sensitive information for analytics. Where these are genuine requirements, your approach should be very simple: when you want to be able to rebuild the data set, you tokenize it, and when you only care about the data's structure, you anonymize it.
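The two rules at the end of that point — tokenize when you need to rebuild the real data later, anonymize when you only need the data's shape — can be sketched as follows. This is an illustrative toy, not a production design: the in-memory dictionary stands in for what would really be a secured token vault.

```python
import secrets
import string

_vault = {}  # token -> real value; in practice a secured token vault

def tokenize(value: str) -> str:
    """Reversible: swap the value for a random token, keep the mapping
    so the original data set can be rebuilt later."""
    token = "tok_" + secrets.token_hex(8)
    _vault[token] = value
    return token

def detokenize(token: str) -> str:
    return _vault[token]

def anonymize(value: str) -> str:
    """Irreversible: replace each character with a random one of the
    same class, preserving only the format (e.g. for a hackathon dump)."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(secrets.choice(string.digits))
        elif ch.isalpha():
            out.append(secrets.choice(string.ascii_uppercase))
        else:
            out.append(ch)
    return "".join(out)

t = tokenize("ABCDE1234F")
assert detokenize(t) == "ABCDE1234F"   # rebuildable
print(anonymize("ABCDE1234F"))         # same shape, different content
```

The key contrast: `detokenize` can always recover the original, while `anonymize` throws the original away and keeps only the format.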
So, for example, I spoke about the hackathon event. In that case you don't need to tokenize the data, and you don't need to encrypt it, because it's pointless; but you can anonymize it, which means you can replace a passport number with a similarly formatted string, or replace a PAN card number with a similarly formatted string. We've adopted these two simple rules: if you want to rebuild the database, tokenize it; if you only care about the data structure of the attribute, simply anonymize it.

Last but not least, data encryption. We've spoken about this: it covers data in transit and data at rest, so it applies to whatever goes into backups, data stores, data warehouses, data lakes, et cetera. The most common mistakes I've seen developers make: they think they're encrypting when they're actually only encoding some piece of data and calling it encryption; and they choose the incorrect cryptography type, so where they should use asymmetric they end up using symmetric, and where they're expected to use symmetric they end up using asymmetric — and these do have privacy implications. I've also seen, in my experience, that because all of this knowledge is freely available on the internet, developers will end up writing their own encryption routine, which is something I would never, never encourage any developer to do. Encryption algorithms have to meet complex mathematical properties to protect the confidentiality and integrity of the data, and that is why they become standards; if you come up with something that does not meet the same level of standard, it's not worth spending your time on it. The other bit, from a privacy point of view, that I've often seen as a challenge: you can't have 100% security and 100% privacy with absolutely no loss of convenience. It's always a balanced call between what is right and what is wrong, which is where you can draw a ground
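The "encoding is not encryption" mistake is easy to demonstrate: Base64 output looks scrambled but reverses with no key at all. A small stdlib-only sketch; the real-encryption alternative is shown only in comments since it needs a third-party library.

```python
import base64

# A common developer mistake: encoding is not encryption.
# Anyone can undo it with zero secrets.
encoded = base64.b64encode(b"4111111111111234")
print(encoded)                    # looks scrambled...
print(base64.b64decode(encoded))  # ...but reverses with no key at all

# For actual encryption, use a vetted library rather than a
# home-grown routine -- e.g. the third-party `cryptography`
# package's Fernet (authenticated symmetric encryption):
#   from cryptography.fernet import Fernet
#   key = Fernet.generate_key()
#   token = Fernet(key).encrypt(b"4111111111111234")
```

This is exactly why "I wrote my own crypto" should be a red flag in review: the standard constructions exist because the failure modes are subtle.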
or a middle line, saying: this is where you would want to stop anonymizing, and the rest stays in its denormalized form.

So, this is my last slide — hopefully, yes, Avin. I want you to look at a very high-level process that we think works very well for us. To the left you see an autonomous team, which has representation from different parts of your organization. All of these people come together and carry out an activity that we call data cataloging, or data classification, which means you've identified the data that is important from a privacy point of view, you've classified it, and eventually you feed all of this back into the data governance framework. You will see that the arrows go both ways, which means this is a bi-directional process. Then you see continuous assurance at the top, which essentially means that, while all of this is always going to be a moving target, continuous assurance acts like a feedback loop: you keep doing the right audits, and you keep trying to identify different blind spots through the life of the data itself.

Lessons learned — this is really, really my last slide. Catalog all your data: it's very important, because you cannot protect anything that you don't know about. Discover privacy-related data in your existing systems: this is a very important activity, there are a lot of tools available for it, and I personally consider it necessary for identifying where sensitive data lives. Don't store the data that you don't need. Once you've cataloged all your data and created a framework, define technical controls to protect sensitive data. And the continuous assurance shown here is basically summarized in this line: build, measure, learn, feedback, and loop. In a privacy world it's very, very important that you trust but also verify, and the only way to verify is to have continuous assurance.
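The "discover privacy-related data in your existing systems" step can be pictured with a toy scan like the one below. Real discovery tools do far more (checksums such as Luhn, context analysis, ML classifiers), and the patterns here are deliberately simplistic illustrations, but the catalog-then-classify loop starts with something of this shape.

```python
import re

# Toy sketch of a privacy-data discovery pass: scan free text for
# patterns that look like PII. Illustrative only -- real tools
# validate matches (e.g. Luhn checks) and scan whole data stores.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def discover_pii(text: str) -> dict:
    """Return {label: [matches]} for every pattern that fires."""
    return {label: rx.findall(text)
            for label, rx in PATTERNS.items()
            if rx.findall(text)}

sample = "Contact alice@example.com, card 4111 1111 1111 1234."
print(discover_pii(sample))
```

Findings from a pass like this feed the catalog, the catalog feeds classification, and classification decides which technical control (masking, tokenizing, anonymizing, encrypting) applies — the bi-directional loop on the slide.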
Alright guys, so that's pretty much all I had to share with you today.