Welcome to module 178 of the Introduction to Data Science course. In this segment we continue with the ethical values we have been discussing and see what more there is for you to understand.

Data privacy is again a critical area. My information is private to me, your information is private to you, and beyond that it is available only to those stakeholders who have the right to access it. That, basically, is what we call data privacy. In the next slides we will see why data privacy is important in the data science and analytics world, what its importance is in computer science, what the different areas within data privacy are, and how, if we apply it effectively, we can minimize the risk we discussed earlier: there is a business risk associated with data, namely a data breach, and we want to know how to avoid it. So there are different angles or aspects here: how to safeguard the data, and how to secure it whether it is private or public. This is what we are going to discuss.

Let me remind you again that we talked about a pipeline, the whole life cycle of data: data collection, dissemination, presentation, and authorised sharing with someone else. Whatever you call it, the whole ecosystem or the whole process of a company, it starts with collection. When I collect data, whether it is gathered by an individual or by a machine, I should know where the data came from and for what purpose I collected it, then where I have to store it, where I will use it, and finally with whom I can share it. This is the basic principle of data privacy.
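As a rough illustration of this principle, here is a minimal Python sketch of the kind of record one might keep alongside a collected dataset. The class name `DatasetRecord`, its fields, and the sample values are all hypothetical, chosen only to mirror the questions above (where the data came from, why it was collected, where it is stored, and with whom it may be shared). It is not a standard or a specific tool.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata kept alongside a collected dataset."""
    name: str
    source: str            # where the data came from (person, survey, machine)
    purpose: str           # why it was collected
    storage_location: str  # where it is stored
    authorized_recipients: set = field(default_factory=set)

    def can_share_with(self, recipient: str) -> bool:
        """Sharing is allowed only with explicitly authorized recipients."""
        return recipient in self.authorized_recipients


# Illustrative values only; none of these come from a real system.
record = DatasetRecord(
    name="customer_survey",
    source="online survey form",
    purpose="service quality analysis",
    storage_location="internal analytics database",
    authorized_recipients={"analytics_team"},
)

print(record.can_share_with("analytics_team"))    # True
print(record.can_share_with("marketing_vendor"))  # False
```

The design choice here is simply "deny by default": a recipient who has not been listed explicitly does not get the data.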
Then there are multiple things to consider. If I have data from a survey, it can be breached at the time of collection, because someone else can share that data. But if the data was scanned or captured through an IoT device, then at least it stays inside the machine and is not readily accessible to anybody else. So this is your privacy issue: you can see how people are involved during data collection, when you store the data, and when you run ETL on it or expand it.

You have gone through these topics in the early segments of this course, and they will come up again and again, because in practical life everything we have covered in this course may pass through your hands in a single day. So take it positively if some things are repeated or reiterated. There is no fixed order in which these things happen. In practical life they run in parallel: many people work in an ecosystem and many things happen at the same time. So you should appreciate it when something comes in front of you again and again, or is discussed from multiple angles, because that is how it is in the real world, right?

Now, about the technique we discussed earlier. Do one experiment today: go to any website on the internet and register yourself, or try to download an article or some other information from it. So what will happen?
First of all, they will ask you for your name, age, or something else, and at some point for your email address. Besides the email address there is something called an IP address, which is your identification in the world of the internet. The computer you use has an IP address, and that IP address is your identity. From the IP address one can also find out which country you are in and which city of that country. Even your coordinates, where you are located on the globe, can be identified at least approximately with the geographic information systems available today.

So your data is the biggest challenge. If you talk about Google or Microsoft, their servers are distributed throughout the world. This is not only a challenge for you and me as individuals; in fact it is a big challenge for those companies as well. It is a sort of vicious circle: they are collecting data and spreading their operations, but regulation is weak, because one country's laws are different from the next country's, the third's are different again, and there are no global laws yet, or at least none strong enough to help us control these things. So geographically and globally distributed storage, where your data actually resides, is one of the major root causes.

Apart from that, after the mobile phone, for some years now we have had digital transformation; everyone is going digital. If you are not familiar with it, do another experiment: go and look for one definition of digital transformation. You will not find a single one. The reason is that every industry, every sector, every company has its own requirements for digital transformation. Basically its purpose is to replace manual intervention, manual work, with digitalization. This is another major reason why the data being collected is not controlled: new systems keep arriving, the development environment is separate, operations are separate, and different industries use them. These are the things that create a challenge for data security and privacy.

Now, for privacy, consider physical, logical, and every other kind of access we have talked about. You see user rights, you have a password, sometimes they send you a code. All of this exists because your information must be kept in a controlled environment, so that not every person can use it. This is again security, information security, and the related fields. We talked about identity and access management: to access any system, proper rules have to be in place so that the system can identify me. Then there is role-based access: whether I work in accounts, in stores, in retail, or in any other organization, the computer knows that Mr. So-and-So has logged in to the system and that he should have access only to a limited amount of information, not to the whole system.

Just to summarize privacy, the salient points are these: it covers your entire data pipeline, with user rights, identity and access management, data collection, and data storage. Even universities use such data: they do not know who is who, but the remaining anonymous information can be utilized under certain rules, laws, or regulations. The point is that not everybody should know your personal identity, because there are many risks related to that, and they can reach you in any shape or form. This is about privacy in data science.
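To make role-based access and the use of anonymous information a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the roles, the `ROLE_FIELDS` mapping, the `SECRET_KEY`, and the helper names `pseudonym` and `view_for` are hypothetical and do not come from any particular system. The idea is that each role sees only its own limited set of fields, and a researcher-style role receives a keyed-hash identifier instead of the person's name.

```python
import hashlib
import hmac

# Hypothetical role-to-field mapping; a real system would define this in policy.
ROLE_FIELDS = {
    "accounts": {"name", "salary"},
    "stores": {"name", "department"},
    "researcher": {"department", "salary"},  # no direct identifiers
}

# Assumption: a secret key that is stored separately from the dataset.
SECRET_KEY = b"demo-key-change-me"


def pseudonym(identifier: str) -> str:
    """Replace a direct identifier with a keyed hash (pseudonymization)."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def view_for(role: str, record: dict) -> dict:
    """Return only the fields that the given role is allowed to see."""
    if role not in ROLE_FIELDS:
        raise PermissionError(f"unknown role: {role}")
    visible = {k: v for k, v in record.items() if k in ROLE_FIELDS[role]}
    if role == "researcher":
        # Researchers get a stable pseudonym instead of the real name.
        visible["subject_id"] = pseudonym(record["name"])
    return visible


# Illustrative record; the values are made up.
employee = {"name": "Mr. So-and-So", "department": "retail", "salary": 50000}

print(view_for("accounts", employee))
print(view_for("researcher", employee))
```

Note that a keyed hash like this is pseudonymization rather than full anonymization: the remaining fields, taken together, may still identify a person, which is why such data is still shared only under rules and regulations.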