Hello, my name is Alyssa Columbus. I am a NASA Datanaut, and I am currently serving as the data governance program lead for the largest credit union in California. I am proud to present R for Data Privacy and Governance.
So what is data privacy and governance? Well, it depends on who you ask, but in general, data privacy is focused on the use and governance of personal data to ensure it's being collected, shared, and used in appropriate ways. Data governance is focused on the availability, usability, consistency, integrity, and security of every organizational data asset to ensure that organizations make the most of their data while following all relevant standards and regulations. On the right-hand side of this slide, I've listed just a few of the many standards and regulations that data governance and privacy programs are encouraged or required to follow: the EU's General Data Protection Regulation, also known as GDPR; the California Consumer Privacy Act, also known as CCPA; the Gramm-Leach-Bliley Act, or GLBA, for protecting banking and financial information; the Health Insurance Portability and Accountability Act, or HIPAA, for safeguarding health and medical information; and a few standards from the International Organization for Standardization, ISO 27001, 27701, and 38500.
This map provides a helpful infographic of major privacy regulations around the world and a broader perspective on just how global privacy regulations are. As of today, there are over 500 data privacy laws in effect in over 80 countries around the world. If one's region hasn't been impacted by privacy legislation, it likely will be in the next decade. Organizations that don't follow these regulations or don't do privacy right are at risk of government enforcement, class action lawsuits, financial ruin, damaged reputation, and loss of customer loyalty. Privacy is now a necessity for every organization and a vital part of every organization's data governance program.
To dive deeper into data privacy and governance, we must keep in mind that a program that implements all of the concepts I previously mentioned is a hybrid of many ideas. On the one hand, you have to have a rigid enough program to stay compliant with every rule and regulation. On the other hand, you have to be flexible enough to adapt to changing times and the needs of the organization across all of its departments. When you're building a program like this, you have to think like you're building a community of users who are all working toward the same goal: protecting the most valuable asset of both your organization and the people who trust your organization. Overall, a data privacy and governance program boils down to the following mission statement, akin to the golden rule: we must approach and attend to our data like we would approach and attend to the people who are represented in our data. It's essential to keep this mission statement in mind through the rest of this presentation.
For a data privacy and governance program to be successful, it needs a proper framework for implementation. This image is a generalized template framework I created for data governance and privacy program implementation in any organization. A more detailed version of this image is attached to my presentation to enable you to implement a program like this in your organization. The bottom layer of the pyramid is the foundation. It is comprised of executive stakeholders. If you don't have executive stakeholder buy-in and support, you probably won't have a successful data privacy and governance program. The executive stakeholders provide direction, funding, and support for the program. These stakeholders also empower the program through deep knowledge and ownership of the organization's data infrastructure. The next layer above the foundation, or the third layer from the top of the pyramid, is that of the data privacy and governance leaders.
They align the program with the executive stakeholders' vision, oversee all operations of the program, monitor compliance, and collaborate with the council and stewards. The next layer above this one, or the second layer from the top of the pyramid, is that of the data privacy and governance council members. This council drafts and enforces policies and procedures to comply with all applicable regulations and standards. This council also acts as the liaison between the program's leaders and organizational departments. Finally, the top layer of the pyramid is comprised of the data privacy and governance stewards. These stewards are responsible for creating and owning data assets for reporting, modeling, and product design. They also take the time to write documentation and complete regular risk assessments. Everyone in an organization is responsible for the success of a privacy program.
To paint a broader picture of this, I will detail step by step how to use R for better data privacy and governance at the top two levels of this pyramid, the council level and the stewards level. Starting with the data privacy and governance council level, I will show you how to tag metadata with R to comply with CCPA, GDPR, and similar privacy regulations. Recall that the council is fundamentally responsible for enforcing policies and procedures to comply with privacy regulations and standards. The following code illustrates how to tag metadata on a sample data set I made with completely arbitrary dates, first names, credit card numbers, and payment amounts. First, we can tag the sensitive elements of a data set in more general terms like name, CCN, and transaction, instead of first_name, card_number, and payment, so that they can be aggregated, analyzed, and filtered later for compliance. This tagging also helps the stewards determine which elements of the data set to anonymize, which I will cover later in this talk.
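The slide code itself is not captured in this transcript. As a minimal sketch of the tagging step just described, the following base R snippet builds a small made-up data set and attaches a PII attribute with attr(). The object name transactions, the date column name, the attribute name "pii", and every value are assumptions for illustration; only the column names first_name, card_number, and payment come from the talk.

```r
# Hypothetical sample data; every value below is made up for illustration.
transactions <- data.frame(
  date        = as.Date(c("2020-01-15", "2020-02-03", "2020-03-21")),
  first_name  = c("Ana", "Ben", "Chloe"),
  card_number = c("1111222233334444", "5555666677778888", "9999000011112222"),
  payment     = c(25.50, 110.00, 8.75),
  stringsAsFactors = FALSE
)

# Tag the sensitive elements in general terms.
# The attribute name "pii" is an assumption, not taken from the slides.
attr(transactions, "pii") <- c("name", "ccn", "transaction")

# Verify that the tag sits alongside the usual data frame attributes
# (names, class, row.names).
attributes(transactions)
attr(transactions, "pii")
```

Storing the tag as an attribute keeps the metadata attached to the object itself, so a later step can read it back without consulting a separate lookup table.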
Walking through the code, we can add a PII, or personally identifiable information, attribute to our data object through the attr command and add the values name, CCN, and transaction to represent the first name, credit card number, and transaction or payment information in this data set. We can then verify that this PII attribute was successfully attached to our data object through the attributes command.
We can also tag a data set to categorize it into its appropriate regulations. Since this data set contains a name attribute, it is subject to CCPA in California and GDPR in the European Union. Since this data set also contains payment and credit card information, it is subject to GLBA in the United States. So in total, we can tag this data set as subject to CCPA, GDPR, and GLBA. Similar to the code on the last slide, we can add a REG attribute, which is short for regulation, to our data object through the attr command and add the values CCPA, GDPR, and GLBA to represent the regulations that this data set falls under. We can also verify that this REG attribute was successfully added to our data object through the attributes command. Another method of tagging metadata would be to build a model to detect which sensitive attributes are in a data set or database and tag the data asset accordingly. However, that is beyond the scope of this talk.
To permanently save these additional attributes to our data object, we can save our data object as an RDS file with the saveRDS command. Saving the data as an RDS file automatically compresses the data and preserves any R-related metadata associated with the object. These properties will come in handy in the next few slides as I move into the next level of privacy programs, the stewards level's use of R for better data privacy and governance. At the data privacy and governance stewards level, I will show you how to anonymize data in R that was previously tagged as sensitive by the council.
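Before moving on, the regulation tagging and saving steps described above can be sketched in base R. This is an illustration under assumptions rather than the code from the slides: the object name transactions, the attribute names "pii" and "reg", the sample values, and the temporary file path are all hypothetical.

```r
# Hypothetical sample data; names and values are assumptions.
transactions <- data.frame(
  first_name  = c("Ana", "Ben"),
  card_number = c("1111222233334444", "5555666677778888"),
  payment     = c(25.50, 110.00),
  stringsAsFactors = FALSE
)
attr(transactions, "pii") <- c("name", "ccn", "transaction")

# Tag the regulations the data set falls under.
# The attribute name "reg" is an assumption.
attr(transactions, "reg") <- c("CCPA", "GDPR", "GLBA")
attributes(transactions)

# Save the tagged object; a temporary path stands in for a real location.
rds_path <- tempfile(fileext = ".rds")
saveRDS(transactions, rds_path)

# Reload the file to confirm that the custom attributes were kept.
reloaded <- readRDS(rds_path)
attr(reloaded, "pii")
attr(reloaded, "reg")
```

Because saveRDS serializes the whole R object, the custom attributes travel with the file, which a plain CSV export would not do.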
Recall that stewards are principally responsible for creating and owning data assets for reporting, modeling, and product design. First, we need to load the RDS file we previously saved. We can do this directly with the readRDS command. We can also verify with the attributes command that all of the attributes we saved earlier are in the RDS file we just loaded.
Now, to anonymize our dataset, we first need to create a custom anonymize function. This function will need the digest package, so we'll load that first. Then we'll define anonymize as a function of an object, x, and a hashing algorithm, with the default here set to MD5. There are nine different hashing algorithms to choose from in the digest package for different use cases, but I decided to use MD5 here for simplicity. This anonymize function will then apply the selected hashing algorithm to the input object x. Loading the dplyr package and applying our anonymize function from the previous slide to our dataset's sensitive elements, first_name and card_number, we produce the following hashed and anonymized result. Pretty effective, huh?
Now for some final thoughts. In addition to the Markdown and HTML slides and the data governance and privacy program implementation and training framework attached to this presentation, I've outlined additional resources about data privacy on this slide and data governance on the next slide. With regard to data privacy, I'd recommend reading Data Privacy Law: A Practical Guide by G. E. Kennedy and L. S. P. Prabhu, and Introduction to IT Privacy: A Handbook for Technologists by Travis Breaux. I'd also recommend joining the International Association of Privacy Professionals, also known as IAPP. They regularly have webinars and other virtual events that can get you up to speed on the latest developments in international information privacy law.
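Returning briefly to the anonymization walkthrough from earlier in this segment, here is a minimal sketch of what such a function could look like. It is not the code from the slides: the body of anonymize, the object name transactions, and the sample values are assumptions, while the digest and dplyr packages, the MD5 default, and the column names come from the talk.

```r
library(digest)
library(dplyr)

# Hash each element of a vector with the chosen algorithm.
# serialize = FALSE hashes the characters of each string rather than
# R's serialized representation of the object.
anonymize <- function(x, algo = "md5") {
  vapply(
    as.character(x),
    function(value) digest(value, algo = algo, serialize = FALSE),
    FUN.VALUE = character(1),
    USE.NAMES = FALSE
  )
}

# Hypothetical sample data; every value is made up for illustration.
transactions <- data.frame(
  first_name  = c("Ana", "Ben", "Ana"),
  card_number = c("1111222233334444", "5555666677778888", "1111222233334444"),
  payment     = c(25.50, 110.00, 8.75),
  stringsAsFactors = FALSE
)

# Replace the sensitive columns with their hashes; payment is left as is.
anonymized <- transactions %>%
  mutate(
    first_name  = anonymize(first_name),
    card_number = anonymize(card_number)
  )

anonymized
```

Because the hash is deterministic, the same name always maps to the same token, so grouping and joins on the hashed columns still work. The same property means that unsalted hashes of low-variety values can be matched by hashing guesses, so passing a stronger algorithm such as algo = "sha256", or using a keyed scheme, may be worth considering outside a demonstration.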
With regard to data governance, I'd recommend reading Data Governance, Second Edition by John Ladley and Non-Invasive Data Governance by Robert Seiner. These are the two best books I've read on the topic. There is also an active community in the Data Governance Professionals Organization, also known as DGPO. And whether in data privacy or data governance, I highly recommend joining a community if you want to learn in depth about a subject. Joining the R community advanced my knowledge significantly and incredibly quickly when I first joined it a few years ago. I learned more in three months from the #rstats community on Twitter than in a year of reading tutorials and using search engines to find the answers to my questions.
In conclusion, I want to thank the organizers of useR! 2020 for the opportunity to present my work, and thank you very much for listening to my presentation on R for Data Privacy and Governance. If you have any questions on anything I presented, please feel free to contact me at my website, which is my name, alyssacolumbus.com. I've also included my other social media contact information for Twitter, LinkedIn, and GitHub. Thanks again. And I wish you the best in the formation of your organization's data governance and privacy program.