Hi everyone. I still remember when I was a kid, and at school we were getting invitations to attend different events with our parents. In return, we would get a present for free. For me, it looked like the most exciting plan for the weekend, although my parents didn't like it so much. I kept insisting and insisting, and I think I managed to convince them once or twice, but not as many times as I would have liked. What I couldn't understand at that time is that nothing comes for free. In return, my parents had to listen to a talk describing the benefits of a new toy or a new jogger, and also share some personal information: names, addresses, ages, and possibly answer some questions regarding food or playing habits. If we think about it carefully, this is a pretty similar business model to the one that companies like Spotify, Facebook, or Instagram are using at the moment.

I want you to watch a video now. It is from a campaign run by UNICEF with nine children and teenagers who are completely anonymous. I will let you judge what happens. The video is in Spanish. I put some subtitles on it; they are auto-translated, so some of them are not 100% accurate, but that will not stop anyone from understanding the context.

Let me ask you about the last basketball games. I don't think you've put in a lot of effort. No. You've gotten very good grades this quarter, right? Yes. How have you done it? Well, studying. The first one was not so good, right? And what about the fantastic season that you are having with the basketball team? Yes. Do you want to dedicate your success to someone? To my family and my friends. Is this going to be on TV or something? There are many cameras and a lot of spotlights. I think you are studying English.
Yes. You want to develop your artistic career in Hollywood. Do you think that The NeverEnding Story, the movie, is true? I don't know what this is. I have this photo that shows that you really dedicate a lot of time to your body. It is linked a lot with this style. Well, there is more to it: you sing, you play the violin, the flute. Could you sing us something now? No, I'm too embarrassed, no, no. We also know that you do contemporary dance. Could you give us a demonstration here, on camera? No, no. I'm very embarrassed and I don't feel comfortable. My God, can I go now? Do you want to go? Why? And where have you taken this from? Can I tell my mother? I feel weird. Has someone given you information about me? It seems strange that this happened today. I am very shy.

So all the information shared in this video was taken from the public profiles of these kids and teenagers on Facebook, YouTube, Instagram, and Snapchat. However, I'm not here today to tell you to think twice before you share your personal information, because I do believe that sharing personal information also brings many benefits. For example, at Spotify, because we know your listening history, we are able to deliver our Discover Weekly on a weekly basis, which is a tailored list of songs based on your music taste.

What I want to talk about today is the responsibility that we as companies have to protect our users, regarding how we collect, manage, and share your personal data. I will do that by describing the data privacy infrastructure that we have built to help Spotify comply with the different data privacy regulations around the world, including GDPR.

So, just to back up and make sure all of us are on the same page: Spotify is a music streaming service.
We launched exactly 10 years ago. We have free and premium tiers, and we are available in 65 countries. We have over 180 million monthly active users. We often talk about how much data we have, but it's hard to put it into perspective. I like football, so if we compare how many users we have to the size of the Real Madrid stadium, it is more than 2,000 times as many. And we have a catalog with more than 40 million songs. I think it's fair to say that we are a big data company. We handle a lot of data, obviously not as critical as that of Facebook or Instagram, but that's not an excuse for not having processes and tooling in place to help protect the privacy of all our users.

According to the different privacy laws, users have the right to access the data we process about them. They can also request a copy in a machine-readable format and transfer it to another system or another party's service in order to use it there. They can update the data if it's incomplete or inaccurate. They can object to us processing some or all of their personal data. And they can decide that we should remove all the personal data that we have about them.

That is the definition, but how have we done it at Spotify? We built a privacy portal. We try to simplify the interface between you, our users, and Spotify, and this is what it looks like. From the website you can log into Spotify and you will find a window similar to this one. On the left-hand side you will have a menu, and in the last few months we have included the privacy settings there. This is where you can manage your data and control how we as a company make use of it. For example, you can decide to opt out of us processing Facebook data, or of tailored advertisements. But again, this is still at a high level. How does it work underneath?
Let me introduce you to Padlock. Padlock is our global key management system, but it's a bit more than that: it also handles user consents.

At Spotify we have a very complex and diverse data infrastructure ecosystem. We have a backend infrastructure with millions of requests per second that are processed by thousands of microservices, and our batch data processing is equally complex. We have thousands of jobs written in different processing frameworks: we have BigQuery, we have Spark, and many others. And the data that these systems and pipelines use powers really critical parts of our organization. We use data to pay labels and rights holders, we use it to do music recommendations, and we use it as well to run A/B testing.

However, even though we encourage this diversity, there are certain things that require a company-wide standard, and one of them is privacy. It would be extremely time-consuming and resource-intensive to validate different strategies and make sure that all of them comply with the standards that we have set at Spotify. So in this case we created a single standard that works everywhere, for everyone, and for all our business cases. One of the standards that we have selected is to encrypt our data. It's a simple one: all personal data that needs to be persisted needs to be encrypted. Why do we do this?
First of all, because it reduces the impact of leaking a data set. Each user has an associated keychain, so if one data set is leaked, that data set is encrypted, and attackers will find it completely useless because, hopefully, the decryption keys haven't been leaked. Second, because we want to control the whole life cycle of the data from a central point. The latter is especially important for a company as big as Spotify.

Let's say that you as a user decide that we should remove all your data. How do we do it? Well, we said that each user has a set of keys. If we remove the keys, the data that belongs to that user becomes completely inaccessible. What would happen otherwise, if we didn't have this setup with encryption? We would have to know all the systems that are using the data, where the data is located, and who is using it, and make sure that everyone removes the data. Obviously, for us this would never be an option.

As I said before, we encourage diversity and we encourage autonomy. For us, autonomy is one of the keys to Spotify's success. We encourage the autonomy of our squads, which is the term that we use for teams at Spotify, and we also encourage them to build their own solutions with minimum contact with other teams. Why do we do that? Basically, to increase everyone's performance. So having a simple rule, encrypt the personal data before it is persisted, is something that everyone can follow. It requires minimum overhead for all our teams, and we can also make sure that we uphold the data privacy standards that we have at the company.

So, how does Padlock work?
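Before the details of Padlock, the erasure idea described above (remove a user's keys and every encrypted copy of their data becomes unreadable) can be illustrated with a minimal sketch. This is not Spotify's implementation: `UserKeyStore`, `encrypt`, and `decrypt` are hypothetical names, and the cipher is a toy built from HMAC-SHA256 only so that the example runs with the Python standard library. A real system would use a vetted authenticated cipher.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256(key, nonce || counter) blocks, concatenated."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Illustration only: XOR with the toy keystream, no integrity protection."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Reverse of encrypt: split off the nonce, regenerate the keystream, XOR."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class UserKeyStore:
    """Hypothetical central store holding one key per user."""

    def __init__(self) -> None:
        self._keys = {}

    def create(self, user_id: str) -> bytes:
        self._keys[user_id] = secrets.token_bytes(32)
        return self._keys[user_id]

    def get(self, user_id: str) -> bytes:
        return self._keys[user_id]  # raises KeyError once the key is erased

    def erase(self, user_id: str) -> None:
        self._keys.pop(user_id, None)


store = UserKeyStore()
key = store.create("user-123")

# Two independent systems persist encrypted copies of the user's data.
playlist_blob = encrypt(key, b"my summer playlist")
history_blob = encrypt(key, b"streaming history")
print(decrypt(store.get("user-123"), playlist_blob))

# Erasure request: delete the key once, centrally.
store.erase("user-123")
try:
    store.get("user-123")
except KeyError:
    print("key erased: both blobs are now unreadable")
```

The point of the design is that the erase step touches only the key store. None of the systems holding `playlist_blob` or `history_blob` has to be located or called.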
Every time a service needs to encrypt or decrypt data, it makes a call to the Padlock service. Let's say, for example, that you as a user want to take a look at your playlist. Even though it's your own playlist, that playlist is going to be encrypted. So the playlist service will make a call to Padlock, and Padlock will retrieve the keys that are needed to decrypt the playlist. Then you will be able to browse it, open it, and see which songs are part of that playlist.

But let's go into a bit more detail. Each service has a unique and secret key. Again, we do this to protect Spotify: if a service key leaks, it will only affect one service, not all the thousands of services that we have at Spotify. Every time a service calls Padlock, it calls it with its secret key. At the same time, Padlock has a set of keys that we call root or master keys. Those keys are never shared with anyone or with any service outside the Padlock service. So what we do is use the service key, the master keys, and a key derivation algorithm, and what Padlock returns to the service is the derived keys.

All the keys are stored in a Cassandra database. You might know that Spotify's infrastructure is on Google Cloud, so you might be wondering why we are using Cassandra instead of a Google Cloud product. The main reason is that when we started to develop Padlock, Google didn't have a competitive option at that time.
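Going back to the key derivation step described above, here is a minimal sketch of how a service key, a master key, and a key derivation algorithm could be combined. The talk does not say which algorithm Padlock uses or how the inputs are combined, so everything here is an assumption: HKDF with SHA-256 (RFC 5869) is swapped in as the derivation algorithm, `derive_key` is a hypothetical helper, and `user_key` stands in for an entry of the per-user keychain mentioned earlier.

```python
import hashlib
import hmac
import secrets


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    if length > 255 * 32:
        raise ValueError("requested output is too long for HKDF-SHA256")
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_key(master_key: bytes, service_key: bytes, user_key: bytes) -> bytes:
    """Assumed combination: master and user keys as input material, service key as salt."""
    return hkdf_sha256(ikm=master_key + user_key, salt=service_key, info=b"sketch-data-key")


master_key = secrets.token_bytes(32)            # would never leave the key service
playlist_service_key = secrets.token_bytes(32)  # secret of one calling service
payments_service_key = secrets.token_bytes(32)  # secret of another calling service
user_key = secrets.token_bytes(32)              # per-user entry, deletable on erasure

k_playlist = derive_key(master_key, playlist_service_key, user_key)
k_payments = derive_key(master_key, payments_service_key, user_key)

# Same inputs always give the same derived key, so nothing derived needs storing.
print(k_playlist == derive_key(master_key, playlist_service_key, user_key))
# A different service key gives an unrelated derived key.
print(k_playlist == k_payments)
```

Under these assumptions a leaked service key only exposes keys derived for that one service, and the master key itself is never handed to a caller, which matches the two properties described in the talk.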
We had two main requirements: global replication and high availability. At that time Bigtable didn't have global replication (this feature is still in alpha), and Spanner was completely untested within Spotify. So our default option was to use a product that had already proven those requirements across the whole company: again, high availability and global replication.

If you are curious about more details regarding Padlock, we recently published a blog post. You have the link there, and I invite everyone to read it; we provide even more information there. So, just to summarize: by implementing Padlock we were able to help Spotify comply with the user rights to update, object, and erase the data.

How many of you here have requested your data from any service? It can be Spotify, it can be Facebook, it can be Google or YouTube. Raise your hand. Well, I would say it's quite many, maybe 60 percent or so. If you want to do it at Spotify, again, we go back to the privacy portal. You go to the privacy portal, you go to "Request my data", and you will get a window similar to this one. The only thing you need to do is click on "Request", and the process will be initiated automatically. This is what it looks like. You press the request button. Then you go to step two: your data is being prepared. It can take up to 30 days, and as soon as it's ready to download you will get an email. We say 30 days here, but normally we deliver the data in about one week. And finally, step three: once the data is available you should have got an email, but you can also go to the privacy portal and see it here. You have 14 days to download the data, and it will tell you until when you can download it and the size of the data. But again, how does it work underneath?
We implemented Rosa. Rosa is the pipeline that aggregates the data Spotify collects from our users, upon their request. As I said, you click on "Request", and that is a trigger to the Rosaria service. Rosaria is basically the middle layer, or the front layer, between users and all our backend services. We have a service called subject access requests, and that service makes a request to Rosaria saying: please give me all the pending requests that haven't been served yet. We index all users by user ID, so let's say that you as a user have an internal user ID, say 123. The service will get: user 123 has requested their data. At the same time, we store all the user IDs for which we still need to serve the data in BigQuery and also in GCS.

In the next step, all the upstream dependencies of Rosa start producing the data. Which data is that? For example payments, account information, or streaming history, and we do that for the last 90 days.

One important thing here is that both Rosa and Rosaria are data agnostic. I will go into more detail in the next slide, but it means that we don't care about the data itself. For that we have the different pipeline owners, or data owners. Possibly this is a bit of overkill if you have just a few data sets, but for us, we cannot expect one team to be responsible for all the data. So what these pipelines do, apart from producing the data that is needed for our users, is decrypt the data at the semantic level, that is, at the field level, and encrypt it with the pipeline key. Then the collector collects all the data. As I said, it can be payment information, it can be streaming history; all of that is collected, and it will be
decrypted with the pipeline key and encrypted with the Rosaria key, and it is stored in Bigtable. Why do we do that? It's easier to index in Bigtable, and we can also put a TTL on it. We want to make sure that if, for any reason, you as a user do not download the data, it will not live forever in our Bigtable database. So after a few days, if you haven't collected it, it is removed. After that, the collector sends a request to Rosaria saying: hey, the data for user ID 123 is ready to be collected. Rosaria goes to Bigtable, collects that data, and produces a compressed JSON file, and that's when you get an email: your data is there, please download it in the next 14 days.

I already introduced this topic a little bit: why this setup? We have thousands of data sets. We cannot expect a single team to know all the data that we are handling at Spotify. It's impossible to know all the different fields and make sure that we don't deliver things that we shouldn't, and I'm talking about internal identifiers. It would be extremely confusing for our users to get an internal identifier in the file and just wonder: what is this?
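The storage and export steps described above (collected data kept in a table with a TTL, then bundled into a compressed JSON file) can be sketched as follows. This is only an illustration under stated assumptions: `TtlStore` is a hypothetical in-memory stand-in for the Bigtable table, `build_export` is a hypothetical helper, the 14-day TTL is an assumed value, and the decrypt and re-encrypt steps with the pipeline and Rosaria keys are left out to keep the example short.

```python
import gzip
import json
import time


class TtlStore:
    """In-memory stand-in for a table whose rows expire after a time-to-live."""

    def __init__(self, ttl_seconds, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._rows = {}  # user_id -> (expires_at, {source: records})

    def get(self, user_id):
        row = self._rows.get(user_id)
        if row is None:
            return None
        expires_at, data = row
        if self._clock() >= expires_at:
            del self._rows[user_id]  # expired data must not live forever
            return None
        return data

    def put(self, user_id, source, records):
        data = self.get(user_id)
        if data is None:
            data = {}
            self._rows[user_id] = (self._clock() + self._ttl, data)
        data[source] = records


def build_export(store, user_id):
    """Bundle everything collected for a user into one gzip-compressed JSON file."""
    data = store.get(user_id)
    if data is None:
        return None
    return gzip.compress(json.dumps(data, sort_keys=True).encode("utf-8"))


# A fake clock makes the expiry visible without waiting.
now = [0.0]
store = TtlStore(ttl_seconds=14 * 24 * 3600, clock=lambda: now[0])

# Each data owner contributes its own data set; the store does not inspect it.
store.put("123", "payments", [{"amount": "9.99", "currency": "EUR"}])
store.put("123", "streaming_history", [{"track": "example track"}])

archive = build_export(store, "123")
print(sorted(json.loads(gzip.decompress(archive))))

now[0] += 15 * 24 * 3600  # the user never downloaded the file
print(build_export(store, "123"))
```

Note that neither `TtlStore` nor `build_export` looks inside the records, which mirrors the data-agnostic design: what goes into each data set is left to its owner.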
That's why we rely on the data owners to ensure that the data that is delivered is exactly what the customers are looking for. So both Rosaria and Rosa, as I mentioned before, are completely data agnostic. The other thing is that data is encrypted end to end. You saw that we have different processes to encrypt and decrypt, based on the pipeline key or on the service key, but the main idea, as I said before, is that we rely on encryption, and data is only decrypted when we share it with the customer.

So, just to summarize: there are different user rights, and we have managed to address all of them just by building two different systems. Rosa serves the access and portability rights, while Padlock serves the update, object, and erasure rights.

However, we have also built some additional controls. We have built a system that keeps track of all the data access requests, based on who gets access and on the criticality or sensitivity of the data, meaning that more sensitive data will require more hoops, which basically means a higher standard. We also keep your data only as long as it is needed for business and legal purposes. There is some data that we need to keep in order to provide you with the Spotify service: for example your song library, your account information, or your playlists. But if that is not the case, we keep only as much as is needed.

Even though my talk was one of the first today, I'm sure you will get a lot of information and good tips over these two days, so I wanted to summarize my presentation in three main takeaways. The first one is, as I have already detailed, that we use encryption, and we do believe that encryption works well for our use case. We are going to start exploring anonymization techniques such as differential privacy or k-anonymity. The reason for that is that we are a big data company.
We need to make data-driven decisions. We have a lot of analysts who are trying to find trends, and they don't need to understand what you as a single user are doing. What they need to understand is, again, the trend: what users in this country, around this age, are doing, or how they are using this product. That's why we believe that anonymization is the next big thing that we are going to look at. This is not a substitute for encryption in any way. We will continue doing encryption; this is just a complement.

But whatever you decide, anonymization, encryption, or any other solution, keep in mind that you must protect the privacy of your users. People want to engage freely and safely using digital technologies. That is why data regulations like GDPR, or any other data privacy regulation, are really important: they give users the power to decide how we as a company make use of their data, and they require us to explain how and why. So privacy standards are crucial.

And the last thing: it is very easy to keep all the data that we collect. It is much harder to start thinking about the business cases that you need to implement in your company and to identify the data that is linked to those business cases. However, keep in mind as well that encryption has cost and performance penalties. Even though we have optimized and streamlined the process, that helps us reinforce the message that we just want to keep the data that we need for our business and legal obligations. So keep the data you need and follow the data minimization principle.

Data and data-driven decisions are revolutionizing the whole world at the moment, and privacy is one of the big players here. So remember why you are here, remember that you have a responsibility, and make sure that you protect your users. Thank you very much.

Okay, I'll bring it over. Okay, you have the microphone. I have many questions. Go ahead. Okay.
The first one: had you considered alternatives to encryption?

Yeah. One option that we were considering was defining a data deletion endpoint per system. Then, every time a user wanted to delete data, we would call all the different deletion endpoints from Padlock and remove the data. But that means we would have had to rely on everyone deleting that data. With our option, we encrypt, we have the keys, we remove the keys from a central point, and the data is gone everywhere. Another option that we considered was tokenization. That means that all data would be stored in a central database, and the different systems would have tokens referencing that data. The problem, even though it looked promising, is that different systems have different performance or latency requirements. So in our case it would be extremely hard to have one single database that fits all our business cases.

That's all on that, but I have another one. I think that you are really brave. Your company is really brave, but you are really brave, because few companies talk about privacy. Why are you here? Why are you at Big Data Spain talking about your policies for information, something that people usually try so hard to hide?

Well, this is actually the first time that Spotify is talking about data privacy.

The first time?

The first time, and it's here at Big Data Spain. For me, being in my hometown talking about privacy is a blast.

Thank you.

And why we do it: we shouldn't forget that where we are at the moment is thanks to our users. Data privacy is key for us, and as I said before, we have a responsibility to protect the privacy of all of them.

Sure, you are.

And we are also proud of all the things we have done to be where we are at the moment, so now it's time for us to share it.

But you're setting an example.

That's what we are hoping: the more companies talk about it,
the better it is going to be for the customers.

Thank you so much. I'm so sorry, we don't have more time. If any of you have more questions, they will have to be asked outside. She is going to be outside, at Ask the Experts, and you can ask her as many questions as you want. So thanks a lot, and a big applause for her.

Thanks to you.