Let me jump in. As you can see, this is work on data privacy, and in particular on the landmark law that the EU introduced, the General Data Protection Regulation. We have all witnessed the growth of services that are built on the large, very personalized datasets that big platform companies collect, and having involuntarily lived through the last two years, we have also come to appreciate many of these services. They're very valuable in many ways. But there are also downsides, as we're becoming increasingly aware. Many of you have heard about high-profile and very big data breaches, or ways that consumer data was abused. And beyond these things that really made headlines, I think there are many examples where consumer data is harvested in ways that are very opaque and very hard for consumers to understand, and then recombined and resold. So many consumers do not understand what is actually happening to the data that they at some point might have agreed to share. Against this backdrop, there's an increasing push to regulate these things. The GDPR is one example, but there are other examples in the US in a variety of states, as well as worldwide attempts to introduce laws similar to the GDPR. The General Data Protection Regulation came into effect on May 25th, 2018. And we're going to try and answer three simple questions, really. First, how does GDPR affect firms' access to data, and in particular firms that make sophisticated use of consumer data? Second, how does GDPR affect the firm's ability to predict consumer behavior? You can think of this as a statistical notion of privacy: at least in Europe, there's often a sentiment of "I just don't want to be so transparent to companies, I don't want to be predictable." So that's an interesting question per se.
And lastly, how does this affect the bottom line of companies that crucially rely on advertising, which is how many online platform companies, of course, monetize their products? The setting: we are working with data from a third-party intermediary in the online travel industry. This is, as you know, a big and important e-commerce industry. The intermediary works with online travel agencies (OTAs) that span 40 countries, so it is well suited for our analysis. What I would also like to mention here is that since these are all different OTAs, there's quite a bit of variety in the way these companies might or might not have become compliant with GDPR. The intermediary itself, as I said, is a sophisticated, data-reliant company. A specific aspect of its business is to predict whether or not a consumer will end up purchasing a product, based on the history of data that this consumer leaves, and then to tailor certain aspects of advertising to this prediction. Okay, so let me give you a brief overview of the literature. I would like to mention the work of Garrett and his co-authors, which is the most closely related paper and is quite complementary in terms of the data scenario and the questions being asked. I want to point out that where we look at the same outcome variables, we come to quantitatively very similar conclusions, which is of course reassuring. Garrett's paper also proposes a nice method to bound the effect of GDPR, which creates a missing data problem, because GDPR basically keeps the data out of the hands of the researcher as well, and the paper proposes a method to deal with this. Then I just want to briefly mention that there's now increasing interest in the theory literature in understanding how data gathering affects various economic aspects of prediction and online advertising.
And one thing that's being brought up there is data privacy externalities, and we are going to point out a concrete example of such a data privacy externality, as I will show you later on. Okay, the outline for the rest of the talk: I'm going to cover the institutional details, introduce our empirical strategy, and then measure the consumer response. That is, how much consumer data actually ends up in the hands of the firm after GDPR comes into effect? How much data is the firm still able to collect? Then we look at the consequences for advertising revenue and for the firm's ability to predict, and then I'll conclude. Okay, so a few more institutional details. As many of you know, the GDPR was adopted in 2016 and implemented in May 2018. Now, we have to interpret the results relative to protections that already existed in the EU, such as the Data Protection Directive or the ePrivacy Directive. These are laws that didn't have as much bite, and where a lot of leeway was given to the member states in how to implement them. So the GDPR is not completely de novo; it created incremental protection. What we're in particular looking at is the consent portion of the legislation. This is basically the requirement that in order to collect your data, the firm needs to ask for your permission. I'm sure, especially for those of you who have spent some time in Europe recently, you know this as the many clicks you have to make when browsing various sites, which can be quite annoying. But that's the part we're looking at. Okay, so compliance with this law is, from the firm's perspective, really important: there are heavy fines that can be imposed. Despite those fines, there is some evidence that firms complied with this law, at least initially, in a very heterogeneous fashion.
Okay, so what I would like to do now is explain a little bit what it actually means to collect the consumer's data, and how this works from a technical point of view. That's going to be a bit technical, but I think it's going to be illuminating for understanding our results. Many of you might know that what firms do is save a little file in your browser, which is called a cookie. I like to think of these cookies as essentially creating a panel identifier: they tell the firm "this is a user," and every time the firm sees this cookie coming back, it has a unique identifier to which it can link other pieces of data. Now, an important distinction for our purposes is between first-party and third-party cookies. First-party cookies are those that the publisher, let's say the New York Times or the Washington Post, would set on its own website, and that are really necessary for the functioning of the website. Third-party cookies are those set by other entities, such as advertisers, on the New York Times or Washington Post website. These are non-essential from the perspective of GDPR. And if the user opts out in a blanket fashion, just saying no to all data collection, then the user might still leave data in terms of first-party access, but no longer in terms of third-party access. That's going to be important. Okay, so of course, before GDPR there were also means to protect your privacy. Users could delete cookies, engage in private browsing, use ad blockers and so on. Now, there's an important distinction, however, between the way these previous means of privacy protection manifest themselves in the firm's data and how the GDPR protection manifests itself. Existing privacy tools primarily serve as a form of obfuscation. So what do we mean by this?
If you engage in private browsing or delete your cookies, you reset your identifier. That means your data is still collected by the firm, but you appear as a new user. GDPR opt-out, however, means that the third-party firm completely loses access to your data. It's a form of deletion: this data really ceases to exist for the firm, and it's not just obfuscation. So the substitution, to the extent that it happens, from these pre-existing privacy means to GDPR opt-out has important implications, and I want to illustrate this with an example. It's a bit of a heuristic example, but I think it will be helpful in interpreting our results. We have here three data scenarios: full visibility, obfuscation, and GDPR. There are two time periods in which a user can be observed, one and two, and the user can either purchase or not purchase the product. First, we have three different users that are not privacy conscious, and then one privacy-conscious user, four. Now, if we compare full visibility to obfuscation and GDPR for the first three users, you see that the data scenario looks exactly the same. However, four is privacy conscious. In the full-visibility benchmark, she leaves two traces of data: the first time she purchases, and the second time she doesn't. Under obfuscation, she purchases, then renews her identifier, and reappears as a new user, five, the second time around. Under GDPR, her data completely vanishes; the third-party firm is no longer getting access to it. So now, and this is where it becomes a little bit heuristic: if a firm wants to predict, let's say, your willingness to pay or your likelihood to purchase, it will compute conditional means, perhaps in a sophisticated way using high-dimensional methods.
But it will pool users that look like one and users that look like two and compute some conditional mean. And as you can see, under obfuscation, the firm might pool the wrong users, because in reality user five is not at all like user two, but is a person who has already purchased before. What this means is that if these users, the ones that were using pre-existing privacy tools, also opt out, so to the extent that there is this correlation, the data that the firm collects could actually get cleaner, because the users that were leaving these polluted spells are no longer part of the dataset. Okay, so as I said, we're doing this project with data from a third-party intermediary in the online travel industry. It's ideal for studying the impact of GDPR on a firm that is really sophisticated. Of course, many other firms were also affected by GDPR, but we're particularly interested in those that deal with data in a sophisticated way. As I said, the primary business of the company is advertising, and for that it predicts whether or not consumers purchase, and we know that the algorithms it uses for that are very sophisticated. Okay, so what do we see? One nice aspect of our dataset is that we essentially see the entire pipeline of what this company is doing. We see the queries by the consumers. We see what happens in the sponsored search auction, so we see the advertiser side. We see whether or not the consumer ends up purchasing. And we also see the output, as various measures of fit of the prediction algorithm that the firm deploys. Okay, so let me introduce you to our... Sorry, quick question. Are there organic results in these intermediaries, so that a purchase can happen without the advertising? Yes, that's a good question. I can't name names, but think of various online travel agencies: you can organically land on one of the likes of Expedia and purchase something.
And then alternatively, you could click on the advertising, and it's going to be travel-related advertising. Got it. And you get purchases from both, right, from organic and sponsored? Yeah, yeah, thank you. Okay, so our empirical strategy is simple. We are using a difference-in-differences design. The EU countries are the treatment countries. The control countries have been chosen so that travel patterns are roughly the same: we use the US, Canada, and Russia. We see data from six weeks before GDPR comes into effect to seven weeks after. We collapse the data at the operating system, browser, site country, and product level, and I'm going to show you simple difference-in-differences estimates as well as graphs from a time-varying treatment effect specification. Okay, so the first set of results: measuring the consumer response. Did consumers actually make use of the opt-out features? Do we see fewer consumers and/or less consumer data ending up in the hands of the firm? And did the overall composition of users change? To remind you, our data scenario is that of the third-party firm; we only see what the third-party firm observes. That means, obviously, that when users opt out in the post period, we don't see those users either. We are relying on the parallel trends assumption: by comparing the control countries to the treatment countries, we can figure out how many users opted out. We use various measures, such as the total number of unique cookie identifiers and the total number of searches on the website. Before I show you results in the aggregate, I want to show you one admittedly very clean example where we know that the OTA faithfully implemented the law and became compliant right around the date GDPR came into effect. And you can see there's a nice, flat, slightly increasing trend line among the non-EU versions of the OTA.
And then there's a sharp drop in the number of cookies right when GDPR comes into effect for the EU countries. One thing I forgot to mention earlier and should highlight: another nice feature of our data is that we have these different country versions of one OTA. OTAs run essentially the same website in many different countries, and that gives a particularly clean comparison group, because in the EU countries they implemented and became compliant with GDPR, while in the other countries they didn't, and we can make this comparison within OTA. Okay, so I've shown you this; what does it mean in the aggregate? Of course, it could be that not all OTAs implemented as faithfully as the one you see here, so the actual effect that we measure is some combination of OTAs that became compliant and those that didn't. And this corresponds to roughly a 12% reduction in the overall data that the third-party firm observes. You can measure this in terms of the unique number of cookies or recorded searches; it's somewhere between 10 and 12%. Roughly approximated, that means about 10% of users make use of the opt-out feature. Okay, here is the time-varying treatment effect specification. Again, you see that right around week 22, when GDPR came into effect, it drops, and then it stays low in the post period. Okay, next question: what happens to the trackability of users? How easy is it to follow one user around after GDPR comes into effect? What we do here is measure how many of the users I saw in time period t I see again k periods from now. We can compute this measure for various values of k, and it's basically a measure of persistence, or cookie half-life. Here's again the example of the OTA that we know became GDPR compliant right around the date that GDPR came into effect.
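As an aside, the persistence measure just described can be sketched in a few lines. This is a minimal toy reconstruction of the idea, not the authors' actual code; the function name and the data below are made up:

```python
def persistence(observations, k):
    """Share of cookie IDs seen in period t that are seen again k periods later.

    `observations` maps each period t to the set of cookie IDs seen in t.
    The result is averaged over all periods t for which t + k is observed.
    """
    rates = []
    for t, ids in observations.items():
        future = observations.get(t + k)
        if future is None or not ids:
            continue
        rates.append(len(ids & future) / len(ids))
    return sum(rates) / len(rates) if rates else 0.0

# Toy data: user "a" keeps her cookie across periods, while "b" through "f"
# each appear only once (e.g. because identifiers get reset).
obs = {1: {"a", "b", "c"}, 2: {"a", "d", "e"}, 3: {"a", "f"}}
print(persistence(obs, 1))  # only "a" returns: (1/3 + 1/3) / 2 = 0.333...
```

If opt-out removes exactly the identifier-resetting users, this measure rises mechanically, which is the selection story told next.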
And you see, surprisingly, that our persistence measure actually jumps up. This is something we find more broadly in the aggregate results: users are easier to track once GDPR comes into effect. Now, in light of what I showed you, perhaps that is not so surprising at all. The way we interpret this is as a selection effect. We of course only see the remaining users, those that do not make use of the opt-out feature. And if the users that make use of the opt-out feature were the ones leaving shorter spells beforehand, because of their use of other means to protect privacy, such as private browsing, then we see only a selected sample in the post period: the ones that leave longer spells. So our preferred interpretation is that this is a selection effect. Okay, so what does this imply for advertising revenue and for the ability to predict consumers? First of all, why is it interesting to look at prediction beyond its relevance for the business? Obviously, we want to know what happens to the firm's ability to predict, but we would argue it's also interesting per se, because you can think of it as a statistical notion of privacy. At least in the EU, there's a very strong sentiment that people do not want to be predictable. And this now has to be interpreted in light of the potential externalities imposed by the users that opt out. Again, in line with the example I showed you earlier: if users opt out and make the remaining dataset cleaner, because the firm now has an easier time distinguishing users that leave genuinely short spells from those that leave fake short spells, that could have an impact. Through opting out, these users might in fact make the other users more predictable. Okay, so let me give you a very brief overview of how predictability is actually measured.
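Before that, a quick aside: the selection-effect story can be illustrated with a toy simulation in the spirit of the earlier heuristic example. All numbers, names, and the 10% opt-out share below are hypothetical, chosen only for illustration:

```python
import random

random.seed(0)

def spells(n_users, share_obfuscating, visits=5):
    """Observed spell lengths for a toy population.

    Obfuscating (privacy-conscious) users reset their cookie every visit,
    so each of their visits shows up as a separate spell of length 1.
    Everyone else leaves one clean spell covering all their visits.
    """
    out = []
    for _ in range(n_users):
        if random.random() < share_obfuscating:
            out.extend([1] * visits)   # looks like `visits` brand-new users
        else:
            out.append(visits)         # one spell of full length
    return out

pre = spells(1000, 0.10)                         # pre-GDPR: short, polluted spells mixed in
post = [s for s in spells(1000, 0.10) if s > 1]  # post: obfuscators opt out and vanish

print(sum(pre) / len(pre))    # mean spell length pre-GDPR: below 5
print(sum(post) / len(post))  # mean spell length post-GDPR: exactly 5 here
```

The remaining data is both smaller and cleaner, which is exactly the tension the results below turn on.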
So the outcome measure here is whether or not a user converted and purchased a product, after observing this user a certain number of times. This is a simple conditional probability that the firm computes. And you can look at measures of predictability either in terms of mean squared error, or measures such as the AUC, the area under the curve, which I'll explain briefly. Tobias, sorry, Pinar has a question, let me read it: post-GDPR, users can also customize cookies, and they may be agreeing to a minimal or less invasive set of cookies. So does persistence necessarily mean better trackability? Can you differentiate between the types of cookies installed before and after? So again, the one caveat, of course, is that we only observe users that have agreed to some data sharing through the first-party website. We do observe the types of cookies; we can, for instance, distinguish between first-party and third-party cookies. Now, I'm actually not sure whether the users we observe are observed under less stringent tracking; we would have to check that. That's a good question. So we can distinguish certain types of cookies, but we haven't really dug into the details of data sharing. And my prior, at least, is that many users are not going into the weeds here: they're not thinking a lot about the details of how they share their data, but usually either just give their consent or opt out entirely. But, of course, that's just a guess. Thanks for the question. Okay. I think before the pandemic, people didn't necessarily know what true positive rates and false positive rates are, but now we're all very familiar, so let me keep this brief. The AUC is a measure that keeps track of both the false positive and the true positive rate, and a better classifier has a larger area under the curve. And you can think about the extreme points here.
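To make the AUC's key property for us concrete, invariance to the class proportion, here is a toy computation. The rank-based formula below is a standard equivalent definition of the AUC (the probability that a random positive is scored above a random negative); the scores themselves are made up:

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve, computed as the probability that a
    randomly drawn positive outscores a randomly drawn negative
    (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

pos = [0.9, 0.8, 0.6]        # predicted scores for users who purchased
neg = [0.7, 0.4, 0.3, 0.2]   # predicted scores for users who did not

# Change the class proportion by tripling the negatives, as when many short
# non-converting spells enter or leave the data: the AUC does not move.
print(auc(pos, neg))       # 0.9166...
print(auc(pos, neg * 3))   # identical
```

A base-rate-sensitive measure like mean squared error would shift under the same change, which is why the talk treats the AUC as the preferred measure here.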
You want to have a high true positive rate while having a low false positive rate, so the further up and to the left you are, the better your classifier performs. One particular advantage of the AUC measure is that it's invariant to what is called the class proportion: it does not depend on how many positives and negatives there actually are in the data. And this matters here, because GDPR changed the spells: to the extent that the number of purchases stays the same, there are now fewer users to whom these purchases can be attributed, because the remaining spells are getting longer. So the class proportion changes mechanically, and the AUC is invariant to this change. This is going to be important for the interpretation of our results. We see that the class proportion goes significantly up. The mean squared error would suggest that consumers actually became slightly less predictable, but the effect size is small. And because there is no significant effect on the AUC, our interpretation is that the mean squared error result is predominantly driven by this mechanical change in the class proportion. Looking at our preferred measure suggests that user predictability did not change significantly in response to the introduction of GDPR. So the bottom line is: we cannot reject the null that predictability is unchanged. I want to once more come back to the illustration that I showed you earlier. This potential spillover is the result of two opposing effects. The first is that there's just less data to go around, because there are fewer users in the data. But on the other hand, because of these spillovers, the spells become potentially cleaner.
And to the extent that these effects offset each other, they might explain the unchanged ability of the firm to predict, despite it having less data than before. Okay, the last set of results: what does all of this imply for the firm's ability to generate advertising revenue? First of all, looking at various measures of clicks, there is a mechanical reduction in revenue by the very fact that there are fewer users to go around, fewer eyeballs, because users that opt out can no longer be used for advertising by this third-party firm. This should be commensurate with the overall reduction in the amount of data the firm is able to gather. If you look at overall revenue, we again find an imprecise null effect. And here, interestingly, there's again an explanation along the same lines, which is that the prices for the users still in the data, the ones to whom ads are shown, have actually gone up. Let me explain this; it comes back to the explanation I gave earlier. Here's a summary of the results: the overall number of clicks goes down, which has this mechanical revenue effect, but the average bid actually goes up. So what's our explanation? We know that bidders, the advertisers, care about the conversion rate, and they oftentimes use bidding functions that are a mechanical function of the conversion rate. Before GDPR, privacy-conscious and non-privacy-conscious consumers were pooled, and we know that the former leave shorter histories, whereas the latter leave long ones. Those that frequently reset their identifiers can no longer be recorded as a conversion, because it's no longer the same user you saw before. And because it's harder to attribute conversions, the advertisers reflect this in their bids. Now, the spells are becoming longer.
The advertisers record more conversions and, as a function of that, increase their bids per user. So again, this comes back to the same explanation: because of the spillovers, the dirty spells are leaving the data, the longer, cleaner ones are staying, and that is now reflected in advertising prices. Okay, so to wrap up: third-party data access by the firm was negatively affected. Here our results chime with what Garrett and others have found, and speak to the very important relationship between data privacy regulation on the one hand and competition on the other. I should mention that I cannot reveal which firm's data we're working with, but it's not one of the large firms. It's a smaller firm that actually depends on this third-party access, which is not true for the large advertising companies that have their own ecosystems and don't necessarily have to go through other publishers and through the use of these cookies. So in that sense, our results are very much in line with what others have found: the GDPR falls harder on smaller companies. However, there's also a bit of nuance to our results, in the sense that, because of these privacy externalities that others have conjectured, there are spillovers, and interestingly, here they go the other way around: users that are privacy conscious opt out and thereby make other users more transparent. This is reflected in the ability to track those users and also in the advertising prices for those users. Our last result, we think, is also very interesting, and again ties back to a statistical notion of privacy: to the extent that Europeans, or European regulators, have hoped that users would become less predictable, this is of course only one example, but it's a case study showing that for one company that deploys a very sophisticated machine learning algorithm, this has not panned out. Predictability is unchanged.
And with that, I'm done, and I'm looking forward to Garrett's comments and of course also to all your questions. Thank you so much, Tobias. Garrett, it's all yours. All right. Well, thank you very much for a nice presentation, Toby. I want to start off by saying that this is one of my favorite papers on the GDPR. The paper is very well done, the analysis is careful and thorough, and the paper is well written. I highly recommend that you check it out for yourselves. I also want to mention that Guy Aridor, who's the lead author, is on the market right now. His job market paper is really interesting: it's a field experiment that restricts access to Instagram and YouTube to investigate people's substitution patterns. So Guy has written at least two excellent papers, and they're worth checking out. Today in my discussion, I want to highlight three high-level points. The first is about the contribution of the paper. They emphasize the contribution of talking about privacy and data externalities, and I want to emphasize a secondary contribution that I think is also very important, which is that the paper wrestles with how to do empirical economics when we have data censoring due to privacy regulation. This paper is looking at the GDPR, which defines personal data very expansively, as all data relating to a person, regardless of whether the data is personally identifiable. So using encryption or pseudonymization isn't sufficient from the perspective of the GDPR. This paper shows very nicely that privacy regulation is increasingly limiting the data, pardon me, that firms can collect, and this makes it harder for firms to do data-driven decision making. But I also want to highlight that privacy regulation also limits the data that's available to researchers. As economists, whether in IO or labor or development, we rely on this personal data for our research. Heck, even macroeconomists have discovered micro data recently.
Now, some will say, well, the fundamentals haven't changed, right? The fundamentals of the economy are still there; the accounting data still exists. But this paper, I think, highlights that firms and researchers rely on detailed micro data to gain insights about the decision making of economic actors, and I worry about losing that. That brings me to my second major point, which is about the policy implications of this paper. I appreciate that Toby highlighted the anti-competitive nature, which I think is a really important point from their paper. But this paper also reinforces the broadly known fact that sites are using opt-out consent and thereby obtaining high consent rates, on the order of 90%. Now, this is an issue because the GDPR actually requires strict opt-in consent, and depending on how strict you make this opt-in consent, you would expect the consent rates to be much lower, on the order of 10 to 25%. So if you move from a world where we observe 90 to 100% of the data to instead observing a minority of the data, as little as 10 to 25%, this would be really disastrous for us as researchers, and for some firms as well. So I think there's a really difficult problem in studying the GDPR: there's a gap between the letter of the law and the reality on the ground, and this is something that all GDPR researchers need to grapple with. I think they could discuss this a little bit more, though they certainly acknowledge the point. I would like them to more clearly articulate that better data quality does not necessarily mean that the firm is better off. I highlight this because some people who have read this paper have taken away that the GDPR is just an all-around win: a win-win for privacy, for consumers, and for the ad industry. And I emphasize this because the first-order effect is a revenue loss to the focal firm.
So I see the improvement in data quality as a silver lining to the cloud, the cloud being that the GDPR has had a net harm on this firm. The final point I would like to highlight is some complementarity with our own research in this area, which Toby kindly brought up as well. We use Adobe Analytics data to examine over 1,000 websites and how they were affected post-GDPR. We also find a 12% reduction in the page views and revenue recorded by Adobe, and we also find that the set of consumers that are consenting is positively selected. I think that's worth highlighting, because Toby and I sleep better at night knowing that two papers have arrived at the same conclusion, almost to the number. But from there we branch out in different directions. Their paper has much better consumer data, and they do a very nice job of unpacking the form and consequences of selection through consent, and tell a very nice story about data quality. We, on the other hand, benefit from having data from more firms and from having some information on how the user arrives at the website, through which marketing channel. That allows us to relax the assumption that the data reduction after the GDPR is only due to consent, and to construct bounds on the consent effects and the possibility of real harm. We are able to conclude that there is some small real harm from the GDPR to both website traffic and website revenue. All right, so I want to conclude by thanking the authors for enriching our understanding of privacy regulation. This is really important: regulators around the world are trying to grapple with this topic, and this paper makes an important contribution for them. So with that, I'll hand it back.