So, without further ado, I guess, Alessandro, the floor is yours.

Thank you, Alex. Thank you very much for the invitation, especially to Julian and André, and thank you, Jay, for agreeing to discuss the paper; I really look forward to hearing your thoughts. Jay has, if anything, the seminal paper on this topic, and we haven't actually managed to talk about it one-on-one yet, so thank you very much. This is joint work with Dirk and with Tan Gan, who's a rising third-year student at Yale.

This is a paper on information intermediaries, on data brokers, with an emphasis on information design. What do we mean by social data? We mean that data about an individual user is informative about other users similar to them. We all have something in common in this webinar: we're all interested in the economics of platforms. Now, the social dimension of the data drives the value of digital services provided by many large online platforms, which I don't have to tell you about. For example, shopping data conveys information about the demand of similar users. The point of this paper is going to be that the social nature of the data, or, if you prefer, the correlation in types, generates a data externality that is not signed a priori. This is not like pollution or climate change or global warming. Whether the information that Google learns about me from you benefits or hurts me will depend on what problem I am trying to solve and on how Google uses that data. So this is not clear at the outset.

Within this growing literature, we ask three central questions. First, how does the social dimension of the data facilitate its acquisition by a large player? Second, how does the market power both of the platform and of any data buyers (any advertisers or firms that acquire this data) change the terms of trade in the downstream market? And third, what is the best information design, the best way in which a platform should collect and redistribute its users' data?

In the interest of time, I'm not going to spend any further time motivating this. Let me just tell you exactly what the model is, in all of its details, and then I'll come back and comment on why we model it the way we do. And I don't see the chat, but please ask any clarifying questions at any time.

So in a picture, this is going to be our model. There are going to be, critically, more than one consumer, and they will exchange information and money with a single data intermediary. Information can go both ways: I can tell the platform something about my preferences, and it can provide me recommendations, so that's unrestricted. The data intermediary is going to monetize on the producer side: it's going to sell this information to a single firm in this model. This firm, call it a producer, has an independent interaction with the consumers downstream, here at the base of my triangle: it sells them a product. This interaction happens anyway; the question is how it will be informed by the data that the producer has acquired from the intermediary. So this is the map that we'll be playing on.

Now, let me tell you a little bit about each of these edges. We're going to have a symmetric distribution of the consumers' values, their true types w_i. Critically, each consumer does not know their value perfectly; rather, they only observe an informative signal thereof. All we require is that all the signals are independent from all the values and that the distributions of signals and... what did I say?
The signals... it's 8 in the morning. What we require is that the error terms in each consumer's signal are independent of the values, and that the errors and the fundamentals are symmetrically distributed. So I would like to know my w_i, but I only see my s_i, and so does everybody else.

Where are these w_i useful? On the product market. Each consumer's w_i is going to be their baseline willingness to pay. What do I mean by that? It would be the intercept of their demand function, in a linear world, if they actually knew their type. Their realized utility is quadratic, so w_i minus p_i would be their demand function if they knew their type. The producer chooses linear prices; that's a restriction. In principle, the producer can choose personalized linear prices, but they will need information to do that, because at the outset of the game each consumer only knows their own signal and the producer only knows the joint prior distribution of values and signals. So both sides, the consumers and the producer, can benefit from augmenting their data by interacting with the information broker: the consumers can figure out what their true demand is, and of course the producer can also figure out what demand is and can price better.

So how is the data market going to work? In a nutshell, the intermediary is going to acquire signals from consumers and sell them to the producer. This will be modeled as a sequential game in which the intermediary cannot commit upfront to a policy that says how it is going to use the data. In particular, it cannot acquire the consumers' data and promise them privacy, in this baseline model. More specifically, the intermediary first contracts simultaneously but bilaterally and independently with each consumer; that's stage one. It specifies an ex ante payment, which can take either sign, for a data inflow. For now, let's just say the broker is acquiring, ex ante, the right to observe the consumer's signal. With that information structure at hand, the broker can then contract with the producer and specify a fixed payment M_0 in exchange for what we call a data outflow policy. The data outflow policy specifies who is going to receive which signals: certainly how much information the producer is going to get, but also how much information about other people's signals is going to be shared with each individual consumer. The producer is paying for this on this slide, but of course it cares about the latter as well, because it cares about how informed the consumers are when making their purchase decisions, their demands. So the data market has this sequential structure.

With the data in hand, and this is the last model slide, what is the producer going to do? Well, it is going to refine its price-setting ability and potentially price discriminate. Let's imagine for now that all the data, the entire vector of signals, has been shared completely, meaning each consumer reported it to the platform and the platform gave that information to everybody, all the consumers and the producer. Because every player in this game now has symmetric information, they all form the same posterior expectation regarding each individual w_i based on the entire vector s of signals. The consumers take this estimate, w_i hat, and use it to build their demand function: they just demand w_i hat minus p.
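To fix ideas, here is a minimal numerical sketch of these primitives, assuming Gaussian fundamentals and noise; the parameter names and numbers are illustrative rather than the paper's, and the zero-marginal-cost pricing rule at the end is my simplification.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 5                    # number of consumers (critically, N > 1)
mu, sigma_w = 1.0, 0.5   # mean and std dev of the fundamentals w_i
sigma_e = 0.5            # std dev of the signal errors e_i

w = rng.normal(mu, sigma_w, size=N)    # true willingness-to-pay intercepts
e = rng.normal(0.0, sigma_e, size=N)   # error terms, independent of w
s = w + e                              # each consumer i observes only s_i

def demand(w_hat, p):
    """Linear demand: a consumer who estimates their intercept as w_hat
    and faces the linear price p demands max(w_hat - p, 0)."""
    return np.maximum(w_hat - p, 0.0)

def posterior_means(s, mu, Sigma_w, Sigma_e):
    """E[w | s] for the Gaussian model s = w + e with e independent of w.
    Cov(w, s) = Sigma_w and Var(s) = Sigma_w + Sigma_e, so the standard
    projection formula gives the posterior expectations that all players
    share under complete data sharing."""
    ones = np.ones(len(s))
    return mu * ones + Sigma_w @ np.linalg.solve(Sigma_w + Sigma_e,
                                                 s - mu * ones)

# Independent fundamentals and errors for now; later cases vary this.
Sigma_w = sigma_w**2 * np.eye(N)
Sigma_e = sigma_e**2 * np.eye(N)
w_hat = posterior_means(s, mu, Sigma_w, Sigma_e)

# With demand w_hat_i - p and zero marginal cost (my assumption), the
# per-consumer profit p * (w_hat_i - p) peaks at the personalized
# linear price p_i* = w_hat_i / 2.
p_star = w_hat / 2.0
profit = p_star * demand(w_hat, p_star)
```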
And the producer, who now has exactly the same vector of posterior expectations, will be able to charge the optimal linear (so not fully extractive, but personalized linear) prices p_i star to each consumer. I'm not saying that this is what will happen in this model, but this is how the data bought from the platform would inform every player's decisions.

So, having run through the entire description of the baseline setting, and especially for a platforms audience, let me tell you how we think about this. There are several special features, probably more than I can name right now, but a few that I want to focus your attention on. This is a slide from the digital economics conference two months ago. The first point is that in the real world, we don't really see consumers uploading spreadsheets with their data to platforms, which then take these spreadsheets and sell them to firms. I described it like this, but that's not what happens. And there are good reasons why the direct sale of information is the exception, if not absent, in real markets; Arrow talked about that in '62. It's hard to ascertain the quality of this data, and once the firm that bought it from the intermediary has it, it lives forever. So it's not surprising that we don't see this literal model being used very often. Instead, we see what we call, and what Admati and Pfleiderer called first, the indirect sale of information. How does that work? A user doesn't really upload their data to the platform; they submit queries to Google, declaring what their preferences are and what their intent is. And they don't get paid, but they get high-quality free services, like Search and Maps and YouTube and what have you, in exchange. Likewise, advertisers do not buy a spreadsheet of consumer information; they target their bids, say in keyword advertising, to keywords, which means to consumers with certain traits. And then, conditional on having reached them at the auction, they get to personalize their message, their product offer, and potentially their price to the consumers they've been matched with.

So the literal descriptions of these two models (the first is more stylized, the second more realistic) are very different, but I am trying to make the point that all of the salient features of what you just saw appear in real-world platform markets, if you don't take them at face value with money changing hands and signals changing hands. Crucially, the platform enables advertisers to tailor their message to a consumer, just as on this slide the data acquired from the platform allowed the seller to tailor their price. And then we can talk about whether price discrimination is the right or wrong model for this. But in order to talk about that, let me tell you what the assumptions in our stylized environment really are, which ones we think generalize, and what we actually want to capture. The point of this slide was: don't take me too literally, but for any real-world platform that you have in mind, there is an analog of the story that we're telling where the same economic forces apply, and our model is easier to explain.

And here are some further modeling choices that you've already seen, but that are important for what we're doing. The first one is that information is distributed: the seller and the platform don't know everything, and any information beyond the prior is in the hands of the consumers.
The second is that data sharing back to the consumers teaches them about their preferences. In the Google case it's which product I like; in the model here it's what quantity I should buy, based on what I've learned from everybody else. But there is definitely this value-generating potential of learning about myself from others through the platform. The flip side is that the buyer of this social data, the firm in my model, can exploit it to the detriment of the consumers. In the model we have the classic example of a price-discriminating monopolist, but you can think about product steering through curated results, election influence operations, addictive social media; there's now growing work on this. The bottom line we're trying to capture in this paper is that when data goes around, consumers can learn about themselves, but other parties can too, and that can backfire on the consumers. So you can plug in your favorite downstream game here, if you want.

The last feature that I want to draw your attention to is that I didn't have any IC constraints in the model. I said all the data trade happens ex ante: the consumer sells the platform the right to observe their signal. That is an explicit choice, of course, and one that wants to capture the idea that when I'm using YouTube or Facebook (not to talk about Google products all the time), it's not like I wake up in the morning and what I post depends on what my type is. It's more like, on average, over a long period of time, I decide whether the terms of use and the conditions at which I participate are worth it for me or not. And then, once I am on the platform, I act truthfully. So all the contracting and all the participation constraints will be at the ex ante level, and I will not get into interesting screening questions about whether the consumers will report the truth or anything like that.

All right. Alex, are there any questions I should field? No, not in the chat, nothing. OK. Please do jump in if you want.

So I hope I have given you the complete picture of our tripartite graph, and told you a little about which dimensions of it I care about and which others I don't want you to take too literally. Now let me tell you some results. Before I get into what the equilibrium of the contracting game is, let's talk about what the value of information is in this setting. In the interest of time I don't have a theorem for you, but this is the first result of the paper. In this game, where preferences are quadratic and prices are linear (even though they can be personalized), the welfare consequences of sharing information are signed and quite clear, and we think that's a feature, not a bug. Consumer surplus and social surplus, as you might imagine, increase with how much information is in the hands of the consumers and, which is a feature of linear pricing, decrease with the firm's information gains. We know in principle, from Bergemann, Brooks, and Morris, that the second part could go either way, but it conveniently goes the "wrong" way in this model, so that we have a tradeoff between the consumers learning and the firm learning: the welfare effects are opposite.

So, for example, what are some consequences? If the consumers' signals were perfect to begin with (sigma equal to zero, which means my s_i equals my w_i), then I have nothing to learn from anybody else.
And any data sharing that occurs here is harmful to consumers and to society. Likewise, if I don't know my type perfectly but I can't learn it from you, because both our fundamentals and our error terms are independent, then I also have nothing to learn; the firm has a lot to learn, and data sharing of any kind will be socially harmful. However, if you imagine taking a nice limit where the aggregate information in the consumers' signals stays informative but each individual signal becomes arbitrarily noisy and uninformative, then data sharing will benefit consumers, because they know nothing about themselves and can learn a lot from everybody else.

So that you have a sense of what we're working with in the paper, let me walk you very quickly through two polar cases, to make the point that consumers can learn about each other in two very different ways; there's a quick simulation of both cases below. These are two extreme cases. If we all really have the same w but we observe conditionally independent signals, then surely, if I take the average of all the signals, it converges to w as n gets large. So with a large number of consumers and everybody's signals shared, I learn my value perfectly. But the same is true in the bottom case, where we have truly independent values but we all receive the same error shock. That could be: we're all exposed to the same traffic conditions and we're trying to estimate the fastest way to get from point A to point B. Again, if I take the average of these signals, it converges to mu, the mean of the w's, plus the common error term. And that's great, because it tells me what the error term is, perfectly, and then I can subtract it from my own signal and recover my value. So there are two different ways here in which consumers can learn from one another, and both of them would increase welfare. What I'm about to show you is that the equilibrium of the market will not always capture this value.

So let's get to data intermediation. Let me remind you of the timing. All I've said so far is what the model is and what the welfare consequences of distributing information are under different information structures. The data market works as follows. The intermediary offers ex ante payments to consumers; consumers say yes or no. Based on the collected data, that is, on the consumers who signed up, the intermediary offers an outflow policy, again ex ante, to the merchant. The intermediary transmits the data, the merchant charges prices, and the consumers make their purchase decisions.

This being a sequential game, I want to begin at the second, crucial stage of the data market, where the intermediary has collected some consumer signals and needs to decide what policy to offer to the producer, in terms of sharing with both sides of the market, and at what price. The result here is clean, and it's also very helpful for what comes later. Regardless of what data the intermediary has collected (it made some offers before; maybe some consumers said yes and some said no), the data outflow it chooses has to maximize the information value to the producer. That's clear, because the producer is paying for it. But the structure that maximizes the value to the producer in this subgame is the complete data outflow: whatever information you have, give it all to the producer and give it all to all consumers.
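Here is the quick simulation of the two polar cases promised a moment ago, reusing the parameters of the earlier sketch; the sample size and seed are illustrative.

```python
# Two polar ways in which consumers can learn from one another.
n = 10_000
rng_demo = np.random.default_rng(1)

# Case 1: common fundamental, independent errors.
w_common = rng_demo.normal(mu, sigma_w)
s1 = w_common + rng_demo.normal(0.0, sigma_e, size=n)
print(s1.mean(), "~", w_common)        # the signal average converges to w

# Case 2: independent fundamentals, one common error (the traffic shock).
w_ind = rng_demo.normal(mu, sigma_w, size=n)
eps = rng_demo.normal(0.0, sigma_e)    # the same error for everyone
s2 = w_ind + eps
eps_hat = s2.mean() - mu               # signal average converges to mu + eps
print(s2[0] - eps_hat, "~", w_ind[0])  # subtract it out of my own signal
```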
That includes consumers who might not have given you their signal to begin with. The intuition for the proof: it's pretty clear that you want to give all the data to the producer, because they are paying for it, so the more data you give them, the better informed they are and the more they pay you. The other part is that there is no gain, ex ante, for the platform in giving the producer information superior to the consumers'. If you did that as the broker, you would force the producer to try to signal the consumers' values to them through its prices. And that's costly: there would be a signaling equilibrium, but it's costly for the producer. Instead, just give the consumers whatever information they would have learned from the equilibrium prices anyway; that removes the signaling motive, and the producer no longer has to fudge its price, deviating from profit maximization, in order to convince the consumers that their w is high. So symmetric information structures are optimal, and of course the best symmetric information structure is the one where the producer and the consumers have the most information.

The key implication is that consumer i is going to learn the signals of the others for free on the equilibrium path. In fact, off the path too: whether they give their signal to the platform or not, they know that in the next stage of the game they'll get a recommendation based on everybody else's signals, which is the maximum quality. And that is useful, because now we can write the participation constraints of the consumers and the producer in a very intuitive way.

Let me begin with the producer. M_0 is the fee that the producer pays to the broker for the information. What does it have to be less than? It has to be less than n times the profit gain that the producer makes on each consumer. What are these functions of two variables? In each function, the first argument is the information structure held by consumer i and the second is the information structure held by the producer; this holds for all the functions on this slide. So (S, S) means everybody has all the signals, and pi of (s_i, empty set) is the ex ante profit under no information for the producer, because the consumer knows their own signal and the producer knows nothing. The difference between the two is the value of the information to the producer.

For the consumer, the payment m_i that the platform makes has to be large enough that the payment plus the utility the consumer gets with data sharing exceeds what the consumer could get without signing up. What would that be? That would be no payment, plus the utility the consumer gets when they have all the signals while the firm only has S_{-i}, because the consumer has withheld their own signal. These are the two terms on the two sides, and of course you want to charge as much as you can to the producer and pay as little as you can to the consumer.

But already from this slide, you can see the data externalities at play. If sharing my signal is harmful to me, I get compensated for it, because that harm is the difference between the two u terms on the opposite sides of my participation constraint. If sharing s_i helps the firm predict some other w_j, I don't get paid for that. If sharing s_i is harmful to j, j doesn't get compensated for that.
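Putting the two constraints just described in symbols, with the caveat that the notation is my reconstruction of the spoken description (in every function, the first argument is consumer i's information and the second is the producer's), and anticipating the decomposition derived next by adding and subtracting the status-quo utility u_i(s_i, empty set):

```latex
% Producer's and consumer i's participation constraints:
\[
  M_0 \;\le\; N\,\bigl[\pi(S,S)-\pi(s_i,\varnothing)\bigr],
  \qquad
  m_i + u_i(S,S) \;\ge\; u_i(S,S_{-i}).
\]
% At the binding constraints, adding and subtracting u_i(s_i, \varnothing)
% and summing over consumers gives the payment decomposition and the
% broker's revenue, with \Delta W the change in total surplus:
\[
  m_i \;=\;
  \underbrace{\bigl[u_i(S,S_{-i})-u_i(s_i,\varnothing)\bigr]}_{\text{data externality } DE_i}
  \;-\;
  \underbrace{\bigl[u_i(S,S)-u_i(s_i,\varnothing)\bigr]}_{\Delta u_i},
  \qquad
  R \;=\; M_0-\sum_i m_i \;=\; \Delta W-\sum_i DE_i .
\]
```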
So the only thing the consumer gets compensated for is the difference in their surplus, given that everybody else's signals are fully shared, the firm included, between the case where the firm has my signal and the case where it doesn't. That marginal contribution is what each consumer gets compensated for.

With that, we can decompose things a little better and represent consumer i's compensation as the sum of two terms; this is just adding and subtracting a constant. The latter term is clear: it is the change in surplus associated with complete data sharing. The first term is what we call the data externality. If you look at the first term here, under the brace, it is the difference between what I get when everybody has all the signals and the firm has S_{-i}, and the status quo. So it is really the effect on me of everybody else fully sharing their signals. And it enters my payment with a positive sign, which means that if sharing everybody else's signals hurts me, that comes right out of my pocket, because it reduces the payment I get from the platform.

It's even easier to see when you put it together and write the revenues of the platform, of the intermediary: they are the sum of two terms. The first one is intuitive. This is a broker; Delta W is the change in total surplus, and if the broker increases the pie, it can capture the difference. The second term is the sum of the data externalities, with a negative sign. So again, if these externalities are negative, the intermediary gets to pay less to the consumers, and that's another source of profitability.

Where I'm going with this very simple two-term expression at the bottom is that in this world, with market power and externalities, the broker does not simply maximize total surplus and take a cut. Rather, you can easily imagine situations in which data sharing is harmful to society but, because the externalities are also negative, it is profitable; and the opposite: situations where data sharing would be helpful to society, but the externalities are positive, which means you need to pay consumers even more on the margin, and it won't be profitable.

So let me first give you the result and then some examples. Let G be the information gain of a player under a given information structure; with these preferences, it's a reduction in mean squared error. Complete data intermediation is profitable in our world if what can be learned from n minus 1 signals is not too bad relative to what can be learned from all n signals. Another way of saying it: if signals are sufficiently close substitutes, the broker makes money. At one extreme, if all our types and signals are perfectly correlated, then once anyone gives their signal out, I have to give mine away for free, because they already know everything anyway.

Good. So here are two examples of what we call market failures in this world. And I froze, right? Yes, your video froze, but we can hear you fine. OK, sorry about that.

The first example is what we call common attributes. If we all really have the same type but we observe independent errors, then, if we are sufficiently informed to begin with, data sharing is inefficient, because we already know almost everything about our type. Yet if n is large, n minus 1 signals are almost as good as n, and data sharing will be profitable. That's the type 1 error, where inefficient intermediation occurs.
The opposite case can also happen. This is where our types are truly independent but we see correlated errors. In this case, if sigma is large, meaning consumers are uninformed to begin with, data sharing would be efficient. But with independent values, data sharing is never profitable. The reason is that the firm doesn't know anything about me until I share my signal. So if all of you have shared your signals, that is the last case in which I want to share mine, because it's like handing the firm the key to my type. In this case the externalities are positive, and the broker's revenue can be negative even though information sharing would be socially efficient.

So far I've painted a fairly bleak scenario, and I've stacked it against the platform: there's market power on the platform, market power downstream, and externalities, so nobody would have bet on the first welfare theorem. But I've shown you that in plausible cases, efficient intermediation can fail in two ways. What we do in the rest of the paper, then, is characterize some dimensions along which the broker's information policy can be refined (let me put quotation marks around "optimal").

One of them, perhaps the main result of the paper, concerns anonymized signals. So far we've talked about a world where, if I buy Alex's signal, I get his entire signal as it is. It turns out that if you give the broker the choice between collecting personal signals and collecting scrambled signals, in the sense that you know the vector of signals but not whom they correspond to, or, put differently, if you let it collect data that can only enable market-level pricing but not personalized pricing (these statements are all equivalent in this model), then the broker will want to do so. The broker will want to economize on the payments owed to consumers, even though it takes a hit on the selling side because the producer will pay less, and it will distribute aggregate data, anonymized data, call it what you may, that in some sense preserves some of the consumers' privacy and yet is still valuable to the producer.

This result has strong implications. One is that we set up a model of personalized pricing, which doesn't really happen, or at least is not first order in online platforms today, and personalized pricing in fact does not occur in the equilibrium of our model. Instead, prices are very responsive to aggregate demand levels, because the producer only gets market-level data from the broker, not personal data. We think this is not a bad approximation of Amazon, where if all of us search for the same product right now, we all see the same price, but that price varies very quickly and is very responsive to market conditions.

The intuition for this result, I sort of already gave away. It can't quite be that consumers are happy to reveal their signals for less if they are promised anonymity; that's true, but the producer is also willing to pay less, so that's not very satisfactory. The true intuition is that anonymized signals, because the producer is going to average them anyway, are closer substitutes for each other. So the externality is a little stronger, and you can reduce the payments owed to consumers more than proportionally, in some sense.
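Anonymization can be mimicked in the earlier sketch as a random permutation of the signal vector: the producer can then condition only on symmetric statistics, such as the average posterior expectation, so pricing is uniform and market-level. The pricing rule is again my zero-cost illustration, not the paper's formula.

```python
# Scrambled signals: the producer sees the values but not the identities.
s_anon = rng.permutation(s)

# Only exchangeable statistics are informative, so the producer prices
# off the aggregate posterior rather than individual ones.
w_hat_bar = posterior_means(s_anon, mu, Sigma_w, Sigma_e).mean()
p_uniform = w_hat_bar / 2.0   # one price for everyone, responsive to
                              # aggregate demand conditions
```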
Okay, so what we do in the paper at this point (I figure I have five minutes, give or take) is explore the limitations and extensions of this result. One thing we do is show how anonymized information intermediation facilitates growing the platform and exhibits increasing returns to scale in the number of consumers. Another is: how would things change if there were no price discrimination, if the downstream game were different? How would things change if consumers were ex ante heterogeneous, in a very limited way? Because it's a short talk, I get to pick, and I'm happy to refer you to the paper or the Q&A for the rest.

Instead, let me summarize the results so far and then give you a sense of what is maybe the most platform-centric aspect of this work. What we found is that the optimal data sharing is not complete; it yields uniform prices rather than personalized prices, but that still leaves us far from the socially efficient allocation of data. The anonymization decision, in our most general result, which I haven't shown you, is made in a socially efficient way, but the overall intermediation decision is not. What does this mean? I just told you that anonymized signals are better; that's because giving information to the producer was harmful here. In that sense the broker is doing the efficient thing, but it might still intermediate, anonymously, information that degrades total surplus, and that definitely does happen. What I also haven't shown you is that the cost of information acquisition vanishes while revenues grow linearly as the market grows.

So let me spend the last three minutes (very little of this is actually in the paper) asking a more two-sided question. So far we have assumed that the platform makes money from advertising, in the sense that it sells information to the producer, and I've shown you that consumers can then expect to receive everybody else's information for free. So there's no point trying to charge them for this information sharing; they wouldn't be willing to pay for it in equilibrium anyway. The reason for that result is lack of commitment. We can envision a different world where the platform can commit upfront to not sharing any data with the "evil" producer in this game. In that case consumers would be willing to pay for the information from others, and they would be willing to share it, because there would only be good consequences for them.

So let me run a super quick horse race between two information structures. One is the no-commitment structure we just saw in the model; the other is the first best. I mean the first best subject to the fact that there's market power downstream. What would that be? Give all the information to the consumers and none to the producer; that maximizes total surplus in this game. If you can't tie your hands and refrain from selling the information to the producer, you can't get that in equilibrium. Say you could. So let's compare what we just saw to the efficient information structure. The question is: is the first best structure profitable for the broker, and is it even optimal?

This is a picture for Gaussian distributions, where we have two new parameters, alpha and beta. Alpha is the correlation coefficient between fundamentals; beta is the correlation coefficient between error terms.
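As I read it, this parametrization amounts to equicorrelated Gaussian fundamentals and errors; under that assumption, the covariance matrices below drop straight into the posterior_means sketch from earlier. The values of alpha and beta are just for illustration.

```python
def equicorrelated(sigma, rho, n):
    """Covariance matrix with common variance sigma**2 and pairwise
    correlation rho: sigma**2 * ((1 - rho) * I + rho * J)."""
    return sigma**2 * ((1.0 - rho) * np.eye(n) + rho * np.ones((n, n)))

alpha, beta = 0.9, 0.1                        # illustrative values
Sigma_w = equicorrelated(sigma_w, alpha, N)   # correlated fundamentals
Sigma_e = equicorrelated(sigma_e, beta, N)    # correlated errors
```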
The two polar cases we've talked about today are the edges, where either the fundamentals are correlated and the errors are independent, or vice versa. What you see is that the blue area is where efficient sharing is more profitable than the equilibrium intermediation. This point up here is the case where you could not get profitable intermediation despite it being helpful, because the consumers' signals were effectively complements. In that case, if the platform had commitment power, it could commit to the first best and make money from it. However, if you're in this other case, the one where we all have the same type but independent errors, it doesn't really matter whether you give commitment power to the firm and restrict it to doing the efficient thing, because what you just saw, the equilibrium where you monetize on the producer side, is nonetheless more profitable. Why? Because you can get the signals essentially for free, even at large n, because the types are actually correlated. So this is a wrap-up to say that we focus on platforms financed, in some sense, by advertising, and that's a restriction in some cases of the model, but giving platforms full flexibility to commit to these kinds of information structures would not remove the problem that inefficient intermediation is profitable in equilibrium.

Just a couple of concluding thoughts. The starting point of this project was that consumers all hold their own signals and can sell them. And, if anything, the major conclusion is that giving consumers property rights over their data is insufficient for an efficient allocation of data. Why? Because you do not earn the social value of your input. And the last thought, the third one on the slide: everything we've said here about one interaction, consumers, intermediary, producer, should in our minds really be multiplied by a thousand, because for every data collection event there are numerous data sale events. So in some sense there's a multiplier effect on everything you've seen here.

I apologize for talking anonymously, since you can't see me. Thank you, Jay, for the discussion, and thank you all for your attention. I'm done.