So this is the second paper, a very interesting one, very relevant: how you can construct a DSGE model where information is a state variable, and where you can test whether that makes the economy subject to increasing or decreasing returns. So very, very interesting. Maryam Farboodi is the author and presenter here, so please, Maryam, I invite you to the podium. You have 30 minutes, as you know, the discussant is Edouard Schaal, and we will have a follow-up discussion of 15 minutes.

Great, thanks so much for having me. This is joint work with Laura Veldkamp, who is at Columbia. I want to talk about a model of the data economy, and I want to start by reiterating what Jean talked about yesterday, which is that there are things in the economy that are changing. In the same way that the policies used to regulate large two-sided platforms clearly do not work here, for different reasons, it is unclear whether the ways we have been thinking about other parts of the economy, like the production economy, are still exactly right for the tech giants or the data economy. The point is that, in the large scheme of things, the largest firms are very heavily valued for their data, and that raises the question of whether, and over what frequency or horizon, the common ways we think about capital are relevant for data as just another asset, and where we should change our thinking.

I want to start by saying that this is challenging for several reasons. One, and this lies at the core of the paper, data is a byproduct of economic transactions, and it is difficult to measure. As a profession we still have not, I think, settled on a way to say: okay, this is how we measure data, and that makes life hard. You can think of production as a form of active experimentation for firms, because producing provides them with data. We used to think of these as separate activities: you experiment, and then you go and produce. Now you produce and experiment at the same time. Second, data is not the only non-rival or non-excludable good, but it is one, and that is very important. However, as I go on, I want you to think of it as a semi-rival good: the value of data falls as more people have it. That can be because of competitive forces, or because of some form of regulation that does not allow a data seller to fully use it, and so on. The other thing that matters for data valuation is that the same piece of data can be used for multiple periods, though not forever. In that sense there is some similarity to capital, in that you have to depreciate it; but how do we depreciate a piece of data that we cannot even measure very well? And related to that, the data depreciation rate depends on a lot of economic conditions. Both of these facts, together with the fact that data is a long-lived asset, mean that we need a dynamic programming framework to think about data; a static methodology probably does not work well. So what we really want to do here is provide a theoretical framework to think about these key economic forces. I am not going to show you anything like rocket science here.
Everything I will show you, you have seen somewhere in some context. But we hope that the framework we put forward is simple enough, and you will see the 500 million simplifying assumptions I am going to make, that it captures the main forces about data and lets us think about things that are also very important for regulation, in particular data markets. I will not have time to talk about those here, because I have to focus on the long-run and short-run properties of the economy, on policy, and on measurement. And what I think is quite interesting is that this model, as simple as it is, has very realistic predictions that we see in everyday life around us.

The model is a recursive framework, and it is really as tractable as a standard DSGE model. Of course, if you want to do measurement you should not use this model as is, because it is too simplified. But like other DSGE frameworks, we hope you can add as many complications as you wish and then go and measure things numerically. It allows us to value data and data-intensive firms. It values data that is transacted at zero price, as well as the relevant digital services; the kinds of things I want you to have in mind are the things Jean talked about yesterday. And hopefully it can also inform GDP measurement, the part of GDP that is missing because we do not measure data.

All right, let me jump into the model. I am going to go fast on some pieces. I want to be upfront again: in this paper we focus on what data is good for. We know a lot about what data is bad for, so I am going to focus on what it is good for. In particular, I am going to shut down any competition. Not that we think there is no competition, but let's shut it down. So there is a continuum of competitive firms. Each firm uses capital, k_it for firm i at time t, to produce with a concave technology, k_it to the alpha, and every good has a quality, which I will call A_it. You can think of it as productivity, but let's call it quality. Now I need to talk about output and the demand curve. First simplifying assumption: all goods, quality adjusted, are perfect substitutes. This is not true, but it makes my life a lot easier, and you can use Dixit-Stiglitz preferences to change it. Then we assume a downward-sloping demand curve. At the end, when I talk about efficiency, I will hopefully have time to give you a microfoundation for it; for now, take it as given.

You can see there is no notion of data here yet, so I have to introduce data. When you think about data in this talk, think of three things. One, data is a byproduct of economic activity. Two, data is semi-rival. And three, data is used for prediction. Not all the data in the world is used for prediction. Patents can be thought of as a form of data, and not all patents are about prediction; for the ones that are, we have something to say, but not for all of them. So this paper is not about every type of data in the world; it is about data that is used for prediction. Why?
Because a lot of the technologies being developed now, AI and ML technologies, are tools for prediction, and that is a large part of the ongoing debate. Because data is used for prediction, it is used to improve forecasts, so I have to introduce forecasts into the model, and quality is where forecasts come in: the quality of each firm's good depends on its forecast. How do we think about this? Imagine that each firm has an optimal technique to produce with. You can think of it as customer taste: whether next year people will want running shoes or blue shirts. You can think about Uber: where am I going to send the cars? That is also the optimal technique. You can think about which technologies should be incorporated into self-driving cars. All of these are different optimal techniques. The optimal technique has two parts. One part is predictable; that is theta_t. The second part is not predictable; it is completely i.i.d. The predictable part is an AR(1) with an innovation, and the unpredictable part is unlearnable; that is what I mean by unpredictable. Then what about quality? Quality depends on the production technique chosen by the firm and how far it is from the optimal technique. If people are going to like blue shirts, the closer you come to producing blue shirts, the better your quality. So the quality A is a function g of the squared difference between the chosen production technique and the optimal technique. Under the firm's information set, the expected squared difference is a variance, which is beautiful. What is important is that g is monotonically decreasing. What does that mean? It means accuracy is good: you want your production technique to be as close as possible to the optimal technique.

All right, that is what forecasts are used for. Where does data come in? Data is information that is used for forecasting. Here you see what I meant by data being a byproduct of production: a firm that produces k_it to the alpha generates n_it data points. These data points are about the future optimal technique, and the amount of data is the amount of production times the firm's data-mining technology, z_i, which governs how much data it can extract from a given amount of production or transactions. You can think of Walmart as a company with a low z_i and Amazon as a company with a high z_i. So firms can differ in how good they are at extracting data. Of course, thinking about the evolution of z and how z is determined is very important, but it is outside the scope of the paper; I take it as given. Each data point is just a normal signal about the realization of the future optimal technique.

This already gives you something we have all heard about very often: the data feedback loop. If a firm has more transactions, it has more data; then it has higher quality or efficiency; then it has more customers and more transactions; and this goes on and on, and the whole thing should explode. I want to argue that although the data feedback loop is in action when the firm is young, in the short run if you want, it is not the dominant force in the long run. In the long run there is another force, at least for data that is used for prediction, that kicks in and is stronger than this one.
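To fix ideas, here is a minimal sketch of that data-generating mechanism in simulation form. The structure follows the description above (an AR(1) predictable component of the optimal technique, quality decreasing in the squared forecast error, and data arriving as a byproduct of production, n_it proportional to z_i times k_it^alpha), but the particular parameter values and the exponential form I use for g are illustrative assumptions, not anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions for this sketch, not the paper's calibration)
rho, sigma_eta = 0.9, 1.0     # AR(1) persistence and innovation std of theta_t
sigma_eps = 2.0               # noise std of each data point
alpha, z_i = 0.7, 0.5         # production curvature, data-mining technology
T = 6
g = lambda sq_err: np.exp(-sq_err)   # quality: monotonically decreasing in squared error

# Predictable part of the optimal technique: theta_{t+1} = rho * theta_t + eta_{t+1}
theta = np.zeros(T + 1)
for t in range(T):
    theta[t + 1] = rho * theta[t] + sigma_eta * rng.normal()

forecast = 0.0                # the firm's chosen technique = its forecast of theta_t
for t in range(T):
    k = 1.0 + t                                   # capital path, taken as given here
    quality = g((forecast - theta[t]) ** 2)       # accuracy is good
    n = max(int(z_i * k ** alpha), 1)             # data: byproduct of production
    signals = theta[t + 1] + sigma_eps * rng.normal(size=n)   # signals about the future technique
    forecast = signals.mean()                     # naive pooling; Bayesian updating comes later
    print(f"t={t}  n={n}  quality={quality:.2f}")
```

The point of the sketch is only the wiring: more production means more signals, more signals mean a better forecast of the next optimal technique, and a better forecast means higher quality next period.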
Before going to that, let me talk about one last piece that is important for thinking about data. So far, for most of what I have said, you may say: okay, this is learning by doing. Here is where data is very different from learning by doing: data is tradable, and an important part of the paper is the market for data. Let delta_it be the amount of data traded by firm i at time t; delta_it is positive if firm i is purchasing data and negative if it is selling data. For simplicity, assume a firm can buy or sell but not both; that turns out not to matter much, and I will not have time to tell you what changes. Also, in the spirit of focusing on the good things about data, assume there is a competitive market for data that clears at a price pi.

Now, as I said, the other important feature of data is that it is multi-use. That means data is non-rival, or, as I prefer to call it, semi-rival, because a firm can sell its data and still use some of it. So let me introduce the parameter iota, the fraction of sold data that is lost to the seller. If iota is one, data is perfectly rival: think of capital, either I have it or you have it. If iota is zero, data is perfectly non-rival. For technical reasons we cannot handle iota exactly at zero, but you can get as close to zero as you want. And it is not such a bad assumption, because many data contracts include prohibitions on the seller's use of the data, and it can also simply stand in for competitive forces through which the value of data falls. Finally, because this is a dynamic model and I do not want things to converge instantaneously, I need an adjustment cost for data.

All right, now let me very quickly tell you what I am going to show. One, data is an asset because it is long-lived, and I will show how to depreciate and value it. Then I want to argue that in the long run there is a second force that dominates the data feedback loop, and that is diminishing returns. That means that in this model there cannot be any long-run growth without innovation. I am going to talk about innovation too: innovation can be driven purely by data that is used for prediction, and the formulation I use, which leads to endogenous growth, looks like a data ladder. The reason for the diminishing returns is very simple. As you saw, data in this model is used to reduce variance, and the reduction in variance is concave in the amount of data, as illustrated below. The first data point you have tells you a lot about how you should adjust your actions, but once you have 500 million data points, the 500-million-and-first does not help you that much. That means that in the very long run, when there is a lot of data, decreasing returns to scale kick in. But in the short run there are increasing returns, and in fact we show that every firm in this economy has to make negative profits initially. Because firms buy data on the open market, that can also lead to data poverty traps.
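Here is a tiny numerical illustration of that concavity, using the normal-signal structure described above; the noise and prior-precision values are arbitrary placeholders and only meant to show the shape.

```python
# Marginal reduction in forecast variance from the n-th data point.
# With normal signals, each data point adds 1/sigma_eps^2 to the precision,
# and the forecast variance is the inverse of the total precision.
sigma_eps2 = 4.0          # signal noise variance (arbitrary)
prior_precision = 0.25    # precision before any data arrives (arbitrary)

def variance(n_points: int) -> float:
    return 1.0 / (prior_precision + n_points / sigma_eps2)

for n in [1, 2, 10, 100, 10_000]:
    gain = variance(n - 1) - variance(n)   # variance reduction from the n-th point
    print(f"data point #{n:>6}: variance falls by {gain:.6f}")
```

The first data point removes a lot of variance; the ten-thousandth removes almost none. That is the diminishing-returns force.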
We can also talk about two things that we see every day around us. One is data barter: the apps you all have on your phones that are given to you for free, because the firm essentially wants to exchange the good it is producing and giving you for the data it can get from you, to improve its own quality going forward. The other is book-to-market dynamics. At the very end, I will quickly talk about welfare in this model and how you can very simply introduce business-stealing externalities.

All right. First, let me talk very quickly about Bayes' law and how to use it to think about depreciating data as an asset. Remember, the goal is to forecast tomorrow's optimal production technique. The prior about today's optimal technique has a mean and a variance, and think of the inverse of that variance, the precision, as the firm's stock of knowledge: this is how much the firm knows. Now you can see why I built in all this nice normality, because Bayes' law says that for normal variables, posterior precision is additive: it is the prior precision plus the precision of all the signals you receive. So the posterior precision of a firm about its optimal technique is its prior precision plus the precision it gets from all of its signals. And that tells you when you have to discount the previous stock of knowledge more: when the persistence of the optimal technique is lower, the previous data is not that useful; and when the economic environment is very volatile, so there is a lot of noise in the optimal technique, maybe because new technologies are arriving, you also need to discount previous knowledge more.

That gives me all the ingredients to think about valuing data. The technique the firm produces with is the expectation it forms with all of the data and all of the precision it has about its optimal technique. Quality is a function of the squared forecast error, and that gives me my single state variable to put into a DSGE model: the stock of knowledge, the inverse of the variance of the firm's belief about its optimal production technique. That makes my life very easy; I can reduce this huge model to a value function for the firm whose only state variable is the stock of knowledge.

Now, the firm chooses, at time t, its capital and the data it trades, to maximize the sum of a bunch of terms: the profits it makes on the goods market, where the quality of its goods is determined by how much knowledge it has and the firm is a price taker; minus the rental cost of capital; minus the adjustment cost. Importantly, the firm can trade data. When the firm chooses its capital, it produces a bunch of data. It can use that data, adding it to the signals it already has, or it can sell it on the data market and make profits. So data has two different uses: the firm can use it itself, or it can sell it. And this is really an important force for thinking about regulating the data market among firms. Another point I think is important: a lot of regulatory effort has gone into the data market between customers and firms. This is a different data market, the market across firms, or between firms and platforms, which are firms, and there is much less work on regulating or even thinking about those markets.
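Going back to the updating step a moment ago, here is a minimal sketch of the precision (stock-of-knowledge) law of motion implied by normal signals and an AR(1) predictable component. For simplicity I assume the period's signals are about the current predictable component theta_t and the firm then forecasts theta_{t+1}; the exact timing convention in the paper may differ, and the parameter values are placeholders.

```python
# Stock of knowledge Omega = precision of the firm's belief about the optimal technique.
# Update: add the precision of the n signals received this period; then "depreciate"
# by pushing the belief through the AR(1): theta' = rho * theta + eta, Var(eta) = sigma_eta^2.
rho, sigma_eta2, sigma_eps2 = 0.9, 1.0, 4.0   # placeholder values

def next_knowledge(omega: float, n_signals: float) -> float:
    posterior = omega + n_signals / sigma_eps2          # Bayes: precisions add
    return 1.0 / (rho**2 / posterior + sigma_eta2)      # forecast precision for next period

omega = 0.25
for t in range(5):
    omega = next_knowledge(omega, n_signals=10)
    print(f"t={t}: stock of knowledge = {omega:.3f}")
```

The two depreciation channels mentioned above are visible in the formula: a lower persistence rho or a larger innovation variance sigma_eta2 discounts the accumulated precision more heavily.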
Let me make one last point about this notion of semi-rivalry and data markets, because it gives rise to a very strange property of the data market. Think about the benefit of buying one unit of data: you get the marginal benefit of increasing your stock of knowledge, and you have to pay the price for it. What about the cost of selling one unit of data? You only lose an iota fraction of it, not all of it, but you gain the whole price. So this is a negative bid-ask spread, which is something we are not used to. What does it do? It gives firms an incentive to participate in the data market, because what the seller gives up is less than what the buyer acquires. That is something that has to go into thinking about how to design policy here; in fact, at the very end I will show you a simple case in which firms oversell data, so you would want to attenuate selling a little bit.

Now, in terms of long-run growth: with the quality function I showed you, the story is very simple. The data inflow is concave, because of the force I described, while the data outflow, the depreciation, is almost linear, not exactly linear. In this case, growth stops. Now, you can say that this is very specific to the assumptions, and there is some specificity. Here it is: you need two things at the same time for growth not to stop. First, infinite quality must be reachable when you have a lot of data; obviously, you would need that for permanent growth. Second, there can be no fundamental randomness, so that you can achieve perfect quality if you perfectly hit the optimal technique; and you can only hit the optimal technique with all the data in the world if there is no fundamental randomness. You have to be able to learn everything. So to the extent that we think there is fundamental randomness, like COVID, with this type of quality function, permanent growth is not possible.

However, you can have endogenous growth. Think of a quality ladder: quality today is the maximum of quality yesterday and quality yesterday plus the quality of a new technology, call it Delta A-hat_it, if the firm chooses to incorporate it. The quality of that new technology is what depends on the data; it is similar to the g I had before, decreasing in the forecast error. Data that is used for prediction improves the forecast, decreases the quadratic error, or if you want, decreases the risk. Think about self-driving cars: the technology is about decreasing the risk of an accident. If the risk of an accident is very high, the efficiency of the new self-driving technology is effectively negative and you will not incorporate it. Once the prediction is good enough, you incorporate it, and that can shift the technology frontier. This is still a purely predictive technology, but it can lead to endogenous growth. So in that sense, in the long run, data in its purely predictive role does look like capital.
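To see that long-run force at work, here is a minimal sketch that iterates the same kind of precision law of motion as above with a fixed flow of data per period: the stock of knowledge converges to a bounded fixed point, so without the innovation ladder, quality stops growing. All parameter values are again illustrative assumptions.

```python
# Without innovation, the concave inflow of precision and the (almost) linear
# depreciation balance out: the stock of knowledge settles at a fixed point,
# bounded above by 1/sigma_eta2 no matter how much data arrives each period.
rho, sigma_eta2, sigma_eps2 = 0.9, 1.0, 4.0   # placeholder values
n_per_period = 50                              # data points generated each period

def next_knowledge(omega: float) -> float:
    posterior = omega + n_per_period / sigma_eps2
    return 1.0 / (rho**2 / posterior + sigma_eta2)

omega = 0.25
for t in range(30):
    omega = next_knowledge(omega)
print(f"long-run stock of knowledge: {omega:.3f}")   # bounded, even with data every period
```

With fundamental randomness (sigma_eta2 strictly positive), precision can never exceed 1/sigma_eta2, which is exactly the statement that permanent growth needs either no fundamental randomness or an innovation ladder.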
In the short run, things are different. In particular, there are parameter regions where, when knowledge is scarce, when the firm's stock of knowledge is small, the net data inflow is actually convex, so it increases over time. You can see that to the left of the graph, that is the red line. The solid red line is the firm's total data; the dashed red line is the data that comes from the firm's own production; and the gray area is what the firm buys on the data market. In the beginning, the firm has to either produce data or buy it on the data market, and that leads to losses. In the very early part of its life, the firm has no data, so its quality is low. If it wants to produce, it has to rent capital and make losses to get data, or it has to go buy data on the data market to improve its quality. So in the early phases there are losses, but they are an investment in data, either bought from other firms on the data market or obtained from production. That rings a bell: Amazon made losses for a very long time. It is something we have seen very often. And the thing is that, by accounting rules, at least in the US, the book value only includes purchased data, while a lot of the firm's data is produced as a byproduct of its own economic transactions. That leads to an undercounting of the book value, which means a very large market-to-book ratio.

The other thing I want to talk about is data barter, which is basically the answer to: why would you want to produce at a loss? The reason is that, effectively, when a firm is producing at a loss, it is exchanging data for the good at the good's price. It gives you the app so that it can attract your data, then produce a better-quality good, a better-quality app, and then come up with subscriptions and get people to pay. That arises in the earlier life of the firm, because the valuation of the firm is increasing in its stock of knowledge. And this also leads to a lot of missing GDP, because this digital economic activity is undercounted.

Let me use my last minute to say very quickly that, as you can see, I have made everything in the model perfectly competitive, so the equilibrium is efficient. Let me talk about the simplest way on earth to introduce inefficiency into this model. Think about the fact that a lot of data is used for advertising, and maybe not all advertising is actually quality enhancing. How can I capture that? A simple way is to say that data processing helps the firm that uses it but hurts others. There is a framework introduced by Morris and Shin called business stealing, and you can incorporate it very simply: the quality of the firm is decreasing in its own forecast error but increasing in the forecast errors of others. This is very similar to the concept of keeping up with the Joneses in consumption: the firm's quality is high only if it is better than its competitors, the other firms in the market. And basically, because there is an integral over the others, a firm's choices, firm dynamics, and aggregate quality are unchanged; what changes is welfare. In fact, in the data market in this case there is overtrading, because firms do not internalize the fact that when they sell data, they are hurting themselves.
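Going back to the firm life cycle described a moment ago, here is a minimal sketch that puts the pieces together: low early quality, losses as an investment in data, and a book value that counts only purchased data. Everything here, the fixed goods price, fixed capital, the ad hoc rule of buying a bundle of data while young, the functional form for quality, and the book-value shortcut, is an assumption I am adding for illustration; it is not the paper's quantitative exercise.

```python
# Placeholder parameters for an illustrative firm life cycle
rho, sigma_eta2, sigma_eps2 = 0.9, 1.0, 4.0
alpha, z_i = 0.7, 0.5
price, rent, data_price = 1.0, 0.3, 0.05      # goods price, capital rental rate, price per data point
k = 2.0                                        # capital held fixed for simplicity
quality = lambda omega: 1.0 - 1.0 / (1.0 + 2.0 * omega)   # increasing, bounded stand-in for g

omega, purchased_data_stock = 0.25, 0.0
for t in range(8):
    revenue = price * quality(omega) * k ** alpha
    bought = 20.0 if t < 3 else 0.0            # buy data aggressively while knowledge is scarce
    profit = revenue - rent * k - data_price * bought
    purchased_data_stock += bought
    n = z_i * k ** alpha + bought              # own byproduct data plus purchased data
    posterior = omega + n / sigma_eps2
    omega = 1.0 / (rho ** 2 / posterior + sigma_eta2)
    book_value = data_price * purchased_data_stock   # accounting: only purchased data shows up
    print(f"t={t}: profit={profit:+.2f}  knowledge={omega:.2f}  book value of data={book_value:.2f}")
```

With these placeholder numbers, profits are negative in the early periods, when the firm is paying for capital and data while its quality is still low, and turn positive once the stock of knowledge has built up; the byproduct data never appears in the book value, which is the undercounting that drives a large market-to-book ratio.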
So thanks so much for having us.

Thank you, thank you, Maryam. Edouard, please.

All right, let me first thank the organizers for giving me the opportunity to discuss the paper. This is a paper that has been around for quite a while; I knew it already, I have seen it many times, but there is still a lot to say about it, and I am really glad to be given the opportunity to discuss it. So what does the paper do? It first asks a very important question: how is big data transforming our economy? The main objective of the paper is then to provide a baseline framework that everybody can use to start formalizing the debate a little bit. In doing so, the authors choose a very particular approach, which is to pick one particular way in which data can be used, namely forecasting. More precisely, data is used to forecast demand: firms try to track consumer taste and, knowing it, to propose services or goods that match that taste as best they can. The second key idea of the paper is that data is a byproduct of economic activity: transactions generate data. As we will see, this is really the core of most of the model's unusual predictions. Data is also non-rival, and, as I will discuss, in the data there is a lot of trade in data, so that is also something the paper allows for.

The model they propose is very tractable. It is basically a standard firm-dynamics model with decreasing returns in the long run, so it looks a bit like a standard Hopenhayn-type model. That is something we know very well, and it can therefore be introduced very easily into a more standard DSGE model. So the model is simple, but it already generates a lot of predictions. The main prediction, I would say, is about the short run. The presence of this feedback from data, the idea that economic activity generates data, produces a feedback in the short run that leads to increasing returns: firms start producing small amounts; they collect more users, more customers, more data; that allows them to increase their productivity; they collect even more data, et cetera. So initially there is a phase of increasing returns. These increasing returns do not last forever, because this is data used in forecasting: at some point, as you accumulate a lot of information, you basically know the truth and uncertainty is low, so a marginal increase in information will not increase your TFP forever. In the long run, we go back to a standard model of firm dynamics with decreasing returns. That is really one of the key features of the model, and it generates a lot of the rest of the predictions. Some of those predictions, for instance: the model can generate negative profits. That is not surprising, because we have increasing returns. What is more surprising is that the model can explain the concept of data barter, the idea that these firms can exist even if the products or services they propose carry a zero price, just because the firms are collecting data and using or selling it on the data market. So, something that connects again to Jean's discussion yesterday about two-sided markets.
The paper also has a whole discussion about measurement, because data, of course, is an abstract thing and hard to measure, so there is a lot of mismeasurement; the paper goes over different issues about book values and market values of firms. There is also the question of missing GDP: because of data barter, we might be missing a lot of potential GDP. Then the paper turns to long-run issues, whether or not data can generate long-run growth. The answer is that in the baseline model it cannot, because there data has mostly a level effect on TFP, not a growth effect. And it also turns to welfare. Here the model is really a simple benchmark: frictionless, perfect competition, no externality; data is non-rival, but there is a market for it, so there is no externality. So the model is actually efficient. Despite the fact that you have non-convexities, this does not lead to a failure of the first welfare theorem. Of course, this is not necessarily something we believe, and the authors agree; that is why they then introduce a relevant externality, the business-stealing externality. So that is a quick summary of what the paper does.

Let me now turn to my comments. First of all, this is a thought-provoking paper. It is an important paper, because it is really, maybe together with Jones and Tonetti, one of the few papers that opened this new research agenda on the data economy. This is a big-picture paper. Maybe people in the audience will disagree about some forces that might be missing, and that may be true, but that is not the point: this is a big-picture paper that tries to derive strong insights from simple assumptions, and on that, the paper does a very nice job. The model is very nice and tractable, and I think it is a great tool for other researchers to build upon. As I was saying, the paper has been around for quite a while, so a lot has already been said about it. Whatever I say, I am sure Maryam and Laura have thought about it and have good reasons not to have included it. But anyway, for the sake of the discussion, let me go over some of my comments.

One thing that is striking when you read the paper is that, for a phenomenon that is very empirical, that is all about data, there is not much data in the paper. This is a very theory-driven paper, and that is totally fine; we should be allowed to write pure theory papers. But the slight tension is that a lot of the predictions are very model dependent, so it would be nice to give some supporting evidence for the modeling choices. Now, I say that, but of course this is one of the first papers in the literature; you have to make choices, and I think the choices Maryam and Laura have made are very natural, especially coming from people who work on information. But let me give you a few examples of the things we are naturally led to question. One is: why is data something that affects mostly the level of TFP and not its growth rate? Maryam talked a little bit about that, but the long-run implications, whether or not you can generate growth, hinge entirely on this. People in the literature, Jones and Tonetti, have likened data to ideas.
So ideas are something that could be an input into R&D and innovation, and naturally we can question that choice. Another one is the emphasis the paper puts on this feedback from information, the idea that transactions generate information. I think it is a very interesting idea; is it true in practice? Is the data that firms collect really something that scales up with their activity? I will try to show you some pictures later; perhaps this is not always the case, so we may be led to think that this feedback might be weaker in practice. Another one is the strength of the diminishing returns. Here, naturally, we think there are diminishing returns in the long run because, as I was saying, in the long run you know everything, so data does not really reduce uncertainty anymore. Well, that actually depends a lot on functional-form assumptions as well. The authors choose a Gaussian model of learning, which is great, super tractable, one state variable. But if we were using a fat-tailed model, and if the information you were getting were about where the tail threshold is, the relevant uncertainty might not shrink, and you might actually still get endogenous growth through a different channel in this model. So I am not saying I dislike any of those choices, just that the predictions depend on them, and ultimately these choices should be guided by empirics.

Now let me discuss the empirics a little. It would be nice to know who uses big data, which firms, which sectors, from what sources, what kind of data they collect, and also how they use it. I was surprised to discover that despite this being a very important phenomenon that is all about data, and despite these firms having so much data, they do not seem to share it with us. Most surveys are still quite vague and quite loose, so I am just going to show you what we have. What I am going to show you is actually from one of our students at UPF. One of the great impacts of the paper is that it has triggered a lot of new research, and our students are working on Maryam's paper. My student, Alejandro Rabano Suarez, has kindly allowed me to show you some of his pictures.

The sources are two surveys, one from France and one from the US, which try to cover a broad enough spectrum of firms and ask about their use of big data. Big data, of course, is something we need to define; it is a loose concept. The way it is done in most of these surveys is the following: by big data we usually mean the use of massive data sets, a huge volume of data, a huge flow with continuous updating, and a complex structure, all the things that make it hard to use standard tools to analyze the data and that require specific techniques. That is roughly what we define as big data. I am not sure you can see it very well, but this is data from France that tries to cover, across different sectors, the adoption of big data and also its sources. What is interesting is that in France, at least, the main sector that uses big data is transport.
And the source of the data they use is actually geolocalization. Think about Uber really using geolocalization for its customers and also its drivers, but also think about delivery firms that track their drivers to optimize the delivery process. Already there, we can question whether the amount of information necessarily scales up with the number of customers: if it is just about tracking your own drivers, that is not necessarily the same thing. The next sector is information and communication, not too surprising; what is interesting is that for them the main source of information is social networks. So the main things to take away from this kind of picture are the following: big data is clearly something big, with 25% adoption in the transport sector, but it is also a very varied thing. Different sectors may or may not use big data, and they may use very different kinds of data.

This is another figure that goes along the lines of Maryam and Laura's paper. It is about the use of big data as a function of firm size, and it seems that, indeed, small firms do not use big data and large firms do. So that is in line with the predictions of the model. It could also be in line with the presence of fixed costs: perhaps setting up a big-data department also requires large setup costs, lots of storage facilities, and the right skills.

Finally, about the use of big data: it turns out it has been very hard to find good evidence on exactly how this data is used by firms. The data here is from the US, and it is not about big data but about the use of AI, which is still along the same lines, AI being one of the main techniques used to analyze big data. This is what firms reported in the survey about how they use it. Some of them replied that they use it to expand their businesses, others to automate, going back to Luca, others to upgrade the quality of what they are offering. So, trying to connect these different uses back to what people have been discussing, I think it is fair to say that we can think of perhaps three different uses of big data. One big use, which seems to be true in the data, is marketing and advertising, perhaps the firms that responded that they expand their businesses. A good example is targeted advertising through social media, so perhaps we can think of big data as improving matching. This is something they touch on a little in the paper as a possibility, and I think it is indeed an interesting avenue for research: think about a model of search frictions with imperfect information, where more data gives you better matching between firms and their customers. Perhaps this could be modeled as product awareness, as in Jesse Perla's work, or with models of advertising. Another use is improving the production process: firms are producing the same good, but they become more productive at producing it, for instance delivery firms using geolocalization to optimize the delivery process. This is probably something we would want to model as big data entering R&D, something that leads to innovation and a growth effect on TFP.
And finally, as we see in this survey, 80% of the firms mention quality. Of course, the answers are a bit loose, but that may be exactly along the lines Laura and Maryam are trying to go: the idea that there is some volatile customer taste that firms are trying to track, and that they are really using a lot of this data to offer the product that best fits that taste. Here I have a quote from Netflix and the Cheesecake Factory; there is a lot of anecdotal evidence that firms are doing this actively, and this is what the paper is about.

So overall, what we see is that the adoption of big data is large and seems to affect businesses importantly, but the phenomenon is quite varied and could probably be modeled in different ways. I think it is important to think about these different ways, because the welfare impact, and any thinking about regulation, will depend a lot on them. Another thing is that not all data is necessarily linked to past transactions; we saw that there are many different sources, social media, connected objects, et cetera. So perhaps the feedback from data that is in the model might be weaker in practice. In the end, whenever we talk about big data, it is very easy to be quite loose about it; it is a very abstract thing and hard to measure. A strength of this paper is that it does not do that: it embraces a very particular view of what data can do, namely forecasting. Now, I think the paper is still a bit loose on the product-market side, because in the end the model tries to capture this idea of quality, but quality is never fully measured; we have perfect competition and everything is loaded onto this TFP function, the function g. It would be nice to know what g is, where it comes from, how you discipline it empirically, or perhaps how to microfound it. I do not have much time; I had some nerdy comments for the end, but I see I am out of time, so I will keep them for Maryam. In the end, just a great paper, a very inspiring paper that lays the grounds for the debate; it is already one of the super-cited papers that has opened an entire agenda. Of course, when we see the paper we want to introduce more forces, and I have listed some of those here. Thanks a lot, Maryam, for the paper.

Why don't I give you the floor back, in case you have some feedback? I just want to say thanks so much for the very kind comments, and I cannot agree more with all the things you said, in particular the fact that there is a lot of lack of data about data. In fact, with two colleagues, we fought very hard to finally buy a data set about the supply chain of digital services from a startup in California, so hopefully we can talk about these three classifications; we in fact asked them to specifically classify their technologies into advertisement, quality improvement, and so on. So at least we are exactly on the same page; I don't know how he did it, because I hadn't told him about it. The only part where my way of thinking is slightly different from yours is this notion of data as a byproduct of economic transactions. Let me just mention what you said about geolocalization in Uber or transportation: I would call that an economic transaction, because if somebody did not order an Uber, a driver would not go there.
But that is the way I think about it. So probably you are right that we have to be more precise about what I include in economic transactions. That is, I guess, the only thing I would say: the language I am using is different, but on everything else I am 100% on board.

So I think it is a great paper, and I agree with the discussion that it is very inspiring. I wanted to ask about the following. I think it would be great in the future to have a version with imperfect competition. I really like this idea of data as used for forecasting and as gathered as part of economic activity. But if I think of a firm gathering data as part of its own economic activity, that probably gives this firm some informational advantage over the others; it makes it possible for the firm to charge a markup. And then if I think about how the market for information would work: at some point you said that if we think about physical capital, either I have it or you have it, while with data you could sell your data to me and then we both have it. So actually your iota might be zero, but what I gain by buying your data is that I may gain the ability to steal some of your customers, because I now forecast better myself, so maybe I end up reducing your markup. I don't know whether this extension with business stealing goes some way toward this; maybe you could say a few more words about that.

Okay, so I think what you raised is actually great. In fact, I would like to think about iota exactly as you mentioned: it might be that I can sell you all of my data, but it is less useful or less profitable for me because you are capturing some of my informational advantage. I am 100% with you that the next step would be to incorporate imperfect competition, and that is actually not even the only margin. The margin of entry is very important in digital markets, because when you think about digital platforms, one of their selling points is that they provide a platform for new entrants to use computing power and big data on the fixed costs of somebody else, like AWS or Microsoft Azure. So in a sense, these are incumbents that make money out of the new entrants, and they might not have the right incentives to share the best possible data with them. That is a slightly different margin, but again imperfect competition. In particular, the business-stealing extension, because we do not have that notion of informational advantage, in fact goes in some sense in the other direction: as a firm, when I sell my data, because everybody else is small, I do not internalize the fact that my data is being used against me, in that I have to be better than the others but my data helps them. So I oversell my data, in a sense, and a social planner would like to attenuate the activity in that data market a little bit. That goes in the other direction, but what you mentioned is really, really important.

Yeah, very nice paper. I was wondering a bit about the interaction with R&D and the discovery of new products or technologies, whether there might not be a trade-off here. I am thinking also about Jean's talk yesterday about platforms: maybe you put more investment into milking that data rather than inventing goods.
So would there be a long-run trade-off there? That is an extremely good point. One thing you can think about in this framework, and there are other papers that think about this in other frameworks, is that data can be used for a level shift in quality, the key part of the paper, where there are decreasing returns to scale. At some point firms are better off actually making a jump up the ladder, but if that kind of investment is costly, they might keep milking the data for too long, as you mentioned. So that is a trade-off that would be amazing to think about. We have not thought about the ability of the firm to switch between these two different modes, but that is a great idea.

Luke. Thanks. I wanted to push you on regulation of data markets, to the extent that you can do it with the model you have, because in it things are mostly efficient, right? But in this extension with business stealing that Bartosz also asked about, how would you like us to think about regulation, and regulation of what? I was first thinking maybe the depreciation schedule; we could just have different accounting rules, or whatever. But then you also highlighted the fact that many of these economic transactions are not even incorporated in the way we measure output. So could you help me a little to think about what we would regulate?

Okay, so let me say two things. One of them, unfortunately, we took out of this paper because the paper was too long: this notion of negative bid-ask spreads that I mentioned implies that even if you start with perfectly identical firms, in the steady state the data market does not die, because firms still want to trade. And then, the model is too simple, but if there is the business-stealing force, there is too much trade and firms become too dispersed from each other. In terms of regulation, I am sorry that I am not aware of what is happening in Europe, but in the US there is this newer industry called data brokers. What they do is collect either other firms' information, like the firm we bought our data from, or customer information, and they sell it to other firms, to competitors of the original firms who want to know what their competitors are doing, or to firms who want to advertise to these customers. In California, and I think Vermont, these data brokers at least have to register on a list so that people know about them. So this notion of at least providing some broad disclosure, requiring them to say what they are, or maybe trying to tax them in some shape or form, because they sell these data packages. I have called about 40 of them asking whether I could get anonymized quantities and prices of their transactions, and they say no way, because that is their business model. And because the whole thing is so opaque, I have to ask you, as a regulator: how do you think it is possible to get access to the data of these firms?

I don't see further questions from the floor; ah, there is one here. In the linear normal framework, more information reduces variance. Could trading on it introduce noise, to counter the decreasing returns to scale in forecasting? So, there are different ways to counter the decreasing returns to scale in forecasting. One is what Edouard mentioned, which is when the distribution of the noise, or of what you learn about, is fat-tailed. That is one way.
The other way, which is in some sense simpler but has its own difficulties, is if the variance of the innovation of the AR(1) is increasing over time. Then the force of decreasing returns to scale weakens, because you are discounting the same prior information more and more, so new data becomes more and more important. Basically, the source of the decreasing returns is that your prior already gives you a lot of information, so new data is less useful; if you discount the prior more, new information is more useful, and the decreasing returns die down. That is a very good point, and again, as everybody mentioned, it is very context dependent: in some markets that is the most relevant consideration, in others less so.

So this brings us exactly to the end of the session. Great paper, very good discussion. Thank you, and a great session in general. Many thanks. There is a coffee break now, with a hard commitment to come back at 11:45. Thank you.