Let's take our seats. So let's start with the first afternoon session. We have, fortunately, John Leahy, who will keep us awake, I'm sure. He will be talking about attention, social learning, and choice, and then Bartosz from the ECB will discuss. John, the floor is yours.

Hello. Okay. I want to thank the organizers for inviting me and for putting me in the afternoon session. I see it's now 8 a.m. in New York, so now I'm awake, and I will try to keep you awake. You all have food in your bellies and the blood is being drained from your heads, so it's usually a good time to do a little economic theory. Okay. So this paper is joint with Andrew Caplin and Filip Matějka.

This paper is based on two premises. The first is that people have limited attention and time, and the second is that people make choices about things in which they have very little expertise. People are not born knowing how to value firms into the infinite future. They're not born knowing what retirement is like. They're not born understanding the vagaries of the U.S. healthcare system. They're not born knowing that you could maybe, if you ask, get a mortgage that at some future date you could switch to an ARM if there was a financial crisis. These are all things that we have to learn and teach ourselves, and that's where the first premise comes in: people have limited time and attention. We're all busy. We don't often have time to fly to Europe for a couple of days and talk to august groups of economists and other gathered luminaries, but, you know, we do what we can, right?

And so in this situation, it's natural, first, that people are going to use all the information they can get. They're going to use their own expertise, what they can learn on their own. They're going to Google things, but they're also going to pay a lot of attention to what's going on around them. What are other people doing? What are people doing whom they respect and know? So there's going to be a lot of social learning, learning by observing. And second, people are going to make a lot of mistakes; they're not going to get it right, unlike standard economic models where you have a menu and, by revealed preference, whatever you chose was clearly the best possible thing. When you have limited attention, sometimes you choose the worst option, okay? In the examples I was giving before: there's a paper by Hall and Woodward which argues that people are very bad at choosing mortgages, that they leave thousands of dollars on the table, and there are papers by Fed economists showing that people don't even know what their mortgage is. People make a lot of mistakes, okay?

Now this leads to two views of market share, or, you know, the popularity of items, of choices, the popularity of the 30-year mortgage, for example. One is that large market share reflects preferences: you see something as popular because a lot of people like it. And this is the basis of inference in almost all IO models and trade models; trade and IO effectively equate market share with preference, whether low marginal cost, low price, or high payoff. The other view is that market share may reflect beliefs, which may be mistaken: herding, fads, fashion, people doing things because other people are doing them. And this is the basis for most of the social learning literature, in which people don't always do the thing that's best in their own interest.
And the question is: does it matter that both of these things are going on, that people are making mistakes, that they're following other people? In which dimensions does it matter? That's the topic of the paper.

Okay, so what we're going to do is construct a model, because we're model builders, and this model is going to have several features. First, people are going to have limited attention and they're going to make mistakes; we want people to not always maximize from the menus that they look at. Second, they're going to learn in part by looking at what other people do, so there's some potential for herding, for fads and fashions. And third, choices are going to be a mixture of being popular because they're awesome and popular because they're popular. So popularity is going to feed back into choice and reinforce popularity. Choices have to in some sense reflect individual tastes, because if people were just doing things because they're popular, then popularity would carry no information, and no one would have any reason to do what is popular. So you have to have some link between choice and utility or payoff, but popularity will exacerbate this link and make it much, much stronger.

Okay, so some previews of the results. First, focusing on the welfare results: it turns out in our model that herding among chosen options involves no externality. Even though there's learning from others, people know the world. And since they know the world, they know other people are herding, and they correctly invert the signals. So among the stuff that's actually chosen, people choose optimally. And that comes out quite robustly; I'll show you the proof later. The inefficiency, which is similar to other models of herding, is mostly on the extensive margin. Things that are unpopular tend not to be chosen at all, but they could be quite good for some people. So what you get is a suboptimal menu. People make mistakes because they have limited attention, but given their limited cognitive capacity, given their limited knowledge, they do the best they can, and no outsider could make them do better. Well, the standard recipe is to provide them with more information, and it's not obvious that for people who are cognitively overloaded that's actually a good thing; it's not obvious an outsider subject to the same constraints would be able to do any better. So the herding is going to affect mainly the menu and not optimality within the chosen set, and that's what I'm going to show you.

Second, those with idiosyncratic tastes are very ill served, because the herding tends to shift things toward the more frequent choices. If you happen to be the one person who likes polka music — actually, that might be liked around here — anyway, if you're the one person in the US who likes polka music, there might not be very many places to find it. People with idiosyncratic tastes tend to do poorly: between the herding and the difficulty of finding their optimal options, they're almost always making mistakes.

And then third, the pattern of mistakes turns out to be very revealing. It's going to be a property of almost any optimal learning mechanism that it shifts choice in the direction of what you like. I mean, it would be kind of insane if you went out there and tried to learn and it gave you all the stuff you don't like.
I mean, any optimal learning mechanism is going to make you more likely to choose good things. You might still be herding, you might often choose bad things, but you're more likely to choose good things. So if I compare what Frank is choosing to what I'm choosing, the relative frequencies will be very informative. Frank might almost always go for German food. I always go for German food too, but every once in a while he goes for Mexican. In our model, he should always be eating Mexican, because the relative frequencies are going to be very informative: what he learns on his own shifts him toward what he really likes and away from the herd. So if you have a lot of data on choice by type, a company like Amazon or Google — companies that know pretty much everything about everyone — can do a much better job choosing for you than you can choose yourself. And that's not inconsistent with my earlier statement that choice was optimal, because there I was putting the government on the same footing as the individual, and I'm not going to put Google on the same footing as the individual. Google sees a lot more stuff than you see and has algorithms for calculating a lot more things than you can. You buy a mortgage once; Google can look over all mortgages at all times and in all places, from 500 BC to the present, and calculate the frequency of choice to very small error. That's not anything that I do. I know nothing about mortgages in Roman times, and I have limited attention, so I never will. But Google — they have an incentive to; they already have the data, they just have to put it together. Okay, so the pattern of choice and mistakes is going to be revealing: optimal learning raises the chance that you choose what you like, and it's basically relative choice among groups that reveals what groups like, assuming we've grouped people together in the right ways.

Okay, last bit: an advertisement for rational inattention à la Sims, and that is that it makes things very tractable. You can handle much more complicated learning situations. The social learning literature, and the learning literature in general, has basically lived in two paradigms. One is the Bernoulli world with two choices and two signals; you can invert that, you always get a probability of being right, so you can make it recursive, you can do dynamics, and so on. The other is the normal-normal world, with quadratic utility, normal signals, normal noise, normal shocks, and that world also replicates itself. But if you want to move outside of those two worlds, things get very messy, especially when it comes to writing models down and solving them. Rational inattention gives you very nice equations that you can play with. They're not always nice in the sense of solving them — well, actually, solving them numerically is trivial; it's manipulating them analytically that's harder — but you get the answer very quickly, which is kind of nice. So I'm going to make a plug for that.

Okay, here's the model, a very simple model. If you know the old model by Bikhchandani, Hirshleifer, and Welch about herds, fads, and fashions: they have a Bernoulli world, things can be good or bad, and people choose sequentially. We're going to extend that to multiple choices and multiple types of signals and play off that. So time is discrete, a bunch of periods, zero, one, and on; there's no end. There's a fixed set of options. You're going to be learning, so we can't have the world changing too much.
If we had the world changing, I mean, we could get similar results, but it's just going to be much messier. We're going for as simple a model as possible, so the world is static, and therefore you can learn from what people have done in the past. There's a fixed set of options, call them i in some set A; there are N_A of them. Each period a continuum of agents is born. They make once-off choices from A and then they die. Life is nasty, brutish, and short. You don't live very long in this economy, and we've killed dynamics there, so there's no experimentation. Learning is all going to be once-off, right? In the food example I gave with Frank, he probably knows a lot about his food tastes because he eats every day. This is much more like choosing a mortgage or choosing life insurance, things that you don't do very often, so you don't accumulate experience and you really are pretty clueless about your match with the product.

There's going to be a finite number of types, call them ω; they live in some space Ω, and there are N_Ω of them. And then there's a utility function, which says that the utility of type ω from some choice i in A is some real number, and those are the payoffs. That's what you're trying to maximize. Now there's going to be a time-invariant distribution of types, μ*: that's how many people there are of type ω1 and ω2 and ω3. That doesn't change. If that changed, you wouldn't be able to learn from the past; I mean, you would have to take into consideration that what you're learning from the past isn't exactly what you're trying to learn today. Keeping it fixed keeps things simple. Because μ* doesn't change, past choices are going to be informative about current tastes.

Agents don't know their type. You might think it's more natural to say agents don't know which goods they like; it's the same thing. I don't know if I'm the guy who likes German food or the guy who likes Italian food, and I could equally describe the type as what kind of food I like. It's basically the same thing; mathematically this way works out simpler, so we're doing this. So I don't know which kind of guy I am.

And now there are going to be two sources of learning. Like I said, to learn from others, others have to know something. The first source is observation of past choices: we see what everyone else did in the past. Think of that as a stand-in for what you tend to learn from what you see done. We're giving people a lot of information, again to keep things simple. And then other people wouldn't be very interesting unless their choices reflected something about the world, so everyone does a little information gathering on their own. I'm going to model that like Sims, okay? That's the rational inattention part.

And then the last bit: I have to tell you what learning means. Much of the learning literature builds models in which there are signals, you pick a signal, there's a posterior, and you solve the Bayes' rule problem of taking your prior and producing a posterior. That puts a lot of constraints on the types of models you can look at, because you need models that are tractable, where the posterior is a tractable function of the prior. We're going to follow Sims and put the cost on the outcome. So to have a certain posterior, a certain outcome, is costly.
To have an outcome in which you know the world is much more costly than one in which you don't. So we're not going to model the signals per se, but the outcome of learning: the more you learn, the more costly it is. That's the way it's going to work. So we have this function P(i|ω), which is the probability of making choice i if you're type ω, and what's going to be costly is the outcome. Look at the big equation on this slide — it's big because it's blown up, but it's also long. What happens is I take this function; log is concave, and the probabilities are all less than one, so it all gets flipped. So spreading out, making P(i|ω) vary across ω, gets more and more costly. And in fact, because I'm subtracting off Σ P_i ln P_i at the end, if I made this completely state-uncontingent — I always pick i with a certain probability, independent of the state — the cost would be zero. So there's no cost to just ignoring the state and randomly drawing an act. What's costly is making the choice of i contingent on ω, and the more contingent, the more costly. So I go out and learn; what I'm supposed to learn is something about my type. By learning something about my type, I make my choice contingent on my type, and that's costly. How I learn is unmodeled, and that, in some sense, is the big analytic advantage of Sims's approach: the outcome is what we model.

So I'm going to focus on steady states. In the paper we do a bunch with convergence; we put it in the appendix because I think Olivier was the one who told me a good paper should always have an unintelligible appendix. Or maybe it was that all the unintelligible stuff should be in the appendix. I can't remember exactly the way he said it, you know; I wasn't paying attention. But we have an unintelligible appendix in which we do convergence, and I'll focus on the steady states here. We're going to begin with the problem of an agent who has observed past market shares. Those lead to some prior μ, and I'm going to solve for what agents do given this prior. That will then give the next generation some beliefs, and I'm going to look for a fixed point in which the prior of today's generation is the same as the prior of the next generation, and that's going to be my solution. It's in some sense a self-confirming equilibrium. There's no requirement that this μ look like μ*; it could be something totally different. There's just no more learning. For example, you can't learn about things you don't see: if no one chooses an option, you can't learn the preferences for it, so you would never converge to μ* in that case. So in general μ will differ from μ*, but the process of learning will have settled down, there will be no more incentive to learn, and the world will just be stuck there.

Okay, here's the agent's problem. This is actually pretty straightforward. I have V, which depends on μ — it's my value function. I'm going to choose all these P(i|ω): I'm choosing how state contingent to make my choices, and in doing that, I acquire information, which is costly. The first term is just expected utility: I have a prior μ; if I'm type ω, there's a probability P(i|ω) that I choose i, and if I choose i when I'm type ω, I get U(i, ω). That's just expected utility.
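Since the slide itself isn't reproduced in the transcript, here is a reconstruction of the agent's problem from the verbal description, in the talk's notation. This is the standard Shannon mutual-information formulation of rational inattention, so take it as a sketch of what the slide presumably shows rather than a verbatim copy:

```latex
V(\mu) = \max_{\{P(i \mid \omega)\}}
  \underbrace{\sum_{\omega \in \Omega} \mu(\omega) \sum_{i \in A} P(i \mid \omega)\, U(i,\omega)}_{\text{expected utility}}
  \;-\; \lambda \underbrace{\Big[ \sum_{\omega} \mu(\omega) \sum_{i} P(i \mid \omega) \ln P(i \mid \omega)
  \;-\; \sum_{i} P_i \ln P_i \Big]}_{\text{mutual information between choice and type}},
\qquad P_i \equiv \sum_{\omega} \mu(\omega)\, P(i \mid \omega).
```

If P(i|ω) does not vary with ω, the bracketed term is zero — the "randomly drawing an act is free" property just described — and it grows as the choice rule becomes more state contingent.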
So I'm maximizing expected utility subject to this cost, which prices the state contingency of my choices, and λ is a parameter that scales how hard it is to learn. λ big: very hard to learn. λ small: even undergraduates can do it. I apologize if there are any undergraduates in the audience.

Okay, now the literature review. In some sense, this paper relies on two earlier papers. One is Matějka and McKay, which solved this problem for discrete choice among an arbitrary set of options. Let me set up some notation: P_i, without the ω, is the unconditional choice probability; P(i|ω) is the conditional one. What Matějka and McKay show is that P(i|ω) has this form: if you already know the P_i's, this is what the P(i|ω)'s are. They don't characterize when P_i is positive or zero, but conditional on it being positive, this is the optimal choice, and it has this logit, kind of McFadden, feel to it: you twist your choice in the direction of things that are good. If U is high, you do it more often. If λ is high, learning is difficult, and that mutes the response to payoffs. So λ and U characterize the twisting in the direction of good choices: you care more if the U's are very different, you don't care if the U's are the same, and so forth. The other background paper is one I wrote with Mark Dean and Andrew Caplin, and that paper is all about the P_i's: when they're positive and when they're zero. It's this complementary slackness condition, and the reason that paper isn't better known is that the condition has no intuition; I can't easily tell you a story for where it comes from. Well, I can tell you stories, but they would take too long, and you just had lunch, and I don't want to put you to sleep. So take this as given. Together, this is the solution to the agent's problem.

Now, the steady state. I said we had this appendix; the way learning works in this world is: you think something, you do something, and then the next generation asks, did they do the thing they thought they were going to do? Because the true actions depend on the true distribution of types, while the learning depends on your beliefs about the distribution of types. Whenever those are inconsistent, people can rule out certain distributions. You keep ruling things out until you converge. I think that's all we need on this slide.

And here's the first welfare result. I said that the choice among chosen options was always optimal, and this is basically the proof. What's the first object? That's the probability that i is chosen given some beliefs μ̄. The next object is the probability that i is chosen, which depends on μ*, not μ̄: μ̄ is the beliefs people have, μ* is the true distribution. Given the true distribution, P(i|ω, μ̄) is the choice people are making. So the probability something is chosen depends on what strategy people are following and the number of people following that strategy, and those two things have to be equal. Then you plug in P(i|ω, μ̄), the optimal strategy — that's from Matějka-McKay. And then you can cancel: notice the P_i(μ̄) at the beginning and the P_i(μ̄) on top. Cancel those, and I get our necessary and sufficient conditions.
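Again for readers without the slides, a hedged reconstruction of the two conditions being invoked, in the talk's notation. The first is the Matějka-McKay conditional choice rule; the second is the complementary slackness condition from the Caplin-Dean-Leahy paper, which pins down which options receive positive unconditional probability:

```latex
% Matejka-McKay conditional choice rule (for options with P_i > 0):
P(i \mid \omega) = \frac{P_i \, e^{U(i,\omega)/\lambda}}
                        {\sum_{j \in A} P_j \, e^{U(j,\omega)/\lambda}}

% Caplin-Dean-Leahy condition for which options are chosen at all:
\sum_{\omega} \mu(\omega)\,
  \frac{e^{U(i,\omega)/\lambda}}{\sum_{j \in A} P_j \, e^{U(j,\omega)/\lambda}}
  \;\le\; 1 \quad \text{for all } i, \text{ with equality if } P_i > 0.
```

The cancellation described in the proof then goes through: impose steady-state consistency, P_i(μ̄) = Σ_ω μ*(ω) P(i|ω, μ̄), substitute the logit form, and cancel P_i(μ̄) from both sides; what remains is exactly the equality version of the second condition evaluated at the true μ*.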
Well, those necessary and sufficient conditions have μ* in them. So even though people don't know μ*, their choices satisfy the necessary and sufficient conditions for somebody who does know μ*. It's as if they know the truth; we can just assume they know the truth, because they're behaving exactly the same way as somebody who knew the truth would behave. And so they're behaving optimally.

One comment and one caveat. The comment: the way this works is that if people's behavior were inconsistent with the truth, people in the next generation would notice; they would say, wait, we got something wrong, they'd learn something, and then you would converge. So it's only when the behavior you see is consistent with the truth that learning stops. The caveat: this is optimality given the set of choices that are made. I haven't said anything about the set of choices that are made, and that's where all the externalities and herding have bite. Imagine a world in which everybody loves maple bacon ice cream, but no one ever serves it. Then no one ever sees anybody eating maple bacon ice cream, so no one ever learns that maple bacon ice cream is the best thing. And if the government then mandated that every restaurant serve maple bacon ice cream for a week, there'd be a huge welfare improvement, right? I actually have had maple bacon ice cream. It is not as good as it sounds. Maple ice cream is great. Maple bacon is great. But there's a non-transitivity at work: when you combine two great things, they don't exactly go together in the same way. You've got to try things anyway. This actually flies in the face of my favorite choice strategy under uncertainty, which is: if there's something on the menu that sounds horrible, it had to earn its place on the menu, and therefore you should choose it. So I always choose the most unappealing thing on the menu. That usually works, except for maple bacon ice cream. It failed there.

Okay, a couple of other comments in the few minutes I have left. First, unlike the McFadden model, here choice depends on market share as well as utility — there's this extra term. Actually, I should have taken this P_i and written it as e^{ln P_i}, putting the ln P_i up in the exponent; then you can see that it acts as a taste shifter: if other people are doing something, it's as if the option were more valuable. The last comment is this bit about relative utility. Take this payoff function, suppose there are two goods, i and j, and solve for U_i minus U_j. I get this equation — reconstructed just below — which says that the difference in utility between two goods looks like Balassa's comparative advantage equation, if you're a trade person: people who get more utility from i than from j tend to be people who are more likely to choose i over j relative to the population, which just makes sense. I had an example, but I don't have time for it.

Let me do the conclusion. Let me say things that we didn't do, which would be cool. If you have all these confused consumers, you really want to model the supply side. I think Peter is going to talk a lot about evil suppliers who might supply things that people don't need, and that side we haven't done at all.
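Here is the relative-utility equation referenced above, as it follows from the logit form (a reconstruction, not the slide itself). Taking the ratio of the choice rule for two chosen options i and j, the common denominator cancels and:

```latex
U(i,\omega) - U(j,\omega)
  = \lambda \left[ \ln \frac{P(i \mid \omega)}{P(j \mid \omega)}
                 - \ln \frac{P_i}{P_j} \right]
```

So the utility gap for type ω is identified by how much type ω's relative frequency of choosing i over j exceeds the population's relative frequency — the Balassa-style revealed comparative advantage reading.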
So in evaluating the optimality of choice from menus, another reason to worry about the menu is that the menu is not exogenous; it comes from somewhere. So that's important. And yeah, there's just a lot of work ahead, and I think Bartosz is going to bring some of that out. Thank you.

Thank you, John. Okay, I'm happy to be here to discuss this very good paper. It's going to be very difficult for me to speak after John, but I find it comforting that it would be difficult for any of you to speak after John. The paper has a positive part and a normative part — some positive results and some normative results — and my discussion will mostly focus on the positive results, whereas John's presentation emphasized the normative results. We did not coordinate.

Okay, so I want to start with an introductory slide about what rational inattention is. The starting idea is that a vast amount of available information is in principle relevant for many economic decisions. This is a paper about discrete choice, so to take an example, suppose you've decided to buy a car and now have to decide which car to buy. That's a discrete choice decision, and there's a lot of information out there that's relevant to it in principle: you could study the technical specifications of different cars, or read information on the internet, or just look around and see what cars other people are driving. The initial observation of rational inattention is that decision makers cannot pay attention to all this available information. What they can do is choose how to allocate their attention — essentially, which pieces of information to pay attention to. The literature on rational inattention following Chris Sims is a formalization of this idea in which limited attention is modeled as a constraint on information flow, where information flow is entropy reduction and entropy is a measure of uncertainty. Formally, agents choose the joint distribution of their action and the state of the economy — that's the P of i and ω in John's notation — subject to a constraint that says: my action can only provide a finite amount of information about the state of the economy. An equivalent way to think about this is that agents choose noisy signals about the state, subject to the constraint that the signals can only provide a finite amount of information about it.

The resulting behavior is going to be both error prone and disciplined. It's error prone in the sense that if you decide to allocate more attention to some variable than I do, then my response to that variable will appear erroneous from your point of view; you'll see me make mistakes. But of course my behavior is optimal from my point of view: I've maximized an objective subject to a constraint, so from my point of view I'm not doing any worse than I could have. And the fact that the behavior is disciplined means that this is a particular deviation from the benchmark of perfect information, of rational expectations, but a disciplined deviation.
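To make the "information flow is entropy reduction" point concrete, here is a minimal sketch (the function name and the example numbers are mine, not from the talk) computing the mutual information between action and state from their joint distribution — the quantity the rational inattention constraint caps, or prices at λ in the Lagrangian version:

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(action; state), in nats, for a joint
    distribution joint[i, w] = P(action i, state w).
    This is the 'entropy reduction' that rational inattention prices."""
    p_action = joint.sum(axis=1, keepdims=True)   # marginal P(i)
    p_state = joint.sum(axis=0, keepdims=True)    # marginal P(omega)
    mask = joint > 0                              # avoid log(0)
    return float(np.sum(joint[mask] *
                        np.log(joint[mask] / (p_action @ p_state)[mask])))

# A fully state-contingent choice is maximally informative...
perfect = np.array([[0.5, 0.0],
                    [0.0, 0.5]])
# ...while a state-independent (random) choice carries zero information.
random_choice = np.array([[0.25, 0.25],
                          [0.25, 0.25]])
print(mutual_information(perfect))        # ln 2, about 0.693
print(mutual_information(random_choice))  # 0.0
```

The two example distributions mirror the property John emphasized: ignoring the state is free, and the cost rises with state contingency.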
Okay, and the paper is about discrete choice under rational inattention. So think of each of a continuum of agents taking one of n possible actions i — the different types of cars you could buy. The agent's payoff depends on a random variable ω, so the payoff function is U of i and ω, and as John explained, you can think of ω as your type or as the characteristics of the car. Agents start with a common prior about ω, and before taking the action, each agent can learn about ω — can update the prior — subject to an information flow constraint. What this constraint means is that after updating the prior, there will be some residual uncertainty about ω: the choice of which car to purchase will be made under incomplete knowledge of ω, which will still be a random variable.

In this setting, an existing paper by Filip Matějka and Alasdair McKay in the AER derived an important result: the agent's behavior will be stochastic in this environment and will follow a particular logit model. The probability of choosing a particular action i conditional on a state of the economy ω is given by this modified logit formula, where λ is the marginal cost of information and P of i is the unconditional probability of choosing action i. This is a very important result because it's very general: there's no restriction on the payoff function, and there's also no restriction on the probability distribution of ω. So it's a powerful result. Now, a drawback from an applied perspective is that in general there's no closed-form solution for the P_i's, the ex ante probabilities, and furthermore the P_i's depend on prior beliefs, which are unobservable.

Okay, so here comes this paper. The authors suppose that agents can observe past market shares. Given that assumption, they prove an important result: agents' beliefs converge to a steady state after a finite number of periods, and in that steady state, the unconditional probability of an action, the P of i, can be identified from the market shares, which are observable. In particular, the ex ante probability of taking action i is just equal to the steady-state market share of option i. So I get to update my formula for the conditional probability of choosing action i given state ω by plugging the observable market shares into the formula. This result is important: I think it makes the rational inattention model of discrete choice suitable for empirical work. The paper discusses the relationship to the random utility model, and it also discusses applications to inference of the skill distribution and of individual preferences. In addition, the paper has the welfare results that John focused on, but my discussion emphasizes this positive result, which makes the model suitable for applied work.

Now, I really like the paper, and I'm going to make two comments. The first one is much more important than the second. The first comment is: what if the market is not in steady state? I was thinking of an applied person who would like to use this paper to actually do an econometric study of discrete choice and interpret the results from the viewpoint of rational inattention — a sketch of what that might look like follows below.
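As a sketch of how such an applied exercise might proceed (the function, the payoff matrix, and all numbers here are hypothetical illustrations, not the paper's code): with the steady-state identification result, the observed market shares can be plugged in for the P_i's in the modified logit formula, given estimates of the payoffs U and the information cost λ:

```python
import numpy as np

def conditional_choice_probs(U, shares, lam):
    """Matejka-McKay modified logit: P(i|omega) proportional to
    shares[i] * exp(U[i, omega] / lam), normalized over actions i.
    U has shape (n_actions, n_states); shares are observed
    steady-state market shares standing in for the P_i."""
    weights = shares[:, None] * np.exp(U / lam)   # P_i * e^{U/lambda}
    return weights / weights.sum(axis=0, keepdims=True)

# Hypothetical example: 3 options, 2 equally likely types.
U = np.array([[1.0, 0.2],
              [0.4, 0.9],
              [0.3, 0.3]])
shares = np.array([0.5, 0.4, 0.1])   # observed market shares
P_cond = conditional_choice_probs(U, shares, lam=1.0)
print(P_cond)                        # P(i | omega); each column sums to 1

mu = np.array([0.5, 0.5])            # true type distribution
print(P_cond @ mu)                   # implied unconditional probabilities
```

In an actual steady state the implied unconditional probabilities in the last line would reproduce the observed shares; with these made-up numbers the check is illustrative only, which is precisely where the steady-state caveat below has bite.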
Well, the paper's formula for the action probabilities holds in steady state, and the authors' discussion of how to achieve unbiased inference assumes that the formula applies — that is, that the market is in steady state. But what if the market is out of steady state? What bias in inference will there be, and how large? And furthermore, do we know how fast convergence to the steady state is? Now, having raised these two points, let me also say something in defense of the authors. It's possible that convergence is very fast. It's also possible that consumers simply use a heuristic postulating that the prior probability is just equal to the market share. Furthermore — I think this point is probably clear, but I want to make sure it is — the model does not imply that market shares in the data must be constant. For example, if preferences change, so will the steady-state market shares. So, to conclude with this comment: it would be nice to know a little more about out-of-steady-state behavior before applying this to empirical work, though I realize this is a very hard question to answer.

Okay, the second comment is about an assumption worth making to progress empirically. For a moment I thought about putting a question mark at the end of that, but I actually agree with the authors that this assumption is worth making to progress empirically. Nevertheless, it's good to keep in mind that the authors are making an important assumption. It gives them tractability, but at some point it might be a good idea to try to relax it. So what is this assumption? In general, under rational inattention, agents' perception of any information is noisy: all relevant information is in principle available, any of it I perceive with noise, and I decide which pieces of information to observe with how much noise. In this paper, though, consumers digest one piece of information — the market shares — perfectly, with zero noise. Market share data are observable for free, while the information flow constraint applies to all other information. So there's an asymmetry, and this asymmetry is a simplification for the sake of tractability, and I completely agree that it's an assumption worth making. By the way, this assumption has been made in the empirical literature on rational inattention before. My favorite paper there is probably by Marcin Kacperczyk, Stijn Van Nieuwerburgh, and Laura Veldkamp — Laura being our next presenter — where they econometrically test a model of portfolio allocation under rational inattention, and they assume that agents observe asset prices perfectly, which is a reasonable assumption. In this paper, the authors assume that consumers can easily learn market shares, which is also a reasonable assumption. Nevertheless, a richer model could ask: is this really the most easily learnable information? Is it really true that overall market shares are the most easily learnable information? I was thinking particularly, in the context of social learning, that market shares in one's peer group might be — it's not literally the aggregate market shares that I'm best aware of, but what people in my neighborhood or my friends are doing — and exploring this at some point might lead to interesting conclusions, also with respect to welfare. So I like the paper, and thank you.
Thank you very much, Bartosz. So the floor is open.