is that he has been at almost all of the digital economy conferences that we've held in Toulouse since the very first one in January 2001. So that means he qualifies as a veteran of the series, and we're extremely grateful that he keeps coming. So Hal, thank you, and you have 15 minutes for your introduction.

Thank you, Paul. And thanks for having this at a reasonable time for us West Coasters. I've been getting up at 3:00 and 3:30 to talk to some conferences in Europe; believe me, I think I'd rather have the jet lag. By the way, you were breaking up a little bit, just to warn you; the voice was breaking up, so watch out for that.

Okay, I want to talk about reputations and recommendations. Let me add a little topic first: what's the value of information? Well, we all know it helps you make better decisions. What's the value of data? Well, it helps you get better information, and then you can use that information to get better knowledge and better understanding. All of these pieces are part of the famous data pyramid: data to information to knowledge to understanding. And both reputations and recommendations are important in that.

Now, what's the difference? Well, a reputation, a bad reputation say, is I think implicitly at least a recommendation: don't go to this restaurant, or you won't like this movie, or whatever. And if I can borrow a little bit from auction theory, think of the distinction between a common value auction, where the same item has the same value to everyone and the bidders just don't know how valuable it is, like the offshore oil lease work that Bob Wilson did so many years ago. Reputation works the same way: everybody agrees on quality, let's say, but they have different noisy views of it, and what you want to do is aggregate that information in some way to get a single score for the reputation. The other case would be private values. There you might have different preferences over attributes: you and I could have the same taste in movies or music or food, or different tastes; values could be common or they could be private. And it's not a contradiction to say the food was of high quality but I didn't like it. I'm expressing a point about my private value even while agreeing with the public point that the restaurant was a nice restaurant; I just wasn't that into the cuisine.

Now, reputations. The first time I ran into this is kind of an interesting story. The FTC sponsored a meeting in 2005 about online auctions, and I gave a paper there, the paper I wrote on position auctions many years ago. But I talked to some of the people at the FTC about why they got into this area. Well, it turns out it was all because of eBay. Back in the early 2000s eBay was selling collectibles; they had buyers and sellers, they had all sorts of things, and every now and then a transaction would go wrong: the buyer was shipped something he didn't ask for, or there was some damage on it, and so on. And if you went to the eBay site and clicked on the button to file a complaint about a seller, you were taken to the FTC website. So all of a sudden one day the FTC started getting thousands of complaints of the form:
"He said he'd send it first class and he sent it second class," or "he said it was in prime condition but there was a scratch on it," and so on and so on. All of these things were flowing into the FTC because eBay had, in effect, outsourced its whole merchant quality capability. The FTC said, we'd better learn about this; we have to have some policies to deal with it. And what a change from today, because today the quality of the reputation system is viewed as very important by the platforms, by Amazon and Yelp and other places, and they invest a lot of effort in trying to come up with good quality measures.

Then there's the sister topic of recommendations. Back in 1996, a computer scientist named Paul Resnick wrote his thesis on collaborative filtering. The idea of collaborative filtering is that I could have a list of all the movies I've seen, with a score for each movie indicating how much I liked it, and you could have a similar list. If we had a lot of overlap in the movies we've seen, then my ratings are probably predictive of how you'd rate the movies you might want to see. So collaborative filtering is just a way to automate the sharing of ratings: find people whose historical preferences are similar to yours, and use that to extrapolate to new choices. You could also add in attributes of the person and attributes of the product, but the really nice thing is that you don't need that. You just need a list of what I like and a list of what you like; we compare the two and then offer each other the set difference between them.

The way these recommendation systems are built nowadays is to think of it as a matrix completion problem. You have products, you have people, you have a matrix that's partially filled in, and what you want to do is fill in the rest of that matrix. There are a variety of ways to do this, and they work very well. We had one for recipes once and it was great; all the vegetarians found each other very quickly. On the other hand, we had one once for jokes, and that was not as good, because a joke has an element of surprise to it, so past behavior wasn't necessarily predictive of future behavior.

Now, in some of the papers we're going to hear in a few minutes there's a discussion about incentives, even if it's sometimes a background discussion. Maybe the way to do it is to make ratings more like a recommender system, because the incentives are very good in a recommender system: I want to be honest when I reveal my preferences for these products, because the quid pro quo is that I'm going to get helpful recommendations from you, and vice versa. Maybe if we looked at people who are doing a lot of rating, those with similar ratings would be good recommenders for each other, and they would have a reason to join up and provide honest, accurate ratings; whereas somebody rating randomly, or rating for hire, as we'll hear about in a few minutes, would not have those same incentives.
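To make the matrix-completion framing mentioned above concrete, here is a minimal sketch; the toy ratings matrix, the rank, and the learning rate are all invented for illustration:

```python
import numpy as np

# Toy people-by-products ratings matrix; NaN marks unrated cells.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 5.0, 1.0],
    [1.0, 2.0, np.nan, 5.0],
    [np.nan, 1.0, 1.0, 4.0],
])
observed = ~np.isnan(R)

# Fit a rank-2 factorization R ~ U @ V.T by gradient descent on observed cells.
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(4, 2))   # one taste vector per person
V = rng.normal(scale=0.1, size=(4, 2))   # one attribute vector per product
lr, reg = 0.05, 0.01
for _ in range(2000):
    err = np.where(observed, R - U @ V.T, 0.0)  # residuals where ratings exist
    U += lr * (err @ V - reg * U)
    V += lr * (err.T @ U - reg * V)

print(np.round(U @ V.T, 1))  # the NaN cells now hold predicted ratings
```

Each person and each product gets a short taste vector, and their dot products reproduce the observed ratings while predicting the missing ones; that is the collaborative-filtering idea in linear-algebra form.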
Now, one point that's worthwhile making about recommendations and ratings is that you can use a transformation if you find somebody who has just the opposite preferences from you, because then you can do exactly what they don't recommend. In fact, one of the suggestions I've heard is that if you find such a person, exactly your opposite, you should marry them, because then you'll always be able to get good recommendations from each other just by inverting those preferences. There's the old nursery rhyme about Jack Sprat: Jack Sprat could eat no fat, his wife could eat no lean, and so between them both they licked the platter clean. There was always something to like between the two of them.

By the way, I'll tell you one other story about recommendations and ratings. I organized the world's first conference on collaborative filtering, in 1995 or 96 I think, and the very first question at the very first session was: what should we call these things? Obviously collaborative filtering is not a very good name, because people have no idea what that means; it turns out only specialists find it meaningful. And I said, why not call it a recommender system? People said, recommender systems, that's a good idea. And now that's been the name of the whole enterprise. Of course it's also extremely important in terms of business models for platforms, because people want guidance and help; they want recommendations, and that's a big part of the business.

Now, the papers in the session. Let me just run through them very quickly; I thought they were extremely interesting. Filippas et al. say rating inflation is common. You've probably all heard the story about Lake Wobegon, where all the children are above average; ratings have inflated so much that it's just broken all the laws of arithmetic. One of the problems we had when I was at the University of Michigan was how to deal with grade inflation, because people in other social sciences were, we thought, inflating their students' grades inappropriately. One of the proposals that came out, which I think was quite interesting, was to publish the student's grade along with their rank. You could say they got an A, and that only 20% of the people got A's, or only 10% of the people got A's; you publish the distribution of the scores along with the score itself. I think that was a very helpful idea that could be used more often, and we see it of course when you report class ranking and that kind of thing. An A may or may not mean anything, but "top 10% of the class" is meaningful, because it's an ordinal measure.

Now, Leyden et al. wrote their paper on the incentives to update, and I thought that was extremely interesting: the fact that Apple's system reset the rating to zero whenever there was an update ended up discouraging an update if you had a good rating. If you decided to update, you went back down to not having any ratings at all, which was a big mess. So I thought it was really quite an interesting discussion. In fact we had exactly the same problem, or a very similar problem, with Android. Android phones were sold through the carrier in many cases, and the carrier was responsible for the over-the-air updates. But as soon as they pushed an update, their help system would go down, because it would be overloaded with people trying to figure out how the new capabilities in the phone worked, or why this menu item had moved from here to there, et cetera. So we had set up the program in a way that discouraged these updates, because we had made the carriers responsible for support.
Now, that was reasonable, because we knew everybody had the phone, but not everybody had Wi-Fi at that time. Nowadays everybody has Wi-Fi, so upgrades to Android generally take place over Wi-Fi rather than over the air, and Google has taken on more and more responsibility for managing those upgrades. But it's very important: upgrades are critical, and knowing who should do them and what the impact will be is a very important question for the industry.

The next paper, by Proserpio and coauthors, is on the market for fake reviews. If you think about fake reviews, as far as I can tell they're just equivalent to false advertising, which is illegal in most places, but here it's done by a very fragmented market rather than by competitors or other companies. So you need a system that can deal with this in a parallel way: if a company is falsely advertising, you want there to be a penalty for that, and if individuals are falsely advertising, you want some way to detect and deal with that too. But who should police them? That's the question. Should it be the FTC? Well, I think at this scale it's unlikely to work through a single regulatory agency; it's really the platforms, Facebook, Google, whatever, that are engaged with this. Section 230 has this interesting consequence that if somebody is lying on your system, the platform owner is not liable for that lie, whether it's lying about competitors or lying about politics. So I think it's still an important question how that can be managed. The platform itself has good incentives, but they're not perfect. Think of Amazon, for example, which invests really substantial amounts in trying to track down untruthful or fallacious reviews.

Then Stahl and coauthors ask when and why people rate, and I think the group has a very nice answer: people are likely to rate when the gap between expectations and actuality is large. In a Bayesian model, they've got a big gap between the prior and the posterior, and that model of rating explains various stylized facts, as we'll hear in a few minutes.

The last paper, by Troncoso and colleagues, is about the impact of gender on reviewing behavior, and that's an extremely interesting paper too. They looked at cases... I'm sorry, you're going to have to begin thinking about wrapping up. This is my last one; I'll be done in a minute. ...cases where a hotel manager is allowed to respond to reviews. In some cases I've found this very helpful: you look at a hotel, people have an open line, you can read what their experiences are, and every now and then somebody has a bad experience and the manager can step in and respond to it in some way. Now, the problem is that this can lead to confrontation, a debate going back and forth. And the finding in this paper is that this has a bigger impact on women's likelihood to engage in this kind of confrontation than on men's. But again, we'll hear about that in a few minutes. So I'm now wrapped up and happy to turn it over to the authors to describe some of these points in more detail.

Okay, thank you very much. If you're wondering why I'm taking over from Paul, it's because it seems that Paul has connection problems. So the first speaker is Benjamin Leyden from Cornell University. Ben, can you share your screen? Yeah, let me get that. Is it working? Wonderful. And so you've got eight minutes. Great.
Thank you so much. Well, thank you to the organizers for including me on this program, and to Hal for the great introduction to this session. I'm going to give you a brief overview today of my paper, "Platform Design and Innovation Incentives: Evidence from the Product Rating System on Apple's App Store." There's a somewhat longer talk available in video form on the conference website, and the paper is available as well.

The paper is motivated by, I think, a growing public policy interest, and of course a long-standing academic interest, in how the design of platforms can affect intra-platform competitive outcomes. I'm going to ask how the design of Apple's product rating system on the App Store affected product innovation there. The reason I think this is an interesting context in which to look at this type of question is that for essentially the first decade of this platform, the rating system was structured in such a way as to penalize updates by higher-quality products. So this creates a sort of perverse incentive scheme, and I'm interested to see whether developers were in fact responsive to it, leveraging a change in this policy in 2017. What I find is that developers were in fact responsive to these incentives: developers under the initial system updated less frequently, and this less frequent innovation was probably the result of lower investment, lower product innovation overall, as opposed to delayed and bundled, less frequent updates, as I'll talk about in a minute.

Okay, so what was the policy that was in place? Well, for that first decade, roughly, the rating that would show up all across the App Store was calculated by averaging all the ratings and reviews from the current version of your app. So when you produce an update, you put out a new version, you're going to go from your nice five-star rating, like the one you can see for this Modern Combat game on the right, to the "no rating" sign that you see for Enigma 2. So there's essentially a reputational penalty you're paying every time you update your app. Once you get five or more ratings, the stars will reappear and show whatever you're averaging at that point, but there's this period where you're not going to have any rating displayed. So this is going to discourage innovation at the high end. It actually might encourage innovation at the low end, to the extent that no rating is better than a one-star rating or something like that, but that won't be the focus for today.

This policy was famously very unpopular with developers on the platform, and finally in 2017 the executive in charge of the App Store admitted that this was kind of a stupid incentive scheme, and they introduced a new policy, which was basically to give developers a choice: when you produce an update, you can either reset your rating, just like under the old system, or you can choose to keep all the ratings since whenever the last reset may have been. Okay, so that's the history of this system.
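As a toy illustration of the two display rules, with invented numbers (this is a sketch of the rule as described in the talk, not Apple's actual computation):

```python
# Reviews received by each version of a hypothetical app.
reviews_by_version = {"1.0": [5, 5, 4, 5, 5, 4, 5], "1.1": [5, 4]}

current = reviews_by_version["1.1"]                        # just updated
all_reviews = [r for v in reviews_by_version.values() for r in v]

# Original policy: average current-version reviews, displayed only once
# at least five exist; a fresh update therefore shows no rating at all.
old_display = sum(current) / len(current) if len(current) >= 5 else None
# Post-2017 "keep ratings" option: average everything since the last reset.
new_display = round(sum(all_reviews) / len(all_reviews), 2)

print(old_display, new_display)  # None vs 4.67: the reputational penalty
```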
In the paper, I essentially ask three questions. The first is whether the observed relationship between the average rating and the likelihood of updating changes with the policy change. In other words, was there this distortionary effect? You can think of the blue line as the updating likelihood for a bunch of developers under the initial policy; after the change, do we see a shift back to that sort of optimal red line? Once I find that there was in fact an impact of this initial policy, I ask how the size of the effect varies with the size of the reputational penalty an app faces. And third, whether the forgone updating reflects innovation that was truly lost. What I'll show you under question one is that the frequency of innovation was much lower under the initial policy. But that could either mean developers were throwing out ideas or investing less, or it could mean they were just holding innovations, bundling them up, and releasing them in larger packages less frequently. Even in that bundling case there's a dynamic welfare loss, so it isn't costless from a societal perspective, but it would be less bad than if there's just less innovation overall. The way we'll address that is to see whether, once the policy changes in 2017, small bug-fix updates become relatively more frequent, because you no longer face this large reputational penalty for minor revisions to the product.

Okay, the empirical strategy (there are lots more details in the 15-minute video version of the talk, as well as in the paper) is basically to estimate a linear probability model to determine whether this rating-updating relationship changes after the policy change, using a weekly panel of iOS apps in three categories: education, productivity, and utilities. In order to answer that third question, we're going to have to classify the content of app updates as either small or large in some sense, and I do this in two ways. One is by looking at the version numbers of these apps: there's a historical convention of changing version numbers in certain ways to indicate small or large updates. That convention is not always followed, and it seems to be somewhat falling out of favor with a lot of developers, so I also look at the text release notes written by the developers describing what they're doing with each update, and I classify updates as either large, feature-adding updates or small, bug-fixing updates.
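A minimal sketch of a linear probability model of this general shape, assuming a hypothetical panel file and column names (app_id, week, updated, avg_rating, post); this illustrates the approach, not the paper's exact specification:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical weekly app panel: one row per app-week, with a 0/1 update
# indicator, the displayed average rating, and a post-policy-change dummy.
panel = pd.read_csv("app_panel.csv")

# Does the rating-updating relationship change after the 2017 policy change?
# App and week fixed effects absorb app characteristics and common shocks;
# standard errors are clustered by app.
fit = smf.ols(
    "updated ~ avg_rating * post + C(app_id) + C(week)", data=panel
).fit(cov_type="cluster", cov_kwds={"groups": panel["app_id"]})

print(fit.params.filter(like="avg_rating"))
# A positive avg_rating:post coefficient would say that highly rated apps
# became relatively more likely to update once updating no longer reset
# the displayed rating.
```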
Okay, so what do I find? To summarize the main results from the paper: the top-line result is that the original policy decreased the frequency of product innovation; developers were in fact responsive to this. The average app in the store increased innovation, or product updating, by about 8% after the policy change. But there's a lot of heterogeneity around this. In particular, it becomes really clear in the data that the size of the reputational penalty matters. First, I find no effect of this initial policy on the updating behavior of the most popular apps. Microsoft Word doesn't care what its rating is on the App Store, because it's so dominant in its submarket, whereas a very niche note-taking app may be really dependent on ratings in that context, and so is going to be really responsive to that initial policy.

Similarly, I find no effect for apps with the highest arrival rate of reviews. Remember, the reputational penalty is that I go from my nice five-star rating to that "no rating" sign until I have five or more ratings, when the stars start showing up again. Some developers get lots of reviews coming in constantly, so for them this is a very short-lived penalty, and those developers aren't very responsive to the initial policy. For other developers it can take months to get that many ratings, so there the penalty is very severe, and you see a much larger response from those developers.

Okay, on to that third question: whether this innovation is lost or just delayed. There's suggestive evidence that the original policy did in fact lead to lost product innovation. Under the version-number approach to classifying updates as small or large, we find no evidence of a relative increase in small updates after the policy change. This allows us to reject the theory that developers were holding on to innovations and bundling them into less frequent releases during the initial policy. Using the release notes, those text descriptions of the updates, the evidence is a little more mixed. In the education category there is evidence of a relative increase in small updates after the policy change, which means we can't reject either theory there, the bundling one or the lost-innovation one; it's not clear what's going on in that category. But in utilities and productivity, my other two categories, we don't find evidence of an increase in the relative frequency of small updates, and so in those cases we can again reject the theory of bundled, delayed innovation in favor of the idea that developers were actually investing less on the platform.

Okay, this was a pretty short talk, so I won't just repeat everything I said, but I think the big takeaway is that the structure and incentives created by this rating system had a meaningful effect on innovation on the platform. I hope this contributes new empirical evidence to our understanding both of rating systems in general and of how the design of platforms can have a real effect on intra-platform competitive outcomes, which is of course of growing interest these days, with lots of current concerns about dominant platforms and the behavior of platform owners. So thank you for your time.

Benjamin, that was fantastic timekeeping and a really fascinating presentation. Our next speaker is Konrad Stahl, who is going to talk about when and why buyers rate in online markets. Konrad, the floor is yours. You need to unmute yourself, Konrad, I think. Okay. I'm sorry, it didn't work. Well, I can hear you now. No problem. Okay, wonderful.

Thank you to the organizers as usual, and thank you, Hal, for the introduction. Let me mention that a somewhat longer presentation is available on the website. You see the title of this talk, so let me start right away. The motivation is an obvious one: everyone knows that rating is important, in particular in online markets. The key point we want to focus on is that rating is voluntary, and all the evidence on this shows that rating is highly non-random.
In particular, the standard evidence is that ratings concentrate on the extremes of the distribution, with many positives and a few negatives: the so-called J distribution. This gives rise to a number of questions, of which we answer at least a subset. The first question is the truthfulness of buyer ratings; we don't give an answer to that here. But the selectivity of raters among the buyers is an interesting question we do address, in particular the selectivity in the individual buyer's rating decision. And this gives rise to, by my standards, the key question: whether the unweighted rating index that is typically used in online markets is biased and thus uninformative.

Our contribution: we provide a little theoretical model. It's a pretty elementary model, but by our take it explains a lot. It explains the decision to rate, that is, selectivity at the individual level. The key idea, as Hal has said, is that the buyer's utility from rating is increasing in the intensity of learning from a transaction, where learning is the difference between the prior and the posterior belief, and the key point is how the prior belief is formed. We think that in spite of selectivity, rating aggregates, if carefully constructed, can be unbiased and thus informative; the current work in progress is on exactly the fine-tuning of these aggregates, which is surprisingly difficult.

The empirical results we can present here are from raw eBay administrative data. The main insights are the following. The intensity of rating is decreasing in the number of transactions performed by the typical seller; there is a clear decline in this intensity. In particular, buyers become more and more reluctant to leave a negative rating the larger the number of positives that have accumulated in the rating record. Conversely, when the rating record is diminished, typically by a negative, say the first negative, there is a significant increase in the likelihood that buyers follow up with another negative rating, and this even if there is no moral hazard generated on the part of the seller; I will come back to this point. Now, the interesting point is that with our little theoretical model we can rationalize all these results, and most, if not all, empirical observations on rating in the literature.

Let me quickly walk you through the model. It concerns the individual buyer's rating decision. We consider transactions as being of high versus low quality. The buyer forms a prior belief, and that is key here, based on platform quality and on seller quality as inferred from the rating index. There is a general platform quality, on which I do not elaborate right now, and there is a seller quality which is directly inferred from the rating index that is provided; that rating index can be pretty general. The rater then forms a posterior belief based on her experience from the transaction once she has purchased. And as I just emphasized, our principal guiding assumption is that the benefit from rating is increasing in the absolute difference between prior and posterior, and rating takes place if that benefit exceeds some idiosyncratic rating cost.
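As a small numerical illustration of this mechanism, under an invented parametrization (the prior is set equal to the seller's rating index, and the experience is a binary signal with accuracy 0.8; the paper's model is more general):

```python
acc = 0.8  # probability that the buyer's experience matches true quality

# For each prior (read off the seller's rating index), compute how far a
# good or a bad experience moves beliefs; the model says the benefit from
# rating is increasing in |posterior - prior|.
for prior in (0.5, 0.7, 0.9, 0.99):
    post_good = acc * prior / (acc * prior + (1 - acc) * (1 - prior))
    post_bad = (1 - acc) * prior / ((1 - acc) * prior + acc * (1 - prior))
    p_good = acc * prior + (1 - acc) * (1 - prior)  # chance of a good experience
    exp_gap = p_good * (post_good - prior) + (1 - p_good) * (prior - post_bad)
    print(f"prior={prior:.2f}  gap after good={post_good - prior:+.3f}  "
          f"gap after bad={post_bad - prior:+.3f}  expected gap={exp_gap:.3f}")

# The expected gap (a proxy for the rating probability) falls as the prior
# rises: buyers of highly rated sellers rarely learn much, so they rarely
# rate. But at high priors the bad-experience gap stays several times the
# good-experience gap, so the ratings that do get left tilt negative.
```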
The predictions are quite straightforward. The first is that an increase in the rating index decreases the rating probability; the second is that a decrease in the rating index increases the probability that a buyer rates negatively relative to rating positively. These are the two predictions we take to the data. But before I go to the data, let me just illustrate what happens here. This is basically the difference between the positive posterior belief and the prior belief, as generated on the basis of what I just said, and this is the difference between the negative posterior belief, generated from a negative experience, and the prior belief. You see that essentially you can generate all the results from this little picture.

Okay, let me go to the data; I have to save on time, unfortunately. We have three samples, and the guiding sample is the top 5% of the sellers. We start, I should say, from sellers all entering at a particular time, and we follow these sellers over time. These are sellers that have at least 86 transactions in the first year, and it's a balanced panel, so there is no truncation bias. The eBay rating indices are, I guess, well known: the percentage of positives, the feedback score, and the so-called DSRs, the detailed seller ratings, which are differentiated. We use all the indices and check the results for robustness. The obvious thing you observe is the well-known vast dominance of positive feedback, which is typically considered a per se reason for rating bias. We don't think that this is necessarily so: first of all, you have seller self-selection onto the platform, and then you have the platform's selection strategy, which plays a key role in this and is probably underappreciated.

You're running out of time. Could I ask you to try and wrap up as quickly as possible?

This is the first empirical result. You see here the decline in the probability of rating as the number of transactions on the horizontal axis increases. The second result is a bit more involved. We construct several subsamples of that sample. The key subsample controls for seller behavior by looking at transactions completed before the time at which the first negative arrives, with the feedback given thereafter. And when we look at the effect of the first negative, then in class two, which is exactly the sample I was describing, you see a significant increase in leaving a negative feedback. So there is a clear path dependence in leaving negative feedback. Incidentally, that is concentrated among the young rather than the old, that is, experienced, buyers, and it is concentrated on the earlier, younger sellers.

So let me summarize the key points I want to make here. The rating decision is path dependent, which is an interesting point by my take. And the standard unweighted rating index is clearly aggregating the information inefficiently, even if the individual ratings are truthful, which is assumed here in our exercise. The work in progress involves unbiased rating index constructions, and we give preliminary results, but I'm afraid I'm out of time. Thank you very much.

Thank you very much, Konrad. This is Jacques, I'm taking over; Paul is again cut off. Thank you.
The next speaker is Isamar Troncoso. Is that the right way to pronounce it? Yeah. Thank you. Okay, so I'm going to share my screen. Does it look okay? That's perfect.

Okay, so hello everyone. This is Isamar. I'm a PhD candidate in marketing at USC, and today I'm going to be presenting joint work with Davide and Francesca, which we titled "Does Gender Matter?" What we're doing is trying to investigate the effect of management responses on reviewing behavior. Let me briefly illustrate the context we're studying, which is online reviews and management responses. You may already be familiar with online reviews, which usually consist of a rating, some text, some description, and, at least on some review platforms, some information about the author. The second component here, which you may be less familiar with, is the response from the manager, which in simple words is a message that some representative of the business can post in direct response to a specific review. So it's like a one-to-one message, but it's also public: everyone else on the platform can see the response.

These management responses have been gaining important popularity on platforms such as TripAdvisor or Yelp, because businesses see them as an effective way to manage their own reputation; we know that ratings are important for them. We also know from the literature that management responses can increase the number of reviews, the volume, a business receives and can also impact the average rating. But in this paper we're trying to understand where this change is coming from in terms of reviewers' decisions to write a review: who is leaving a review now, and who is not?

More formally, the two research questions we explore here are, first, do different reviewers react differently to the presence of these management responses, where in particular we look at potential differences between female and male reviewers; and second, if these differences exist, what are the consequences in terms of reviewing behavior?

We started exploring the first research question with a survey in which we asked online review platform users about their reaction to the presence of management responses, and we found some interesting, significant differences between female and male users. In particular, when these reviewers have a positive experience and want to leave a positive review, female users are more excited about the possibility of talking directly to the manager: they see it as an opportunity to praise or compliment the manager for a good job. But if the experience was negative and they want to leave a negative review, female users are also, at the same time, more concerned that this might create some sort of conflict with the manager, or that the manager may discredit what they're saying. We found this last concern particularly intriguing, and we wanted to explore whether it is just a perception driven by gender differences, or whether we can actually find some evidence of it on the platform; in other words, whether this concern is partially explained by the way that managers actually address reviewers. To explore that, we analyze the text of management responses that we collected from TripAdvisor.
We look at differences in some of the linguistic choices managers make depending on whether they address female or male reviewers. We find that, especially when responding to negative reviews from female users, managers tend to use fewer positive-emotion words but more negative-emotion words, more negation words, and more anger words (for example, saying that the review is a lie or something like that), and they also tend to use more third-person pronouns. So rather than directly talking to the reviewer, thanking you, using the "you" pronoun, they tend to refer to the reviewer as "she": she did something, she said something, and so on.

These differences may suggest that female reviewers receive less favorable responses from the managers, but we wanted to formalize this analysis further. For that, we built a text-based classifier: we try to characterize responses by whether or not they are contentious, and by contentious we mean that the manager is trying to discredit the reviewer, is being confrontational with the reviewer, or is responding aggressively. This classification relies purely on the text, the content of the message; it's a bag-of-words-based classifier, a logistic regression (we tried many), but the point is that it's really just based on the content. And when we look at who these responses are addressed to, we find that responses to female reviewers are more likely to be classified as contentious; in other words, it seems that female reviewers are more likely to receive this kind of response.
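A minimal sketch of a bag-of-words classifier of this kind; the two training responses and their labels are invented, and the paper's training data and feature construction are of course much richer:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical hand-labeled manager responses: 1 = contentious (discrediting,
# confrontational, or aggressive), 0 = not contentious.
responses = [
    "Thank you for your kind words, we hope to welcome you back soon.",
    "This review is simply a lie; she never raised any issue during her stay.",
]
labels = [0, 1]

# Bag of words + logistic regression: the prediction depends only on the text.
clf = make_pipeline(CountVectorizer(stop_words="english"), LogisticRegression())
clf.fit(responses, labels)

new_response = ["This review is a lie and she knows it."]
print(clf.predict_proba(new_response)[0, 1])  # estimated P(contentious)
```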
So this far we've learned that female reviewers perceive responses as a potential source of conflict, and that this seems to be a well-founded concern: such situations do take place on online review platforms. We then wanted to measure, to quantify, the consequences of this in terms of reviewing activity. To explore this, we collected the entire review history for 5,000 hotels on TripAdvisor, and we use this individual-level review data to explore whether the probability that a given review comes from a female user changes in the presence of management responses. Because of time constraints I'm going to skip the details of the analysis, but it basically amounts to a difference-in-differences strategy, in which we exploit the fact that some hotels respond and some do not, so we have treated and control hotels, and that there is significant variation in when they start doing so, so we have before and after periods too. With this analysis we see that after managers start responding to their reviews, it is less likely that a given negative review comes from a female user. This analysis of course raises some concerns, because hotels may self-select into treatment, so we run robustness checks with some other treatment variables, and we also replicate similar findings in a lab setting. What is also quite interesting is that we show that this effect, of female users writing fewer negative reviews, is stronger for hotels that write more contentious responses.

So just to wrap up, we believe that our findings contribute a little bit to the understanding of reviewing behavior and of the selection into who writes reviews, and our findings also have implications for review platforms and for businesses. Specifically, they may be considered a red flag: there is room for improvement in making sure there is a fair communication process here, and in particular that hotel managers treat their customers in a fair way. And with that, I think I'm going to close here. Thank you for your time and attention.

Thank you very much, Isamar. I'm not going to say she did a great job; I'm going to say you did a great job. Lots of positive work, and perfectly to time. So, the next speaker, on reputation inflation.

Thank you very much. Can you see my screen correctly? Yes, yes. So thank you for having me. I wish we could do this in person, enjoying southern France and having fantastic conversations, but this will have to do for this year. I'm going to talk a little bit about one phenomenon in the online reputation systems used in online marketplaces. Every time we look at reputation distributions in these marketplaces, we see that they're overwhelmingly positive. Here's an example from eBay, here's an example from Airbnb, and here's an example from our focal platform. Our focal platform for this study is an online labor market for freelance work. According to this distribution, things are going really well on the platform. But our first question is: is this reflective of actual very high quality transactions, and how could we know?

So the first thing we do is take advantage of our data set, which spans more than 10 years: we go back in time to 2007 and look at the average feedback given in each month over time. What we find is that average feedback scores have been increasing over time, to the point that the average feedback score given at the end of our data set is much higher than what it was initially on our platform. The distribution I displayed on the previous slide was just from the last two years of our data set; feedback scores have not always been overwhelmingly positive. They became so over time.

A natural question here is whether this phenomenon takes place only on our platform or is more general. So the next thing we did is collect data from many online marketplaces, and it seems pretty general: in other marketplaces employing similar reputation systems, scores have also been increasing over time. And if we want to step outside the online world, the same thing has been happening in other rating systems, in this case in GPAs in US schools. In the 1940s, only 15% of students received A's; in 2012 this percentage was 45%, much higher, and it's probably even higher in 2020. So the question is, what does this reflect? In the case of colleges, it could reflect, let's say, US professors becoming 300% better, or US students becoming 300% better, or classroom education technology, air conditioning, what have you, becoming much better and improving performance. Or it could, of course, reflect the fact that we're just giving out grades more easily for some reason.

So let's go back to online marketplaces and see what that translates to. It translates to two sets of potential reasons. First, we could have improvements in platform fundamentals that increase transaction quality: better cohorts of people joining, people amassing more experience, better search and matching technology, and so on; everything that might improve transaction quality.
The second set of reasons is lower standards, giving out feedback more easily, which is what we call reputation inflation in this paper. The next order of business is to try to disentangle these two sets of potential reasons for the increase, which might be operating at the same time. The problem is that we observe this primary numerical score over time, but we don't observe transaction quality; it's unobservable and affected by many, many things. So what we do to disentangle inflation from improving fundamentals is find alternative measures that might be subject to less reputation inflation, or to none at all.

In this paper we use two alternative measures. The first is a private feedback measure that the platform started eliciting in combination with public feedback. This private feedback was given after transactions, and the platform was not revealing it to anyone; they told the employers they would use it only for some internal evaluation. So we have a public feedback measure and a private feedback measure for the exact same transactions. What we see during the time that both were collected is that, for the very same transactions, average public feedback scores increased whereas average private feedback scores did not. So this gives us one way to see this inflation taking place.

The second alternative measure we use is the written text that is given in combination with numerical feedback. We do some slightly more sophisticated natural language processing analysis in the paper, but I want to show one piece of evidence that's model-free and I think gets the point through. We see that when employers use the very same phrases in the written text, the numerical feedback they gave in 2008 and in 2015 greatly differs. When employers said "good job," they used to give 4.7 stars in 2008; in 2015 they give 4.95 stars. When they said "terrible," they used to give 1.5 stars in 2008, whereas now they give 2.5 stars. Putting all of the estimates together, our most conservative estimate is that at least 50% of the increase we observe in numerical scores is due to reputation inflation.

Our next task in the paper is to try to pin down why this is happening. Of course there might be many reasons, but we're trying to accurately pin down one of them, and the way we do it is by writing down a theoretical model in the paper. Let me give you the intuition behind it. The first thing this model recognizes is that bad feedback is costly for the rater to give, and raters might dislike giving bad feedback for many reasons: they might not want to harm the prospects of the rated worker; they might not want to harm their own future prospects, because workers will avoid them if they're known as harsh raters; or they may want to avoid costs that have to do with worker retaliation, such as angry emails and so on. But these costs by themselves would probably explain only a bias in the feedback, not a trend over time. To explain the trend, the second component we need to add to our model is that what counts as good feedback and what counts as bad feedback changes over time. What this means is pretty simple: if everybody gets 2.5 out of five stars, then giving somebody four stars is excellent; if everybody gets five stars out of five, then giving somebody four stars is bad. So we put this in a model: what feedback is bad depends on the average feedback given on the platform at each point in time. Employers find it costly to assign bad feedback, incurring a cost analogous to the cost this bad feedback imposes on the workers, and they might start lying about their experience and inflating the feedback if the cost of assigning bad feedback becomes high enough. This model gives us exactly what we see in the data.
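As a toy simulation of that feedback loop (the shading rule, parameters, and updating scheme here are invented for illustration; the paper's model is more careful): true quality stays fixed, but raters shade reports upward whenever honest feedback would fall too far below the prevailing average, and the displayed average ratchets up:

```python
import numpy as np

rng = np.random.default_rng(0)
avg = 3.0          # platform-wide average rating that raters observe
trajectory = []
for t in range(20000):
    true_score = float(np.clip(rng.normal(3.5, 1.0), 1, 5))  # quality is stationary
    # Feedback far below the current average reads as "bad" and is costly to
    # give, so raters shade such reports up to half a star below the average.
    reported = max(true_score, min(5.0, avg - 0.5))
    avg = 0.999 * avg + 0.001 * reported    # running average updates slowly
    trajectory.append(avg)

print(f"displayed average: start={trajectory[0]:.2f}, end={trajectory[-1]:.2f}")
# True quality never changes, yet the displayed average drifts upward:
# inflation driven purely by the cost of giving "bad" feedback relative
# to a rising reference point.
```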
One last component we have in this paper is a direct test of this model, coming from a change in the costs of bad feedback. What happened on our platform, as I told you before, is that they started collecting this private feedback measure, saying they would never reveal it to other users of the platform. They ran some internal tests, and it was much more informative, much more predictive of future performance, and what they ultimately decided to do was to start revealing it. So they started revealing this private feedback to everybody on the platform, but in a batched and anonymized manner, in the sense that workers could not see which employer gave which feedback: a worker's private feedback score was updated only after every five new private feedback ratings they received. This batching and anonymization controls for fears of retaliation, because the worker cannot trace bad feedback back to a specific employer. But on the other hand, if raters care about harm, then they will stop giving bad feedback, and we will start seeing inflation here too. So if we look at the series of private feedback scores over time, and mark the time at which this private feedback started being publicly revealed, we see exactly what the model would predict.

I'm afraid you're out of time. Could I ask you to wrap up, please? One more slide and I'm done. Thank you very much.

These private feedback scores start inflating immediately. So, what we see is that much of this increase in feedback scores is due to reputation inflation; it seems ubiquitous across many platforms; and it can be driven purely by a desire not to harm, rather than by raters' fear of retaliation. So there is a lot of room here for future research on the implications, and on market design interventions that could counteract this reputation inflation. All right, thank you very much. Thank you very much.