Thank you so much for this great opportunity, and thank you to the seminar organizers for including our paper in the seminar series. I'm very happy today to share our work on quality certificate design. This is joint work with Xiang Hui from Washington University in St. Louis and Ginger Jin from the University of Maryland and NBER. Both of my coauthors are in the audience.

Let me start with the motivation. Quality certification is a common tool used by platforms to enhance trust in the marketplace. Like reputation systems such as reviews and ratings, quality certification is a very effective way to mitigate information asymmetry and moral hazard problems. On the one hand, a quality certificate helps consumers distinguish high-quality sellers from low-quality ones. On the other hand, it can motivate sellers to exert effort.

If you look across quality certification programs on online marketplaces, you will find that certificates are usually based on two types of information. First, there is consumer-reported information such as ratings, claims, and complaints. There is a good reason for using such information to construct a quality certificate: it is relevant for the ultimate consumer experience, and therefore good for the long-term health of the platform. However, consumer-reported information can look like random noise to individual sellers, something they have little control over. For example, consumers can blame the seller for a late shipment when the delay was in fact the fault of a third-party shipper rather than the seller.

The second type of information that goes into a quality certificate is direct seller input: for example, seller cancellations of orders, or whether the seller shipped the order within the specified time. There are also pros and cons of using seller-input measures. From the seller's perspective, these measures are action items, their direct input, so they are far more objective and controllable than consumer-reported information. The downside is that seller inputs may be misaligned with consumer experience.

The platform therefore faces a trade-off between the relevance of consumer-reported information and the controllability of seller-input measures. That is, a consumer-centric, or output-based, certification program aligns incentives on both sides of the platform, but sellers can be discouraged by the noise and uncertainty in consumer reports. A seller-centric, or input-based, certification program can better motivate sellers to exert effort with clear signals, but this may come at the cost of misaligned incentives.

Our paper studies this question: what is the effect of moving from an output-based to an input-based certification program, and what are the economics behind it? We achieve this by leveraging a rare opportunity, a major redesign of eBay's certification program. The eBay Top Rated Seller (ETRS) program is a long-running certification program that has been operating for many years. ETRS sellers enjoy two benefits. First, they receive a 10% commission discount, a cost saving. Second, they have the ETRS badge, like this one, prominently displayed on the search results page, as you see here, as well as on the product listing page.
So there is a benefit of signaling value. To earn the certificate under the old regime, before the policy change, sellers needed to maintain a defect rate no higher than 2%. The defect rate is computed by combining all of these cases: the number of negative or neutral feedback, the number of low detailed seller ratings (DSRs) on item description, buyer claims, seller cancellations, and low DSRs on shipping. As you can see, the old regime puts a lot of weight on consumer-reported information: four out of the five measures are consumer reports, and only the fourth, seller cancellations, can be considered direct seller input. Therefore, under the old regime, consumer feedback was highly influential in determining who gets the badge and who does not. Sellers can be discouraged because they are exposed to a great deal of risk and uncertainty from the noise embedded in consumer reports. This is best summarized by a quote from an eBay seller: "One of the reasons I stopped chasing Top Rated Seller in the past was that I had no control over factors like detailed seller ratings. One bidder gets a package a day late, leaves you two stars, and boom, there goes your Top Rated Seller status."

Realizing this was a problem for sellers, in 2015 eBay announced a change in the ETRS requirements. The goal of the policy change was to make the certification program simpler and more objective. Specifically, the new criteria were reduced to three measures: unresolved buyer claims where eBay finds the seller at fault, the number of seller cancellations (the same as before), and late deliveries as defined by tracking information rather than consumer reports. These quality measures, as you see, are direct measures of seller input. For example, late delivery is based on whether the seller shipped the item on time according to the tracking information. That is, if the order is late because of a third party's fault, it will not count against the seller as long as the tracking shows the seller shipped on time. As long as the seller did their part, they are good to go. So the new certification regime really moves from an output-based system to an input-based system. That is how we interpret the nature of the change.

Let me now give you some high-level findings, in case I don't get through all of them. First, we find an immediate selection effect on who gets certified: a certificate that is free from consumer-reported information, and hence from its noise, is automatically friendlier to small sellers and to sellers who operate in markets with more critical consumer ratings and reviews; we will explain how that is defined later on. Second, in terms of quality provision, we find that the quality improvement is actually smaller for small sellers and for sellers in critical markets. These are exactly the seller types and market types with more favorable selection effects. That is, wherever the selection effect is more positive, we see less additional effort, indicating strategic threshold-targeting behavior, which we will elaborate on. Next, we find that the proportion of certified sellers across markets on eBay becomes more homogeneous. And lastly, we find that sales become more concentrated toward big sellers as a result of the policy change.
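(Before the details, to fix ideas on the two regimes described above, here is a minimal sketch of the two badge rules. The field names, the per-order data structure, and the new regime's defect cap are hypothetical simplifications for illustration, not eBay's actual implementation; only the 2% old-regime cap and the 5% late-delivery cap come from the talk.)

```python
# Minimal sketch of the old (output-based) and new (input-based) ETRS rules.
# Each order is a dict of booleans; all field names are hypothetical.

def old_regime_badge(orders):
    """Old regime: defect rate across all orders must be no higher than 2%.
    Four of the five components are consumer-reported."""
    defects = sum(
        o["negative_or_neutral_feedback"]      # consumer-reported
        or o["low_dsr_item_description"]       # consumer-reported
        or o["buyer_claim"]                    # consumer-reported
        or o["seller_cancellation"]            # direct seller input
        or o["low_dsr_shipping"]               # consumer-reported
        for o in orders
    )
    return defects / len(orders) <= 0.02

def new_regime_badge(orders, defect_cap=0.02):
    """New regime: direct seller-input measures only; late delivery is judged
    by tracking (did the seller ship on time?), not by consumer reports.
    The defect cap here is an assumed placeholder, not a documented value."""
    n = len(orders)
    defect_rate = sum(
        o["unresolved_claim_seller_fault"] or o["seller_cancellation"]
        for o in orders
    ) / n
    late_rate = sum(not o["shipped_on_time_per_tracking"] for o in orders) / n
    return defect_rate <= defect_cap and late_rate < 0.05
```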
The key implication here is that if you reduce noise in the certification program, you can motivate seller effort, but you also allow sellers to target the threshold. So the total effect on seller quality provision is theoretically unclear.

Before I talk about the model and empirics, let me briefly walk you through the literature. Our paper relates to three strands of literature. First, we relate to a set of papers on eBay certification and its value that use eBay's internal data. For example, Elfenbein et al. estimate a 7% consumer valuation, or willingness to pay, using eBay's UK data, and Hui et al. show a 3% increase in sales price using eBay's US data. Beyond the eBay setting, we also relate to studies on quality disclosure and its welfare impacts. The seminal work by Jin and Leslie finds that the mandatory display of restaurant hygiene grade cards incentivizes restaurants to improve hygiene quality. More recently, Farronato et al. show that occupational licensing adds very little value above and beyond a platform's reputation system. Lastly, we relate to a large and growing literature on the value of consumer ratings and reviews, particularly studies showing how the design of rating systems can motivate seller quality effort. Our contribution to these literatures is to study whether certification design should be input-based or output-based, and the consequences for the seller quality distribution and market outcomes.

Because of time, I will skip the details of the model and devote the time to the theoretical predictions. This graph captures the essence of the model. What we do is simulate the probability of a seller getting the ETRS badge under the old regime with consumer-reported information, where the threshold is set at 2%; that is, the share of bad consumer feedback has to be no greater than 2%. We also assume that sellers are exerting perfect effort as defined by the new system; that is, their input measures are at 100%. Because consumers can be critical or very demanding even when sellers exert perfect effort, there is still a positive chance of a seller not getting ETRS while making 100% effort. We compute this probability of getting ETRS from a binomial distribution, where the key parameter is how critical consumers are above and beyond perfect seller effort. We plot the probability of getting the ETRS certificate as a function of how critical consumers are, separately for two seller sizes: a small seller with 20 orders and a big seller with 2,000 orders.

We observe two things in this graph. First, the probability of getting ETRS is a decreasing function of consumer criticalness, regardless of seller size. This is intuitive: when consumers are more critical, it becomes less likely that sellers can clear the bar and become certified. Second, seller size is also a very important margin to think about. The key is that all ETRS-relevant metrics are averages across all of a seller's orders. When consumer criticalness is below the 2% cutoff, to the left of the vertical line, small sellers are more vulnerable to consumer criticalness than large sellers.
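(To make this simulation concrete, here is a minimal sketch under the talk's assumptions: each order independently draws a bad consumer report with probability p, the "criticalness", even under perfect seller effort, so the badge probability is P(X <= 0.02*n) for X ~ Binomial(n, p). The probability values chosen for the loop are illustrative.)

```python
from scipy.stats import binom

def p_badge(n_orders, p_bad, cap=0.02):
    """Probability of meeting the 2% defect cap when each order independently
    draws a bad consumer report with probability p_bad despite perfect effort:
    P(X <= floor(cap * n)) for X ~ Binomial(n, p_bad)."""
    max_defects = int(cap * n_orders)   # most bad reports a seller can absorb
    return binom.cdf(max_defects, n_orders, p_bad)

# Small seller (20 orders) vs. big seller (2,000 orders), as in the figure.
for p in (0.005, 0.01, 0.02, 0.03, 0.05):
    print(f"criticalness {p:.1%}: "
          f"small {p_badge(20, p):.2f}, big {p_badge(2000, p):.2f}")
```

Running this reproduces the crossing pattern in the figure around the 2% cutoff.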
Essentially, you can think of this as the law of large numbers: it more or less guarantees that a well-performing large seller will almost always pass the bar, whereas chance plays a much bigger role for sellers with only a few transactions. But the pattern is exactly reversed on the other side of the cutoff, because there it is easier for a small seller to get through with a few lucky draws, while a large seller has very little chance of passing the bar when consumers are particularly picky. This means that the regime change, from the output-based measures of the old regime to the input-based measures of the new one, should immediately benefit sellers who operate in critical markets. In addition, it should disproportionately benefit small sellers in non-critical markets and large sellers in critical markets. Those are the predictions that guide our empirical exercises.

Now a little bit about the data. For this project, we use eBay's internal data on transactions and seller performance metrics, aggregated to the seller-month or market-month level. Our sample goes from October 2014 to August 2016, so we have 11 months before the policy announcement, five months between the announcement and implementation, and six months after implementation. The sample includes 336 level-two product categories, which you can think of as the product markets on eBay. This is after removing very small markets, which can induce unwanted volatility in aggregate measures. We also have about 380,000 sellers, after removing extremely small or occasional sellers with less than $5,000 of sales in the year before the policy announcement. Even after removing these outliers, we still capture 99.7% of the total GMB in the sample period.

Now let me walk you through the key results one by one. First, result one: immediate selection into certification. Let me be clear about how we define the selection effect. Selection here is constructed by simulating sellers' hypothetical ETRS status, which is done by applying the new ETRS requirements to sellers' performance metrics from before the policy announcement. That is, we apply the new requirements to existing performance metrics, ensuring that sellers have made no effort change yet. The difference between a seller's actual and simulated ETRS status therefore captures only the algorithmic selection effect; we do not allow for any seller effort response.

Here, we plot the algorithmic selection effect across eBay markets that vary in consumer criticalness. Each circle represents a level-two market on eBay, and the circle size tells you how big the market is in terms of dollar sales. On the y-axis is the number of simulated ETRS sellers upon the policy announcement divided by the actual number of ETRS sellers at the announcement date. That is, out of all sellers with ETRS under the old regime, what share would qualify for simulated ETRS under the new regime? This ratio is smaller than one in almost all markets, which means the algorithm has a negative selection effect almost everywhere. This is reasonable and expected, because sellers were exerting effort as defined by the old regime.
So when you change the quality standards to something else, naturally not as many sellers automatically qualify under the new requirements. It is therefore not the magnitude that we care most about, but the variation across markets, particularly how the selection effect varies across markets that differ in consumer criticalness. On the horizontal axis, we plot the consumer criticalness of each market. We define consumer criticalness as the share of transactions in a market that were considered defective, or bad, under the old regime, out of all transactions in that market that are considered good under the new regime. Again, provided that sellers are exerting perfect effort, what is the chance of a consumer complaint?

We draw a few insights from this figure. First, markets vary greatly in consumer criticalness, so we have nice variation to leverage. Second, almost all markets experience negative selection from the algorithm change, as we discussed a moment ago: sellers were targeting their effort toward the old regime, so when we apply a completely new set of standards, naturally a lot of sellers lose the badge. Lastly, there is a clear positive association between the simulated change in ETRS sellers and consumer criticalness. This is consistent with our theoretical prediction that sellers were handicapped in markets with harsher consumer criticism. As a result, once you remove the noise in consumer-reported information, the new regime generates a more positive selection for sellers in these markets.

In this next figure, we look at the data a little differently. Here we plot the before and the simulated shares of ETRS sellers: the before share is the blue curve, and the simulated ETRS share is the black curve. We plot the series across percentiles of seller size, separately for critical and non-critical markets based on a median split across markets. The selection effect here is defined as the drop from the blue curve to the black curve: starting from ETRS status under the old requirements, the black curve reflects the chance of sellers automatically qualifying under the new requirements. The vertical line around the 90th percentile is the cutoff for large sellers as defined by eBay; large sellers are those with 400 or more orders in the three months before each ETRS evaluation period.

We make a few observations here. First, as you see, large sellers are more likely to get ETRS; they have higher baseline ETRS probabilities to begin with, and this is true for both market types. This makes sense. For one, I didn't mention this in the background, but the certification also requires a minimum number of past sales, so there is a quantity threshold for ETRS eligibility, which larger sellers are more likely to satisfy. For another, seller size can be a positive signal of seller quality, which has been documented in many contexts. Because of these factors, it is expected that larger sellers are more likely to get ETRS in the first place.
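(A quick aside before the remaining observations: here is a minimal sketch of how the two market-level quantities from the earlier scatter plot, the selection ratio on the y-axis and criticalness on the x-axis, might be computed from seller- and transaction-level data. All column names are hypothetical, not the paper's actual variables.)

```python
import pandas as pd

def market_selection_and_criticalness(sellers: pd.DataFrame,
                                      txns: pd.DataFrame) -> pd.DataFrame:
    """sellers: one row per seller at the announcement date, with a 'market'
    column and boolean columns 'actual_etrs_old_rules' and
    'simulated_etrs_new_rules' (new rules applied to pre-announcement
    metrics, i.e., no effort response).
    txns: one row per transaction, with a 'market' column and boolean flags
    'good_under_new_rules' and 'defect_under_old_rules'."""
    s = sellers.groupby("market")
    # y-axis: simulated ETRS sellers under the new rules divided by actual
    # ETRS sellers under the old rules; below 1 means negative selection.
    selection = (s["simulated_etrs_new_rules"].sum()
                 / s["actual_etrs_old_rules"].sum())
    # x-axis: among transactions counted as good under the new (input-based)
    # rules, the share flagged as a defect under the old (consumer-reported)
    # rules, i.e., the chance of a complaint despite perfect seller input.
    good = txns[txns["good_under_new_rules"]].groupby("market")
    criticalness = good["defect_under_old_rules"].mean()
    return pd.DataFrame({"selection_ratio": selection,
                         "criticalness": criticalness})
```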
Back to the figure. Next, we observe that the selection effect, the difference between blue and black, is more positive for small sellers than for large sellers; as you can see, the gap increases with seller size. One reason is that small sellers were disproportionately affected by random noise in consumer-reported information because of their small number of orders, so a reduction in this noise immediately helped them gain the ETRS badge. Further, the selection effect is more positive in critical markets than in non-critical markets. Sellers were less likely to be badged in critical markets under the old regime because of the more critical consumer reports, so the drop in critical markets is not as big as in non-critical markets. And lastly, there is a very interesting uptick for very large sellers in critical markets, meaning that the selection effect actually favors those large sellers. This is quite consistent with our earlier simulation results, where we found that large sellers in critical markets are more handicapped by critical consumer reports because of the law of large numbers. It is very interesting to see that confirmed in the data.

So that is the selection effect. Now let's move on to the effect on seller effort. To understand sellers' effort change in response to the policy, we use the same graph as before and add a green curve representing the share of ETRS sellers after policy implementation. We define the policy effect on seller effort as the increase from the black curve to the green curve. We see an overall positive gap between these curves for all seller sizes and all market types. In addition, we find that the effort improvement, the difference between green and black, is actually smaller among small sellers, since the gap increases with seller size. And a comparison between the market types suggests that the effort improvement is smaller in critical markets. As we discussed before, small sellers and sellers in critical markets benefit more from the selection effect, and here we see that they exert less effort, which suggests some strategic threshold targeting going on: if a seller can easily meet the new requirements without doing anything, she is not likely to exert much effort beyond that.

We now provide direct evidence on the threshold effect. This is raw data, and it shows a clear pattern of convergence toward the threshold. In this graph, we plot the time series of seller effort for different seller types, where types are defined by sellers' effort levels before the policy announcement. The left and right panels plot the defect rate and the late-delivery rate, where defect is a combination of seller cancellations and unresolved buyer claims. The vertical lines correspond to the policy announcement and implementation, and the horizontal lines correspond to the thresholds; for example, the late-delivery rate has to be less than 5%. In both panels, we see convergence toward the threshold for all seller types. That is, sellers who excelled on these measures before the policy announcement shirk: their measures actually go up and gravitate toward the threshold.
At the same time, sellers who fell short of the certification requirements improve their effort, so their measures also gravitate toward the threshold, from above. That is based on raw data. Now we adopt a regression discontinuity design (RDD) type of analysis to identify the threshold-targeting effect more rigorously. Recall that the new certification requirements consist of two metrics: the defect rate and the late-delivery rate. To construct the sample for the RDD analysis, we focus on sellers who meet the bar on the defect rate but whose late-delivery rate is within one percentage point above or below the 5% threshold. We then compare the before-after change between the just-below and just-above cases to see which side has a stronger incentive to exert effort.

Column 1 shows that sellers who are simulated not to be badged, those just below the bar based on their existing performance metrics, improve their delivery speed relative to sellers just above the bar. This result is consistent with the threshold effect, because sellers just below the bar are motivated to exert effort: their marginal benefit of doing so is larger than that of sellers just above the bar. In column 2, we add a post dummy for the period after policy implementation and find that the threshold effect is stronger after the implementation date. In column 3, we also add interactions with dummies for the quarters before the policy announcement, and we find no significant differential pre-trends. So across the specifications we see clear threshold-targeting behavior. But if you think about it, if the threshold story is true, the logical implication is that sellers just below and just above the bar on delivery speed should have no differential effort incentives on the defect rate, because all of them already satisfy the defect-rate requirement. This is exactly what we see in columns 4, 5, and 6, where there is no difference in effort between the just-above and just-below groups. This serves as a nice placebo test. So we observe both effort provision and threshold-targeting behavior in seller effort.

Next, let me walk you through the result on homogenization in the share of certified sellers across markets. You may wonder why we care about homogenization of ETRS shares across markets on eBay. The reason is that consumers may be confused if they see very different ETRS shares across markets. For example, if they see that only 10% of sellers are certified in one market and 80% in another, they will be confused about what exactly ETRS means. In that case, different stringencies of the certification threshold may lead to misinformed consumer decisions, particularly for inexperienced consumers on eBay. If the ETRS share is relatively stable and similar across markets, it benefits consumers by providing a more consistent experience. Here we plot the distribution of the share of ETRS sellers across the 336 markets in each month before and after the regime change. The six curves in cold colors represent the distribution of ETRS shares across markets in the months before the regime change, months minus six to minus one. The curves in warm colors represent the distribution across markets in the months after the regime change, months zero to five.
Apparently, the distributions after the regime change, in warm colors, have a smaller variation; they are more concentrated than the distributions before. We take this as evidence of homogenization of ETRS shares across markets as a result of the policy change.

Lastly, let's talk about the impact on sales and market concentration. To the extent that critical markets may be more affected by the removal of noise, we use a difference-in-differences (DID) specification at the market-month level, where we essentially compare the before-after differences in outcomes between markets of high and low criticalness. This is a bit different from a standard DID, because we do not have clean treatment and control groups; the policy change is platform-wide. Instead, we leverage the different treatment intensities between critical and non-critical markets. Notice that we have two post dummies, one for the announcement and one for implementation, so the post-implementation coefficient should be read as the additional effect above and beyond the effect at the announcement date. In columns 1 and 2, we find no statistically significant differences in log sales or log quantity sold across markets with different rating criticalness, neither after the policy announcement nor after its implementation. Next, we study how market concentration changes. In columns 3 and 4, we study the policy effect on the logged number of sellers with any sales and the logged HHI, where the HHI ranges from zero to 10,000. The estimates here suggest that critical markets become more concentrated in sales after the policy implementation. And lastly, we study changes in the market shares of small sellers. Column 5 shows that, out of all sellers with positive sales, the proportion of small sellers becomes smaller in more critical markets as a result. The evidence therefore suggests a market concentration effect due to the certification change, which seems to benefit the largest sellers more than small sellers. This could be good or bad for the platform or for consumers, depending on the cost of providing the effort defined by the new regime: if large sellers are more efficient at exerting the effort highlighted in the new regime, it is possible that this efficiency compensates for, or even overturns, the adverse effect of reduced competition. We leave that to future research.

To conclude, in this paper we study a major redesign of eBay's certification program that removed consumer-reported information from the certification requirements. For sellers, we find that the policy change causes an immediate algorithmic selection effect, which disproportionately helps the sellers who suffered more from consumer-reported information under the old regime, that is, small sellers and sellers who operate in more critical markets. In addition, we find that the new regime motivates sellers to exert the effort desired under the new regime, especially those close to the threshold. For consumers, we did not cover this in the talk due to the time limit, but in the paper we study the ETRS premium, which you can think of as consumer willingness to pay for otherwise identical products with the ETRS badge. We find that willingness to pay for the badge is positive in both the before and after periods.
In particular, we find that the ETRS willingness to pay actually increased rather than decreased after the policy change, which suggests that consumers still value the information contained in the new ETRS regime. In addition, we do not observe a significant decrease in total GMB as a result of the policy change, which suggests that the new regime did not degrade the certification program into one with less information for consumers; otherwise, you would expect a decrease in total demand. For market outcomes, we find a homogenization of the share of certified sellers across markets, which arguably provides a more consistent experience for consumers on eBay. At the same time, we also find that markets become more concentrated toward big sellers as a result of the policy change. So that is my talk. Thank you so much for listening, and I look forward to the discussion and questions afterward.

Awesome, thank you. You finished exactly on time. And now Mike will give us the discussion.

Thank you. So first off, thanks for the chance to read this paper and reflect on it. I thought this was super interesting, so nicely done. There are a lot of results in here, and you went through a lot today, so what I thought I'd do is put this in a little bit of context and give a couple of specific thoughts on the paper. I think it's easy to see why this is an important paper. If we think about the internet and what it has done for people, it has helped us buy things from strangers and transact across markets, and one of the important reasons we're able to do that is that platforms have worked to build trust. So what does it look like to build trust on the internet? You could use online reviews; that's one piece of it, and that's sort of the starting point here. There are some problems with that. We can think about what the problems are: selection bias, fake reviews, reciprocal reviews, inaccurate information. And in fact, eBay has been a setting where people have documented a lot of these challenges, and where a lot of these challenges are pretty extreme. So this is a market where you might think that other types of certification are particularly important. I'll come back to that in a second.

So what are the solutions that platforms think about? There are a few. You could create better incentives for reviews. You could better aggregate information. Or you could do what the thrust of this paper describes, which is basically to rely less on outputs, on kind of overall experiences, and tighten the focus on non-review information, things that are measurable to the platform and might be more objective. And you might ask yourself, why not just always do that? Why wouldn't you always want the objective thing you can measure and just put lots of weight on it? But there's a theoretical tradeoff, and the tradeoff is basically controllability versus alignment. If you think about what's best for customers or best for the platform, you would want broad measures that reflect overall experiences, but those could be harder to measure, noisier for the reasons we talked about, and harder to control on the part of sellers. That's the aligned, broad stuff. Or you could use narrower measures that are more controllable and perhaps more measurable.
And then, at the margin, this paper is saying: okay, on eBay, and perhaps elsewhere, you could benefit from shifting toward the more controllable, easier-to-measure stuff. And then they present a series of facts around the rollout of the new ETRS: it helps some small sellers, although it changes concentration in the end; it creates good incentives; and notably, it was good for sellers in areas where ratings were low. I thought the analysis was clean and the exposition thoughtful, so I think the results are great. Everything I'm going to say here is just tweaks at the margin, or open questions I had after reflecting on this.

One thought was that eBay was an early example of reviews, but also an early example of broken review systems, and I was trying to think about how important or unimportant that is in this context. It would be helpful to get a sense of how informative the system was in the pre-period and what that means for the analysis, and for where you might expect this to generalize. Along those lines, can you say something more about attributes outside of the inputs? Which ones might be important here? Is this just a world where you can pretty readily observe most of the stuff you care about, in which case you wouldn't expect reviews or other information to be that useful? Specifically, I'm trying to think whether there are other blind spots in this grading: are you able to measure that at some scale and quantify the unintended consequences? While you're getting better on X, is Y also getting better, getting worse, or staying about the same? That speaks to some of the theory here: you're talking about the one-dimensional version, but in some ways you can think about this as a multitasking problem, where the complementarity between the stuff you're observing and everything else is relevant. And along those lines, I'm wondering whether this also works because these are search-ish type goods rather than more experience-y type goods. Coding something up along those lines could be helpful for interpretation, especially on generalization. Building on that, I'm wondering about the characteristics of the markets where this is relevant. You talk about the markets, but can you featurize them a little more, just to get some sense of where this is more useful and where it is less useful?

A couple more questions I had. I was wondering about the homogenization of ETRS shares. If it's easier to buy a keyboard than artwork, then you might want to see different shares. So I wanted to get your thoughts on whether homogenization is necessarily good, and whether you could do a little more to push on the conditions under which it is good: do you still see homogenization where you'd want it, and do you avoid it in areas where you wouldn't want it?

The last broad comment I had was on how much of this is about the reviews failing versus an aggregation problem: could you re-weight the reviews? I'm wondering, for instance, can you predict your new ETRS measures using the review information? If you just ran a predictive model, you could see: is it that the information wasn't in the reviews at all, or is it that it was there,
but the way things were getting weighted put weight on the wrong attributes? Sorry, let me just make sure I cover everything here. Oh yes, the last thing I wanted to mention: you focus a lot on more and less critical markets, and that's one way to think about how a seller might experience this. But I was trying to think whether that's the right paradigm, or whether you want to think about more or less informative markets. If there are areas where the existing system was more informative, where ratings were correlated with things like bad experiences afterward, I'm wondering whether you could see: does the change still benefit a market where the old system was more informative, but also potentially more critical at some level? So let me just stop here. Again, I thought this was a super interesting problem. It helped me think about the trade-off; I thought this was a very nice example of controllability versus alignment in action on a platform, and a good example of how, if you can measure the right things, you can create the right incentives for sellers to work hard and improve a reputation system. So thank you.