So thank you so much to the organizers for inviting me to be a part of this seminar series. I've really benefited from attending it over the last few months, so it's great to now be able to share some of my own work. The paper today is "Platform Design and Innovation Incentives: Evidence from the Product Rating System on Apple's App Store." What I want to do to start is give you the really big-picture motivation for this project and the perspective I took when beginning to think about this paper, and then we'll zoom in to the narrow context of the product rating system on the App Store and look at how it affected innovative behavior on the platform. So, starting with the big picture: digital platforms are becoming an increasingly important part of the economy. Of all the audiences I've ever spoken to, I'm sure you know this best. And as that has occurred, platform owners have essentially become de facto regulators. That is, they're setting rules and policies for sizable economies that they essentially own and run. Now, is that concerning? Not necessarily. As was noted in the relatively recent EU competition report, the fact that platforms are choosing these rules and acting as regulators isn't a problem per se. We should welcome competition between different business models or platform architectures and encourage innovation. But if that competition is absent, there may be cause for concern, and that's where this paper is going to fit in. Now, I think we've seen a lot of concern in recent years among economists and policy makers as well. The latest of this, at least on the US side of things, is of course the House antitrust report that was just recently released. And a lot of that concern has been focused on actively anti-competitive behavior.
So some sort of leveraging or self-preferencing behavior, or other things along those lines. But even if we don't have that kind of conscious abuse, as they put it in the Furman report released last year, more competitive markets can still produce better outcomes for us. I don't think that's a contentious claim. And so we should hope to see more competition across platforms than we might be seeing right now. So my goal today, in light of this larger motivation and the broader discussion going on these days about competition in the platform space, is to zoom in to a very specific policy on Apple's App Store and provide evidence that a socially inefficient policy was able to persist for almost a decade in a context where, arguably, there isn't much competition between Apple's App Store and the Google Play Store, which is really the only sizable competitor, at least here in the US. So that's the goal: essentially to provide some evidence that we could see, and indeed are seeing, inefficient platform design, set against these broader concerns. Specifically, I'm going to analyze the effect on product innovation, that is, product updating, of one specific aspect of the App Store: the product rating system. What happened is that for years the App Store had a policy of resetting an app's salient rating whenever the app was updated. So if you're a developer and you push an update to the store, your rating resets automatically. As I'll argue in a minute, this likely encourages more updating from lower-rated products, which may be an inefficient allocation of effort, and it discourages innovation, or updating, from highly rated products, which will also be welfare reducing.
So this is, I'll argue, and I'll provide evidence consistent with this, an inefficient policy that persisted, as I said, for nearly a decade. What eventually happened is that in 2017 Apple finally changed it. They didn't fully reverse it; I'll talk about the details of how the policy changed in a moment. But I'm going to leverage this policy change, exogenous from the perspective of the app developers, to provide evidence that developers were responsive to the incentives created by the initial policy, that is, that more highly rated developers were less likely to update, all else equal. And I'll provide what I'll call suggestive evidence that this policy led to lost, as opposed to simply delayed, innovation, that is, developers are actually not going to produce certain updates that would have been produced in the but-for world where this policy wasn't active for those many years. Okay, so that's the goal for today. Let me provide a little bit of context, although again, I'm sure you're all very familiar with this. Product rating systems are very important for a lot of these online platforms. As Steve Tadelis has written, it's necessary that both sides of the market feel comfortable trusting each other. They need safeguards that alleviate the problems caused by asymmetric information, and a product rating system can often fill this role; in the App Store, that's exactly what it does. There's evidence, importantly, that consumers do respond to these rating systems, so that demand is affected by the rating an app receives, and that's going to play an important role in how I think about the potential effects of this policy. More generally, there's increasing empirical evidence that the structure and design of online platforms can affect intra-platform competitive outcomes. So we've seen that in terms of product quality.
For example, we've seen it on Facebook in the Claussen paper I cite here, and we've seen it in terms of product entry in the Ershov paper. And if we want to think specifically about product updating or innovation, Comino and co-authors, and some work I've done previously, both show that the structure and features available on app stores can affect how much product updating you see from developers. Okay, so it's clear that the way these platforms are structured can affect how firms behave, and here I'm going to look for another example of that, in particular with respect to the product rating system. So, before we get into too many details, let me lay out how the rating system works on the App Store, what the original resetting policy I've mentioned was, and how that policy changed. Here's an example of a storefront page, for the Duolingo app. On Apple's platform, users can rate and review any app they've previously purchased, so all of the ratings here are verified purchases. A rating is a score on a one-to-five integer scale; consumers can also write a text review, which is optional, and I won't be looking at that data today. Importantly, an app's average rating is displayed across the App Store. It shows up pretty much everywhere the app is listed: on the App Store homepage when you first open the store, in ranking lists, in search results, and in the storefront like you see here for Duolingo, which had collected about 206,000 reviews when this screenshot was taken and had a score of about four and a half stars. So this number, or these star icons, are going to be all over the place.
The original policy, which dates back essentially to the beginning of the App Store, is that every time you update your app, your rating is reset. The reason is that the salient rating, the one displayed everywhere, is calculated using only the reviews for the most recent version of the app. So as soon as there's a new version, you're starting all over again. Now, to be clear, the historical ratings data is still available; you just have to click through several menus and drop-down screens to get it. So that information is available to consumers, it's just quite costly in some sense for them to go get it, and in general they don't seek it out. And so the salient rating, as I'll refer to it, is always just for the most recent version of an app. When you update your app and the rating is reset, it goes from whatever it may have been to the "no ratings" indication we see here in the screenshot. So if you look at this Enigma 2 game, there's one that has apparently been updated recently, and so it has no ratings at this moment. That's how Apple displays that particular status. And you can see, for example, that the Yahoo Fantasy Football app just below it has 13 ratings; it was likely updated very recently as of when this screenshot was taken, just because that's a fairly popular product and it had very few ratings at the moment. So that's essentially the original policy, and it persisted until September 2017. Okay, so what happened in 2017? Essentially, Apple decided this was not a good policy for the developers on the platform. Here's a quote from the executive at Apple in charge of the App Store. He notes that some developers don't like submitting their updates because it resets the rating.
So they'd get upset, saying, oh man, I have a choice: I can fix some bugs and blow away my ratings, or I can keep my high rating. I have a 4.7, I don't want to submit it. And in the words of this executive, he thought that was kind of stupid. Okay, so what they do here is change this policy, but not totally reverse it. They don't switch to a system where ratings simply persist across versions. Instead, they give developers a choice: every time you update your app now, you get to check a box that either says preserve my ratings to date, or start over as if we were under the original policy. That's going to make it a little difficult, unfortunately, for me to understand what's happening at the low end of things, for the lower-rated apps. But for the higher-rated apps, we'll be able to speak to whether the original policy affected them and how they reacted once the policy changed. Just to be really clear about the timeline: the original policy is in place from when the App Store opens in the middle of 2008 until September 2017. Importantly, the policy change was announced earlier in 2017, in June, at the big developer conference the company holds, and it wasn't enacted until September. All the results I'm going to show you use that September date to separate the pre and post periods. In the paper I have some robustness exercises where I basically pull out that summer 2017 period, because we might imagine that developers would anticipate that the change was going to be enacted in September and behave differently as a result. Ultimately, everything I show you today is robust to that concern. Okay, so that's essentially the policy. Let me pause for a second for any questions or comments before I move on. Okay, all right.
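Before moving on, the display rule just described can be sketched in a few lines. This is purely illustrative (hypothetical Python of my own, not Apple's implementation): only ratings left on the current version count toward the displayed average, and an update empties that pool.

```python
# Illustrative sketch of the pre-2017 "salient rating" rule (my own code,
# not Apple's): the displayed average uses only ratings for the current
# version, so pushing an update resets the displayed score.

def salient_rating(ratings, current_version):
    """ratings: list of (version, stars) pairs, stars an integer 1-5.
    Returns the displayed average for the current version, or None to
    represent the "No Ratings" status shown right after a reset."""
    current = [stars for version, stars in ratings if version == current_version]
    if not current:
        return None  # shown as "No Ratings" on the storefront
    return sum(current) / len(current)

# An app with strong ratings on v1.2 updates to v1.3 and loses its score:
history = [("1.2", 5), ("1.2", 5), ("1.2", 4)]
print(salient_rating(history, "1.2"))  # ~4.67 while v1.2 is current
print(salient_rating(history, "1.3"))  # None: the update reset the rating
```

Under the 2017 change, a developer who checks "preserve my ratings" would instead keep the full history in the averaging pool.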
So if there's nothing there, let me talk now about what we should expect the effect of this policy to be. I'm not going to do anything super complicated here; I just want to build some intuition for how this policy should affect developers, both at the high end, the highly rated apps, and at the lower end. So let's consider a developer who maximizes profits by choosing whether or not to update their app each period. In my empirical analysis, a period will be a week. Let's make two assumptions based on the existing literature.

Could I ask a quick question? Sure. So if I wanted to launch an app and call it Duolingo Turbo that was basically an update, I could do that, no? Or would Apple stop me from releasing basically an update of the app under a different name?

So you want to release a separate SKU, a separate product, which is just the update. Yeah, you could certainly do that, certainly during the period I'm going to study.

Okay, so I always have the option of effectively resetting my ratings through something like that.

That's right. Throughout the history of the App Store you have the option of resetting your rating, by updating in the pre-period, or by updating in the post-period and choosing to reset. So that option is always available to developers. But yes, you could always put out a new product. That's probably not a great strategy, because it's very difficult to move consumers from one product to another when you don't have a direct relationship with them; Apple maintains that relationship with the consumer, which is why we tend not to see this. But you certainly could.

Okay, so we have a developer maximizing profits, choosing whether to update each period. Let's assume that updates, at least on average, increase the demand for a product.
This is consistent with prior work. In addition, let's assume that demand is increasing in an app's average rating. So let sigma be your optimal updating policy given your average rating today, which I'll call R-bar, and whatever else we want to shove into this thing; we won't worry about the rest, because what we're really interested in is the relationship between the likelihood I update today and my average rating today. So let's just take sigma to be the probability of updating in a particular period. Again, we're just building some intuition here. For concreteness, let's assume this is a downward-sloping relationship, that is, higher-rated apps update less frequently than lower-rated apps, for whatever reason. We can then consider what happens absent the initial policy in the App Store, that is, with a rating system that just cumulatively averages all of the ratings ever received, with no resetting going on. That's going to be my baseline, the sigma-star relationship. Okay, so now let's introduce Apple's reset policy that existed for those first several years of the App Store. What happens is that with this policy in place, if I as a developer update my app, any revenue benefit I was getting from my higher rating, and we're assuming that ratings drive some degree of my demand, is lost. So if my five-star rating brought me a 10% increase in demand, I lose that the instant I push the new update to the store. Now, that doesn't mean I'm never going to update; there are other benefits to updating. For example, updates drive new demand too, because the product on average increases in quality, and certainly that's what I've found in some of my own work.
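The trade-off just described can be put in one line. The notation here is my own shorthand, not the paper's:

```latex
% One-line sketch of the updating decision under the reset policy
% (my shorthand, not the paper's notation). B is the average demand gain
% from the quality improvement, c the cost of producing the update, and
% D(.) the demand attributable to the displayed rating.
\[
  \text{update iff}\quad
  \underbrace{B - c}_{\text{baseline net gain}}
  \;>\;
  \underbrace{D(\bar{R}) - D(\text{no ratings})}_{\text{reputational cost of resetting}}
\]
% For a highly rated app the right-hand side is positive, so some updates
% that were worthwhile under cumulative averaging no longer are.
```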
But we're going to be, all else equal, less likely to update at any given moment, because we now pay this reputational cost every time we update our app. So at the high end, at least, we should expect to see a decline in the likelihood of updating. The question then is what we should expect to happen to lower-rated apps, and here it depends. What it depends on is how consumers view the "no ratings" status you get when you haven't received any ratings for the latest version. So let R-bar-star be the ratings equivalent of "no ratings." On one hand, this could be the lowest possible rating an app can have. So when you reset your ratings and go from three and a half stars, or whatever, to "no ratings," maybe consumers view you as the worst-quality app on the store, because there's just no information about this experience good; they, for whatever reason, anticipate that apps without ratings aren't going to be any good at all. If that's the case, then we're going to see a discouragement effect for all of the apps on the store. Because even if you have a really low rating, like one and a half stars, resetting is going to drag down your rating and therefore the demand boost you're getting from it, so there's a reputational cost for you too. Of course, the alternative is that consumers, when they see "no ratings," impute something else. For example, they might assume you're an average-rated app, so around three and a half or 3.6 stars.
If that's the case, then we should actually see an increased incentive to innovate, or rather to update, for these lower-rated products, because by resetting, I can go from my lousy one-and-a-half-star rating to a situation where consumers believe I'm a three-and-a-half-star app, at least until the ratings start rolling in. Okay, so it really depends on how consumers view that "no ratings" indicator. But ultimately, at the high end we will for sure see a decreased likelihood of updating; at the low end we may see either a decreased or an increased likelihood of updating. Regardless of what's happening at the low end, this all leads to welfare loss, right? Because at the high end, we're potentially losing innovation, or getting less frequent innovation. At the low end, either that's also happening, or we're getting a misallocation of effort: really lousy products are exerting a bunch of effort to push new updates to the store, and that's costly. Okay, so the next question I want to ask is whether this innovation is actually lost. What I've argued is that, at least at the high end, we'll see a decrease in the frequency of updating. But there are two possibilities for what could be going on. On one hand, developers could still be innovating just as much as before, but bundling the innovations into larger updates and releasing them less frequently. If that's the case, when the policy changes in 2017 we should see an increase in the relative frequency of small compared to large updates, because once I'm able to keep my rating through updates, I'll return to the optimal updating schedule. On the other hand, the second option is that developers are simply engaging in less innovation.
There's less product updating; they're not bundling things, they're just giving up on some of their ideas. This leads to an ambiguous effect, theoretically at least, on the relative frequency of small to large updates, so that's going to be hard to pin down from the data, at least with the approach I'm taking here. Yeah, is there a question?

So, you could imagine that there are heterogeneous effects across developers. There's independent information on some better-known developers than others, and the ratings hit, or not, is going to differ across them. For example, this might encourage better-known developers to become even better known, to advertise, to make sure they're on top-10 lists and things like that, and to sustain that. So you're not obviously going to get a uniform effect, and it'd be interesting to tease apart different effects on different kinds of developers.

Yeah, certainly. I'll estimate the regression I'm going to show you in a minute separately for the three categories I look at, and we'll see some evidence of differences there. But I agree that for big names or more successful products, like the Duolingo app I showed earlier, the impact is probably going to be smaller than for the little app I might produce as an independent developer. Yeah, that's a very good point.

Okay, so we have these two options. What's nice is that option one gives us a really clear prediction for what we should see in the data: an increase in the relative frequency of small compared to large updates. Note that even under option one there's going to be welfare loss, in the sense that even if product innovation is just delayed, it's being bundled and released less frequently.
There's a dynamic loss here, because the update I should have gotten today under the optimal schedule isn't going to come out until next week, and so the value I would have gotten this week is never recovered. So both of these options entail some welfare loss.

Sorry, I have a question about that. Doesn't that depend on, well, there could be some complementarities between different updates. So instead of getting one update today and then, once you see there's a problem, getting another update tomorrow, you're creating this bigger bundle of updates that improves the consumer experience in a better way, or something like that.

Yeah, that's a good point. I'd have to think about the conditions under which that would be the case; I'm sure there are some. You could certainly imagine that putting out two updates together creates these complementarities. Whether that outweighs the loss of not getting one of the benefits earlier, and then getting the complementarity when the second one is released a week later, is less clear to me. But that's something I'll have to think about more.

Okay, so these are essentially the two options for what's happening to these innovations. What I'm going to do now is go to the data and ask two questions. First, does the observed relationship between average rating and the likelihood of updating change when the policy changes? In other words, is there evidence that we go from this blue line to the flatter red line, assuming this downward-sloping relationship? And second, does the relative frequency of small to large updates change for these higher-rated products when the policy changes?
In other words, is there evidence that supports option one, or are we able to rule it out? Again, importantly, we're not really going to be able to test for option two directly. But if we can rule out option one, in some sense that provides suggestive evidence that option two is what's occurring: we are losing innovation. Okay, so those are the two questions I'll bring to the data. Unless there are any quick clarifying questions, I'll tell you what my data is, and then we'll see the regression results. The sample I'm going to use is a set of iOS apps, observed weekly from basically the beginning of 2015 to the end of 2019. I have data on three product categories: education, productivity, and utilities. Education is probably fairly straightforward; these are apps that teach you or your kids something. Productivity apps are things like note-taking or task-management apps. And utilities is sort of a miscellaneous category; imagine calculators, password-manager apps, general utilities you might use on your phone. One big challenge that anyone studying the App Store faces is that there are just a ton of products on the store, and many of them are not actively competing in the marketplace. It's extremely cheap for you as a developer to put an app on the store and just let it sit there for ages; there's basically no cost to keeping something on the shelf beyond the roughly $100 a year you pay to be a member of the developer program. And what that means is two things. First, there are tons of hobbyists on the store: I could put up an app, you could put up an app, and it may not really be competing in the productivity space.
But also, if I have an app that's competing today and I abandon it tomorrow, I can just leave it up on the store, because the worst thing that happens is I make five bucks here or there, and many developers do that. So one thing I'm going to take some effort to do, and I've done this in some of my previous work as well, is restrict this sample of all the apps in these three categories to the ones that are actively competing on the platform. The way I'll do that is with daily sales and revenue ranking lists for the App Store; I'll make restrictions based on how highly apps rank on these lists. So as long as you're making some showing on the store, you end up in my sample; I'm trying to keep it as large as possible. Okay, there are two things we need to do with the data before we can estimate the regression I'll show you in a second. If we want to think about what happens to the relative frequency of small to large updates, we need to classify what are small and what are large updates. I'm going to do this in two ways. The first way is to use an app update's version number. Developers will frequently signify the content or importance of an update through the version number; this is probably something you're familiar with from software you interact with. What's traditionally done is that you change the first number in a version number, say from 7.1.1 to 8, when there's a major or large revision to the product, and you change some of the subsequent numbers, as in going from 7.1.1 to 7.1.2, when there's a more minor or insignificant change, a bug fix or something like that. So I'm going to go through and classify every update as either major or minor based on this system.
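As a minimal sketch of this rule, here's how the major-versus-minor classification could look in Python; the function name and the tie-breaking choices are my own, purely illustrative:

```python
# Toy sketch of the version-number classification described above (my own
# code): an update is "major" if the first component of the version string
# changes (7.1.1 -> 8), and "minor" otherwise (7.1.1 -> 7.1.2).

def classify_update(prev_version, new_version):
    prev_major = prev_version.split(".")[0]
    new_major = new_version.split(".")[0]
    return "major" if new_major != prev_major else "minor"

print(classify_update("7.1.1", "8"))      # major
print(classify_update("7.1.1", "7.1.2"))  # minor
```

Note the year-based schemes mentioned next break this rule: under a scheme like 2020.1, 2020.2, almost every update within a year would be labeled "minor" regardless of content, which is exactly the noise concern.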
Now, this is useful because in general developers do use this nomenclature. That said, it's not required, and developers are increasingly moving away from this type of system, so there's going to be a lot of noise here we might worry about; that's why we'll have a second system, which I'll talk about in a moment. Notably, one thing a lot of iOS developers I talked to are now doing is versioning their apps based on the year and a consecutive integer. So if I'm numbering my app versions 2020.1.2 and so on, there's just no information about the content of these updates, and that's the primary concern with this approach. Okay, so I'm going to keep this system, we'll have major versus minor updates, and we'll be interested to see in the data whether there's a relative increase in the frequency of minor versus major updates. The second approach I'm going to take is one I've used in some of my other work, which is to look at the actual text of the release notes developers write when they submit updates to the App Store. So here's an app called Drafts, a little note-taking app, and you can see that in version 5.7.1 they released a quick fix for an issue preventing action completion notifications from appearing. This is a fairly minor revision, and the version number is suggestive of that as well. Under this second system, this is going to be classified as a bug-fix update; I'm using a different nomenclature here just so we can keep the two approaches separate. If, on the other hand, Drafts had added a bunch of new features, I would define that as a feature update. And so we're going to distinguish between feature and bug-fix updates using these text release notes.
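The actual classifier in the paper uses natural language processing features and a support vector machine; as a toy stand-in, here's a simple keyword rule in Python that illustrates the feature-versus-bug-fix distinction. The keyword lists are my own guesses, not taken from the paper:

```python
# Toy stand-in for the release-notes classifier (the paper uses NLP
# features and a support vector machine; this keyword rule is only meant
# to illustrate the feature-vs-bug-fix distinction).

BUG_WORDS = {"fix", "fixes", "fixed", "bug", "crash", "issue"}
FEATURE_WORDS = {"new", "added", "feature", "introducing", "redesigned"}

def classify_release_notes(text):
    words = set(text.lower().replace(",", " ").split())
    bug_hits = len(words & BUG_WORDS)
    feature_hits = len(words & FEATURE_WORDS)
    # Ties and no-hits default to bug_fix, the more common update type.
    return "feature" if feature_hits > bug_hits else "bug_fix"

print(classify_release_notes(
    "Quick fix for an issue preventing action notifications"))  # bug_fix
print(classify_release_notes(
    "Introducing a new dark mode and added widget support"))    # feature
```

A trained classifier replaces the hand-picked keyword sets with weights learned from labeled release notes, which is what makes it robust to phrasing the keyword rule would miss.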
I'll use some natural language processing techniques and then a support vector machine to predict what type of update we're getting based on the text of these release notes. One thing I want to point out before we get to the regressions is that these two classification systems aren't perfectly overlapping; it isn't redundant to have both. In fact, you can see that in the version number system, a significant proportion of the major releases are just fixing bugs; based on the release notes text, they're not making significant changes to the app. Similarly, a lot of the minor releases are actually adding new features or functionality. So I see these two systems as complementing each other and picking up two slightly different things. We'll see that the results differ a little between the two systems, and I think there's still work to be done in understanding how the two relate to each other, but this way we get a clearer sense of what's going on without having to worry too much about the noise in the version numbers, or about developers putting out release notes that aren't particularly informative. Okay, so let me show you the regression I'm going to run, and then we'll talk about some of the results. Once this data is cleaned and put together, for the first question, what happens to the likelihood of updating after the policy changes, I'm going to run the linear probability model you see here. I regress an indicator for whether app j updated in week t on the app's average rating and, importantly, an interaction between the average rating and an indicator for whether we're in the post period, that is, after September 2017.
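Written out, the model amounts to something like the following; this is my reconstruction from the verbal description, with notation of my own choosing, and with the controls and fixed effects as described next:

```latex
% Linear probability model (my notation, reconstructed from the talk).
% Update_{jt} = 1 if app j releases an update in week t.
\[
  \text{Update}_{jt}
  = \beta_1 \bar{R}_{jt}
  + \beta_2 \left( \bar{R}_{jt} \times \text{Post}_t \right)
  + X_{jt}'\gamma + \alpha_j + \tau_t + \varepsilon_{jt}
\]
% X_{jt} collects price, review count, version age, and app size;
% \alpha_j and \tau_t are app and week fixed effects (the main Post_t
% effect is absorbed by \tau_t). A positive \beta_2 means the
% ratings-updating relationship flattens after September 2017.
```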
I control for a number of observable characteristics I have data on, including price, the number of reviews, and the version's age; you might imagine we're less likely to update one week after we just updated, so we want to control for how old the current version is. I also control for the size of the app, that is, how much room it takes up on your phone, and then I have time and app fixed effects, the app fixed effects controlling for time-invariant characteristics of these apps. But beta two is the real coefficient of interest here: how does the relationship between ratings and updating change after the policy changes in September 2017? Okay. So here's the result of estimating that model, in the overall sample in column one and then in the three subcategories: education, productivity, and utilities. First, we see evidence of a negative relationship between average rating and the likelihood of updating in a particular period, which is consistent with the pictures I was drawing for you before. But of course, the thing we're really interested in is what happens after the policy changes, and there we see a positive coefficient, which is evidence that this relationship is indeed flattening after the policy changes. So we find evidence here that the original resetting policy was in fact discouraging innovation, or at least making innovation less frequent, for higher-rated products on the App Store. Now, if we look across the three categories, we can see some variation, which comes back somewhat to the earlier question: there's some suggestive evidence of differences across categories.
On one hand, this fits with my own personal prior, which is that productivity apps, which tend to be priced above zero more often and tend to be priced higher on average conditional on being priced above zero, are more reliant on the rating system. So they seem more reactive to that policy. On the other hand, there's not really a statistical difference between the three coefficients you're seeing here, so it's not clear that there is really a difference in the productivity category. But to the earlier point, I think it's definitely worth looking into whether more established developers react differently to this policy than less established or newer developers. So that's something I'll definitely have to do going forward. Okay, so overall what we see here is evidence that the initial resetting policy was discouraging innovation. It was inducing the reputational cost we theorized about, and developers were in fact responding to it in that pre-period. Okay, so the second question is what's happening to the content of these updates. Remember, if the innovation is just being bundled together in larger and larger bundles, what we should expect to see in the post-period is a relative increase in the likelihood of minor or bug fix updates compared to major or feature updates. The small updates should become relatively more common than the large ones. So first, we're going to run essentially the same regression model that I just showed you, except we're going to reduce the sample to just observations where an app actually updated. So, conditional on updating, what happens to the likelihood of doing a minor update? That's going to be my left-hand side indicator here. First, again, we can see evidence of a negative relationship, that is, higher rated apps tend to be less likely to release a minor update relative to a major update.
And when the policy changes, we don't see evidence that that relationship changes, that is, we don't see minor updates becoming relatively more common compared to major updates. This allows me to reject what I called option one earlier, which is the bundling story. The bundling story says we should find an effect here, and we simply don't in the overall sample. So this provides, I think, some suggestive evidence that we are in fact losing innovation as a result of this policy, at least based on this version number classification system. Now, that said, there is some heterogeneity across categories. We see evidence of an effect in the productivity category, which gives us a less clear result. So on one hand, in the overall sample, where we find no effect of the policy change on the relationship between ratings and the likelihood of a minor update, we can reject the bundling story. But when we find evidence of a relative increase of minor compared to major updates in the productivity category, we're not going to be able to draw a clear conclusion, because that result is consistent with both the bundling story and the lost innovation story. So it's not totally clear what's going on there, and it'll require more sophisticated analysis to dig into that, I think. But I think it's important to note that while overall there's evidence that we are in fact losing innovation, there may be some heterogeneity. We're not losing it from everybody. Maybe that differs by category. Maybe the story is the one we've been talking about already, which is that some developers are different than others. So there's something to dig into there. Okay, so that's looking at it with the version number classification system. The other approach is the support vector machine system, which uses the text of the release notes for all of these updates.
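As a rough illustration of that release-notes approach (this is my own minimal sketch, not the paper's actual pipeline; the example notes, labels, and hyperparameters are all invented), a TF-IDF representation of the release-note text feeding a linear SVM might look like:

```python
# Minimal sketch of a release-note classifier: TF-IDF text features into a
# linear support vector machine. Toy training data stands in for the paper's
# hand-labeled sample of release notes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

notes = [
    "Fixed a crash on launch and minor bug fixes",
    "Bug fixes and performance improvements",
    "Squashed bugs reported by users",
    "New dark mode and redesigned home screen",
    "Adds offline sync, a new editor, and widgets",
    "Introducing collaborative playlists feature",
]
labels = ["bug_fix", "bug_fix", "bug_fix", "feature", "feature", "feature"]

# Word and bigram TF-IDF features; the linear SVM then learns a separating
# hyperplane between bug-fix and feature release notes.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(notes, labels)

pred = clf.predict(["This update fixes several bugs"])[0]
print(pred)
```

In practice the classifier would be trained on a labeled subset and then applied to the full corpus of release notes; with the toy data above, the point is just the shape of the pipeline.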
Again, we see a negative relationship overall between the likelihood of a bug fix compared to a feature update. But what's of interest here is the beta two coefficient, the interaction coefficient. And here we do find evidence of a relative increase in bug fix updates compared to feature updates when the policy changes. So here we're unable to reject the bundling story, and we're also again unable to reject the lost innovation story. So it's not clear what's going on overall, although importantly this seems to be entirely driven by what's going on in the education category. If we look at the productivity and utilities categories, there we seem to be able to reject the bundling story in favor of the lost innovation story. Okay, so I think there's still work to be done here, in the sense that we have clear heterogeneity across categories, and importantly there are some differences between what we're seeing with the support vector machine, the release notes approach to classifying updates, and the version number approach. So that's why I want to be cautious in how we interpret these results. I think this provides suggestive evidence that we were in fact losing innovation, or product updates, as a result of this initial policy that was in place for nearly a decade. That said, it clearly wasn't homogeneous; there's variation going on here, and whether that's a category-specific thing, or a high-end versus low-end developer thing, or established versus not, remains to be seen. But I think it gives cause for concern overall that this policy really did have an effect on the types of innovations and product features that were being added to products on the platform. Okay, let me pause here in case there are any questions so far. Could I ask a question? Yeah. So I'm wondering, what if it is costly for consumers to actually update?
So for me, it is costly to download these updates. I really can't be bothered. So can you say something about how high that cost would be? Because this also speaks to the fact that you don't find this increase in frequency of updates. Maybe they don't do it just because they know that it's costly for consumers. Yeah, that's a very good question. So I'll just say two things. One is, yeah, I think that's right, that we do need to worry about the extent to which consumers are willing to engage in these updates. So if you can't be bothered, that's obviously going to create an issue for the developers and could be affecting things here in general. And certainly for the majority of the time the App Store has existed, Apple has done its best to force these updates on you, essentially. So if you're a new iPhone customer, they're going to essentially opt you into automatic updates that occur overnight while you're asleep or something like that. I've seen numbers, so this is anecdotal, but I've seen numbers from specific developers showing that the vast majority of their customers are updating very quickly after a release is put out, whether actively or passively because the iPhone is doing it on its own. That said, there are certainly people who have that feature turned off, and some people who aren't getting those updates are actively resisting them, because I'm sure we've all experienced an update that is horrible and ruins the experience or something like that. So yeah, that's something I'll need to think more about. But I think on average, people are getting these updates pretty quickly and at limited cost, if not no cost, to the customer, other than perhaps bandwidth or cellular cost or something. I mean, because maybe you can also say something about how high it would be.
Maybe you can say something about how high it would have to be in order to rationalize the results that you get, or something like this. I don't know exactly. Yeah, that's what I'm talking about. Yeah, thanks. Can I also ask another question about the size of the updates? Can you actually measure the relative size of the previous version and the new version, and somehow put an actual bits-type measure on the size of the update? I don't know if that would make sense, but I'm just wondering whether it makes sense for me as a developer to not say truthfully what is in the update. Maybe I don't want to come every week saying, oh, I found another bug, and want to say something different in my release notes. So maybe you can use a third measure which would be, in some sense, objective. But I don't know if that makes sense. Yeah, that's interesting. I'll have to think about that too. In the current regression I control for the file size, like how big the actual download is, which isn't quite what you're saying, I guess. Yeah, because it should be related to the usual size of the app, or the previous version, as well. Yeah, exactly. Yeah, that's good. I'll have to do that. I think that's right. And your point about the developer not wanting to say, look, I found another bug, that's why I don't rely entirely on this approach to classifying updates. On one hand, this developer has clearly identified what they've done. But if you go through your phone and look at these release notes, especially in other categories like social media, they'll just say, we're continuing to make the product better for you. And that's just totally meaningless. So you never know whether they've reinvented things or if they've just changed the spelling or something.
Yeah, the change, the delta basically. Yeah, that's a good idea. Thank you. Thanks. I missed it, did you tell us why you chose these three categories? So it's unfortunately not a particularly exciting answer. This is where I have reliable data on the ratings and the sales rankings right now. My hope is to expand it to the full App Store. I think there's some concern about doing this sort of analysis in certain categories. Games are just fundamentally different from almost every other category, and social is as well. But yeah, it's really a data decision. That's great. Okay, so let me just, yes. And just a quick note: given there's been quite a bit of Q&A during the talk, you don't need to rush. You can take a little bit longer, until 8:45 or 8:50. Okay, well, perfect. No, that's fine. We're there. It worked out well. Okay. So let me wrap up then, but thank you. So for nearly a decade, Apple's App Store had this policy of resetting the salient product rating after an update. I've argued that this policy had the potential to reduce the frequency of updates and potentially lead to lost innovation or lost product updating. And I've now provided evidence showing that the frequency of updates at the high end did come down as a result of that initial policy. Developers did respond to the incentives created by this policy. And I've provided what I'll call suggestive evidence that this policy led to lost innovation, as opposed to simply delayed innovation. I think what this speaks to is the idea that platform design decisions, such as the ratings reset policy, can in fact affect intra-platform competitive outcomes. We've seen this in other contexts, including on app stores, on Facebook, and elsewhere.
But I hope this can add just another piece of evidence that in cases where we can argue there's very limited cross-platform competition, we can end up with inefficiently designed platforms, where these sorts of "bad," in quotes here, decisions can persist absent that competition, right? That is, we're getting sub-optimally designed platforms, and there's very little incentive for a platform owner to address that if it isn't going to directly benefit their bottom line. And so while there's this growing concern, which is totally warranted, I'm not pretending it isn't, about active anti-competitive behavior and self-preferencing and all of this, I think it's worth keeping in mind that even if we don't have an open and shut case about those sorts of issues, we should still be concerned about the quality of these platforms and the welfare losses that may be occurring because there's no impetus to improve on the platform design. Now of course, Apple did eventually change this policy, right? It took several years, but they did eventually reverse it. That's the policy change I was able to take advantage of in the empirical aspect here. So I don't want to say that they're totally non-responsive. But the fact that it was able to persist for so long, despite, as the executive I quoted earlier said, developers just hating this for years, I think is a sign that there's reason to be concerned about these sorts of things, and that other similar inefficient policies may continue to persist on these platforms even today. Okay, so I'll pause there, looking forward to the discussion and the Q&A that follows. Thank you. Awesome, thank you. Let's move to the discussion and then we'll come to the Q&A. Jorgos, did you want to take it over? So Ben, maybe I'll share your, actually, no, so yeah. Sorry, Ben, you can keep the slides, I forgot.
So Jorgos, there's no slides, right? No, no, no slides. Just some thoughts. So let me start the timer for myself. I'll take five minutes, hopefully not more, though that was a great paper and I have many thoughts. It really inspired me to read it again and again and think about it. Thank you, Ben. That's great work and a wonderful presentation. It's in an area where there is little work. We know a lot about the relationship between ratings and demand, but we know much less about the supply side. And this is one of the papers that studies the supply side, the people that produce the apps that are rated on the App Store. And I think it makes a very nice point: while we know that quality affects ratings, which affect demand, it might also be the case that the rating system affects quality. And this is something that's missing from prior work. I really love that. I thought that's super interesting, and I think it could even be made more prominent as a contribution in the introduction of the paper. So this is about the positioning of the paper. Now, as I started thinking about the paper, first of all, I totally believe the results, but let me try to stimulate some discussion. I have no doubt the results are true and the story is true. So the first question I have is, do updates always increase quality, or even do they typically increase quality? And here I'm thinking about the poor consumers that updated from Windows XP to Windows Vista, for instance. That was a disaster. So one thing that I'm thinking about is that quality doesn't always improve after an upgrade, and that might affect what we think about the welfare loss due to lost updates. Moreover, there is, in my mind, some inconsistency that I cannot resolve. The paper kind of assumes that updates improve app quality, but developers are hesitant to update because they will lose their ratings.
But if the future apps, the updated apps, are better, they're only one five-star rating away from having a very good rating again the next week after the update, which makes me think that maybe it's not just the average rating that matters. Maybe more things matter to developers. What might the developer think about when they're updating their app? What do you see when you go to the App Store? Ben showed some of that. First of all, you see two ratings: the rating for the current version, but also a cumulative rating for all the versions of the app. And it's true that the old version rating is sort of hidden away, it's not as prominent, though you can see it. You also see the text of the reviews that consumers wrote, and that's not affected by the reset policy. You see the number of reviews, and I think this is affected by the reset policy. So developers might be thinking, well, I have a million five-star reviews, I'm very hesitant to update; or they might be thinking, I have only one five-star review, who cares? Maybe I want to update. So my point here is that maybe the mechanism is slightly more complex, and the decision of developers to update depends on some interaction between the number of reviews and the average rating, some more complicated form. I haven't had enough time to think about that deeply, but I think the number of reviews matters there. The other thing I was thinking about is your standard endogeneity concerns, though honestly, I see this as more of a nice-to-have. I totally believe the results; I think really what Ben is saying is happening. So the first thing is that around the time that this reset policy happened, iOS 11 was released. iOS 11 had new APIs, new features. So maybe that incentivized developers to upgrade their apps regardless of the rating, to have new features. So maybe just drop a few months before and after, I'm not sure, or control for major iOS updates; the dates are well known.
Another thing I wanted to focus on is that Ben, very cleverly I think, controls for version age, how many days or weeks it's been since your last update, and he clearly pointed out that if I updated last week, there is almost zero chance I update this week. I wonder if a linear relationship is enough to capture this sort of complex thought process of software development. So I wonder if maybe we need some more flexible functional form to control for the probability of being updated as a function of the time since the last update. Another minor thing: there is some censoring in the model. At some point, the data is cut off. Apps that did not update by the end of the panel might update a bit later. We have models that can deal with censoring. Maybe, again as a robustness check, think about estimating some sort of hazard model with recurring events. Again, I'm perfectly fine with the linear probability model, very easy to interpret, but maybe put it in an appendix. Something to think about. I also wondered, what do you do in your model with R bar when it's reset? How do you code no rating? I did not clearly see that in there. What do you set it to? Do you set it to zero? I was expecting to see a dummy for no rating, but I did not see that in the model. I'm out of time. And then the final thing I'm going to say is, I was wondering about the dynamics of that effect. So after the policy change, there might have been some pent-up upgrades: a bunch of developers that were delaying their upgrades because they were worried about loss of reputation upgraded, and then maybe there was some steady state that's different from before, but not exactly the same as right after the policy. So maybe interact the reset policy with some time dummies going forward, to see the evolution of that effect. So I'm out of time. These are my thoughts. Again, great paper, I totally believe the results.
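The flexible-functional-form suggestion can be sketched like this (entirely an editorial toy illustration: simulated data, invented bin cutoffs, and plain least squares standing in for the paper's estimator):

```python
# Sketch of the discussant's suggestion: replace the linear version-age control
# with flexible duration-bin dummies, the same building block a discrete-time
# hazard model would use.
import numpy as np

rng = np.random.default_rng(0)
n = 500
age_weeks = rng.integers(1, 30, size=n)          # weeks since last update (toy)
# toy DGP: updating hazard is near zero right after an update, then rises
p_update = np.where(age_weeks <= 2, 0.02, np.where(age_weeks <= 8, 0.15, 0.25))
updated = rng.random(n) < p_update

bins = [(1, 2), (3, 8), (9, 30)]                 # assumed duration-bin cutoffs
D = np.column_stack([(age_weeks >= lo) & (age_weeks <= hi) for lo, hi in bins])
X = D.astype(float)                              # saturated in bins, no intercept

beta, *_ = np.linalg.lstsq(X, updated.astype(float), rcond=None)
# with a saturated dummy design, the OLS coefficients are the within-bin
# update rates, so any non-linearity in duration is captured bin by bin
print(np.round(beta, 2))
```

A discrete-time hazard model would put these same duration dummies inside a logit; the saturated-dummy regression here just shows how the non-linear version-age profile gets absorbed.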
I think something can be done to make this a more robust paper that reviewers will complain less about. And I look forward to seeing this paper published. So thank you, Ben, and thank you for inviting this guest. Cool. Well, thank you. Thanks, Jorgos. We'll open it up for Q&A now. Can I ask a question? Go ahead. So here, another thing that I guess you can consider is how it can impact competition, because on one side, losing all your reputation up to that point, at least in terms of the rating, is not very good for the app developer, but it can also sort of level the playing field for people who want to add new apps to the App Store. Now, if you have these giants who have thousands and millions of reviews, it's going to be much harder to enter the App Store at any point in the future. Do you have any kind of idea of the impact on entry? I've not given that enough thought. That's a really good question. I'll have to think more about that. I mean, I could imagine that if I'm the dominant player in a sub-market or something and I update and there's this reset, then, like I think you were suggesting, that could provide an opening for everyone else to jump in and do it at relatively lower cost. That would be really interesting to look at. Yeah, I'll have to look at that. That's really intriguing. I hadn't given that much thought. So my intuition is almost exactly the opposite. The big players can reset and improve quality and do so with relatively little penalty, because they're already well known, and it's the less well-known guys who become very hesitant to do the updates. You mean the middle quality or something like that? Well, I don't know.
I'm thinking that, you know, previously there might be kind of Matthew effects at work, where the big guys who have independent reputations can rely less on these ratings and can do the updates and improve and get better. So that's why I think the kind of heterogeneity would be really interesting to look at, because I think that would speak to these questions of what it is doing to the competitive landscape in the App Store. And so I agree. The very top guys are probably very unaffected by this policy. It can impact their decision of when to update, but people know the brand. They would know that app even if it resets its reviews every week. Facebook isn't going to be affected by this. I don't think so. Yeah, yeah. Or like Gmail or something, yes, something that has an independent reputation built up. But you also have this huge middle rank of apps, right? And if I'm adding a new game, it would be very hard to compete with the middle guys, which I think are the bulk of the apps, not the very top guys. It would be very hard to compete with the very top guys anyway. Yeah, on the entry thing, I'd have to think more about it. I think for the apps that are already on the store, if we look at the mid-level or smaller players, or if we're in a market that's relatively more competitive, where there isn't a Google or a Facebook dominating the sub-market, then it would be interesting to see whether, when the relatively dominant, though not Google- or Facebook-level, firm chooses to update, everyone else can then shove in a bunch of updates, and the reputational costs are basically smaller because things have opened up there.
And that effect could exist even in the presence of the sort of heterogeneity where the dominant players aren't really being affected by the policy because they're just doing their own thing, but they're creating these pockets where the smaller players can strategically engage in innovation or updates at a lower cost than they otherwise would have faced. Yeah. I was also thinking about how the ratings are reset but nothing else about the position of the app in the store, like how highly ranked it is or anything like that. So in some sense, you could look at this: some of these apps are still just as visible to consumers, there's just coarser information about them, whereas other apps are less visible to consumers, so they depend more on the ratings for consumers to infer their quality. And seeing how those different apps respond to the change could be helpful. You do observe the top lists, right? Yeah. So I'm using those top lists to do the sampling in a very aggregate way, but you're right. I could look at, essentially, if I'm ranked in the top 100 or something and reset, how is that going to be different than not. The effect would, I would imagine, depend on the degree to which consumers are finding things through these top ranking lists or other featuring mechanisms, because if we're going straight through search or something like that, then maybe it doesn't matter because the rating will be the key thing. But if we're looking at the top 10 apps or something and I'm there with no ratings, maybe the no rating doesn't matter because there's, as you're saying, a second signal of information essentially. Yeah. Yeah, that's something I definitely have the data to look at for sure. Ben, there are a couple of questions in the chat, and I want to make sure I give a chance to Fiona. Do you want to ask your questions live? Or should I read them?
You can just read them. I'm not sure they're that great anyway. All right, so I'll read the second one, because there seems to be some discussion on this one. So, there could be an interaction between size and quality: how many days does it take to get my top quality rating back? Okay, that's a very good question. It speaks to the heterogeneity point we've been talking about, I think. It really depends. For the larger apps, let's say Duolingo, which we saw the picture of earlier, I don't know the exact number, but they're going to collect ratings really quickly. For a lot of the independent developers I talked to, getting ratings is a slow, I don't want to say costly, exercise, but it's a slow exercise. That is, ratings arrive stochastically, and for the less popular products it can take a while to collect a decent number of ratings. So, to the earlier discussion, to the extent that it matters how many ratings I have, you can often wait a while to get above just a few ratings or something like that. So there definitely is a relationship there, where the really big dominant players are going to be able to refill their ratings stock very quickly, whereas the smaller firms are going to potentially struggle. Can I just follow up? Do you have any ability to measure the variance of a rating, or do you just see the stars? Well, I guess you see it every week, so you do know a little bit about variance. Because it seems like the strategy might be different for an app that knows its type, where everybody always gives it that rating, compared to the ones that are at risk from, like, an outlier review. Yeah, yes, I could certainly look at the variance. That's a great idea. Yeah, that's something I'll have to look at.
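For what that could look like (purely illustrative numbers, not data from the paper or the talk): with weekly snapshots of each app's displayed average rating, a simple per-app variance separates the "knows its type" apps from the outlier-prone ones:

```python
# Toy sketch of the questioner's suggestion: within-app variance of weekly
# average ratings as a measure of rating stability. All numbers are invented.
from statistics import pvariance

weekly_ratings = {
    "steady_app":   [4.8, 4.8, 4.7, 4.8, 4.8, 4.7],   # knows its type
    "volatile_app": [4.9, 3.1, 4.6, 2.8, 4.7, 3.5],   # at risk from outliers
}

# population variance of each app's weekly rating series
rating_var = {app: pvariance(r) for app, r in weekly_ratings.items()}
for app, v in sorted(rating_var.items()):
    print(f"{app}: {v:.3f}")
```

The same weekly panel would also support the arrival-rate measure mentioned in the talk, by differencing the weekly review counts.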
I think, related to this, in addition to looking at the variance, I could also try to piece together the average arrival rate of ratings for these apps, which would also speak to this idea that some apps are going to face a long-term penalty in some sense, while for others it's going to be a very short-term thing, and that's probably going to be correlated with how established the firm is, and with size. So we keep circling back to this established-firm heterogeneity question, which is a great one. So one other question is that there is also this discussion of people misusing their reputation. So when they have built up a very big reputation with a lot of positive ratings, now they can shirk and not put in as much effort. So now that you're not resetting, would that make it easier to shirk in the future and not care about reputation? So that can also have a negative impact. Yeah, no, that makes sense. Certainly once you're in the post period, after September 2017, you can hang on to the old ratings as long as you want. And so that would certainly make it cheaper to engage in shirking. I mean, presumably the ratings are still continuing to arrive, but depending on how large that stock is before you start engaging in this activity, that could buy you some sort of window there. I mean, as Jorgos mentioned in the discussion, the text reviews are there too, which just aren't factored into the analysis here. So you could imagine, if I really stop keeping up the app, the text reviews become more fraught or more upset or something, and so that might clearly signal that kind of behavior to consumers. But yeah, certainly it becomes easier to engage in that activity, or I would think it would become easier.
And there are no other mechanisms out there, other than the extent to which the app is being featured on the store or ranked highly in the sales lists or something like that, that would overcome that concern, I think. All right, so this is a perfect time to stop the recording, and we'll just move to the informal part, I mean, we'll just keep the conversation going. And this is where we can ask all the questions that you're not sure are that great. So I'm going to stop the recording here.