Right. I'm Karim Lakhani. Also a Berkman faculty associate, something like that, at the business school at Harvard, and also at the Institute for Quantitative Social Science. So I've got a presentation. The paper is already up on the site, so you can download it if you want. And what I thought is, I'll just make this an interactive presentation. So instead of me presenting for 30 minutes and then making it Q&A, we'll just open it up as you have questions and so forth. We can just engage in a dialogue, and we'll take our 45 minutes to get through it. This is joint work with Kevin Boudreau, who is a colleague of mine at the London Business School. You'll see lots of Boudreau and Lakhani et al. papers. We collaborate. We're like a duo in academia that sticks together and does a lot of work together. So Kevin and I have been working a lot together over the last six years, trying to explore many of these topics that we'll discuss. And our work is situated at the NASA Tournament Lab, which is part of the Institute for Quantitative Social Science. I'll explain a bit more about what we do, but essentially, we take algorithmic, computationally intensive challenges for NASA, and we get them solved through crowdsourcing. At the same time, we layer on a field experiment in the social sciences as we solve these problems. And so that's the driving modus operandi for us in terms of our research approach. This paper has a long genesis, but it really tries to solve a puzzle that we've had for a long time. And so you'll see us try to make headway on open versus closed innovation, disclosure, non-disclosure, and these topics, because it's fundamental to how we organize innovation. And in interesting ways, it has been fundamental to how my research trajectory has also progressed.
So I've been thinking about crowds and how crowds can be utilized for innovation for about 17 years, since I first encountered open-source software communities while I was working at General Electric a long time ago. And it has been a long journey to think about how crowds have been used in the economy. And what we see out there are two fundamental ways to set up crowds. One is through competitions and contests. So essentially, you set up an institution where lots of people are competing to solve the same problem. They're working in secrecy. And at the end, you award a prize. And here, I've spent a bunch of time trying to understand who wins and why, how we think about how big a contest you want, how people have an essential taste for competition and how that impacts performance, motives in contests and why people participate, and also a range of questions about behavior in contests. At the same time, we've done a bunch of research on communities, so the open-source software world. And there, I've done a bunch of work looking at, again, motives to participate, the costs and benefits of participation, how self-selection matters in these communities, and how you can actually find partners to collaborate with, both online and face-to-face. And in many ways, these two worlds have been separated for me, in the sense that there's a whole bunch of studies diving deep into the organization of contests and a whole bunch of studies looking at collaboration. And this paper is actually going to try to do both: look at both these things and see what the differences are and what the similarities are. You know, it was interesting. When I started my research career, I looked a lot at communities and collaboration, and of course open source was the one best way to organize software development. But then I was very surprised to see platforms create competitions and contests. And it didn't make sense to me why this one was working so well and also why the other was working so well.
And so a lot of my work now is trying to really get at the boundary conditions between these two institutions. And this paper is headed in that direction. As I discussed, we work a lot with NASA. The platform we use quite a bit for our software contests is TopCoder: about 600,000 members worldwide, and they have run over 15,000 contests, creating a whole bunch of software for a whole bunch of organizations. And then most recently I've been working also with colleagues at the medical school, where the question of collaboration is actually very important for them. And we've done a range of field experiments with them to see how scientists and physicians can collaborate together. Because in many ways the institution of collaboration is very much like open source, in the sense that people self-select their partners. People choose their projects, choose their partners, instead of in a company where it's very much top-down. So this setting has also helped us to think a lot about collaboration as well. Just to give you a flavor of what kind of work we do, I'm just gonna show you a quick video of a challenge we ran, actually about a year ago, for NASA. So you'll get a sense of the kinds of contests we run and then the types of research we do after that as well. [Video:] Energy, in one form or another, powers everything on Earth. And the man-made things floating above it too. This is the International Space Station. You've probably heard of it. It's powered by the sun, and the sun's energy is captured by the station's solar panels. Ensuring the space station harvests the most energy possible is a complicated task. Why? Well, for one reason, see those large solar panels? Holding them to the station are very long, thin arms called longerons. Any time an odd number of longerons are in full sunlight, with others in the shadows cast by the rest of the space station, they bend and eventually break.
For this reason, ISS operators are careful to position the station to limit shadowing, so only an even number of longerons are shadowed at one time. However, this conservative positioning reduces the power the station can collect from the sun, thus causing inefficiency. NASA wants more power for the ISS; more power means more science and cool stuff that NASA can do on orbit. NASA needs a sophisticated algorithm, and they think you just might be the key to this whole equation. Introducing the NASA Tournament Lab's International Space Station Longeron Challenge. Your solutions just may help power the International Space Station and allow more science from more scientists around the world. Consider this your invitation to blast off with NASA. For more information, visit topcoder.com slash ISS, if you've got the right stuff. [End of video.] So this was a two-week-long contest, $30K prize money altogether. You know, we had pretty broad engagement. So 2,000 people signed up. We had 459 people submit programs, right? Programs for us to evaluate. And then, you know, the winners: Italian, Chinese, Chinese, Polish, Chinese, Chinese, Eastern European, Canadian, Chinese, Eastern European as well. You get the countries. So, you know, broad engagement and also actually pretty high performance. So what I'm showing you right now is the final output from the algorithms for solar power for the space station. And this is sort of the distribution of the submissions, and what we want to care about is the counterfactual. Like, how good is the internal NASA solution that we compare this to? And it's better, right? Depending on how you measure it, we were basically able to surpass or come close to what NASA has internally, which they developed through their own private contractors. And what's interesting also is that it's not just one or two solutions. Lots of people actually met this bar.
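The feasibility rule in the video can be made concrete with a toy sketch. The even-shadowing constraint comes from the talk; everything else here (the function names, the shadow and power models) is a hypothetical illustration, not NASA's or TopCoder's actual scorer:

```python
# Toy sketch of the orientation problem described in the video. The
# feasibility rule (only an even number of shadowed longerons per array)
# comes from the talk; the toy shadow/power models are made up.

def feasible(shadowed_per_array):
    """shadowed_per_array: shadowed-longeron counts, one entry per solar array."""
    return all(count % 2 == 0 for count in shadowed_per_array)

def best_orientation(orientations, power_model, shadow_model):
    """Pick the orientation maximizing power among feasible ones."""
    best = None
    for theta in orientations:
        if not feasible(shadow_model(theta)):
            continue  # conservative positioning: skip odd-shadowing angles
        p = power_model(theta)
        if best is None or p > best[1]:
            best = (theta, p)
    return best

# Made-up models: 4 arrays; only multiples of 10 degrees avoid odd shadowing.
shadow_model = lambda theta: [0, 0, 2, 2] if theta % 10 == 0 else [1, 0, 2, 2]
power_model = lambda theta: 100 - abs(theta - 45)  # peak power near 45 degrees

print(best_orientation(range(0, 90, 5), power_model, shadow_model))  # (40, 95)
```

The actual contest scored far richer models of orbit, shadow geometry, and power over time; the point of the sketch is just the shape of the trade-off: the feasibility constraint rules out some otherwise high-power orientations.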
And this was, you know, lots of years, lots of investment in creating that internal solution, and we were able to replicate and exceed what NASA has within two weeks. So for us, you know, as social scientists thinking about how these systems work, we want to always have a counterfactual. So we can show you all this great performance, but then you'll have a counterfactual to say, how well does it work against what already exists? And, you know, this is part of the work that we're doing in terms of amassing lots of examples in a whole range of circumstances where we can see these systems outperform what happens within traditional settings. So what is this paper about? So the paper that I'm trying to walk you through says that when we think about innovation and institutions for innovation, there are two major goals that any institution for innovation has. One is creating incentives to exert effort. And the second is to disclose knowledge. So once you find these great solutions, you want the institution to disclose that knowledge to other people, so they may build on top of it. And it's the timing and form of the disclosure, we argue, that differentiates systems. So think about open science versus open source. In open science and the scientific process, we disclose our knowledge after we get the rights to publish and our name attached to something. Versus in open source, there's a lot of intermediate back and forth going on between the various programmers as the solution is being developed. So this timing is actually what makes the difference between open science and open source. And the argument we make is that intermediate disclosure, as in what we see in open source, dampens incentives and participation, yet leads to higher performance. And we're trying to understand that result as we walk through the analysis. But this is what we find in our analysis.
And there's more exploration and more experimentation in the regime of no disclosure. And what we see is that the convergence of technical paths on really high-performing solutions is what drives the ability of the intermediate disclosure regime to do better. And this is conditional, of course, on having the appropriate knowledge stocks. So this is the overall plan of what I'm gonna try to get to in our discussion, and I'll provide you with evidence from a contest and experiment that we ran to get at this. But this is the main point. And so, you know, I think in this room the view often is that disclosure is always good, and open disclosure and immediate disclosure are always good. And we've come in and said, yes, but it's conditional on a bunch of things. And that's what we'll try to get underneath. Okay, so let me just give you the highlights in terms of the dual objectives that we find in innovation systems. So you first wanna offer incentives of some kind or another for people to exert effort. And we have various different ways in which the different systems that exist in society offer these incentives. So patents allow for monopoly rights for firms or for individuals. Academia, it's all about funding, right? You get funding, you get promotion, you get awards, you get honor. There's an incentive system behind academia that's driving participation. In open source, we see use benefits. People who need the code themselves, reputation, intrinsic motives. All of these incentives are applied. And in terms of disclosure, what we see is that patents in fact are a disclosure mechanism. They make private knowledge public. The trade-off society makes is that in order for you to get the monopoly, you must disclose the knowledge that you have. Academia, of course, we are all under publish or perish. And in open source, disclosure is actually enshrined in the process.
Both from a legal perspective with the GPL, but even just in how things work, disclosure is part and parcel of the process that we see in these things. And I think it's important for us to know that that's the basic makeup of, I think, most innovation systems: incentives and disclosure with a view towards reuse. And what we argue is that, at least in the literature that I know about in management, there's a false debate between open and closed. Like people say, oh, this is open innovation, this is closed innovation, evil, good, blah, blah. What we're saying in the paper is that in fact, it's all about disclosure: when the disclosure hammer comes down, and who decides what is disclosed and when. And that's the differentiating component in this debate of open versus closed. There's a lot of other baggage that we can attach to it, but fundamentally the differentiator is when disclosure happens and then who has the rights to use those disclosures after the fact. In a whole range of institutions, final disclosure is what happens, right? So in patents, you have a complete invention; in academia, the published paper; in a prize contest, you actually have a finished technology at the end. These are all attempts at disclosure. At the same time, what we see is that intermediate disclosure involves a range of outputs, including complete outputs but a range of intermediate outputs as well. And certainly the Human Genome Project is a great example of a system which actually enshrined intermediate disclosure as part and parcel of the academic system. The first time it was ever done; it has never been replicated again in the academic sciences, and we'll talk a bit about that near the end. But the Human Genome Project basically said that as soon as you made the discoveries, within 24 hours you're gonna release them to the open web and to everybody else in the academic world for reuse. Open source, right? Everything from code fragments, advice, and tips to completed code.
All of those elements are intermediately disclosed within the system. And now in academia there are also biological repositories in which you can share organisms and mice and that kind of stuff, to get a certain set of scientific objects that can be used for further research. So both of these institutions exist. [Question:] The patent system has very strict standards for whether a paper at a conference counts as disclosure. It strikes me that there's some intermediate room here. Absolutely, yeah. So I think if you sort of think about it, what we argue is that there's a completion element to it. So you complete the invention, or you have a published paper, right? Before you allow anybody in the world to participate in using it. Versus here, there is this incompletion. It's not useful yet. There's lots of stuff in between that could be used, that could be put to use for future developments. But we're still disclosing that throughout the invention process. Does that kind of make sense? [Question:] Yeah, although I'm still wondering about, like, the mathematicians at a conference who work on a question together. Yeah, those are newer institutions that have come together, right? So those would be put under the intermediate disclosure bucket. If you're at a conference and you present a paper, right? And that paper, your name's attached to it, then that would be put in the bucket of final disclosure instead of intermediate disclosure. [Question:] Well, I think this may be what you're getting at, but in the patent system, disclosure happens in different contexts. Yes. So I thought you were talking about disqualifying disclosure that might, you know, be outside the grace period. In Europe, there is no grace period. Yes. So that serves one purpose. Then there's the 18-month publication, which is kind of a preliminary disclosure. Yes. And then there's the final issuance of the patent. Yes.
And of course, one strategy that patent applicants use is not to disclose as much as would make it practical for follow-on innovators, just to disclose what's legally sufficient. Yes, yes. So there are all these games that people play in the patent system. I think we're going to look at a more abstract level: completed invention at the end versus intermediate work towards an invention, and the sharing of knowledge in between those two. So what's interesting is that intermediate disclosure is actually historically important. A bunch of work by Allen, Meyer, and Nuvolari is now showing that a range of industrial revolution inventions were actually done in communities where people were constantly sharing knowledge about how to improve upon technologies that were essential to them. So Allen talked about this in terms of the heights that blast furnaces reached, and Meyer has shown this in terms of the development of the aeroplane. And that work is actually pretty seminal, because Meyer is showing that there was very much an open-source type of community that existed that shared knowledge about how to achieve flight and sustained flight. The Wright brothers were part of this community, and then they achieved flight and they basically went right to patents, patented the heck out of it. And then the community basically moved to Europe. And so if you think about the terms that we use now for modern aircraft, fuselage, aileron, and so forth, those are all French terms, because it was basically in France that the community reestablished itself and exchanged knowledge. And the US government had to intervene and basically stop the Wright brothers from exerting too much control, because they were basically preventing innovations from coming through. And so we see intermediate disclosure both from a modern perspective in open source but also from a historical perspective as well.
What we know from decades of research on incentive systems in innovation is that there's a trade-off between incentives and reuse. And the issue is, how do you balance ex-ante incentives? How do I encourage David to keep investing in innovation while also making sure that his inventions are widely available for others to reuse? The basic economic argument is that greater emphasis on disclosure requirements will lower incentives for an inventor, because you can't reap the fruits of your invention by yourself. Other people could potentially come in and grab it. So that's what economists of innovation have spent a lot of time thinking through, and there's great work by Scotchmer and colleagues, Jim Bessen and so on, that tries to untangle this conundrum. And Scotchmer's book, Innovation and Incentives, is actually a landmark book, because it walks through all the different economic arguments for the various types of incentives that we need for innovation and the various institutions that exist and the trade-offs between those institutions. And what we see is that in order to overcome this incentives-versus-reuse trade-off, society and our innovation systems have created compensating mechanisms to allow for some degree of incentives even in a world of disclosure. So in academia, we disclose our final products, but we are playing a game of priority and citations, right? And that's what drives us to keep letting people know about our discoveries. In open source, it's about attribution of authorship, right? Your name is attached to that piece of code. That's why you may actually participate. And so these systems exist to allow for this trade-off and to create some compensating mechanisms so that we can have workable innovation systems in our society. And we may also attract individuals like David who may have non-traditional motivations and a sharing ethos, right?
So there's heterogeneity in the population, right? The puzzle for me when I started looking at open source and contests was that they were both equally good, but the people there were very different. There's a set of people that love sharing, and a set of people who were like, I want to be number one, I don't care about sharing, I just want the rewards. And they could be equally skilled, and that's what we'll get at in terms of how the population itself sorts across these different preferences for these regimes. And of course, we know, right? Incentives matter, right? So this is Bill Gates, more than 40 years ago, arguing for incentives. He says: will quality software be written for the hobby market? What hobbyist can put three man-years into programming, finding all bugs, documenting his product, and distributing it for free? Most directly, the thing you do is theft. And: nothing would please me more than being able to hire ten programmers and deluge the hobby market with good software, right? This is what established Microsoft, and billions of dollars of revenue and profits were accrued by having this story about incentives in place, right? On the other hand, we have Linus Torvalds, right? Who says: hey, do you pine for the nice days of minix 1.1, when men were men and wrote their own device drivers? Are you without a nice project and dying to cut your teeth on an OS you can try to modify for your needs? I'm doing a free operating system, won't be big and professional. I've enjoyed, this is great, right? This is like the early days of the internet. I've enjoyed doing it, and somebody might enjoy looking at it and even modifying it for their own needs. Drop me a line, right? And arguably, this has had as much of an impact, if not more, than what Microsoft has done in the world as well, right? But two very different views about what motivates innovators and what underlies our innovation systems, okay?
And so from an academic point of view, this is kind of abstract and very interesting, but it's very practical as well. These debates happen in our society today, and we make rules and choices based on these debates as well. One thing that we do know and agree upon is that the more open something is, the more access there is, the more reuse you'll see. So there are a bunch of papers already out there showing that, in fact, as you turn the dial towards more reuse and more sharing, more reuse and more sharing is what you get. And so of course the work of Eric von Hippel, my advisor at MIT, and colleagues has shown that user innovation, a phenomenon that's so important in the economy, is based on the fact that people can easily reuse and build on the work of others. We've seen platforms emerge as the way to enable reuse of common components, right? And the platformification of the world is based on the fact that the platform owner can create components that can be rapidly reused for new applications. And academic science is again based on the finding that future citation rates are a function of reuse rights. Some great work by Fiona Murray and Scott Stern and other colleagues has shown that, in fact, as you turn the dial towards more reuse rights, you get more reuse. So this is the traditional argument, right? There's a debate between incentives and reuse, and that's what matters. But there's a separate view about innovation which almost brackets the innovation incentives argument and says, let's look at the process. In the end, innovation is about somebody sitting in front of a computer or a machine tool and trying to invent something. And that is a problem-solving process which requires search. So if you cast innovation as a search process, then different arguments can be made.
So if the first argument in the literature is about levels of effort and knowledge reuse, search can be thought of as basically trying to find a solution in an uncertain search space. You don't know where the high-value solutions are gonna be, so you need to be searching. And what innovators try to do is use novel combinations of existing knowledge to come up with valuable solutions. And if you take this search paradigm to heart, then as much as incentives and reuse matter, you also care about the direction and paths of the teams of folks that are searching out there, looking for solutions. So this points to: let's also focus in on how the artifacts are being created, the mechanisms by which innovators are actually engaging in trying to find their solutions. And there's again a ton of literature that takes this perspective, that innovation actually happens on trajectories and paths; paths get set up and people fall down these paths, and these paths are consequential. And so when you take this view into account, new sets of hypotheses can be raised about, again, how we think about disclosure in light of search. So what we have seen in the literature is that there are some very formal mathematical conceptualizations of search. Simon showed this quite a bit. And what that literature says is that multiple independent search trajectories are important for innovation outcomes. So you actually want lots of people searching independently. You don't want everybody converging on the one solution. And the broader the search trajectories, the better off you are in terms of innovation outcomes. But there's a tendency for the community of searchers to converge, to pick the early good paths and go in that direction. So we have a tension. We have a tension between broad search being important for innovation outcomes and a tendency for convergence amongst innovators.
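That tension can be made concrete with a toy hill-climbing sketch. This is my illustration under stated assumptions (a made-up one-dimensional landscape), not the paper's model: independent trajectories spread across a rugged landscape tend to find the global peak, while a community that converges on the first promising path can get stuck refining a local optimum.

```python
# Toy illustration (not the paper's model): hill climbing on a rugged
# landscape with a broad local peak (value 10 at x=20) and a taller
# global peak (value 15 at x=70).

def landscape(x):
    return max(10 - abs(x - 20) * 0.3, 15 - abs(x - 70) * 0.5, 0)

def hill_climb(start, steps=200):
    x = start
    for _ in range(steps):
        for nxt in (x - 1, x + 1):
            if 0 <= nxt <= 100 and landscape(nxt) > landscape(x):
                x = nxt
    return x, landscape(x)

starts = [10, 30, 50, 70, 90]  # independent search trajectories

# Broad search: every trajectory climbs on its own; keep the best result.
independent_best = max(hill_climb(s)[1] for s in starts)

# Convergent search: everyone refines the first promising path found.
convergent_best = hill_climb(starts[0])[1]

print(independent_best, convergent_best)  # 15.0 10.0
```

The starts left of the valley all converge on the local peak; only broad, independent search keeps enough trajectories alive to find the global one.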
And this again, you can now start thinking about in terms of disclosure. If you allow for disclosure or no disclosure, you may get one or the other of these dynamics. And the concern the literature has had has been: do you get trapped in a local optimum? Do you converge quickly on a local optimum and stay there, versus broad search, multiple search trajectories coming through, and you finding the best outcome from that? So I'm gonna skip these predictions because they're couched in academic language. We'll go past that and get right into what we did to untangle this kind of stuff. So what we did is we said, let's implement a field experiment where we take a real problem and try to solve it and see if in fact we can recreate these conditions. You know, it's very hard for us to compare across open source and open science in the field. And so what we did is we used the TopCoder platform, and we offered cash prizes, and we created a contest that enabled us to bring in these regimes as policy treatments that people were subject to in trying to solve this problem. And what we did is we worked with our medical school colleagues to actually create a solution to a genomics problem, a computational biology problem. And our ability to use this platform for this problem allowed us to get very fine-grained details about the performance of the solutions, the technical paths used, you know, individual characteristics of the solvers and so on, to get really good identification of what's really going on within these regimes. So the problem, I'll skip through this, is basically around immunogenomics, and it's a core functional problem in genomics of sequence alignment. And the challenge was to annotate 100,000 sequences on a laptop. And here's what we did. We announced this contest on the platform. We had 700 people sign up to solve the problem.
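To make the task concrete, here is a minimal, hypothetical sketch of "annotation" as finding each query sequence's closest reference gene by edit distance. The reference names below are made up, and real entries (like MegaBLAST) use far smarter indexing; a naive dynamic-programming scan like this is accurate but slow, which is exactly the speed/accuracy trade-off the contest scored.

```python
# Hypothetical sketch of the core task: "annotate" a query sequence by
# finding its closest reference gene. Naive and slow, but correct.

def edit_distance(a, b):
    # Classic Levenshtein distance via a rolling DP row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]

def annotate(query, references):
    """Return the name of the reference gene closest to the query."""
    return min(references, key=lambda name: edit_distance(query, references[name]))

refs = {"IGHV1": "ACGTACGT", "IGHV2": "ACGGTT"}  # toy reference genes
print(annotate("ACGTACGA", refs))  # IGHV1
```

Doing this pairwise scan for 100,000 queries against a full reference set is what blows up the running time; the contest entries are essentially clever ways of avoiding it.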
And what we did is we then matched them on skills and randomized them into three different treatments. I should walk you through the final disclosure treatment first. So in the final disclosure treatment, there's two weeks, $1,000 prizes at the end of each week, right? And you're just gonna be working in a contest format, right? You're just competing, and there's a prize at the end of the first week and a prize at the end of the second week, and we reward the top five. In the intermediate disclosure treatment, we basically established a wiki-like setting, right? So as soon as you submitted your code for analysis and scoring, that code was available to everybody else. They could take it and reuse it. And that was the setup in the intermediate disclosure treatment, the first week the same way, the second week the same way. And then we also created a mixed regime, where the first week you were working in darkness, so you were competing in a contest, and the second week you were switched over to a wiki-like setting where all the knowledge was available for reuse. [Question:] And people were isolated? Yeah, we worked pretty hard to isolate these folks completely, yeah. [Question:] Is this the kind of work where the submissions could be evaluated partway through? Yeah, so basically they were solving a problem, and then we were scoring it against a test suite. So as soon as they submitted the code, we could look at how good they were towards that solution. So there's an objective, automated testing suite that tells you how good you are. [Question:] So after the first week, they'd have been tested on the test pairs, and everybody would know where they stand? Exactly, they got a score. Exactly. [Question:] And they keep working on it until they get close to 100? Yeah, exactly. [Question:] So they have a test suite available?
They have a sample test suite locally, but then we have reserved a whole bunch of the test cases on the back end. [Question:] Did they know their relative rank compared to other contestants? Yes, yes. So on TopCoder in fact, and I'll show you the details, they would know both their skill rating as a competitor relative to everybody else, and also, as soon as they submitted their code for testing, and you could submit at any time during the week, and as you'll see, there are multiple submissions, as soon as you submit, you get assessed and given a score, and then you can see how well you're doing against everybody else as well. Okay. [Question:] You can keep submitting over and over again? Yes. Yeah, absolutely, yeah. And we exploit that. So this is the prize distribution. So if you think about our operationalization, we have a disclosure regime where the mechanism for disclosing is a wiki, a solution catalog: you can actually look at the rankings and ratings of people and say, whose code do I want to look at? And we're tracking who's doing that. And then we also had an incentive for disclosing. So you submit your code for testing, and then half the money that's awarded goes to citations. So if people use other people's code, they have to cite them. So we built in a bit of a citation system, and you could get money if your code was being cited highly. In the non-disclosure regime, you cannot observe the solutions; all communications are barred. And what we held constant is the problem, the assignments, the interface, the market signals, the total purse and the top-five prizes. So if you think about what we're doing, if you think about it like running a medical trial, we have a group of people that were given a treatment of disclosure and a group of people that have a non-disclosure treatment. And then we see how that works. And some that are mixed. Okay, so again, like what I showed you in the NASA problem, this actually results in pretty incredible performance overall.
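The citation incentive just described can be sketched in a few lines. The 50/50 split between rank-based prizes and citation-based payments follows the talk; the equal split among the top five and all the names here are illustrative assumptions, not the contest's exact payout rule.

```python
# Hedged sketch of the disclosure incentive: half the purse paid on rank,
# half in proportion to how often your disclosed code is cited (reused).
# The 50/50 split comes from the talk; the rest is illustrative.

def payouts(purse, ranked, citations):
    """ranked: handles ordered best-first (top five paid, equally here);
    citations: {handle: number of times their code was cited}."""
    pay = {h: 0.0 for h in set(ranked) | set(citations)}
    rank_pool, cite_pool = purse / 2, purse / 2
    winners = ranked[:5]
    for h in winners:
        pay[h] += rank_pool / len(winners)
    total_cites = sum(citations.values())
    if total_cites:
        for h, c in citations.items():
            pay[h] += cite_pool * c / total_cites
    return pay

print(payouts(1000, ["ann", "bo", "cy"], {"ann": 3, "dee": 1}))
```

The design point is that "dee" can earn money without ever placing in the rankings, purely because others reused her disclosed code; that is the compensating mechanism for giving up secrecy.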
Okay, so again, we have the 733 signups, 120 coders submitted, 654 submissions. Not only do we get lots of people submitting, we also get a diversity of approaches. So we have 89 different approaches to solving the same problem. And I'll walk you through how we determined that. Winners from Russia, France, Egypt, Belgium, and the US. And you can annotate the 100,000 sequences on David's laptop, basically, with this code in an hour. This is what was, again, shocking to me and what shows us why, as a general thing, we want to be thinking more about using crowds in our innovation efforts. So this shows you the final submissions broken into two components. One is speed: how fast can you solve the problem? And this is in log time, okay? And secondly, how accurate are you in being able to solve this problem? And 0.8 is the theoretical maximum for the data set we had. The green dots represent the final submissions from our guys. The red dot represents the NIH gold-standard solution, MegaBLAST, okay? Lots of investment over lots of years. The yellow dot represents one of our colleagues' code. He was a physician and a researcher at the medical school, and this is the effort spent by his lab to create the solution. The top left is where you want to be. So we're basically, again, able to show some incredible performance gains by going to the crowd, regardless of the institution we're actually using, and we'll unbundle that in a second. But this in itself is kind of shocking. I think what it's telling me is that across our economy, there are a whole bunch of low-performing solutions that exist that can actually be improved upon radically. And in the world of big data and lots of data and NSA and all that kind of stuff, right? We probably need to be able to speed up many of these algorithms. So this works, you know, both from a practical point of view and also from an academic point of view. So who signed up?
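The speed/accuracy plot can be read as a dominance check: a submission dominates another if it is at least as fast and at least as accurate, and strictly better on one dimension. The data below is made up; "megablast" and "lab_code" simply stand in for the two benchmarks mentioned above.

```python
# Sketch of reading the speed/accuracy scatter as a Pareto comparison.
# Numbers are illustrative, not the actual contest results.

def pareto_front(subs):
    """subs: list of (name, seconds, accuracy); lower time and higher
    accuracy are better. Returns the non-dominated submissions."""
    front = []
    for name, t, acc in subs:
        dominated = any(t2 <= t and a2 >= acc and (t2 < t or a2 > acc)
                        for _, t2, a2 in subs)
        if not dominated:
            front.append(name)
    return front

subs = [("crowd_1", 12.0, 0.79), ("crowd_2", 9.0, 0.77),
        ("megablast", 400.0, 0.72), ("lab_code", 3600.0, 0.75)]
print(pareto_front(subs))  # ['crowd_1', 'crowd_2']
```

In this toy data both benchmarks are dominated by crowd submissions, which is the shape of the result described in the talk: the frontier of the crowd's submissions sits up and to the left of the incumbent solutions.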
So we had, again, 700 people sign up from 69 different countries. TopCoder has a skills rating, which I'll show you in a second, which allows us to show the distribution of skills that people had. 44% of them were professionals, 56% students, you know, undergrads or graduate students; it's mostly a young person's game in terms of who's showing up for it. What's great about TopCoder is that it allows us to control for skill. One of the concerns you should have about anything involving a crowd is that it could simply be a skill effect: the highest-skill people show up, and that's what's driving the results. And TopCoder has an objective rating of skills based on your past performance against everybody else. And so that allows us to put that in the regressions as a control. Ma'am, you had a question. In the graphics that you just showed, do you have any information about women versus men participating? It's mostly guys. It's like 98%, 99% guys. We just don't see that many women on the computational side of the platform. And it's consistent with some economic research, and some of my own research, showing that women often choose not to participate in contest-type settings per se. There are many reasons for it, but that's most of it. You answered my question. Okay. I knew the answer ahead of time. Good, good. Of course you care about why people participate in these contests, right? What motivates people to participate? And what we find, and this is actually across both communities and contests, is that extrinsic motives matter. So cash, job market signals, community prestige, those things matter to people. We also find intrinsic motives matter: fun, enjoyment, learning, autonomy. And we also find that pro-social motives, community belonging and identity, also drive participation.
What's clear from open source settings is that these motives are heterogeneous: for different people, different things light up. And now in our work on TopCoder, we're seeing the same thing, that in contests as well, people have very heterogeneous motivations. Importantly, because we have the skill rating, in some future work we're going to show that in fact there's no correlation between skills and motives; they're orthogonal. You could be highly skilled and care about this, or you could be highly skilled and care about that. Which has deep implications for, again, how we organize our innovation systems. And remember that in contests, most people are losing, right? So there have to be other things driving them beyond just the prize money. Okay, so let's get to what's going on. So what do we see? The first big message is that there's lower participation under intermediate disclosure. Fewer people participate when they're given the treatment that all your code is going to be available to everybody else. So this is the traditional economic argument: when incentives are lower, fewer people are going to participate. That holds both in terms of the number of active participants at any given point in time, and also the cumulative number of entrants across the contests. All right. And what you also see is that the hours worked, by skill, also appear to be lower in the intermediate disclosure regime as compared to the no-disclosure regime. There's some convergence here, but when you account for it all, these are statistically significant differences across the board. So fewer people enter, and those that enter the intermediate regime also work fewer hours. Okay. And we can basically see that in our results, from an hours-worked perspective, a submissions perspective, and a participation-rate perspective.
So given the pool of people available, how many will participate? We see that these numbers are lower in the intermediate disclosure regime. Okay. Again, this was contrary to my expectation. My expectation was, if it's open, you can get somebody else's code and then reuse it, right? How awesome is that? But the incentive argument kicks in in a big way, and this is what we see. What were the legal claims on the code when you submitted? Both had, at the end, that everything was going to be open sourced and made available to Harvard. Yeah, it was the same. So for the intermediate regime, I mean, is there an incentive to just wait till the last minute and benefit from looking at what everyone else does, while they can't benefit from yours? Only one person did that at the end, and he apologized profusely for doing it. He was just testing us, testing our credibility, in doing that. So only one person did that. Because again, this is a community setting where everybody knows everybody. So if Jocko's going to come in and do that, I mean, he got a lot of grief from everybody else saying, hey, this is not the intention, right? But he showed that this could be done. Did it work for him? Yeah, he got the top prize in the second week. Because that was the rule. That was the rule; we stuck to the rules. But then he got shamed in the forums. So I would say he got like 500 bucks, but he also got the shame. Human nature kicks in. Now, what we see, though, is that there's better performance in the intermediate disclosure regime, right? And we're controlling for skill level here. What you see is that in the early days there's not that much difference, but then intermediate disclosure shoots ahead over the first week, and then in the second week it just completely runs away. Is that score speed or accuracy or both? This is a combination.
So the average score, when you account for skill and effort, is about 1.6 points higher in the intermediate disclosure regime, which basically makes the difference between winning and losing in the end. And of course, as you'd expect, the vast majority of people are peeking, right? So what you see here is that in the mixed regime, nobody's peeking, and then, boom, everybody participates. And then here, between the two weeks, lots of people are peeking at other people's code. Once it's available, you're going to look at it and do things like that. Peek means I can open up your code and look at what you're doing? Yeah, yeah, sorry, that's exactly it. So of course, the question you have is, what explains these results? So we said, okay, we have the reuse-versus-incentives story, but is there something else going on? So we went through every single code submission to actually discern solution paths. We did this working with a collaborator, Rula, who's now doing his postdoc at the School of Public Health. He went through a whole bunch of submissions and said there are 10 canonical approaches being used in all these submissions. We took those technical approaches, and then we had three other people, PhDs from Germany and from China, go through every single code submission and tag them for these canonical approaches. And here are the different types of approaches: Hamming, hashing, Levenshtein, you know, a whole bunch of different things. These were basically techniques to push on the two dimensions, right? Get faster and also get more accurate. So that's what these were all about. And you can think of the solutions as combinations of the techniques, right? So there are 10 different techniques, and we get to see which of those techniques they implemented in their code. So I'm glad I didn't do it.
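For readers unfamiliar with these string-distance techniques, here is a minimal sketch of two of the canonical approaches named above, Hamming and Levenshtein distance. These are textbook versions for illustration, not any contestant's actual code; together they illustrate the speed/accuracy trade-off the contestants were navigating.

```python
def hamming(a, b):
    """Count positions that differ; defined only for equal-length strings.
    Very fast (one pass) but cannot handle insertions or deletions."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions.
    Handles gaps in the sequences, at O(len(a) * len(b)) cost."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]
```

Hamming buys speed at the cost of ignoring alignment gaps; Levenshtein buys accuracy at quadratic cost, which is why combining and refining such techniques moves a submission along both axes.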
You know, we hired people to do it because it was grueling to go through every single submission, look at what they'd done, understand it, and then classify it. And what we do find is that combinations are actually consequential, okay? Paths are consequential, because we can now discern different paths being used by different competitors. And what we see is that there is a direct correlation between the number of paths used and performance, okay? So as you implement more paths, as you implement more techniques, you get better solutions, and there's a difference between what the various paths achieve. And this is our search argument, right? When you're comparing no disclosure versus disclosure, you see these up-and-down, crazy search patterns used by people when they're working blindly, whereas what you see in the intermediate disclosure regime is this nice smooth climb where the top performers lead the way. And this is what is actually important for us. So one contribution for us is to actually be able to show that these pathways are materially different. And what we discovered is that the intermediate disclosure regime used 30% fewer novel techniques and novel combinations as compared to the no-intermediate-disclosure regime. So we looked at the Herfindahl index of submissions across approaches, and we see these metrics coming through. There's just much more bunching across fewer solution paths than what happens in the no-intermediate-disclosure regime. And what we see is that the final disclosure folks, the people without intermediate disclosure, are searching broadly, but they're searching in the lower tail of the performance distribution, okay? They're just out there looking at worse solutions and trying to improve them, versus these guys at the top end who do much better.
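The concentration measure mentioned above can be sketched as follows. The approach labels here are hypothetical stand-ins, not the actual tagged submissions from the study.

```python
from collections import Counter

def herfindahl(labels):
    """Herfindahl index of concentration: the sum of squared shares.
    1.0 means every submission uses the same approach (maximal
    bunching); 1/k means submissions spread evenly over k approaches."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())

# All submissions bunched on one approach -> 1.0 (intermediate
# disclosure pattern); spread evenly over four -> 0.25 (broad search).
bunched = herfindahl(["hash", "hash", "hash", "hash"])
spread = herfindahl(["hamming", "hash", "levenshtein", "trie"])
```

A higher index for the intermediate disclosure regime is exactly the "bunching across fewer solution paths" claim.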
So let me just walk you through the summary of results, and then we'll have some more time for conversation. When we think about incentives versus reuse, we see that in intermediate disclosure there are fewer participants and they exert less effort, yet there's higher marginal and average performance; the vast majority of people will peek at and reuse solutions; and search behavior is such that there's more exploration and experimentation in the final disclosure, closed regime, and convergence and coordination on the best paths in intermediate disclosure, okay? So what this means for us is that we confirm the view that increased reuse comes at the cost of incentives, all right? The overall stock of knowledge being created is lower, right? From a pure quantity point of view, the quality is higher, but the quantity is lower. And we see this convergence and coordination of behavior in the intermediate disclosure regime, and it relies on that stock being of high quality. And the potential concern that you would have in any kind of a system that implements open disclosure or intermediate disclosure is: are you going to get stuck on a local optimum? People are going to basically find themselves on a local optimum and improve upon that instead of searching broadly. And so path dependence, we think, is a risk in open systems. Now, if you think about Linux, Linux already had a map from UNIX that it could basically build on. And so there was actually no worry about lower-quality paths being discovered, because we already had the paths there. And a good way to think about this graphically is to have this view of a solution landscape where you might have a global optimum, and once somebody finds it, everyone can ride right up it. But if it's in a more complex neighborhood, there could be a risk of somebody being stuck up here, or somebody stuck over there, in an open setting, versus with much more independent search you may be able to actually uncover the entire landscape.
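The landscape picture can be sketched with a toy simulation: a made-up one-dimensional fitness function with a low local peak and a higher global peak, plus a greedy searcher that can get stuck. This only illustrates the lock-in argument; it does not reproduce anything from the study.

```python
def fitness(x):
    """Toy 1-D solution landscape: a low local peak at x=20 and a
    higher global peak at x=80 (the 'complex neighborhood')."""
    return max(10 - abs(x - 20), 25 - abs(x - 80), 0)

def hill_climb(x):
    """Greedy local search: move to a better neighbor until stuck.
    Ties keep the current point, so plateaus terminate."""
    while True:
        best = max((x, x - 1, x + 1), key=fitness)
        if best == x:
            return x
        x = best

# A searcher who starts near the local peak converges to it and stops:
stuck = hill_climb(18)   # reaches 20, fitness 10

# Many independent searchers with diverse starting points, as in the
# closed regime, cover the landscape and one of them finds the global peak:
best = max((hill_climb(s) for s in range(0, 100, 7)), key=fitness)  # 80
```

If everyone instead coordinates on the first good path found (as under intermediate disclosure), and that path climbs the peak at 20, the whole crowd can polish a local optimum, which is exactly the path-dependence risk described above.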
And so there are comparative advantages to the different systems. Intermediate disclosure works where you have a broad stock of knowledge already that can be reused and made better, and potentially the diversity of participation may overcome local-optimum lock-in. And there may be compensating incentives, like we discussed, and benefits to participation. Final disclosure is useful where broad-based search is needed, where you want lots of people working independently on a problem. You may emphasize implicit incentives as well, in terms of fame and reputation. And of course, I'm at Harvard Business School, so I must give you a two-by-two; that's like in my contract with the dean. There's, I don't know, a missing quadrant here. So anyway, what you can think about is that in the economy, institutions take a view on the incentives-versus-reuse trade-off. Some institutions emphasize disclosure; some emphasize property rights and exclusion. At the same time, you can think about whether you disclose final inventions or intermediate inventions. So our industrial innovation system, patents and markets, is based on final disclosure and property rights. Open academic science and public-information contests are about final-invention disclosure, but with an emphasis on disclosure. And the Human Genome Project, open source, and open data are all about intermediate disclosure. So we can start to think about where our various systems of innovation fit in, and then think about whether that's the optimal way given the problem we're trying to solve. And I've always made the argument that even firms actually use a lot of open, intermediate-disclosure-type regimes. We think of Apple as a bastion of closedness and Steve Jobs as the guy who controlled everything, but they're one of the biggest participants in open source software communities, and your iPhones all have tons of open source code in them.
So firms have figured out ways to play both sides, and when to participate in one regime versus the other. I think we have to catch up as academics to understand the costs and benefits of those. Okay, so questions, thoughts? Thanks. Yes. Just to clarify, did intermediate disclosure just get a better average solution, or was the best overall solution there too? The best also came from there as well. Yeah. How much of a difference was there between the best solution in the final disclosure category and the other two? The final, I believe, was, I think, in the complete rank order, like six or seven. The top ones, no surprise, were all open disclosure, because they all had the same base to work from, and they made slight tweaks to get one, two, three, four, five. So the disincentives to participation were trumped by the higher quality? No, but remember, there were still disincentives to participate, because not that many people participated. From the pool of people, there were just fewer participating, but they happened to lock onto a good solution, and they could all then collectively get there. Okay. Very quickly. You divided three groups. You only talked about two of them. How about the one in the middle, where it switched halfway? Yeah, so for mixed, the paper has details about that. We basically find the same effects. In the closed time, there's lots of entry, lots of exploration, lots of effort. Then in the open time, lots of people drop out, but then they take the best of what exists and ratchet it up. So the rank ordering was full disclosure, then mixed, and then no disclosure. There doesn't appear to be any advantage to deliberately having that mixed regime? No, I mean, I think, not in this problem, right?
So I mean, I think what we need is like a hundred of these problems being run in these two ways for us to be able to get at that. It's hard for us to run them repeatedly; that's the bottleneck for us in being able to see that. I mentioned earlier that there were two types of people, ones that really responded to competition and others that didn't. How did you control for that? In this case, we did not. In our other research, we actually have been able to elicit preferences as to people's tendencies to prefer working in a cooperative regime versus a competitive regime. And we see in other analyses that that's actually consequential. Here, people couldn't select their preference; they were just given that treatment, and based on that, off they went. But I think that, for me, explains why we see such heterogeneity between contests and collaborations, and in how firms organize themselves as well, right? Some firms are very much team-based, with lots of collaboration. Other firms are very much: individual effort matters, and that's what we pay you for. And I think we see in the economy that some people gravitate towards the collaborative systems and some people gravitate towards the competitive systems. These contest platforms are very specific, and they're artificial in the sense that they have artificially constructed incentives of various sorts, as well as some natural intrinsic ones. Can you talk a bit about your... So I wouldn't call them artificially constructed. I mean, they offer incentives. Okay. They offer a salary, you know, things like that. So there's a system of incentives that's peculiar to that site, so you can't automatically generalize to the real world. How confident are you that the results that you're seeing in those environments apply to things like open source, especially given the very troubling 98% male milieu?
Yeah, so I mean, open source is also close to 98% male. So there isn't that much difference. Can we pretend otherwise? We can't. Wikipedia? Again, same thing. So I mean, I think the question of generalization is actually a very good question. Can we take this semi-natural lab, and does it generalize to a whole bunch of other settings? Of course, my answer is yes, because I'm an academic and everything generalizes, but the reality is that even this platform is selecting on a very elite set of people that are used to competing and are used to... So one criticism of this work would be: well, look, you have attracted a bunch of people that are competitors, and all of a sudden you're making them disclose and share; they're not used to sharing, and that's why people aren't participating as much. That could be a genuine criticism of our work, and I accept that. That's just a limitation of what we can do. Although I do think that rank-order-based outcomes, even apart from contest platforms, are endemic throughout society. Look at SAT testing, law school testing, league tables for law schools, league tables for business schools, CEO-level contests, right? Rank ordering is endemic in our society; it's used in a variety of settings, and so in that sense, I think we could argue that there is generalizability here. And even in Wikipedia and open source, there is an implicit rank ordering going on. There are status battles going on, and there's status-based ranking. In fact, Wikipedia thrives on giving you status, more or less status, right? There's a whole bureaucracy set up to give you more or less status, and it creates a rank ordering based on that. I don't know if that answers your question, but hopefully it's a reasonable answer.
Building on David's question, I'm curious how you would comment on the relationship between the findings of the study and the effort to generalize or abstract to a system in which the outputs of innovation are not units of code or, you know, some tangible output, but education systems, or ways that municipal governments operate, or something larger like that. Yes. Just to get a bit of tractability, I've always studied coders, because I can look at their objects, I can evaluate them, and so on. One piece of good news is that, as Andreessen says, software is eating the world, so more and more of the world is becoming like software programming, which is great. So we can test, you know, it's great for social scientists. I do think that, even without this ability to look at these fine-grained measures, there is generalizability here in terms of the way work is organized and the way you organize innovation. Even in a municipal setting or a government setting, people have to search for a solution, right? People have to engage in some kind of problem-solving effort, even though the outcome may not be rendered as code. And that's where we think our contribution is: if you think of it as a search process, then these findings apply. How broadly are you searching, and what are the inputs that go into your search process; these findings speak to that. Yes. The quality and skills of the coders: you seem to think that you could manage to scale them, or you recognized that they all had similar skills. No, not similar skills. Or similar abilities. Because couldn't you expect that that was another factor, that one group was particularly better skilled than another? I think that's a good question. Let me just show you, where's my graphic? The TopCoder platform is built on creating ratings for all the participants, a skill-based rating based on a specific type of task.
You can't see it that clearly on this projector, but they have algorithm tasks and development tasks, architecture tasks, and so forth. And as you participate in more and more contests, you get a skill rating compared to other people. And they actually create a whole distribution: where are you in the distribution of skills? So what we did in our analysis is to take that skill rating and put it in as a control. That's a control in our regressions. So we account for the skill of the people, and then we see if there's a treatment effect from our treatments. The other part of my question was related to more of a cultural or scientific or educational basis, because you identified a large group of Indians, Russians, Americans, et cetera. And we know, if we look at the PISA scores, that the results of the educational systems and the studies that are done are quite different. But maybe this is a particular group that doesn't fall into that. Yeah, so I mean, these are weird people, in the sense of half a million people that like to compete and like to have a public disclosure of their skill rating. So they're special. And they like coding. And they like coding, exactly, exactly. So they are special in those ways, for sure. In this one, on average, people spent about 20 hours. Over the two weeks? Over the two weeks, yeah. Which is about similar to what people spend on open source. I'm interested in the guy who cheated, basically, at the last minute. No, he didn't cheat. He followed the rules. He violated the spirit, the norms. Right. But, I mean, in real life, he wouldn't have had to apologize. He would have just won. Yeah. So how does that translate? So, no. What we have in many settings is the work, again, of Eric von Hippel, and there's a very nice new paper by Nik Franke in Vienna looking at cheating in crowds. All right.
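The "skill rating as a control" idea can be sketched as a regression on synthetic data. None of these numbers come from the study; the true effect of 1.6 is chosen only to echo the estimate quoted earlier, and the rating scale is an assumption.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in data: a skill rating (like TopCoder's), a random
# treatment assignment (intermediate disclosure = 1), and an outcome
# score that depends on both. All numbers here are invented.
n = 500
skill = rng.normal(1500, 300, n)      # rating points
treated = rng.integers(0, 2, n)       # disclosure-treatment dummy
score = 0.01 * skill + 1.6 * treated + rng.normal(0, 1, n)

# OLS of score on [intercept, treatment, skill]: the coefficient on
# `treated` estimates the treatment effect holding skill constant,
# so a skill imbalance between groups cannot masquerade as an effect.
X = np.column_stack([np.ones(n), treated, skill])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
treatment_effect = beta[1]
```

Because treatment is randomly assigned and skill enters as a control, the recovered coefficient lands near the true 1.6 regardless of how skill is distributed across groups, which is the point of putting the rating in the regression.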
So Eric von Hippel's work has shown that there are informal, norms-based mechanisms to control cheaters, right? He looks at French chefs, and the fact that you don't see much rampant recipe copying amongst French chefs. There's no copyright in recipes, right? You can take, you know, Dave's famous souffle and then call it your own, right? But in fact, that doesn't happen. So there are norms-based rules that communities develop. And Nik's work, actually, what he did was brilliant. I don't know how they got it past the IRB, but they went on Threadless, and they would take somebody else's design, a complete copy, and resubmit it, right? And what they found is that in the crowd, there are expectations of fairness and honesty in participation that come through, which in fact prevents this from happening rampantly and controls it. Even in business schools, yes, even in business schools. So we, in fact, create these norms about fairness, for which there's no explicit regulation. So I think what happens in society is that communities form, you have reputations, right? And those reputations and those norms matter, and people behave within these norms, yes. So right now in Congress, there are proposed reforms of both the patent system and the copyright system. And especially with copyright, there is a call for more evidence-based approaches on what actually encourages innovation and incentives and so forth. So if Congress called you down and asked you to testify about your research and its implications for the patent and copyright system, how would you respond? I'm glad they're not calling me. This reminds me of one of my field exam questions when I was doing my doctorate. I mean, of course, you would give the academic answer: it depends.
I mean, there's a great artist on YouTube called Kutiman who creates these amazing videos from people playing different instruments; he just grabs the different instruments and creates compositions from them. They're beautiful, both lyrically and sonically, I mean, it's just amazing. And imagine trying to clear the rights on that. I think in each song there might be like 50 or 80 people that he has spliced together. And they're works of art, they truly are. And we know that our culture works on creative recombination of existing materials. You need a stock of knowledge to recombine. And I think it's a question of balancing the creativity that can come from these approaches versus strict limitations that dampen the reuse. So I don't actually have a good answer for you. I think the view, my law friends would say, is that Congress has gone too far in protecting incentives and not far enough in encouraging creative reuse. So patent terms are too long. Copyright is life plus 70 years, something crazy like that, which I think is probably too far. So there's probably some way for us to dial that back. Certainly, given our culture now of access and reuse and recombination, the tools for recombination have exploded. And so we probably don't want to limit that, but we still want to create incentives for original works. Brian may have some perspective on this, but I certainly tend to shy away from very specific policy debates and look at my p-values instead. Yeah, this is a private ordering situation, so you can't extrapolate from this the public policy for default open situations very easily. But it does suggest that, at least in this context, there's a lot of importance to cumulativity in innovation. Yes. What I would argue, though, is most interesting, and part of our motivation, was: why weren't the Human Genome Project's Bermuda Rules broadly adopted in academia? It's a big puzzle, right?
Because you had the NIH and the Wellcome Trust and a whole lot of other public funders say: if you want funding from us, you will disclose your discoveries right away, within 24 hours. Because we had a race against Celera, in many ways. And many people point to the Bermuda Rules as a tipping point towards the public effort succeeding so wildly and so greatly, and now we're living off the fruits of that. But that ordering, those rules, have not been replicated. They have not been embraced in academia. We call ourselves open science, but we're not really open science. We don't have a lot of sharing, because our incentives aren't set up that way. So if there's a role, I mean, I think patent systems and copyright are very difficult because the interests are so entrenched. But certainly in academia, we own the purse strings. We decide the rules, and there should potentially be conversations about this; there are at least open-access debates now on final publications, but could we think about other ways to go at that? I don't know what the details are for the brain project that Obama is funding, but that could be an interesting place to experiment again. And with that sort of thing, once you know how to do it, isn't that just basically number crunching? No, no, no, the discoveries were not just number crunching. There was actual work being done, because people would otherwise reserve the discoveries for papers, right? I found this correlation with this disease, or this mechanism for this particular phenotype; I can write a paper about that. That's what they basically gave up: keeping that private until the paper got published. Well, I'm not an expert in this, but it sounds like this goes to the court decision that gene fragments were not patentable. Much later, much later, right? But I can still publish a paper on it as an academic. Yeah, sure.
But you're creating a new kind of value by publishing the paper and making the correlation. That's right, but prior to that, people would not even publish the correlations. They would wait until the paper was done and then make that available. So I'm actually really curious about your use of the Human Genome Project as an example. It was obviously an unprecedented collaboration across academic institutions, but that's also kind of precisely what enabled Craig Venter to splinter off, start a private company with most of the work done, and then profit. So, yeah, I'm definitely seeing why openness and collaboration are good. How would you prevent people from taking publicly funded innovation and turning it into private profit in that way? Yeah, I mean, I think that's all about contracts, right? It's about the contracts you write and how you actually control for defection. We have this in Linux right now. We have this in open source, right? There's the BSD world, which is like: we want you to share, but if you want to privatize it, you can. And the GPL says: no, in fact, we care about you sharing everything; you can sell it, but you still need to make the code available. And look at what Apple does, right? Our iPhones have tons of open source in them. They've made a bargain saying: we want this code in here, and we'll contribute back to the communities, but we can wrap it up in ways that create greater value. And certainly, I've always argued about this: my mom uses TiVo, right? TiVo has lots of open source in it, but I don't want my mom on the Linux kernel mailing list, right? She should not be participating in that. She'll just add a lot of noise, all right? She shouldn't be emailing Linus saying, this box isn't working that well, right?
So there's value to what TiVo did with Linux, or what Red Hat did with Linux: wrap it up in ways that make it broadly accessible, but also contribute back to the public good. So I mean, I think that's it: you want a trade-off that says there is a public-good contribution even though you can privatize some components and resell those components. But I think it's about the contracts. That should be, I'm afraid, our last question. Great, thanks very much, it was great to have you. Thank you.