Okay, great. So again, my name is Robert Haas and I'm here to give a short talk on moving PostgreSQL forward. And we've actually heard quite a bit of this from various speakers over the last couple of days. Simon in particular mentioned it during the panel discussion and several of the other speakers already mentioned it, but I just want to give my own take on it a little bit here. I think most people would agree that PostgreSQL features are usually high quality. They are also often delivered incrementally across multiple releases. So we get a little bit in one release and then in the next release we get a little more and then in the release after that we get a little bit more. Sometimes they are slow to arrive and I've been thinking a lot about why that happens and what we could do about it. And thinking about why it happens, I think one of the things that we're always very concerned about is bugs. I feel like I'm missing a slide here. I have this slide out of order, I think. So we kind of have a philosophy that getting it right is more important than getting it done sooner, but obviously we'd like to have both things. And so when a feature is buggy, there are a variety of bad consequences that we are, I think, rightly concerned about. So sort of the mildest thing that happens when a feature isn't very good is that a user might have a frustrating experience using PostgreSQL. A little worse, they might have bad performance. Really bad, they might get wrong answers to their queries or their server might crash. That's pretty bad, but that's not the worst thing. The worst thing is if we actually lose your data. And all of those are possible consequences of bugs for the user. But there are also consequences for the project. Bugs can result in development resources being redirected from new development to stabilization. And obviously in severe cases it could even result in damage to the project's reputation for reliability.
Now I do think that we are getting better at this. PostgreSQL 10 in particular has an enormous number of features that are coming. Partitioning, native partitioning is coming. Logical replication is a huge feature. Parallel query is getting better. Every release has some good performance improvements. The ones in 10 are very significant. And we're going to have lots of other things like durable hash indexes and aggregate pushdown and all kinds of stuff that's very exciting. And we've seen a number of really hard problems get cracked over the last couple of releases. But it would still be nice to go even faster. And the reason is pretty simple. Other databases are continuing to innovate at a rapid pace. People are coming up with completely new database systems all the time. And existing systems, even ones which have been around for 10 or 20 or 30 or 40 years, are continuing to do new things. And some of the features that we've recently added, such as declarative partitioning and logical replication and parallel query, have been present in some commercial systems for decades. So of course our goal should be to go as fast as we can to add features that other people have that we don't have that are good features or to come up with completely new features that nobody else has, but we want to have in PostgreSQL because they're good features. But we want to do that without compromising the quality of the project. What keeps us from doing that? I came up with this list of six things. Other people might have their own ideas about what should be on this list. But these are the things that I came up with for things that kind of limit the rate at which we can make progress. So one of them is obviously, if somebody writes a patch and it's not a very good patch, either as judged by the author of the patch or the reviewers or the potential committer, then that patch is not going to be able to move forward at that time. 
Political opposition can be a factor, particularly from people such as committers who are well established in the project. One thing that limits us is it's hard to do a big project because most patches have one or two primary authors. And there's a limit to how large a patch one person can write and successfully bring off, or even two people. Our development community, as other people have said, is growing, but it's still not all that big. And one of the consequences of that is that there is a shortage of committer bandwidth. And lastly, sometimes the patch author has a good idea and a pretty good patch, and they give up for some reason, and we can't always put our finger on exactly what happened there. So before I talk a little bit more about those reasons, I want to just show you a few statistics that I put together. Here's what I did. I went through the commit log for 2016, so it included the end of one release cycle and the beginning of the next. And I made some attempts to eliminate certain kinds of mechanical changes, whitespace-only changes, and file renames, and so on, so that I could record the number of new lines of code that were introduced into PostgreSQL in each commit. And I went through the commit log also and manually identified the primary author of each one of those commits. I apologize to those people who are frequent secondary patch authors, but just to keep the statistics simple, this seemed to be the thing to do. And then I pulled the results into a PostgreSQL database and started having fun with window functions. So here are some things that I found. In 2016, there were 141 people who contributed at least one line of new code to PostgreSQL. However, the work was pretty concentrated. 90% of new lines of code were written by 37 people. 66% of the lines of new code were written by 14 people.
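As a rough illustration of the kind of calculation behind concentration figures like these, here is a minimal Python sketch. It is not the speaker's actual query, which was done in PostgreSQL with window functions; it only mirrors the idea of a running total over authors ordered by lines contributed. The function name `authors_needed` and the toy data are hypothetical and are not the 2016 numbers from the talk.

```python
from itertools import accumulate


def authors_needed(lines_by_author, share_pct):
    """Return the smallest number of top authors whose combined new lines
    reach at least share_pct percent of all new lines."""
    # Rank authors from most to fewest new lines.
    counts = sorted(lines_by_author.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0
    # Walk a running total, like SUM(...) OVER (ORDER BY lines DESC) in SQL.
    for rank, running in enumerate(accumulate(counts), start=1):
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= share_pct * total:
            return rank
    return len(counts)


# Hypothetical toy data, purely for illustration.
toy = {"alice": 500, "bob": 300, "carol": 150, "dave": 30, "erin": 20}

print(authors_needed(toy, 66))  # top authors needed to cover 66% of lines
print(authors_needed(toy, 90))  # top authors needed to cover 90% of lines
```

The same function could be applied to lines committed per committer to get figures like the committer statistics, assuming the per-commit data has already been aggregated by person.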
There were 18 committers who committed at least one patch by a non-committer, but 90% of the new lines of non-committer code were committed by just six committers. And 66% of new lines of non-committer code were committed by just two committers. So now you're all thinking, who are those people? So here is the list of primary patch authors. I heard a few people laugh there. There does seem to be an outlier in this data set. I'm not sure if you can spot it. OK, it's not that subtle. Yeah, so obviously, Tom stands out. He's in a category by himself. So those statistics are what they are. The asterisks mark the people who are not committers. So you can see that there is some interleaving of committers and non-committers; there are quite a number of people who managed to get very significant chunks of code committed despite not being committers themselves. And here are the statistics for committing other people's patches. And you can see that once again, there is a certain concentration among a relatively small number of people. So I think what these tell us is that the development community, although it may have grown, is still pretty small. In theory, anybody can submit a patch, but in practice, a few dozen people do nearly all of the work. Tom Lane is amazing. And of course, one of the consequences of all of this is that if you can't get one of the small number of active committers interested in your patch, getting it committed is hard. And there are some examples of this. I gave a version of this talk a month ago, and this slide had five things on it. But now it only has three things on it, which is great, because we've caught up a little bit. And one of those things has actually been partly committed now. So people get frustrated when it takes a long time to get their patch committed, which is completely understandable. And it discourages them from making future contributions. So we'd like to fix that.
These are just a couple of examples of things that have dragged on a little bit longer than would have been good. There are fewer examples of this now, though, and that's a good thing. So you could say that we just need more committers. And I do think that's a part of the problem, but I don't think it's the whole problem. Committing a patch is pretty easy. Just patch -p1, git commit -a, git push. You're done. So the hard part is actually reviewing a patch and determining whether the design is good and the patch is at the level of quality that we'd like to have. And the tricky part of that is that in order to have more committers, we need more people who are experts at the code. But becoming an expert is hard if you can't even get feedback on your own patches. So how do we fix that? It's a bit of a chicken and egg problem. And I think that it boils down to that one in red there. All of these issues, in some sense, really trace back to the development community not being terribly big. For example, insufficient patch quality: if there are not that many people who are experts to write the patches, then anybody who's not an expert is going to have trouble getting started. It's hard to assemble a big team to work on a patch if you've only got a small community. So that contributes to patches with small numbers of primary authors. I think in order to fix this, we need to get more people involved in PostgreSQL development. As much as we've already done to get the development community to where it is today, we need a lot more people to be involved in this. And maybe first and foremost by coming along and testing the patches that other people are already writing, looking at those patches, understanding what they do, and putting in the time and effort to figure out whether a patch seems good to you. There are plenty of patches out there, and there are people in the community who very rarely write a line of code themselves.
There's a number of people in the community, Erik Rijkers (I don't know how to pronounce his last name), Jeff Janes, and some others, who do enormous work in reviewing and testing those patches and figuring out which ones are good and which ones are bad. And we need more people like that. And if the people who do that work also happen to know C or learn C, then we will actually start to get people who are more familiar with the source base, who can join the pool of people who are trying to push this development effort forward. Part of the obligation toward growing this development community also falls on the experienced contributors, me included, who need to write our patch reviews with the goal of helping less experienced contributors to learn so that they gain experience, and to encourage them to persist and not browbeat them till they give up, which is something that I'm sure everybody who's a longtime hacker in the PostgreSQL community has done accidentally at least once, including me. Another thing that needs to happen is that companies that care about PostgreSQL need to pay their employees to help with patch review and commit, also patch authorship, but in particular patch review and patch commit. There are just not enough different companies who are contributing to that effort. And there are people who have a lot of money and could make huge contributions to this area. And maybe the community hasn't done a good enough job figuring out exactly how to integrate everybody into this effort. But if you want to see PostgreSQL move forward faster, if you want to see it get new features, if you want to see it get to someplace, either you personally or you on behalf of your company, then we need your help. Another thing we can do is get better at working in teams. I don't think this is easy to do. We're working over email. People are in very different geographic areas and sometimes working for different companies.
But I do think that that would enable us to tackle larger projects more quickly. And not that this is a session where we really have the time for an interactive sort of thing, but you tell me. I'm really interested to hear other people's ideas on how we can grow the development community and therefore, I think, accelerate the pace of progress. So just to sum up, what I'm hoping for in the future of PostgreSQL is more developers, more reviewers, more committers, more projects that are done by multiple people, ideally across multiple companies. And then PostgreSQL takes over the world.