The next few talks will look at the sustainability of open source through a variety of lenses: sustainability of contributions, environmental sustainability, and economic sustainability. The first talk, focusing on sustainability of open-source contributions, will dive into a model that Drupal is using to track the source of its contributions. By gaining better insight into who is contributing and why, Drupal and other open-source communities can better balance the incentives between contributors and users of open-source software. If you're interested in the dynamics around open-source software and ensuring a successful future for open source, you'll want to watch this talk.
Everyone, welcome to our session at this year's GitLab Commit Virtual Summit. Today, my co-speaker Matthew and I are happy to speak to you about evidence-based open-source sustainability. So this means we're going to be talking about the challenges of sustaining an open-source project and how you can use some data-gathering methods to make that process easier. So we'll be using the Drupal project as a case study, as that's an open-source project we're both very familiar with, which has recently moved over to using GitLab as our contribution platform. But first, some introductions. Matthew, why don't you go ahead and introduce yourself first.
Hi, I am Matthew Tift. I am a lead engineer at Lullabot, and I have been involved with the Drupal project since about 2010, and I am very interested in all things Drupal.
Awesome. I'm Tim Lehnen. I'm the CTO of the Drupal Association, the 501(c)(3) non-profit that helps foster the Drupal community and the Drupal project. I'm also hestenet on Drupal.org and other places around the net, such as GitLab. And of course, I'm naturally very interested in the sustainability of open source, and of the Drupal project in particular, but also in how we can take what we've learned in Drupal and maybe expand it to other projects.
So speaking of that effort and what we're going to talk about today, we're going to go through a few different elements of this problem space in order to sort of set the stage for the conversation. And then we are proposing a change to GitLab itself to provide some tools that will allow this evidence-based sustainability approach that we're talking about to be used by other projects. I think it'll be a really cool opportunity. So we'll start with these challenges in open source, and kind of explain that problem space, as I said. Then we'll talk about how having a systematic contribution recognition system can help. So the Drupal community will be our case study for that, as that's what we're most familiar with. But we think the principles will apply to many other open-source communities in this space. Then we'll talk about why we want to bring this system to GitLab and how we think that could be done. We'll point to an issue on gitlab.com that's already open that any of you can go chime in on and participate in. And we'll talk about how other open-source projects would benefit from adopting this model. And importantly, how we can standardize on this model so that hopefully this data is comparable from project to project and therefore becomes useful for understanding the health of open-source communities compared one to the other. So we're not all operating in our own silos with our own data and metrics. So to kick that off, the first challenge is understanding who actually builds an open-source project. And as I ask this question, you are probably out there thinking, well, we know the answer to that. That's kind of obvious. I know how to read a Git commit log. And that's true. Existing systems do offer us a number of tools that help us understand our communities and understand sort of the code side of how an open-source project works. So an issue tracker will describe changes, and the commit message, if it's written well, will hopefully give a good reason for that change.
Our version control system will say what code was changed, when it was changed, and who it was that did it, and give us a whole Git blame log of everything that happened in our project. And if we look at GitLab, we see commit messages in the UI as well. We see things like this example from the GTK project of a commit message and some basic information. And from that, we can gather a lot of useful data about the GTK project. We know what changes are going on. We can see who's doing them. We can see how many different people are involved over the course of the commit history of the project and get a sense of community health that way. But it doesn't tell us everything. There's some data that's definitely missing. So what is missing? So we know that this person, Marco, has contributed to the GTK project. And we can see a bunch of detail. In fact, if you see in the image here, I've hovered over their name and username. And I get a little bit more info. I get their IRC nick. I get that they're working for Canonical, which is kind of helpful, and where they're located, in Italy, but apparently moving around. And of course, I get more detail on the actual change they made. So there's information there, but it brings me to ask some questions. So if I'm viewing this, trying to understand the GTK project holistically, what supports it, where it comes from, I might ask myself, well, did Marco commit this change as part of his work for Canonical? Was it sponsored by Canonical? Did he do it as part of his day job? Or did he do it as a volunteer on nights and weekends, and he just happens to also work for Canonical, right? I might also wonder, were there any other people involved in this change? Like, were there non-coders who contributed to this? Was there a project manager for GTK who said, hey, this is a priority issue and we want this one to get done before another one? Or was it a case where Marco was scratching a particular itch with a bug that he found or a feature that he needed, right?
Were there people involved in writing documentation for this change? Or were there even other coders, other developers who might have done code review on this change, but didn't wind up in a commit log or an authors list for the merge request or things like that? Those are all things where I can kind of dig around a little bit and maybe look for references or maybe go back and look at the issue for this commit and see if there's some documentation of that, but it's not really part of the commit log. And then finally, I might wonder if the company Canonical is the primary sustaining organization for GTK. I mean, who knows? I might guess, if I see a lot of commits from this person who works at Canonical, that, hey, they have a vested interest in supporting the GTK project, but I'd be guessing to say that, hey, they've decided to sponsor that project and really carry it forward in general. So, you know, there's good data in a commit log for sure, but there's also stuff that's definitely missing. And that lack of data is certainly one challenge in understanding how to sustain open source, but challenge two is specifically around the incentives for sustaining that project, both on an individual and on an organizational or even corporate level. So let's break down the incentives behind open-source contribution and sustainable contribution in general. So this is borrowed from a blog post by the Drupal founder, Dries Buytaert. The link is provided in the slide, and the post talks about balancing the makers and takers in open source. And first we have to understand what kind of a good open source is. What is it that we're creating? And we can understand any good that's been created on this spectrum of whether it's an excludable or a non-excludable good and whether it's rivalrous or non-rivalrous. So excludable goods can be deliberately restricted to particular groups of people, or they can be held privately, or they're intellectual property, all these sorts of things.
And so we already know from the open-source realm that we're not gonna be in that column of excludable goods. But are these goods rivalrous or non-rivalrous? A rivalrous good is a good that's out there and anyone can get it, but it's not an infinite resource, and someone using this good means that someone else might miss out. Whereas with a sort of true public good, anyone can use it. It's kind of unlimited and everyone is supported by it. No one loses just because someone wins by using it, right? And open source itself really is a public good, but is that totally the case from a business perspective, for someone trying to support a public good? So I look at this quote from that blog post where Dries said that for end users, open-source projects are public goods. The shared resource is the software; anyone can download and use that software. The whole world benefits because it's available to anyone, and it's not like there are limited copies. We're copying around digital bits and everybody's building something on top of that, and that's awesome. So we can think of open source in that sense as a public good from our previous slide. But for open-source companies, the open-source projects that they're taking on are common goods, because in many cases the shared resource is a potential customer, a client for whom they are doing a project using this open-source software and probably a combination of some proprietary layer, and using that as their opportunity to contribute to open source or perhaps not to contribute to open source. So this leads to the classic conundrum of where to invest their resources. So a maker organization with a million dollars to invest might put half of that money towards open source and half towards proprietary IP.
Whereas a taker in this context with the same million dollars might only put $50,000 towards open source and $950,000 towards their proprietary intellectual property, giving them this competitive advantage on the proprietary side and yet the ability to take advantage of all the same open-source work that the maker invested in. And so these incentives, these natural incentives that the businesses get, become somewhat perverse. An individual organization is incentivized to be self-interested, to make greater proprietary investment and to stand on the contributions of other organizations without necessarily giving back themselves. And this confluence of neither party being really incentivized to contribute leads to the tragedy of the commons. It leads to every individual organization involved in this process eventually opting not to contribute, and this public good or common good becoming lesser as a result. So we're gonna talk about how we can improve these natural incentives and make things better. So from that, what do we mean by evidence-based sustainability and how does it solve this problem of perverse incentives or this problem of data and understanding who makes our open-source software? So for that, we need to answer some more questions. And those questions are gonna be: who's working on that project? How many contributions are sponsored by organizations? Who's sponsoring them? Are there major corporate users that aren't helping out even though they're major users of the software? What types of contribution, whether it's code or non-code, get sponsored? How diverse is that community of contributors? How resilient will it be to turnover in the contributor base? And what's that ratio of volunteer to sponsored work and blended work? And all of these things can actually be used by the sort of management of a community, the people involved in fostering that community's health, to inform better incentives.
So we're proposing a feature for GitLab and this is the issue, but we'll mention this again at the end of the slides. We hope you'll follow along and maybe comment. But this issue proposes an example from the Drupal community. So Matthew's gonna talk about how we created a system of contribution recognition that helps us solve these sorts of problems and answer these sorts of questions.
So I will be talking about Drupal's specific implementation, which is basically one solution to this problem. We're not suggesting that this solution is the only solution, but that it is one example of a way to systematically understand and recognize contributions to a project. So the way that Drupal's credit system works can be somewhat confusing because it does cover a lot of ground. So I'm gonna start with a basic example, because we've talked a little bit about sponsorship and contribution. So I'll use Tim as an example, and you can see from this screenshot an example of an issue where Tim worked on something and he uploaded a patch, which is similar to creating a merge request on GitLab. And you can see here that the attribution for Tim is the Drupal Association. That's Tim's employer. So this is a fairly straightforward thing where Tim was working on something and it was something that he did that was part of his job, connected to his job in some way, and he specifically chose to credit the Drupal Association. But there are more options than that. It gets more complicated, I guess you could say. So in addition to having an employer sponsor someone's work, we could also have somebody who is working on behalf of a client, and we also could have people that want to specifically indicate that they are working as a volunteer, that they just love Drupal so much that they are using their own time, their unpaid time, to contribute. And believe it or not, this system allows for us to track various combinations of those.
So whereas Tim's particular contribution in that one issue is probably fairly consistent with his work across issues, for others of us there might be more of a complex scenario. So for example, I'll use myself as the slightly more complex type of contribution. So I work for a web development agency called Lullabot, and in this particular issue I was working on a feature for Drupal.org; I think this was connected to an NPR or PBS module. And I was working for Lullabot, but I was also working on behalf of our client, our customer, Georgia Public Broadcasting. So you can see this system allows for more complex sponsorship information to be tracked per issue. Now, what can be even more confusing is that you can see I've also checked this box that says I'm volunteering my own time. And you might think, why would you also be volunteering? Aren't you getting paid? Well, in this particular case, I was working for a client that is a public media organization. And I have a long history with public media and I'm passionate about public media. So after my day of working on issues that were priorities for Georgia Public Broadcasting, I could volunteer my own time creating other features that might be useful for other public media organizations or that maybe Georgia Public Broadcasting had not prioritized. They weren't necessarily on the roadmap, but maybe I thought they would be useful features that they could have available to them. So this is one of these examples of, again, a real-life scenario where how I got to the point of contributing to this particular project was quite varied. The next thing that I wanted to talk about was the other aspect of how this system works, which goes from my own individual participation in an issue to what happens when it is time to, say, commit code or close an issue. So in Drupal's UI for our credit system, we have the ability for a person, in this case the person who is committing the code, to assign credit.
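As a rough illustration of the attribution options described above (an employer, a client, and a volunteer checkbox, in any combination), here is a minimal Python sketch. The class and field names are hypothetical and are not taken from Drupal.org's actual data model; the two sample records simply mirror the two examples from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribution:
    """How one contributor attributes their work on one issue (hypothetical model)."""
    contributor: str
    employer: Optional[str] = None   # organization sponsoring the work as the employer
    customer: Optional[str] = None   # client the work was done on behalf of
    volunteer: bool = False          # contributor also spent their own unpaid time

    @property
    def sponsored(self) -> bool:
        # Work counts as sponsored if any organization is credited.
        return self.employer is not None or self.customer is not None

    def kind(self) -> str:
        # Collapse the checkboxes into one of four categories.
        if self.sponsored and self.volunteer:
            return "both"
        if self.sponsored:
            return "sponsored"
        if self.volunteer:
            return "volunteer"
        return "unattributed"


# The two examples from the talk, expressed with the hypothetical model.
tim = Attribution("tim", employer="Drupal Association")
matthew = Attribution(
    "matthew",
    employer="Lullabot",
    customer="Georgia Public Broadcasting",
    volunteer=True,
)
```

Keeping the volunteer flag independent of the sponsor fields is what lets a single record express the blended case, where paid and unpaid time both went into the same issue.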
So in this particular issue, you can see that somebody who had contributed 14 patches and had made a bunch of comments got a checkbox next to their name or their username, and a few other people did, but you'll also notice that there were some people that participated in this issue but didn't necessarily make any significant contribution, maybe even just a plus one or a question. So it's really up to the person committing to make that determination, whether or not each particular person who contributed gets credit. So that's how it works to allow for us to paint a very complex picture of who is contributing. And in this particular screenshot, you can also see that these people that have a checkbox next to their name either credited their employer or their employer and perhaps a client. So it is a robust system that allows us to have a deeper view into Drupal's contributions, and it allows us to get more information about how people are contributing and why. So from these data, we're able to create different portals into understanding the Drupal community. We're able to come up with metrics. And in particular, we can decide that we want to find out more about a particular aspect of the project or answer a particular question. For example, in 2016, Dries Buytaert and I co-published a blog post called Who Sponsors Drupal Development? And that was one of the main questions that that blog post answered back in 2016. Dries has since updated that blog post annually. We were able to answer questions such as, what is the Drupal community working on? Who is working on Drupal? How much of the work is sponsored? Many of the questions that we pointed out at the beginning of this talk. So we can also see how those answers have changed over the years and what the information is across time.
So for example, in the most recent version of that blog post, which looked at data between July 1st, 2019 and June 30th, 2021, we could see that there were a little more than 8,000 issues that were purely volunteer, where somebody had checked volunteer and they weren't sponsored. But we could also see that there were 37,000 issues where somebody had been sponsored to work on an issue for Drupal.org. And then we can see the number, a little over 8,000, where it's a combination of both. So we can see a lot from just this particular statistic: that there was a lot of work that was sponsored. In addition, we can pull things out like differentiating between code contributions and non-code contributions; for the most part, our credit system is capturing data about code. And then from there, we're able to look at different aspects of what a particular person did. So beyond a blog post, this is somebody's individual user page on Drupal.org, AmyJune. And we can see, for example, that in the last year she alone was credited with 736 issues fixed. So this tells us a lot about what AmyJune has been working on, the types of things she's doing. We can see that some of this is event organizing and mentoring, as well as all kinds of other different module updates and that kind of thing. So we can get a really clear picture of where people are putting their effort. And then we can also get a sense of where organizations are contributing, which organizations are contributing to the project. So for example, where I work, Lullabot, we can see all of the issues that people worked on where they selected Lullabot as sponsoring them for that particular issue. And that allows us to get a nice clear indication of what type of work a company is valuing, where they're putting their efforts. It also allows the company itself to see where its employees have been contributing to the Drupal project. But in addition to that, we can start to look at comparisons. So I'll let Tim talk about the next piece.
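The volunteer, sponsored, and combined counts quoted above are the kind of figure that falls out of a simple tally over credit records. The following Python sketch shows one way such a tally might be computed; the `Credit` record and the five sample rows are invented for illustration and are not real Drupal.org data.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Credit(NamedTuple):
    """One credited issue (hypothetical record; not Drupal.org's schema)."""
    issue: int
    sponsored: bool   # an employer and/or client was credited
    volunteer: bool   # the volunteer box was checked


def classify(credit: Credit) -> str:
    # Map the two flags onto the categories used in the talk.
    if credit.sponsored and credit.volunteer:
        return "both"
    if credit.sponsored:
        return "sponsored"
    if credit.volunteer:
        return "volunteer"
    return "unattributed"


def tally(credits: Iterable[Credit]) -> Counter:
    # Count how many credits fall into each category.
    return Counter(classify(c) for c in credits)


# Illustrative sample only.
sample = [
    Credit(1, sponsored=True, volunteer=False),
    Credit(2, sponsored=True, volunteer=False),
    Credit(3, sponsored=False, volunteer=True),
    Credit(4, sponsored=True, volunteer=True),
    Credit(5, sponsored=True, volunteer=False),
]
counts = tally(sample)

# Share of credits with any sponsorship, whether pure or blended.
sponsored_share = (counts["sponsored"] + counts["both"]) / len(sample)
```

Running the same tally over successive yearly windows would give the trend over time that the annual blog posts report.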
Yeah, so from having this data, from having this understanding about how both individuals and organizations contribute to the Drupal project, we can start to build incentives and start to reward people for good citizenship in the Drupal project. So as I said before, the natural incentives that come out of the problem of makers and takers, the tragedy of the commons, is for an organization to maybe free ride on the contributions of others. So how do we combat that? In the Drupal community, we've done a few things. We've introduced a certified partner program that lets us say, hey, these organizations are known good contributors and only these organizations get access to certain opportunities that the Drupal Association provides. So if people out there come to us with a government tender for a Drupal project or with some other kind of major information and they look to the Drupal Association as a nonprofit kind of neutral third party, we know which commercial organizations we want to route that potential business to because they're good citizens in our community. Similarly, we have a marketplace page on Drupal.org where any company can list themselves as, hey, we provide Drupal services. And for the longest time, this was simply alphabetical. So folks who put numbers at the beginning of their company name kind of wound up at the top of the list, which wasn't a particularly fair way of doing things. But what we've done is updated this system to be based on these company contributions. So those good citizens of the Drupal project on the organizational level get promoted to the top of our marketplace. And this factors in a variety of things. It factors in the issue credit system we just described. It also factors in things that we as the Drupal project sort of shepherds as part of the Association have decided to make important. 
So being members of our supporting partner program, creating case studies that help promote good stories about Drupal to the world, all these other things are elements that we factored in to organization marketplace rankings. So you can see how that begins to change these new incentives to protect the common good, to break that anti-pattern that suggests that self-interested organizations should not contribute, they should just freeload on the contributions of others. So now, this chart that we looked at before has changed. We see that the best case scenario is both company A and company B contributing and that if one company doesn't contribute instead of making that money sort of at expense of their contributing company, they've lost their visibility. Their lack of good citizenship has lost them that presence and that promotion that they were receiving from the Drupal Association on Drupal.org. And so that business instead gets routed to the company who did contribute. It's not a perfect system. It's not the only way that people evaluate what organization to work with but we're beginning to see that education happen. We've actually seen government tenders come out that say, hey, we need a Drupal organization and we're gonna evaluate you based on your contribution profile reported on Drupal.org. I've reviewed at least one of those myself. So we're starting to see that become adopted and help break that cycle. And this is what's going to promote good citizenship and therefore sustainability of the Drupal project. So this is an awesome idea and it's a lot of cool things that we've done just in the Drupal space but you've seen a UI that looks like it's part of a complicated custom issue tracker. So hooray, that's great for Drupal but how do we make this a model for open source as a whole? How do we kind of standardize this idea? So Matthew's gonna talk a little bit more about our proposal to create a kind of standard around some of these options. 
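To make the marketplace change concrete, here is a small Python sketch contrasting the old alphabetical listing with a contribution-weighted ranking. The talk names the factors (issue credits, supporting partner membership, case studies) but not how they are combined, so the weights, field names, and sample organizations below are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights: the talk lists the factors but not their relative importance.
WEIGHTS = {"issue_credits": 1.0, "case_studies": 5.0, "supporting_partner": 50.0}


@dataclass
class Organization:
    name: str
    issue_credits: int = 0
    case_studies: int = 0
    supporting_partner: bool = False


def score(org: Organization) -> float:
    # Weighted sum of the contribution signals mentioned in the talk.
    return (
        WEIGHTS["issue_credits"] * org.issue_credits
        + WEIGHTS["case_studies"] * org.case_studies
        + WEIGHTS["supporting_partner"] * (1 if org.supporting_partner else 0)
    )


def rank(orgs: List[Organization]) -> List[Organization]:
    # Highest score first; ties fall back to alphabetical order.
    return sorted(orgs, key=lambda o: (-score(o), o.name.lower()))


# Invented organizations for illustration.
orgs = [
    Organization("1st Example Agency"),
    Organization("Zeta Web", issue_credits=120, case_studies=2, supporting_partner=True),
    Organization("Acme Digital", issue_credits=30),
]

alphabetical = [o.name for o in sorted(orgs, key=lambda o: o.name.lower())]
by_contribution = [o.name for o in rank(orgs)]
```

Under the alphabetical ordering, the non-contributing company whose name starts with a number comes first; under the contribution-weighted ordering it drops to the bottom, which is the incentive shift described above.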
Once again, I'll say that we view this as a solution, and we can talk about how it's worked well for us and what sort of challenges we've encountered, but we know it's not the only solution. But as part of this process to bring our credit system over to GitLab, we have been interacting with other communities and specifically other organizations that are trying to create some standards. So for example, Internews is an organization that has created a lightweight rubric that they use, which is supposed to help journalists, human rights defenders and people that are doing what we might think of as important work in the world. They want to help those people determine open-source tools that they can trust. So you might understand, if you've had any activity in open-source communities, that projects can be of varying use and there will be different communities that we know we can trust. So these people want to choose software that is useful for them, that will be around, and that is not gonna be spying on them, that kind of thing. So they have a whole bunch of questions in their particular rubric, including a question about volunteers that our system can help with. And that's just one of the many questions in this rubric. Another rubric comes from the Apache Project Maturity Model where, as part of their look into what makes a project mature, they want to know whether contributors act as themselves as opposed to representatives of corporations. So there are other projects that are interested in this and other models that use these sorts of data. And then finally, the one that most people might have heard about more recently is the CHAOSS project, the Community Health Analytics Open Source Software project. The CHAOSS project is focused on defining metrics to help define community health. So they offer a whole bunch of different metrics where they look at a project to determine, is this a healthy project or not?
And some of the kinds of questions that they want to know about are simply the number of contributing organizations. And we can see from a system like Drupal's credit system that we would get a more complex and thorough understanding of the answer to that, as well as questions like the elephant factor: is there just one company that's providing a lot of information, a lot of support for a particular open-source project? And we have engaged with them and we're actually trying to make the kinds of questions we're answering into an official CHAOSS community metric. So that covers the kinds of questions that we're looking into: engaging with that community, learning from their ideas, from the various people that are contributing to that project and their different roles, getting their feedback on how things work, and coming up with some standards, if you will, on what it means to recognize contribution in a particular community. So finally, in addition to working with and researching all of these individual organizations that are trying to come up with their own metrics, we decided to open this issue I referred to earlier. So on gitlab.com, issue 327138, you can find our proposal to create a system of contribution recognition that's based on our understanding of these community health metrics, on our proposal to the CHAOSS project, and on what we've learned from the Drupal project about what would be useful. So this issue's been open for a little while. There are definitely a few folks from the GitLab team themselves chiming in, a few folks from other projects, and even some from more private organizations who use GitLab who've chimed in with their interest.
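The elephant factor mentioned above can be computed directly from organization-attributed contribution data. The sketch below follows the general shape of the CHAOSS definition (the smallest number of organizations that together account for a threshold share of contributions, commonly 50%), but it applies that idea to credited contributions rather than commits, which is an assumption made here; the function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable


def elephant_factor(orgs: Iterable[str], threshold: float = 0.5) -> int:
    """Smallest number of organizations covering `threshold` of all contributions.

    `orgs` holds one organization name per credited contribution.
    """
    counts = Counter(orgs)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk organizations from largest contributor to smallest.
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= threshold * total:
            return n
    return len(counts)


# Illustrative data: one organization dominates the first project.
concentrated = ["Org A"] * 6 + ["Org B"] * 3 + ["Org C"] * 1
spread_out = ["Org A"] * 3 + ["Org B"] * 3 + ["Org C"] * 2 + ["Org D"] * 2

ef_concentrated = elephant_factor(concentrated)
ef_spread_out = elephant_factor(spread_out)
```

A low value means the project leans heavily on a small number of organizations, so it may be less resilient if one of them steps back.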
But we would love to ask any of you out there who are listening to this presentation to please take a look, give your opinion, and give a plus one, especially if you'd like to see this become a GitLab feature, because we think that will be useful for comparing data across all of the open-source projects that host on GitLab, and will make it easier for other projects to move to GitLab like the Drupal project has been doing over the past couple of years. And we think it will be a win for open-source sustainability. So that's where we are. That's the main thing we'd love to see as a follow-up to this presentation. If you'd like to learn more about Drupal, there are some events coming up. So DrupalCon Europe, an all-virtual event, is happening from the 4th to the 7th of October, 2021. You can learn more at events.drupal.org/europe2021. If you're interested in contributing to the Drupal community, you can go to drupal.org/community to learn all of the different ways that you can participate in the community, whether it's just meeting people and going to meetups or diving into our GitLab instance and working on the code. You can also support the Drupal Association by going to drupal.org/membership. And with that, we want to thank you very much for your time. Again, I'm Tim, hestenet on drupal.org. I appreciate you being here.
And this is Matthew. I'm mtift on drupal.org. We both thank you very much for the time you spent with us today.