Hello, everybody. Thanks for being here. Today, we're going to talk about your product: whether you want to make it open source, and if so, how you want to build a community around it. I'm Adrin. I work at Hugging Face, and I'm also one of the maintainers of scikit-learn. I'm going to talk about a few things that I have learned along the way, and that the community has learned.

Before we start, let's talk about a few different terms. Open source is one thing, but free and open source software is another thing. Free there means freedom, not free beer. What freedom is differs from person to person, and that's why you have different organizations, like the Open Source Initiative or the Free Software Foundation, that don't necessarily agree with one another on whether a license gives you enough freedom or not. And then you have different distributions that take a bunch of software, ship it, and redistribute it to people. And they have their own definition of what's accepted and what's not. The implication of that for you is that if you want to make a piece of software, then you might want to have a license that is accepted by the communities and the distributions that you want to be included in.

But the license world is kind of crazy. If you look at the Wikipedia page, there's a ton of them, and you can think of each column here as a different feature. They're quite different. They allow different things. They ban different things, which kind of makes sense. But how do you decide? Maybe you should pick one of the licenses that everybody agrees are good licenses. Well, here, each column is an organization, and they don't necessarily agree with one another. But at least there are a few that people kind of agree are good licenses. Which one should you choose? It depends on how much you want to allow your users or your community members to do, and what you want them not to do.
If you're fine with another company using your software, changing it, and making money without releasing the changes, you can be on the permissive side. If not, you're more on the protective side. So easy, right? Not really. At some point, the community and all these companies who were doing open source got to a point that a lot of people called open source's midlife crisis. Think of it this way: a company works on a product. They open source it, and they release it under an open source license for people who want to try it, or for companies who are small and want to use it. But if you want to use it heavily commercially, or you want extra features, then you have to go back and pay company X. And then you end up with some large cloud companies using your product without paying you, and they take a big chunk of the cake and basically grab your users. Company X goes back and changes the license, trying to prevent that cloud company from doing that. That's how the Server Side Public License (SSPL) came out. When you look at it, it says that the SSPL requires that everyone who offers the functionality of SSPL-licensed software to third parties as a service must release the entirety of their source code, including all software, APIs, and other software that would be required for a user to run an instance of the service themselves. That basically means that if you are AWS and you offer Elasticsearch, you can't, or you have to make AWS open source. This license is also not compatible with a lot of other licenses, including that of the Linux kernel, so it's really not clear how you would use it if you were to offer that as a service. And it's not OSI approved; it doesn't give you enough freedom. Whether or not that's a good thing is a different story. But basically these companies, Elasticsearch, MongoDB, and a bunch of others, just wanted to have a piece of that cake the large cloud companies are getting. Why does that matter?
Because it changes the kind of community that you build around it.

But the boring license stuff aside, let's talk about community. Say I have decided what I'm doing on the license side, and I want to build a community. How do I do that? Or rather, let's talk about the things that you shouldn't be doing.

The first one is lack of onboarding. You create a new project, and it's very clear to you what it does. But if somebody comes to your project and, from an outsider's perspective, it's not clear what it does, who the user is, how users are supposed to use it, and how to get it, then they can't enter your community. What you end up with is people who think and code exactly like you do, and there aren't many of them; we are all very different. It also hurts the diversity of the contributors you're going to have on your project. And I hope at this point we agree that diversity is a good thing for a project, so we're not going to talk about that. Things you could do: you could have a much better README than you have. You can have a contributing guideline. A lot of people are now used to having a contributing guideline, and when I go to a new project, that's the first thing I look for. I'm like, okay, how do I install it? How do I compile it, if I have to compile it? How do I start contributing? Try to reduce the friction for new contributors. Instead of arguing about what format you like, what your line length is, and all of those, automate as much as you can. And if you can, have them in pre-commit hooks. They work. And have CI and tests that run, and that also run for people who come from outside. If you are in a team where somebody from outside submits a pull request from their fork and half of your CI doesn't run, then you're basically not inviting people who are not in your organization to be active in your community.

The next one is nothing in writing: documentation. We all love documentation.
We all hate writing documentation. That's why I'm a strong advocate of what I call documentation-driven API design. When I start a new project or start working on a new feature, the first thing I do is write a Python script, or think of it as a Python notebook. I explain what the problem is, what this is trying to solve, how the user would interact with my software, what they would get, and at the end what the output is. If this is the first thing you do, it means that you can communicate that to your stakeholders, instead of your implementation, and without your stakeholders having to read your implementation and kind of guess what the API is and how users would interact with it. The other benefit is that you start with documentation. It doesn't become an afterthought. It doesn't become documentation debt that you have to pay later. But that's not the only kind of documentation you write. You have your usage docs, you have your API docs, but you also need to version your docs. If you don't, then you're going to have users who are confused because they copy some code from your documentation page, they run it, and it doesn't work because they have an old version of your software installed locally. And code documentation is as important as the public documentation that you render. Code documentation is important because if a new person, either an outsider or a new hire, wants to work on your code base, looks at your code, and has no clue what it does, they can't enter that community. Even if you're doing closed source, just inside the company, each project is its own community, and you want people to be able to enter that community as easily as possible. And be comfortable with your documentation toolbox, whatever you want to use. A good part of the PyData ecosystem uses Sphinx, which nicely renders your docs.
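
To make the documentation-first idea concrete, a script of this kind might look roughly like the sketch below. Everything in it is hypothetical: the `slugify` function, its signature, and its behaviour are invented purely for illustration, and the small implementation is only there so the script runs. In practice the prose and the usage cells would be written and reviewed before any real implementation exists.

```python
"""
Turning titles into URL slugs
=============================

Problem: blog authors type free-form titles, but URLs need short,
lowercase, hyphen-separated identifiers.

This script is written *before* the real implementation. It shows how a
user would call the (hypothetical) ``slugify`` function and what they
would get back, so stakeholders can review the API without reading code.
"""
import re


def slugify(title: str, max_words: int = 5) -> str:
    """Placeholder for the proposed API (name and signature are assumptions)."""
    # Keep only runs of ASCII letters and digits, lowercased.
    words = re.findall(r"[a-z0-9]+", title.lower())
    # Join at most ``max_words`` words with hyphens.
    return "-".join(words[:max_words])


# %%
# Basic usage: the user passes a title and gets a slug back.
slug = slugify("Hello, World! Welcome to my blog")
print(slug)

# %%
# Long titles can be shortened with ``max_words``.
short_slug = slugify("Hello, World! Welcome to my blog", max_words=2)
print(short_slug)
```

The `# %%` markers split the script into notebook-style cells, so the same file can be read top to bottom as an explanation of the problem, the intended usage, and the output.
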
We also use Sphinx-Gallery, which takes a Python file and generates an IPython notebook and an HTML file. And if you don't want to deal with the infrastructure, you can use Read the Docs, which does all of that compilation for you; it renders your documentation and it also versions it.

The next one is that leadership is a mystery. A new person comes to your project and they don't know who makes decisions. They don't know, if they have questions, whether there is a person they should ask. They wouldn't know if somebody's review matters more than others'. Is there anybody they could just send an email to and talk? Is there anybody who could mentor them? And this brings us to governance. Deciding how your leadership works is deciding how you're governing your project. And there are many different models. We'll go through a few of them, and then you need to decide what flavor works best for you.

The first one is do-ocracy. You might have a team, they're all internal, they all have access to the repository, and they all work on different parts of the project. In practice, the one who works the most on a module becomes the owner of that module, and they make the most important decisions about it. It works and it has no overhead, but when newcomers come, they have no idea which module is owned by which person. It's very implicit and it's really not clear to outsiders. So it works at the beginning, but you can't really continue doing that for long.

The next one is founder-leader, or BDFL, and the Python community is very familiar with this. Guido used to be the BDFL of Python. It's a very easy system to start with. You start your project, and you are the one who makes all the decisions about that project. It's good for a while. It doesn't have any overhead, but at some point your project hopefully becomes bigger than you. Your preferences become just your personal preferences, which might not agree with the direction the project is going.
If you have an active community and a large user base, they might want something that you don't. And if you're the only person making those decisions, then there's a clash, and it's usually to the detriment of the project.

To solve that, a lot of projects have a self-appointing council or board. So you're like, okay, I don't want to have a single point of failure. I choose a bunch of people, and they all work together. It's good, but if you are self-appointing, then you're not necessarily representing your user base, unless you actively try to choose people who represent your user base. And as humans, we are really bad at doing that.

So the next model would be electoral. It does add a ton of overhead. You can have explicit term limits. You do need to have multiple people who offer to do the same thing, so you need a large project and contributor base to do that. But if you have sponsors, they might require you to do that. So it's a good step for a lot of larger projects.

But if you're a company, you might want your governance to be kind of single-vendor. You want to be the one making decisions. You want to decide the pace of the project. And not everything that is relevant to that project is something people can see in it. You might have a closed source part of the ecosystem that influences the decisions you make here. So it makes sense to you. But you want to have it open source because you want to engage a broader community. You want to increase adoption. So it's kind of open, but it isn't. It's kind of a walled garden. It looks really nice from the outside, but it's not clear to people how to get in. You might have a dual-license system, which means that people who contribute might be happy to contribute to a free and open source project, but if you're making money from their contributions, it's not clear to them why you make that money and they don't.
And the one with the most overhead is probably creating a foundation for your project. The foundation will do all the legal work for you. The people who lead the foundation don't have to be the same people who lead the technical part of the project, so you might have two different leadership teams. You might also decide to join another foundation or an NGO, like NumFOCUS, which sponsors a lot of projects such as NumPy, SciPy, Matplotlib, and scikit-learn. They're all members of it. It gives you much better funding opportunities, because it's much easier for funders to pay a foundation than an individual.

But which one should you choose? It's a really good question. And the few that I mentioned are not the only flavors. It's a spectrum along different axes. You need to see what kind of community you want to build. If you're starting it, do you want this project to be something that can go on independently of you? Or is it a main asset to you, your group, or your company, and you want to keep it inside? Do you want it to be the kind of thing that works really nicely in five years? Or do you really want to be fast and have something delivered in six months? All of those define how fast you want to go and how much you want to engage the broader community, and then you can decide what kind of governance you want to have. But it's better to be very explicit, so that when people come to your project, they know what they're dealing with.

The next common issue is no path to success. That means when users or contributors come to your project, they have no idea what success is and how to succeed. Does it mean that they should be pinged and their opinion should be cared about on certain topics? Do you have different teams, like, I don't know, committee X, committee Y, an advisory board, a core developer committee, or a triage team? How do people advance? Is that clear?
And it matters because it sets the engagement level that people will have on your project. Some people might be more than happy to just code, but a lot of people aren't, and that's more than okay. They want to know how they can progress in your team, and you need to be clear about how they can do that.

That also brings us to internals versus externals. If you treat people who are inside your team differently from people who are outside your team, then the people outside might not feel like they want to be a part of the project. I worked, for example, as a consultant, and I was always the external. I was never treated the same as the people who were employed by that company. That's why after two and a half years, I was like, ah, it was nice, but I want to belong. The sense of belonging is really important for a lot of people. So practically, if you have workflows that don't run for people who are not in your team, try to fix them. Do your team members have a workflow that is very different from that of people who are not in your team? Try to encourage them to follow the same workflow. If there are pain points, it would also show you what those pain points are. So try to treat people as equally as you can; if you don't have to differentiate between insiders and outsiders, don't.

The next one is poor communication. Communication really matters, even though a lot of us would rather it didn't. We think that it doesn't matter, or we don't want it to matter, because we don't want to make the extra effort of communicating well, but we will be offended if somebody else treats us badly. And 50% of text and email messages are misunderstood. That's because, first of all, humans have a tendency to assume the worst, and all the emotion, body language, and tone are lost when you write a message. Also, people who know each other treat each other differently from people who are strangers.
So two people who know each other might communicate on a thread in a way that is okay for them, but an outsider comes and sees that and thinks, oh, that looks really harsh, I don't think I want to be a part of that community. That's something for you to think about. And code reviews are by nature criticism, supposedly constructive criticism, so it's really easy to offend people. Try to use emojis. Bring your emotion to the game. In most cases, your emotions are not bad. They're nice. You also need to have at least somebody who watches all the communication going on, on your lists and on your issue tracker. Sometimes, if people are really passionate about something, the discussions can get really heated. Be the person, or have somebody, who's a good mediator, and try to mediate those very heated discussions and cool them down a little bit.

The next one is lack of transparency, and it comes at many different levels. The easiest one: if you have a bunch of people working in the office, they might go have lunch together. They talk about a certain aspect of your project. They all agree, they come back, one person opens the pull request, the next person approves it, and it's merged. From an outsider's perspective, they don't know why that PR was opened. They don't know why it was approved. There's no discussion. They might disapprove, but most of all, they don't feel that they've been included in the decision that was made there. So if you do talk about something, first go and say, hey, we talked about it, and this is what we talked about. Does anybody have an opinion? And if your team is distributed, the rule that a lot of companies follow is: don't merge anything sooner than 24 hours after it was opened, unless it's really a hotfix that you have to get in. It allows people from different time zones to chip in if they want to. And then this comes in different layers.
You might have an open source product and a company Slack, and you talk about a lot of stuff inside your chat channels or internal systems. That excludes outsiders. But even if you're only talking to people who work in your company inside that system, you might be having private conversations that exclude other people who care about this project, people you might not even know about. So try to be as open as you can. It slows you down a little bit, but in the long run it really helps with keeping people and bringing people in.

And the last one is not seeing ourselves in others. Basically, have empathy. We are not necessarily the best at having empathy for others, but it really matters. If somebody comes to your project and opens a really grumpy issue, try to understand where they come from. Ask questions instead of being like, no, this makes no sense, I'm going to close it. Ask: okay, why are you hurt? What's bothering you? Try to understand their pain points. It might be good for your product. It might be something that others are feeling too, so try to fix that. Try to have emotional intelligence. Until a few years ago, I thought that was absolute garbage and that I didn't need it. And then somehow I was convinced, and it does help me. It really does help me. And try to be a feedback magnet. None of us are perfect, and that's totally okay. People don't want to give you feedback, especially if you are a core maintainer. If you are in any sort of position of power, most people don't speak truth to power. If you want them to talk to you, if you want the feedback, you need to actively go and seek specific feedback.

So, to give you maybe an example of why all of that matters, I can talk to you about my personal experience. Around 2018, I started working a lot more on open source, and back then I was using TensorFlow, so I thought, I can go contribute to TensorFlow. Why not? I started, I opened some issues, no response.
I tried to open a PR, no response. Then I looked at the commit history, and it was all coming from a bot that was coming from Google. So I realized it was kind of a product that is developed internally and just pushed out. It's not something that wants to involve the community. Another thing that I was using was scikit-learn. So I went and found an easy issue, a documentation issue on scikit-learn, I opened a PR, and I got a response pretty quickly. I was like, oh wow. Then I fixed the issues and it got merged, and it was an amazing feeling. I kept going, and at some point it became kind of my full-time thing, and my weekends and my evenings. After a few months, somebody pinged me on an issue and was like, hey, Adrin, what do you think about this? I was like, oh wow, somebody cares about what I have to say. And then a few months later, I woke up to an email asking me if I wanted to join as a core developer. I'm like, what? Yeah, hell yeah. And that changed the whole course of my career. That single first interaction, and the way that the community worked, changed how I was involved in that project. And at least for quite a while, I had a lot of contributions to the project. Whether or not that's a good thing, the jury's out. But you do get people. And from your perspective, if you're a maintainer, a random person opening a PR, especially if you have a busy project, is just yet another random person with an insignificant contribution. But that person has the potential of being one of the main members of your community.

So to end, I'd like to talk about some resources that you can go to, because it's a massive topic. You can't really fit everything in a single talk. If there is one thing I want you to take from here, it is to read Emotional Intelligence. I thought the book was going to talk about things that are not scientific and that I wasn't going to like it.
But at least the first third of the book is about the neuroscience of emotion, how we make decisions because we have emotions, and how our emotions influence our decision-making process. It's from the 90s, and it coined the term emotional intelligence, but it's great. Another one is How and Why Men Should Mentor Women, if that's something that you care about in your projects, because I think you should. This book is a little bit dated, it's a little bit US-centric, and it treats gender as quite binary, but it still has quite a few lessons for a lot of us. Another one is 35 Dumb Things Well-Intended People Say. This book really helped me understand microaggressions. It helped me understand the pile-on principle and how impact is different from intention. You might be very well-intended in something that you say, but it might have a very different impact, and that matters. If people are offended and leave you, even if you didn't want them to, you can't blame them for doing so. You had an impact on them. Another really great book on the open source side is Working in Public by Nadia Eghbal. It talks a lot about the dynamics of open source. Then, when you look at each open source project as an organization, you want to understand how decisions should be made while keeping everybody engaged. The Open Organization is a great book for that. It talks about how decisions are made and how Red Hat works internally, but you can apply that to any kind of project where you have a bunch of people making decisions and some of them might be more senior than others. And the last one is Producing Open Source Software. It goes into a lot more detail on a lot of the topics that we talked about, and does it a lot better. And if you're not into reading books and you prefer online resources, there's a ton of them. I can't include everything, but they're great. And with that, I come to an end. Thank you.

Thanks, Adrin, for this amazing talk.
And if you guys have any questions, please walk up to the mic. Yeah, so I think everybody has. If you like your open source ecosystem session. So yeah, thanks again for it. Thank you. Yeah.