Welcome to my talk on open source: it's just about the source, isn't it? Why am I the one talking about what open source is about? I'm a software engineer at Elasticsearch, a director at the Apache Software Foundation, and a co-founder of Apache Mahout. Hands up if you know the project. Keep your hand up if you are using Apache Mahout. Okay, no one to talk to after my talk. I'm also a co-founder of Berlin Buzzwords, so if you're into search, scalable systems, storage, NoSQL systems, or streaming, and you need an excuse to get your employer to pay for your trip to Berlin in June, go there.
Now I want to get to know you a little bit better. Usually I do meetups, where I take a microphone and hand it around. We've got about 30 minutes, so I won't do that right now, but I've got a few questions here. How many of you run your own open source projects? Okay, pretty much half. How many have contributed? Everyone? Nearly everyone? Okay. How many of you have written anything about open source? Press, blogs? Pretty much everyone? Okay. Have you helped users with an open source project? Nearly everyone? Family counts. How many of you are contributing to open source as part of your day job? 100% contribution during the day job? Roughly one third? Okay. I've been using... no, I won't ask that.
So why should I care what open source is about other than source code? I have a history of converting my family to free software, and my husband has a history of converting me to free software as well. How many of you have run into the issue of installing whatever operating system there is, then going back at Christmas to find the system in a totally out-of-date state? I'm very grateful to my husband, who migrated my mom's image database to a new version by digging down into the table structure of the SQLite database underneath.
If you're betting your business on open source as an external dependency, you may still want to think about what open source is about and what these structures are about.
Because someday you may need a tiny little change applied to it, or the project may decide to go closed source, or the project may start blocking changes. So at the very least you want to understand what's going on and how to get changes in. If you are running an open source project yourself, you want to know how to grow the strength of your community. You want to know how to attract a diverse set of people, or at least some people, who contribute and help you, because at the end of the day you probably won't be able to do everything on your own. At some point you probably want to know how to deal with developers and contributors so that you end up sharing the same ideas, so that you can integrate them, and so that you can coexist in a nice environment. And of course you want to protect your downstream users from legal threats.
Before we dig in, let's take a step back and think about what kind of motivations there could be behind wanting to build an open source community. One goal could be: I want to build a business around software that breaks into an existing market by changing the economics. Based on this goal you make decisions differently than if your goal is to collaborate with others who have the same needs as you, just to fix the problems that you have. There may be different license decisions involved, and there may be different community decisions involved. Another goal could simply be to build up your CV, your reputation, your skill set, whatever.
There are two questions you probably want to answer early on. One of them is: how much control do you want to exercise personally, as opposed to how robust should your project be? What do I mean by that? You can decide that you want to be a benevolent dictator for life. That means you've got total control: you dictate the project decisions, or your company may dictate the project decisions. But at the point where you lose interest, your project is out there and nobody knows how to continue.
There's no continuity. The alternative is to build up a group of equal peers who seek consensus. This may end up being slower, but it may survive people leaving.
Let's first dig into the three obvious topics that you have to think about. The first topic is copyright. Another one is how you deal with trademarks, because of course your project is going to have a name, and it's probably going to have a fancy logo. What kinds of issues are waiting for you there?
Copyright. That's simple. There's a legal devroom over there where you can learn more about this. Here's a very brief overview, taken from or inspired by the FSF. There are essentially two ways you can go. One is copyleft. This is where you state that you care about any and all of your downstream users having the rights to use, study, share and improve. You go for the LGPL for libraries in particular. You go for the AGPL for hosted software. You go for the GPL for essentially everything else. Then, if all I want to do is ensure that my very own project gives the use, study, share and improve freedoms, I could go for a non-copyleft open source license. That fits if the project is small enough that I don't care. If it's a library where I want to push standards forward, it pays to have a non-copyleft license so that everyone can use it. If there are projects with established economics out there, it may make sense to use a non-copyleft open source license like the Apache License or what have you.
The second topic that you will have to think about is patent law. I'm sorry, there be dragons. I won't go there right now. There's a legal devroom; talk to the people over there. That's definitely not my best topic.
Trademarks. Why should you even care about trademarks? On the one hand, you want to find a project name that isn't infringing. Imagine that three years down the road, your project is successful.
Suddenly, some commercial entity or, even worse, another open source community comes up to you and tells you your name is infringing, change it. You don't want that. So before you start, you want to do a name search. You probably want to do a logo search, too. On the other hand, if you do have an established project name, you don't want to wake up in the morning, search eBay for your project name, and find printed CDs out there, with people selling your software without your permission and without downstream users knowing what's on that CD: whether it's actually your software or something malicious.
Some examples here. You want to avoid this case where open source software is being sold without your knowledge and without your approval. But you also need to figure out what counts as infringing use. So this is one project's logo, and this is a logo that was once used for a pedicure service. It was decided that this is not infringing, because it's clearly a different area of business. But those are the kinds of questions that you will be facing then.
The next question that you will have to answer for yourself is whether you should register your project name as a trademark. You don't have to, as long as you're actively using and protecting it. But having it registered adds an additional level of protection on top. How do I deal with infringements? What you have to do is find where the project name is used, identify whether this usage is infringing, and then actively fight it. Actively fighting sounds like going to court. Usually it's just sending out a nice email to that person: hey, this is my trademark, I believe you're doing something wrong here, can you change that? And usually this already fixes the case.
So much for the easy topics. Let's talk about the slightly messier topics, which is the people aspect. How many of you still believe in the legend of the lone brilliant hacker? One. That's true, it's a good legend.
So essentially my belief is that great software evolves not from one bright mind alone, but from a community that works together. And usually ideas are born from pre-existing ideas. There's a saying at Apache, which goes: community over code. I would translate that as: a project without people is a dead project. Or you can go to the ZeroMQ community, which essentially has the same mantra, which goes, I believe, people before code.
So what are the people aspects that we have to care about? First of all, there's this lovely software that we've written. Where do we find users for it? Users are ugly, right? Where do you find them? How do you turn them into actual users? And once converted, how do you keep them as your users? That ties into subjects that are typically called something like marketing. How many developers love marketing?
The cheapest marketing out there: go to Twitter and talk about it. Get other people to talk about it. Search for mentions of your project and get involved in these discussions. When you are involved in these discussions, ask people to publicly share that they are using your project. Why should you do that? I've been doing Mahout for a couple of years, and going on Twitter and just searching for mentions of Mahout was a great way of finding downstream users and finding good use cases, which I could then use as arguments in favor of my project. When someone comes up to you asking who's using it, you have this list of people who are using your project, and you can also talk about what people are doing with it. That's a totally different story than just talking about a feature set. Telling people that Adobe is using Mahout is different from telling them that we've got a collaborative filtering algorithm.
You go to conferences. You go to ApacheCon, you go to FOSDEM, you go to FrOSCon, wherever you can find your community. You talk to the press. You can write press announcements.
Those are predefined texts that you hand out to reporters, who then create nice articles from them. However, you may be surprised when talking to the press: oftentimes they are even happy to accept articles that you have written yourself, with your name underneath. It does take time, though. It also helps to write books and have them printed. One warning: writing books is a huge time sink. Reviewing books as well.
Some people are happier than others to find out how things work from the documentation. You may need to talk about what you're doing and answer questions, which essentially means giving training. This can be done at conferences. It can be done during the hallway track. It can be done at a booth. Speaking of Apache, you can go over to the K building, where there's an Apache booth, and figure out what the foundation is doing for you. Those kinds of hallway conversations give you a little glimpse of what people are doing with your stuff.
Next thing: giving support. Why should you give support? You want to help beginners get started. We've had this nice talk just now on onboarding new people. There's one saying that I keep repeating to people who are afraid to ask questions: there are no dumb questions, only dumb answers. If your conversation is friendly to your contributors from day one, that may mean that one day they turn into active committers and active promoters of your project. So in the end you have less work to do. And we all want that, right? A good way to help people is to go where those people are; don't expect them to come to your channels. I've put Stack Overflow here. I've had an experience with Twitter recently, where someone had a question about an Apache board report on Twitter.
This person was then invited to post on the project list, and the communication on the project's mailing list was much more detailed, much less flaming, and much less aggressive than it had been on Twitter, simply because there was room to share more details and make a longer argument.
So how can you help your project? Of course it doesn't only mean contributing code. At Mahout, and also at Apache in general, we've made several people committers just because they made an effort to help the users grok the project. There were a couple of people who helped on Twitter. There were a couple of people who helped on the mailing lists and answered questions for a long time, and at some point we just made them committers and gave out this badge. Whatever badge you have, you want to give this recognition to active contributors.
Remember that not everyone speaks the same language. You may think about translating documentation. OpenOffice does a great job there. So does LibreOffice. You may notice that not everyone is comfortable with using mailing lists and IRC. So you may think about using a different communication channel, like the web forums over here, potentially with whatever communication medium suits you most as a back end.
Once you get people on your communication channels, once you start the conversation, one of the most common questions that I get is: how do I get started? I had a long-term Apache person come up to me just recently asking how to get started on Mahout, and I was like: you've been here for 15 years, and you don't know how to get started? Apparently he did not.
The second most common question that I get is: when will feature X be shipped? Typically what people answer is: patches welcome. Typically what users take from this is that the feature is not welcome. At least in the communities that I've been part of, this is a cry for help. If people tell you patches welcome, what they really often mean is: please do start contributing. Please do help us get this done.
This can start with tiny documentation patches and go up to more complex architectural coding contributions. It's typically not meant as: go ahead, write it, we will reject it anyway. It usually means: I don't have the cycles right now to get to it. If you make that "patches welcome" a little bit more explicit, that may be even more helpful in attracting people.
The Apache Software Foundation has a project called the Apache Attic. Any project that doesn't have enough traction anymore, meaning not enough oversight anymore, is moved to the Attic. Before a project moves to the Attic, a message goes out on its mailing list saying that it is going to the Attic. You would be surprised how many projects didn't go to the Attic because of this one last email, because of this one last cry for help. I did the same with Mahout, not because the Attic was pending, but because the project had essentially gone very quiet. I made it explicit that we needed documentation help, we needed communication help, we needed help with press and PR, and so on. Suddenly people knew where to contribute. Before that, we had students coming to us telling us: we've got this grand new machine learning algorithm, do you want it? And we were like: not really, not yet another one; we needed something different. When we made explicit what that something different was, contributions suddenly became much more valuable. The same thing happens at other projects as well.
Is there a question on that? Yes. Is there a hard rule, say, that if there's no commit within one year, you start this Attic process? No. There's just one rule: at Apache, each project has a so-called PMC, a Project Management Committee. For each release you need three PMC votes. If it's apparent that fewer than three PMC members are active, then it's very clear that this project is well on its way to the Attic.
So you need three people actively watching the project. One reason behind that is: imagine there's a security issue coming in. The project needs to be able to make a patch and make a release. If there are fewer than three people, by definition they are unable to do that. So this is done as a service to the users. And when you are in this situation, you are more than welcome to add more PMC members, because this is what makes the community safe.
Another good way to show where people can help is to have help requests in your issue tracker.
How do you keep people motivated who have submitted their first patch? Here's a patch, often scratching the developer's own itch. As soon as you see it, your clock starts ticking. Back when I was at my first job, I contributed to, I believe, Drools. But I had been given a fixed amount of time for this project. When this time was up, I had to scramble to get back to it and make any changes. So giving fast feedback is a way to get more and better patches and better refinements. Getting feedback after months also isn't a very good motivator to go back to this thing that you wrote months ago. So you want to give feedback early. Apache Hadoop does a good job there, as they've automated most of the tedious stuff, like: are there tests? Does it adhere to coding conventions? Et cetera. Another thing is to merge quickly and to have clear rules for what constitutes an acceptable patch. Nothing is more frustrating than spending hours and days and weeks on a patch to get it right, only to find out that this is not what the community wanted. So you want to have clear rules up front for what is acceptable. You can also ship chocolate to keep people motivated.
For my first patch ever, which went to the Drools project long before it was at Red Hat, the reviewer, despite sitting in Australia, gave me speedy responses and offered iTunes vouchers so that I could convince my manager to actually approve the submission. I didn't want the iTunes vouchers. I was more into getting publicity for Apache Mahout back then, so I got a blog post. But what was much more motivating than this kind of reward was the thank you that I got. I got a thank you on the issue tracker. I got a thank you in the commit message. I got a thank you in the next release notes. Those don't cost anything, but they help your downstream contributors tell their boss: hey, look, this actually generated some publicity for you on the wider internet, and you can point to it. So saying thank you consistently helps.
Another thing that you want to think about long-term: saying thanks only goes so far. If you have people who are contributing a lot outside of their day job, sooner or later they're going to be frustrated. They may run into burnout. They may run into other issues. So for long-term success, you want to figure out how to help these people. You want to find payment for these contributors. This can be done through sponsorships and donations that are funneled through to your contributors; I've seen that happen. It can be done through finding work contracts for them. It can be done through finding freelance contracts. That's a whole topic and talk of its own.
Also, speaking about money: sooner or later you will end up wearing multiple hats, and you want to make sure that you don't confuse them. We had a keynote over at Berlin Buzzwords by a person talking about an open source project. It was totally open source related, but the slides he used had his company logo on them. It wasn't well received, and the speaker didn't know why. I just pointed him to the master slide, and he was like: oh my gosh.
Ever since, he has worn an actual company hat: when he was talking about company issues, he would come on stage with his company hat on, and he would explicitly put it down when it wasn't about his employer. That helped the atmosphere.
Speaking about funding, we already talked about funding for paying people. What else do you need money for? You may need money to pay for infrastructure; there's hosted infrastructure versus self-hosting. You may need time to configure your infrastructure. You need machines to actually work on, like laptops. You probably also need machines to run continuous integration on. There are great services out there. But speaking from an Apache background, the goal there is to build open source projects that survive the next 50 years. Those services may not survive 50 years, so you do want backups for your infrastructure. And again, you want to get funding for your own time. Speaking of Apache, you can donate. Shameless plug.
Speaking of communication. First and foremost, you want a clear mission statement, so that people know what you want to do as opposed to what you do not want to do, and so that you can point them back to it. You want to avoid this kind of mesh, where the number of connections explodes as people join. You want a central place, like a central mailing list, a central point of contact.
Let's quickly go through the means of communication. Face to face, meeting in person, like we do right now: it's high bandwidth and it's great for resolving conflicts, but it's definitely not durable. It has to be repeated over and over. Let's go one step further and take video chat. It's still pretty high bandwidth: you see faces, you see interaction. But it's barely durable. Imagine having to re-watch every discussion video from the past when you join a project; that's not feasible. One step further: online group chat. Lower bandwidth, it's text only, but at least you can search and you can skim through the logs. It's still pretty unstructured, though.
One step further: web forums. You can still search. Now it's asynchronous, so you don't need to be in the same time zone, and it's lower bandwidth still. Go one step further: mailing lists. Pretty much the same as web forums: pretty durable, asynchronous. They need a decent client, and they have the disadvantage of being low bandwidth. For longer-term discussions and design discussions they are usually really nice. One more thing: you can have an issue tracker. It's much more structured than the mailing list, because suddenly your discussions are structured by component, but it's lower bandwidth. However, it allows very fine-grained discussions. What else do you need? You need some kind of high-level overview. That's something you can't do with the media we've seen just now; that's where you want a wiki page, a web page, stuff like that.
So one thing that I find very important for projects is to have one canonical source for decisions. At Apache, there's a saying: what didn't happen on the mailing list didn't happen. If you have face-to-face communication and you don't take it back to the mailing list, it didn't happen, because not everyone saw it.
Okay, what else? Mental health problems. There's something like the cookie-licking effect: if you say you want to do something, actually do it or explicitly give it back, otherwise it won't get done. There's great content out there on not taking on too much work. At some point you will end up growing older and running into physical issues. Read up on these issues, or it may be very painful. If you are successful, you will have to find a way to integrate people who want to drink from the fire hose as well as people contributing to your project in their free time. That's something that very successful open source projects at Apache are dealing with a lot. You will also have to figure out how to deal with poisonous people. Just two links: there's a nice talk by Ben Collins-Sussman and Brian Fitzpatrick.
There's another great talk by Kristian Köhntopp on YouTube, in German however, on how to deal with flame wars and communication breakdowns.
Last but not least, you will want to plan for your exit. I built up Berlin Buzzwords in the year 2010, mostly through my private inbox. It took me five years to get this event out of my inbox and handed over to the event producer. I still have people coming up to me personally asking questions about the event, even though I officially stepped down. I'm no longer on the program committee. I'm no longer involved in the project. There are still people coming up to me. It took me five years to hand all the contact points over. It took me five years to hand over the strategy for finding speakers, finding keynote speakers, evaluating submissions, et cetera. Plan for that from day one, to make your project sustainable and able to survive your own loss of interest. And with that, I'm open for questions.
Yes? When you talk about making it sustainable, would it help to prepare some kind of protocol for each of those tasks? The question is whether it would help to prepare a protocol for each of these tasks for the handover. Then you have it for yourself as well, because you don't have to think. That makes sense. ZeroMQ has the C4 protocol for contributions. Apache has something smaller, like the per-project bylaws, where the basics are written down. This definitely helps.
How would you decide which of the various forums would best spread your project? How would you decide which web forum to use for your presence? Which system would I use? How would I judge which system to use? Two decision criteria: what best suits my workflow, and what best suits my downstream users? Ask your users, talk to your users, and then figure out what works best for you.
I agree with you that sooner or later you have to consider this, because you cannot just leave it to chance; you know that better than I do.
Among the possible funding channels, have you ever considered a sponsorship model, from private companies or from government? Do you have some experience with that? For me: the Apache Software Foundation does accept sponsorship. It's not on a project basis; it's on a foundation basis. What gets sponsored is stuff like infrastructure, stuff like people doing press, and so on. I do know other projects where development work is sponsored as well. That's not the ASF. I can't say too much about it because I don't know exactly how it is set up; it's not my project. I believe these people were somehow funneling the money through a foundation. But it's also possible to get funding for development work. For Apache, it's at the foundation level, not project specific. There are other foundations who do it differently.
Visibility. That's amazing. For Apache, it's visibility. At Apache, you don't get project influence, because projects are independent, but you get visibility.
Something I've noticed in the past couple of years: different projects have been developing an interest in collecting data from users, say, user information. What are the best practices there? Presumably around who gets access and who wants it. At Apache, essentially everything is open. Everything is transparent. The only thing that's... If you come to Apache and talk on the mailing list, these emails are open. I believe there's only a very limited chance of getting emails that you sent to these mailing lists retracted. If there's any privacy...