Great, good morning. My name is Robert Watson, and I'm going to be talking to you this morning about how a large-scale open-source project works. Before I get started, I'll tell you a little bit about my background. Before I came to the open-source community, I actually lived in the shareware community in about the early 1990s. My first experience of dealing with somebody else's large-scale open-source project, or at least shareware project, was the WWIV BBS software, which was shareware that had the somewhat interesting and unique property that if you registered it, the author, Wayne Bell, sent you the source code in C. This was a very enlightening experience. It was a very large piece of software. But one of the things that happened with WWIV is that a community built up around the software, where people distributed patches, or "mods" (modifications), and these could be anything from cosmetic changes in the software to quite large-scale infrastructural changes. That community resembles in many ways the open-source communities that we have today.

A couple of years later, in about 1994, I discovered my first Unix system, and I realized that actually that was what I had been looking for. It had context switching, it had processes, it had all this cool stuff that didn't exist in MS-DOS. And it was open source, which is quite a neat place to be. So my background is that I started out using Unix systems in about 1994. I got involved in DARPA research and development in the US, operating system security research, and for that we used FreeBSD and Linux, and later on Mac OS X as Darwin became open source. Open source really transformed the way that we did research, and so I got more involved, in particular in the FreeBSD community.

So what I'm going to do today is talk to you a little about how you structure an open-source project. I'm going to use FreeBSD as my case study throughout, because it's the open-source project I'm most familiar with.
But I have been involved in other open-source projects. I was involved in the Coda file system project at Carnegie Mellon, and some others too. I think you'll find that some of the things that I talk about, the themes that I talk about, really recur across a large number of open-source projects, and especially large open-source projects.

I think it's important to begin by talking about what we mean by an open-source project, because I think there's some confusion on that front. There's a big difference between a piece of software under an open-source license and the community that produces that software. What I'm here to talk about today is the community that produces the software, and not so much the software itself. This is not an operating systems talk. I'm not going to talk to you, at least not in any detail, about kernels, much as I would love to do so. Instead I'm going to talk to you about the social structures that allow us to build a large-scale operating system over a very long period of time. So I'll tell you about the FreeBSD project up front, but mostly I'll tell you how it works. And I should of course preface this by saying I am a kernel developer. They tell me that there are these user applications that exist in user space; beyond the system calls, there's something on the other side of that barrier. But I don't believe it. So this is of course entirely from my perspective as a kernel developer.

So what is an open-source project? An open-source project is the social structure around a piece of open-source software. I think we know what open-source software means: it simply means software distributed under a license that is considered open, and there are various definitions of that. FreeBSD certainly falls under that. But the open-source project is a lot more.
It obviously includes the source code, but typically also the revision history of the source code, which in the case of the FreeBSD project is going on a 30-year software legacy at this point, which is quite a long time. But it also involves the people who write the software, and also the people who document the software, support the software, in many cases use the software, distribute the software, rebundle the software, and hopefully advocate for the software.

When the organizers of FOSDEM asked me to give this talk, they were particularly interested in the large-scale aspect, and I had to sit down and think for a minute: what does large scale mean in the context of an open-source project? Well, I think I understand small scale. Small scale is one guy sitting in the basement hacking on code until 2 a.m. every day, or maybe 4 a.m., more optimistically. But what is a large-scale project? I think what makes a project large scale is not the size of the source code, and FreeBSD is a large software project, but the size of the institution around it. It's the fact that it has a sustainable community model, that over time the project keeps going even though individuals come and go. The people who founded the FreeBSD project, some of them are still involved, but many of them aren't, and there are many, many new people involved. I think that's what makes a project scalable: it's the social model being scalable as you move forward.

So I'll tell you very briefly about what FreeBSD is, because it provides some of the context for understanding some of the points that I'm going to make. FreeBSD is a BSD Unix. The BSD project, the Berkeley Software Distribution, was a project at the University of California, Berkeley. It began in the late 1970s, and the FreeBSD project is an open-source project that spun off of that in the early 1990s. We do have source code in our tree which is 20 to 30 years old. We also have a vast quantity of source code.
Much of it is a lot more recent than that. But it's an interesting software legacy. I think we actually have a heritage of software, and there's an increasing number of projects that do, that are no longer on their first generation of developers. Of course, one of the other things that makes the FreeBSD project particularly interesting is that we do have some of the original BSD developers working on the project. We do have people who have been working on our software for 25 years continuously, and that's quite unusual for an open-source project. I'd like to think there will be many more of these projects over time, and that the open-source model will keep on working the way it has. But I think it really is a valuable resource to be able to turn to someone who has been working on the software for longer than some of the people on the project have been alive, to ask them why it is that things are done the way they're done.

FreeBSD is very widely used. It's a little bit less visible than some of the other members of the open-source community. People don't know as much about many of the places FreeBSD is used as they might with Linux, or maybe even something more widespread like Firefox. But it is widely used. You'll find it at many internet service providers around the world. You really, truly cannot use the internet for 30 seconds without hitting FreeBSD boxes, from the root name servers, to many of the ISPs that host websites, to large-scale hosting sites, but also routers on the internet: Juniper routers run JUNOS, which is based on FreeBSD. You also find things like anti-spam appliances from Cisco. Mac OS X contains significant parts of the FreeBSD kernel and user space. Many versions of VxWorks include a FreeBSD-based IP stack. The NetApp operating system runs FreeBSD: their new version, ONTAP GX, uses a FreeBSD kernel and FreeBSD user space, along with their own parts, obviously. So you find FreeBSD in a lot of places.

FreeBSD is also, thinking back to the
previous talk, fairly widely used in the media and entertainment industry. We don't run so many render farms, although The Matrix was rendered on FreeBSD running Linux RenderMan binaries. But we do run the storage clusters that many of these places use. For example, Technicolor uses a product from Isilon, a storage cluster, to manage their video editing and video processing, and that is based on FreeBSD. Avid, which is also widely used in the media industry for TV editing and video editing, uses a FreeBSD back end for its storage clusters. So you find us in the same sorts of places, although maybe in slightly different kinds of ways. And as I said, it's very difficult to get very far without running into FreeBSD, even if you don't see it.

What is FreeBSD? Well, it is a complete, integrated Unix system. We have all the good stuff, right? We have the multiprocessing, multithreaded, fully preemptive kernel. We run on a number of hardware platforms. We don't quite run on everything in the world, but those that we do run on, we run on quite well. We provide all the normal programming interfaces. Last I checked, when I did a grep last night, there seemed to be about 17,700 third-party packages. We're also the reference implementation for many network protocols, including SCTP, so we have involvement in the standards community. SCTP is the most recent of these to turn up in our tree. One of the reasons that sites pick FreeBSD, especially to build products on, is that we have a fully unified build system. You type in one command and it builds the whole system, using consistent make parts throughout the entire operating system. We also have extensive documentation. So these are, I think, attributes you won't find in many open-source projects. They have to try and build the infrastructure, assemble lots of pieces from lots of places.
So we produce a lot of software. We also consume a lot of software, and a little later in the talk I'll show you some of the relationships we have in the open-source community that allow us to build a whole operating system.

The FreeBSD project is of course distinct from the FreeBSD software. I told you before that I draw a distinction in my mind between open-source software and open-source projects. The projects are these social institutions that produce software in a sustainable way. The FreeBSD project is largely an online community. We have developers scattered around the world, and I'll point at some of them in a little bit. But we center our development on revision control. The FreeBSD CVS repository has ten years of history going back, and then there are previous CVS repositories and RCS repositories going back into the 1980s. We really live in revision control. This is not true of all open-source projects, but it is increasingly true that revision control is sort of the heart and soul of the project. We really do everything in revision control.
We have an online community made up of a great many mailing lists, and I'll talk a little bit more about that. We have 340 CVS committers, and these are people to whom we grant direct commit access to our source repository. But we also have thousands, literally thousands, of active contributors submitting patches on a daily basis. I'll talk a little bit about the people who are in our immediate CVS committing community, but also the larger development community. I think there's a scalability aspect there, both in how you involve people in revision control and in where you bring in changes from, and it's quite interesting.

One of the things that does distinguish FreeBSD from, say, Linux is the license. We use the Berkeley license, which is very similar to some of the other licenses: the MIT license, the Carnegie Mellon license, and so on. It basically says: please don't sue us; you might consider giving us credit; but really that's about it. There's no obligation to return your changes to us. What's important about the way we've structured the FreeBSD community is that people return changes even though there is no license obligation to do so, and that's part of what makes these communities successful. Even in communities where you have a license like the GPL, there's a big difference between people making a tarball available on the website of their company, or possibly not making a tarball available on the website of their company, and people bringing the changes back to the community, getting involved, and trying to merge the changes back in. It's the community aspect that's important. I think the license is almost ancillary to the point, which is that if you want to get the changes back into your software, you have to have the involvement of the consumers of the system. So I'll talk about how we convince people to do that, and some of the cases where we've succeeded.

I'll talk briefly about the legal and organizational infrastructure of the project. When the FreeBSD project was first created in
the early 1990s, on the whole, open-source projects didn't really exist, and those that did exist didn't really have non-profit foundations associated with them. Today, everybody has a non-profit foundation. But when the FreeBSD Foundation was created, it wasn't clear what the model was by which you associate a foundation with a project. Is the foundation the same as the project? Is everyone who is a member of the project a member of the board of the foundation, or a voting participant of the foundation? Are there legal agreements in place between the foundation and the project? We made an intentional decision to separate our foundation from our project. One of our concerns was that the foundation might become a target for litigation, that people might say: look, FreeBSD is a cool piece of software; hey, there's a foundation, maybe they have some money, we'll sue the foundation. If you make your foundation separate from your project, that actually means your project can persevere even if your foundation goes away.

It also wasn't clear whether there was really all that much use in having a foundation. We wanted an organization to provide some legal guarantees and structure and so on, but it wasn't clear in the long term whether people would give a foundation money, for example. It turns out they do give the foundation money. In fact, last year we raised more money than we've raised in any previous year in our history, which is quite nice. But the foundation exists, and it basically operates somewhat independently. There's a lot of communication between the foundation and the project, but we don't, unlike for example the Apache Software Foundation, require members of the project to be members of the foundation. So there's an interesting structural setup there. And of course I should make a plug: consider giving the FreeBSD Foundation some money. Even if you don't use our software, we're really very nice people.

What do we produce, again with that distinction between a project and a piece of
software? Obviously, we do produce software. We produce a lot of software, and a lot of people use our software, but software isn't very useful by itself. When we produce a tarball or an ISO image of FreeBSD, we don't just stop there. We continue to support it after we've released the software. We release security updates. We go through an extensive release engineering process: most releases take six to twelve months to actually go through the release engineering process, excluding development. That's the time to go through the testing process and get it deployed and evaluated through betas and alphas and release candidates and so on. We also produce a lot of documentation, and documentation is really key. We have extensive man pages and online web pages; I'll talk a little bit more about that. We also have a community of people who help support the software and debug the software. We have a mailing list you can email with any question you might have about FreeBSD, and probably the more ridiculous the question, the less useful the answer, but there are a lot of useful answers, and there's a community of people who literally just sit on that mailing list answering questions, which is quite important. And finally we have user events, and I'm pretty pleased to be able to include FOSDEM as an event that FreeBSD participates in. But we have a lot of conferences we're involved in, and I'll talk to you a little bit more about how that happens.

We also consume things. Open-source software, and even free software, does not happen for free. We have a particular interest in beverages. I think Philip organized an excellent event last night, and obviously we would appreciate more events along those lines. But we do like other things too.
We have a particularly tangible interest in donated hardware. We believe firmly that servers belong in racks. Preferably, servers should require multiple racks. We like it particularly when they come with hands, people to help manage the servers. I'll talk to you in a bit about our clusters of machines located around the planet, which are largely donated by organizations who use FreeBSD and also managed by them, and management is very important. Software people are perhaps not always so strong when it comes to managing pieces of hardware. We consume bandwidth in literally untold quantities. We have co-location with Yahoo, for example, in their data center in Silicon Valley, and we use a vast quantity of bandwidth there. We also have co-location with the ISC, the Internet Software Consortium, and elsewhere. The FreeBSD project set a number of records for bandwidth consumed by FTP servers in its early history, and we probably consume quite a significant amount still.

FreeBSD doesn't happen for free. While many of the people who are involved in the project are volunteers and take a personal interest in the project, we also have a lot of people who are employed full-time by FreeBSD consumers to work on FreeBSD, from companies like Juniper and NetApp to the Yahoos in various parts of the world. There are a lot of people who are paid to work on FreeBSD, and that is part of what makes it so successful.

And finally, we really like it when people tell us that they like us. We prefer not to receive flames. We do like it when people write about us in the media and say we're good people. This is not unique to the FreeBSD project. I think it's fairly obvious that in the open-source world people thrive on reports of success and of people using their software. There is nothing more satisfying than hearing from someone who uses your software and tells you how wonderful it is. And if they do tell you that there are problems,
it's even nicer when they send patches.

So let me tell you about some of our people and our processes, and again these are areas where you should be able to spot similarities and congruences with other open-source projects. I mentioned FreeBSD committers to you. Well, committers are, in the most literal sense, people who can commit to the FreeBSD software repository. But there's a lot more to it than that. We actually don't give commit rights to arbitrary people. We don't just hand out commit bits on the street. We actually don't even hand out commit bits to people who just submit patches. We expect people who work on FreeBSD to be technically extremely competent, but also involved in the community. The point is that it's a community-developed piece of software, so you have to be able to work effectively in that community and communicate effectively. So there are a large number of people who produce patches against FreeBSD who aren't committers, not because we don't like them, but perhaps because they haven't yet figured out how best to work in the community, or they're still building a name for themselves in the community. That community aspect is very important.

When we participated for the first time in Google's Summer of Code program a few years ago, we observed something quite interesting about the way we structure our project compared with the way some of the other open-source projects structure theirs, and that's that we have a mentorship program. We have a formal program by which we mentor new developers into the community. When a new committer is invited to join the project, they're typically invited by an existing developer, perhaps a couple of developers, and the commitment of that developer, their involvement in sponsoring a new developer in the project, goes on beyond the point where the new developer gets commit access. They become their mentor for a period of three months or six months, or in some cases several years. And the point of a mentor isn't to help the person understand the
technical aspects of FreeBSD, although code review is very important. It's actually to help them understand the social aspects, to help the new developer figure out what you should do and what you shouldn't do, what the expectations are, who you should talk to, what things are going to tread on people's toes and what aren't. This is very important to having a self-perpetuating community. You're passing on an understanding over time of the informal and social aspects of the project, and not just the technical aspects. It is not sufficient to turn up with good patches. There's a lot more to interacting with hundreds or thousands of people, largely via mailing lists, and obviously mailing lists aren't the best vehicle for communicating.

So the process for bringing in a new member of the community is that a potential mentor turns up with a potential committer. They go to the FreeBSD core team or one of its delegates, and I'll talk more about what that means, and they basically ask: can we bring this person on? They create a proposal, which describes the background of the person, their interests, and what their contributions have been, and it is evaluated by the core team or its delegates, who will usually vote in order to decide whether the member can join the community. And then the developer who sponsored them becomes their mentor for the foreseeable future.

So who are the committers that we have? I did a survey of committers in 2007 to see who was actually committing to the FreeBSD project. I found that we had developers in 34 countries on six continents. We have no full-time developers in Antarctica. We'd love to change that: if you live in Antarctica and you want to work on FreeBSD, do give us a call. They vary in age. The FreeBSD project is quite an old project; our mean age is 32 years. Quite a few open-source projects have younger mean ages,
I would guess. But often we find FreeBSD developers come to the project after they've finished their undergraduate degree, maybe after they've finished a master's degree, or after they have worked for several years in industry. I'll show you a graph that talks a little bit more about that. But we have developers who are everyone from professional programmers to university professors. We have students, we have consultants, we have a range of people, and these developers often work with FreeBSD across several jobs. They take their skill set with them when they go to new companies. They tend to work at companies that use FreeBSD; they tend to try to teach other people to use FreeBSD. So there's a personal commitment to the project in most cases. And I think many of those FreeBSD developers have found that is quite good for their careers as well. It turns out there's a very high demand for FreeBSD developers, in the Bay Area especially.

Here's a map to show you where some of the committers are. You'll notice that Europe is this big red blotch. There are a lot of FreeBSD developers in Europe. We think this is really great. We also obviously have a lot on the west coast of the US, on the east coast of the US, and in India. We have a big community in Japan. Unfortunately, all the dots appear right on top of each other, but we have a lot of participants there, and increasingly in China and in Australia. So we see FreeBSD in use in a lot of places. I'll show you this map again in a little bit with a few variations.

I told you we're sort of an old project. We have a median and mean sort of in the mid-30s.
There was one bad year there. I'm not sure what happened there. Think about that one. We have a nice blip over here on the left end. This blip is actually a result of the Google Summer of Code program, which helps us to bring undergraduates into the community, and in some cases people younger than undergraduates, in a way that otherwise we tend not to do. Usually people do come to FreeBSD a bit later, and obviously stick with it. I think at the tail end here you can see people who've been working on BSD for a rather long time.

Something I haven't told you is that we have different kinds of committers who work on different aspects of the project. We don't just have developers in a generic sense. Many of the people who work on FreeBSD work on our base operating system. They work on the kernel, they work on the libraries, the command-line tools, things like that. But we also have a very large ports community. These are people who adapt third-party software to run on FreeBSD, and as I'll tell you in a little bit, there are actually a lot more people who do that than these 163 people. These are the people who commit to ports directly; we also have all these developers who submit changes as well. And we have a substantial documentation project, and there's some overlap between these. In fact, one of our most prolific SMP developers started off life working on documenting the kernel, as opposed to using the kernel.
So people do move around inside those groups.

Let me talk about governance very briefly. Most open-source projects do have some form of governance structure. Usually, early in the life cycle of an open-source project, it's the person who founded the project, or maybe a couple of people who founded the project. But the FreeBSD project has been going on for a long time, and not all the founders are actually still working on FreeBSD. In some cases we've had people who worked on BSD at the University of California come back to the project and take on, in some cases, leadership roles, and in some cases not.

Over time the FreeBSD project moved from having a self-selected core team, in which occasionally the leadership of the project would say, "Ah, this person seems to be doing a lot, let's invite them onto the core team," to an elected model, in which developers elect from among their number a small number of people to be involved in the administration of the project. Historically, core team meant these are the people who produce most of the code. And it's true today that people who are on the core team frequently do produce a lot of code, because that's one of the ways in which they show their involvement in the project, and oftentimes developers will vote for people who produce a lot of code on the basis that this is one of the measures of competence: that they produce really good code. As for how good they are at running the project, perhaps less so. So we moved gradually to this elected model. We also have a core secretary. Philip is our core secretary, who is actually responsible for making the core team work.

When we came up with this model of an elected core team, we didn't actually assign it any responsibilities, because it wasn't clear what the responsibilities of a core team were. The core team does officially approve new developers, but other than that it has no official responsibilities. In practice it actually does a lot of stuff. These tend to fall into a couple of areas. It tends to be involved in the
daily administration of the project, the paperwork of the project if you will: approving groups and official bodies inside the project to have particular rights or authority inside the project. We take a strategic role in trying to direct the project, inasmuch as you can direct 340 volunteers and try to get them all to go in the same direction; that works with varying degrees of success. But we also coordinate the project's activities with other projects. We're also involved in the rules of the project. I think every time you have a social institution that's several hundred people large, you have to have rules. You can't just work informally all the time. We have rules about software maintainership. We have rules about review. We have rules that have to do with things like our release engineering process. One of our rules is that when we're in a release freeze, only the release engineering team can approve changes to the software repository. So the core team is involved in vetting some of those rules, and helping to determine who gets to make rules and who follows rules, and once in a while how you deal with people not following the rules. I do talk a little bit about conflict resolution later on. I think conflict resolution is a very important role for the FreeBSD core team to play. Any time you have a large social structure, with large numbers of people working together, there are inevitable conflicts, and those have to be resolved somehow.

Ports committers: a little more about them.
I told you that there were a lot of people who maintain software on FreeBSD. This is not part of the base operating system, but it's things that are very useful to have, like an X server (life is somewhat boring without an X server) or Apache. So we have a number of committers who are involved in maintaining a very large, essentially a database if you will, of third-party applications and how to adapt them to FreeBSD. I took a look at our CVS repository last night, and I found that we had 250,000 files in revision control, of which maybe 190,000 were in the ports tree. So this is a very large project in and of itself, maintaining all these third-party applications running on FreeBSD. Sometimes this involves large patches. Sometimes the patches go back to the original software maintainers. I'd like to think it was most of the time; I suspect it's probably not most of the time. But it is a very substantial project.

Each port has a maintainer. These people are not always committers. In fact, I'd guess only about one in ten is actually a committer. So many of these changes are funneled through a small number of people into revision control. That gives you a sense of the structure, in which many people develop parts of FreeBSD and submit changes back, and then they get funneled into revision control, in part through a vetting process, but in part simply because we have people who work with a large number of ports or a large number of system components.

People like org charts. I tried to draw an org chart for the FreeBSD project.
It didn't really work. But if you look at this picture, there is an important difference between an org chart for a company and an org chart for an open-source project, and especially a volunteer-driven open-source project. Not all open-source projects are volunteer-driven. In a classic org chart, the people at the top have all the power and all the money, and the people at the bottom are lucky if they get paid. The money flows down and the authority flows down, because the people at the top say, "Yes, you must go do this." In an open-source project, a volunteer project, this is not how it works at all. Authority flows up the stack. The people at the top get to make decisions, and their decisions get to stick, because the people at the bottom of the project agree to go along with it. Part of this is having an elected core team, where people delegate, as it were, administrative rights, but part of this is implicit in the structure of a project. If you don't run the project in a reasonable way, where people get listened to, then those people go away. So if you want to keep growing your community, you have to have a social structure that allows this to happen, and one of the results is this authority that flows up.

So some of these teams and hats here, like our security officer, our release engineer and the release engineering team, and so on, exist because there are activities that require more authority in the tree: for example, the ability to say, "OK, nobody can commit anymore," or, when somebody says nobody can commit anymore, that certain people can still commit, or to override, for example, maintainership. So our security officer and our release engineering team need these special rights, and the way they get them is that authority flows up the tree to the core team, and then flows down the tree through a series of chartered organizations. Each of these organizations, or many of them anyway, submitted a charter to the core team that was then approved by the core team in
order to structure their activities. We have a lot of such organizations; some of these are chartered and some aren't. I won't attempt to go through them all, but I'll pick out a couple of them: everything from people who run our website, to people who do testing, to our Summer of Code mentors, people who mentor students participating in the project for summer projects. We have a lot of third-party projects we interact with, who give us code; we give them back code; code moves in both directions. People who do things like release engineering. A lot of people work on the project. It's too large a project for everyone to sit on one mailing list and do all the work, so we have a lot of mailing lists. So this is a somewhat complicated picture, but this is the simplified version. I tried to think about how the FreeBSD project sits in this big open-source ecosystem. I mean, obviously we're interested in open-source projects beyond just FreeBSD. You can write a whole operating system yourself, but occasionally it's nice to get some pieces from other people, like compiler suites. So where does FreeBSD fit in? Well, a lot of the original FreeBSD code came from Berkeley, and so we have a big line there. We also generate code that a lot of other people use: for example, Mac OS X's Darwin operating system includes a lot of FreeBSD code, significant parts of the kernel, a lot of userspace libraries, and so on. We have other projects that are purely open-source spin-offs of FreeBSD, such as PC-BSD, who take FreeBSD releases and bundle them up with a neat packaging tool, provide nice user interface parts, and so on. There's a talk on PC-BSD later; I encourage you to attend. We also have people who take it and stick it into more appliance-like structures, like pfSense. And this isn't unique to open source; this is also closed source.
We rely a lot on the Free Software Foundation's compiler chain, for example. Code also moves back and forth between us and OpenSolaris, from whom we recently picked up ZFS. So FreeBSD 7.0, which is coming out shortly, has ZFS out of the box, which is very nice. And some code even moves back and forth with the Linux kernel. So we see a lot of code move around. We also have OpenBSD and NetBSD, which are also spin-offs of the original Berkeley code; a lot of code moves there as well. And obviously, yeah, lots of lines, I guess. So this is kind of the open-source operating system picture, and if you layer stuff on top of it (you know, FreeBSD isn't just a target in itself, it's also a platform for a lot of other work), then the picture will be even bigger: you fit X.Org into it. I said we do a lot of work on mailing lists. We have over a hundred active mailing lists, and this is where the bulk of the work of the project happens. We try to get together in person sometimes, but many, many things are done online. On the whole these are intentionally public mailing lists. They are places where anyone can participate who wants to, from the user community to people who submit patches; people who work on other operating systems turn up on our lists and discuss things with us, which I think is actually really great.
We do have a few private lists, for example our security officer mailing list. We are in essence a public organization, but there are times when you need to have private conversations. Undisclosed vulnerabilities are certainly one case, but another one is conflict resolution. It's really hard to get two people who vehemently disagree with each other to agree in public. First you have to get them to talk to each other a little bit in private, maybe find some common ground, some consensus; maybe, you know, somebody was wrong after all. Try doing that on a public mailing list where a hundred people will join the conversation: very bad idea. So private mailing lists do play an important role in the project. I said we had a web presence; well, we do, we have a lot of websites. I think we've been a little slow in picking up the web forum approach to communications; we're very much caught in this world of mailing lists. I think we're gradually beginning to have some impact there. Some of our spin-off projects, such as PC-BSD, do make effective use of web forums; I would claim the FreeBSD project itself doesn't yet do that. Of course, the FreeBSD project grew up with the web, and in fact FreeBSD is used to host a great many websites. We do try to get together: we participate in conferences. I should add FOSDEM to the list of conferences we participate in. The most recent new BSD conference (there seems to be a new one happening every year) is BSDCon Turkey.
We had the first one in October of last year in Istanbul. We also have developer summits, which are intended just for the developer community, and they often coincide with these conferences, since you have everyone in the same place at the same time anyway. Our big conference and developer summit every year is BSDCan, which occurs in May in Ottawa, Canada. But we also have them scattered around the world, and these can be anything from ten developers in a room to 60 developers in a room, sitting around talking to each other and hacking code and doing all those other things that developers do. It does involve occasionally going out to eat and things like that. This is Pawel, by the way, with his ZFS presentation last year; he observed that it uses a lot of memory. Here are the kinds of things we talk about at these developer summits. They are pretty intensely technical events, although they obviously have a social aspect. And I think it's interesting to look down this list, because you'll see that not only do we write a lot of code (for example, security frameworks, ports, new architectures, things like support for virtual access points in 802.11, multi-period networks, that kind of thing), but we also bring in code from other places, for example support for Xen, and DTrace coming in from OpenSolaris. We've been working very closely with Coverity for a number of years; they help us run static analysis. We are, I think, the only open-source project making use of the Extend tool, not just the Prevent tool. Extend is an extensible software analysis tool for static analysis.
You can create your own invariants and test them. So we had Coverity out at our developer summit last year in May, doing training for developers who wanted to learn how to use Extend. I'll talk briefly about the development model, and I think this will look familiar to people who work in open-source software. We have a development head, where the majority of the really disruptive work takes place, and every now and then we spin off branches, which are what we actually cut releases off of. So FreeBSD does major releases and minor releases. We do major releases maybe every 18 to 24 months. Occasionally it's a bit longer (our 5.0 release took a really long time), but the last few have been a lot quicker, in part because we've tried to move to a time-based release schedule. When we're ready to do a new major release, we spin off a branch, and then all the minor releases come off of that. But we also create branches for each individual release. These are where our security patches live, errata, and so on: things where we fix critical bugs after the software has been released and provide binary updates, all done in revision control. We have this notion of something called an MFC.
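As a rough illustration of the branching model just described, here is a minimal Python sketch. The `Branch` class and the branch names (`HEAD`, `stable/7`, `releng/7.0`, `releng/7.1`) are hypothetical, chosen only for illustration; they are not the project's actual tag names or tooling.

```python
from typing import List, Optional


class Branch:
    """A node in a toy branch tree (hypothetical, for illustration only)."""

    def __init__(self, name: str, parent: Optional["Branch"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Branch"] = []

    def spin_off(self, name: str) -> "Branch":
        """Create a new branch whose starting point is this branch."""
        child = Branch(name, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Return branch names from the development head down to this branch."""
        names: List[str] = []
        node: Optional["Branch"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


# The development head, where the disruptive work takes place.
head = Branch("HEAD")
# A branch spun off when a new major release is ready (illustrative name).
stable7 = head.spin_off("stable/7")
# One branch per individual release; security and errata fixes live here.
releng70 = stable7.spin_off("releng/7.0")
releng71 = stable7.spin_off("releng/7.1")

print(" -> ".join(releng71.lineage()))
```

In this sketch every release branch hangs off the major-release branch, which in turn hangs off the head, matching the three levels described in the talk.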
There are two ways that major new features end up in a FreeBSD release. They can start out in the head and then get spun off at some point into a particular branch for the first, if you will, dot-zero release, or there can be this merge from current (the MFC), where a change is brought back from the head of the tree into a particular branch when it's considered to be mature. This means that really major features, which you can't just merge because they're complicated and have interactions, are guaranteed eventually to hit a major release, a dot-zero release. But minor features can be moved back, and are: for example, lots of driver changes, new file systems, things like that, things where you're not disrupting the infrastructure of the software but plugging neatly into compartments and spots where you can plug things in. New storage transforms are easy to merge, whereas fundamental changes in the network stack and memory allocators are very difficult to merge. FreeBSD 7.0 is coming out really soon, ideally maybe next week or the week after; I hear they're building releases now. We have all kinds of cool stuff. We spent a lot of time working on multiprocessor scalability. It's quite exciting news: the new malloc that we have, a highly scalable malloc, has recently been picked up by the Firefox project. They plan to use our malloc on other operating systems than just FreeBSD, because it works particularly well with respect to memory fragmentation. I was very excited to hear about this. But we have a lot of other things going on: fine-grained locking in the kernel, new schedulers, threading libraries, file systems. We picked up the ZFS file system; I think this is a good example of us being able to pull in software from other operating systems and make effective use of it. We also see code move in the other direction: I believe OpenSolaris now uses our 802.11 stack. There's fun stuff going on in networking.
I mentioned the SCTP reference implementation, but we've also been doing a lot of work on 10-gigabit optimization. Kris Kennaway has a talk he gives on FreeBSD 7.0 in the BSD and PostgreSQL track, and I would encourage you to take a look at that; it sounds pretty exciting. I'll talk a little bit more about revision control. I said revision control is kind of the heart and mind of the project. This is where everything real goes. All authoritative project activity is in CVS: our software, our web pages, it's all managed in revision control. We ran into a problem, though, which is that CVS, an excellent piece of software, doesn't quite scale the way our software development practices do. We actually have four CVS repositories now and, as I mentioned earlier, 250,000 files across all of them. We do run into the limitations of CVS on a regular basis; I'll talk to you more about that in a second. What's quite neat is having a 10-year revision history right there, so you can do cvs log and cvs diff and go back and see why things happened the way they did. And now that a lot of lawsuits and other things have been resolved, we're quite pleased that the original CSRG history of BSD development is now also available, so you can actually go back and look at the original development of some of these pieces, authored in the 1980s, which is quite exciting. We also use Perforce. I said authoritative work happens in CVS; well, CVS doesn't scale in a number of ways. In particular, it's really hard to branch a source tree that has 100,000 files in it, because in CVS that is a linear-time operation in the number of files: you have to go through and touch every file in the CVS repository. So we try really hard not to branch and not to tag; we do it for releases and basically for nothing else. Perforce gives us a sandbox with very lightweight branching, where developers can log in and create as many branches as they want to, within reason, over time, and then use it as a place where they can
collaborate. It also gives us a place where we can have guest accounts, because it has very fine-grained access control. So we've used Perforce for a lot of project development on the side, everything from our SMP support to new security features and support for superpages. All of our Summer of Code students are set up with sandboxes inside our Perforce repository, so that they can branch from the main FreeBSD CVS tree into Perforce, do their work there, and expose it more to the community. Recent work includes ZFS, support for AFS, and things like that. So it's where a lot of the active development happens. This has led to an interesting change in the way that we use CVS. In CVS, because branching was very expensive, to the point of not really being possible, we found that our development head was very unstable over time. It meant that you brought in one unstable feature and waited for it to stabilize a bit, but then another unstable feature came in, and if you wait too long between releases and don't take a break every now and then, you reach a point of instability where the code you have is actually not very usable, which puts all the people who are using the current branch at risk of not being able to make progress on development. What Perforce lets us do, by having heavily branched development (and I trust I'm preaching to the choir in this sense), is isolate high-risk development and merge it in a sensible way back into the main tree. As I understand it, Perforce was first used with FreeBSD as part of the CAM SCSI project, where the work was done independently and then brought in to replace the existing SCSI layer, and we now use this, I think, for pretty much everything. So here's a simplified picture. The FreeBSD current branch runs along the center, and what we see is code flying out into Perforce branches and then flying back. And sometimes there are Perforce branches off Perforce branches. For example, we have a TrustedBSD project which has many
other projects hung off of it. In fact, we have an SEBSD project, which has a port of FLASK and Type Enforcement from SELinux to run on FreeBSD; it is branched off of our MAC work, which is off of our TrustedBSD work, off of our current branch. So there's a lot of depth to this development model. I think in some ways (I mean, Perforce is an excellent piece of software, it really works very well) the use of Perforce is really a symptom of a problem we have with CVS, and that's that CVS is an excellent revision control system written a long time ago. It doesn't have all sorts of things that we now expect from a modern revision control system, starting with changesets, but also things like lightweight branching, three-way merging, and history-aware merging. When you're maintaining a long work in progress, something that takes years to develop, CVS is not the tool for you. Nor is it if you're tracking someone else's code, because it provides you with no automated tools to merge the changes made by a third party every few years.
We kind of look at what the options look like. So far we've not found something that would work for us. There are a number of features that are missing from revision control systems, and perhaps people in the audience who work on revision control systems would love to take a look at this; we would love to be able to move back to one revision control system that does everything we need. But we particularly run into problems with scaling and the need for obliteration. Obliteration is an interesting point. Sometimes I have conversations with revision control people who say the point of a revision control system is that it never loses anything you've ever done. And there are times when you want to lose things that you've done. Those times include when lawyers come to you with letters, and the letters say, "You're infringing our trademark; you must cease all distribution of the following thing," for example Boggle, which appeared in the BSD source tree coming out of Berkeley and was later removed from FreeBSD. If it's in your revision control system, you are still distributing it; you are still giving it to other people, because they could check out the previous version. Therefore you are infringing, so you must be able to remove something and all of its history from the revision control system. This is a practical requirement that must be met by a revision control system. CVS doesn't do it, but we can make it do it. Perforce explicitly supports it. But many of the next-generation revision control systems don't support obliteration, by design: they include cryptographic hashes that link previous revisions to the next revision. Right now most of those tools only include a way to go back and put a little annotation with a signature that says something like, "There were some changes here, but they're gone now." That's okay, and that's really what we need. It has to be best effort.
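To see why hash-linked histories resist obliteration, consider a minimal Python sketch. It assumes a toy scheme in which each revision's identifier is a SHA-256 hash over its parent's identifier and its own content; the function names and the scheme itself are illustrative assumptions, not the format of any particular revision control tool.

```python
import hashlib
from typing import List, Optional


def revision_id(parent_id: Optional[str], content: str) -> str:
    """Identify a revision by hashing its parent's id together with its content."""
    h = hashlib.sha256()
    h.update((parent_id or "").encode("utf-8"))
    h.update(b"\0")  # separator between parent id and content
    h.update(content.encode("utf-8"))
    return h.hexdigest()


def build_history(contents: List[str]) -> List[str]:
    """Return the chain of revision ids for a linear history of changes."""
    ids: List[str] = []
    parent: Optional[str] = None
    for content in contents:
        parent = revision_id(parent, content)
        ids.append(parent)
    return ids


# A three-revision history, and the same history with the middle one removed.
original = build_history(["add ls", "add boggle", "fix ls bug"])
obliterated = build_history(["add ls", "fix ls bug"])

# The first revision is untouched, so its id is the same in both histories.
print(original[0] == obliterated[0])
# The later revision has a different parent, so its id no longer matches.
print(original[2] == obliterated[1])
```

Under this assumed scheme, removing one revision changes the identifier of every revision that follows it, which is why such a system cannot quietly drop a piece of history without rewriting everything built on top of it.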
We don't have to do anything perfect, but we need something that does kind of what the lawyers ask for when they ask for it. And there are other reasons than lawyers, but lawyers are a particularly motivating reason if you run a large open-source project and are on the board of its foundation. Moving on from revision control: I told you we have clusters scattered around the world. I guess we really have five main sites at this point, although obviously there's lots of other stuff going on. Most of these places are places that consume FreeBSD: for example, ISC uses FreeBSD on root name servers and other sorts of name servers, and Yahoo makes extensive use of FreeBSD. Yahoo is where we host most of the project infrastructure: our CVS repositories, we have a NetApp filer, we have all sorts of stuff going on there. We have a netperf cluster and security development cluster; this is where we prepare security advisories, do binary update builds, things like that. We also do 10-gigabit network optimization. We have an FTP server in Denmark, and then in Japan we have hosting of large-scale SMP systems. One of the things we're in the process of doing is exploring providing failover from the FreeBSD.org cluster at Yahoo to our Sentex cluster. I don't mean failover of all active services, but we want an off-site backup at all times, because things happen, like earthquakes, and we'd rather not be susceptible to that. As a large organization, we have to think about things like disaster recovery, the sorts of things that companies often have to deal with because they have very large infrastructures. Our developers are neatly scattered around the world, but a lot of our electronic resources are in a very small number of places, and we need to try and manage that risk. The last thing I'm going to talk to you about is conflict resolution.
I mentioned this is a hard issue. In a large open-source community, inevitably there are people who disagree with each other, and one of the neat things about open-source developers is that they're very independent-minded. They like to go off and do things the right way, and sometimes they disagree on what the right way is. Sometimes these are technical disagreements, and technical disagreements are actually fairly easy to resolve. They're the kind of disagreements where you're not quite sure: is this the right way to do it, or is that the right way to do it? And somebody can turn up and mediate. They can say, "Well, they both have their merits, but maybe this one is more in our interest in the long term," or, "Can we take elements of both of these and combine them?" Those are the easy ones. The hard ones are personality conflicts or communication conflicts. I said we had people in, I think, 34 countries around the world. Not all of them speak the same language; this is shocking news. And obviously communication issues are one of the major issues. It is very easy, when you're a non-native speaker of a language, to say things that come across as brusque. Sometimes you're saying them brusquely in your own language, but sometimes they just come across as brusque, and those are actually really hard to deal with if you have two non-native speakers of a language communicating in that non-native language in order to try and reach an agreement. Sometimes these personality disagreements can't be resolved and people go their respective ways. Sometimes people leave the project because they can't get along with the people who were there. Maybe this is a good thing, because the people who are left apparently can get along, and getting along is really important when you have a large project. But often as not these things can be resolved, though it requires the personal intervention of someone in the project. It requires someone on the core team to sit down with the two participants in the disagreement, talk to each
of them independently, and then try to get them talking to each other and figure out what's going wrong, because often as not they're disagreements that can be gotten over: a fundamental misunderstanding early on led to problems. They can't always be resolved, but mediation is an important thing that the FreeBSD Project's core team gets involved in. Okay, so what I've done is try to give you an introduction to the structure of the FreeBSD project, but in a way that might also explain aspects of other open-source projects, and there are going to be similarities and differences. This was not a software talk, but obviously we're interested in software. I think FreeBSD is one of the most successful open-source projects out there. I don't think I'd make any claim that it's the most successful open-source project, and I'm not sure how you would measure success in a really concrete sense, other than to say that having a large deployed user base, and having an active and sustainable community that is growing over time, that has new people joining it, and that has a model for growing over time, is an important way to define success in open source. It's not just about having good code. I think we do have good code.
Maybe, you know, that's a personal perspective, but people wouldn't come to the project if they didn't like the code. But it is not just about code; there is a lot more to it. So I guess I'd invite you to take a look at FreeBSD if you haven't, and consider participating in our project. But I guess it will also be interesting to hear from people in the audience who have open-source projects they run or are involved in whose models differ from FreeBSD's, and why that is. I think the FreeBSD model works very well, and we've seen some other projects pick up that model, especially things like mentorship, which I think is a really good thing. I think programs like the Google Summer of Code program have forced projects to adopt a mentorship approach, and I think that's part of what you need in order to have a sustainable open-source project. Guys, thank you very much. I'm happy to take questions for a few minutes, and I think we have a microphone if anyone wants to ask questions. Hi. What were your measures to assure the independence of the project? I'm sorry, I have a little trouble hearing you. So, what were the measures that FreeBSD core developers and the foundation have taken to assure the independence of the project? Yeah, that's an interesting question, because a lot of our development is sponsored by corporate sponsors. There is a constant question of how you maintain a difference between the open-source project and the people who do the work on the project who are doing it for corporate purposes. And we've had, I think it'll be accurate to say, an active debate on how you deal with corporate-sponsored work especially. Do people have a right to bring things in because they were paid to do it, but may not maintain it in the long term?
And I think there's a constant tension there. Having a foundation is an important way to maintain independence, because it gives us our own legal entity. Likewise, I showed you the map of clusters around the world: making sure we have redundancy over time, that we're not all in one place, is a very important aspect of making sure the project is preserved. We also take other legal steps, making sure we review new licenses as they come into the tree, and so on. I think that's a real part of the picture. But if you look at the FreeBSD developer community, often as not people are much more loyal to the project than they are to their current employer, which from an open-source perspective I think is a very good thing. Okay, thank you. Any other questions? Sorry, I'm having a little trouble hearing; quiet, please. What steps has the FreeBSD project taken to try and ensure whole-community cohesion? Because after a project gets above a certain size it's difficult to have everybody on the one mailing list, and without communication with everybody it's hard to have a single community. Do you just find that it ends up in lots of small communities which collaborate sporadically, or have you been able to try and hold together that whole-project sense, and if so, how have you done it?
I think that's actually a really fascinating question and a really fascinating problem, and I think we've struggled a little bit to try and figure out how to address it. In particular, I think it's important to get people together outside of the mailing lists in order to build that sense of community. That's part of why conferences are so important, and a lot of the FreeBSD Foundation budget goes towards paying to fly FreeBSD developers to workshops and conferences so they can meet in the same place. We do have a single mailing list that has all FreeBSD developers on it, but obviously when you have an audience of 400 people you can't talk about everything on it. So we try to discourage people from talking about things that are potentially public on a private mailing list. There are a few mailing lists which do have a lot of subscribers: the FreeBSD current mailing list, I think, gets most source developers on it, and then, for example, the ports mailing list gets most ports developers. I think I'd say we do struggle a little bit to try and solve that problem. We do have communities that become independent, and sometimes that results in conflicts, but the role of the core team is to try and spot those in advance. One of the techniques we use is that, once we have an elected core team, we try to make sure that core team members are involved in each one of the independent projects inasmuch as possible. The release engineering team, for example: I'm on the release engineering team and I'm on the core team, and we'd like to think that that makes communications a little bit better between those compartments. Can everyone sit quietly for another five minutes? Because it's very difficult for questions and answers to get across. So if you're moving, tiptoe really quietly. Five minutes, roughly-ish. Philip can be very threatening when he tries. That's why I'm working secretary. Yeah, yes. I'm sorry, can you hear me now? Yes, I can, thank you. Oh, now I lost you again.
I can only hear you when you say, "Can you hear me now?" Hello. Oh, that's great. I'm working for a company named g.ho.st. It's a virtual computer on the web that you can access; this is actually our URL also. Currently our source is all closed, and the reason I'm here is to learn how to make it open. We'd like to take several practical steps to open the entire source, and we will start with something. You have extensive experience, so I would like to take your advice: what would you advise as a first practical step to take? We, the foundation, work with a lot of companies who use FreeBSD, and I listed some of the companies that do use FreeBSD on the first slide. Not all those companies have an easy relationship with FreeBSD. We found that a lot of people who built products on FreeBSD, maybe in the FreeBSD 3 or 4.0 time range, had a lot of trouble getting forward to our FreeBSD 6 development branch. There was a period of moderate instability about when the dot-com crash happened. A lot of FreeBSD developers who had been working full-time on FreeBSD had to go work on other stuff; they lost their jobs. Sometimes that helped FreeBSD, but more often than not it hurt FreeBSD. As a result, our FreeBSD 5 development process really stretched out for a long time. There was a long period of instability before we got back to the point where we were saying, "OK, now the FreeBSD 5 branch is ready to use." So we had a lot of users who were stuck on FreeBSD 4, and in the last year or two many of them have been getting forward to FreeBSD 6 for their products. And they found that was very difficult, because of course, you know, five years had passed. It's amazing how much code in an operating system can change in five years. The architectures we develop for have changed fundamentally in five years. Multiprocessing is now an embedded feature, right, as opposed to a feature for supercomputers. And so suddenly all these embedded companies need to run on many-core ARM
processors or MIPS processors or whatever. The biggest advice we gave to those companies was: get involved in the community before you try to give back your changes, because having the social link to the community, having a reputation in the community, is what helps you get the changes back in. You have to enter the community with credibility. The other piece of advice is that there is a long-term payoff for bringing your changes back, even though it takes a long time and it's expensive. It's not free to open-source a piece of software. Companies that open-source their software have to consider intellectual property aspects. They have to consider the developer time spent to isolate the changes they do want to open-source from the ones that they don't. They have to figure out how to work with a community. That is difficult if you are open-sourcing a piece of software that's never been open source before and isn't tied to an existing community. I think that's really challenging to do, because you have to create not just the open-source software but the open-source community. So your best bet may be to find other communities of similar interest where you can find a spot in that community, and that will help get people involved and interested in your software. If you're contributing back changes to an existing piece of open-source software (and this applies to FreeBSD or anything else, under any license), community participation is critical, because otherwise you end up creating a tarball, putting it on your website, and saying it's open source. That is not the same thing if you want it to be sustainable. Dumping a tarball on the web page means nobody will use the software, whereas helping to get your changes integrated back into whatever is upstream, whatever that may mean, is a good way to get people involved. Some of the companies that work with FreeBSD (I think Yahoo is a particularly good example of this) make it part of their technical development model to merge changes back to FreeBSD to
avoid divergence. Other companies, especially the ones jumping forward to 6, have recently become aware that it's very important to do this, because otherwise you end up four or five years behind, with heavy customizations that really weren't part of your core product. They were just things you had to do anyway, and if you got those back into the base OS then your maintenance workload would get much lighter. So I think there is a long-term benefit there for companies to do that, but often they have to make a short-term investment that's quite large, which means paying your developers to sit on mailing lists and answer other people's questions about other people's software, which is counterintuitive from a short-term business perspective. That's hard to justify up the line. Anyway, I hope that helps. Can you hear me? Yeah. Can you provide some advice to open-source projects on contingency planning, say, if Microsoft were to make a hostile bid for a large number of your developers and apparently some hosted servers? Um, I'm for that in advance. So I guess if you work for a company that does open-source stuff, and you discover that you may be about to be acquired by a large company that doesn't do open-source stuff, the first thing I'd say is take a deep breath, because sometimes open source makes money, right? Sometimes open source is the right way to do something, and the company that is acquiring you may not have a problem with your doing open source. Obviously, it's very important to have the paperwork in place as your company is about to be acquired. Make sure you've actually already open-sourced all the things that you plan to; all the t's are crossed, all the i's are dotted. Make sure copyright assignments are well in hand. One of the things I encourage companies that work with FreeBSD to do is to be very, very clear on what they're doing with copyright and what they're doing with licensing. Make sure their lawyers know what's going on.
Make sure that there are signed pieces of paper in the hands of their employees saying, this is what I want you to do. It can be an email, that's even fine, but it has to be explicit communication from somebody who is in charge to somebody who is doing the work, saying, yes, I want you to do this. Don't assume that just because the management stack is friendly today that they will be tomorrow. Make sure all the paperwork is ready. But at the same time, I wouldn't worry too much. I think there is a business argument to be made for being involved in open source, and at the end of the day it comes down to a business decision. If you can make an argument that open source makes your business run better, then you may well find that you thrive at a company that has never done open source before. And I think that the open source revolution in the last five or ten years has come about because companies who didn't realize they were using open source discovered that it works really well, and that it works better if they participate in those communities. So I wouldn't give up hope.
Can you hear me?
What if Microsoft decides to take the servers offline that your project sort of depends on, it seems?
Well, there's only so much you can do if somebody else holds the power cable and pulls it out. If you're an open source project that depends on the resources of that company, then I would advise the open source project to have contingency plans. And I think any scalable open source project should consider contingency plans in just the same way any scalable company does. When you're a small company with two or three people, you know, and there's an earthquake, your company goes out of business, and that's life, right? You're a couple of people in an office, and if the office disappears, then your business is in trouble. Nothing you can do is going to change that. But if you are a large-scale open source project with hundreds of developers around the world, and you are seriously at risk that your single host is going to go away, then the obvious strategy is to find more hosts and to make sure you have plans to deal with it. And I think, you know, the FreeBSD project is not alone in making contingency plans, and I hope that other open source projects who maybe don't have them today take it into account. One of the things that we are in the process of formally doing, and it's something we've been working on for a while, is coming up with formal contingency plans. Not just, okay, we dump the servers in the back of the truck and we drive over to the other place, but we have backup servers, we have sites that have guaranteed that they'll make the bandwidth available. We know that these are reliable places.
We have data retained off-site. One of the things about our development model: I mentioned that we use two revision control systems, but we actually use two and a half. We also use something called CVSup, which mirrors CVS repositories. We use it to get scalability, so that if you have a hundred developers hitting CVS at once, or a thousand, read access doesn't crowd out write access. We replicate our source code repository. There must be thousands or millions of copies of it around the world, and there's nothing quite like having a thousand copies of your CVS repository scattered around the world for some data reliability. I suspect we're running out of time. Do we have time for any more questions? Are we done? We're all done. Great. Well, thank you very much.