Thank you all for sticking around through to the end. I guess I'm in the unenviable position of being the only thing between you and your post-session beer, or going home, or whatever it is, so I'll try to be merciful. Anyway, those of you who have seen me talk before know that I tend to talk an awful lot about how well the kernel development process is working, how well things are going; and in fact it is going really well. We put out four to five releases every year, pretty much every 80 days, just like clockwork. We get something like 10,000 changes into every single one of those releases, and we've got over 1,000 developers participating in every single one of those release cycles. This is a big process that's working quite well. The results are showing up in everything from your toaster through to your supercomputer and everything in between. The whole thing works, and I've spent a lot of time talking about that, but I'm not going to do that today. I'm tired of that; it's boring. Instead, today we get to talk about when things don't work quite as well. Which is kind of fun. I have a few reasons for wanting to do that. One, which I forgot to make a slide for, is the simple fact that development failures cost us developers that we can't afford to lose. Even with thousands of developers participating in our process, we're not so rich that we can afford to lose talented people. We never can be. Beyond that, every now and then something falls apart and you see unfavorable articles, headlines that pop up on reputable news sources, and so on. It makes us look bad, and I'd rather not have that. But the real reason for wanting to look at failures is that you learn from failure. That's how you really learn how things work.
Another quote from a very influential figure in our particular community: yes, it's fine to celebrate success, but what really matters is to heed the lessons of failure. I'm not sure that he has done this; perhaps that's why he's out of the operating system business now. Anyway, more to the point is this quote here. It comes from a book called The Sciences of the Artificial by Herbert Simon, who is a Nobel Prize winner in economics and all that. Here he was looking into the study of how the brain works, and he was looking at how things fail. His way of putting it was that a bridge, when it works, is just a piece of road that you can drive over; you only learn about how it's built when it's been overloaded and something goes wrong. So if we want to learn more about how kernels are built and how the development process works, then we should look at how things go wrong. That's what I intend to do here. But I want to put out a quick note first, because there are a lot of people in the kernel development community, some of whom are rather more serious than others, and I'm not going to be talking about the people who are just sort of there to be strange and so on. That's not what I'm after here. I'm going to have to name a bunch of names, because I don't know how to talk about specific failures without doing that. But everybody whose name I put up here is somebody that I respect, somebody who is a welcome part of our community. I'm not putting anybody up to mock them at all, because they're all people who I think are better than me at this stuff. Even this guy, who's perhaps one of the more clownish of the bunch, who points out that we're not failing, we're just finding ways not to solve the problem. So with that in mind, let's hit the road and look at ways not to solve problems. Example number one is a file system called Tux3. It came out back in 2008.
Daniel Phillips has been around our community for a long time; he's a very smart guy who has done a lot of stuff. He came out in July of 2008 and said, I'm going to make a new file system. There was a lot of interest in new file systems then, as there is now, trying to take us forward in the area of next-generation file systems. So: I've got all these great ideas. He puts out his announcement, puts out his code. There were a lot of very interesting discussions that went on between him and other file system developers. By November of that year, he had it to the point where he could actually boot a Linux system with a root file system based on his Tux3 file system. He had other people contributing code. It really looked like the project was getting going. But if you look forward into the next year, things trickle off, and by August of 2009 the last commit goes in, some sort of a whitespace fix, something like that. And the whole thing just sort of died. The project is dead, the code's not in the kernel, nobody is using this file system, and all the work that went into it is for naught. Somebody saw this coming. This is Andrew Morton, who I think of as the number-two kernel developer, if you like. He gave Daniel a warning saying: don't keep adding stuff to a project that's out of the mainline kernel tree, because that just makes it harder to get it in when the time comes. That warning went out and was not heard. The project kept on developing outside of the mainline tree until Daniel lost interest. He got a job at a certain search engine company that I won't name, which is often the kiss of death for community contribution, unfortunately. He went away and the project died. He even acknowledged later on: yes, I should have listened; I should have gotten my code into the kernel when I was warned to do that. So the lesson from this is something that a lot of us have been saying for quite a while now.
If you have code that is outside of the mainline project, it is essentially invisible. It doesn't have the attention, it doesn't have the momentum, it just doesn't have the activity around it that code does when it gets into the mainline kernel tree. If you've ever watched a bike race, you notice how the riders all ride together in a tight group. If you've ever ridden in a group like that, you'll understand why: the group carries the air forward with it, and you can ride at great speed with almost no effort within a group like that. As soon as you go outside of it, you're pushing against the wind by yourself, and you have to work much harder to ride much more slowly than you would within the group. The kernel process is really an awful lot like this. Code that's in the kernel gets carried along by the momentum of the kernel itself; it gets carried along by all the people who are focused on it. If you're outside of it, then you're against the wind by yourself. The lesson is really clear: get your code into the mainline as quickly as you can. If you look at the development of the Btrfs file system, which was being developed at about the same time, Chris Mason put that file system into the kernel even though it was nowhere near ready. It's still not considered to be ready. The pace of development picked up at that point, and Btrfs is still a very strong project, and will be the next-generation file system that we'll all be running in the near future, because he did that. If he had kept it outside, it would have been harder. I'll take that as agreement. Moving on. The em28xx driver is a Video4Linux driver, a webcam driver, that was put into the mainline kernel back in 2005 by a guy named Markus Rechberger. Over the course of the next couple of years a whole lot of things happened, and by the beginning of 2008 he was no longer contributing to that driver. Later that year he actually tried to replace it outright, and that effort was rejected.
In 2009 we saw the last patch from him anywhere in the kernel; we lost him from our developer community at that point. What went on was a whole lot of disagreement between Markus and the higher-level Video4Linux maintainer, and it was really best summarized by a message he sent out in the middle of this, saying: companies should be aware that if they submit code to you, they lose control over their work. It was an issue of control: whether Markus had absolute control over what went into that driver, or whether others could contribute to it and carry it forward and enhance it in ways that they saw fit. Here's another example. Back in May of 2004, Hans Reiser fought an attempt to modify the Reiser 3 file system. We're talking about Reiser 3 here. Chris Mason came along with code to add support for access control lists and extended attributes to the Reiser 3 file system. This is a feature that you need to support things like the SELinux security framework, that sort of stuff. Hans said: no, you cannot add that to my file system; it's supposed to be stable, and I want people working on Reiser 4 instead. We'll talk about Reiser 4 later. Hans lost that battle. The code went into Reiser 3; the enhancements were there, they're still there, and they're being used. He was overridden on that. Linus described it this way; he's had to teach this lesson a whole lot of times. If you maintain code, if you contributed it, you don't own it. If somebody else comes along with something that needs to be done to it, then you do not have the ability to control it. This is true of the kernel, but it's really true of any free software project that merits the name. When you've contributed code, you've put it under a free license; you've put it out there for the community to work on; you have given up a certain degree of control. If things are going well at all, others will come along and work on it. They will improve it; they will make it better.
Once you put it into the kernel, you've turned it loose; you have to let it fly. In my mind, this is not a downside of contributing code. This is one of the most beautiful things there is: you can put some code out there and watch it get better, and you don't have to do it yourself. I think that's great. But if you want to maintain control over that code, then you should really just hang on to it, because that desire is just not compatible with how free software works. Back in 2002, at the beginning of the 2.5 development kernel series, the position of maintainer for the IDE disk subsystem was vacant. We had no maintainer, but it was a pretty critical piece of code, because we actually still had IDE disks in those days. There was no maintainer because that code was widely held to have driven insane everybody who had tried to take control of it over the years, and so people kind of wandered off after a while. So Marcin Dalecki shows up and posts a patch saying, here's a bunch of cleanups for the IDE subsystem, and it goes in. Within a few weeks he's up to the 18th set of cleanup patches; these are fairly significant restructuring patches at this point. He set himself up as the maintainer of the IDE subsystem and continued to send in more and more patches, which were being merged by Linus, until by August of that year he was up to number 115. That is a whole lot of patches all going in there. During this time he was invited to the kernel summit to represent that work, and so on. Things really seemed to be on a roll. One week after number 115, he quit the kernel development process entirely, and all that code was ripped out. The IDE code was put back to where it was at the beginning of the 2.5 development series. All that work, his work and the work that everybody else had put into helping make it work, was lost. It was a major loss of a developer and his time and his work.
Anybody who was actually running 2.5 development kernels in those days knows what happened here, right? The IDE subsystem was highly unreliable during this time. In fact, it was considered that if you wanted to run these kernels, you were really best off using SCSI disks. He described it this way when people questioned him on it: well, breakage is the price you have to pay for advancements. There was perhaps more breakage than advancement. He was really trying to carry it forward, but he did it with a sort of scorched-earth policy that made the kernel unusable for people and took things backwards. And the lesson is clear, right? Don't do that. This lesson has become much clearer in the time since then. Code that breaks a subsystem for months at a time would just not be tolerated now, because we have adopted a policy that is very strongly against regressions. When you are evolving a piece of code as quickly as the kernel is changing, you really have to be careful to ensure that you are not going backwards in terms of quality. And knowing that you're not going backwards is actually very hard. How do you measure the quality of a kernel? It's not just a number you can pick out; it's not a metric you can have. But one thing you can do is insist that a kernel that works for people at one point continues to work going forward. If you don't allow things to go backwards, then you should be creating kernels that are getting better over time. If you break things now, your code is likely to come out within a week; it won't go on for six months like it did here. But one way or the other: don't break things, and life will be better. This next one was perhaps one of the highest-profile failures that we saw, with a lot of media attention and so on. The scheduler early in the 2.6 series, up into the early 2.6.20s, was the O(1) scheduler, done by Ingo Molnar and others.
Over time, it had developed a whole lot of little tweaks trying to improve interactivity, to make interactive desktop systems more responsive. The code had gotten very complex, very twisted, very hard to work on, full of heuristics, and it still didn't really perform the way people wanted it to with regard to interactivity. So Con Kolivas, who is actually not a kernel developer by training at all (he's a doctor, an anesthesiologist), but who managed to train himself in kernel development and get quite good at it, came along and said: okay, I don't like this at all. Let's just throw it away; we'll get rid of all the heuristics and put in a very simple scheduler that works on simple fairness. Four processes contending for the CPU? Each one gets 25% of that CPU, period. So: a much simpler algorithm. You put it in, it simplifies all the code, and as it turned out, that made interactivity better than all the heuristics and the complicated code we had before, at least in some situations. So he puts it out there, and the very next day Linus looks at it and says: yeah, I can consider merging that. I like this; it simplifies things; it gets rid of a lot of code. But if you follow the discussion, you see that within a couple of weeks the tone was getting rather grumpier. What was going on was, once again, breaking things for people. Con's scheduler made things better for some people, but it made things worse for other people, and he was not as responsive as he needed to be to the complaints of the people whose performance was going backwards. This got to the point that eventually Ingo Molnar went off and, as he is wont to do at times, took a day or two and completely wrote his own thing that did it his way, using the same basic algorithm. It was called the completely fair scheduler, CFS. That was posted, and within a few months it was CFS that was merged into the mainline, not Con's staircase deadline scheduler.
And within a couple weeks of that, Con left the development community, and he left in a very public, disgruntled, unhappy sort of way, saying: yeah, I'm out of here, I'm done; I'm going to leave before I get so fed up that I end up running Windows. And we lost a developer who was really trying to do good stuff, a very smart guy, somebody we couldn't afford to lose. It was not a good thing in any way. So what do you learn from something like this? You need to learn from these things. Number one: improve the kernel for everybody. You cannot go in and improve the kernel for one group of people at the expense of another. The kernel at this point is running on your telephone, it's running on your desktop, it's running on huge supercomputers, it's running on all kinds of things. We have a very wide-ranging user base, and you simply cannot make it worse for some of those people. So if you can't make it better for everybody, you at least need to not make it worse for anybody. Related to this is the fact that certain parts of the kernel are simply hard to change. That's especially true of core kernel areas that are coded with a whole lot of heuristics that have been developed over time, and where we have a lot of experience that says: if you mess with these things, you tend to find surprises on other workloads far into the future, when it's harder to fix them. In fact, we're still finding things that relate to the scheduler change, and still fixing them. It takes a long time. So there's a fair amount of resistance, and you have to have a lot of patience if you want to work in those areas. That's just the way it is; it's a hard task. Participate in the discussion. Con had his own mailing list for discussion of his patches, and the people who subscribed to that list were naturally the people who were interested in his work and liked it.
So he was working in an environment where everybody was saying: yeah, this is great, you're doing good stuff, keep going, we want more of it. He was not really following the wider lists, where people were looking at things from a broader point of view. So he missed the wider discussion; he missed the view of the situation that he really needed to have. You just can't do that. You cannot isolate yourself from the community, even if, say, not subscribing to linux-kernel and keeping its 500 messages a day out of your inbox is an appealing sort of thing to do. You have to be part of the community, or else you're not going to work well with the community. But perhaps the most important thing, the key lesson to draw from this in my mind, is that you really need to look for a solution to your problem, and not the incorporation of a specific body of code. Because if you look at what happened with the completely fair scheduler, Con got what he wanted. He won. He was able, through his efforts, to replace the scheduler with one based on fair scheduling. It just wasn't his code, and so that hurt. But if he took a step back and looked at it, he got what he wanted out of all of this, and was widely credited for having pushed things that way. Dan Frye, a vice president at IBM who runs their Linux Technology Center, gives a talk about how IBM approaches this sort of thing. Within IBM, if you push the community toward solving a problem, you're credited for having done that, whether or not it is your specific code that is merged. They don't care whether it was the code developed at IBM that went in, as long as the problem is solved. It's a very enlightened view, and you can see it in action in the way that IBM's developers work with the community. It's something that I would like to see much more widely adopted throughout the kernel development community and beyond, really.
If you look for the solution to the problem, you're a whole lot happier than if you're looking for an entry in a changelog. All right. So... you know, the only other time I came to FOSDEM, I actually spoke right next to Hans Reiser; they had put us in two sessions right next to each other. Hans is a really smart guy with a lot of very interesting ideas. There are certain aspects of his behavior that I just cannot approve of. But... you know, and honestly I don't think it's all that funny. If we think about Reiser 4: back in 2002 it was already fairly clear that the file systems we had at that time were not adequate for what we needed going forward. They were carrying the weight of file systems that were designed back in the Unix days; the ext series of file systems really carries forward a lot of ideas from the Fast File System and such, from our Unix heritage. Old stuff. Our needs have moved forward, the hardware has changed, and so on. We needed something different. Hans saw that, plus he had a whole lot of wild ideas of his own that he wanted to put into a file system. So way back in 2002, he put out the first version of the Reiser 4 file system. He worked on it through 2003, and just as Linus was finally trying to pull together a 2.6.0 release, he said: why don't you throw my file system in there? You've thrown in everything else under the sun. Because 2.6.0 was in feature freeze for the better part of two years, for a very Linus sort of value of "feature freeze", shall we say. So, I mean, the file system would really just be in the noise for something like that. He didn't succeed then, but in 2004 he did manage to get it into Andrew Morton's -mm tree, which was seen at the time as being the main path into the mainline; that's changed a bit since then. It still didn't get in. In 2005 and 2006 he made major pushes to get this stuff merged, and never succeeded.
Finally he kind of left our community forevermore, and Reiser 4 has since languished; I don't think we will ever see it merged into the mainline kernel. So why was there so much trouble? Why did we have a next-generation file system that we couldn't get into the mainline kernel? There are a lot of things you can point to. It behaves very strangely: it's the only file system I've ever seen where you can actually change your working directory into a plain text file and then cat out the metadata; the modification time, for example, is a separate little file. Nobody else has done that sort of thing. So there are certain things that don't conform to the established standards for Unix-like operating systems. There were a number of technical difficulties, things like locking problems, that sort of stuff. A lot of those resulted from the fact that Reiser 4 was developed behind closed doors for a long time and given to the community as a sort of finished product. If he had brought it forward sooner, a lot of these problems would have been simpler to fix earlier on. Hans's approach to benchmarks was creative, shall we say; people who ran benchmarks independently tended not to get the same results as Hans did. And his approach to others in the community was antagonistic. If you questioned his work, you tended to get put into the group of people who were conspiring to suppress it, associated with various companies he didn't like who were obviously just trying to put his work down, and so on. It got to the point where a lot of people refused to talk to him anymore, because they tended to get attacked. And finally, the episode with Reiser 3 that I mentioned before was still in people's minds. They were really afraid that Hans was going to dump Reiser 4 into the kernel, then go off and work on Reiser 5 and not want to continue with the stabilization and development of Reiser 4.
For all of these reasons there was a whole lot of resistance to getting the code into the mainline, so it never happened. The lessons from this are fairly clear. Linux is not a research system. There's a whole lot of very innovative work that goes into the Linux kernel, but in the end this is a production system that is used for all kinds of real-world use cases. It's not something you can just put any kind of wacky thing into and expect to get away with it. So if you are going to break from something like the POSIX standard, you have to do so very carefully, in ways that don't break existing applications and so on. You have to be very careful with that. No matter how brilliant you are, and no matter what kind of vision you have (I don't know if his documents are still on the net, but this guy had a vision for where operating systems should go that was quite well thought out; it may not have been where you wanted to go, but he really had a lot of interesting ideas), none of that will get you past an implementation that has technical problems. No matter how brilliant it is, if it's going to deadlock the computer, it's not going to make it in. You can't get past that. Conspiracy theories are not going to help you. This kind of thing happens fairly often; we've seen some of it recently on the kernel mailing list, where people will say: well, you're just criticizing my patch because your employer doesn't want it in. I won't say such things never happen, because we're human and human things happen, but kernel developers tend to think of themselves as kernel developers first and employees of whatever company second. They think it's fairly likely that in five or ten years they'll still be working on the kernel, but may be working for some different company. They're not really interested in compromising the kernel for any particular company's objectives, even the one paying their paycheck right now. So you don't see conspiracies of this type very often.
If somebody starts accusing people of conspiracies, it's a sign that the discussion is done and that they're not really going to get much further. Just don't do it. Then finally, the community has a long memory and a long time horizon. If you are posting code to go into the kernel, people will always be thinking: what will it be like to maintain this five or ten years from now? Because they know they're likely to be there in five or ten years, and stuck with it. So they're going to want to know what this code will do to our maintenance going forward. It's always on people's minds, and it very strongly affects how people look at things. All right: SystemTap. Back in 2003, Sun Microsystems came out with a kernel tracing facility (actually kernel and user-space tracing) called DTrace, and they gave it a lot of publicity, saying: this is a great tool; we have better visibility into how our system works than anybody else has, so certainly you want to run Solaris instead of Linux. This, of course, inspired a response within the community, and within a couple of years we had an update to Red Hat Enterprise Linux 4 that included SystemTap, a tool that did very much the same sorts of things that DTrace does. It allows you to put probes into the kernel and do all kinds of complicated data collection, aggregation, statistics, and so on, to try to figure out what's going on within your kernel. So this was posted way back in 2005, but we never saw it merged. Instead, in 2008, we saw a different, much simpler tracing facility called ftrace put in. In 2009 we saw perf events, which does event collection, statistics, that sort of thing; a very different sort of development was merged there.
Even though we saw the 1.0 version of SystemTap in 2009 (last year; actually not last year anymore) and 1.4 just a few weeks ago, I don't think we'll ever see SystemTap in the mainline kernel. Which is fairly surprising, given that this is a development that had something like a dozen full-time engineers on it for years, funded by a number of companies who are very core to Linux development, creating a facility that everybody really acknowledged we need. So one wonders what's going on; why did this happen? The key here: think back to the 2008 kernel summit, where this particular group of people was asked: how many of you have tried to use SystemTap? About half of the people in the room raised their hands. And then: how many of you have actually succeeded in doing it? Most of those hands went down. This group of people is not just any group of random users, right? This is the top level of the kernel development community, the people who get invited to the kernel summit. If they cannot make SystemTap work, then that is a fairly bad sign with regard to the usability of your system. Right? And so Ingo Molnar kind of described it like this later on: what you really have to do is not concentrate on requirements drawn up by management and so on, which is really what SystemTap was, but instead focus on usability, and in particular usability for developers. That's a key aspect of getting stuff into the kernel: usability for developers. Because if the kernel developers don't see the value of the code, it's not going to go in, regardless of what people at the management level say. It's usually a good thing that the developers make these decisions, that it's not a management decision; that's part of why the kernel is as good as it is. Sometimes it can be problematic, because kernel developers, like anybody else, can be kind of myopic at times, and will sometimes fail to see things that really are needed, even if those things aren't useful to them in particular.
As one example of this, we actually had a dynamic tracing facility that was posted for inclusion back in about 1999. Nobody saw any value in it, so that code languished, and we had to do it all over again ten years later. It just happens. Here's another example that ties into the same thing. Back in 2008, a developer at Red Hat posted a thing that he called TALPA. This was a subsystem that provided a new set of system calls allowing virus-scanning and malware-scanning utilities to hook into system calls. The idea being that if some process on the system opens a file, the virus scanner gets an event saying: somebody is trying to open this file. It can go and scan the file first; if it doesn't find anything it dislikes, it says back to the kernel, okay, let that open proceed, and life goes on. Otherwise it can actually block the open of the file and not allow it. The idea being to block viruses as they pass through the system. This didn't go in at all; shall we say the reception was chilly? Because, after all, Linux doesn't need virus scanners. That's not really a security model that has much value on a Linux system; we don't have that particular kind of problem. So why should we be bothering with broken security models? Now, of course, the real use of this was not to protect Linux systems. It was to protect Windows systems that are mounting a mail spool or something like that that sits on a Samba-exported file system, that sort of thing. But that, again, was not a use case that is interesting to Linux kernel developers, who are not usually concerned with maintaining a lot of Windows systems out on the network. Beyond that, the requirements were not expressed very well. There was no threat model.
They couldn't really say what they were trying to defend against; that only sort of came into focus over the course of the discussions. And they focused on the solution instead of the needs: the requirements said, basically, we need TALPA, not we need to defend against this particular sort of threat. So this code went down in flames. But if you look later on, in August, we saw the merging of a thing called fanotify. fanotify is a set of system calls that provide hooks for antivirus scanners, which sounds fairly familiar if you look at it. So you might think: okay, well, what changed here? This was, in fact, much the same code, but two things were very different. One of the things that changed was the name, to sort of leave behind the memories of what had come before. The other is that this is, essentially, a file system event notification mechanism, and we already had two of those in the kernel before fanotify: one called dnotify and one called inotify. Rather than simply adding a third one, what the developer did was go and clean up the existing event notification code, which was pretty ugly, and make it work both for the existing mechanisms and for his as well. So instead of having three, we went down to one core notification mechanism in the kernel. And the other thing is that he rephrased the requirements: instead of saying we want to enable virus scanners, he said we want to let them hook into the system without using the rootkit-type techniques they are using now. Because if you actually look at some of the commercial, proprietary virus-scanning code that people sell for Linux systems now, it will go and patch into the system call table and do things that you normally associate with rootkits, so that it can intercept system calls and do what it wants. That's really ugly stuff, and not something we want to have happening.
So this allows that code, which already exists out there, to function without having to do that kind of nasty stuff, and that's an improvement for everybody involved. By rephrasing the requirements and by cleaning things up, he was able to get that code into the mainline kernel. So the lessons from this: sell your patches to the developers. Not to the managers, not to the customers; you have to sell them to the developers. And if you clean things up on the way, you build goodwill. Cleaning things up, by the way, does not mean whitespace patches, for anybody who's tempted to do that; it means truly cleaning up the code.

So those are a few examples. I could do a whole lot more; if any of these interest you, you can ask me during the question time and I can go into them. Suffice to say there's no shortage of examples out there. So one can look at the history and say, well, we have an awful lot of examples of how things can go wrong. When things go bad, you might say: why bother? Why should I want to deal with this when things can go wrong so easily? So I wanted to talk briefly about that, starting with the fact that, for all that we have these high-profile failures, things don't go wrong that easily; it's not as hard as it seems. Remember that we're dealing with a development process that, in every release cycle, every 80 days or so, is incorporating the work of over a thousand developers. So every few months there are over a thousand people who succeed in getting their code into the kernel. Clearly it can't be that hard; the barriers cannot be that high if that many people are able to get this done.
More importantly, it's fun. Working in a reasonable free software project is a good time; you want to be a part of it. But beyond that, even though it's not that hard, even though it's fun, it's still not a club that everybody can join. It's something that you have to want to do, something you have to work towards; it is not sufficient to simply look good in your swimsuit. So it is something that's fun to be a part of, and if it's something that you're concerned about, there's certainly a path towards employment and such. The fact of the matter is, and this has been true for some time, that if you've established an ability to get code into the kernel, then people will come to you and throw jobs at you. That's a nice thing if that's something you're after; it gets a little tiring after a while.

Perhaps most importantly of all, the message I want to leave you with is that this is how you get the kernel to meet your needs. This is how you drive it forward. The kernel is really open to everybody who is willing to push it forward in good directions, and this is how you get it to where you want it to be. This is your vote; this is how our community works. This is true of the kernel, and it's true of every other project out there. You don't just go and put in improvement requests and so on; the way that you get things working the way you want them to is to actually get your hands dirty and get the code into the kernel. So whether you're trying to simply make a device work, or whether you're trying to enable some of the freedom-type technologies that Eben Moglen was talking about yesterday, this is how you do it. This is how you get things to where you want them to be. I hope that all of you will be inspired to do that, to be a part of this, to try to push things forward. Because, after all, in the immortal words of a former vice president of the United States: if we don't succeed, then we run the risk of failure. So I
have a fair amount of time for questions and would be delighted to answer a few. I assume we have somebody with a microphone. If there are any questions, please raise your hand. Over there, front row.

Hello. A lot of your examples were from the 2.5 era of the kernel. Do you think that having abandoned long periods of instability has guarded against this kind of thing happening more recently?

I'm sorry, I didn't hear that very well; it's very echoey.

A lot of your examples happened during the 2.5 unstable phase of the kernel. Do you think that abandoning the unstable phase has meant that these kinds of incidents will happen a lot less frequently?

So, a lot of the examples happened during 2.5, but in fact only the IDE example is exclusively within the 2.5 development series; everything else was taken from 2.6 and forward. The abandonment of the old long-term unstable series has certainly changed the process and made certain things different. For example, it has much reduced the tolerance for regressions, because we just don't have the time to fix them that we used to. But otherwise I don't think it's changed much, other than bringing things to the fore more quickly.

Hi, over here, in the back. Way back here. Which category did the Android wakelocks fall into? Is that the badly-described-requirements category, or...?

Okay, wakelocks, the suspend blockers. I could do an entire talk on that; in fact, Matthew Garrett did do an entire talk at LinuxCon, and the video is online for people who really want the full details. The real failure with wakelocks was out-of-tree development: they developed this feature for the Android system off in their own corner, without involving the community in it at all, and, most importantly, they shipped the feature to users before they ever posted it for merging into the kernel. There were a whole lot of problems with wakelocks the way they were originally developed: they were insecure, they required a lot of changes to drivers, and so on. Nobody really liked the way that wakelocks worked, so they had to change, which creates all kinds of compatibility problems with your existing user space. There's been a lot of trouble trying to come up with a suitable replacement. I believe we actually have a good replacement for wakelocks in the mainline kernel now, although the Android people have not yet really looked at it or committed to using it. So hopefully we'll have a happy ending to that story. If you want more details, either look at Matthew's talk or look at the stuff that's been written on LWN about it.

Of all the things that you listed, both the ones you talked about and the ones you just put in your list, which of them, in your personal opinion, is the most significant loss to the kernel and the Linux community?

Which is the most significant loss? Hard to say; they all represent significant losses in a way. But in my mind I still really regret the loss of Con Kolivas, because I think he was trying to work for a constituency that doesn't always get the attention it needs, and he was trying to do interesting things. I wish he were still part of our community. So that's perhaps at the top of my list, but I think they're all significant.

Jon, in a lot of your examples, it seems like a recurring theme is that once something either doesn't get merged, or it's obvious that it's not going to get merged, that leads to the sooner-rather-than-later death of it. It seemed like the one counterexample was SystemTap, which didn't get merged yet still got to a 1.0. Is there something about it that makes it an anomaly, that it lived on even when it became obvious it wouldn't get merged, or is it just a matter of time?

Oh, that's... well, part of
that is relatively easy to answer, because SystemTap, for all its failings, is in fact very useful to the support staffs of the enterprise Linux distributions. It works out of the box when it is packaged with an enterprise Linux distribution, and the technical support people behind it can make use of it; they like it for that reason. The companies, and one specific company you know well, I suspect, continue to put resources behind the development of SystemTap and will probably support it for some time yet. There's a certain commercial interest there in that very rigidly defined environment. Beyond that, a lot of the development resources that went into SystemTap have been moved to other things, but I think SystemTap will continue under its own momentum for some time, because it does serve a need that some people have. We are getting to the point where the other tracing facilities can fill that in, but we're probably a few years from really replacing SystemTap for that particular use case.

There's a general pattern I've seen in open source projects, where older, established projects that are very popular become very, very conservative and refuse changes that would destabilize the features their existing users appreciate. How do you prevent requirements like "you can't change the scheduler unless you can prove that nobody on earth will see a performance regression anywhere" from leading to stagnation? Projects get very, very conservative and stagnate until another project comes along that can try experimental things because it doesn't have that baggage. How do you go forward in a situation like that?

It can indeed be a problem. With something like the scheduler, the only thing you can do is test extensively under all kinds of workloads and see if people don't scream for long enough. For certain other sorts of things, like the user-space ABI, we really just don't allow ourselves to break it, ever. So if a change would break applications, with few exceptions it just can't go in, at least not without a migration path that can take at least five years, until you get to the point where you're really convinced that nobody is using it anymore. We take the don't-break-things rule quite seriously, to the point that you can still run a.out binaries from the pre-1.x days; if you've got the libraries around for them, they'll still work. We're very careful about that. That is a bit of a straitjacket at times, and it does constrain how we can do things, but we want Linux to be useful going forward and, more importantly, we want people to upgrade to current kernels, for all kinds of reasons. So we will continue to be very careful about that, and yes, that slows certain things down, but I don't think it has stopped the process yet. I think we'll continue to keep things going.

Hi Jon. At MontaVista Vision 2008 we spoke, and we talked about your device drivers book. We are now almost three years later; have you decided not to do a new book?

There will be a new device drivers book. I am working with the other authors, trying to figure out a model for publishing a book about something that changes as quickly as the kernel does, in ways that don't go obsolete; in ways that are maintainable, let's put it that way. I'm hoping to have something to say within the next few months, but I'm not quite ready to say how that's going to work yet. But yes, something will happen, because a book that describes 2.6.10 is of limited utility in the current world, to say the least.

Are there any more questions? I think we could talk for hours with Jonathan, but the time is almost over. So before you leave, I want to give the floor to Matthias, but first let's thank Jonathan once again.