is that I've given it at a few conferences over time. And I have to apologize to you all: this is a talk that, about once or twice a year, I like to throw away and do over again, just because otherwise I get really tired of it, much less the people who have to listen to me drone on and on for an hour each time. But I didn't quite get around to doing that, so this is probably the last performance of this particular version of it. If you've seen it in the last couple of months or so, you're not going to learn a whole lot that you didn't learn then, and I apologize for that. One of the nice things about giving it to a smaller group like this is that it's really easy to take questions and all that, so if I skip over anything, just throw something at me.

The theme for the talk comes from a quote from Linus last June saying that, yes, maybe you're not quite ready for all of this, but don't expect the kernel to wait for you. The kernel is moving on; the kernel is always heading forward into new and exciting stuff. This is a guy who knows what he's talking about, so obviously this is where we're going.

We'll start, actually, by looking backwards. This is the set of kernel releases that came out over the course of the last year, starting with 2.6.33 back in February. As you can see, we've done five of them. We've settled into a cadence that is almost exactly 80 days per release; it's getting almost to the point where you can set your watch by it, actually. So we've done five releases, and every one of these releases has had at least 1,100 developers involved with it. That's a lot of people working with the kernel, and over the course of the year we had almost 3,000 people who contributed to it.

Uh-oh, it's definitely not speaking. Does that work better? Oh, it's coming out of that one. I'll stand over here and we'll have stereo.

Anyway, what I can say is that the process seems to be working pretty well. In fact, when this group came together back in November at the Kernel Summit in Boston, they pretty much agreed that the process is working: there's not a whole lot that we really need to change in terms of serious details about how the kernel development process works. It has scaled up to what we're doing. There are questions about how much bigger it could scale, but also questions about whether it actually makes sense to scale a whole lot bigger than it is now, because the kernel is a pretty big project; there are not a lot of projects that are bigger. If we look at these numbers yet again, you can see we're merging somewhere on the order of 10,000 changes for every release. It's dropped down a little bit from its highs around 2.6.31 and 2.6.32, although 2.6.37 was a very active kernel development cycle. 2.6.38, when it comes out, I predict will come in at right about 10,000 changes based on what I've seen so far, just a little bit below those earlier releases and right in line with the others here. So we've seen 51,000 changes merged over the course of about 400 days, which is the development time associated with one year's worth of releases like that. That comes down to about 124 changes going in every day. And we've managed to add about 3,500 lines of code to the kernel every day of the year, 365 days a year, no breaks, nothing like that. And that we did even though 2.6.36 was actually smaller than its predecessor, by about 50,000 lines. It's the first time we've ever done that, I think, as far as I can tell; I could not find a kernel release that was smaller than its predecessor at any other time.
And I don't think we'll see it again anytime soon, but we did it once.

So where does this work come from? The usual statistics I run look at the changes that went in over, again, the same roughly 400 days' worth of development, five kernel releases, associating changes with the employers that sponsored the work and so on. You get a table that really hasn't changed a whole lot over the course of the last two years. You see, as always, right up at the top, people working on their own time, supplying something just under 20% of the code going into the kernel. Just about everything else, minus a small contingent of people whose affiliation we can't pin down, is coming from people who are paid to do this work; the kernel is very much the product of paid engineers at this point. We also have Red Hat and Intel, Novell and IBM up there at the top, with Red Hat at the top of all of them, and then various other companies that move around depending on what's going on at various levels in there. And, you know, we're seeing some companies that people really thought were quite hostile to kernel development manage to at least shovel in stuff that some people actually call code. It does kind of work; it's a good step in the right direction; it's better than what they did before. And so on. And we have on the order of the better part of 400 other companies that just didn't quite fit on my slide that are also contributing to the kernel over a period like this. We have a lot of companies putting code into the kernel, it just keeps on going that way, and it seems to be increasing over time, although it's hard to tell because the quality of our information is also increasing over time.

Academics is really anybody working within the context of a research institution: professors, grad students doing funded research, that sort of thing. It's typically been a very small portion of things. It's actually grown a little bit over time, which is good, but I've been a little discouraged that we have such a tiny bit of stuff actually coming out of universities, and I've been trying to figure out ways to improve that; that's hard. Not a whole lot; I think there are a few, but not a whole lot. At this point, you know, that crowd has long been pushed out of IT services at the universities. Instead we have things locked up in boxes, and you have to call in the guy with the keys to do anything; you don't really see the universities doing that sort of thing themselves anymore. It's all professional now.

So all of this I've been talking about so far is mainline. I just tacked together a couple of slides describing how the stable kernel process works, because most of us, well, maybe not in this room, but most of us don't actually run mainline kernels; we run something that is usually derived from a stable release and then patched further. There is a whole series of stable releases, managed by Greg Kroah-Hartman, pulling together important fixes to the kernel. In a sense it's as stable as we can make it, but nobody would say that there are no bugs remaining in it; there are always things that will be found and fixed and, as we'll see, they can keep being fixed over the course of many years. This is the history of relatively recent stable kernel releases and the stable updates that went with them. We see that back with 2.6.25 there were 20 updates supplied to it, with about 500 patches, down to 2.6.36, which is the current kernel as far as stable updates go. There have been no 2.6.37 updates yet.
We had three of them, with 500 changes: the same number of changes there as we had in 20 updates back in those days. The number of changes going into the stable updates has increased tremendously over the course of the years. I don't think that is a function of our kernel releases being that much buggier than they used to be; I think it's more a matter of better discipline about identifying fixes that should be applied to previously released kernels and putting them into the stable update stream. So that's what we're seeing there. The kernels that are currently maintained by Greg are 2.6.32 and 2.6.36. 2.6.32 has been picked by a number of distributors, generally at the enterprise level, as the kernel they're going to maintain for a long time. So Greg is working with them, helping to maintain that kernel, getting patches from a lot of distributors, and pulling it all together. That's why 2.6.32 has had such a large number of fixes applied to it and a large number of updates. 2.6.27 was maintained that way for a fair while as well.

Well, it's actually getting a little bit more complicated than that now. As for the enterprise distributors, I don't think they're ready to engage with that question yet; they're still dealing with the kernel they're really just getting serious about shipping now, in a lot of cases. But we're starting to see a number of people picking up other kernels as long-term kernels and maintaining them, which is not something we had before. Willy Tarreau, who maintained 2.4 (I think he still technically maintains 2.4), has picked up 2.6.27, and he plans to keep that going for a long, long time. So if you want a kernel that you can run forever, that would be one you could pick. 2.6.35, instead, was picked by a whole group of embedded Linux companies back towards, I was going to say towards the fall; that's the Northern Hemisphere fall. A number of these companies came together and said: okay, we're all going to pick one kernel that we're going to ship in our products, we're going to try to actually get our changes upstream, and so on, and try to improve the whole process of how things work between the embedded community and the kernel development community. I think that's a good sign. Andi Kleen has picked up that kernel and will be maintaining it for a fair while as the embedded flag kernel. Meanwhile, Wind River was shipping 2.6.34, so Paul Gortmaker has picked that up and he'll be maintaining it for a while as well. So we actually have a number of long-term kernels that are being maintained into the future. I don't think we'll add to that number for a little while, but we're seeing an evolution of the process as more of these kernels get longer lifetimes.

So looking back at this chart one more time (this is the last time I'll inflict it on you), just looking at the number of changes: 2.6.31 and 2.6.32 have scrolled off the top there; they had about 12,000 changes each applied to them. So there was, for a while, what really looked like a downward trend in the number of changes going into the kernel. 2.6.37 bumped that back up a bit, partly due to a lot of stuff that went into the staging tree and other such things. 2.6.38 is going to be right along that trend, like I said before. So some people might look at this and say: okay, hey, we're slowing down, are we done? Back in 2005, Andrew Morton said that the patch volume had to drop because we were actually going to have to finish this kernel someday.
That was when we were merging about 4,000 changes per kernel. He said that, and I've not been able to resist giving him grief about it ever since. But one could say that it looks like things are dropping off. Is it getting boring now? Have we actually solved all the real problems that we had to solve? Time to sort of kick back and all that? I think the answer to that really is no. Certain things have stopped: anything that we had to do to catch up with any other system out there is for the most part done, with the usual exceptions, and so on. The long process of feeding code into the staging tree, trying to get a lot of out-of-tree code into the mainline kernel, has more or less completed. So there was a period where we were just shoveling a whole lot of code into the tree, and we're not really doing that anymore. So while we may not see the same sort of patch volumes that we saw around 2.6.31 and 2.6.32, when there was a lot going on, I don't really expect to see a downward trend; I don't expect to see the kernel development process slow down.

So I'll spend the rest of the time talking about where I do think things are going. Again, if you have questions, you can ask. A fairly easy one to predict at this point is 2.6.38, which is due probably sometime around the end of March if we stick to the usual cadence; we're currently at 2.6.38-rc2. What's coming is a whole lot of very intense, internal, performance-oriented work that got put in this time. The VFS scalability work, changing the way the dentry cache works (the caching of file name lookups, essentially, at the core of the file system layer), finally went in for 2.6.38. This is a very tricky set of changes to the core virtual file system layer that seems to be fairly solid, but it had some people worried for a while.

Another interesting change is transparent huge pages. Your typical processor deals with memory in units of pages, which are 4,096 bytes, or 4 kilobytes, on most architectures that we support. But processors actually support multiple page sizes, larger ones starting at 2 megabytes and going up even into the gigabyte range if you want. There can be advantages to using larger page sizes like that, mostly in the performance area: if you're using larger pages, then page faults happen less often, they're cheaper to satisfy when they do happen, and you have a lot less pressure on your translation lookaside buffer, so you don't have to do page table lookups as often. So if you can use these huge pages more often, you can actually run faster; that's what it comes down to. We've been able to work with huge pages for years, but it's a very administrator-heavy, application-developer-heavy, fiddly sort of process. Transparent huge pages make it just happen: if you've got an application running, and there's a huge page available, and it looks like it will work there, the kernel will coalesce a whole bunch of small pages into a huge page and substitute that in. Then it will split it apart again if it has to; it all just sort of happens automatically. You don't get quite the performance benefit that you get from using hugetlbfs and really messing with it at a low level, but seeing as it just works for everything, it helps a lot, and I think it will really increase the use of huge pages in production systems. This is a feature that Red Hat actually shipped with Enterprise Linux 6, and we'll get it in the mainline as of 2.6.38.
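As a rough illustration of what an application can do on top of that (this is just a minimal sketch of mine, assuming a 2.6.38-or-later kernel with transparent huge pages configured in), a program can hint that a region is a good candidate for huge pages with madvise():

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define SZ (16UL * 1024 * 1024)   /* 16 MB, a multiple of the 2 MB huge page size */

int main(void)
{
	void *buf;

	/* Align the allocation to 2 MB so whole huge pages can back it. */
	if (posix_memalign(&buf, 2 * 1024 * 1024, SZ))
		return 1;

	/* MADV_HUGEPAGE marks the range as eligible; the kernel may then
	 * collapse its small pages into huge pages behind the scenes.
	 * The call is purely advisory, so a failure here is not fatal. */
	madvise(buf, SZ, MADV_HUGEPAGE);

	memset(buf, 0, SZ);           /* touch the memory so it gets populated */
	free(buf);
	return 0;
}
```

If the system-wide setting is "always", the hint isn't even needed; it matters mostly in the "madvise" mode, where huge pages are used only where explicitly requested.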
Per-session group scheduling: this is the famous 200-line kernel patch that people talked about for a while. It just changes the way the kernel divides tasks up and splits up the available processor time, trying to keep processes from interfering with each other in terms of how they contend for CPU time from the scheduler. In a sense it's not a significant feature, because group scheduling is something we've supported since about 2.6.24 or so, but this again makes it just work, makes it just happen, and so it went in. It's more like a 700-line patch in the version I actually saw, not counting the fixes that followed it, so it has grown a bit. And then some other sorts of stuff. Transmit packet steering is just a high-end networking performance thing, trying to direct outgoing packets through the right processor and so on. Trusted and encrypted keys is a feature using the TPM and integrity verification and all that, so that people can lock down your system and you can't actually get into it, which is something we've all been clamoring for. But there are actually useful uses for that sort of technology as well as the more worrisome ones. So that's what's coming with 2.6.38, along with a thousand drivers and a lot of fixes and all the usual sorts of things. I listed a whole lot more on LWN if you want to look there. But that's what we've got.

So from here let's look at some specific subject areas and where we've come from and where we're going, starting with file systems. We can go back as far as 2.6.31 and see the Btrfs file system stabilizing, with various changes going in. We merged a couple of new file systems in 2.6.34: LogFS, which is for solid-state storage, small embedded systems, that sort of thing; and Ceph, which instead is a large-scale, distributed, high-availability file system. Very different sorts of things. We're continuing to improve Btrfs. A bunch of scalability work for ext4 went in for 2.6.37, and a chunk of that had to be turned off at the last minute because it didn't work quite right, so the rest of it is going in for 2.6.38. So that's where we've come from in file systems.

Where we stand now is: well, we have a lot of file systems, but there are really two that most people are paying the most attention to out there, certainly in the desktop area and so on. There's ext4, which is the continuing evolution of the ext file system series that we've had in Linux pretty much forever. ext4 raises a lot of limits, performs better, does all sorts of things that we want, and it's really pretty much ready for production use; people are using it at this point and it's working pretty well. But ext4 is also really seen as being the end of the line, in a way, because it's a very old file system architecture and we've moved on; we wanted to do other things. That's why the Btrfs project was started. Btrfs is a brand new, written-from-scratch file system that adds a lot of very interesting features: snapshots, built-in RAID management, a whole lot of stuff that will make it, I think, a very nice file system to have once it's ready to go. It's getting closer. The Btrfs developers will not tell you to put your production data on it right now, because they really don't like what happens when they lose people's data, so they're not saying it's ready yet. Be that as it may, the MeeGo people have actually decided to settle on it and to use it as their default file system.
So sometime this year, maybe, if you can actually get a MeeGo device, it should have Btrfs on it, at least for part of the system; I don't think they're going with Btrfs for everything at this point. But it has a lot of the features that they want, and they didn't want to go through another transition a couple of years from now, so they're just going with it. And they've actually said they've had less trouble with Btrfs than they've had with ext4, for whatever that's worth. So that's where that stands.

So what's coming? With Btrfs we'll see stabilization and completion. Maybe by the end of this year they'll start making noises about it getting closer to being ready for production use, or maybe not; like I said, file system developers are conservative that way. Completing things like the RAID support: when are we going to get the full RAID support in there? I'll have to ask Chris. Anyway, we're getting close to that. Data migration: there are some interesting patches for managing Btrfs on top of a hybrid storage device, where you've got some very fast solid-state storage and slower rotating storage; you try to figure out which data is in most active use and you keep that in the fast storage. That kind of data migration, moving things between different types of storage devices. And actually using the features that Btrfs provides: for example, Fedora is working on a feature to allow the administrator to snapshot the state of the system before applying an update; then you apply the update, and if the update didn't go right, you just go back to the snapshot and life goes on as usual. Fedora, as you may know, actually issues a few updates here and there, so it might be a nice thing to have to keep up with that. ext4 is sort of going into production use. It may gain a few features; there's a snapshotting patch out there for ext4, but I don't know if that's going to go in or not, I haven't really seen it lately. And continued performance work, scalability work and so on; and the VFS scalability work that I mentioned before is going on.

Yeah, barriers as such don't actually exist in the storage layer anymore; that got changed over the course of the last few months, so things have improved there. In terms of features, I don't know; there's a certain tension there. But LVM is getting better. Some file systems like Btrfs are picking up features that LVM would otherwise provide, and if you're using Btrfs, I'm not sure what you would use LVM for alongside it at this point, because LVM can give you snapshots, give you multiple-device support, and all that sort of stuff that Btrfs will also do for you. And there are some real advantages to having the file system understand the topology of the storage at that level. So, you know, they'll both continue to get better, but where we'll end up, I don't know; it depends on what you're doing.

Alright, moving down a layer to the storage layer. In 2.6.31, we saw the addition of the storage topology infrastructure, which gives us a better understanding of how our storage devices are put together, so that the kernel and user space can both make better use of what's there. Various scalability improvements have gone in over time. We got the I/O bandwidth controller put in, so that you can actually control the I/O bandwidth used by groups of processes if you want to, and there are ways to do that at a couple of levels.
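Just to make that concrete, here is a rough sketch of my own of the control-group side of that bandwidth control. It assumes the blkio controller is mounted at /sys/fs/cgroup/blkio and that 8:0 (sda) is the disk in question, so adjust to taste:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write a string into a cgroup control file. */
static void put(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	if (f) { fputs(val, f); fclose(f); }
}

int main(void)
{
	const char *grp = "/sys/fs/cgroup/blkio/slowreaders";
	char path[128], buf[32];

	mkdir(grp, 0755);                       /* create the group */

	/* Cap the group to 1 MB/s of reads from device 8:0. */
	snprintf(path, sizeof(path), "%s/blkio.throttle.read_bps_device", grp);
	put(path, "8:0 1048576");

	/* Move this process into the group; its reads are now throttled. */
	snprintf(path, sizeof(path), "%s/tasks", grp);
	snprintf(buf, sizeof(buf), "%d", getpid());
	put(path, buf);

	return 0;
}
```

Those throttle files are one of the "couple of levels" I mentioned: hard bandwidth caps at the top of the block layer, with proportional weights available further down in the CFQ I/O scheduler.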
Hard barriers went away in 2.6.37, like I mentioned before: we changed the way we deal with synchronization in the storage layer to simplify a lot of things. And in 2.6.38, we saw the merging of a new target-mode implementation for the SCSI layer, and the end of, well, I won't say the end of a long flame war, but shall we say the next step, the flame war having to do with what's the best way to support target mode. SCSI target mode is where your computer actually behaves like a SCSI device, like a target, and can export a device, usually over some sort of network-based protocol, so that you can build big storage arrays, that sort of thing. So that went in for 2.6.38 and should improve things quite a bit in that area, if you're building large storage devices, which I'm sure most of us are.

So, storage: various things are happening in the storage area. At the top of my list, really, is the handling of solid-state storage devices. We've gone from a situation where, with a rotating disk, you can get 100 or so I/O operations per second (maybe 200 if you've got a really nice disk and a tailwind, but otherwise that's about all you get) to solid-state storage devices pushing up into the area of 100,000 to a million I/O operations per second. So we've jumped a few orders of magnitude just like that, and that's going to expose all kinds of scalability bottlenecks that you never knew you had until you actually tried to do it. So there's a whole long job of streamlining the storage layer to squeeze all of this extra overhead out. There are patches out there that do pieces of it, but it's going to be a long problem because, as Jens Axboe put it, there isn't one thing you can fix to get rid of the scalability problems in the block layer. You go through and you do this fix and you've gained half a percent; you do that fix and that gives you one percent; and so on. After a while you're actually talking about significant improvements, but it takes a while, and it's going to take a lot of work to really squeeze all that stuff out. There are other things: RAID unification (how is that going, Neil? It's getting there; very good), and hierarchical storage, like I mentioned before, hybrid devices, that sort of thing. All of this is happening at the storage layer; there's a lot of work going on there because the storage layer is a key part of the performance of our systems. So we'll see a lot of work being done there as well.

Yes? Sorry, I didn't catch that. Which support? TRIM support; how is TRIM support looking? TRIM support, or discard support, is a mechanism for telling a storage device that a particular range of blocks does not actually contain useful data. This is very useful for solid-state storage devices, which are always having to move data around and do garbage collection and all that stuff to do wear leveling; it's useful for other kinds of devices as well. The kernel supports the feature very well at this point: we have it, and we can do it in what's called online mode, where it happens as the system goes, or offline. The problem is that the hardware tends not to support it very well at all, and so we've found that a feature that's really supposed to improve performance tends to make it worse in a lot of cases. So figuring out how to actually use this feature on real-world devices is proving hard, but the kernel supports it pretty nicely at this point.
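As a small illustration of the batched side of that, here is a rough sketch of my own of how a program can ask a mounted file system to discard all of its unused blocks through the FITRIM ioctl; it's the same thing the fstrim utility does, and it assumes a kernel and file system new enough to support FITRIM:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>            /* FITRIM, struct fstrim_range */

int main(int argc, char **argv)
{
	const char *mnt = argc > 1 ? argv[1] : "/";
	struct fstrim_range range = {
		.start  = 0,
		.len    = (unsigned long long)-1,   /* the whole file system */
		.minlen = 0,                        /* discard even tiny free extents */
	};

	int fd = open(mnt, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(fd, FITRIM, &range) < 0) {
		perror("FITRIM");
		return 1;
	}

	/* On return, range.len holds the number of bytes actually trimmed. */
	printf("trimmed %llu bytes on %s\n", (unsigned long long)range.len, mnt);
	close(fd);
	return 0;
}
```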
Memory management: for a while, memory management development got kind of slow and I wasn't talking about it, but there's a lot happening with memory management as well. If we look over the course of the last year or so, we got a memory leak checker put into the kernel. Very interesting stuff if you look at it: the leak checker actually does a sort of mark-and-sweep process, like something you'd find in a Lisp interpreter. But it's a debugging technology; it's not something you turn on in a production system. HWPOISON is a reliability mechanism that tries to keep the system running at the highest possible level in the face of hardware memory errors, responding to them better using hardware support, that sort of thing. In 2.6.32 as well, we saw the addition of KSM. It's an interesting little module: it will scan through your memory and try to find pages that have duplicated contents, pages holding the same thing. When it finds such a pair, it will throw one of the copies away and have the two processes share the remaining copy in copy-on-write mode, so that they can split apart again if they have to. This is very useful when dealing with virtualized workloads, where you tend to have a lot of this sort of thing; it came out of the KVM project and it's of interest there. It's especially interesting if you're running virtualized Windows clients, which I'm told keep a whole lot of pages containing zeros around, because you can never have too many of those, evidently. But those pages merge very nicely, so you can get a lot of your memory back that way.

Memory compaction is a sort of online defragmentation technique for main memory. Memory over time tends to get fragmented, and it can be hard to find groups of physically contiguous pages. The memory compaction code uses the page migration mechanism built into the kernel to move pages around when it can, to try to free up large areas so that you can get larger chunks. It's been a nice feature to have for a while; with the addition of transparent huge page support it has become very important, because transparent huge pages won't work if you can't get huge pages, so you've got to be able to shove things aside and clear out those large pages. It's only with 2.6.38 that you're seeing memory compaction come into serious use in mainline kernels, and they're still working some things out of it. Writeback is an issue I'll come back to in a moment, and, like I said, transparent huge pages are in 2.6.38.

What's coming? Writeback is just the process of taking dirty data that you have in main memory and writing it back to persistent storage so you have it there. It's become clear over the last year or two that we have some real performance problems in that area; a lot of things that people see as interactivity problems and so on are really writeback problems. So there's a lot of work being done trying to figure out where it is that we went wrong with writeback, because, depending on who you ask, somewhere around 2.6.18 or so we had some sort of a golden era where writeback worked. Different people have different golden eras, mind you. But anyway, we're going to try to fix the writeback problem; that's one of our bigger performance problems now. And we're seeing a lot of work around technologies like KSM, which I mentioned on the last slide, trying to make better use of our main memory, either by finding duplicate pages and having them share a single copy, or by actually compressing memory in memory, so that you store compressed copies of pages and uncompress them when they're actually referenced again.
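To show the duplicate-sharing side of that concretely, here is a minimal sketch of my own of how an application (a hypervisor, typically) opts a region of memory into KSM's scanning with madvise(). It assumes a 2.6.32-or-later kernel with KSM built in and the ksmd scanner switched on via /sys/kernel/mm/ksm/run:

```c
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define SZ (64UL * 1024 * 1024)

int main(void)
{
	void *buf = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 1;

	/* Pages with identical contents (all zeros here) are prime
	 * candidates for being merged into a single shared copy. */
	memset(buf, 0, SZ);

	/* Ask ksmd to consider this range; duplicates it finds are
	 * collapsed copy-on-write and split again if anyone writes. */
	madvise(buf, SZ, MADV_MERGEABLE);

	sleep(60);      /* keep the mapping around so the scanner gets to it */
	return 0;
}
```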
Compression like that can be useful for certain sorts of things. Or you can try to stash copies of things in something like a solid-state device: if you have a large solid-state device, you can use it as a sort of faster, closer swap device. There are a lot of things happening in this area, with names like transcendent memory and so on, trying to improve our use of memory in that way. How much of that stuff will get into the mainline, I'm not sure; there's a certain amount of resistance to merging some of it. But there's a lot of interest in it, and it keeps coming, so I think we'll see more of it.

So, moving on to a different area: real time. Real time is our attempt to support deterministic response times in the Linux kernel, something that was long said not to be possible, but it is getting possible, especially if you have the right hardware that doesn't mess with you behind your back. A lot of stuff that's in the mainline kernel has actually come by way of the realtime tree, a surprising amount of it, actually. Some of the stuff we've seen more recently includes perf events, which we'll come back to later, and changes to some of the low-level spinlock code in preparation for a bigger merge that may happen sometime soon. And as of 2.6.37 or 2.6.38 you can really run a kernel without the big kernel lock at all, which is a very nice thing; it's been a 15-year process to get rid of that lock, but we've finally gotten there.

But meanwhile there's still this large out-of-tree patch set that's shipped by all these distributors, at the embedded level, at the enterprise level and so on. It's been out there for a long time, in an era where we've worked really hard on getting out-of-tree code into the mainline so that it's all there and we can all work on it; yet there's still this really big patch set that is maintained by some of the people who are most adamant about getting code into the mainline. It's been kind of funny, but it's there. They say they're going to merge it, so we'll see some of this stuff go in. There's some memory management work, preemptibility in the memory management layer, that didn't make it this time around; maybe for 2.6.39 it'll get into the kernel. Thomas Gleixner told me he was going to merge the sleeping spinlock code for 2.6.38; I just didn't think he was going to do that, but sometime this year maybe he'll actually get around to it. So if you see Thomas, give him grief about it, then stand back; Thomas is big. That sort of thing. The sleeping spinlock code is the low-level magic that makes the whole realtime patch set work, because it's what allows the kernel to be preempted at almost any time: whenever you've got something more important coming along, you can set aside whatever you're doing and respond to it. So that's really the piece that has to be merged before you can say the realtime patch set has gone in, and it's been many years, because it's some pretty scary code in places.

Then there are scalability issues. The way the realtime tree works tends to lead to contention much more quickly, so you find scalability problems there first, before even the really big-iron systems find them. A lot of stuff, like the VFS work that I mentioned before, was actually tested through the realtime tree first, because they could actually reproduce the problems it was aiming to solve. I see a lot of that happening there; the realtime tree has kind of become the scalability workshop, which is really sort of interesting, and not something they set out to do at all.
And there are various sorts of open problems, a lot of which have to do with the fact that, if you're working in a throughput-oriented environment, one of the things you really want to do is decrease communication between CPUs as much as you can. So there's a big push towards per-CPU data, isolating data and having the CPUs stay away from each other. Per-CPU data doesn't work well with the realtime preemption stuff, because as soon as you're dealing with per-CPU data you can't be preempted and you can't be migrated; otherwise the invariants that you're counting on to make per-CPU access safe don't apply anymore. So some of the things they've done for things like per-CPU variables are pretty scary if you look in the realtime tree now, and I don't think they really understand yet how they're going to solve that, because it's a fundamental conflict between, as Paul McKenney put it, real-time on one side and real-fast on the other, and sometimes it's hard to have both.

One other thing to mention with regard to real time is deadline scheduling; it's an interesting development that's out there. If you look at the way real time is specified, you know, POSIX real time and all that, it's all based around priorities. You put a process in a real-time scheduling class to give it a priority, and whichever process has the highest priority is the one that actually runs at any given time. The thing is, this doesn't really map all that well to real-world tasks, and it's something the research community moved beyond a long time ago. They've instead moved into an area they call deadline scheduling, which changes this by doing away with priorities altogether. Instead of a priority, you give a process what's called a worst-case execution time, the amount of CPU time it's going to need to get its job done, and a deadline. So essentially it says: I need this much CPU time by that time there. And then perhaps a period: I need this deadline reinstated every so often, every however-many milliseconds, that sort of thing. If you define your scheduling in this way, you can actually write a scheduler that will, one, guarantee that every process in the system will meet its deadlines and, two, refuse admission to any process that would cause those guarantees to be violated. So it allows you to provide good isolation between real-time tasks on the same system and make sure they can meet their deadlines. There's a deadline scheduling patch out there; this one is actually going to go into the academic column when it goes into the kernel, because it comes out of the Scuola Superiore Sant'Anna in Pisa, Italy. It works, but there are a lot of interesting problems you run into as soon as you take deadline scheduling and try to apply it to the real world, especially if you've got large numbers of cores in your system. There are some problems to solve yet, so I'm not sure when it will go in, but it's getting closer; it's being worked on, and we'll see it at some point. And then we should really be the first general-purpose operating system to have this feature exposed as such. Mac OS actually has deadline scheduling in a more limited sort of way, but we'll have it as a general-purpose feature, I think.
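Just to illustrate the admission-control idea (this is a toy sketch of my own, not the actual patch), the core test, for tasks whose deadlines equal their periods, is that the total utilization requested can never exceed what the CPU has to give:

```c
#include <stdio.h>

struct dl_task {
	double runtime_ms;   /* worst-case execution time per period */
	double period_ms;    /* how often that budget is replenished */
};

/* Return 1 if the new task can be admitted alongside the existing set
 * on a single CPU without endangering anybody's deadline. */
static int admit(const struct dl_task *set, int n, struct dl_task new_task)
{
	double u = new_task.runtime_ms / new_task.period_ms;

	for (int i = 0; i < n; i++)
		u += set[i].runtime_ms / set[i].period_ms;

	return u <= 1.0;
}

int main(void)
{
	struct dl_task running[] = { { 10, 100 }, { 30, 150 } };  /* 30% of the CPU */
	struct dl_task video     = { 20, 40 };                    /* wants another 50% */

	printf("admit the video task? %s\n",
	       admit(running, 2, video) ? "yes" : "no, the guarantees would break");
	return 0;
}
```

The real scheduler has to deal with multiple CPUs, budget enforcement and much more, but the refuse-admission-if-overcommitted idea is the same.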
Drivers I'm not really going to say a whole lot about; I'll just mention a few interesting ones that have gone in. There have been improvements modernizing the Radeon driver; we finally got the Nouveau driver in a little while back, and Dave had a good time with that, as I recall; things like that. And with 2.6.37 Broadcom finally came around and gave us a wireless driver, that sort of thing. The areas that have been kind of the worst problems for Linux have been graphics and wireless networking, and one could say that things there are almost solved, but that "almost", unfortunately, is kind of a big almost. Still, we're getting there. Just as a sort of exercise in fun, I went through and counted all the configuration options you find under the driver tree. Configuration options are a pretty poor analog for drivers, they don't map exactly, but there should be some proportionality there. You can look and see that we're adding about 170 things you can choose in the driver tree in every kernel release. So we're adding a lot of stuff; there are a lot of drivers going in. We really have the best hardware support of just about anybody out there, and an awful lot of hardware has support by the time it's available to users. It's really working pretty well for the most part.

So why would I say there's a problem? The problem, of course, is that we still have certain vendors that don't want to work with us. We're finding this problem especially, of course, in the embedded area, which is kind of behind other areas in a lot of ways. Embedded graphics chips, that's really a problem; things like that. Some companies just haven't quite gotten the message yet about how it is that you work with the kernel development community. So it's a matter of talking to these vendors and trying to beat some sense into them. It's something we've been doing for years, and over time you see the lights come on and companies figure out that working with us makes their lives easier than working against us. So we'll get there, but I hate to say when.

This is really a facet of what I call, at this point, the larger embedded problem, a problem we now see across the kernel. We've got people working under incredible deadlines, under incredible secrecy, for short product cycles. That makes it very hard for embedded kernel developers to work with the community on what they're doing, because they're under the gun and they can't talk, and by the time they can talk, they're already working on the next project. So what we end up with is a lot of code that never makes it into the mainline tree. There's a lot of code out there that's had no community input, and if you look at this code, you really wish you weren't running it on any device you actually cared about. And things just don't get fixed. It's an ongoing problem, and it's again one of these things where we just have to talk to people and try to fix it. The decision in the embedded community to name a flag kernel and try to get their stuff into it is a really encouraging development in that regard. We haven't seen a whole lot of code actually coming from that direction yet, but one can hope it will come over time, because I think they see the problem there too; they just don't really see their way through to the solution yet.

A problem related to drivers and all that is power management. I don't have a whole lot to say about it. We've added various features over time trying to improve our power behavior. I don't think I really want to talk about it too much, but let's just say that we're getting better over time.
We've got a ways to go yet, both at the embedded level and at the large-scale data center level, where power usage is really just as important and just as relevant and all that. What's coming in this area includes lots of fixes, because everybody's hardware, of course, needs its own fixes to do this sort of thing right. We've seen some relatively new things; a lot of this is trying to find a way to solve the Android problem in a way that's actually mergeable and that would be easy for them to make use of, should they get around to it. The Android developer who works with this stuff recently said that he hasn't actually even looked at this code yet, so we don't know whether it's going to be something they can move to or not; they're working in the embedded environment, they have a lot of those same problems, and they haven't had time to deal with it. But maybe someday we'll settle that stuff out. And there are various other things, like idle cycle injection, where you're running a system right at the edge of where it melts and, as soon as the silicon gets a little soft, you can force some idle cycles in there so it keeps going; that sort of thing. So that's where we are with power management; we'll hear more about that from Matthew later, I think.

Something I wanted to talk about for just a moment is tracing and, in general, visibility into the kernel, because this is an area where we've made big strides over the course of the last couple of years. 2.6.31 was when perf events went in; before that we didn't have an internal performance monitoring system, and a whole lot has happened there since then. In 2.6.32 we saw the addition of the first set of stable, or semi-stable, static tracepoints in the kernel; we didn't have any of those before then. In 2.6.33 we got some improvements to our dynamic tracing facilities; I don't know how much that's being used yet. In 2.6.35 we got the ability to do performance monitoring of a virtualized guest and the host together as a single unit, which is very important for being able to see what's going on on both sides of that boundary and look at the system as a whole. 2.6.37 added, no, that's wrong, 2.6.38 is when we get conditional tracepoints, which is just an improvement to the way tracepoints work so that we can better filter for the stuff that's actually interesting to trace.

So, looking at some of this: perf events started as essentially a driver for access to the low-level performance monitoring hardware built into most of the processors we have now, so you can just ask the processor to count things (how many instruction cycles did this take, how many cache misses did you have, things like that) and use it for micro-optimization sorts of efforts. Over time perf has grown in fairly impressive ways; at some point, I think, we're all going to be installing the Linux kernel just so that we can run perf on it, and everything else kind of goes through there. That seems to be sort of the Ingo view of the world. Anyway, we can now monitor software events as well as hardware events: things like function calls, tracepoints, things like that. Anything that is an event, you can deal with through the perf subsystem.
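Underneath the perf tool sits the perf_event_open() system call. As a rough sketch of my own, counting the instructions retired by a chunk of code looks something like this; there is no glibc wrapper, so the syscall gets invoked directly:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
	struct perf_event_attr attr;
	long long count;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_HW_INSTRUCTIONS;
	attr.disabled = 1;
	attr.exclude_kernel = 1;          /* count user-space instructions only */

	/* Monitor the current process on any CPU. */
	int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

	for (volatile int i = 0; i < 1000000; i++)    /* the code being measured */
		;

	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	read(fd, &count, sizeof(count));
	printf("instructions: %lld\n", count);
	close(fd);
	return 0;
}
```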
There's a whole set of features there for getting that sort of stuff out to user space quickly, and various analysis tools. The actual perf user-space tool was merged into the kernel tree and is shipped with the mainline kernel as a way to evolve all this stuff together, and that seems to have worked in terms of inspiring contributions to perf. So what can you do? You can do things like application profiling, like you can with the existing user-space tools, but it profiles on both sides of the kernel/user-space boundary, so you get the full picture there, not just user space; it's very nice in that way. You can figure out what's causing system events: you know, who's causing things to be forced out to swap, who's allocating memory, things like that, and try to figure out what's going on in your system as a whole. There are various kinds of statistical analysis and so on; there's a lot you can do with perf, so you can look at the wiki and various other things. It's an impressive tool.

Sort of related to it, and kind of in conflict with it at times, is ftrace, which is the built-in kernel tracing system that we have. It was initially called ftrace because it just traced function calls within the kernel, but now it does a whole lot more than that. You can set up ftrace to find out what's causing large latencies in the kernel, what it is that's making the kernel slow to respond to things, that sort of thing. You can trace power states. You can trace memory-mapped I/O operations; it's a very nice reverse-engineering tool for seeing what some piece of software is actually doing with I/O memory. You can trace stack usage, tracepoints, again all that sort of stuff; if it's traceable, you can do it with ftrace. It's also a powerful and very useful tool, kind of fun to play with; I recommend that you take a look at it if you haven't already. I ran a bunch of articles on LWN, written by Steven Rostedt, who did most of the ftrace work, on how this stuff works, so if you look there you'll find that.

Related to all this is the concept of tracepoints. One of the very nice things that DTrace came with on Solaris systems is a whole set of well-documented trace points placed in the kernel, so if you're interested in, say, when the scheduler is changing processes or something like that, you just look up the trace point and hook into it; you don't actually have to know how the kernel code works. Linux has traditionally not had these, for all kinds of reasons: we lacked the technology to do it right, and it took a long time to convince people this was something we actually wanted. But we're starting to add tracepoints to the kernel now. This slide says there are about 200, but I think it's getting slightly out of date; it's probably closer to 300 by now, and they keep going in, with more tracepoints added to various things in each kernel release. The well-documented part of this is perhaps lagging behind, but we can hope that we'll get there, and we'll get to a point where we have a nicely instrumented kernel where you can just turn things on when you want to see what's going on in some part of the kernel and get the information out very nicely and easily. We've had an interesting discussion over whether these tracepoints are part of the kernel's application binary interface or not, and whether kernel developers can actually change them. We thought we'd resolved that at the kernel summit, but then it seemed like maybe we hadn't, so I'm not sure; we may see another fight yet over whether this stuff is actually ABI or not.
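To give a feel for what that looks like from user space, here is a rough sketch of my own (the same sort of thing tools like trace-cmd do under the hood) that switches on one of those static tracepoints and starts recording by poking the ftrace control files. It assumes debugfs is mounted at /sys/kernel/debug and that you're root:

```c
#include <stdio.h>
#include <unistd.h>

/* Write a value into one of the ftrace control files. */
static void put(const char *file, const char *val)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/kernel/debug/tracing/%s", file);
	f = fopen(path, "w");
	if (f) { fputs(val, f); fclose(f); }
}

int main(void)
{
	put("events/sched/sched_switch/enable", "1");   /* hook the scheduler tracepoint */
	put("tracing_on", "1");                         /* start recording */
	sleep(1);                                       /* let some context switches happen */
	put("tracing_on", "0");

	/* The recorded events can now be read back from .../tracing/trace. */
	return 0;
}
```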
Finally, with regard to tracing, the other thing that's worth mentioning is SystemTap, because it is still out there; they just released 1.4, I think. SystemTap is a powerful dynamic tracing environment really aimed at competing with DTrace and all that. SystemTap has never quite met its potential, and it may never really make it into the mainline, just because of the way that they work, or don't work, with the kernel community. But it's out there, people use it, and I suspect they'll continue to work on it for a while.

So the final topic, very quickly, is security, security of the kernel as a whole. A while back I put together this slide just listing CVE numbers for the kernel. This is not all of the CVE numbers for the kernel for 2010; I simply ran out of space on the slide. We had well over 100 of them: lots of vulnerabilities in the kernel. I actually raised this at the kernel summit and asked if people thought it was a problem, and the answer I got, for the most part, was no: bugs are bugs, our bugs are just more security-relevant than a lot of other bugs because of the way the kernel works, and we will continue to fix bugs whenever we find them. On the other hand, I am starting to see more people actively looking for sources of kernel bugs, trying to harden the kernel against them and improve things, and this is good, because we do have a real problem out there. Once upon a time we were defending our systems against script kiddies, and that wasn't really that big a deal; you could deal with it for the most part. What we are starting to see, what we have been seeing for a little while really, is attackers who are much more motivated and much more capable, whether they are organized-crime-type people or governments; I suppose there is a difference between the two, for the moment. We've got people who have fairly serious resources and very strong motivation, and they are writing things like Stuxnet or whatever and doing some pretty impressive things with it. So we have to think about whether we can possibly defend against that kind of thing and what we can do to better handle it, and the problem is hard.

One of my favorite exploits that came out over the course of last year is a good example, because it actually takes advantage of three different holes and has to use them all to exploit the kernel. You've got one hole that can cause the kernel, after an oops, to write a zero to an arbitrary address; there are a couple of others that can force an oops in the first place. Put them together and you can arrange for that write to land in just the right place, so that the zero lands in the location holding your user ID, and, you know, instant root process. This exploit worked, and it worked by exploiting all three of these. So things that look like they are not all that severe, in the hands of somebody who is sufficiently motivated... I am amazed at what some of these people can do with what seems like just a tiny little vulnerability. So we have a real problem there, and what the solution is, I don't know. We are seeing various things going into the kernel over time, most of which are not really aimed at this problem; they add various things at the access control level, which is not really where our problem is, I don't think, at this point. And, like I mentioned, there are other people who are trying to harden the kernel now, so we are starting to see some stuff going in. But we will see.
I put fanotify down for 2.6.37; this is the malware-scanning, virus-scanning sort of thing. Actually, the user-space connection to that got disconnected before 2.6.37 went out, because they found some ABI problems with it, and I don't think that's been fixed since; I think it's been sitting idle in the kernel, and they haven't actually done the work to fix up the ABI. And, like I said, there are various efforts to get some stuff out of the grsecurity tree (the sane parts, as they say, of the grsecurity tree), things like that, trying to move it into the kernel and harden our kernel over time. Hopefully we will get there.

And that is where I stop, almost exactly on time, believe it or not. But I do have a couple of moments for questions. The question is how the deadline scheduler figures out the worst-case execution time, and the answer is: it doesn't. The process has to tell it, "I want at most this much time," and in fact, the way deadline schedulers work, if the process exceeds that worst-case time it just gets thrown out; it loses, it's done, it's not going to make its deadline anymore. But no, the scheduler can't possibly know. Now, there are people in the academic community doing research trying to actually prove worst-case execution times for certain kinds of things, but that's a gnarly task and I don't think we'll see that for a while; it's very similar to the halting problem, probably. So for now developers have to guess, and then actually watch what happens on a real running system. Any other questions? If not, I will get out of the way, and thank you all very much.