Upgrading can cause a lot of disruption, and sometimes the machine doesn't come back. So you need to be ready in your fleet to lose a machine at any time, and to be able to re-provision it at any time if, for some reason, we didn't test the kernel on that hardware and the machine doesn't come back. That's something you need to make sure of in your fleet: don't configure machines by hand. Use something like Puppet, Chef, or Salt so that configuration and provisioning are automated. That's what you need to put in place to be ready.

So the first question: yes, we have looked at live patching (Ksplice/kpatch-style), and we are looking into kexec to accelerate reboots, especially on the development side. But we still see a lot of issues with kexec, and that's something we'll keep working on. There's a lot of hardware that doesn't initialize properly after a kexec, so part of our fleet works well, but on some machines it just doesn't work. There's still a lot of work to make that happen. And with live patching you're still limited in what you can do. It's getting there, but you can only make specific kinds of changes, so we'd probably keep it for really specific security fixes that you need to get out quickly. And you still need to reboot into a real kernel at some point to have a good base.

The first part of convincing people is having really good communication in place, and that's what we were lacking. You need to be able to lay out the plan. For a while the conversation went like this: people would say, yeah, we finished upgrading, and we'd say, great, here's the 3.10 kernel; and as soon as they finished, actually, now you need to upgrade to 4.0. That all happened within about a six-month period for some service owners. So you need to lay out the plan, and you also need to listen to the concerns. A lot of people have a lot of fear and a lot of unknowns around kernel upgrades. Listen to what they're afraid of and make sure you integrate that into your testing, or fix whatever is broken in the kernel.

I talked about the release process earlier. Just make sure you have a good, regular process, so people know when they're going to get a new kernel. Our current release flow is that we get a new kernel out every six to eight weeks. We follow the upstream way of doing things quite closely: we rebase every year, and we release an internal FBK kernel every six to eight weeks, so people know when they're going to get a bug fix and when it's ready to be deployed.

Building: just make sure, software development 101, that you have an automated build process. We currently use Jenkins; we're going to move to something more internal at some point, but just have one pipeline that builds every kernel, outputs RPMs, puts them in a repo, and has them ready to be deployed.

The testing part is, I would say, the thing that is still lacking, both internally and at the community level. There's a lot of effort going into improving kernel testing, but there's no single good way to do it. We currently run some tests: we run LTP on each build, and I'm looking into integrating the kernel selftests (kselftest), but it's hard to know where to put your tests; there are many places.
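Roughly, a per-build test job doesn't have to be complicated. Here's a minimal sketch of what that boils down to; the LTP install path, the kernel source location, and the selftest target chosen here are assumptions, not our actual tooling:

```python
#!/usr/bin/env python3
"""Per-build kernel test job: run LTP and a kselftest target, then report.

Minimal sketch; paths and the selected tests are assumptions.
"""
import json
import subprocess
import sys

def run(cmd, cwd=None):
    """Run a command and return True if it exited successfully."""
    return subprocess.run(cmd, cwd=cwd).returncode == 0

def main():
    kernel = subprocess.run(["uname", "-r"], capture_output=True,
                            text=True).stdout.strip()
    results = {
        # LTP: assumes the suite is installed under /opt/ltp.
        "ltp": run(["./runltp"], cwd="/opt/ltp"),
        # kselftest: assumes a kernel tree at /usr/src/linux; run a single
        # quick target here to keep the job short.
        "kselftest-timers": run(
            ["make", "-C", "tools/testing/selftests",
             "TARGETS=timers", "run_tests"],
            cwd="/usr/src/linux"),
    }
    print(json.dumps({"kernel": kernel, "results": results}, indent=2))
    sys.exit(0 if all(results.values()) else 1)

if __name__ == "__main__":
    main()
```

The important part is less the script itself and more that it runs on every build and publishes a pass/fail result people can see.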
That's an area where we'd like to see improvement, and maybe we'll contribute something at some point once we get to a good level. And if you have some tests, make sure you publish them at some point so people can use them. Another thing you can do is comparison testing; that's what we do. We compare the same workload on the new kernel and the old kernel, and since we have many kinds of machines, we need to test that on different hardware. That's a lot of testing you need to do to make sure the kernel is good, so that people will trust you when you give them a kernel.

The next part, which sits somewhere between testing and deployment, is doing some kind of shadow or canary testing; that really helps people gain confidence in your deployment process. You can set up shadow traffic, which is what we have, sending the same traffic to two sets of machines, the production one and the new one, or just deploy to a small subset and let it bake for a while. People will gain confidence from that.

The other thing we do is build daily kernels: daily builds of our own branch, but also of the latest upstream tree. That way we can catch upstream problems, deploy those builds onto some machines, and catch and fix problems faster, before they get into a released kernel, so we don't have to go back and do the upstream backport and all that cycle. We want to do more of that, more testing on the actual upstream kernel, and maybe publish per-release performance numbers if we can; that would be a good thing, so we can spot regressions there.

The question was: is this automated? I would say a bit of both. We have some basic automation that will deploy a kernel, run a workload, and analyze the numbers; there's a service that can run the two setups, compare the numbers, and give us a good or bad result. But currently a lot of it is just letting it run and seeing if we break something. That also gives you a good signal, especially if you can spare some machines: if you have a few web servers to spare, run the new kernel there and see what happens.

And then the last part: having a tool to deploy your kernel will help you a lot. I don't think there's a good open-source tool to deploy a kernel to a fleet at this moment. You can maybe use some kind of configuration management to deploy it, or develop your own tool, but you need some way to deploy the kernel, and to deploy it slowly. Make sure you don't take a whole service down: if a service runs on 20 machines, you don't want all 20 machines down at once. Internally we rely on a tool called FBAR, Facebook Auto-Remediation, that does all the operational management: draining the machines, doing the actual operation, putting them back in prod. It's a good thing to have if you want to do many machines at a time.

So we've done this a couple of times now: we did the switch to 3.10, and we did the switch to 4.0. Here are a couple of things to watch out for if you want to work with upstream and deploy to your fleet, a couple of gotchas that burned us, so you should probably be aware of them. Don't rely on version numbers too much, because they can change fast. We were hoping it would stay 3.20, because a lot of our monitoring had a '3.something' kernel hard-coded, but when upstream switched to 4.0, we broke many things.
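The fix for that one is simple but easy to forget: parse the release string into numeric components and compare tuples, instead of matching on a hard-coded "3." prefix. A minimal sketch of that idea (a hypothetical helper, not our actual monitoring code):

```python
"""Parse the running kernel version into numeric components.

Hypothetical helper: the point is to compare (major, minor, patch) tuples
rather than match a hard-coded "3." prefix, so a 3.x -> 4.x bump does not
break your tooling.
"""
import platform
import re

def kernel_version():
    """Return the running kernel as a (major, minor, patch) tuple."""
    release = platform.release()  # e.g. "4.0.9-301.fb" or "3.10.0-123"
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release}")
    return tuple(int(g) if g else 0 for g in m.groups())

# Compare tuples, not string prefixes.
MIN_SUPPORTED = (3, 10, 0)
if kernel_version() < MIN_SUPPORTED:
    print("kernel too old for this feature")
```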
So if you develop an open-source tool that relies on kernel versions, make sure it can handle future major version numbers. That was a little bit annoying. And when you deploy a new kernel, you're going to need to upgrade some packages: ethtool, iproute2, SystemTap, perf, crash. There are a lot of packages that depend on the kernel, and you need to upgrade them if you want to exploit new kernel features. If you don't use the standard GCC on your system and you deploy external modules, you need to make sure the GCC versions match between the module build and the kernel build. If they don't, the ABI might not match and you'll have a lot of problems loading the module across your fleet.

Sadly, kernel configuration management is a really difficult thing. Even if you keep the .config file in version control, there's no good way to add comments or track why you made a change. We've had problems where we enable something in one version, copy the .config file to a new branch and forget that we enabled that specific hardware or option, then deploy a kernel and somebody asks why something isn't working, and you find out you disabled something you had enabled, or vice versa. I wish we could do better; if somebody has ideas to improve the Kconfig/.config system to be more expressive about what you want, the dependencies, and why you made a change, that would be a good thing to have.

A lot of the performance regressions we hit happened because somebody added a lock somewhere to protect some data structure. One example: epoll_ctl got really slow for one of our use cases. We use epoll_ctl quite a bit, and this syscall got really slow; when we looked into it, the problem was just a new lock. We were able to rework that whole area to do a lot less work and reduce the contention, but that's the first thing to suspect if you see something getting really slow.

If you still have some patches around, make sure you don't forget them. We had patches in the past that nobody knew we had; for example, there was one on perf, and people said it stopped working. Oh, there was a patch there, I didn't know; it was never sent upstream and wasn't documented anywhere. That's one of the major reasons upstream first is better: you don't have to deal with this problem.

If you upgrade a driver, sometimes it comes with firmware, and you'll probably forget about it. You probably need to upgrade those binary firmware blobs too; that's also a difficult part of deployment. Make sure you upgrade to the latest linux-firmware package; usually that's enough.

Many problems that get reported to us are actually not kernel problems. A lot of people see there's a new kernel and they see a problem, so it must be a kernel problem. Make sure people know their workload. Some people are just using too much memory, getting OOM-killed, and blaming the kernel, when really it's the application using too much memory. So make sure you have good memory monitoring in place. What we do: we use netconsole to collect all the dmesg output; you can use syslog instead, it does the same thing. And we use kdump to collect all the crashes in the fleet.
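On the netconsole side, the collector can be tiny, since netconsole just sends kernel log lines as UDP packets to a remote host. A minimal sketch of a receiver (the port and log directory here are assumptions; pointing netconsole at your existing syslog infrastructure works just as well):

```python
"""Tiny netconsole receiver: append kernel messages to one file per sender.

Minimal sketch; the port (6666 is netconsole's default remote port) and the
log directory are assumptions.
"""
import os
import socket

LOG_DIR = "/var/log/netconsole"
PORT = 6666

def main():
    os.makedirs(LOG_DIR, exist_ok=True)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", PORT))
    while True:
        data, (host, _port) = sock.recvfrom(65535)
        # One file per sending host, so a crashing machine's last messages
        # survive even if the machine itself never comes back.
        with open(os.path.join(LOG_DIR, host + ".log"), "ab") as f:
            f.write(data)

if __name__ == "__main__":
    main()
```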
We're seeing fewer and fewer of those crashes as we move to newer major kernels; you can really see it going down, but we still have some. And it's really valuable to have the full kdump, the full core, because it makes the problem much easier to debug. We also do some performance monitoring, using ftrace- and perf-based automation to collect data, and we're going to use BPF-based data collection to continuously monitor things like syscall performance and make sure we don't regress over time.

So, a couple of pieces of closing advice. Again, going upstream first is good; it saves you time in the long term. It's more work initially, but it will save time and make everything better for you and for the community. Don't be afraid to push back. When we started doing this, a lot of people said: is this really a good idea? It will slow us down, and at Facebook we're all about moving fast. But in the long run it lets you move fast, because rebasing becomes easy. You do need to be really convincing and make sure people are doing the right thing. We're now starting to see other teams pick it up: the HBase team is doing it now, and some other teams are looking at making serious contributions upstream as a matter of principle. Stop forking projects internally; making the upstream project good is the way to go.

As I said, we carry fewer than 20 patches. And the graph I like is the age of our kernels, the average age. When we started a few years ago, we had just deployed a new kernel, and there were still a lot of older kernels aging in the fleet. But as we've gone with the upstream-first approach and moved to newer kernels, the age, measured from when each major upstream release came out to what we're running now, keeps going down. So hopefully we'll get younger and younger kernels in the fleet, and that will be better for us. So make sure you send your patches upstream. Don't keep patches internal; that's bad. Please do it, everybody wins. If this is the kind of challenge that interests you, I invite you to come see us at the booth; we'd love to chat about it. We have many other challenges like that to deal with at Facebook, so it's a really fun thing to work on. So if you have any more questions, I think there's a lot of time left.

Yes, so the question is: if you get a crash, how do you verify whether it has already been reported or fixed upstream? That's a good question; I don't have a good solution for that right now. Internally, we have a system tied to our bug tracker that compares the call stacks, sees whether a crash is the same one we've already seen, and associates it with a task in the internal bug tracker. But against upstream we don't have anything. There's a kernel Bugzilla, but it's hard to see whether it's the same problem. So you basically have to look at the kernel history and see whether there's a patch that looks like a fix. That has happened to us many times: you see a crash, a panic, and you think, I'm going to fix this bug; you fix it on 3.10, go to send it upstream, and then you look at the upstream code and see that somebody already made the same patch. And that's one thing that's good about running newer and newer kernels: if you run the latest upstream, there's a decent chance somebody has already fixed your bug. So that's my advice.
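The internal dedup that answer describes basically boils down to turning a panic backtrace into a stable signature and looking it up. A rough sketch of that idea (hypothetical; the frame-normalization rule and the in-memory tracker stand in for the real bug-tracker integration):

```python
"""Turn a kernel oops backtrace into a stable signature for deduplication.

Rough sketch of the idea above; the normalization rule and the in-memory
"tracker" are assumptions, not the real internal system.
"""
import hashlib
import re

# Oops backtrace lines look roughly like:
#   [<ffffffff81234567>] some_function+0x42/0x90 [module]
FRAME_RE = re.compile(r"\]\s+([A-Za-z_][A-Za-z0-9_.]*)\+0x")

def crash_signature(backtrace: str, depth: int = 5) -> str:
    """Hash the top few function names, ignoring addresses and offsets."""
    frames = FRAME_RE.findall(backtrace)[:depth]
    return hashlib.sha1("\n".join(frames).encode()).hexdigest()

# Toy "tracker": map signature -> task id, so a repeat crash attaches to the
# existing task instead of filing a new one.
tracker = {}

def file_or_attach(backtrace: str, host: str) -> str:
    sig = crash_signature(backtrace)
    if sig not in tracker:
        tracker[sig] = f"TASK-{len(tracker) + 1}"
    print(f"{host}: crash matches {tracker[sig]} (signature {sig[:12]})")
    return tracker[sig]
```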
Just run upstream and check whether the fix is there. But there's no good way to do this; it's been a problem for a long time. How do you track kernel bugs in the community? There have been many efforts, but there's no single tool everyone agrees on for tracking. There's a kernel Bugzilla, but not everybody uses it; some people just post to the mailing lists. So it's hard to track all these things.

The question was: how many people are on the team? The kernel team by itself is close to 20 people. But it's an effort everybody takes part in: every service owner actually looks at their bugs; when they see their systems crashing they file the bug, and we'll be there as helpers to help them debug, or we'll debug it for them. So there are many people looking at this internally. Yes?

So the question is: how many people actually work on the kernel, and who works on it? Is it one kernel team, or is it spread out? Most of the people who contribute are part of the kernel team, and most of them are dedicated to one specific area: file systems, networking, performance, that kind of thing. There are some people outside the team who contribute a patch once in a while. A service owner will say, I have a problem with a specific driver, or, and I see this more often with perf, I want to add this feature, and they'll contribute it back. They often come to us for a first review, and we'll help them, since the kernel team is more used to sending things upstream, but we want them to send it themselves as the author; it's easier for them to get the recognition that way than if we just take the patch. Some people will say, can you deal with upstream? I don't want to deal with them; then we'll take their patch and push it through the whole upstream process. But it's mostly the kernel team, plus a few other teams where people might contribute once in a while. That's the way it's organized.

No, it's about 20 people, and a lot of that work is purely upstream work. Chris Mason, for example, contributes maybe hundreds of patches just on Btrfs, so that's a lot of effort. A lot of those contributions don't even touch the Facebook data centers right away: we use the Facebook infrastructure to do the testing, but we just send the patches upstream, and they come back to us at some point the next year when we do the rebase.

So the question is: what do you need to provide when you submit performance patches? What proof? Especially for performance patches, people will want to see the numbers, see the workload, and especially see the code that generated the workload. Sometimes you can't share all of that, but most of the time, if you can include a small C example that goes with your patch, or point to a benchmark program, and show the numbers, that helps. If you look at the history, performance patches have always had before-and-after numbers. People won't trust you just on the claim that it's better; nobody accepts that on its own. They want to see why it's better.
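As a toy illustration of the kind of artifact that helps, here's a small, self-contained benchmark that prints numbers you could paste into a commit message. This is a hypothetical example that times epoll_ctl from Python; a real submission, like the epoll case mentioned earlier, would normally ship a small C reproducer instead:

```python
"""Toy benchmark: time epoll_ctl(ADD)/epoll_ctl(DEL) pairs and print stats.

Hypothetical illustration of "ship reproducible numbers with your patch";
real kernel submissions usually include a small C reproducer instead.
"""
import os
import select
import statistics
import time

ITERATIONS = 200_000

def bench():
    r, w = os.pipe()
    ep = select.epoll()
    samples = []
    for _ in range(ITERATIONS):
        t0 = time.perf_counter()
        ep.register(r, select.EPOLLIN)   # epoll_ctl(EPOLL_CTL_ADD)
        ep.unregister(r)                 # epoll_ctl(EPOLL_CTL_DEL)
        samples.append(time.perf_counter() - t0)
    ep.close()
    os.close(r)
    os.close(w)
    return samples

if __name__ == "__main__":
    s = sorted(bench())
    print(f"iterations:     {ITERATIONS}")
    print(f"median add+del: {statistics.median(s) * 1e6:.2f} us")
    print(f"p99 add+del:    {s[int(len(s) * 0.99)] * 1e6:.2f} us")
```

Run it once on the old kernel and once on the new one, and include both sets of numbers in the patch description.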
One thing that's really hard to compare, though, is the hardware: which hardware was this measured on? One of the good things about the OCP project is that we have standard hardware with fully published specs, so we can use that as a benchmark platform, like a Leopard type 1. All the specs are there, so it's easy to compare across versions. At some point we might be able to just say "on this hardware" and people will know what that means. But there are multiple layers that can vary, so you need to describe the infrastructure as much as you can. Any other questions?

So the question is: what's the timeline between getting a bug and sending the fix upstream? It really depends, but a typical bug might take a week or two to pinpoint. It definitely depends on what it is, but if it's an easy fix, you send it upstream, it gets applied right away, and it just goes into the next rebase. What we do, as soon as we know the subsystem maintainer has accepted the patch, is backport it directly; we don't necessarily wait for the next release. As soon as we know the patch is good, we take it. So some bugs take a couple of weeks to fix, some might take six months because they're hard and complex; there's no typical timeline. We're out of time. I'm going to be around for questions, and come see me at the booth.