Good morning. Good afternoon. Welcome to the presentation about getting your first contribution into the Linux mainline. My name is Marta Lipchinska and I have been involved in various open-source projects for nearly 20 years. It also means that I have contributed my first patch to numerous projects, and every single time it has been similar. I have heard quite often that contributing to the Linux kernel is something special, something that's more complicated. So in this presentation, I'm going to show you a way forward that demystifies how you can contribute to the kernel.

But first, let's look at some statistics. You have the kernel versions, starting from the latest one, and the number of developers contributing to each of those versions. Then, in the third column, you have the number of first-time contributors for each of those versions. What you can see is that in recent years, in every single version of the kernel, more than 10% of the developers are making their first contribution in that version. Remember also that if you are, let's say, in the 5.10 line as a first-time contributor and you contribute again in 5.11, you are not going to be counted among the first-time contributors there. With so many people, you can count thousands of people sending their first contribution and getting it accepted every single year. So is it really that difficult, or is there just a way to follow to actually make it?

So let's get started. My approach to contributing to the kernel includes a few steps. I'm going to show you that procedure with an example of working on a bug, and we'll cover a little bit later why that is a good way to start. We'll have a few steps. First, all the things that you need to start your journey. Then we will be analyzing the issue and understanding what kind of problem we are seeing.
We'll cover a few tools that can help you, and then we'll move on to actually preparing the change: first how to prepare it so you can send it, and then, after you send it, how to get it accepted. At the end, we'll cover the case where your first contribution is not a bug fix but something else, and I will give you a few details about what is different in that case.

So let's start from the beginning, right? What do you need to start contributing to the kernel? First, you need your subject. I think that the best way to start contributing is to have a bug and solve it. You may see a suspicious warning somewhere, something may behave differently than you expected, or you may just get a big dump from a BUG.

You may also be tempted to make your first contribution a new feature, so some new code, a new subsystem or a driver. This is possible, but I think it will require more work from you because you will have to prepare a bigger piece of code. Normally, when you are developing a new feature, you will run into a bug or two, and you can test the way of submitting into the kernel using this bug or this small improvement, and then go forward with your new driver or new subsystem as your second or third contribution. Apart from bugs and new features, you can also make an improvement: for example, better performance in some specific case, some refactoring, or a new test. This is also a good way to contribute, so you can choose what fits your case.

Now, when you have your subject, you will need the Linux source code. You will need a compiler and a bug, of course. You will need a test system. It needs to be a machine with root access. It may be a virtual machine or an embedded system, depending on the situation and your problem. You will also need a text editor. There are of course preferences, and every developer-oriented editor will work.
It needs to support raw text mode, and you need to be able to control how whitespace is added in the editor. Basically every developer-oriented editor I know can work here. If you do have a Linux coding style function in your editor, that is even better, because it will save you time on all the small issues that just take time and do not add much value. And then you will need an email client that supports raw text mode to send your patches. We'll cover that a little bit further on.

Now, your development system should be running Linux, and make sure you have sudo or root rights, because you will need them to install the kernel and the modules. You will also need to install some packages, some dependencies, and those differ between distributions. Here I give you an example for Debian that will probably be quite similar for Ubuntu; for other distributions you will have to figure it out. That's not that complicated.

Watch out for a small detail. Typically, in default installations, the boot directory on your system is quite small. If you start installing new kernels, you may just run out of space there, so keep an eye on the free space and remove unused kernels if you need to.

Then, getting the kernel source tree. You need to get the mainline tree because this is where you submit your patches. All the patches you submit are sent against the master of the Linux tree, and you have a link where you can get it. If you are working on a product and you found your bug in that specific product, you are likely on a stable kernel, probably one of the long-term support ones. In this case you will also need the corresponding stable tree, so that you can test against the kernel you have and then apply the fix on the master tree to test again whether it still works. You may also need the tree of a given subsystem, especially if you are working on changes that interact with recent changes happening in that subsystem.
But that should be something you will know when you start working on your changes.

Then I strongly recommend that you get an index of the kernel source tree. You can do it offline, your text editor may have it, or you can do it online, and I give you the link to one of those indexes. A kernel index saves a lot of time because you can just type in a function name and find out where it is defined, and whether there are different versions, for example depending on the architecture. You can also check how it changes between versions, which is especially important if you are moving your change from one version to another. It saves a lot of time, so get a source index.

Then, how to do it step by step. First you have to get the source tree, and that can take time because you need to download it. Then you go to the cloned Linux directory and check which kernel you are running right now, because you want to copy your current configuration into the working directory, the directory you will be compiling your kernel in, and use it as a base for your new kernel configuration of the master. If you are on an embedded system, that may be a little bit different. Then we run make oldconfig, which creates the new config based on the old config and adds all the new options that have been added between the versions. Then you build it, and that takes time too, depending on the configuration you have. And at the end we install the kernel and the modules. You may need to update your bootloader, or make install may just do it for you. That is to be checked on your system.

When you have time after the presentation, or when you are watching this offline, try to compile and boot a kernel. Download the master, change the version name, EXTRAVERSION, in the Makefile, compile it, install it, boot, and then check whether the uname command gives you the version you have put into the Makefile.
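As a rough aid for that last check, here is a minimal Python sketch that rebuilds the version string from the variables at the top of the kernel Makefile and compares it with what uname reports. The helper names and the sample Makefile text are made up for illustration, and the build may append a suffix to the release string (for example a local version or a trailing +), so the comparison only looks at the prefix.

```python
import re


def kernel_version_from_makefile(makefile_text):
    """Rebuild the base kernel version from the top-level Makefile variables."""
    fields = {}
    for name in ("VERSION", "PATCHLEVEL", "SUBLEVEL", "EXTRAVERSION"):
        # Anchor at the start of a line so VERSION does not match EXTRAVERSION.
        match = re.search(rf"^{name}[ \t]*=[ \t]*(.*)$", makefile_text, re.MULTILINE)
        fields[name] = match.group(1).strip() if match else ""
    version = fields["VERSION"]
    if fields["PATCHLEVEL"]:
        version += "." + fields["PATCHLEVEL"]
        if fields["SUBLEVEL"]:
            version += "." + fields["SUBLEVEL"]
    return version + fields["EXTRAVERSION"]


def matches_running_kernel(expected, running):
    """True if the running release starts with the version from the Makefile."""
    return running.startswith(expected)


# Hypothetical Makefile header with a custom EXTRAVERSION.
SAMPLE_MAKEFILE = """# SPDX-License-Identifier: GPL-2.0
VERSION = 5
PATCHLEVEL = 10
SUBLEVEL = 0
EXTRAVERSION = -mytest
"""

expected = kernel_version_from_makefile(SAMPLE_MAKEFILE)
print(expected)
```

On the machine booted into the new kernel, the running release can be taken from platform.release() (the same string as uname -r) and passed as the second argument of matches_running_kernel.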
If you can do it on an embedded platform, that's quite often even more complicated, so you get bonus points if you do it.

Now you can compile your own kernel, and we can get to the issue, to your problem and to a solution. I'm going to walk you through this using a bug, and we are going to start from what we get in the kernel log when it happens. But first, when you are analyzing the issue, where do you look for information? There's quite a lot to look at. You have the kernel documentation in the Documentation directory in the kernel tree. You have the code itself and the comments around the code, and here you can use the source code indexer. You can also look into the previous discussions on the mailing lists, and you can look at the news sites. You may find that the subsystem you are working on was described when it was introduced some time ago, when there were some big interface changes, or on other occasions, so you may find some background information there. You should be aware, however, that the kernel internal interface changes, so documentation from five years ago may no longer be accurate. Look at the dates of the blog posts and the articles to find out whether they still apply to you, and if you are unsure, you can use the source indexer to verify how a given function has changed. Maybe it has disappeared, or maybe it appeared a few kernel versions ago.

Now let's get to our bug. This is an extract of what you could see in a specific situation in kernel 5.0, some time ago. You can see that there is a BUG line and there is some more information, and we are going to decode it step by step. You can find the complete call trace in the kernel source history; it was committed with the fix. So if you are interested, you can look it up. What do we have here? We have the important points in red. We have the BUG, we have a null pointer dereference. We have the function where the problem happened, and we have the call trace.
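To make those pieces concrete, here is a small Python sketch that pulls the function names out of a log extract shaped like this one. The log lines below are invented for illustration (they are not the trace from the talk), and the regular expression only assumes the usual function+0xoffset/0xsize [module] layout of such entries.

```python
import re
from collections import namedtuple

Frame = namedtuple("Frame", "function module reliable")

# One entry looks like: "? name+0x47/0x90 [module]"; the "?" and the module are optional.
FRAME_RE = re.compile(
    r"(?P<guess>\?\s+)?"
    r"(?P<function>[A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_trace(log_text):
    """Return the frames found in a kernel log extract, in order of appearance."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                Frame(
                    function=match.group("function"),
                    module=match.group("module"),
                    # Entries prefixed with "?" are ones the unwinder is not sure about.
                    reliable=match.group("guess") is None,
                )
            )
    return frames


# Hypothetical log extract; names, addresses and offsets are made up.
SAMPLE_LOG = """\
[  100.000001] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[  100.000002] RIP: 0010:demo_parse_log+0x21/0x140 [demo_core]
[  100.000003] Call Trace:
[  100.000004]  demo_add_disk+0x47/0x90 [demo_core]
[  100.000005]  ? demo_unreliable+0x10/0x20
[  100.000006]  demo_probe+0x1a2/0x300 [demo]
"""

for frame in parse_trace(SAMPLE_LOG):
    print(frame)
```

In this sketch the first frame comes from the RIP line, which is the place where the crash happened, and the following frames are the callers listed under Call Trace.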
So how to understand it? The crash happened in nvme_parse_ana_log, and that function was called from this other function. You may be tempted to run a debugger and try to figure out what happened. I can show you here that you can quite often analyze a problem like that by just looking into the kernel code and then verifying whether your analysis is right.

So we go to the function that crashed in this case, and the addr2line tool will point out a specific line. We have a null pointer dereference, so we may be thinking about where a null pointer can occur here. Of course, in this line, there are some obvious candidates. It can be the ctrl. It may be the log buffer. So probably one of the two. A null pointer dereference quite often means that there is something that wasn't allocated. You have three ways of figuring it out: checking all the allocation paths, tracing the call stack, or looking into the kernel messages around the crash. What I'm going to do here is mix all three approaches a little bit.

So first, the message. You can see in the log that we had that message a little higher up on the screen. So we do have a condition here, and we return zero from the initialization. Then we can look a few lines below, where we can see that the ANA log buffer is allocated only after the point where we return from this function. So if you see the message, the buffer has never been allocated. That looks like a possible cause of our problem. If we manage to enter a function that is using this buffer, it is dereferencing it, but we didn't allocate it. It means that we do have a problem. And that was the case in this situation.

Again, when you have some time, take a look at some existing kernel issue that has a fix. Take a commit fixing a bug in the subsystem you want to work on.
Ideally, if it's a bug and you have a complete trace in the commit message, look at the description, not at the solution, and then try to figure out what the issue is based on the description alone. You can list a couple of ideas, more than one. After that, verify whether you are right by looking at the patch content. And if you can actually analyze a bug that hasn't been fixed, for example one from the kernel Bugzilla, then you get bonus points and you can actually try to fix it.

Now, during development and debugging, there are some tools that can make the task easier. So let's take a look at a few of them. Quite often, we do not like to admit it, but most of us are debugging using printf statements. And that's also true in the kernel. But there is no printf in the kernel. Fortunately, there's something else. We have printk. In fact, it is quite rarely used on its own, as a bare printk. What you can find more frequently are the pr_ functions like pr_err, pr_info, or pr_debug. They are equivalent to printk, with the exception of pr_debug, which can be compiled out. In device drivers, you will find similar functions like dev_err or dev_info. The advantage is that they also put the device name in the message, so it's easier to figure out which message comes from which driver. You can also dynamically enable all the pr_debug calls in a specific file or files, and you have an example of how to do it. There's more documentation about using dynamic debugging if you need it.

Then, every oops, BUG or WARN_ON gives you a kernel trace. You can also use a debugger at the kernel level if that's what you prefer. You can use the function tracer to find out what is happening between two events. You can do performance measurements or count different types of events to help you understand what is happening. And the kernel gives you a number of tools documented in the tracing index. There are quite a lot of things, depending on the task you have.
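As an illustration of the dynamic debugging step mentioned above, here is a minimal Python sketch that builds the query enabling or disabling the pr_debug messages of one source file and writes it to the dynamic debug control file. It assumes a kernel built with CONFIG_DYNAMIC_DEBUG, debugfs mounted at /sys/kernel/debug, and root rights for the write; the helper names and the file name used below are placeholders.

```python
from pathlib import Path

# Usual location of the control file when debugfs is mounted at /sys/kernel/debug.
CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")


def dyndbg_query(source_file, enable=True):
    """Build a query that turns the debug messages of one source file on or off."""
    flag = "+p" if enable else "-p"
    return f"file {source_file} {flag}"


def apply_query(query, control=CONTROL):
    """Write the query to the control file (needs root on a real system)."""
    control.write_text(query + "\n")


# Only print the query here; call apply_query() as root to actually use it.
print(dyndbg_query("demo_driver.c"))
```

The enabled messages then show up in the kernel log like any other printk output.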
On the testing side, the kernel has the kselftest framework, where you test the kernel from user space with tests that run after boot, and you can write test modules to help your tests. You also have the KUnit framework, and in this case you have unit testing inside the kernel. So you can use it for a driver, and it's quite similar to any unit testing framework you may know from other development. And yet again, you can find the documentation of the tools for developers in the kernel documentation or online.

Now you have the change, you have a fix, so how are we going to prepare it? When we are preparing a change, we should use the Linux coding style. Here I simplify it to one slide. Tabs are eight characters. You put one statement per line. The preferred maximum line length is 80 characters, but if you have a good reason you can go longer. You should use rather short names, in lower case. As for the braces, we could try to explain them in a more complicated way, but I prefer to just give you an example of how they should look. Please take a look at the spacing and the way the statements are laid out. Of course, there is a complete definition, and there is a tool that you can use to verify the coding style. For patch files, you run scripts/checkpatch.pl on your patch, and for a whole source file you run scripts/checkpatch.pl -f on your file, and you get the results, the things you should look into to see whether they are right or not.

Now you have a patch and it has a good coding style, so how are we going to create it? You can easily create a patch from your last commit with git format-patch -1. If you want to create patches from the two last commits, it's -2. Your commit message should look this way. In the first line, the title line, you have the subsystem and the title. Then you put an empty line, and then you write what the purpose of the patch is.
The lines should be a maximum of 75 characters. You put an empty line, and then you put a Signed-off-by line with your email address. As an example, you have some driver and you fix a timer overflow after 30 minutes. The title should explain what the patch does. If it's a fix, you write what you fix. Then we have an empty line, and then we describe why it's important to fix it and when the situation happens. In this case, totally artificial of course, we have a bug happening when a cat sleeps on a keyboard for more than 30 minutes. I haven't seen one like that, but really everything is possible. And then we sign off the commit.

About the sign-off. This is a pretty serious matter. You should use only your real name, and it certifies that you have the right to submit the change under an open source license. It means that if this work is done while you are working for a company, you should get an agreement on how you do it and whether you're allowed to do it. If you are working on your own, that's fine, that's easier. But it is a really serious matter when you add a Signed-off-by.

Apart from Signed-off-by, you can see a few other frequent tags. You can get an Acked-by. This is added by a person who reviewed your patch. This is often a maintainer, but it may be any other developer who knows the subsystem. You can get a Reviewed-by, and this is from a person who formally reviewed the patch and thinks it is ready. They have communicated all their comments to the author and they think that the patch is correct. You also have Reported-by, when you want to give credit to someone who reported the issue, and Tested-by, for someone who tested the patch. And you can see the Fixes tag, which shows which commit this one is fixing. Of course, you can read more about the tags and the procedure in the kernel documentation.

Now, yet another exercise for when you have a moment: perform a change in the kernel.
Then test that it actually works as you expect, and use checkpatch to verify that it is correct. Format the patch file, and then we can discuss it if you want to.

You have the patch ready. So now, how are you going to get it accepted? First, you need to know where to send it. We get that information using the script get_maintainer, and here is how we use it. On a simple file, lib/random32.c, we get two maintainers, we get the networking mailing list, and we get the general Linux kernel mailing list. You will get the general Linux kernel mailing list for every single file, while the specific mailing list and the maintainers will change. Those are the people and the lists to whom you should be sending your change. Do not forget to include the maintainers directly, and do not forget to include the mailing lists directly. The kernel mailing lists are quite busy, and your patch may simply get lost if you do not send it to the right people.

So, we are submitting your patch now. First, you should make sure that the coding style is fine, but you already know how to do it. You should send a plain text email with the patch inline. The subject should be PATCH and then the exact title, the first line of your patch. You can use most email clients with a specific configuration, and you have examples in the kernel documentation. A few things not to do: no attachments, no encrypted emails, no compression, no legal statements in your signature, no long signatures. If your email client is adding those, talk to the people handling your mail server and your mail system to find a solution. And again, no GitHub pull requests for now.

What I do when preparing for a new open source project like the kernel, or when preparing a new machine, is configure everything and then send the first patch to myself to test it, to make sure everything is right, and I strongly recommend that you do exactly the same. Now, an example of sending using git send-email.
This is a frequently used tool to send patches. I give you a configuration example that should work with most email providers. Of course, you will then have to type in your password, but this should work in most cases. When you have it configured, you can send your last commit: just as with git format-patch -1, you can do git send-email -1. Then you can use multiple --to options and put in all of the destination emails, for all of the maintainers and mailing lists. If you want to send a patch file, it is quite similar. git send-email has a lot of other options, so you can explore everything it can offer.

Now, the review process. First, patches are rarely accepted in their first version. So be prepared to send a second one, or maybe a third one, or maybe a tenth one. In your patch description, you should answer the question of why you are making this change, and it should be very clear in the description. You should count on about one week to receive comments, and if you get an answer, you will get quoted parts of your patch with comments or questions. They are often brief; maintainers are quite busy. You should answer in a polite way. You can disagree with the review, for example if the person didn't understand what you meant, but in this case you should explain clearly why you think your solution is the right one. You should use facts and address the problem directly. In general, answer in the way you would like to be answered if you were in the maintainer's place. You can ask for clarification if you do not understand. For example, if the maintainer asks you to write the change using a different function, you can include a code snippet, not a complete version but just a draft, asking "do you mean something like that?" to check that you understand the comment correctly. When you have addressed the comments, you can submit patch v2, and the process will continue with yet another round of comments, or you may get it accepted in v2.
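Since most patches go through several versions, it can help to check the commit message layout before each resend. Below is a minimal Python sketch of such a check, covering only the rules mentioned in this talk (a "subsystem: title" first line, an empty second line, lines of at most 75 characters, a Signed-off-by line with an email address). The function name, the sample message and the name in the sign-off are invented for illustration, and this is not a replacement for checkpatch.

```python
import re

# "Signed-off-by: Real Name <user@host>" on a line of its own.
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>@\s]+>$")
# "subsystem: short title" with a non-empty prefix and title.
SUBJECT_RE = re.compile(r"^[^:\s][^:]*: \S")


def check_commit_message(message, max_width=75):
    """Return a list of layout problems found in a commit message."""
    problems = []
    lines = message.rstrip("\n").split("\n")
    if not SUBJECT_RE.match(lines[0]):
        problems.append("subject should look like 'subsystem: short title'")
    if len(lines) < 2 or lines[1] != "":
        problems.append("second line should be empty")
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > max_width:
            problems.append(f"line {number} is longer than {max_width} characters")
    if not any(SIGNOFF_RE.match(line) for line in lines):
        problems.append("missing Signed-off-by line with a name and email address")
    return problems


# Hypothetical commit message following the example from the talk.
GOOD_MESSAGE = """\
somedriver: fix timer overflow after 30 minutes

The timer counter overflows when a key is held down for more than
30 minutes, for example when a cat sleeps on the keyboard.

Signed-off-by: Jane Developer <jane@example.com>
"""

print(check_commit_message(GOOD_MESSAGE))
print(check_commit_message("fix stuff"))
```

The first call reports no problems for the sample message, while the second one flags the subject, the missing empty line and the missing sign-off.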
What kind of comments or feedback do we often get? You may get a request to correct the change, if you overlooked some small issue. The maintainer may ask you to refactor some existing code and reuse parts of it. They may also require you to use some existing API that you probably do not know. You may get suggestions on how to improve your patch, for example for better performance or to make it shorter. You may get an alternative solution or a request for clarification, for example, why do you do it this way and not the other way? And you may get an explanation of a situation where your solution is not going to work.

So what happens if you do not get any feedback? You can resubmit after a week, but before that, verify that your patch title is clear, that the description is clear, and especially verify the list of people to whom you have sent your patch. Then ask yourself: is the change small enough? If you send a few hundred lines of changed code, it will take more time to review.

What happens if your change is not a bug fix? It's nearly the same, with a few suggestions. If you are adding a new feature, I recommend that you communicate early, so that the discussion about the way to approach your new feature happens early and not after everything has been written, when it would have to be rewritten. From your first change, show that you know the rules, especially the coding style. If possible, add tests and ask some people to test the change for you. If you are making a bigger change, you may consider sending it as an RFC, a request for comments. This means a low-maturity patch that does not need to be complete, but it shows the way you want to approach the problem. It allows for discussion early, so if there are big changes to make, you are not spending days or weeks working on a change that has to be done a different way. For RFCs, quite often you do not send them to the whole Linux kernel mailing list.
Rather, you send them only to the subsystem mailing list, so they can be discussed in a smaller group. If you have a large number of lines changed, consider splitting the change into logical steps in different patch files. It is way, way easier to review.

How to split a big change? A split patch is called a patch set. It's a set of patches that get submitted together. You put one logical change in each patch, and you have a separate title and description for each patch. The kernel should compile and work after every single patch from the set.

An example of splitting a change into multiple patches: first, you fix a bug that you found while working on your issue. Then you fix another bug in the same file, but the two fixes are independent. You fix a comment in another file that you spotted during the same development, and then you add a test in a fourth patch to verify that your fixes do work. Or another example: in the first patch, you add a new generic function with its documentation. Then you add another function in another subsystem in a second patch. You refactor the driver in the third, and then you use your new functions in the driver in the fourth one. This way, the developers can follow the logical steps and understand your change better.

Now, let's start wrapping up. There are a lot of kernel resources out there. Starting from the Kernel Newbies site, there is a challenge that is no longer active, but you can still find the exercises. You can find kernel articles on LWN.net, with a lot of information about the recent changes and how the process works. And of course, there is the Linux kernel mailing list with all its messages. We have learned in this presentation that new developers see their patches accepted in every single kernel release. It means that it's not that complicated, and if you follow the right approach, you can do it too. Start simple, test your setup, learn the rules of the subsystem you want to change.
Your first patch doesn't have to be perfect, but if you show that you did your homework, even if there's a problem, people will help you. Be respectful to the other developers and to their time, and remember that we all learn by doing.

Now it is time for questions. If you would like to know more about patching the Linux kernel or contributing to any other open source project, you can contact me using a number of online means. Thank you for your time, and you can start with the exercises I proposed.