Ah, that worked. So I hope you're all in the right place. This is the kernel miniconf for LCA 2015. If you're in the wrong room, please leave quietly. Today we've got a half-program, half-unconference style miniconf. Some may say that's due to lack of organization, and I'd have to own up to that. So we're going to have a couple of presenters up until just after lunchtime, and then unconference. So I'll introduce Jeremy Kerr. He works at IBM OzLabs, and he's going to talk to you all about Patchwork and how it's used for kernel development.
Thank you, Tony. As Tony said, I'm going to kick off the kernel miniconf with something that's not the kernel, and it's not even C. My name is Jeremy Kerr, and, as Tony said, I work for the IBM Linux Technology Center. This is something that I've developed, in my copious spare time, to help us do a bit of open-source work. Some of you may be familiar with Patchwork. The reason I'm talking today is that it's getting a bit more usage on kernel-based projects: a lot of the sub-maintainers of various Linux subsystems have started using Patchwork. This talk is intended to give you a guide to some of the lesser-known features of Patchwork and how you can incorporate them into your workflow, for the kernel and for other projects as well, but focusing on the kernel today.
You may have seen Patchwork before at patchwork.ozlabs.org. We host quite a few projects now. patchwork.kernel.org hosts another set, but that's not maintained by us OzLabs folks. And there are a few other instances around, I think for the OpenWrt guys. Some of the smaller projects have set up their own instances of Patchwork; it's open-source software, so they've started their own servers and are running Patchwork on them. But the talk today is based on the code level we're running on patchwork.ozlabs.org, because that's what I control, so that's all I can talk about.
So if you haven't seen Patchwork before, here's a brief introduction. Patchwork is subscribed to various open-source mailing lists, with the approval of the project maintainer. It receives all the mail from that list, and anything it finds as a patch in the mail stream, it parses and keeps a record of in a web UI. We also keep a little bit of other information about each patch, set by the maintainer of the project, which basically records the state of the patch at the time. So a patch comes in, and it'll be in the New state. The maintainer will then set it to Accepted, Rejected, Under Review, Superseded or Changes Requested. So basically, it's a state management system for patches: just as a bug tracker keeps track of the state of bugs, Patchwork keeps track of the state of incoming patches on the mailing list. That way we can track the flow of a patch through its lifetime, whether that ends in incorporation into a project, rejection, or a request for changes.
So this is what it looks like. This is the Patchwork instance for one of our projects. Here we have a list of patches on the page, with their states down the right-hand side. This tells us we have a few new patches that have come in, and the rest have mostly been accepted by the project maintainer. So the idea is that we have a basic patch flow: a new patch comes in, it goes into an Under Review state, and then it's either accepted or rejected. There are other states as well, but this is the main flow we're talking about. As for the maintainer's job: rather than worrying about which patches they have or haven't looked at, we want them to concentrate on the actual patch review part of their work, and not so much on maintaining mailing lists and keeping files of patches that need to be reviewed, that sort of thing. So Patchwork gives us a list of what needs to be done, and forms a to-do list for our project maintainers.
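The state tracking and the maintainer's to-do list described above can be modelled in a few lines of Python. This is a hypothetical model, not Patchwork's code: the state names are the ones mentioned in the talk, while the choice of which states count as still needing maintainer action (New and Under Review) is an assumption drawn from the to-do-list idea.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    """Patch states mentioned in the talk (values are display names)."""
    NEW = "New"
    UNDER_REVIEW = "Under Review"
    ACCEPTED = "Accepted"
    REJECTED = "Rejected"
    RFC = "RFC"
    CHANGES_REQUESTED = "Changes Requested"
    SUPERSEDED = "Superseded"


# Assumption: only these states still need something from the maintainer.
ACTION_REQUIRED = {State.NEW, State.UNDER_REVIEW}


@dataclass
class Patch:
    id: int
    name: str
    state: State = State.NEW  # every incoming patch starts out as New


def todo_list(patches):
    """Return the patches still needing maintainer action, lowest ID first."""
    return sorted(
        (p for p in patches if p.state in ACTION_REQUIRED),
        key=lambda p: p.id,
    )


if __name__ == "__main__":
    queue = [
        Patch(3, "powerpc: fix foo"),
        Patch(1, "net: add bar", State.ACCEPTED),
        Patch(2, "kvm: tweak baz", State.UNDER_REVIEW),
        Patch(4, "docs: fix typo", State.REJECTED),
    ]
    for p in todo_list(queue):
        print(p.id, p.name, p.state.value)
```

Keeping "needs action" as a set of states, rather than hard-coding New, means a project could decide that, say, RFC patches also belong on the to-do list.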
But we also want it to be useful for the project contributors. The idea is that the state itself is visible to everyone in the community: contributors can see whether their patch has been accepted or rejected, or has had some sort of action from the maintainer. If not, they can follow up or see what's happening. Now, I'm sure most of you have seen Patchwork in some form; in fact, I found a link from Google to the Patchwork page for a patch. What I want to do is cover some of the lesser-known things about Patchwork.
One thing that's not that obvious from the web UI is that we have a command-line client for Patchwork called pwclient. If you go to the Patchwork website, there'll be a link to download the pwclient script. It's just a brief bit of Python, fairly self-contained, and it allows you to interact with the patch database from the command line. It has git-style subcommands. So pwclient list will give us a list of some of the patches in our current project. In this case, we're seeing four patches: one rejected, one new, two accepted. That gives us a way of seeing which patches are outstanding for a particular project by looking at just the new ones, for example, and we can feed that into scripts for reviewing or whatnot on our local machine.
In terms of reviewing, we can download patches. All patches are referenced by ID in Patchwork, so you give pwclient an ID and it will download that particular patch. Often you want to actually incorporate that into your version control system, so recently we've added pwclient git-am, which will download the patch and git am it into your tree straight away. Then you can review it as part of your normal git workflow, as if you were merging from a git tree, and if it's acceptable, you don't have to do anything more. We can also update the patch database from the command line.
So in this case, we're setting the status of patch 44603 to Accepted. That will list the patch in the web UI as accepted, and inform the community that this patch has been accepted and is recorded as such in the database.
Now, IDs are a unique way of specifying a patch, but often we don't know the ID for a patch in advance. So we have this concept of a hash for a patch. This is a hash of basically a subset of the lines in a patch. We try to capture the context, and we ignore things like newlines and, I think, the line offsets in the patch itself, so that if a patch has been applied with a small amount of fuzz, it will still match the same hash. That allows us to reference patches in Patchwork by the patch content itself. So if you have a patch file locally, you can generate a hash from it by feeding it on standard input into the Patchwork parser program, and that'll print the hash. Then in pwclient, rather than referencing the patch by ID, you can reference it by hash with -h <hash>. In my case, I use that in a catch-up script: for each commit between two revisions in Git, the script hashes the patch that forms that commit, then uses the hash to reference the patch in pwclient and update its state in Patchwork to Accepted.
Yes, Paul? So Paul's question is: what if there's more than one patch in Patchwork with the same hash? We abort in that case. I'd rather be conservative there and not update the patch, so that we don't update the wrong thing.
Question up the back here. Is there a way to do that by email? So the question was: can Patchwork do this using emails on the mailing list? No, it can't. I'll cover some bits where we can use the mailing list for control, but this isn't one of them. The problem is that we don't really have any authentication, so anyone could send a message to the list that updates Patchwork, faking a maintainer's reply.
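A rough sketch of the hashing idea and of the catch-up script follows. It is an illustration under stated assumptions, not Patchwork's own hasher: the normalisation rules (keep file headers without their a/ and b/ prefixes, keep hunk sizes but drop hunk offsets, keep only +, - and context lines, skip everything before the first file header) are my guess at the spirit of "a subset of the lines", so these hashes will not match a real Patchwork instance. The pwclient arguments combine -h for lookup by hash, as described in the talk, with an assumed -s for the new state; check pwclient's own help for your version before running them.

```python
import hashlib
import re

# Hunk header: "@@ -start[,count] +start[,count] @@ optional text".
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def patch_hash(diff_text):
    """Hash the parts of a diff that should survive being applied with fuzz."""
    h = hashlib.sha1()
    in_diff = False
    for line in diff_text.replace("\r", "").split("\n"):
        if line.startswith("--- "):
            in_diff = True  # first file header: the diff proper starts here
        if not in_diff or not line:
            continue  # skip commit message, diffstat and blank lines
        if line.startswith(("--- ", "+++ ")):
            marker, path = line[:3], line[4:].split("\t")[0]
            # Drop the first path component (a/, b/) so -p1 variants match.
            line = marker + " " + "/".join(path.split("/")[1:])
        elif line.startswith("@@"):
            match = HUNK_RE.match(line)
            if match is None:
                continue
            # Keep hunk sizes, drop offsets: a fuzzed re-apply still matches.
            old_count, new_count = (int(c) if c else 1 for c in match.groups())
            line = "@@ -%d +%d @@" % (old_count, new_count)
        elif line[0] not in "+- ":
            continue  # "diff --git", "index", mode lines and so on
        h.update((line + "\n").encode("utf-8"))
    return h.hexdigest()


def update_commands(commit_diffs, state="Accepted"):
    """Build, but do not run, one pwclient call per (commit, diff) pair.

    Assumed flags: -s STATE sets the new state, -h HASH selects the patch.
    """
    return [
        ["pwclient", "update", "-s", state, "-h", patch_hash(diff)]
        for _commit, diff in commit_diffs
    ]
```

To drive it from Git, you could list commits with git rev-list --reverse A..B, get each diff with git diff <commit>~ <commit>, and run each returned command with subprocess.run(cmd, check=True). Given that, per the talk, an ambiguous hash makes the update abort, a failing command is the expected outcome for such patches, not something to retry.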
I mean, it may not be such a serious concern anyway, but if we are going to do that, and there's a good way we can incorporate it, I'd be happy to implement it. It's just the authentication that's my concern. So yeah, this script basically generates the hashes and uses them to call pwclient. Since you've committed all these patches into your Git tree, they're pretty much accepted, right? So it sets all those states correspondingly in the Patchwork database to Accepted. That removes them from the maintainer's to-do list, and you know you don't have to action them anymore.
One other recent development is Patchwork notifications by email. This is an example that was sent to me, I think in March, telling me that three of my patches for the FWTS project had changed state from New to Accepted. This isn't used by many projects at the moment; it's enabled or disabled on a per-project basis. The reason we don't have it enabled much is that generally a maintainer will do this anyway: they'll often reply "accepted, thanks" rather than relying on Patchwork to do that. But if you're running a project that's managed through Patchwork, just let me know and I can enable this for you, and then you don't have to send out those "accepted, thanks" mails, if that's what you're doing now. The idea here is that we batch the updates over about 10 minutes: once there have been no new notifications for a user for 10 minutes, the email goes out. That way we don't spam people with every single change that happens; we group it all together.
As we touched on earlier, Patchwork also has some mail control facilities: the ability to set various properties of a patch, or to control how Patchwork parses an incoming patch mail.
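The 10-minute batching rule described above can be expressed as one small function. This is a hypothetical sketch of the behaviour as described in the talk, not Patchwork's implementation; the (user, timestamp, message) tuple layout is made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

QUIET_PERIOD = timedelta(minutes=10)  # the figure quoted in the talk


def due_digests(pending, now):
    """Return {user: [messages]} for users whose digest is ready to send.

    pending is an iterable of (user, timestamp, message) tuples. A user's
    digest is due only once their newest notification is at least
    QUIET_PERIOD old, so a burst of state changes becomes a single mail.
    """
    by_user = defaultdict(list)
    for user, when, message in pending:
        by_user[user].append((when, message))
    ready = {}
    for user, items in by_user.items():
        items.sort()  # oldest first, so the digest reads chronologically
        newest = items[-1][0]
        if now - newest >= QUIET_PERIOD:
            ready[user] = [message for _, message in items]
    return ready


if __name__ == "__main__":
    start = datetime(2015, 1, 12, 9, 0)
    pending = [
        ("dev@example.org", start, "patch 1: New -> Accepted"),
        ("dev@example.org", start + timedelta(minutes=3), "patch 2: New -> Accepted"),
    ]
    print(due_digests(pending, start + timedelta(minutes=5)))  # {} : still inside the quiet period
    print(due_digests(pending, start + timedelta(minutes=13)))
```

A periodic job could call due_digests, send one mail per returned user and drop those entries from the queue; users who are still receiving changes simply wait for a later run.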
So as the original patch is coming in, we can set various headers. The first one tells Patchwork to completely ignore this patch. When I was maintaining one of the kernel subsystems, I would often send out a summary of "these are my changes going into the next merge window". Because all those patches had already come through the mailing list, there was no reason for Patchwork to track them again; they'd just be duplicates. So I'd set this header on the mails going out to say "don't track this". It's also good if you're including an example patch in a conversation, where you're saying "let's do it like this instead" and you don't actually want it to be merged as a patch itself. You can just add this header to your mail and Patchwork will ignore it.
Another one is that you can set the initial state of a patch. Again, these headers all apply to the initial patch that goes out. You can set the initial state for a patch to certain values. In this case, we're setting it to RFC, meaning that the maintainer doesn't necessarily have to action it (apply it, reject it, or anything like that), and Patchwork uses that as the initial state in the patch database and marks it as such. One interesting thing is that you can do this, but you probably don't want to.
Patchwork also has the concept of delegates. In a larger project such as the kernel, even kernel subsystems have delegates responsible for certain bits. For example, in PowerPC we have a PowerPC maintainer plus subsystem maintainers who handle various features, board support or platform support. So we have delegates for patches: a patch that touches the 32-bit PowerPC code may go to one maintainer, and a patch that touches the KVM code may go to someone else, to actually be reviewed by that person. You can set a delegate on an incoming patch with an email header.
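Here is a sketch of building a patch mail that carries these hints, using Python's standard email package. The header names X-Patchwork-Hint: ignore, X-Patchwork-State and X-Patchwork-Delegate are the ones I understand Patchwork documents for the ignore, initial-state and delegate hints, but confirm them against your instance's documentation; the addresses and the patch_mail helper are hypothetical placeholders, and whether the delegate is given as an email address is also an assumption.

```python
from email.message import EmailMessage


def patch_mail(subject, body, *, ignore=False, state=None, delegate=None):
    """Build a patch mail with optional Patchwork hint headers.

    Header names are believed to match Patchwork's hint headers;
    verify against your Patchwork instance before relying on them.
    """
    msg = EmailMessage()
    msg["From"] = "Dev <dev@example.org>"  # placeholder addresses
    msg["To"] = "list@example.org"
    msg["Subject"] = subject
    if ignore:
        msg["X-Patchwork-Hint"] = "ignore"  # don't track this mail as a patch
    if state is not None:
        msg["X-Patchwork-State"] = state  # initial state, e.g. "RFC"
    if delegate is not None:
        msg["X-Patchwork-Delegate"] = delegate  # reviewer assigned up front
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    mail = patch_mail("[RFC PATCH] powerpc: try foo", "diff ...", state="RFC")
    print(mail["X-Patchwork-State"])
```

If you send patches with Git, git format-patch --add-header "X-Patchwork-State: RFC" is one way to get the same effect without writing any code.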
If you already know who is going to be reviewing a patch, you can set this header and the patch will be delegated automatically as soon as it comes in. That means the maintainer doesn't have to go through the list and work out "OK, this is a 32-bit PowerPC patch, I need to assign that"; it's already assigned from the start. I'd suggest working with your subsystem to work out who is responsible for what. If you've been working in an area for a while, you probably know who's going to be reviewing your patches, and this can be a helpful way of skipping, not the first stage of patch review, but the first stage of triage: getting the patch to the right person to review.
Another feature we've added fairly recently is that we now count the Acked-by, Reviewed-by and Tested-by tags on a patch. It's still under development; I'll cover some of the more controversial bits of this in a second. If there are any Acked-by, Reviewed-by or Tested-by tags in the original patch, we count them, and if any follow-up emails come in with one of those tags, they're counted too. We then display that in the web UI. This is an example list of some patches that hit the PowerPC list recently, and we have this A/R/T column, for Acked, Reviewed, Tested, with the counts of those tags for each patch. The idea is that some projects have criteria a patch must meet before it will be reviewed or considered upstreamable; often people say "we need two acks before a maintainer reviews it", to spread the load a bit. So if you have a policy like that on your subsystem or project, you can wait for the A column to hit two and then review it, or implement whatever policy you need. It gives a summary of what has happened so far, without having to dig into the patch itself and look through all the comments to find those tags.
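Counting these tags amounts to scanning the patch mail and every follow-up for tag lines. The sketch below is an assumption about how one might do it, not Patchwork's parser: it only counts lines that start with the tag (so quoted "> Acked-by:" lines in replies are skipped), counts each person once per tag, and renders an A/R/T summary. Passing the tag mapping as a parameter is one possible answer to the custom-tags question; ready_for_review is a hypothetical helper implementing the "two acks" policy from the talk.

```python
import re

# Default tags from the talk, keyed by their A/R/T column letter.
DEFAULT_TAGS = {"A": "Acked-by", "R": "Reviewed-by", "T": "Tested-by"}


def count_tags(messages, tags=DEFAULT_TAGS):
    """Count distinct people per tag across a patch mail and its replies.

    Only lines that start with "Tag:" are counted, so quoted lines such as
    "> Acked-by: ..." in a reply are ignored, and the same person giving
    the same tag twice is counted once.
    """
    patterns = {
        key: re.compile(r"^%s:\s*(.+)$" % re.escape(tag), re.MULTILINE | re.IGNORECASE)
        for key, tag in tags.items()
    }
    seen = {key: set() for key in tags}
    for body in messages:
        for key, pattern in patterns.items():
            for who in pattern.findall(body):
                seen[key].add(who.strip().lower())
    return {key: len(people) for key, people in seen.items()}


def art_column(counts):
    """Render counts as an A/R/T style summary such as 2/0/1."""
    return "/".join(str(counts.get(key, 0)) for key in ("A", "R", "T"))


def ready_for_review(counts, min_acks=2):
    """Example policy from the talk: wait for two acks before maintainer review."""
    return counts.get("A", 0) >= min_acks
```

Deduplicating by the tag's value is a design choice: it stops a reviewer who re-sends an ack from satisfying a two-ack policy alone, at the cost of treating the same person with two spellings of their name as two people.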
So as I said, it's a bit of an under-development feature at the moment. We've settled on these three tags by default. I haven't been asked for any other tags, but I don't want to paint myself into a corner and only ever support those three. So there's a question of whether we support custom tags: if someone has, say, a CI process that adds a tag, do we count that or not? And which projects will use which tags? So there's a bit of discussion going on about that. There's also the question of sorting: can you sort by the A/R/T column, and would that sort by the total? So there are still some design issues there. If you have any brilliant ideas, please let me know.
One of the other sets of changes that's been happening over the last year is a lot of documentation, particularly about the installation process. We've tried to smooth out the installation process for Patchwork itself, that is, if you're looking to run a Patchwork system on your own server. I'm guessing most people don't, but if you do, have a look. A lot of the documentation around the pwclient binary has been fixed as well: we've added some help there and made it a bit more intuitive, perhaps.
One of the things that happens during a Patchwork talk is feature requests. We're always happy to find out how people are using Patchwork and what's missing; the whole idea is to enable communities to work better. So if you have feature requests, either we'll chat about them in the Q&A session, or just come and find me after the talk. Also, if you're interested in developing Patchwork, I've got a few slides on how that all works. As I said, it's all open source, and it's all in Git. It's written in Python, using a web framework called Django. If you're familiar with that, you probably know more than I do about Patchwork.
And it's relatively easy to set up a development system without having to install a full web server and configure Apache. Once you've downloaded the Git tree, there's a manage.py script. First you use, in this case, the Postgres createdb command to create an empty database. The syncdb command then initializes all the tables that Patchwork needs. And the third line, runserver, runs a little web server, completely in Python, that you can use to make changes to the source and run them, without having to, like I said, install and configure Apache and do all those other sorts of things. The web framework we're using is Django, as I said, and there are some great docs on the docs.djangoproject.com website. Patchwork is quite a thin layer above the facilities that Django provides, so those docs will give you all the bits you need to start hacking on Patchwork. If you're looking at hacking on the pwclient binary, that's pure Python. It just uses XML-RPC to talk to the Patchwork server, so you don't need to do any Django stuff for that; you can just start messing with Python there.
One other important thing on the server side is that we have a fairly comprehensive test suite. This command will test any changes you've made to Patchwork, and if you're fixing bugs or implementing new features, having a test suite always helps. Also on the development side, we've had quite a bit of activity on our list recently, where some folks, I think from Intel, have implemented a new UI for Patchwork. So my plan for the next month or so is to set up a beta.patchwork.ozlabs.org site, where we can try new UI features (things that don't affect the database) and run them against the existing Patchwork database. That way we have two front ends running at once, and we can do a bit of comparison and experimentation.
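Because pwclient just speaks XML-RPC, the transport is easy to experiment with using only Python's standard library. This sketch starts a throwaway local XML-RPC server and talks to it with xmlrpc.client. The method names patch_get and patch_set_state and the in-memory PATCHES table are hypothetical stand-ins, not Patchwork's actual XML-RPC API.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical in-memory "patch database" standing in for a real server.
PATCHES = {44603: {"id": 44603, "name": "powerpc: fix foo", "state": "New"}}


def patch_get(patch_id):
    """Return the patch record, or an empty struct if the ID is unknown."""
    return PATCHES.get(patch_id, {})


def patch_set_state(patch_id, state):
    """Update a patch's state and report success."""
    PATCHES[patch_id]["state"] = state
    return True


def start_server():
    # Port 0 lets the OS pick a free port; logRequests=False keeps it quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False, allow_none=True)
    server.register_function(patch_get)
    server.register_function(patch_set_state)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    with ServerProxy("http://%s:%d/" % (host, port)) as rpc:
        rpc.patch_set_state(44603, "Accepted")
        print(rpc.patch_get(44603)["state"])
    server.shutdown()
    server.server_close()
```

Pointing ServerProxy at a real Patchwork server would follow the same pattern, but the method names and arguments must come from that server, not from this sketch.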
So I guess that's an encouragement: if you have a UI change you want to try out, send me a patch and we can try it on the beta website, and at least no one's going to yell at you straight away. So, a couple of important resources here: the website for Patchwork, and the Patchwork mailing list. Of course, it's tracked in Patchwork. That doesn't mean I'm great at following the Patchwork list, but yeah, we have a list, and it's on our website.
And that's pretty much it. I usually end my slides with a thank-you note, but I wanted to back it up with some data, so I ran this. Basically, it takes the initial comments on all the patches (the metadata you submit with a patch), finds which ones were accepted and which ones had "thank" in the comments, and we got this. It means that if you say thanks in your original patch, you get a 39.5% hit rate, and if you don't say thanks in your patch, you get about a 43% hit rate, or acceptance rate.
So I was planning for a lot of discussion: feature requests, that sort of thing. Does anyone want to open things up? Excellent, Paul. Paul's asking what the statistical significance of the... Yeah. So, probably not very high. I can give you the data if you want; this was a late-night hack. We have about... In this case, we were using just the kernel-based lists that we track on OzLabs, so linuxppc-dev and netdev. It may be different for other projects; maybe they're more polite elsewhere, but that's what we get.
So the question was: we've done some integration work with... I shall put the polite version back up. We've done some integration work with Git; are there any plans to do integration work with other version control systems? The integration work done with Git is very minimal. Just as a quick think... the only thing we actually do is pwclient git-am. There's no reason we couldn't implement the same functionality with a different version control system. And there's nothing...
There's no kind of... I've intentionally made it VCS-independent, so there's nothing that assumes Git. There might be some sugar we can add to make it work better with other VCSs, but the only reason I've used Git is that it's what I use. So if there are other things, I'd be happy to look at them.
Do you want to wait for the mic? Is that coming down and working? Do we have the mic?
What are your plans in terms of scope? Like, how far do you want to go in implementing features that are available on sites like Bitbucket and GitHub, which have quite a variety of things around source control? Perhaps integration, like having an API so that projects that have continuous integration can apply patches and kick off builds based on them, that kind of thing?
Yeah, I think... I want the core of Patchwork to be simple. I don't want it to grow into doing CI stuff on the Patchwork side. So I'd rather take the approach where, like I said, we provide an API, and then other tools can do that. As I said, the pwclient interface uses an XML-RPC API, so it can be used by anything else, although it may not be suitable for other consumers, because we haven't considered them in the design. One of the things I would like to do is rework that API, and if there's a second consumer of it, that will give us more data points about what the new version should look like. And again, the simpler way to go would be preferable for me. There's some cruft we've built up from hooking the API directly into pwclient, making it very much customized to the pwclient consumer, but if there are other consumers as well, that will definitely help us define what the new API looks like. Cheers.
There was another question over here earlier. Nope, I think we're good. Tony?
So I understand the issues around authentication and stuff like that, but you know when you release version 2 of a patch series, right?
Is there any way to automatically supersede version 1, or the previous versions, of that patch series?
That's what I'd love to do. Automatic superseding would actually save the maintainer a lot of work, so it's something I'd really like to have. The only problem is I haven't figured out a reliable way to do it yet. Now, you could argue that something semi-reliable would be better than nothing. I'm not sure yet, because there are multiple different ways that someone follows up to a patch. Someone can reply with something that goes on top of that patch, or something that goes alongside it, or something that completely replaces it, or just a little fragment of a patch that says "no, you should do this particular part this way". So doing it automatically is hard. If we can assist Patchwork with some headers or bits of information there, that'd be cool, but I'm not sure. Maybe there's something we can do by looking at the actual contents of the diff itself: if the diffs are that similar, maybe it's an update. But I don't want to do it and get it only half right.
Jeremy, what would be really good for that is to have git send-email add those X-Patchwork headers.
Yes, yeah.
And then you could have superseded in there, or RFC and things like that, because that's how we say in the email subject line what the patch is: whether there's a v2 in it, or it's an RFC, and so on. So make that sort of status a first-class citizen of git send-email in some way, because there are other tools built on top of git send-email that could use it, like Guilt and whatnot. If that metadata comes through the formatting process as the patches are pulled out of Git, it would just flow through. Then you've got an official, protocol-level way of doing it.
Yeah, definitely.
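As a starting point for the subject-line idea, here is a naive parser for prefixes such as [RFC PATCH v2 3/5], plus a heuristic that flags older versions of the same patch. It is a hypothetical sketch, not anything Patchwork does: as noted above, follow-ups can go on top of, alongside, or replace a patch, so matching on subject text and series position would misfire in practice, and tags glued together, as in [PATCHv2], are not handled.

```python
import re
from dataclasses import dataclass

# "[RFC PATCH v2 3/5] powerpc: fix foo" -> prefix "RFC PATCH v2 3/5", rest "powerpc: fix foo"
PREFIX_RE = re.compile(r"^\s*\[([^\]]*)\]\s*(.*)$")


@dataclass(frozen=True)
class Subject:
    name: str     # subject text after the [...] prefix
    version: int  # N from "vN", 1 when absent
    rfc: bool     # True when the prefix contains RFC
    index: int    # m from "m/n", 0 when absent (cover letter or lone patch)
    total: int    # n from "m/n", 1 when absent


def parse_subject(subject):
    match = PREFIX_RE.match(subject)
    prefix, name = (match.group(1), match.group(2)) if match else ("", subject)
    version, rfc, index, total = 1, False, 0, 1
    for token in prefix.upper().split():
        if re.fullmatch(r"V\d+", token):
            version = int(token[1:])
        elif token == "RFC":
            rfc = True
        elif re.fullmatch(r"\d+/\d+", token):
            index, total = (int(part) for part in token.split("/"))
    return Subject(name.strip(), version, rfc, index, total)


def superseded_by(new_subject, existing_subjects):
    """Return existing subjects that look like older versions of new_subject.

    Heuristic: same text after the prefix, same position in the series,
    lower version number.
    """
    new = parse_subject(new_subject)
    older = []
    for subject in existing_subjects:
        old = parse_subject(subject)
        if old.name == new.name and old.index == new.index and old.version < new.version:
            older.append(subject)
    return older
```

Matching on the 0/n cover letter, as suggested later in the Q&A, falls out of the same check, since cover letters share index 0 and usually keep their subject across versions.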
What I would say is that you can add a header saying this patch supersedes that other patch. But I'm not sure how, when you do the rebase to make v2 of the series, we keep any information about the ones it's replacing...
That's a pretty good one for git send-email.
Yeah, definitely. I mean, the states will definitely help there, but once a patch has hit Patchwork, we can't update its state any longer from email. You can set the state of the patch that's actually coming in, but not of some other patch that's already in the database. So we'd need to add that functionality, to say "this patch is new, and it also supersedes this other one over here".
Yeah, something like that would be great. And then you need to know the reference before the email is generated, which is the... It's just an engineering problem. You just have to integrate it into the tools that you use to generate patches.
Definitely, definitely. So the comment there is that having the tools understand this would give us the metadata that Patchwork needs to do it.
The Gerrit Change-Id stuff is quite a pain to set up and to have forced on you; I've tried using Gerrit for a bit. But it is at least one way of ensuring that you get a consistent flow-through of what a patch was as it moves along through development, as long as something's adding it in. I'd prefer that Patchwork just nicely added it if you didn't have one, because the Gerrit way, where you're rejected if you haven't got it, is a pain. But at least Patchwork could mangle the patch to put the line in, and when you download it from Patchwork you would have that line, and it would recognise the next version. It doesn't help with someone just grabbing it straight off the mailing list, though. People would have to...
You'd probably have to encourage folks to add one themselves before they sent it. But it's the best... It feels miserable, but it's one of the better approaches I've seen, and a lightweight version of it might be your best option.
Definitely, definitely.
It would also be really good if it was... if tools used compatible mechanisms, so that we could try out different systems.
Definitely. So, a couple of design points of Patchwork. Firstly, I don't want to enforce the use of Patchwork on a project. If someone's happy using the mailing list, they shouldn't have to interact with Patchwork on a daily basis if they don't want to. It should be the maintainer's choice to use it to manage their workflow. As for contributors, we don't want to add an extra step to contribution; we don't want it to be "you have to learn this thing to contribute to our project". So using Patchwork stays optional. The second point is that I don't want to start adding unnecessary... well, adding Patchwork-specific metadata to a project's change history. You want your change history to be pristine, only about the project itself, rather than containing tool-specific bits. So I need to figure out a way of tracking across different sets of patches, to say "this one's related to that one", without affecting the version control history, which may be more challenging than I'd like. But if there is some way, that'd be ideal.
Paul? I just want to go back: do you think there's any advantage to scanning mailing list archives for historical data?
Not really. For the projects that have started using Patchwork, no one's asked to pre-populate it, basically because it gives you a huge workload to start with: you've got all these patches in the New state, and as we saw, only around half of them would end up accepted.
So there's a whole lot of workload created just by doing that initial import. But it definitely can be done: when I set up a test instance of Patchwork, I often load the PowerPC mailing list into it as a test dataset. You can just run the parser on an existing mbox and it works fine.
Paul, yeah. How's your throwing?
Just following on from that thing of authenticating users: I'm wondering what other authentication and authorisation features you think will be needed in the future, if you're talking about a large project which might be distributed and might be dealing with people of less than pure intent.
So are you asking how we would implement a secure patch update system?
I'm thinking more about what types of authentication and authorisation you think are useful in that context, beyond just an email address kind of thing.
I mean, email's fine; it's a way of transporting a bunch of text. We need to make sure that whoever's updating a patch is a maintainer of the project. Say a security change is coming in: if I can impersonate the maintainer, I can say "look, it's rejected" or "it's accepted", and that never shows up in the actual maintainer's workflow. That's the situation I want to avoid. We could handle it by, let's say, GPG-signing the mail that updates Patchwork, and perhaps having some way of setting the acceptable keys for a specific project. Then if we get a signed mail that is an update command, we authenticate it against the correct users. That would probably be the simplest thing to do. It's more of an engineering problem; I don't think it's incredibly hard, it's just that someone needs to do it. And I guess the fallback there is that you can just use pwclient to do that anyway.
So if you're running a script to sign a mail and so on, you can just run pwclient and update Patchwork that way, rather than sending email. That said, signed mail would solve the problem of sending one email that both informs the list that the patch has been accepted and updates its state in Patchwork. So that's nice, and if we can do that, great.
Does Patchwork have the notion of a set of patches, like a series of patches, 1 of 5, 2 of 5, up to 5 of 5? And does it treat them as a unit in any sense?
No, it doesn't. So that has to be done by the maintainer...
Are there any plans to do that?
Someone has to do it. Yep. And the thing is, you may want to accept just one of three rather than all three, so we'd have to keep the same per-patch granularity we have now, but allow you to do actions on the whole series. Now, we sort of have a manual way of doing that: you can create a bundle of those 1/n to n/n patches and then update the state of every patch in the bundle. But again, that bundle has to be created manually. There's been some suggestion that we should just look at the m/n in a patch subject and try to correlate it with other patches. That would be ideal, but again, no one's actually implemented it yet. I'd like to do that, yes.
Yeah, I was thinking it would actually make the process of superseding patches a lot easier if you could match up the subject on the 0/n cover letter with, probably, the same subject on the 0/n the next time, even if all the patches in the series have different names.
So an X-Series-Supersedes sort of thing. Again, we can use the... Yeah, definitely. Sounds good. Patches are welcome.
I don't speak Python.
Great. Any other questions? Any other feature requests? Excellent. I'm out of here. Thank you very much.