Thank you very much for coming. I see a lot of familiar faces, which is nice. While I'll give you my own views, I see a few other maintainers here; if you'd like to add something, you're welcome. I also want this to be part of a discussion, because this is really a lot about the development process, with the I2C subsystem as a case study. But I think a lot of what I will show is true for the whole Linux kernel these days, or at least for subsystems dealing with drivers.

My name is Wolfram Sang. I've been hacking on the Linux kernel since 2008, and since late 2012, I think, I've been the I2C maintainer. Currently I'm a self-employed consultant, mainly working for the Renesas upstream team, bringing new SoCs into the Linux mainline. But a lot of what I'm talking about comes from my experience as the I2C maintainer, so that's mainly the hat I'm wearing right now.

The loose structure of this talk: first a short overview of the current situation, which is a short summary of the talk I gave last week at LinuxCon about scaling problems in Linux kernel I2C; then some results of that; then what you as a developer can do to get your patches upstream more smoothly; and finally an idea of what all this means for the I2C subsystem, because a few things will change from now on.

I really like to mention this quote from Trond Myklebust. The discussion about the role of a maintainer and what a maintainer has to do is really old; this quote is from 2012 or 2011, I think. Maintainers are given a lot of duties; it has grown this way, as you see them listed. When the patch count was not as huge as it is today, that somehow scaled. Maintainers always had support from their mailing lists, but in general it came down to all of this resting on one person, and that person could keep up with it. But now we have the current trends, and you will see a lot of graphs, which I've been data mining over the last days and weeks.

This is a pretty easy one: the number of commits. It has a lot of zig-zags, so allow me to fit a simple linearized function to it. It's probably no surprise to you that the patch count is going up. To give you some idea: on the left side I started counting at version 3.0, which was roughly 2011, and the right side is version 4.8, pretty much where we are today. The amount of patches went from a little over 2,000, and we're approaching 14,000 patches per cycle. This is the amount of patches which needs to be handled, and I think it's pretty safe to say this trend will continue, whether you see that as getting worse or better. I don't want to say this is a problem in itself; it's really great that so many people are participating in Linux development, contributing their code and wanting it upstream. The scalability issue is about how to handle all these patches.

Ben, one comment? [Audience question] No, it's per cycle. And when you see these graphs of patches going into a cycle, please consider that they usually don't count merges. I think that's okay, because for other statistics counting merges wouldn't make sense, but for maintainers, merges mean work; sometimes they're easy, but sometimes they can mean quite a lot of it. So keep that in the back of your mind as well. Also, the statistics are mainly about accepted patches; until a patch is accepted, there are rejected versions and a lot of noise on the mailing list, which is so far not counted.
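As an aside, the commits-per-cycle numbers are the kind of thing anyone can reproduce with a few lines of scripting over Linus' tree. Here is a minimal sketch in Python; the release tags are real kernel tags, but the script itself is only an illustration, not the exact tooling behind these graphs:

```python
#!/usr/bin/env python3
"""Count non-merge patches per release cycle; run inside a kernel git clone."""
import subprocess

def commits_between(old_tag, new_tag):
    # --no-merges excludes merge commits, as most patch statistics do.
    out = subprocess.run(
        ["git", "rev-list", "--count", "--no-merges", f"{old_tag}..{new_tag}"],
        capture_output=True, text=True, check=True)
    return int(out.stdout.strip())

# List every release from v3.0 to v4.8 here to reproduce the full graph.
tags = ["v4.6", "v4.7", "v4.8"]
for old, new in zip(tags, tags[1:]):
    print(f"{new}: {commits_between(old, new)} patches")
```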
I do have an idea how to measure that noise, at least for subsystems using patchwork, but patchwork would need to be extended a little, so I'd need to talk to its developers about that. Maybe we'll have statistics about it in the future, but for now the numbers are mainly about accepted patches, and they don't show what work went into patches that were never accepted. I also want to mention that my graphs start at version 3.0, and to my personal feeling the situation was already far from perfect at that time. So this scalability problem got worse starting from a situation which was not good.

Patches can carry tags, which I like a lot because they shift work away from the maintainer: people can add Acked-by, Reviewed-by, or Tested-by. Here are those numbers, again zig-zag lines per cycle, so let's simplify them into linear functions. And, yeah, I was a bit frustrated when I saw them, because one tag which is really important for me as a maintainer is Tested-by. I get patches for a lot of hardware I've never seen, so I can only do visual review and mostly have to rely on the patch author that it works as intended. So if another user comes in and says Tested-by, it works for me as well, that's really helpful. But it's basically a flat line, although the number of patches has increased, as we know from the earlier graph. I think there's room for improvement here.

I was also surprised that the number of Acked-bys didn't rise, because we have an increasing number of patches, and Acked-by is also used by maintainers to allow a patch touching their subsystem to go upstream through another subsystem's tree. So I would have expected that number to go up as well. It doesn't, and I think that has a little to do with the one graph which gives some hope: Reviewed-by went up. Because we don't have a clear distinction between Acked-by and Reviewed-by, the flat Acked-by line is probably hidden a little by Reviewed-by going up. Reviewed-by going up is nice to see and really helpful, but you can see that we still have a huge gap, and the slope is not even the same slope as the patch count. So while it helps, it's not really improving things a lot; it helps things not get worse, so to say.

There's also the fact that one patch can carry several tags, multiple Tested-bys and multiple Reviewed-bys; ideally that would be the standard case. So I also made a graph of how many patches have at least one tag of some kind. Yes? [Audience question] If we knew what exactly was tested? Yeah, that could be an idea. Well, the good news is that around 3.0 a little more than 20% of all patches had at least one tag, while today we're approaching 40%. So it increased, and if we wait another five years, maybe 50% of all patches will have a tag. I wouldn't call that awesome. It also means that all the patches without a tag are mainly left to the maintainers, who do the review and make sure that things don't break. With the number of patches increasing, this is really a lot of work, it keeps increasing, and I do see a problem there.
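Those tag statistics can be reproduced the same way, since the trailers are plain text in the commit messages. Again a minimal sketch, only illustrative:

```python
#!/usr/bin/env python3
"""Count review trailers for one release cycle; run inside a kernel git clone."""
import subprocess

TRAILERS = ("Acked-by:", "Reviewed-by:", "Tested-by:")

def tag_stats(rev_range):
    # %B prints the full commit message, %x00 a NUL byte between commits.
    out = subprocess.run(
        ["git", "log", "--no-merges", "--format=%B%x00", rev_range],
        capture_output=True, text=True, check=True)
    messages = [m for m in out.stdout.split("\0") if m.strip()]
    counts = {t: 0 for t in TRAILERS}
    tagged = 0
    for msg in messages:
        hits = {t: msg.count(t) for t in TRAILERS}
        for trailer, n in hits.items():
            counts[trailer] += n
        if any(hits.values()):
            tagged += 1  # this patch carries at least one of the tags
    return counts, tagged, len(messages)

counts, tagged, total = tag_stats("v4.7..v4.8")
print(counts)
print(f"{tagged} of {total} patches carry at least one tag")
```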
The same trend can also be expressed like this: the number of authors versus the number of committers. These statistics are based on git data mining; each patch has an author and a committer, and for me the committer here is basically the maintainer: the person who decides that a patch goes into some upstream tree and, at the very end, ends up in Linus' tree. I didn't take the MAINTAINERS file into account, because while it's helpful in some cases, it's outdated in others, and I didn't find that information trustworthy enough to base my statistics on; I just wanted to see who is actually putting patches into some tree. You can see here that the number of authors is increasing, which also means educational work: you have to teach them a few things and all that. Meanwhile the number of people committing to trees is increasing only a little, and given the numbers we've seen above, I'd like to see a different slope here.

A small hope, and I almost forgot this slide: the number of people giving Reviewed-by who aren't committers themselves, so people doing reviews without being a maintainer. That number luckily increases, but still it doesn't have the same slope as the patch count above. So it helps, again, to keep things from getting a lot worse, but we're nowhere close to closing the gap between those two lines.

A direct outcome you can see here, from my own subsystem; this is all about my subsystem, and I don't want to bash other ones, it's just a direct result of all of that. This is the backlog of unprocessed patches in my queue. You can see I started at 3.8 as a new maintainer, totally motivated: I will keep my backlog close to zero, every patch will be processed. And you can pretty much see when I gave up. That was probably the first time I thought about giving up completely as a maintainer, because you get a lot of pressure from people. I understand that developers have deadlines, saying we need this in that cycle, and you feel that pressure. So that was one of the times where I reconsidered and didn't give up, but I had to let go of keeping the patch count close to zero, because I didn't scale that far.

I want to make this graph a little fairer, because here we're at the 4.8 cycle, which just ended. I think it's fair to say the last two cycles are just usual development; of course I haven't yet processed the patches which should go into 4.9. So the next graph cuts off the right side a little, leaving out the last two development cycles. Now we're at 4.6. Ideally, I should by now have processed all patches which came in up to the 4.6 cycle, but as you can see, I clearly haven't. And it's not even a linear function; I think it's more like exponential. That's the state it's in now. I did even get two new co-maintainers, one for the ACPI parts and one for the I2C muxes. I'm very happy about them, good people, but not enough to scale. Ben? Yep, I'll come back to that in two slides.

So that's the rough picture. The data I just showed you is for my subsystem, and I can only collect it for subsystems which use patchwork, because that gives a database of when patches came in. So I did this for a few other subsystems which use patchwork as well. ext4 is quite interesting, because it shows what I expected: a more or less linear function, and they even have periods where they keep the backlog constant, so nothing piles up anymore. I think that's what I'd consider somehow normal.
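Coming back to the author and committer curves for a moment: that counting, too, is a few lines of scripting. A minimal sketch, assuming it runs inside a kernel git tree:

```python
#!/usr/bin/env python3
"""Count distinct patch authors vs. committers for one release cycle."""
import subprocess

def people(rev_range):
    # %ae and %ce are the author and committer e-mail addresses.
    out = subprocess.run(
        ["git", "log", "--no-merges", "--format=%ae\t%ce", rev_range],
        capture_output=True, text=True, check=True)
    authors, committers = set(), set()
    for line in out.stdout.splitlines():
        author, committer = line.split("\t")
        authors.add(author)
        committers.add(committer)
    return len(authors), len(committers)

authors, committers = people("v4.7..v4.8")
print(f"{authors} distinct authors, {committers} distinct committers")
```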
By the way, these backlog curves are normalized: each one is scaled to its own 100%, so they say nothing about the absolute numbers. ext4, for instance, has a backlog of 800 patches by now, I think. rtc-linux is a smaller subsystem; we can maybe call its curve somehow linear, but we're talking about 20 patches there, which is not very much. For all the others, I think you can see growth which isn't linear but more exponential. Part of it is, as I said, usual development, because the most recent patches still have to be processed, but it reaches back far enough that I see a worrying trend here; at some point, it seems, most maintainers just give up on the backlog, I don't know. Ben? [Audience question] No, that's handled: I tried to select only subsystems which use patchwork consistently, so they mark superseded patches as such, and those should be out of these statistics. But as I said, I'd like to have patchwork extended with a timestamp for when the state of a patch changes; then you'd really know the latency between a patch being sent to the mailing list and some state change. We still need to add that.

In general I really like LWN, but I really don't like that they keep writing this: that the kernel development process is a well-tuned machine and there are no scalability issues in it. Luckily, Jonathan Corbet was at my talk last week, so I'm curious whether he'll write that again. Sorry, no, I disagree; we were in a discussion about this, and I'm interested in what will come out of it. And, as I only figured out yesterday, I don't want to be called a machine, or part of a machine. As I said, I was about to give up more than once, and that has to do with things like frustration and annoyance; there are human factors involved. I'm all for presenting the Linux workflow as a stable one you can rely on, and I think it deserves to be described like that, but I don't really like the machine image, because a lot happens on a personal level.

Okay, what does this huge overload mean? Although I'd like to, I cannot process patches in chronological order; I have to assign priorities, and to a random developer these might look random. That's why I want to mention a few of them here, so you maybe get an idea of what makes a patch more attractive for me to review. A very crucial and very human one these days: you help me, I help you. If I see people cleaning up the I2C subsystem, doing things which are good for a lot of users, or reviewing other patches, then of course, if they need help, I'll be there as much as I can. That's probably a natural thing. Maybe you could call it not completely fair towards those who don't have the time to do that, but in order to scale somehow it is, I think, a natural thing. Then there are patch series touching a lot of subsystems, cleaning up something across the whole kernel, where I2C is just one small part. Of course I don't want to be the blocker that makes a group effort fail, so that's another priority boost. The number of affected users matters too: I recently spent a little time on interrupt support in I2C, which was used for a better touchpad driver, so it got more responsive. That affects all users with such a laptop, so: priority boost. Such a patch should land earlier than something which is, well, the worst example would be a whitespace fixup.
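A side note on those patchwork-based backlog numbers: for subsystems with a patchwork instance, the open queue can be listed over its REST API. A rough sketch; the endpoint shape follows the Patchwork 2.x REST API as I understand it, and the server URL, the project link-name, and the per_page parameter are assumptions to check against your instance's /api/ documentation:

```python
#!/usr/bin/env python3
"""List open ('new') patches of one patchwork project; first page only."""
import json
import urllib.request

# Server, project link-name, and state slug are assumptions here;
# check the /api/ docs of the patchwork instance you actually use.
URL = ("https://patchwork.ozlabs.org/api/patches/"
       "?project=linux-i2c&state=new&per_page=100")

with urllib.request.urlopen(URL) as resp:
    patches = json.load(resp)

for patch in patches:
    print(patch["date"], patch["name"])
print(f"(first page only: {len(patches)} patches in state 'new')")
```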
Back to the priorities: if it's a regression, it automatically needs to be fixed sooner, and I think this applies to the whole Linux kernel, not just I2C. A pending new driver, by definition, cannot cause a regression. And because new drivers cannot cause regressions, there is this thing called the rc1 rule: I may still apply them and send them to Linus even after the merge window has closed. So I tend to work on new drivers a little later than on other patches, if I have the time to work on drivers at all. That's also something to keep in mind.

Then complexity is for sure a factor. Sometimes I have a bit of free time in the evening and say, hey, I could do some patches, the way other people solve Sudoku or something. If what's waiting is a huge, complex patch which is maybe a bit sloppy: no. And then there's this easy patch which looks all right, just a small incremental change, properly documented: okay, I'll take that one. Maybe I should mention here that I currently maintain the I2C subsystem in my free time. I am contracted to do Linux work, but sadly that currently doesn't cover maintenance work. Greg Kroah-Hartman has told me that this makes me a minority by now. But I think most of these issues are also valid for subsystems where the maintainers are paid; they may have a bit more time to work on patches, but the amount coming in is, I think, still huge enough to cause the effects I'm describing here.

And then there's the stuff only maintainers care about. We have one super outdated mechanism where I need to fiddle with the PowerMac drivers; it's in the way of some refactoring, it occupies memory for every user who runs Linux with I2C, and nobody uses it, so I'd like to have it gone. This is not the kind of work the people I get patches from usually do, because they have new SoCs and want a new driver or a fix. But I really think it should be done. There's also work which would help me scale better: the I2C core is now one huge C file, and I'd like to split it up, so we'd have an ACPI part and a device tree part, and I could have separate maintainers for those. I can't even do that work, although it would help me scale; at least I haven't found the time. All the other items, like documentation, you can read for yourself. When developers complain that their patches get delayed for a month, which is the usual latency in my subsystem currently, this kind of work is literally delayed for years. I have this nice I2C transfer tool, written for I2C debugging; it has been used by a lot of people and they were happy with it. But they have to pick it up from the list, because it's not in the official i2c-tools repository: I can't find the time to write a man page for it, and that's all it still needs. I think this part will improve soon, though.

Last week I talked a bit about what we as a community should do and what organizations could do. I'm really, really calling on organizations to step more into reviewing. In the first phase we told, not only companies, but mainly companies: join the community, share your code. And they're doing that now, which is awesome.
But I think we need to take the next step, which is taking part in handling all this code and making sure it works; I don't think we're there yet. That's the bigger scale, though. On the more personal scale, here is what helps maintainers.

There's a small group I call users of patches, who aren't necessarily developers but need a feature: they collect a patch from the list, apply it, and see whether it works and solves a problem for them. I'd really encourage them to give tags, like Tested-by, which, as I said, is pretty important for me as a maintainer, or at least to say: hey, I'm interested in that patch, it really solves an issue for me. As I said before, the number of users affected by a patch has an influence on its priority, and the more people say "I need this patch" or even "I tested it", the bigger the priority boost. So if that applies to you, please do it.

For developers: yes, please always give me only your best shot. I totally understand if you're missing experience; that's okay, we can deal with that, and hopefully the whole community can deal with that. But sometimes people are sloppy although I know they know better, and they still do it. That's the opposite of a priority boost. It's really annoying, and those patches, which might not be completely fair but is only human, really go to the end of the queue. I do understand that developers have deadlines to meet, but I hope I've made the point that I'm not exactly rich in time either, so let's meet somewhere in the middle. And really, for people working at companies: if you have in-house knowledge, try to use it first, instead of waiting for me to answer your patch. That might be the outcome anyhow: if I see somebody from a bigger company which I know has other engineers working on I2C, my first reaction might be, please go to that person and work together, so you give me something which is already reviewed. Ben? Yes. And yes, there's the case where you point something out in one place, and people fix that one place but don't notice that the same thing appears five more times in the patch. So self-education is really one key thing.

What I also recommend: if you're new to a subsystem, remember that we have public development. Go to the archive of that list and read the reviews of the drivers submitted shortly before yours. See what was mentioned there, what was criticized, and check whether it applies to your patch. That way you reuse those reviews, and when you then submit your patch, it's in a better state. If it looks nice, and I see that all the issues which usually come up aren't present, then yes, it's attractive.

Some of this is true for maintainers as well. I wonder how many tedious tasks are still done manually, like typing out Acked-by in full; I have keyboard macros for that and similar things. There are a lot of easy wins there, so really pay attention to tedious tasks where keyboard shortcuts or small scripts can make your life easier and reduce the sloppiness, which in turn makes my life easier. And I run all patches I apply through my code checkers, so you may save a round of review if you do the same. checkpatch is the obvious one; sparse, smatch, and Coccinelle you have to learn to read a little.
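As an illustration of automating that, here is a minimal sketch of a git pre-commit hook which pipes the staged diff through checkpatch. checkpatch's '-' argument (read the patch from stdin) and its --no-signoff option are real; the rest of the wiring is only an example and assumes the hook runs at the root of a kernel tree:

```python
#!/usr/bin/env python3
# Save as .git/hooks/pre-commit (and make it executable) in a kernel tree.
import subprocess
import sys

# Grab the staged changes as a unified diff.
diff = subprocess.run(
    ["git", "diff", "--cached"],
    capture_output=True, text=True, check=True).stdout

if not diff.strip():
    sys.exit(0)  # nothing staged, nothing to check

# Feed the diff to checkpatch on stdin ('-'). --no-signoff silences the
# Signed-off-by check; checkpatch may still warn about a missing commit
# message, since it only sees a bare diff here.
result = subprocess.run(
    ["perl", "scripts/checkpatch.pl", "--no-signoff", "-"],
    input=diff, text=True)

# A non-zero exit blocks the commit; 'git commit --no-verify' skips the
# hook for known false positives.
sys.exit(result.returncode)
```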
These checkers sometimes produce false positives, and there are sometimes reasons where you say, okay, it's warning about this, but I do it this way on purpose. In general, though, their warnings should be paid attention to. [Audience question] Yeah, git hooks: I mentioned them for maintainers, who should really use them, but for developers, yeah, why not as well?

And one thing I really, really like to mention: reviewing your own patches is such a good thing. If you wait, and this is the case for I2C currently, for four weeks and nothing happens, and you just resend your patch without looking at it again, you're wasting a reviewer: yourself. At least in my experience, by that time I've forgotten so many things, or look at the code from a different viewpoint, that I'm actually able to send a version two just by re-reading my own patch. And this again is a priority boost: if I see that you've looked at the code again yourself and found issues, I know I don't have to deal with those anymore and that you're actively working on it. That creates attractiveness, and I think it's a good method to scale better at this level.

And the obvious one: if you have free time and interest, take part in reviewing other people's patches. This will help me, and I think it will help you, because it takes more skill to review a patch than to write one. So you'll work on your skills, and that can improve your own patches again. I understand that it needs time, but if you're interested, try to do it; I think it helps on many levels.

And one thing which might be a bit of a paradigm shift, I don't know: I think the classical pinging of people is a bit outdated. The patch submission documentation says you can ping every two weeks. For me it's simply not realistic to have a patch handled within two weeks; I'd rather say, ping me in two months. But then again, I use patchwork, so all patches are tracked: I don't forget patches, I just don't have the time. And especially with pings on a private level: I know, and I've understood, that it's not my fault that so many patches are left unprocessed, because on my own I don't have enough manpower, and the linux-i2c list is not active enough in reviewing. As Thomas Gleixner said, it's a drive-by subsystem: you go there, drop your driver, and move on to SPI or whatever, unlike complex networking devices, where the engineers stay and keep on working. So I don't have time to reply to pings; I'd rather review patches in that time. Since I use patchwork, I don't forget patches, so nothing is gained by a ping. And even though I know I'm not a failure, it still adds a little to the frustration, especially when it's written privately, giving various reasons; I know them, I understand them, but I can't help it. So, well, rather than asking the kernel community, maybe we can just discuss it here a little more, but I think pings should be re-evaluated. [Audience comment] Okay, I tend to forget, I see. Yeah, that might be a problem, I see. Okay, yeah, the heterogeneous way of handling things in the kernel is often a problem; I see that, yes. And with "ping" here I really mean just "ping" and nothing more. If you say "ping, because this is a real issue", I have sympathy for that. Or "ping, I really need this, can I help in getting it upstream faster", or something like that. Giving a good reason is way better than a standard ping, but most of the pings I get are just "ping".
Sometimes it's a gentle ping, but a reason would be a lot more helpful.

So what does all this mean for the I2C subsystem? This slide just repeats what a maintainer is; we had it before, I just want to bring it back to your attention. And this is how I will change the role in the future. I will really, really point out that I'm just one of the software architects, just one of the patch reviewers, and one of the software developers, and I will push things back to the community. So if people ask me, can you review this patch, I will point out: ask the list, I'm just one of its members. And like other people on the list, I will mainly be reviewing patches which I'm personally interested in. This is another human factor: like I said, it's my free time, and to reduce the frustration level I will focus on things where I see a personal benefit. As a side note, I might try to get some more funding for this work, which would of course allow me to also work on patches from other areas; but arranging that needs time too, so I have to cut the items I just struck through. In general, I like this work. I'm totally happy to be the patch committer and maintainer, to work out how to get things upstream and how to deal with other maintainers, and I even like the patch reviewing; but I have my limits. So, to scale somehow, I will focus more on the higher levels, like: how do we solve this problem as a community? Not so much on the patch level, because as you've probably seen, that doesn't work out, at least for me. What this means for most patches: sorry, but even bigger latency. I'm pushing back, and I'm really calling on the community to find a solution for that. Of the four advantages on the slide, I like the first one very much: I want to stay sane, and I don't want to run around deeply frustrated all day. There were times when it was like that; I felt bad all day because of all these patches, which were important to people, and I just couldn't cope with it. And as I said before, I will really focus on trying to solve this at the higher levels, because I think that's the next huge step we as a community need to take.

So this is what I have for you today. I think we have a little time left for questions, and I'll be here all day, so ask me questions if you find me. And a little bit of advertisement: there's a GPL BoF lunch today. I will be there; we'll meet in the lobby, and people like Bradley Kuhn and Harald Welte will be there to discuss various things around GPL issues. You know there's been a huge thread about that somewhere, and one thing we learned from it is that we need to get people around one table, talk about it, and get more understanding of each other; this is one part of that. I'm looking forward to it, so that's another place to meet me, and maybe you're interested as well. So this is it for now. Thank you very much, and let's start the question round.

[Audience comment] ...they were about to screw up things generally, I mean, in a really bad way. So it's sort of good to involve the subsystem maintainers in this kind of hard administrative work, but I don't really know a good way to handle it; it's just another big piece of work that got pushed onto all of us. When we had board files, there was nothing like this. Yeah, and in that sense, board files were better for subsystem maintainers.
[The same audience member] But they were really crappy for the community at large. The way we solved the board file issue for ARM and PowerPC means we're not really having that problem anymore, but the way we solved it was to push more work onto the subsystem maintainers. Yes. And that was very unfortunate, I think. If people have free time, it would be very nice if they went to the devicetree list, and if there are any bindings there they're familiar with, spent a little time reviewing them and making nasty comments, or nice comments, to the people who submit new bindings. I would very much appreciate it. Thank you.

Yeah, I agree. Device tree bindings are easy to create but not easy to maintain, so there are some questionable ones entering the tree, and pushing that to subsystem maintainers is definitely a problem. I see the same on the ACPI side, although I come more from the device tree side, so I feel more at home there. But I'm so happy about the Intel guys helping me with ACPI, where I really don't know a lot of things, especially when it gets complex; there's a real specification behind it, and they're trying to standardize a bunch of things. But the subsystem maintainers still have to deal with all the specifics of the hardware and so on, and on top of that, yes, it was very unfortunate.

[Another audience comment] I have a couple of comments from the developer side about issues I don't think are being addressed yet. Most of my work consists of short-term projects where you go in, do some work, fix something, and get a few patches out of it; you send them, they don't go through on the first try, but I then don't have time to come back to them. Could we have a system where somebody else could take over such patches? So far, I think, the process is to post the patches as RFC or something and hope that somebody finds them by googling around, or by using patchwork.

For subsystems using patchwork, my advice is: just enter a substring of your driver name and it will list all unprocessed patches for that driver. That might work for such subsystems, but not everyone is using patchwork; I do. One other idea I like, which I think came up at some Kernel Summit, is to have an email address you can send patches to; it runs them through the code checkers and does some cross-builds for the various architectures where things often break, and you get the response back, so you know before sending to a mailing list whether there are issues. I think that would be useful. I don't know if the 0-day bot people are interested in that, but it would for sure be a useful thing to have for developers. [Audience comment] Yeah, okay, that happens. Honestly, in an ideal world I'd prefer the first version of a patch on the mailing list to have this covered already, so developers would first send it to that address, fix the findings, and then post. But I agree, having the mailing list monitored is a good start, yes. I guess I, or someone, just needs to talk to the people involved; whether it's a mailing list or a special address is then a technical detail, yes. Other questions? [Audience comment] That would be a disaster. Yeah, that would cause a lot of issues, I agree. So let's go into the coffee break, I think. Thank you very much.