Developing Enterprise and Community Distributions at the same time. Impossible? Please welcome Frédéric.

So, just to make sure the recording is working: my root password is star, star, star, star, star. Now everybody knows it. Okay, so my name is Frédéric Crozat. I'm working at SUSE as one of the SUSE Linux Enterprise release managers. Today I will talk about how we try to develop our enterprise distribution alongside the community distribution, openSUSE. Some people think it can't be done, and we are trying our best to make it possible. I'll use Tumbleweed, Leap and SLE as examples.

So first, a quick poll: how many people know the difference between Tumbleweed, Leap and SLE? Okay, a quarter or a third of the room. That's good, because I made a slide to make sure everybody knows what I'm talking about if you are not used to openSUSE.

I will very often use the term SLE, which is SUSE Linux Enterprise. This is our SUSE product, or rather a set of products, like SLES, the server product, SLED, the desktop product, among others. So this is an enterprise distro, long-term support, et cetera, et cetera, developed by SUSE.

Then we have what we call openSUSE Factory, which is our development repository. For people used to Fedora, it's mostly the equivalent of Rawhide. For Debian, I don't remember the name of the development repo, but you get the idea.

On openSUSE we now have two distributions, not just one. We were a bit crazy; we thought, why do just one distribution? Let's do two. We have what we call openSUSE Tumbleweed, which was initially created by Greg Kroah-Hartman, and then its focus changed a little bit. That's the openSUSE rolling release, a bit like Gentoo, Arch, et cetera, except that it's based on openSUSE Factory, our development repo, and it has to be fully tested: it has to pass our automated tests, powered by openQA, to be released.
On a good week you get one release every day; on a slow week you might get two releases per week. It depends. Usually you get one new kernel every week, and on heavy weeks you get a lot of kernel changes, et cetera. So this is a very fast rolling release.

On the other side, we now have what we call openSUSE Leap, which is our stable distribution. It is based on the SUSE Linux Enterprise code base, plus some parts coming from Factory or from some specific repos, but most of those things come from Factory. So we have a mix of something very stable as a base and something more fancy, or more recent, on top.

I hope you can see the slides, because there is a lot of light, but mostly you can see how we create the various products. We have openSUSE Factory at the top, and openSUSE Tumbleweed, which is always rolling. We have our SLE code base: SLE 12 Service Pack 2, SP3, and then SLE 15. Leap is built on top of the shared code. When we do a new service pack at the enterprise level, we usually either backport things from upstream or from Tumbleweed, or we just grab packages from Tumbleweed.

So this is the theory. Now let's talk about the reality, and first, what we learned from our past. I usually use the same example, so sorry for the people who attended the openSUSE Conference previously. I'm going to use the desktop that we ship on SLE, which is GNOME. For our last major code base, SLE 12, which was done three or four years ago, we decided to take GNOME 3.10, which was more or less the release available upstream at the time. Then we patched it, we fixed bugs, we added new features for customers, et cetera. The only thing is that we kept our patches in our SLE repo. We tried to push them upstream, of course, but we didn't always push them to openSUSE Factory.
So, to our development repo. We always said, oh yeah, we are in a hurry, we'll do that later. Then we released SLE 12, and one year later we released SLE 12 SP1, but we had so many bugs to fix that, okay, we'll upstream later. So basically nothing happened for SLE 12 SP1. We had two code bases, not completely but really diverging: the stable GNOME for enterprise customers, and, for people on openSUSE, something more recent, directly from upstream, but sometimes missing fixes which had been done for customers. So it was really not great.

We realized that we had to take action there. We really needed to fix this mess, because otherwise it would be unmaintainable in the long term. Just as a reminder, we have to maintain the code base of a SLE product for 13 years, so you really want to make sure you have something sustainable.

So we said, okay, for the next service pack we want to sync our GNOME version in SLE with the one we have in openSUSE. We took the chance to update GNOME from 3.10 to 3.20, and at the same time we asked: can we make the source packages exactly the same for both projects? And we did it. It was 300 packages. We had people discussing between the SUSE desktop team and the openSUSE desktop team, which were completely different people. The openSUSE desktop team, or the openSUSE GNOME team, is known to be very picky about documenting changes. Richard is part of it; these days I'm also part of it. At that point in time, and it's still the case, you had to make sure your changelogs were really verbose and well written, and all patches had to be properly documented with a bug number, et cetera. That's something where people on the SLE desktop side were sometimes a bit sloppy, so they had to cope with the rules of the openSUSE guys, okay?
We made sure to help the developers see how much they were diverging between the SLE product and the openSUSE packages. When there were changes which seemed useful only for enterprise customers, where if we did the same for the openSUSE people they would be upset, et cetera, we said: let's add the patch, but only build it for SLE. That way we made sure we were not breaking the openSUSE side of the world. We also discovered that sometimes the openSUSE people were suffering from bugs that we had fixed in SLE, but we had forgotten to enable the patch: they had the patch, it was just not enabled.

But we did it. The point was that we were able to work internally and then push to openSUSE. We made sure that we had checks in place, because this is a requirement for the SLE people: we must never lose a bug number in changelogs, or a feature number, as Richard talked about, or a CVE number, because we want to be sure we never get a regression in the code base when we upgrade a package. This is something the openSUSE project was sometimes not that concerned about, so it's something we kind of enforced. The openSUSE people were very helpful in accepting that we injected bug numbers for bugs which were fixed but whose code never showed up in the openSUSE code base, because they had simply been fixed upstream; we just needed a way to tag that this bug had been fixed in the code base.

These days, bugs found in an openSUSE package which comes from SLE are forwarded to the SLE people. In short, it means that users of packages inherited from SLE basically get SLE quality, or at least what we sell to customers, but they get it in openSUSE. So, best of both worlds. So this was the past.
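The rule that a package update must never drop a bug, feature or CVE reference lends itself to a mechanical check. The sketch below is hypothetical, not SUSE's actual tooling: it assumes references are written in the `bsc#`/`boo#`/`bnc#`/`fate#` and `CVE-YYYY-NNNN` styles, and it simply compares the sets of references found in the old and new changelog text.

```python
import re

# Assumed reference formats (bug, feature and CVE identifiers); adjust as needed.
REF_PATTERN = re.compile(r"\b(?:bsc|boo|bnc|fate)#\d+|\bCVE-\d{4}-\d{4,}")


def references(changelog: str) -> set[str]:
    """Collect every bug, feature or CVE reference mentioned in a changelog."""
    return set(REF_PATTERN.findall(changelog))


def lost_references(old_changelog: str, new_changelog: str) -> set[str]:
    """Return references present in the old changelog but missing from the new one.

    A non-empty result means the update would silently drop a documented fix.
    """
    return references(old_changelog) - references(new_changelog)


if __name__ == "__main__":
    old = "- Fix crash on login (bsc#1012345)\n- Fix for CVE-2017-1000100\n"
    new = "- Update to 3.20.2\n- Fix crash on login (bsc#1012345)\n"
    print(sorted(lost_references(old, new)))
```

An update that keeps the old entries and only adds new ones passes; one that rewrites the changelog and drops the CVE line is flagged.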
Now let's see what we have currently and what we are working on, which is SLE 15, our next major release, and Leap 15, which is the equivalent on the openSUSE side of things.

First, let me show you how we get changes into our projects. How many people are familiar with the Open Build Service, or OBS? Oh, that's good. And not only openSUSE and SUSE people; that's even better.

So this is roughly what happens when people want to make a change to the code base, either on SLE, where we have our own internal instance of the Build Service, or on Factory, that is, on Tumbleweed, the openSUSE side of things. I use the term developer; you can replace it with contributor or volunteer if you prefer, I'm fine with that.

A developer creates what we call a submit request: basically a change, a patch with a changelog entry, a new version, a new tarball, whatever. He builds it in his own branch and sends it to a specific distribution, which can be SLE 15, Factory, et cetera. Then this change is staged and a subset of the distribution is regenerated: basically, we recreate the distro with this change applied, and we test it with openQA. We don't run the full test suite; we just want to make sure that we can still install the distribution, the desktop still starts, the icons are still in the proper place, et cetera. This takes about 20 minutes to one hour these days, and the change is blocked at that point. At the same time, a review team checks the change. On the SLE side, the review team is usually a release manager plus other people who know the code.
On the openSUSE side, it's also a release manager from openSUSE, and very often, before that, the people responsible for that set of packages, let's say the openSUSE GNOME team, first review the change and say, yeah, it looks fine. So we have the four-eyes review principle in effect on both sides. If the change is fine on the openQA side and the reviewers say it looks okay, then we accept the change. It gets integrated into the distro, which is rebuilt, and then we rerun the test suite, but this time with all the tests. So that gives you an idea of how we integrate changes in the distro.

We decided to put a lot of automated reviews at this level, plus a policy to make sure that we don't diverge, or at least don't diverge too much. I forgot to say: if you have questions, you don't have to wait until the end.

So internally at SUSE we now have what we call the factory first policy. It was already kind of in place since our previous service pack, but now it's fully in place for our new code base. It states that development of the distribution has to be done on Factory. People should not make their changes on our internal build service; they should do their development on the Open Build Service and push everything to openSUSE Factory, so basically to Tumbleweed, the rolling release, which can be released almost every day. Once a change is merged in openSUSE Factory, we take it back and push it into our SLE 15 code base. So the idea is really to do the development in the open. There are always some caveats: sometimes we have features developed with partners which are under NDA until a specific date, so we cannot release that code until the NDA has ended, et cetera. But for most things, that's what we do.
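The submit-request flow described above (staging, a reduced openQA run, four-eyes review, acceptance, then a full test run on the rebuilt distro) can be modelled as a small state machine. This is a minimal sketch under stated assumptions: `SubmitRequest`, `evaluate` and the two-approval threshold are hypothetical names and an assumed reading of the four-eyes rule, not the Open Build Service data model.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    STAGED = auto()    # rebuilt in a staging project, waiting for tests and reviews
    ACCEPTED = auto()  # merged; the full openQA test suite then runs on the distro
    DECLINED = auto()


@dataclass
class SubmitRequest:
    """Simplified, hypothetical model of a submit request (not the OBS schema)."""
    package: str
    target: str                                   # e.g. "openSUSE:Factory"
    approvals: set[str] = field(default_factory=set)
    state: State = State.STAGED


def evaluate(sr: SubmitRequest, staging_tests_passed: bool,
             required_approvals: int = 2) -> State:
    """Decide the fate of a staged submit request.

    staging_tests_passed stands in for the reduced openQA run (install works,
    desktop starts, ...). required_approvals=2 is an assumed reading of the
    four-eyes rule, e.g. the package team plus a release manager.
    """
    if not staging_tests_passed:
        sr.state = State.DECLINED
    elif len(sr.approvals) >= required_approvals:
        sr.state = State.ACCEPTED
    # Otherwise the request stays blocked in staging until reviews arrive.
    return sr.state


if __name__ == "__main__":
    sr = SubmitRequest("gnome-shell", "openSUSE:Factory",
                       {"gnome-team", "release-manager"})
    print(evaluate(sr, staging_tests_passed=True))
```

Running the quick tests and the reviews in parallel, as the talk describes, means a request only leaves staging once both conditions hold.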
And to make things easier for developers, we said: since we are asking you to do the development on Tumbleweed, or Factory, it's the same, you don't even have to take care of SLE 15, at least for now. We have a bot which looks at all the changes accepted in the development distro and retrofits those changes into the SLE code base. So it's really like you are working on the next enterprise product, but in fact you are working on openSUSE Tumbleweed, and we just generate the enterprise product from that. This is really to prevent forks, or unwanted forks.

And, yeah, because I'm not following my slides exactly: did I say everything? Yeah. The submissions from this bot, when something is accepted in Factory and pushed back to SLE 15, are still reviewed under the same four-eyes principle; they are still reviewed by the release managers on the SUSE side. Sometimes we find mistakes, or something that might fly on openSUSE but won't fly on SLE for this or that reason. Then you have two options. Either you say, I'm going to fix that just for SLE; that's the easy way. No. Usually what we do is reject the change for SLE and notify whoever made the change: what you did is not great, could you please fix it, and make sure you fix it in Factory. Then, when it's accepted there, we get the proper change. The funny thing is that quite often we detect issues with our own four eyes on SLE even though the change was already reviewed by four other eyes in Factory, and then we just fix it. This has been in place since we started working on SLE 15, and it's still in place as of this week.
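The bot that retrofits accepted Factory changes into SLE (called the update crawler in the talk) can be sketched as a comparison of two package maps. This is a hypothetical illustration: the `crawl` function, the revision identifiers and the `branched` set (packages SLE deliberately stopped following, such as the kernel) are assumptions, and every proposed submission would still go through the normal SLE review.

```python
def crawl(factory: dict[str, str], sle: dict[str, str],
          branched: set[str]) -> list[tuple[str, str]]:
    """Return (package, revision) pairs to submit from Factory to SLE.

    factory, sle: package name -> source revision identifier (hypothetical).
    branched: packages SLE deliberately no longer follows (e.g. the kernel).
    """
    submissions = []
    for pkg, rev in sorted(factory.items()):
        if pkg not in sle:
            continue  # not part of the (smaller) enterprise package set
        if pkg in branched:
            continue  # SLE tracks its own version of this package
        if sle[pkg] != rev:
            submissions.append((pkg, rev))
    return submissions


if __name__ == "__main__":
    factory = {"bash": "r2", "kernel-source": "k9", "kate": "x1", "gnome-shell": "g3"}
    sle = {"bash": "r1", "kernel-source": "k4", "gnome-shell": "g3"}
    print(crawl(factory, sle, branched={"kernel-source"}))
```

Only packages SLE actually ships are considered, which mirrors the point that the enterprise product is a much smaller subset of the community distribution.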
We are still in the beta phase, but we have this update crawler, what we call the update crawler bot, which crawls the Factory repository and makes sure that all the changes are pushed back to the SLE product. Yes?

"You mentioned that some changes get rejected. What are the main reasons for that? Is it something that works for openSUSE but won't work for SLE?"

I'm repeating the question: why would something work on openSUSE and not on SLE, and what are the main reasons? I'll go into that a bit later, but an enterprise product is always much smaller than the community version; compare Fedora and RHEL, or SLE and openSUSE. We focus on fewer things, and sometimes a package might enable things like, oh, we want to build 200 language bindings that we don't ship on the SLE side, so we want to disable that; we don't want those build requirements. Or the package supports something we don't ship in the enterprise product. And there are a few cases where we diverge from the openSUSE side on purpose, and then we have to say: it won't work as is, we have to handle this slightly differently. But that is, I would say, a conscious decision; it doesn't just happen by accident. So basically, under the factory first policy, your submission won't pass unless there is a good reason for it to pass.

As I said, we have put a lot of bots in place to check all submissions. We have a legal bot. It checks each submission and compares it with our legal database, which contains all the source code. The source code is analyzed, the license of each file is checked, and a report is given to our legal team, who either say it looks okay, or: there is proprietary code in there, I don't think that's good, and they reject it.
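One way to keep the legal team's workload down, in the spirit of the legal bot described above, is to reuse earlier verdicts and only escalate files never seen before. The sketch below is an assumption about how such a check could work, not the real SUSE legal database: `legal_review`, the content hashes and the verdict strings are all hypothetical.

```python
def legal_review(files: dict[str, str], verdicts: dict[str, str]) -> str:
    """Decide what to do with a submission based on earlier license reviews.

    files: file name -> content hash for the submitted sources (hypothetical).
    verdicts: content hash -> license verdict from previous reviews
              (hypothetical values such as "MIT" or "proprietary").
    Only files never reviewed before need a human from the legal team.
    """
    known = [verdicts[h] for h in files.values() if h in verdicts]
    unknown = sorted(name for name, h in files.items() if h not in verdicts)
    if "proprietary" in known:
        return "reject: proprietary code found"
    if unknown:
        return "needs legal review: " + ", ".join(unknown)
    return "accept: all files already cleared"


if __name__ == "__main__":
    previous = {"h1": "MIT", "h2": "GPL-2.0-or-later"}
    print(legal_review({"main.c": "h1", "util.c": "h2", "new.c": "h9"}, previous))
```

With this split, a typical version update that touches a handful of files only puts those files in front of a lawyer.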
But there is a lot of automation to make sure we don't end up filling our lawyers' desks with piles of, well, it's not on paper, but you get the picture, piles of reviews. And it still catches a few things here and there, so that's always good.

Then we have what we call the maintenance bot. It's trivial: it just makes sure that the submission actually builds in the developer's branch, because sometimes people send something that doesn't even build for them. It happens. It's not always their fault; sometimes they make a change, send the submit request, and then things change under their feet so it doesn't build anymore. We don't even bother reviewing the change, because we know it's going to break if we accept it, so it's automatically rejected.

Then there is, as I mentioned earlier, what we call the changelog checker. This is a requirement for SLE: any change has to have a bug number, a feature number or a CVE number. During the development phase it's not really enforced, but once we enter the RC phase, or once the product has shipped and we work on service packs, we want to make sure each change is properly documented and that we know why we are making it, not just making changes at random.

And the last bot, the one people complain about a lot internally, is what we call the Leaper bot. This bot enforces that the factory first policy is followed. If the package has been submitted directly from Factory, everything is fine, because it's already in our upstream, which is openSUSE. But sometimes people are in a hurry, or they want to get a change into the SLE distribution very early, before it's even accepted on the openSUSE side of things. There the bot checks: is this submission accepted in Factory?
If not, is it accepted in the development project on openSUSE? If not: rejection. And that's when you start seeing people complain: yeah, but I did exactly the same change on both sides! Yes, but you didn't really make the same change, because you did the work twice, ten minutes apart, and your changelog entries don't have the same timestamp. So basically you brought the bot on yourself by doing the work twice. You could have done it just once and it would have been accepted. We are really trying to change people's mindset, so that they contribute directly to our upstream, which is openSUSE, and reduce their own work. If they don't, the process rejects it.

There are cases where we decide, okay, for this package we accept that it no longer follows Factory, our development distro. We want to branch, because this is still an enterprise distribution, and at some point we know we are going to freeze versions; we are not going to take every latest change or every new version of everything. For instance, we have branched our kernel: we are based on kernel 4.12, which was decided a long time ago. So the kernel is not submitted from Tumbleweed to SLE, but the bot still checks whether a similar change has been pushed to Factory. The kernel is not a good example here, but take another package, I don't know, Firefox or whatever: when there is just a change, you still want to know whether the same change has been pushed to Factory or not.
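The Leaper bot's core decision, as described above, can be sketched as: accept only if the submitted sources are identical to Factory, else identical to the openSUSE devel project, else reject with a plain diff. This is a hypothetical sketch; `digest` and `leaper_check` are illustrative names, the whole-source hash is an assumed comparison method, and the branched-package case (like the kernel) is not modelled. It does show why redoing a change ten minutes later fails: the changelog timestamp alone changes the hash.

```python
import difflib
import hashlib
from typing import Optional


def digest(files: dict[str, str]) -> str:
    """Hash a package's sources (file name -> text) in a stable order."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode() + b"\0" + files[name].encode() + b"\0")
    return h.hexdigest()


def leaper_check(submission: dict[str, str],
                 factory: Optional[dict[str, str]],
                 devel: Optional[dict[str, str]]) -> tuple[bool, str]:
    """Accept only sources identical to Factory or to the devel project.

    On rejection, return a plain line diff against Factory so the submitter
    can see exactly what differs (even a changelog timestamp counts).
    """
    wanted = digest(submission)
    for origin, sources in (("Factory", factory), ("devel project", devel)):
        if sources is not None and digest(sources) == wanted:
            return True, f"accepted: identical to {origin}"
    base = factory or {}
    diff: list[str] = []
    for name in sorted(set(submission) | set(base)):
        diff += difflib.unified_diff(
            base.get(name, "").splitlines(keepends=True),
            submission.get(name, "").splitlines(keepends=True),
            fromfile=f"factory/{name}", tofile=f"submission/{name}")
    return False, "rejected: submit to Factory first\n" + "".join(diff)


if __name__ == "__main__":
    fac = {"foo.changes": "Mon Mar  5 10:00:00 UTC 2018 - fix (bsc#1)\n"}
    redone = {"foo.changes": "Mon Mar  5 10:10:00 UTC 2018 - fix (bsc#1)\n"}
    print(leaper_check(redone, fac, None)[1])
```

Attaching the diff to the rejection follows the advice given later in the talk: tell people why, so they can fix it themselves. Like the real bot, this comparison is deliberately dumb; a trailing space is enough to reject.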
Sometimes it's not relevant, because it's a backport from a new release which is already in Factory. But sometimes it's just a patch with a fix for one thing, and you want to be sure that this patch has been pushed to both code bases. If you don't, then the next time we rebase the distribution on Tumbleweed, so for SLE 16 in three or four years, I don't know exactly, people will have to do the work again: they will have to push their change to Factory again.

As an example, when we did this entire process for SLE 15, we compared everything in the latest SLE 12 code base with Factory, and we sometimes found patches which had been carried since SLE 8 or SLE 9, so something like 14 years. For 14 years, at every major code stream release, so let's say every four years, a guy was pushing the same patch again. It still applied to the enterprise code base, but it never went to the openSUSE side of things, so people were redoing the work every four years for 14 years. Some people might say that's a way to make sure you keep your job, but if he had done it in the first place he would have saved a lot of effort. Now that we enforce this rule, we don't want this kind of ugly thing to happen. So it was: submission, factory first, we won't take it in SLE, fix it in Factory first. Now it's in Factory, so in three, four or five years the job won't have to be done again and again.

This is the Leaper bot. As I said, and Richard already said it because we had a lot of discussions about it: if you want to enforce this kind of policy, use automation. People will be frustrated by the policy, but they will be less frustrated if the rejection comes from a bot rather than a human.

"And in addition, make sure you give your bot a name that says it's a bot; don't make the mistake of naming a bot after one of your people."

Yes, so repeating the
comment from Richard: make sure your bot's name contains the word "bot". Internally, every service account has to be bound to somebody, so the mail was sent as if from a person, and people didn't realize it wasn't somebody rejecting their change; it was a bot impersonating somebody. But really, it helps a lot, because people tend to be less frustrated when they get a rejection from a bot. They say, okay, I'll fix it, and they comply with the rule. That was really visible with the changelog checker: rejecting a change when there was no changelog entry, or no bug number in the entry, was something we enforced manually before, and people would argue. When it's a bot, they get the mail saying it has been rejected, and one minute later we get another submission fixing the damn thing. Yes, exactly, mission accomplished.

Of course, you will always get people arguing, sometimes for good reasons, and that's fine. When you reject something, don't just say "rejected"; make sure to say why. Our Leaper bot gives URLs explaining why something was rejected, like the diff between the submitted package and the package in Factory, so people can see: yeah, you have trailing spaces. Your bot is sometimes a bit dumb: it's not going to do a semantic comparison between patch sets, it's going to do a plain diff, so if something is slightly different where a human would say that's fine, the bot will be very dumb about it. But if you give enough information in the rejection, people will fix it themselves; they won't have to ask you why it was rejected. Very often we see an automatic rejection, and then we get a new submission five minutes later. Time saved for all the people doing the reviews, for the infrastructure, because we don't have to accept a submission which is then going to be fixed again and again, and even for the developers, because basically they are being taught that
they did something wrong and here is how they can fix it, and they fix it themselves. Yeah, that's basically it. And make sure you can still override the bot. I won't say that we release managers are Jedi, but sometimes we use our powers to make sure things go through, because it's still a bot, and sometimes the bot is really wrong.

So we talked about the policies we are enforcing and how we enforce them, but you might wonder: does it work? Give me numbers. First, SLE 12, the old code base; old in the sense that it's released and shipped, not a brand new one we are working on. We have the concept of service packs, so we layer things on top of other things. We have the SLE 12 code base, which is roughly 3,000 source packages. On top of that, one year later, we did Service Pack 1, where we only changed 550 source packages. The rest was not even rebuilt: the source is the same and the binary is the same, unless something changed in between which requires a rebuild. On top of that we did Service Pack 2. We use a kind of tick-tock scheme for our service packs: the first service pack is stability fixes, not too many features; the next service pack is "let's update the kernel, update the desktop and all the things"; and then again a stability service pack. So one year we fix things, and the next year we break things. Thank you for saying what I was about to say, but it's better that I don't say it. So, to put it in a nice enterprise way: one year we stabilize things, and the next year we bring new features.

On SLE 12 SP3, which was the first release where we really enforced the factory first policy, we had roughly 500 source packages; 300 were forks, meaning they diverged from Factory, but we made sure those changes were still pushed to openSUSE. And still, 20% of the packages were imported directly from the development distro, from
Factory, which saved a lot of time and effort for everybody. So that was SLE 12, where this process was introduced in the middle of development.

Now switching to SLE 15. We are currently at Beta 6, which was released on Friday. We have 3,300 source packages, which is about 10% more than SLE 12, so roughly the same number of packages. Of those 3,300 packages, when I did the extract on Friday, only 7.4% were forks. So more than 92% of the distro is identical to Tumbleweed, to Factory: our enterprise distribution is basically 90% openSUSE Tumbleweed, as of last Friday of course. When I say identical, I mean identical at the source package level: the source RPMs are the same. In some cases we might decide to apply a patch which is only relevant for SLE, or only for openSUSE, but that's the exception rather than the rule, for the reasons we discussed earlier. We don't ship KDE on SLE, for instance, so when packages build things for KDE or require it, we disable that on SLE, et cetera. But that's not bad.

So that's SLE. Let's switch... yes, a question? So: how are our salespeople going to sell the fact that our enterprise distro is basically the openSUSE Tumbleweed distro? Well, first, they don't know it yet. That's why you came here, right?
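Turning the Beta 6 fork percentage into package counts is simple arithmetic. The inputs are the two figures quoted above; the results are approximate because 7.4% is itself a rounded value.

```python
total = 3300          # SLE 15 Beta 6 source packages, as quoted in the talk
forked_share = 0.074  # share of those packages forked from Factory

forked = round(total * forked_share)  # roughly 244 forked packages
shared = total - forked               # source packages identical to Factory
print(forked, shared, f"{shared / total:.1%}")
```

This prints `244 3056 92.6%`, which matches the "more than 92%" figure.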
Yeah, yeah, to make sure they won't find out about it. More seriously: Tumbleweed is a rolling distro, so in one month it will be farther away from SLE 15; it's going to drift. And we will make sure that our SLE code base is rock stable, et cetera, insert all the usual enterprise talk here. So at this moment in time, yes, we are at 90%, and we will go lower than that. I see two questions in the back.

"I was wondering: when you start freezing packages, you'll have a situation where things differ. Shouldn't you then have a Leap-first approach, where if you backport fixes, you backport them first to Leap and then they go into SLE?"

So the question was: you are going to freeze versions, so what about backports and fixes? Are you going to do them in SLE, or do you push them to openSUSE? Yes: our policy, the factory first policy, says that people should still submit their changes to the openSUSE side of things. If it's a backport, you often don't have to, because the version will simply be updated in Tumbleweed, and that's it. If it's just a patch, it has to be pushed. So we still make sure that all our changes, our whole stabilization effort, benefit openSUSE, if only because in three or four years, when we rebase our enterprise product, we want those fixes to already be in; we don't want to redo the work again and again. You had a question? No? Okay.

So I talked about SLE; let's talk about Leap. As a reminder, Leap is the distribution which uses the core of the SLE source packages, plus on top of that a lot more packages, which are either not shipped in SLE or where the openSUSE community wants something more recent than what's in SLE. This slide is very small, and I have some more readable graphs afterwards, but roughly: for the latest Leap release, 42.3, we had 10,000 source packages, compared to 3,000 source packages in SLE. Of those 10,000, roughly 2,000 were coming from SLE, so 20%. openSUSE
Leap is 20% SLE. The rest comes either from Factory, 22%, or from the earlier 42 releases, because we use the same layering concept; I should have put my slides in order: 42.1, 42.2, 42.3. 42.3 is built on top of 42.2, which is built on top of 42.1. So the changes which are only in 42.3 come from SLE, 20%, and from Factory, 22%, plus a few things specific to this version, like the KDE Plasma 5 LTS release. The KDE people did that nicely: they made sure their release schedule roughly followed the openSUSE Leap schedule, which was very nice of them. Then we inherit half of the packages from the previous release, and so on. Remember that openSUSE Leap is meant as a stable openSUSE distro, long-term support or whatever you want to call it. For people who want something moving very fast, we have another answer, which is openSUSE Tumbleweed.

And we have Leap 15, which has been in beta for about one or two weeks. It's based on SLE 15 and has mostly the same number of packages as Leap 42.3, so 10,000 source packages. Of those, 27% come from SLE 15. Remember, SLE 15 is 3,300 source packages, and of those we are able to reuse 2,800 in Leap. I should have done the math of what percentage of the entire SLE 15 code base is in Leap, but that's not bad. The rest comes directly from Factory, plus a few bits from devel projects.

If you put that in a nice graph, you see that when we did the first Leap release, 42.1, there was a big chunk coming from SLE and all the rest came from Factory, and over time the percentage of packages used from SLE increased. That's good, because, and I forgot to say this, with openSUSE Leap being based on SLE packages, the packages coming from SLE are maintained by SUSE people, so when bug fixes are needed, they have to be done by SUSE people, so the
community at large benefits from the work the SUSE employees are doing. Fixes made for paying customers on SLE end up automatically in Leap, a few weeks or so after the maintenance update is released to SLE customers. So Leap benefits from the fixes that people pay SUSE to make on SLE. The more packages come from SLE, the less work the community has to take care of directly; they benefit from paying customers, so that's nice. And Leap 15, as you see, is about 30% from SLE and the rest from Factory. That's nice.

And with that, to make sure we are always improving: as Richard mentioned, we are agile; we learn over time what works and what doesn't, and we are trying to improve and smooth our process. We have a lot of discussions internally, but I'm really hoping you also have questions or suggestions, so I'm open to that. Yes?

"You said you're using openQA to test the distribution, so you have a small openQA run with a few tests, and then another one that runs all the tests. For the first one, does it mean... I mean, openQA works with screenshots..."

I'm repeating the question for the recording: when there is a new change, openQA first tests a small subset of the entire test suite, and once the change is accepted, the entire test suite. openQA is based on screenshots and also on the serial console, and the thing is that once openQA has installed a distro, you can run anything in it: you can run a different test suite inside and just check the results in openQA. Either the results of the internal test suite are pushed to the serial console, or it just displays OK and openQA checks on the screenshot whether OK appears or not. But you didn't finish your question. So the question was: since we are openQA-based, are packagers from the community contributing to openQA? So yes, and even more than
that: we are making sure that our test suites for the SLE products and the openSUSE products are shared as much as possible. I don't think we are at 100%, but it's all on GitHub. Other questions, remarks? No? Okay, thanks. And if this is the kind of challenge you're interested in, SUSE has a lot of job openings.
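The Leap 15 reuse figures quoted in the talk (2,800 of SLE 15's 3,300 source packages, in a Leap 15 of roughly 10,000 source packages) can be turned into the two percentages the speaker left uncomputed. The Leap total is approximate; the slide's 27% suggests the real count is somewhat above 10,000.

```python
sle15_packages = 3300    # SLE 15 source packages (Beta 6 figure from the talk)
leap15_packages = 10000  # "about 10,000" Leap 15 source packages
reused_from_sle = 2800   # SLE 15 source packages reused in Leap 15

share_of_sle = reused_from_sle / sle15_packages    # part of SLE 15 found in Leap
share_of_leap = reused_from_sle / leap15_packages  # part of Leap coming from SLE
print(f"{share_of_sle:.0%} of SLE 15 is reused in Leap 15")
print(f"{share_of_leap:.0%} of Leap 15 comes from SLE 15")
```

This prints 85% and 28% respectively: most of the enterprise code base is shared, while SLE still provides only a bit more than a quarter of the much larger community distribution.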