Thanks to everyone who came. It's so great, it's just amazing. I was just kidding, there's nobody here. I expect expert questions from all of you. I honestly, you know, realize that this topic is extremely wonky even for those of us that care about licensing. So if at any point none of this makes sense, just stop me and I will be happy to have a nice intimate discussion with you about this. All right, so let's talk about SPDX, baby. So the background is that SPDX stands for Software Package Data Exchange, and this is a set of standards for communicating the components, licenses, and copyrights of software, technically of any copyrighted work, although it's really only been applied to software. This is an initiative that was created and driven by the Linux Foundation. The Linux Foundation, if you aren't familiar with them, is a trade association that lots of big companies that have vested interests in Linux and now more broadly open source are contributing to, and trying to solve some of the bigger problems that are faced across the industry. So this is the problem that SPDX seeks to solve. Companies have pain in ensuring that they are compliant with free and open source licenses. Especially those companies who have a relatively small percentage of open source inside much larger proprietary stacks. Think things like smart TVs and in-flight entertainment systems and car GPS, that sort of thing. And often these companies are sourcing code from third parties, like hey, I'll pay you to provide this component, and then another company is paying someone else to provide that component, and so you have like a little Russian nesting doll of code where you keep opening it and there's more code inside from someone else and you're not entirely sure what was in that originally.
So they would really want to answer the question of who is your daddy and what does he do and the goal of the SPDX standard is to enable any party in the supply chain to accurately communicate the licensing data for any of the copyrighted material inside of it and for others to easily be able to consume that information so that you can break that chain of I don't know, he gave it to me that you get when things like Volkswagen happen. So they have this idea that by implementing this standard they will be able to effectively solve that problem and it sounds really good on paper but it's actually really difficult to do that. So what they do is far beyond what is traditionally done in our communities, which is to sort of just label software as GPL or BSD. For example, we all know it's not uncommon to come across a piece of software or code where in the header of the file it says this is BSD and that is all the licensing information that is provided for this. Now in our sphere we're okay with that generally because we understand what that means. We have a pretty good understanding of what BSD might be but the SPDX standard goes well beyond that and says that the right way to do this is to track the copyrights and the licenses for every single source file inside of the distribution in a standardized way and they do this via an XML formatted file and so that in theory that you've got this file that says these are all the copyrights, these are all the source files, these are all the licenses that are in play in this mix and then you can take that file and run it through any number of tools to be able to determine what's inside of your supply chain and say okay, well we looked at all the SPDX format files and we know that there's all these GPL files, all these BSD files, all these MIT, all these Apache, all these Eclipse, all those, whatever and we just know what's in there from that. 
Now the original spec from SPDX required that you do this on a per file basis. The current spec doesn't require that you go per file but every time you talk to somebody about SPDX they probably want you to be doing this but you don't have to and that's important to keep in mind as we go forward because there's lots of things inside the spec that have been loosened in recent releases to say well you don't have to do it like that but then as soon as you start walking down that road a whole bunch of people show up and they're like no actually we really would like it if you'd go all the way down. Now SPDX also adopted a set of naming standards for the licenses so it treats every license with any difference in wording as a unique license including typos, including slight minor differences in phrasing. The only thing that wouldn't make a license different would be the copyright holder changing. So Regents of the University of California, if that changes to Tom Callaway then that's the same license but if I say "a" versus "an" or "in the unlikely event of" versus "in the event of" then these are individual licenses as far as SPDX is concerned. Now Fedora doesn't really work that way. We treat functionally identical licenses as if they were the same license so we don't create new license identifiers every time we see a slightly different version of the BSD license or the MIT license. But in the SPDX universe every unique license gets an SPDX identifier so they use a pretty standard syntax of the name and the version and the variant. And this results in a very, very, very long list of licenses on their part especially since they've gone through our license list and scraped them all into their own names.
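To make that identity rule concrete, here is a rough sketch in Python of the comparison as described in the talk: ignore the copyright line, keep every other word. It is an illustration only, not the actual SPDX matching procedure, and the license text in it is invented toy text rather than any real license.

```python
import re


def normalize(license_text: str) -> str:
    """Drop copyright lines and collapse whitespace; keep all other words as-is."""
    kept = [
        line
        for line in license_text.splitlines()
        if not line.strip().lower().startswith("copyright")
    ]
    return re.sub(r"\s+", " ", " ".join(kept)).strip()


def same_license(a: str, b: str) -> bool:
    """Two texts count as the same license only if they match after normalizing."""
    return normalize(a) == normalize(b)


# Invented toy text, used only to show the comparison (not a real license).
ORIGINAL = """Copyright (c) The Regents of the University of California.
Redistribution and use in source and binary forms are permitted.
In the event of a claim, this license terminates."""

# Only the copyright holder changes.
NEW_HOLDER = ORIGINAL.replace("The Regents of the University of California", "Tom Callaway")

# One phrase in the license body changes.
REWORDED = ORIGINAL.replace("In the event of", "In the unlikely event of")

print(same_license(ORIGINAL, NEW_HOLDER))
print(same_license(ORIGINAL, REWORDED))
```

Under this rule the holder swap compares as the same license, while the two-word change in the body produces a distinct one, which is where the very long list comes from.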
So some of that is my fault because I found a lot of those licenses and they hadn't heard of them before so sorry but it does mean that even with our very long list their list is even longer because every variant of BSD they gave a new name and we just said that's BSD move on with your life. So let's talk about you and me and Fedora and what we might be able to do in this universe of SPDX. So full SPDX compliance would be really, really hard. And even if we automate that process, and there is some tooling out there where you insert source code and out the other end comes an XML file that says this is the SPDX report that got generated for this, a lot of the results will be wrong. I know this from experience because I've done these hand audits for years and it's really, really easy for these tools to not understand what they're looking at, because a human wrote down the license intent. It wasn't a machine and so machines trying to parse the output of humans is very complicated, very difficult, certainly not impossible but it's probably not worth the cost and the investment to try and write that engine to scrape through every single possible source file in Fedora to figure it out. And we know that we're gonna get some stuff wrong in the process, either humans make a mistake or the tooling makes a mistake. These SPDX XML files are intended to be inheritable so that you can hand them to somebody else and they hand them to somebody else and they hand them to somebody else and they follow the code around. And so wrong ones are gonna propagate out from us into other people's stuff and the standard says we're supposed to put our name on these XML files to lend credibility to them so we generate them, we put the Fedora project's name on them and that gives them credibility even though they're wrong.
They go out, other people trust it because Fedora said it must be this way so it is this way and it can be as simple as a version being wrong: detecting the wrong version of the GPL can have a significant impact on compatibility as it moves through the chain. So that's sort of why that problem is really complicated. It's not as simple as we'll just add a little script that runs through builds and scrapes through every build, every time somebody does one, to determine this file. So who should go all in on this SPDX model? Well, who knows all the copyright and licensing on all the source files? Upstream, we hope. And they really should be the ones that are generating that SPDX XML file, including it in their source code and saying, you know, as the person who wrote this code or who understands the copyright and the licensing for this code, I am vouching for the statement of what its licensing and copyright holders are and then you can take that and trust it because if you can't trust me, there's no way to bother me, basically. And we can inherit it and pass it on down the chain except nobody does this. Like, I'm sure somebody's doing it, like there are some people that are following this pretty closely, but effectively no upstream is including an SPDX XML file at this point. And the reasoning is that they just don't care about supply chain management at this level. They aren't concerned about license compliance management. No external tooling requires it. No external tooling automatically generates it and they really care a lot more about that bug that crashes the library when you pass bad data to it than they do anything regarding license compliance. And a lot of upstreams feel for better or worse that it's sufficient to just include the license text and say, you know, COPYING is right over there. Worked great for 20 years, I don't see why we need to change anything on this now. So if we ask the question, should Fedora do it? My vote is no.
But if you disagree, let me know. If you're like, no, you know, supply chain management and compliance are really important to our community and we need to get this right, we can have that conversation. It's also worth noting that RHEL doesn't do any of this either. So now there are customers of Red Hat that would really like us to be doing this because they take Red Hat and they bundle it in and they ship it off to somebody else who ships it off to somebody else who ships it off to somebody else and it ends up in an airplane seat. But we're not doing SPDX to solve those problems. We have other ways of making them feel more comfortable usually by knocking zeros off the price. So why are we still talking about SPDX if I just basically made the case that we don't care about it? And the answer is GNOME Software. That is a screenshot of GNOME Software from Wikipedia. And GNOME Software uses AppData files. And these AppData files use the SPDX naming identifiers to identify the overall license for a software component. Now it's important to note that they're not actually doing anything with the SPDX XML files, they're just using the naming identifiers that SPDX assigns to licenses to map to applications. So they decided early on that they weren't going to look at the distribution package licensing label and metadata like the RPM data to determine what the license of a package is but instead ask the AppData file to provide that data back. The idea being that either A, it's really complicated to have to figure out routines that you write into GNOME Software to strip all that data out of the packaging. It's far easier to just hand it over to whatever the dependency resolver is and say go install that thing and not have to actually look at the package data.
And also B, there's plenty of distributions that do a really lousy job of this and write the word distributable in that field and you can't trust it to be consistent across all these things and they want the GNOME Software experience to be identical no matter what distribution you're running on. These AppData files, a lot of them got generated by Fedora, specifically by one individual in Fedora who thought it was really important who happens to be the GNOME Software maintainer. And he sent a lot of these upstream and some of the upstreams took them in and some of them just never paid attention to the fact that he sent them anything because that upstream's been dead for 10 years. But we really do want the upstreams to own these files. We don't want to be generating them because you have the same sort of problem that you have with SPDX files. If you generate the license metadata wrong in the AppData then we're passing that on down the chain. Other people who are looking for an AppData file are likely to inherit it from us, carry it in their package so that it shows up pretty in GNOME Software. So this is where we hit a disconnect because it's confusing for Fedorans to have to deal with two license naming schemes. They're being told by all the GNOME Software folks that they want to have AppData in here and your AppData needs to have a license string but it needs to use the SPDX identifier but your package needs to have a license string and needs to use the Fedora identifier. So we've already had a couple of cases where people have started putting the SPDX names into the RPM spec file, mistakenly thinking that that's the process, thinking that that's the license name they need to use because they heard about AppData before they got to making the Fedora package. So should we use the SPDX names? The pros, obviously that minimizes the confusion. If we standardize on that set of names then we don't have to worry about it. Their names are machine parsable.
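As a rough illustration of the two naming schemes sitting side by side, the sketch below reads the license field out of an AppData file and compares it with the RPM `License:` tag for the same package. The `project_license` element is the AppStream tag for this field; the component, its id, and the `RPM_LICENSE_TAG` value are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical AppData content for an imaginary application.
APPDATA = """<component type="desktop">
  <id>org.example.Hypothetical.desktop</id>
  <metadata_license>CC0-1.0</metadata_license>
  <project_license>GPL-2.0-or-later</project_license>
  <name>Hypothetical App</name>
</component>
"""

# Hypothetical value of the License: tag in the same package's spec file,
# written with the Fedora short name for the same license.
RPM_LICENSE_TAG = "GPLv2+"


def project_license(appdata_xml: str) -> Optional[str]:
    """Return the SPDX license string declared in an AppData document, if any."""
    root = ET.fromstring(appdata_xml)
    node = root.find("project_license")
    if node is None or not node.text:
        return None
    return node.text.strip()


appdata_license = project_license(APPDATA)
print("AppData says:", appdata_license)
print("Spec file says:", RPM_LICENSE_TAG)
print("Strings match:", appdata_license == RPM_LICENSE_TAG)
```

Both strings describe the same license, but a plain string comparison cannot tell, which is the confusion packagers run into.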
Technically Fedora's names are too but there's a lot of inconsistencies in the naming scheme that we use. We call the Apache License ASL 2.0 and GPL 2 or later is GPLv2+ because the FSF made a really loud stink about the naming that we chose for it originally and said that it had to be named the way they liked and honestly the less I have to deal with the FSF getting angry at me the better. So we went ahead and adopted their naming schema because I don't care: as long as we're consistent, every time we call GPL something the same thing, it doesn't really matter. But SPDX had the ability to sort of reinvent everything so I know every license is going to meet that name-version-variant schema and it all is going to look exactly the same. They also get to maintain the big list for us. We don't have to worry about maintaining our own license list anymore. We can just say the license list is over there. It's the SPDX list, if it's not on there, you know. Tough kittens. But SUSE's already doing this for their packages. SUSE has switched over to using the SPDX short names inside of all of their package schema so we know that there's no real major scary things in the raw implementation of this. SUSE's actually using a hybrid model of our model and the SPDX model where they're using the SPDX names but they're using our syntax for license parsing. So if something is GPL or Artistic they're using our parsing schema inside the string to determine how to actually tell what that means. They don't have one. They just list all the licenses in a work. So you just go in the XML and dump it all out. There might be, there's a whole long list of exceptions that are out of the main chart. There may be something where you can say an or later flag that applies to the XML criteria. I don't know, it's a good question. It's something worth considering.
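A minimal sketch of what that hybrid could look like in practice: SPDX short names joined with Fedora-style `and` / `or` and parentheses. The `KNOWN_SPDX` set is a tiny hand-picked subset of real SPDX identifiers, and the function names are made up for this example; it is not how SUSE or rpmlint actually validate tags.

```python
import re
from typing import List

# Small illustrative subset of the SPDX license list (assumption: a real
# checker would load the full, current list instead).
KNOWN_SPDX = {"GPL-2.0-or-later", "Artistic-1.0", "MIT", "Apache-2.0", "BSD-3-Clause"}

OPERATORS = {"and", "or", "(", ")"}


def tokenize(tag: str) -> List[str]:
    """Split a license tag into parentheses and whitespace-separated words."""
    return re.findall(r"\(|\)|[^\s()]+", tag)


def unknown_names(tag: str) -> List[str]:
    """Return every token that is neither an operator nor a known SPDX name."""
    return [t for t in tokenize(tag) if t not in OPERATORS and t not in KNOWN_SPDX]


print(unknown_names("(GPL-2.0-or-later or Artistic-1.0) and MIT"))
print(unknown_names("GPLv2+ or Artistic"))
print(unknown_names("ASL 2.0"))
```

Because SPDX short names never contain spaces, splitting on whitespace is enough. A Fedora name such as `ASL 2.0` falls apart into two tokens under the same approach, which is one of the inconsistencies mentioned above.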
That is another con perhaps to using a specification that is a moving target and that they reserve the right to change the way they do their naming at will and we could lock into a previous version of the specification if we wanted to but this isn't gonna help the folks that are trying to move forward. So let's get to the cons. Every single package in Fedora will need to be fixed. I used scare quotes there to make the emphasis around how much work that's going to be. We're also gonna need to re-audit because the Fedora license to SPDX is not a one-to-one relationship. It's not as simple as if you had that string, you now use that string. We'll talk about that in the next slide. There might also be some delay on new packages with licenses that SPDX doesn't know about while we wait for them to update the list as opposed to Fedora where I find one and just update the list immediately and say it's good or it's not good. They don't have any real interest in doing clearance like we've done where a new license comes along and we determine whether it's free and open. They will just put it in the list and not care about its status because they're not interested in tracking open source software, they're interested in tracking software. Now if the OSI approves it, they do have a little checkbox that they'll check it as being OSI approved in the list and you can look at that but they're not gonna submit it to the OSI, they're not gonna do the internal review. So we'd still have to do that process and then submit it to them and wait for them to merge it. But to be fair, I'm not too concerned about that. There's not a huge number of licenses that we're coming across these days that are new. For a while we were seeing about five a month and now we're seeing about one every quarter. So it's not a ton of new licenses that are coming in that affect us. It may also lead to the expectation that we plan on doing full SPDX at some point in Fedora. 
People, because the naming is so symbolic across the board when we say we're doing SPDX, people don't know whether we're doing just the naming or whether we're doing the SPDX XML files and whether we're doing it for just the package or whether we're doing it for every source file inside the package because there's multiple levels of depth that you can sort of traverse inside the standard as to how you implement. And it may confuse people if we do this. But let's get back to the one to one problem which is the BSD and MIT problem. In Fedora we treat all these functionally identical variants as the same license. So BSD variants get marked as BSD. And it's not uncommon, like I said before, for the upstreams to simply say BSD because we know what they mean. If we use the SPDX list as is, we can't do that. There is an entry for BSD in their list but it means very specifically one variant of BSD, the OSI approved variant. And we know there's 30 plus other variants of this license that are out there. And SPDX has given each one of them a new name. And so it is BSD variant, something, something, something. BSD variant, something, something, something. And there's all in there. Now this means that Fedora maintainers will have to figure out which variants they have and mark their packages correctly if we use this naming schema. Which means they'll have a lot more work to do. And it's not fun work either. It's not exciting to figure out which of the 30-something BSD variants you have in your code when you might have multiple of them in the same code base. And so what went from a package that just was able to say license BSD will now go to saying license BSD variant 74 and license BSD variant 22 and BSD variant. Just because there's multiple sources from BSD code. Chromium's license string is going to triple if we move to this model just because of the sheer amount of different BSD variants that are inside of it. 
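The one-to-many problem can be sketched as a lookup table where some Fedora names map to exactly one SPDX identifier and others map to several candidates. The table below is a small illustrative subset using real SPDX identifiers; the candidate lists are incomplete and the function name is made up for this example.

```python
from typing import Dict, List, Tuple

# Fedora short name -> candidate SPDX identifiers (illustrative subset only;
# the real BSD and MIT families have many more variants than listed here).
FEDORA_TO_SPDX: Dict[str, List[str]] = {
    "ASL 2.0": ["Apache-2.0"],
    "GPLv2+": ["GPL-2.0-or-later"],
    "BSD": ["BSD-2-Clause", "BSD-3-Clause", "BSD-4-Clause"],
    "MIT": ["MIT", "MIT-CMU", "X11"],
}


def best_guess(fedora_name: str) -> Tuple[List[str], bool]:
    """Return (candidates, needs_audit) for one Fedora license name.

    needs_audit is True whenever the name does not map to exactly one
    SPDX identifier, meaning a human has to read the actual license text.
    """
    candidates = FEDORA_TO_SPDX.get(fedora_name, [])
    return candidates, len(candidates) != 1


for name in ["ASL 2.0", "GPLv2+", "BSD", "MIT", "Some Unlisted License"]:
    candidates, needs_audit = best_guess(name)
    status = "needs human audit" if needs_audit else "direct rename"
    print(f"{name!r}: {candidates} ({status})")
```

Only the single-candidate rows can be renamed by script. Every `BSD` or `MIT` package lands in the audit pile, and so does anything with a name the table does not know.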
But because Chromium is evil is not necessarily a valid justification for not moving the rest of Fedora off onto something else. So what should we do here? Well, I am hoping you have opinions. I would like you to put them out there. Leaving it up to me is a valid option. We can just say, you know, Spot, you seem to know what you're doing with the whole Fedora legal thing. We have faith in you. But if you do, then you waive any right to complain when I ask you to fix all your packages before they can go into a Fedora release. And we probably won't do anything about this until RHEL 8 is long disconnected from the Fedora process or post RHEL 8. Because I don't think that the RHEL team really wants that level of excitement in their life to add to all the other pain that they're dealing with. To have me go, hey guys, we're gonna rename all the licenses and all the tools that you wrote to generate your quality control and your auditing. They're gonna have a whole set of strings that doesn't work. In Fedora about the only tooling that we would have to change would have been PackageDB, but we killed that. And rpmlint, which just needs to understand the new license strings to have validation. But nothing else is actually looking at the license string in any meaningful way. Koji doesn't care about it. None of the other tooling cares about it as far as I can tell. So I don't think we would need to make any other infrastructure changes to adopt the new licensing model. Dennis, I'm looking at you if you think there's anything I'm missing, but. There's nothing. Yeah, license checking on the lookaside cache is really hard because that's as good as the tools are. They're not as good as a human. And so even if we had the lookaside cache doing some sort of parse where after you checked it in it unpacked the whole source tarball and ran it through a tool and said, this lint detects some questionable files inside of it. Somebody's still gonna have to look at those files.
And yeah, and the number of positive cases where people have uploaded stuff into the lookaside that ended up in a package that shouldn't have been there is relatively small. It's not worth, in my opinion, adding a significant delay for everybody in the process to have to wait for somebody to check the source code when most of the tarballs are gonna be clean and not gonna be problematic. We do have a good community that picks up on these changes relatively quickly and files bugs on them when they see things that are inappropriate. We just need to make sure that we continue to educate newcomers to the community about the importance of catching these things early and not getting them widely distributed. So that's kind of where we stand today on SPDX and the file format. And so the real question at the core of all of this is: are the gains that we might get from standardizing, from removing the potential confusion between GNOME Software and the RPM packages, worth the effort to undertake a full audit of all the licensing strings in Fedora and then remapping all of the things and then teaching all of the contributors that we now use this naming model that's longer and more complicated than the previous one? And so that's where I'm interested in feedback from the community. Since this is a "do" conference, I was hoping there would be people that would be like yes or no, but yeah, that's a good point. And I think that some of what mitigates that to a certain extent is the fact that most of the identifiers in their list were inherited from us, where they had a rather short list of licenses they started off with and then they imported our entire list and when they didn't have an identifier, they just used our identifier. So a lot of our identifiers have been inherited over into their schema. But there's a couple more that are slightly less common but those are the big ones. And so I thought I had a slide in here about laziness. Did I have a lazy slide?
So we could be lazy and sort of do a hybrid approach to this. And what we could do is say that we're going to adopt SPDX 95% of the way through except for BSD and MIT which we're just gonna let people label BSD and MIT in there and not worry about the variants. The biggest downside to that is that people who are expecting us to be fully compliant when they see us using the naming scheme and then come along and discover that we just sort of hand wave over that entire problem are gonna be pretty angry, I would imagine, or confused or unhappy. And I think that's probably worse than doing it. I think we could go halfway and I honestly think that that's how SUSE treats it. I'm not sure that SUSE's actually audited their entire tree for all the variants. I think they just leverage the fact that there's a BSD identifier and an MIT identifier and anytime somebody sees something that they recognize as one of those cases they just use that string in the license tag. I haven't done a full audit of their tree to be sure but I really would suspect that they don't get down to that level. Well, it's anecdata but if I had to guess, I would say probably a third of the Fedora packages have one of the BSD variants inside of them at some stage or level. It's also worth noting that in the way that we do license tagging in Fedora right now we allow people to do interpreted licensing. Which basically means that if your source code has 70% GPL and 30% BSD, you can call it GPL. Because we know that by honoring the terms of the GPL license you're also honoring the terms of the BSD license. And they're compatible so it's not a big deal. Some people are really sticklers about this and will list GPL and BSD but we don't require that they do so as long as it's obvious that the interpreted license makes sense. And we have a whole how do I parse my interpreted license page on the wiki for people who try to figure this out and want to understand how to do it.
And we make it clear to people that they don't necessarily have to try and interpret the license string down, that they can again list all the licenses that are inside of it. But that's gonna get a lot more complicated if we bring in all the variants and have people want to do that. If they're gonna go through the process of ensuring they have the right variant listed in the string to then just go and interpret it down to just GPL again, we're probably gonna end up in a situation where if we do go to the SPDX naming model we'll probably drop the interpreted option for license tagging at the same time. So, does anyone have any particular thoughts on whether we should go to the naming scheme or not? Cool, well then I'll just figure it out and I'll just tell you what we're doing. You think I should just figure it out and tell us what we're doing? You think we should do it? It's true, but the Linux Foundation isn't exactly running out of money right now, so I feel like. I think that the biggest concern with making any sort of change in the schema whether we use their schema or update our own to be different or more consistent is just the sheer amount of work it's gonna be to get all the packages updated. We know that there's a significant number of packages that have no effective maintainer at this point and if we're going to make this change across the board it is also the right time to do a re-audit of every package inside Fedora and in the past that was a task that was manageable by me. It is no longer manageable by me, it has not been manageable by me since Fedora Core 6, so it's going to have to be something where if we take this on we're gonna need a lot of volunteers to look through packages, re-tag them, commit them and have everyone understand that we will not let things through into the stable branch until they have been fixed on licensing so we will need a flag day process for doing another license audit and very few of us have gone through that process.
It's not fun but it could have benefits for us in general if we catch other things because every time I look at packages I catch things that have sort of slid into the cracks so. But I also can hear everybody who doesn't care about licensing just generally groaning because I just added a month and a half to our release cycle so. Well we can script the best guess for things but again we're talking about the fact that if it was a one-to-one thing then we'd just script it all across the board. It's not one-to-one. BSD doesn't mean BSD in the SPDX universe. There is tooling to do that but I know from experience that a lot of these things are not gonna be simple enough that we'll be able to just compare the COPYING file. There's a number of cases where stuff has just lost its actual COPYING because its code has been copied from one place to the other and the only thing that survived was the header inside that says copyright foo, license BSD. Which BSD variant is that? We're going to have to do some digging to figure that out. In the old Fedora model we didn't care because we could trust that BSD was reasonably sane but in the new model we need to know which variant it is. So is there a way that we could get that? Probably. Maybe, it depends on what level of audit we care about. Again, sort of this is the problem that we let people do interpreted license mapping where before if it was GPL and there was two or three BSD files we would let them call the package just GPL because you could honor the BSD terms with the GPL. If we move to this model we kind of need to drop the interpreted license model and then each one of those things can't be trusted. We can't trust that GPL is just going to transfer over to the GPL equivalent on the SPDX side. We need to re-audit that tree and determine whether the BSD variants are in play and then list them all out. So we can give a guess.
Like we can say okay your old license was this, the new SPDX name for it is this but then we also have to say and now you have to go and audit the tree and determine if there's any other licenses that are in play inside of there and find the SPDX names for those. Now what we could do is we could unpack every single source file and every single RPM and run it through a tool that did sort of a license lint. Like there's a couple of license linters that are out there, SPDX has one, and then return the results at the same time to people but that's a time intensive process and we don't have any of that infrastructure currently running so we would have to put that in place in order to do that and so I'm always reluctant to commit. We need to run license linters across everything. Yeah, my experience with previous initiatives related to this has been that the vast majority of Fedora packagers have no idea whatsoever of the answer to that question. Either or, yeah and so I would almost err on the side of presenting everybody with the information just to be on the safe side and then if somebody says well I don't understand I can be like well this is right here you can look at this what don't you understand about this versus if we don't provide them that information they're just not gonna click it. Also we do know that there's a fair number of Fedora packages that are effectively unmaintained that just continue to churn on forward where the maintainers don't touch them at all because they're either dead upstream, they're fossils as far as stability is concerned, they don't need to be touched or they just got automatically rebuilt. I find these all the time where the change log says automated rebuild, automated rebuild, automated rebuild, automated rebuild, automated rebuild and the last time a human maintainer had touched it was 2013.
So there's a fair number of packages that are in our ecosystem that meet that criteria and we cannot assume that the person who's assigned as the maintainer on that package is gonna show up and do the license audit for us. And that's true for any distribution, that's not a Fedora problem. Debian has the same issue, SUSE has the same issue as well so it's not unique to us by any means. It's just a lot of work and it's gonna take a lot of people to figure out the correct battle plan to do that and so half of the point of this presentation is trying to see if there's other people that are really interested in helping to put that plan together. So yeah, I think that's totally a viable option to stretch it out and then do it over time. I think we probably would wanna break it down and look at things like the core packages that are put into things like the Atomic containers and make sure that they are all correct first and then move our way outward into the ecosystem and try and fix it while telling people at the same time, hey, if you can get to one of these packages before we do, that's great, but we would like to have some way of being able to have people mark their packages as not just corrected the license string but also audited the source code. And I think that in order to do that we need to help people understand what doing a manual audit actually means. In the PDFs way, that's an audit and helping them understand that it's more intensive than that. So some of it's gonna have to be, but yeah, it can't really be broken up. I just don't want it to be something where we're still doing this in five Fedora releases trying to finish the last thing. I was involved in several other Fedora initiatives that dragged out like that with people, so we can do it over multiple releases. And then finally we said things like, well, we're gonna close all these review requests for old packages to clean them up because nobody cared. So just try and avoid that problem.
All right, well, thank you for listening.