Hi everybody, welcome back to our chats at Upstream. Today, or rather this afternoon I should say, I have Carmen Bianca Bakker of the REUSE project from the Free Software Foundation Europe. We've been talking today about what it is that we owe each other. Traditionally in open source, one of the key things we have owed each other is a license: under what terms do we give each other software? A core plank, if you will, of how open source (and free software before it) has functioned is an irrevocable promise, under a license, to allow people to use, modify, and redistribute. As our software has gotten more complex, corporate users in particular have increasingly demanded not just the right licenses, but the right metadata about licenses. When we had five packages in our application, it was easy to say, oh yeah, we'll just read the license files. But now that we have 5,000 packages in an application, it's a little more complicated than that. So Carmen works on the REUSE project, an initiative to standardize licensing information across all free and open source software projects in a way that is, and I'd love to talk more about this, both human-friendly and very much machine-readable, so that when we are building these towering piles of software, we can understand the whole thing. So Carmen, first of all, congrats. Just a couple of weeks ago you reached 1.0, you did a 1.0 release. That's an awesome milestone. The project has been going on for several years now, right?

Yes, correct. I joined the project in its infancy in 2017, when I was an intern at the FSFE, and I kind of stuck around.

In the way of all great free software projects.

Yeah.

The best people just sort of get sucked in, yeah.

As a matter of fact, when I joined the FSFE, I was supposed to work on something completely different, but that fell through because of scheduling.
And I saw this project lying around, the REUSE project, and I was like, I know a little bit about software licensing, I can do this one while I'm waiting for the other project to pick up. The other project never picked up, and this kind of became my thing.

Well, very cool. So what drew you? I mean, you were just saying it was there, but what's made you stick with it for five years?

Partly because I do like programming, and I like this project in particular in a very egotistical way, because it's very core Python in a sense. It lets me do really, really Pythonic stuff and improve my Python skills. But also because I've come to care about software and licensing. And I'm also a little bit of a stickler for getting things exactly right, and this is the kind of project that helps people get things exactly right.

Well, let's talk a little bit more about that before we go into the whys and the wherefores. For those who aren't familiar with REUSE, what is it? The thing that's getting done exactly right is licensing information, licensing metadata. How does REUSE help with that?

REUSE is made of two parts. One, it's a specification and general guidelines for the developer. And two, it's a program, and the two are kind of in sync with each other. The way REUSE helps most of all... well, it would be very egotistical of me to say that it helps most of all because of the tool, but it helps because it has very clear instructions, and those instructions can very easily be verified using the tool. So the two reinforce each other.

Right. So for those of you following along at home, I believe it's reuse.software, is that right?

I do believe it's, yeah, it's reuse.software.

Yeah, I had no idea until I stumbled on REUSE some years ago that .software was a top-level domain, but now you all know that too.
So, reuse.software. And like you say, it's very clear, specific instructions for developers on how to do this thing, plus software. I don't think you should sell the software short. I don't think it's egotistical to say that mere documentation without validation isn't enough. I think that's actually one of the core lessons of modern software compared to 10 or 15 years ago: there's much more of a sense that if it's not tested, if a computer can't help you do it, then it's often not going to get done. And that's one of the key lessons of my day job at Tidelift. When I'm not hosting chats, one of the things we sometimes have to do is go into very old software and try to understand what the license is. And like you, I'm a stickler for getting things right. That means sometimes I pull my hair out, because I have to ask: what were they thinking 18, 19, 20 years ago, when they put this software together and thought this jigsaw puzzle of license information was good enough? So to some extent, you all are saying, hey, you know what, it's time for us to, I'm going to say professionalize, which is a loaded word, but certainly to standardize where our license information is, right?

Yeah. And being as clear as possible is the trick. There's one idea, I've forgotten the exact name for it, but it's become really popular in the last five years or so... yes, a single source of truth. The idea of REUSE is that by adding the copyright and licensing information to the header of every single file, the information is as close as possible to the single source of truth.

Right, yeah, that's one of the many things I think REUSE does really nicely: this idea that the single source of truth is the software itself, and not some other database.
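To make the idea of per-file headers concrete: a REUSE header is a pair of comment lines using the SPDX tags SPDX-FileCopyrightText and SPDX-License-Identifier. The Python sketch below shows, under simplifying assumptions, how a tool could read those tags back out of a file. It is not the reuse tool's actual implementation; the function name read_reuse_header and the sample author are hypothetical, and it ignores other copyright notations the real tool also accepts.

```python
import re

# Matches the two REUSE header tags anywhere on a line, so a comment prefix
# such as "#" or "//" does not matter. Simplified sketch: the real reuse tool
# also understands other copyright notations.
TAG_RE = re.compile(r"SPDX-(FileCopyrightText|License-Identifier):\s*(.+?)\s*$")


def read_reuse_header(text):
    """Collect copyright and license tag values from a file's contents (hypothetical helper)."""
    found = {"copyright": [], "license": []}
    for line in text.splitlines():
        match = TAG_RE.search(line)
        if match is None:
            continue
        tag, value = match.groups()
        key = "copyright" if tag == "FileCopyrightText" else "license"
        found[key].append(value)
    return found


if __name__ == "__main__":
    sample = (
        "# SPDX-FileCopyrightText: 2022 Jane Doe <jane@example.com>\n"
        "# SPDX-License-Identifier: GPL-3.0-or-later\n"
        "print('hello')\n"
    )
    print(read_reuse_header(sample))
```

Because the information lives in the file itself, a check like this needs no external database: the file is the source of truth.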
So I mean... actually, before we go on: one of the things I think is a neat trick about REUSE is that you didn't invent it out of whole cloth, right? It's based on other, older practices and standards. Do you want to talk about that a little bit?

Right, it's based on a couple of things. A lot of the credit goes to SPDX, which was really starting to pick up steam by the time REUSE was getting started. We noticed that SPDX gave us this very clear way of conveying information about licensing, and, well, why not use that? The other convention that we kind of copied is that a lot of projects already have a blurb at the top of each file with the copyright holder and the license, but this was never standardized. There was just a massive block of text at the top of every file. And we thought, what if we could make that a bit smaller?

That's really nice.

And standardize it, so it works for every single license. One old thing that we partially kept and partially changed is providing a license file. In most repositories you'll have a LICENSE file, in all uppercase, at the very root of the project. We changed that into a LICENSES directory, because most projects ultimately do have multiple licenses. And that's very unfortunate, because it kind of breaks the nice GitHub UI that tells you what the license of the project is. We've bugged GitHub a couple of times about that. They keep saying that they're going to work on it, but we'll see.

Yeah. I think this is actually one of the interesting tensions, and not just for GitHub. We have a sort of polite lie that we tell each other: oh yeah, this package is under one license. We could do a whole other talk about data on licenses.
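The LICENSES directory convention lends itself to a simple consistency check: every identifier used in a file header should have a matching license text, conventionally named after its SPDX identifier (for example LICENSES/MIT.txt), and every text in the directory should actually be used. The Python sketch below is a hypothetical illustration of that check, roughly the kind of thing reuse lint reports on; the function names are made up, and the license-expression parsing is deliberately naive.

```python
import re
import tempfile
from pathlib import Path

# Operators in an SPDX license expression, e.g. "MIT OR Apache-2.0".
OPERATORS = {"AND", "OR", "WITH"}


def identifiers_in(expression):
    """Split a license expression into identifiers (naive: drops parentheses and operators)."""
    tokens = re.findall(r"[A-Za-z0-9.+-]+", expression)
    return {token for token in tokens if token not in OPERATORS}


def check_licenses_dir(root, used_ids):
    """Return (missing, unused) identifiers, comparing headers against LICENSES/<id>.<ext>."""
    licenses_dir = Path(root) / "LICENSES"
    present = set()
    if licenses_dir.is_dir():
        # Assumes every file has an extension such as ".txt": Path.stem drops
        # only the final suffix, so "MPL-2.0.txt" -> "MPL-2.0".
        present = {p.stem for p in licenses_dir.iterdir() if p.is_file()}
    return used_ids - present, present - used_ids


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "LICENSES").mkdir()
        (root / "LICENSES" / "MIT.txt").write_text("MIT license text\n")
        (root / "LICENSES" / "CC0-1.0.txt").write_text("CC0 text\n")
        missing, unused = check_licenses_dir(root, identifiers_in("MIT OR Apache-2.0"))
        print("missing:", sorted(missing), "unused:", sorted(unused))
```

A directory rather than a single LICENSE file is what makes this check possible for multi-license projects: each license text gets its own predictable file name.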
So I'll put that one aside for right now, but suffice to say, you're right. Many packages, certainly hundreds of thousands of packages, have more than one license. And a thing about REUSE that I imagine is somewhat challenging is that it makes it hard to maintain that fiction. If you're doing REUSE properly, you really have to come clean about all the many licenses that are in there. Do you get pushback from developers because of that?

I've not personally experienced any pushback about that, other than, of course, about not having the single license file. But I think developers are realizing that, yes, there actually are multiple licenses, so they might as well declare all of them. It's a bit more work, but then REUSE is ultimately a little bit more work.

Right. Well, that gets at our theme today, again: what do we owe each other? Certainly one of the premises of REUSE, I think it's fair to say, is that as developers, we owe each other accurate license information. But suffice to say that isn't something we've actually done very well in the past; we might pay lip service to it. So how do you, REUSE as a group, persuade people? Obviously step one is to provide clear directions, and step two is to provide software. But then you've got to go out and explain it to people. What kinds of arguments do you use, and how do you persuade people?

I think the arguments make themselves a little bit, in the sense that you have to do this anyway. And when you turn it around and ask: when you're trying to find code that you would like to use, would you like it to be REUSE compliant, or would you like the licensing to be an utter mess? Most people will say, yeah, I'd like it to be REUSE compliant.
So the golden rule kind of applies: do unto others as you, well, you know the rule. If everyone did it, and this is a big hope, I guess, it would make the life of every developer a lot easier. As for persuading people to actually do it, that's a little bit harder, but we are doing a couple of things. Some communities have simply adopted it. I know the Linux kernel and the KDE community have been adopting REUSE, or a variation of it, internally. I also know that the European Next Generation Internet project mandates REUSE compliance for all the projects delivered under it. So that's cool. And certain companies are also enforcing this internally. I know that the German COVID tracing app is completely REUSE compliant.

That's cool.

Yeah.

Yeah, I was going to ask about that. I knew about the Linux kernel, and, as you may or may not know, I'm an old GNOME guy, so maybe I'll go nag some people over there, though it's now many generations away from when I was involved. But the COVID tracing app, that's very cool, very timely. I noticed in the 1.0 release you mentioned that this is a very Pythonic application. I hadn't thought of it that way, but it makes a lot of sense, given that so much of Python's strength is ultimately in text processing, and this is, at some level, a text analysis and processing app. I noticed that you distribute it through PyPI, so you can pip install it. For our Python listeners, is it pip install reuse?

I would mostly recommend using pipx: pipx install reuse.

It's the new hotness. It is literally my job to keep up with these things, and yet it is still hard. So, it sounds like you've talked to GitHub, which is obviously a key part of the bigger ecosystem.
Have you ever talked to PyPI, for example? I know there's an ongoing discussion among Debian developers about whether, or how, to integrate REUSE into Debian. Have you talked with PyPI or npm or any of those?

I don't do a lot of the communications, so I'm not aware of any talks with PyPI. I think it could be cool, but I also think Python is a little bit behind with regard to licensing anyway. I believe it was npm that was the first to really enforce a license: you have to choose a license if you want to distribute on npm. I'm not sure if Python ever adopted that.

I'll just say it's complicated. And to your earlier point about SPDX... I should step back a second. SPDX, which Carmen mentioned earlier, is the Software Package Data Exchange, or Software Program Data Exchange, I forget what the P is.

I think it's package, yeah.

Yeah. At spdx.org, it is a couple of things. The one that's most relevant here is that it is a sort of universal list of open source and free software licenses, and, what all the license lawyers really love about it more than anything else, a set of standardized short identifiers for those licenses. So MPL-2.0, for example, is the standardized identifier for the Mozilla Public License 2.0. Why did I grimace when you mentioned PyPI? This is not specific to PyPI, but for a variety of historical reasons, one of the more problematic being that it slightly predates SPDX, I believe, PyPI did not enforce SPDX identifiers. So a lot of people put in license information that looks like SPDX but is not actually SPDX, and good luck getting from one to the other. And there are certainly many developers you can point this out to and ask, hey, what did you actually mean by this?
They're happy to take patches, they're happy to fix things up. But there are many others who are like, eh, good enough, don't care.

Yeah, I know, it's a bit dreadful. I maintain the Fedora package as well, and Fedora also doesn't use SPDX. The reuse tool itself is under four licenses, because we borrow stuff. One of them is the CC BY-SA license, but I think in Fedora it's just listed without a version number, and it's dreadful. We also have some CC0 material, and there's no CC0 in Fedora.

Yeah. I mentioned Debian, and I know Fedora is also having that discussion of how to modernize. This is one of those unfortunate situations, like the XKCD comic about standards: we have 16 standards, this is terrible; okay, we'll create one standard to rule them all; and now we have 17 standards. SPDX is a little bit like that, because it actually grew out of the Fedora work, but getting it back into Fedora has been... well, I follow along on that mailing list, and I'd say there's a lot of work to do. I think this is one of the things that's both fascinating and challenging about REUSE: the good news is that you're building on 25 years of knowledge about licensing, but that also means there are 25 years of software to be fixed. I'm sure on your optimistic days you're like, yes, we're going to do this. And if I were you, on my more pessimistic days I would just curl back up under the covers.

Yeah. So, in the spirit of "let's do this", we have an internal project that we're trying to kickstart called REUSE Booster. The idea is to approach projects and either encourage them to apply REUSE or just do it for them.
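On the earlier point about license strings that look like SPDX but aren't: in practice, getting from one to the other often means a lookup table of known spellings, plus a human for everything else. The Python sketch below is a hypothetical illustration of that approach; the alias table is invented for this example, is nowhere near complete, and is not taken from PyPI, Fedora, or any SPDX tooling.

```python
from typing import Optional

# Hypothetical, deliberately tiny alias table mapping free-form license strings
# (lowercased, whitespace-collapsed) to SPDX identifiers. A value of None marks
# strings that are ambiguous and need a human to decide.
KNOWN_ALIASES = {
    "mit": "MIT",
    "mit license": "MIT",
    "mpl 2.0": "MPL-2.0",
    "mozilla public license 2.0": "MPL-2.0",
    "apache 2": "Apache-2.0",
    "apache license 2.0": "Apache-2.0",
    "bsd": None,  # which BSD variant? 2-clause, 3-clause, ...
}


def to_spdx(raw: str) -> Optional[str]:
    """Return an SPDX identifier for a free-form license string, or None if unknown or ambiguous."""
    key = " ".join(raw.lower().split())
    return KNOWN_ALIASES.get(key)


if __name__ == "__main__":
    for raw in ["Mozilla Public License 2.0", "  MPL   2.0 ", "BSD", "Some Custom License"]:
        print(repr(raw), "->", to_spdx(raw))
```

Returning None rather than guessing is the design choice that matters here: an ambiguous string like "BSD" is exactly the kind of case that needs code archaeology, not a script.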
So it's a good way of... well, I'm not sure if it's a good way, but it's a way of really getting started. And if any listeners are keen to participate: change your favorite projects to be REUSE compliant.

I've certainly had conversations with the GitHub folks about their licensing tools. So hopefully the licensee folks (licensee is GitHub's license-detection tool), if any of you are out there, Carmen and I are saying hi again. For some projects this is a very easy lift: if you've only got one or two licenses, the tool can help you standardize very quickly. More complex projects are going to take more time, and Booster sounds like a great initiative to help, especially with bigger projects. And I suspect you must have gotten good feedback from early adopters about how to improve it and make it easier, right?

Continuously. We have a lot of issues opened by people who want to improve it just very slightly. A lot of those get incorporated, but sadly a lot of feedback is also just slightly out of scope. I would love to solve all of your issues, but we have to stay a little bit focused.

Right. I'm just guessing here, but I'm sure some of them essentially fall into the category of "we cannot answer your legal questions for you."

Right. We do our best. We have a massive FAQ on the website. It's not legal advice, but it does answer a lot of very common questions. It's a massive list at this stage.

Oh, interesting. I should check that out, because that certainly comes up at Tidelift when some of our tools flag things. For example, you mentioned that GitHub does license scanning.
A common problem we see is that what GitHub says your license is and what your package manager says your license is don't necessarily agree. Sometimes that's a scanning bug. The reason I've talked to the GitHub folks is that I've occasionally discovered issues in their license scanner that we've been able to help fix. But sometimes it's simply that the source of truth for the package manager is different from the source of truth for GitHub, so they disagree and somebody has to do the code archaeology. We tell developers who are working with Tidelift, hey, you should do this code archaeology, and they say, but I'm not a lawyer, I don't know how to do that. It sounds like that FAQ could be a good resource for them.

Yeah. We've also recently begun providing some helper scripts that build on top of the tool. One of them is a couple of lines of code that you can copy and paste into Bash. I wouldn't necessarily recommend it, but it's one way of doing things: it takes the commit history and all of the authors from it, and adds them to the file for every single commit. Which is really cool. Maybe not the best way of doing things, because an author is not necessarily a copyright holder, but it's a way to get started and then clean up afterwards.

Right, right. And that's one of the challenges of licensing, and of copyright more generally: a lot of the time it's not something you can reduce to a script. Scripts can certainly help, but they can't make judgment calls. I was involved in a discussion with some lawyers just a couple of weeks ago that basically boiled down to: if something is not copyrightable, how do I indicate that in the source code tree?
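The commit-history approach Carmen describes can be sketched roughly as follows. This is not the REUSE project's helper script; it is a hypothetical Python illustration that asks git for the author and year of each commit touching one file and turns them into SPDX-FileCopyrightText lines. As she notes, a commit author is not necessarily a copyright holder, so output like this is a starting point to review by hand. The function names are made up; the git log options used (--follow, --format with %an, %ae and %ad, --date=format:%Y) are standard git options.

```python
import subprocess
import sys
from collections import defaultdict


def git_log_for(path):
    """Return one 'Name <email>|YYYY' line per commit touching path (requires git)."""
    result = subprocess.run(
        ["git", "log", "--follow", "--format=%an <%ae>|%ad", "--date=format:%Y", "--", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def copyright_lines_from_log(log_text):
    """Collapse per-commit author lines into one SPDX-FileCopyrightText line per author."""
    years = defaultdict(set)
    for line in log_text.splitlines():
        if "|" not in line:
            continue
        author, year = line.rsplit("|", 1)
        years[author.strip()].add(int(year))
    lines = []
    for author in sorted(years):
        first, last = min(years[author]), max(years[author])
        # Summarizes all years as a first-last range, even if there were gaps.
        span = str(first) if first == last else f"{first}-{last}"
        lines.append(f"SPDX-FileCopyrightText: {span} {author}")
    return lines


if __name__ == "__main__":
    # Usage, inside a git checkout: python this_script.py path/to/file.py
    if len(sys.argv) > 1:
        for header_line in copyright_lines_from_log(git_log_for(sys.argv[1])):
            print(header_line)
```

Keeping the git call separate from the text processing means the generated lines can be inspected and cleaned up before anything is written into a file.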
And I thought the correct answer was: none of us can agree on what is or isn't copyrightable, so when in doubt, slap Creative Commons Zero, or one of the other zero licenses, on it. And that didn't go over very well, sort of surprisingly.

I know that's been our recommendation in REUSE as well: slap CC0 on it. Because of the restrictions of how REUSE works, we have to choose something. And SPDX doesn't really have a "sorry, there's no license here". Well, there is, but it's complicated. Even more complicated is public domain, in the sense that there's no public domain identifier within SPDX. So if you take code that was entered into the American public domain, there's no way of actually conveying that information in REUSE.

Well, to be fair, I think one of the interesting myths about public domain is that when you say American public domain, that's explicitly American. Europeans, in fact, do not have that right: the US government reserves the right to start charging all of you for all this NASA source code. As a practical matter, that's never going to happen. But this is one of those things I imagine you run into on a fairly regular basis as an objection or a concern: there's a tension between the work we should do for each other in the name of getting it right (I don't know quite what the name for it is, but as you were saying, I just want it to be accurate) and the response, okay, but as a practical matter, X is never going to happen. We would love to have perfectly accurate United States public domain information, but the US government is not going to sue you over that kind of thing. The relevant case law for that is something like a hundred years old.
It's still accurate, but there's no new case law because the US government doesn't bother. So how do you negotiate, or maybe you don't, that tension with a developer saying, okay, but the risk is low, the risk is implausible?

Well, we can't solve everyone's problems, even though I'd really like to help everyone. So at some point, if you really disagree with something we've done: REUSE is free software. You can fork it, change it very slightly, and do it exactly as you want.

No, that's always the solution, right? We're taking patches. I think this has been a great conversation, thank you so much for your time. I want to end on a fun note. If you could do one thing to help adoption of REUSE, snap your fingers, what would you love to see happen?

I'm always very inspired by projects like Prettier and Black, which auto-format your code for you. If REUSE could be incorporated into Black or Prettier, that would force so many people to actually bother adding the information. I think that is the dream: in the same way that programmers have gotten used to letting a tool format their code, they would also get used to letting a tool verify whether their copyright and licensing information is correct.

Right, right. I would love that. Speaking as somebody who has to pick up all the pieces professionally, that sounds amazing. I'm with you, let's make that happen. Thank you so much for your time. We were talking right before we recorded, and this is very much a volunteer project, a labor of love, or pedantry, or whatever it is at this point for you. So thank you for the time you're pouring into this community; I really appreciate it. I hope the next five years of REUSE, from 1.0 onward, are successful and happy, and maybe we'll get to that point of universal adoption.
That would be terrific to see.

Thank you so much for having me.

My pleasure.