So this is our panel on making sense of license compliance tools. I'm moderating; my name is Bradley Kuhn from the Software Freedom Conservancy. I'm going to start off and ask each panelist to introduce yourself, no more than 30 seconds each: who you are, where you're affiliated, and what your relationship is with license compliance.

Okay, so my name is Thomas Steenbergen. I work for HERE Technologies. We're part of the team behind the OSS Review Toolkit. I'm also involved in SPDX, and I'm also involved in ClearlyDefined.

Hello, I'm Valerio Cosentino, software developer at Bitergia. Bitergia basically creates development analytics for open source, and so I started playing with extracting licenses from code. So this is why I'm here.

Hey, I'm Max Sills. I'm the head open source attorney at Google. I manage our in- and outbound compliance processes, and I just recently joined the board of the ACT Alliance Foundation.

My name is Philippe Ombredanne, and I'm both the maintainer of a tool called ScanCode, which is a license detection tool, and of several other open source license compliance related tools, and the CTO of a small software company called nexB.

Hey, my name is Michael Jaeger. Along with other people, I'm a maintainer for FOSSology, which is a Linux Foundation project for scanning for licenses, and SW360, which is an Eclipse Foundation project providing a component inventory. I am employed at a German engineering company called Siemens, and in my free time I also give trainings about tools.

Okay, so let's just leave the mic at each end, and that way it'll minimize the mic passing. So the first question for the panel is: what GPL or other license compliance problems do you believe compliance tools can solve for users?

Did you say what GPL and other license compliance problems?
I didn't mean... you know, I'm obsessed with the GPL, so I'm going to focus, yeah, but any license compliance problem. Okay, what's your laundry list of license compliance problems you believe tools can solve, and how do they solve them?

Actually, as I see it today, the GPL, since you have named it, is not a problem for most organizations. And if you see it, we are really asking: what's your problem, actually, with the GPL? We have more problems like this: on some FOSSology server, on some instance, maybe, we have 50 packages which contain clauses like "this is proprietary information", "this is confidential information", "this is a file where you need to have the written permission of the copyright owner". Files which slipped into open source projects accidentally, maybe. Maybe some of these famous cases have been solved, but still, in the past three years we have experienced 50 cases of files which are just not for commercial use according to their licenses.

And how does the tool help people?

Well, you scan it, and then you find out that it's actually part of the open source distribution, and you're probably compiling it into your product if you didn't really look for it. So I think that's probably one of the benefits of FOSSology: that it finds these license-relevant statements which prohibit you from commercial use.

Right, so it's really about discovery. It's how people discover what licenses are there so they can read them and comply, right?
Okay, please.

To me the biggest problem is that most developers don't know what third-party code they use. And as weird as it sounds, it's a bit like if you were building a car but you forgot who you bought the engine from and where the brakes are coming from. And that's still the case. So the value of the tools is helping you figure out where the code comes from, if it's third-party code. That's the first thing. Then, what's the license? These are pretty essential things to know before being able to use any bits of code, whether it's proprietary or free/libre and open source software. Whichever point of view you're coming from, if you don't know that, it's a bit crazy. But we're still very much at the basics these days.

Right, so it sounds like your answer is very similar to Michael's: that it's about discovery, the tools help you discover what's there.

Yeah, not only, but that's the essential part.

Okay, Max.

Yeah, I'm going to channel you.

I'm supposed to be the impartial moderator.

I don't think tools really do a good job of compliance. And I think there's this fantasy, or I guess this industry push, towards tooling, like: let's just get the best tooling, let's use the best system and not worry about open source compliance, as kind of a substitute for knowledge. So tooling is really great as a start, for example when you want to see what the license of some source code is. But it doesn't work without some knowledge of how to actually interpret the terms of the license, or a thoroughly documented process for building software. How do you build the software? Where do you store the code? Basic stuff, like what's the linking style of something?
Yeah, that's all I was going to say. I think that there's that fantasy out there.

Well, in my effort to be an impartial moderator, I'll push back on that a little bit. I think your other two panelists who've already answered would say: but discovery is the first step. You've got to discover what's there first to be able to do any of the stuff you talked about. Would you agree with that, or do you disagree with that?

Yes, discovery is definitely the first part, but I guess there are two camps of tooling. So ScanCode is a wonderful tool; we use it extensively. There's the camp of tooling that wants to use scanning and tooling as an input to a larger process, where you've talked to people, human beings have agreed on social behaviors around code. And then there's the camp, like the Black Duck people and the like, the compliance industrial complex, where it's: let's use tooling as a substitute for thought, let's not even investigate the software that we're using. So I think with the first camp it can be very helpful, but I would agree that it is...

So what do you think? What license compliance issues do you think tooling can help with, and how does it help?

Okay. So I'm pretty new with licensing things.
So the tool we're working on basically tries to use existing tools, like ScanCode or Nomos, and the idea is to tell a story about licenses: how the evolution of licenses is related to other software development things. Like, maybe you move from a private repository to GitHub, so then you have a license change there. So the idea is to study the evolution of licenses and see what we can tell about this. So we are not really focusing on extracting the current state of the project, but more on analyzing and understanding why licenses can change, and the impact on other things that are related to software development.

So dig a little deeper on what you mean by evolution. Are you talking about when other people have contributed under different licenses on top of other ones, that sort of thing?

Like, I mean, when you have an update of a license; for instance, you pass from GPLv2 to GPLv3. Or maybe you are integrating a component from another open source project that has a different license. So maybe you don't discover this at the very beginning, but then over time you see that maybe that component changed its license, and this can cause problems in your project. So it's more like a tracking of, I mean, telling a story about the license or the licenses that are in your project.

It's the same question: what can tooling do to help understand and comply with licenses, and how does it do it?
So, where I see the tooling: as software developers, we're now all developing in CI/CD, we're going faster and faster and faster, and what I basically see is that compliance tooling is usually way behind development tools by default. So that has been our focus. One of the challenges is package managers: all of us use package managers; in my company we use close to 40 of them. So we have more and more releases today because, hey, we do CI/CD, so everything goes faster. We have more package managers that are not supported by any tools, so your discovery is limited. But then also, when you have all of that information that comes in from all these tools, you need to be able to process it. And that's what we focused on: okay, we discover, but then also how can you process it?

But to add to Max's comment: we don't say we automate everything. I fully believe you cannot automate compliance fully; we call our system highly automated. So my idea, my quest, is basically, as Max is a lawyer, and with our lawyers, to figure out how I can take Max's thought for the simple cases and write that into computable rules. But then, from the other side, for what you do as a developer in your source code, also give you a way to indicate: this is documentation, this is source, this is examples. And then also get this in computable form and make a handshake, so I can reduce the work of the legal team. So basically both sides get a handshake. We generally call this having cats and dogs talk to each other, the dogs usually being the developers and the cats the lawyers, and they don't speak the same language. So my job, what I ran into, is making a language that they can both speak, so that it's easier and we can move faster, and so that the lawyers can focus on the really complicated cases which really require dedicated attention. So we take the nitty-gritty work, the handwork, out of the equation and just basically say to the lawyers: hey, here
I have something specific that I don't know. And if we can then automate it, which is usually a month-long discussion because things are complicated in law, we try to automate as best as possible to give hints. That's basically where tooling can help you: taking the general discussion that you have out of the way, so you can focus on really specific edge cases. That's what we're focusing on.

That leads me well into my next question, which we'll start and go the other way with. So there's this old phrase in computing: garbage in, garbage out. You'd usually say today something like "faulty data set" or something like that. I would argue, to be a little bit un-impartial here, that almost every free software project out there is a poor data set for understanding its licensing. Developers, as you pointed out, do not do a great job at expressing their licensing intent in a way that is consumable and automatable. There are efforts out there to try to get developers to do this. Personally, I believe, as Alexis's talk years ago said, Sisyphus is happy: there's always going to be this pushing of the rock up a hill, and developers are not going to want to follow any specific documentation system of licensing. So, given that, how can tools actually work in a real way to even do discovery, let alone anything else, if all the data is so bad that files are not properly annotated and projects are not properly annotated with licensing? How can we actually solve that problem in an automated tooling way when the data is that poor?

So the solution we came up with is... well, actually, we have multiple parties from multiple backgrounds working together. So just to explain this whole panel for the people there: there's actually a group of people that are working together. So how it works is that
the OSS Review Toolkit actually uses the scanner ScanCode, so we actually use that to parse it. Then we generate data, then we put this into SPDX, which is the standard, and then Michael can basically ingest it into FOSSology. So the solution that we're working on is having all these various open source tools work together.

Now, there are lots of companies, and this is maybe the difference for developers: companies have liabilities. So companies care about licensing and open source. You as a developer might not, but your company does. So companies have these huge problems, especially since we're in CI/CD: everything goes faster, and we have this data problem. So the solution is that all the organizations that have this problem came together to work on it, and this is why last year we founded ClearlyDefined, which is basically a central repository where we can take all of this data in, have a curation platform on top, and then fix it.

So, just so you know, pretty much all of the companies that do anything serious about compliance have a compliance team that does nothing else but figure out licenses. They used to keep all of this data inside. And then Jeff McAffer and I spoke a year ago, and it was like: hang on, why? We're all doing the same work. So what you now see is that basically all these companies realize: there's no proprietary information in this; we are all doing the same work.
Why not collaborate, and then basically fix this data together? And that's how we now do this. Because, in case you haven't noticed, open source is exploding. It's coming more and more; we have more and more package managers; everything has to go faster and faster. If we don't solve this, companies might basically say: we will not use your open source project, simply because your licensing is so bad that it takes me too much effort to push it through our development toolchain. Even if it's a great piece of open source, we will not use it, because your licensing is so muddy. So that's why we're now trying to work together, all the people on this panel (yeah, Max as well, you're also now in ClearlyDefined), on what we can do to provide tooling and work together so that we can lift the whole community to fix this. Again, we do not want to impose on you how to do licensing and force rules down your throat. What you will see is basically us working together, and we'll file a pull request saying: hey, could you please fix this? Here you have the pull request.

So let's pass along. So it sounds like your pitch is that you've solved it all: there's no problem with compliance anymore, all the upstream... Well, you have a plan, right, and it's all going to be upstreamed. And what's the date? What's the date?
It's going to be done by?

We already...

Okay, so his argument is it's done today. You go to ClearlyDefined, and for every open source project that matters you can get all the information. Does the whole panel agree with this?

Yeah, so we started working on a solution. So this is basically OSI, Eclipse Foundation, dozens of companies that are...

Do you think this solution is going to work? Give me the date. When do you think all upstream projects are going to have perfect licensing information, such that all this tooling perfectly tells you everything? Give me the date when you think it's going to happen. Feel free to disagree with this solution. Let's pass along. Do you agree with the solution, and if so, what's the date it's going to be done by?

It's a good starting point, but data is... Yeah, integrating the different tools all together and trying to get the information also from other sources can be a solution. But I would not bet on the date.

So Max, what do you think? Is this going to get there?

Perfect? This is impossible. So let's talk about why that is. The input data will always be garbage for the rest of time, because the law, copyright law, is garbage. And I'll just say this: who knows what a derivative of a derivative work is? Who knows unambiguously, in every case, whether a piece of code A is a derivative of a piece of code B? No one knows. No one knows, and no one can know. Everyone has to do their own risk analysis, and this is the human factor, which is the only thing... I mean, tooling will help, but I think we're in this kind of obsessive-compulsive phase, like you were saying a second ago, where we have these ideas to fix copyright law once and for all. If only we could have annotations in a specific metadata format at the head of every file, then derivative works wouldn't be a problem anymore, then we'd know the copyright provenance of everything. Forget derivative works: who can analyze whether something is even
protected by copyright, or whether it's purely functional? That is actually still a novel area of law, still actively developing. So we're always going to have these problems. It will never happen. But, to your point earlier, if we're integrating our license scanning tools as part of the development process, that's really the important thing: that we keep it open and we keep it tightly integrated into how people are storing source and actually developing programs. Because a lot of people don't want to get their hands dirty; a lot of attorneys don't want to get their hands dirty with software development. But if you're looking at it after the thing's already compiled, already been distributed to someone, it's really impossible to figure out what's going on.

What do you believe? Do you think we can fix the upstream data problem? And if so, when?

Actually, it's going to be fixed on December 31st, 3000.

Okay, great.

Exactly, right before midnight. No, no, it's impossible to fix. Yet there are things which are practically possible, and, contrary to what you said, you can have developers participate in the process and help. So I have two practical examples. The first is Linux. I've been involved, with some of the top-level maintainers of the kernel and with others, for the last two-plus years, to help clarify the licensing of the kernel. And weirdly enough... well, not weirdly enough: it's an old code base, it has a lot of history, probably the largest number of contributors we've ever seen in any free/libre and open source project. When we started, there were about 80 different licenses, and just for the GPL, about 700 different ways to state "this file is under the GPL". And you know, there's a limited number of words to express that, but nevertheless, you could think about every single permutation, and it landed at some point in time in the kernel code base. So what we're doing is scanning and reviewing in detail.
I've just finished another review of the latest tip of the tree yesterday: every file in the kernel, to decide what's the correct license, whether it's clear-cut or not, and how we can replace any boilerplate with an SPDX license identifier, one by one. And that's a huge amount of work. We are hoping, maybe by 2019 with the current push, to have maybe 60 percent of the files covered. And there are still a lot of ambiguities and really weird stuff where, as time goes by, you know, companies have disappeared, people have died. And if you have ambiguities, especially in the case of Linux, where some part of the history ends up in old non-Git trees and was once in BitKeeper, it's a mess. Nevertheless, nowadays, if you watch the LKML, the Linux kernel mailing list, you see developers diligently providing simpler and clearer statements of the license of the code they contribute, and other maintainers nagging them to do that. So I think there is progress there.

I agree with you. I think we're all for expressing licenses more clearly, yeah, but I have a follow-up question on the Linux point. So what happens when you can't represent the license of a particular set of copyrights with a simple SPDX expression?

Yeah, so the problem is not really about SPDX; it's what if the license is ambiguous? And there's still a good number of files which have ambiguous licenses. And eventually there are two ways: either you can get back to the original contributors and trace it back unambiguously and clarify the thing, or you have to get rid of the code.

I agree with that too, but what about when it is unambiguous? You know what the license is, but you just can't write an SPDX expression for it. What do you do?

I
don't think so. Why would you not be able to do that?

Because there's an exception involved that doesn't have an SPDX identifier, things like that.

Yeah, well, you can still write an SPDX expression for that, and eventually, if there's no official identifier at SPDX for this new exception, then you can ask for it to be added.

And if SPDX says no, what do you do?

Well, you can continue to use it as a private identifier. I mean, just to give you an idea: there are about 300-ish licenses which are referenced at SPDX. ScanCode detects about 1,300, so about a thousand more. And that doesn't mean you can just trash them and ignore these licenses.

That's right. SPDX recommends that, if there's a missing identifier, people just make up a private namespace.

Yeah, and eventually... there's discussion about having decentralized namespacing to address that. Now another example: I've taken the top thousand packages of several popular application package managers, namely npm for JavaScript, RubyGems for Ruby, PyPI for Python, Maven for Java, and NuGet for C#, and I am computing a bunch of statistics on the clarity of licensing. It's still in progress, not fully finished. There's one interesting tidbit of data that came up, which is that, weirdly enough, the licensing of Node packages, meaning JavaScript, which are more often smaller and more recent than others, is usually clear. And one of the reasons I think it's clear is not so much that it's more recent code or smaller packages in general, but that there's been a significant effort by the JavaScript and Node community to ensure that feedback is provided to developers. If you submit a package to be uploaded to the npm registry and you don't have a proper SPDX license expression attached to your package, you'll get a warning. It's not rejected, but you'll get a warning. And my only explanation for this difference between Node and other package managers is possibly based on that.
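The kind of upload-time feedback described here can be sketched in a few lines. The following is a minimal illustration in Python, not npm's actual check: the function name `license_warnings`, the tiny identifier sets, and the `manifest` dictionary are all assumptions made for this example. It accepts a handful of SPDX identifiers, exceptions after `WITH`, and private `LicenseRef-` identifiers, and it returns warnings rather than rejecting anything.

```python
import re

# Tiny illustrative subsets; the real SPDX lists are far longer.
KNOWN_LICENSES = {"MIT", "ISC", "Apache-2.0", "BSD-3-Clause",
                  "GPL-2.0-only", "GPL-3.0-or-later"}
KNOWN_EXCEPTIONS = {"Linux-syscall-note", "Classpath-exception-2.0"}
OPERATORS = {"AND", "OR", "WITH"}
# Private identifiers for licenses that have no official SPDX identifier.
PRIVATE_ID = re.compile(r"^LicenseRef-[A-Za-z0-9.\-]+$")


def license_warnings(manifest: dict) -> list[str]:
    """Return warnings (never errors) about a manifest's license field."""
    expression = manifest.get("license")
    if not isinstance(expression, str) or not expression.strip():
        return ["no license expression declared"]

    warnings = []
    # Treat parentheses as separators; this sketch does not check nesting.
    tokens = expression.replace("(", " ").replace(")", " ").split()
    previous = None
    for token in tokens:
        if token in OPERATORS:
            pass  # operators are structural, nothing to look up
        elif previous == "WITH":
            if token not in KNOWN_EXCEPTIONS:
                warnings.append(f"unknown exception: {token}")
        elif token not in KNOWN_LICENSES and not PRIVATE_ID.match(token):
            warnings.append(f"unknown license identifier: {token}")
        previous = token
    return warnings
```

Returning a list of warnings instead of raising an error mirrors the "it's not rejected, but you'll get a warning" behavior: the developer gets the feedback without being blocked.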
So I think if you provide feedback and provide some information to software developers that the license is missing or the license is not clear, they will react. So if you go even further, eventually, for the kernel, we'd get checkpatch, which is the tool used to verify each patch is correct before you submit it, to act as a quasi license compiler. And then you treat licensing as something which is as important as the code being able to run and compile.

I have a follow-up question, but I want to give Michael one chance to answer my previous question, which is: when and how do you think the upstream data problem can be fixed? And, okay, don't forget about the date.

Actually, I'm asked about dates all week, so I was hoping that on the weekend I wouldn't be asked about dates as a project manager. But the answer is something like what Philippe has answered. I think the question is similar to "when will we all have electric cars?", and the point is: at no point in time, because there are some who adopt electric cars very quickly and there are some who just don't care. And I think there are some areas in open source where people are just not very interested in publishing clearly defined packages in terms of licensing, and that will stay around. It will also stay around because today open source projects themselves have a lot of dependencies, and if they don't update the dependencies, these hang around for five or ten years. You will find very famous Java components with ten-year-old dependencies. And then you can ask them, or maybe you contribute something, to update their dependencies. But unless all the dependencies being used by open source software are clearly defined in terms of licensing, you will have this situation. And it will be like electric cars: in the 2040s the majority of cars will be electric, but you will have combustion engines hanging around. And I think the electric cars analogy is very interesting because the same thing happens now in license compliance.
There are different players trying to come up with their own solutions, right? We have the Linux initiative here, we have reuse.software from the Free Software Foundation Europe, or we have FOSSology. For example, for my employer, one of the reasons why we invest in FOSSology is this: at the time when we started to contribute to FOSSology, there were not so many license compliance tools out there, but we thought that if a tool is actually available as free software, it will help to clean up licensing in open source.

So that actually links to the follow-up question I want to have for Philippe. So let's start with you and... no, let me start there and we'll move it along. But it picks up on what you were saying last, Philippe. So I once called upstream license annotation in projects an unfunded mandate on upstream, because from my point of view the companies are all asking for this. They want perfect upstream annotation of all this licensing to make it easy, so all your tools work well and give all this data. But upstream developers have other work to be doing. They're trying to make this software work. Making perfect license annotation in their project is a big job. And often it sounds like the tool folks are saying: well, let's collaborate with you to get it right. How much do you think the obligation is really on the folks who want this annotation to go into these projects, do the annotation for them, and offer it as patches to them and say "does this look right to you? Use it", versus this collaboration idea you're talking about? Which sounds interesting, but on the other hand, it's really unfair to ask these developers to do yet another job when it's not what they want to do. It's really the job of the people who are all obsessed with this license compliance stuff to actually get it done.

Yeah, I agree.
I think contributing unambiguous annotations is a good job for those who are actually trying to have that, or want to have that. The point is that in some cases you cannot actually contribute it, because you're not the copyright owner.

If it's proposed, right, you can, I think. If they accept it and incorporate it, then they've assented.

Yeah. I also think it would probably accelerate the entire thing if those people who are asking for it were actually contributing this cleanup work, and I think maybe ClearlyDefined also goes in this direction, actually. Because, for example, if ClearlyDefined is able to take over analysis work from FOSSology or other tools, then actually someone else can contribute that to the ClearlyDefined...

What do you think about that, Philippe? Do you think it should be an unfunded mandate on upstream? Do you think somebody has the job to come along and do this, and if so, who?

So I don't think it's an either/or. Unless you live in a parallel universe, using software for which you don't know what license terms you need to abide by is just crazy. I mean, the same way, I wouldn't want to use any software for which I don't know the license. That's a gateway.

But most developers are going to throw the GPL in the top-level directory and start making files, and all of us consider that a fully GPL project. It's annotated enough for any developer to care about, probably for any lawyer to care about. But the compliance folks want better annotation, right?

I think it's perfectly okay for anyone. I don't care about annotation per se; I care about clarity. And if the convention, and it's a widely accepted convention, is that if you slap a GPL at the top level of your project, your project is GPL, then that's perfectly good enough.
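To make the convention concrete: a tool can prefer a per-file `SPDX-License-Identifier` tag when one exists and otherwise fall back to the license declared at the top level of the project. What follows is a minimal sketch in Python, assuming hypothetical function names (`declared_license`, `effective_license`) and assuming the root license has already been determined by other means; it is not how ScanCode or any other tool named on the panel implements this.

```python
import re
from typing import Optional

# Matches the tag wherever it appears in a comment line.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def declared_license(source_text: str) -> Optional[str]:
    """Return the expression from an SPDX-License-Identifier tag, if any."""
    # Assumption for this sketch: the tag sits within the first 20 lines.
    for line in source_text.splitlines()[:20]:
        match = SPDX_TAG.search(line)
        if match:
            # Drop a trailing C-style comment closer; other comment
            # syntaxes are not handled here.
            return match.group(1).replace("*/", "").strip()
    return None


def effective_license(source_text: str,
                      root_license: Optional[str]) -> Optional[str]:
    """Per-file tag wins; otherwise the top-level license applies."""
    return declared_license(source_text) or root_license
```

With this fallback, a project that only has a license at its root still yields an answer for every file, and a result of `None` corresponds to the "you get nothing" case.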
It may not be perfect. It would be better if you were a bit more expressive, maybe stating what the license of each of the files is. But nevertheless, that's better than... and better in many cases than the nothing at all that we see in several projects. So it's not so much about slapping annotations on as about being able to discover whatever convention may be used by a project or community. The thing that's terrible is when you get nothing.

So what do you think about this issue, this unfunded mandate question? Should upstream have to maintain this, and if not, who should maintain it?

Obviously upstream shouldn't be forced to maintain it. We're all benefiting from their software. We shouldn't be imposing sometimes extreme commercial hardship on them for something that they gave away for free. But I think you made an important point, which is that you want clear licenses. So I think that...

Go ahead.

Well, the thing I was going to say was: convention really matters here. And I think that's something that the tooling people are losing sight of. It is a convention that if you put a GPL license in a root-level directory, all the other files are going to be under that license. We've lived with that convention, and it's been really low friction to create and use GPL software, for example, under that convention. And I think what we don't realize we're doing is that every time we do another iteration of the new obsessive-compulsive behavior of documenting, of annotating, we're creating social precedent and we're creating commercial conventions, which, if there are ever ambiguities in licenses, eventually could be consulted. So can you imagine: there are two projects. One has a GPL license in its root-level directory and has 10,000 files. I think right now I can use that unambiguously. What about when we move to the world where every one of those 10,000 files needs to have a perfect annotation?
So is the convention going to be that if one of those files is missing the annotation, all of a sudden the software's...

That's actually good, because since you're a lawyer, I'm going to ask you a legal question. You can't give us legal advice, because you're not our lawyer, but tell me.

I'll give you legal advice.

Tell me. So the compliance world has been feeding back to me for years that the file on the disk has special significance under copyright, that annotating the file with its license is incredibly meaningful. So can you tell me exactly where in the copyright statute it says that the file on the disk is special, like, each source file?

What's that you're saying? Each...?

So I've been looking for years in the copyright statutes for where it says the file on the disk is special and that's the thing you should annotate with permissions. So can you help me find it?

No, it's not there, obviously. And actually, I mean, if you really want to freak people out: copyright licenses, at least, don't even need to be written, right? We can really get freaky with the extent to which convention can start talking.

So let's take that a little bit further. I mean, I'm being a little glib there about the file, because the file is not where the copyright controls attach. How do we annotate copyright in a software project? How do we figure out whose copyrights are whose and what their license is? Where does the copyright attach? Where does the license attach? People have argued about C tokens: if you tokenize the C program, the copyright attaches to each token. I don't think there's any legal backing for that. I think you would probably agree. But I understand the problem. How do we find where to annotate?
I think the appropriate thing to do is to be respectful of project maintainers. So if we look at it from the viewpoint of respect, where people have taken an extreme amount of effort and put something out there for our benefit, then we should take projects as they come instead of dictating how they should annotate. We should say: okay, if it's clear enough that, using some kind of tool, we can scan it, then we bear the burden, we bear the cost of assessing the provenance, and that's probably good enough. We should probably circle around conventions that don't impose so many burdens on...

So Michael's already accused me of changing the question in the middle, and I want to give everybody a chance, so let's keep it moving. So just give us your general thoughts on how you feel about the issue of upstream annotation: who should do it, why, when, and how?

I agree more or less with what they said, so it's about convention, about agreeing on some rules. But, for instance, the work that npm is doing, or GitHub, forcing, or anyway putting up a warning, to have a license in your project can help. So I think it's a mix: from upstream, and then also from, I mean, knowing what you are doing when you write code.

So what about you? How do you feel about the upstream annotation question?

So luckily, in my company I could write the policy on this. Luckily I had that power. And first, on our side, we say: yeah, don't fix the problem on our side, just file a pull request for this, because for us
It's basically if we don't If we patch it basically say so we patch it internally where we So just you know We do have in our tool and ability where we can say that the convention is if it's a license found a route It applies to all the files that is possible To basically translate convention into machine learning as a we try to say like hey, please upstream it Because for us it's basically if we fix it once it's basically fixing going forward and sometimes it's really really trivial things And it's like guys come on. It's like takes you five minutes to basically to fix this we're sometimes talking about Most of the time the license already there, but just because they didn't perfectly follow how friends may even specify it's out How the license does because it's yes, it's in the maven ref, but it's deeply buried in there It's a five-minute fix like just fix it and it will be fixed for the whole community Cool, so I have one last question that I want to ask you then we're gonna turn it to the audience so my last question is the biggest compliance problem I see in the world is Under copy left licenses the requirements for complete corresponding source code the source code that corresponds to the binaries or otherwise Minified JavaScript, you know binary like things Tell me what tooling helps with that if any and how So you want to know the corresponding source code for right? So you have a binary, right? I mean this this is the ultimate compliance problem I have a binary that I know was built from some sources that were under a copy left license How do I produce the source release that goes along with it? What's what what what and where is the tooling that helps with that? 
So are you the creator of the binary or the consumer?

Either way.

So if you're the creator, what we're trying to do is give you the toolchain for free. And what we're also working on is giving you the instructions. We will be publishing, for all the various package managers, how you can comply, in literally exact detail: if you're doing Maven, do this; if you're doing npm, do that. Those details were not available beforehand. Most companies have written those, and when I asked companies, "Oh, you have those, can we just open source those?", it was like, "No, no." So now I decided, with a couple of other people, that we're just going to write them. We have them anyway, so we publish them as "this is how you can do it", and all the tools will support that. Yes, it will require some time, because with some tools this is a little bit more complicated to do. And then, once we have open tooling and we give you the documentation on how we do it, it's on us who need it, or the companies that need it, to go to all the tools that are part of that stack, file pull requests, and say: hey, webpack, we would like to do this and this, are you okay with that? And we provide the tooling for that. It's going to take a while before we get to all the tools, but my solution is: as we are the ones that would like to have it, we have to invest to fix it.

So what do you think? What are the tools out there now that help with complete corresponding source code provisioning, on either the consumer or the producer side?
I have no idea.

I think I agree with you, actually, because I haven't seen the tools yet. This is what I'm trying to find out: where they are and how to get them. So I agree with you; that would be my answer too if I were asked. So we agree. We're going to give it to Max.

Mirko is up there. Hey, Mirko. So I just want to give a shout-out to the Quartermaster project. It's a great project. It's in development. I think that, yeah, it's going to be really hard, but the way to get closer to making sure that when you convey a binary you convey the complete and corresponding source is to make sure that whatever tooling you have is really deeply integrated into your build system. That way you can create a manifest, you know exactly every source file that went into the binary, and it's going to be very easy to convey both that and the toolchain. Now, as a consumer of binaries, let's say you're in a relationship with a company and they give you a binary and you're required to redistribute it. There, it's going to be impossible to comply, because you're going to have to do contractual negotiations. They're going to have to give you the source or an offer, and you're going to have to pass that along. It's really difficult. But as the producer of a binary, it's not that difficult, as long as the scanning is deeply integrated with the build system.

And to that point: yeah, if you're not the producer of the binary, it's really hard. Even if you take a package in a popular Linux distribution, being able to ensure that you get the exact corresponding source code is not a given. Now, some tools like Quartermaster can help. I also have a tool called TraceCode, which uses strace to trace the build and figure out which files may be used. But it's really low-level help, and it's one hundredth of the work that's eventually needed. To me, the simple way to ensure you always have the corresponding source code available is to always work from source. It's surprising: everybody's using open source, but very often we consume packages and projects as pre-compiled binaries coming from left and right, and the software teams, be they open source developers themselves or in a commercial context, don't have the corresponding source code. It's a real problem, especially after the fact. Getting back to the source is going to be harder and harder. Websites disappear. There's one team that helps to preserve that: the Software Heritage project, which is trying to index and preserve all the source code. That's really important, and we don't realize how important it is. I mean, there are whole ecosystems, like Java, that are used to consuming only binaries. There's a huge amount of Java code which is not available, or no longer available, in source code. And when it's available in source code, there's no license information.

Now, I'm going to agree that everything in the world should be available in source code if it's software, so I'm with you on that.

Even if you don't publish it, as a consumer, not taking advantage of the fact that the code is available is crazy.

Yeah, I agree.

You're giving up on the benefits. Now, getting back to the other question just before, I wanted to add something, which is: if you're publishing source code, supposedly under an open source license, you want it to be consumed by somebody else. Otherwise there's a problem: why do you publish source code in the first place?
So having clear licensing should be part of standard practice. I would argue that we should not optimize for the most pedantic corporate user. Take two examples. Practice in the Linux kernel has always been to annotate each and every file, so that's the common way for Linux. If you take another ecosystem, for lack of a better word, Ruby: Ruby developers hate writing any comments in their files. So there are very few comments in general, and even fewer license-related comments or annotations. So it would be crazy to force the practice of C and Linux kernel developers on Ruby developers.

I do want to give Michael a chance to answer the source code provisioning question, and then we're going to take some audience questions. Yeah, because Tom also wants to hold up a sign here. Oh.

So answering this question as the last person is probably redundant, because I agree that as a producer you have Quartermaster there, and there's a tech from Software Heritage also sitting there, doing an interesting project in this area. When you have a binary, this Binary Analysis Toolkit might be interesting.
I think there is a new generation, a new version of it, now published or to be published soon: Binary Analysis Toolkit Next Generation, BANG, so to say. So I think that's interesting. And I think that should be more open source, because the old Binary Analysis Toolkit was open source, but the database of fingerprints, what you find in the binary and its associations with published source code, was not public. I think that's going to be changed now, and I think that's probably also an interesting tool.

So we're going to take a few audience questions with our last ten minutes, which means we have to share this mic, because we don't have two mics in the room. Tom is going to run the mic, and I will pass this one around to the others. If you say who on the panel you want to answer first, that would help.

There's basically one burning question after everything I've heard now. In a hypothetical scenario, a big enterprise has been doing Java software for some ten years and is starting to care about license compliance now. Is it like: okay, just break down and pray to whatever deity you have, because you're in a very bad place and you're not going to leave it soon? Or what do you do? Because everything you just told me sounds pretty bad, actually.

So you asked what you do. You're basically like a new company. What I would recommend, for Java and open source, is for you to look at my own tool.
We were in exactly the same place. The difference is that you need a tool that understands package managers. The trick with all the previous open source tools was that they understood copyright holders and licenses just on the file level, but they didn't take the package information into account. Actually, we spent two years looking at all the proprietary vendors, and we know everything that's out there. And they don't really work, if you really look at it. So my first question for tools was really: how do you get to your source code? How do you get what the packages are in there? The second question, when you show me concluded licenses: how did you get to that conclusion? No matter what tool you pick, those are the two questions you have to ask yourself.

I was expecting that he would answer with a tool. But I would answer, even though I maintain tools myself: you need to become aware of your situation. What open source software are you using? What is actually your compliance risk? How do you distribute software? What's your distribution model? And from then on, you probably end up with ORT, likely, and I think there's also Java support in Quartermaster, if I'm not mistaken. But I think the first thing is the situation. When I talk to other companies that want to use FOSSology, it sometimes just turns out it's not the right tool for them, because they're in a different situation and have different compliance needs.

So I don't know who's the best panelist to answer. I'm asking about MIT and BSD compliance. SPDX identifies the licenses while leaving the copyright year and copyright holder as variables. So, with tooling:
When the licenses require you to reproduce a specific copyright year and holder, how do your tools deal with that? And as tool makers, how would you like upstream projects to make it easier for you to deal with? For example, Facebook has a fixed year; Google uses tautological copyright holders, so it says "The Chromium Authors"; and then maybe Apache plus LLVM addresses the GPLv2 compatibility in a different way.

So just to rephrase your question: how important is it to have all copyright statements? Like, how do you deal with it?

Yes, in the license, yeah. And how important are the years in copyrights? What would you like upstream projects to do with years and copyright holders so that it's easy to comply with?

For me, writing tools, I'd like the copyrights to be parsable. But I think the bigger question is how important it is to have the exact statement. I remember a discussion with a developer from, actually, Google, working on the next version of an operating system called Fuchsia, and he was telling me that it was absolutely essential to have the trailing "All rights reserved" words from the copyright statement. That's a big discussion. So I told him no, that's been over since 1950. But so the question is also a legal question.

No, I actually have a follow-up to the question.

Go ahead.

I would say: as little as possible. The git log is a great documenter of both the contributor and the year, right? If there was ever any infringement or litigation, people would go to that immediately. So if you can just get rid of all copyright statements in all code, I'd be happy with that.

I'd also add: the easiest way to comply with the copyright notice requirements of non-copyleft licenses is to treat them like copyleft licenses. Always give the source code to everyone in the world, always, and all the copyright notices will be right. Just always give everybody source.
Don't write any proprietary software. It'll solve all these problems.

So my question is: where do I find, or where can I get, information on what tool might work for me? Let's say I'm a company just starting out on compliance, and I'm wondering what tool I can use. Now, apart from me telling them that these and these tools are available and this is what they can do, they might not believe me, because, well, "you're a lawyer, you don't know anything about tooling." So where can I send them?

So I have decided, because Max is the biggest consumer of tools on the panel, and there are too many people who make tools on the panel, I'm going to let Max answer as a consumer of tools.

I think the response is that you're asking the wrong question. We deal with this a lot, sometimes more than we'd wish to. But if you're looking at a situation where things seem really messy and there's been a history of bad practice, the first thing you need to do is talk to people. You need to talk to the lowest-level engineers, you need to talk to their management, and before asking for tooling, you need to make sure that there's some kind of coherent process for checking code in. Is there some kind of basic IP training for the employees, so they know how they can check code in? Is there code segregation? So I think your problem is purely human. After you solve the human problem, if you can solve it, then it really doesn't matter what tooling you choose. Every one of these tools is going to help you do what you need to do, but you need to be solving the human problem first.

And 100% right.
I mean, process first, and you can choose my tools afterwards.

Okay, so let's assume that we work at a company that is still on the path to putting everything in open source. So you have a mix of open source licenses and binary or proprietary things. Does SPDX, or a process using something like that, actually grok the fact that some things in there are binary or under non-open licenses?

So, as a company that still has a lot of proprietary stuff in there, what we do is actually make our own license identifiers for all our licenses. We treat open source licenses and proprietary licenses alike as licenses, so the tool is designed to handle both. And SPDX also supports writing your own license identifiers. Just so you know, in SPDX they start with LicenseRef, so we do LicenseRef, proprietary, and then HERE. This is how we have our own identifier, and all of our packages, when they go to our customers, will have an SPDX license identifier exactly for that, so that our customers, when they insert our packages, see exactly the license.

Okay, I have one last question, and that is on this question of complete corresponding source. I'd like to ask the tools people on the panel: one, are you aware of the Reproducible Builds project, and two, has that helped you at all in getting to complete and corresponding source?

Yes, we're aware. It has helped somewhat. The problem is that the whole chain is not yet supported to do all of that. What we want is to have all of this tooling running at software creation, but also to be able to parse the artifact when it is created afterwards. So it has to cover the whole toolchain, and that still requires a lot of work. Where we currently are is that the open tooling is working a lot at source code creation, figuring out the discovering and posting of that. We're not really there yet with the end-to-end.
We'll get there eventually.

Anyone else want to comment?

Yeah, I think there are a couple of steps before that which would already solve the problem of providing the complete corresponding source code, because I think reproducible builds, always producing the same binary with the same signature or hash value, is a step beyond that. And I also wanted to add that I understood Miriam's question differently. Suppose that you understood the process and you understood your roles. I think there is a problem in that we don't have a central marketing department for open source tools so far, right? People know that there are open source tools for license compliance, but it's really difficult to understand the capabilities of the existing solutions. And there is actually an effort on GitHub, it's called Sharing Creates Value, and there we would like to list all the open source tools and explain their capabilities, how they fit together, and how they could be arranged in the toolchain in a company.

So I'll take my prerogative as moderator to say, on the reproducible builds question, and this is my very biased view, because Reproducible Builds is a Conservancy member project, but I thought this before they were a Conservancy member project: it is the best thing to come along in the last 20 years with regard to the complete corresponding source code problem, in my view. So with that, I want to thank all our panelists. And before you clap, I want to note that many of them submitted talks of their own, and we cajoled them into being on a panel together instead of having their own talks. They were very gracious about it, and I'd like you to give them a big round of applause.