So the final talk: Kate Stewart will talk about, essentially, a solution that will make compliance much easier. I think that describes it pretty well.
OK, we just heard the last speaker talk about trying to work through compliance. I spent probably the last 10 years in that building there trying to figure out how to comply. We had sources coming in from multiple places, and in order for us to sell silicon we basically had to ship a Linux distribution with it. When you start looking at the list of distributions, and at putting a user space together, with its licenses and so forth, and saying, OK, how do I comply with this, it becomes a really interesting problem.
I changed jobs last year; I'm currently working with Canonical on Ubuntu. But the seeds of all this work were in the pain I went through when I was working at Freescale doing license compliance. Most of these questions were ones I was asking there. We just heard in the discussion that if you just pass on the source, that's fine. The trouble comes when you've got some new code you're adding in, you've got multiple other licenses, you're combining it all together, and you're trying to figure out all the pieces that are there. The information is not in a consistent form. It's coming in from different directions, and you have to hunt for it in different locations. So all of a sudden you've got multiple licenses.
You're trying to track them and perpetuate them onward. Then, if you're taking multiple packages and putting them together, you're also getting things from outsourced vendors. You might contract someone to give you a driver, or someone does a codec, for instance, and has provided it under certain terms. So you have different terms that you have to work with and resolve against the open-source licensing as well. All of these things make it a little more interesting, as does lining things up with the third-party software.
So at the end of the day, for one company trying to ship things out, getting an accurate bill of materials ("software bill of materials" is the slang term for this), saying what's there and what you actually have to do to comply, becomes a challenge, and it takes a lot of effort. While it may be very quick for the developers to pull the pieces together, figuring out what you actually have to do to comply can involve significant effort if you don't have your processes set up properly up front.
The problem is then compounded as you work your way through a supply chain. Take that device we just saw: you'd see several OEM vendors taking it. Someone takes it and puts a little more on, someone else takes it and puts a bit more on, and so on. The supply chain plays a role here, with things being modified slightly as you go along, and at the end of the day it becomes very complicated to get all the pieces together.
So I was looking at this and started talking to some of my colleagues at other embedded vendor companies, and we started talking to people at the Linux Foundation, at Motorola, and various players like that. The need for a standard format for conveying this information became pretty clear to us, and so we came up with a proposal last year. So, I don't know.
Were any of you in Wellington last year? Ah, good. Were you at the talk? The package facts proposal? Oh well. Last year at Wellington was the start of this package facts proposal. I came up with roughly what we need to keep track of and presented it at the conference, and that started things off.
So: Software Package Data Exchange. It's basically looking at what facts on the licensing and copyrights you can provide, and coming up with a standard way of communicating them. The idea is that if you're sending a package of software along, you can send the licensing obligations along with it. It's been talked about, for Android for example, and various people have their own individual formats. The idea is to standardize it across the industry, so that it becomes a fairly neutral format and we can exchange the information without having to dive into it in 20 different fashions.
It has become one of the key pillars of the Open Compliance Program that Matthew was just referencing. It initially got its start as a working group under FOSSBazaar, and the support of that organization has been very valuable in making it possible for us to collaborate. To get this to happen, you're getting release managers, technologists, lawyers, and business people, because these are all the people whose needs have to be satisfied for the whole supply chain to work with the Linux ecosystem and the open-source ecosystem. And this does actually extend beyond open source too.
At the end of last year's LCA talk it was: OK, we're going to have a phase one and a phase two, like any good little plan. The first phase is: let's standardize on a way of encoding it. It sounds reasonably straightforward, but when you start digging into it, there are a lot of interesting issues.
I'll go into them a little as I go through some of the details of the file format. But the file has to be uniquely identified, you have to have some degree of confidence that the data hasn't been corrupted or changed out from underneath you, and everything has to be clearly identified. Then, once you've got a format, you need a way of sharing it and passing it on. Those are the two elements. We've actually made a fair amount of progress on phase one, and we're heading into phase two quite nicely at this point. I'll illustrate that, and hopefully you'll be convinced as well.
So we're working with FOSSBazaar. The charter is to create this data exchange standard, to enable companies to share license and component information for software packages and so facilitate compliance. It's obviously grassroots. We're not doing this as a standards organization, which was one of the options we initially discussed. We're running as an open-source project, so those who care about the issue and want to participate are doing it. Certain companies have vested interests: it's part of their business, and participating directly means consulting revenue for them eventually. For others it will make their job easier. Like any open-source project, it scratches an itch for them.
Here's a bit of background on the participants that have been actively engaged with us. There are more who have been working on it than I've got listed, but we've got a fairly broad spectrum of people coming in with different perspectives, so hopefully we'll be able to figure out most of what's needed in the first go. Realistically, with anything like this, we're all pretty realistic about 1.0.
It isn't going to be perfect. We'll go on, but we're trying to future-proof a little and get as close as we can initially.
On how the working group operates: there are about three main working groups that have formed. Initially it started as one SPDX group, and then we found that there were people wanting to talk about the legal issues, and they were boring the technical folk. Then there were the technical folk who were really getting into how the file formats were going, and others were saying, what's happening to my inbox here? And the business people were going, yeah, OK, so how do you make it real? So we had three different constituencies coming in. Over the last quarter, I guess, we started forming sub working groups, where you can have the more detailed discussions in your own area of expertise. If you looked at it before and want to come back, and there's an area that's particularly interesting to you, feel free to join in.
Because we have people who care about intellectual property and things like that, we decided pretty much up front that anything we come up with is under Creative Commons, and the copyrights are effectively going to the Linux Foundation, to remove problems. And because we weren't working with a standards body, we wanted to make sure that whatever we did stayed neutral.
So we kept it in a neutral format.
At the last LCA, the speaker before me was Josh Berkus, talking all about how not to create a community and all the things that could go wrong. I was paying very, very close attention, and I think we've managed to apply a lot of those lessons, given the fact that we've grown and we've got so many diverse people participating. I highly recommend listening to Josh's talk if you're trying to get things going. It was very good and very timely.
On the specification goals: realistically, we had one guiding principle right from the start, from talking to legal people who had tried this and failed. We had to stick to the facts. We couldn't do interpretations. Interpreting and putting things together is the realm of lawyers, making risk assessments and legal judgments. We're still managing a tension between how much you can figure out from the code through tooling, and justify, versus how much is "hmm, I think it's this", and how much is background, like "I've talked to this person at this site and gotten the information". What's in the code is what we're mostly trying to stick to. But there are cases where certain tools will determine that, gee, this code fragment here matches this GPL code from ten years ago, and a lot of the commercial tool vendors have this type of capability right now.
So for the file format, we've been trying to figure out what a license is, and I'll go into that a little. The other thing is that we wanted to come up with a standard set of short names. When you refer to GPL, which variant of GPL are you referring to specifically? What are the standard exceptions?
So we were looking to come up with a quick way of referring to a license as well. This also was an interesting discussion, with all sorts of viewpoints and multiple different attempts that had to be harmonized a bit. But we figured that if we got these two things, the specification and some standard short names, we'd be advancing things a lot farther than they were. Last August we came out with the first beta draft in public, at LinuxCon. Since then we've been getting feedback and working on things, and we're hoping to get the release candidate nailed down for Q2.
So what is an SPDX file? The specification is for creating a file, and it will contain identification information, which says who created this file and what was there. The package information is the information about what you're describing. An SPDX file basically talks about a package, and we're not defining what a package is. It's a tarball or anything like that.
Anything you want to put together and call a package, you can call a package. The key is that it's a discrete quantity that you determine, and then you list the information for everything in there.
We also had the fact that not all of these licenses are going to be part of the short-form licenses, so we wanted a way of handling non-standard licenses. Say we found a couple of fragments in the code that look like they might be a license. We don't know whether they really are, but we want to be able to capture them so that the lawyers down the road can take a look and apply legal judgment appropriately.
What's actually happening in some cases is that at the package level you're saying the license is one thing, and then you start digging down to the file level and you see all sorts of other little licenses sitting there. That's part of pulling it all in from different places, right?
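As a rough illustration of the package-versus-file mismatch just described, here is a minimal Python sketch. It is not taken from any SPDX tool: the `FileEntry` type, the `unexpected_licenses` helper, the file names, and the license short names are all hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """One file in a package, with the license seen inside it (hypothetical type)."""
    name: str
    seen_license: str  # illustrative short name, not an official identifier


def unexpected_licenses(declared, files):
    """Group files whose seen license differs from the package's declared license."""
    mismatches = {}
    for entry in files:
        if entry.seen_license != declared:
            mismatches.setdefault(entry.seen_license, []).append(entry.name)
    return mismatches


# Illustrative data only: a package declared under one license
# that contains a file carrying a different one.
files = [
    FileEntry("src/main.c", "GPL-2.0"),
    FileEntry("src/codec.c", "BSD-3-Clause"),
    FileEntry("lib/util.c", "GPL-2.0"),
]

for license_name, names in sorted(unexpected_licenses("GPL-2.0", files).items()):
    print(license_name, names)
```

An empty result would suggest the file-level licenses agree with the package-level declaration; anything else is a flag for someone to look more closely, not a legal conclusion.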
In most cases this is fairly benign, but in some cases it'll bite you if you're not aware of it. There have been cases in my prior life where you just have to make a different choice about what you're going to ship, because you can't comply. Finding those things out late in the cycle gets costly. The closer you are to release, and when time is money on the embedded front, the more you're going to lose.
The other notion we wanted to have, with this whole supply chain of passing information on, was a way of capturing someone saying, yeah, I've audited this already. If you've got a file and someone has audited and cross-checked it, and it makes sense, and they want to sign off on it, being able to pass that information on builds trust. So we've been looking at putting in a reviewer sign-off. The concept is the same as when you upstream a patch: the maintainer will take it, look at it, and apply it to the kernel, and they do a sign-off on it to say they've reviewed it and it looks good.
So the identification information: what that really is, is, OK, what version of SPDX is in use? 1.0 is what we're going to call the first version, but as I say, we don't think we'll get it all right, so we want to make sure we have some way of versioning so the tooling can be future-proofed. How was it generated: was it a person or a tool? And when was it done?
The other thing, an interesting aspect that came up in discussions, is that you cannot copyright or license facts, but the attribution of the person doing the work, apparently, as I learned, you can. So the question is the status of the data, since it's a person providing it to you to use. That was an interesting meta-issue that didn't come up in the initial thinking, but because we've got a lot of good lawyers in this group, they were saying, let's make sure we can make it explicit. So we're going to put a default license in there that lets people push things forward. But if people want to provide this information only under specific NDA terms or something like that, they have that ability. That's not going to be the preferred way, but it's there for adoption, and there are certain business needs that sometimes have to be worked around.
The other thing, in grey, is the notion of an author comment, where the person authoring this file wants to make information available to consumers of the file. That's an optional field. Anything that's greyed out right now is considered an optional field. So that's the first part.
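The identification fields described above (spec version, who or what generated the file, when, the license on the data itself, and an optional author comment) could be modelled roughly as follows. This is a sketch under assumptions: the class name, field names, and example values are hypothetical and are not the field names of the specification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Identification:
    """Hypothetical model of the identification section of one file."""
    spec_version: str          # e.g. "1.0", so tooling can handle later versions
    creator_kind: str          # "Person" or "Tool"
    creator_name: str
    created: datetime
    data_license: str          # license under which the facts are passed on
    author_comment: Optional[str] = None  # optional field

    def __post_init__(self):
        # Reject anything other than the two creator kinds mentioned in the talk.
        if self.creator_kind not in ("Person", "Tool"):
            raise ValueError(f"unknown creator kind: {self.creator_kind!r}")


# Placeholder values for illustration only.
ident = Identification(
    spec_version="1.0",
    creator_kind="Tool",
    creator_name="example-scanner",
    created=datetime(2011, 1, 1, tzinfo=timezone.utc),
    data_license="example-default-data-license",
)
print(ident.spec_version, ident.creator_kind, ident.author_comment)
```

Keeping the version as an explicit field is what lets a reader decide up front whether it understands the rest of the file.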
That's just the identification information for the SPDX file itself.
At the package level you have your standard things: the name, the file name, and where it was downloaded from, if that's known. If you want to record the SHA of that tarball, you can do that. Then there's also a file-based checksum: if we do a checksum over the SHAs of all the files at the file level, we have a way of fairly uniquely identifying that nothing has been tampered with. That's one of the key facilities, because we've seen in the past where you say, these are the licenses for this package, and the files change underneath it, and that destroys the analysis. If you're validating it, you want to know that the analysis has been invalidated, or that the sources you're looking at still match. The reason that doing it at the package level alone doesn't work is that we wanted people to be able to embed this file into the package, and it gets very, very tricky to do a SHA on yourself with yourself in it. So you can have the file outside or inside the package, by using this XOR over the files themselves.
The source analysis information and anomalies field is again commentary, which was requested by a lot of the commercial vendors. Then there's the declared license for the package: this is what's in the COPYING file, the READMEs, and so forth. This is what you're expecting it to be. But we're also recording at this top level all the seen licenses as well. It's a pretty quick flag to someone: OK, the seen licenses aren't quite matching the declared license, so do I care enough to go down deep and understand what's going on at the file level? And what if all the seen licenses match the declared license?
Then things are probably fine, and you don't have to go down to the file level, but at least you have that capability there. Then there's the declared copyright holder, and some optional fields for more description if you want. That's pretty much one per SPDX file.
For the non-standard licenses, there's a unique identifier and then the extracted text of what we found. That's what goes in that section.
At the file-specific level, again there's the name, the type, the SHA of its content, and then the asserted license and the seen license. This is one of the areas where, as I was saying, the licenses are a little bit interesting, in that sometimes the license that's actually written in the file is not really the license that governs the file. These cases seem to have caused a certain degree of consternation. The compromise we came up with is that we record the asserted license, which is what the tools and analysis believe the license to be, as well as what is actually in the file. You need both to have trust. Then you have an optional comment to explain why they might be different, if you want. Then there's the copyright information as seen, and another optional field to say this file was actually part of a different project, if someone has discovered that. Again, this is about whether you can get the traceability and the information you need. And then, for the sign-offs, there's the reviewer information.
That's basically your name and the timestamp of when you signed off on it. That's pretty much the only thing that can get added to the file once the initial version is created. The thinking here is that you've signed off.
The short-form licenses are going into Appendix 1 of the specification. As of version 1.5 of the list we have 151 documented, right now in a spreadsheet, and I'll talk a bit more about what we're doing in terms of moving them over.
On specification status: since August we've had the 1.0 draft. What we're doing right now is RDF. The specification is being encoded as RDF/XML, and there's also a tag syntax variant, which is more like textual tag strings. Harmonizing those two while keeping it human-readable, which is one of the original goals, has been a good source of discussion and debate. We're also moving the spec from just a text document into an RDF ontology that will generate an HTML file with the spec in it, and we're moving that over to a Git infrastructure so that we have revision control going forward. If you have any questions or comments in that area, spdx-tech is the group that's focusing on these issues.
In terms of the licenses, we're trying to go for the most common ones and the standard license names, as well as some of the common exceptions. The short forms themselves have come from some precedents: there's the Red Hat precedent and the Debian precedent for short forms, and those were the two most common sets of examples we found. Debian has also been working on DEP-5, which is a machine-readable format for encoding this as part of their packaging. So trying to get the names to harmonize is one of the focus areas for us. Once we actually have these short-form names, we'll probably be making sure that we have the information available on a website, such that there's
information canonically there. If we're using a short form as part of the spec, there will be a web page that shows its name, the full name of the license, where it is, what the standard text is, what the standard headers are, and then a neutral version of it, as well as notes about it, if there are any: whether it's OSI approved, and so forth. So we're going to be using the website as a reference. If you're using a license short form in an SPDX file, you can go and find all the information you want about it in one spot, and we'll keep that up to date.
On inclusion: trying to figure out which licenses to include and which not has been fun, because you're recording all that information. Being exhaustive, listing every license you discover and putting it in, is a lot of work for little gain. So we're aiming for about 90% coverage, and realistically about 20 licenses are responsible for most of the open-source world right now. If you're interested in licenses, I'd highly encourage you to go look at the Black Duck site with their top 20. It's really quite interesting which ones make it up there. Then there are about 80 that have been ratified as open source. Those were some of the guidelines we went by, and we used those two as our starting points. And then, quite frankly, it was people in the industry who have been working on this and encountering common ones. They were bringing them forward and saying, hey, you forgot this one. You forgot this one.
Yeah, we have forgotten some. So that's pretty much where we're going, and we've also looked at what Debian has been doing and what Fedora has been doing, trying again to figure out a good subset from there.
Now, the template and the status of the repository itself: it's the 151 licenses, and we've been working with some of the key lawyers who care about these things to get some of the ambiguous ones resolved a little better. The Python stack in particular was causing a bit of grief on the analysis side, so I think we've been working with the Python foundation to say what they mean in these cases. I think that will be a bit of a service: one place to say, OK, these are the key licenses, and here is some of the analysis. But for the most part they're all fairly straightforward, and we've just been documenting and learning.
As for the processes we'll be using going forward to add licenses to the list, the legal and business teams are developing those right now. Someone will need it, because a new license will get written tomorrow. Every day someone may be writing another license, or deciding that they have to put something out under their own terms, and people will want it to be included in the future. So coming up with a reasonable process for adding licenses and choosing which get added, so it doesn't become a kitchen sink, is part of what that team is working on, as well as coming up with ways of encoding the neutral version of the license, so that the scanning tools can use it. What I mean by that is: generally whitespace is insignificant, punctuation maybe, British versus US spellings of things. It has to match this, except for whatever is in the copyright field. What's in that copyright field doesn't necessarily matter to whether the license matches. And so
coming up with a templated version of the license that can be used to help scanning tools is one of the areas. There have been some discussions with some of the FOSSology people and with Daniel German, who's been doing some tools on that side. So coming up with a reasonable way of representing that is a goal as well. And the technical team has been looking at coming up with a template that supports some parsing hooks, so that people can automatically index into the things they care about for reference.
So, as I said, one year later I think we've got a good number of those elements starting to be covered. What's next? Continuing to develop the infrastructure, continuing to refine the specification, and getting more tools online, because when you're dealing with packages that have 30,000 files in them, some of this is a bit prohibitive by hand. We're also looking to test-drive the specification and tools with some friendly beta sites, and to establish the frameworks for changing the specification and license list in the future. So that's what's keeping us amused.
In terms of the infrastructure, we do have a website up now. It came up last June or July. There's a wiki up there for collaboration, and we've broken it down into teams, so you can navigate there. There are open mailing lists, which is one of the things we did not have initially, and their archives are available, so you can follow the threads of the discussions without needing to subscribe to any of them. There's also bug tracking, available as of about a month ago. We've got Bugzilla going, so thank you to the Linux Foundation for providing Bugzilla capacity for us, as well as Git capacity for the tooling. So the bugs in the spec are starting to get documented.
Let's put it that way. The pretty-print tool and the translation tools are up there as well. The areas we can track bugs against right now are the website itself, the specification, the tools that exist, the license list, and the documentation. So we have a way of at least starting to track things and keep ourselves honest going forward. And with revision control, finding the source code is good.
In terms of the tools needed for this, you've got creating a file, reading a file, and validating it. Those are the three main things. There are some open-source tools that we're starting to work on, and then there are also the commercial tools. Key vendors here are Black Duck, OpenLogic, and Palamida, and they're all participating in this to make sure that whatever tooling they come up with will talk to this effectively. You'll see their names fairly prominently among the participants here. So there's some support going in for tool creation. We're trying to get the grammar, the flex and bison files for the SPDX file itself, so that'll help with the tools that want to parse them.
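To make the three tool tasks (create, read, validate) concrete, here is a small Python sketch for a simple "Tag: value" text layout, loosely in the spirit of the tag syntax mentioned earlier. It is an assumption-laden toy, not the project's grammar: the tag names in `REQUIRED_TAGS`, the `#` comment convention, and the sample values are all hypothetical.

```python
# Hypothetical tag names; the real specification defines its own.
REQUIRED_TAGS = ("SpecVersion", "Creator", "Created")


def read_tags(text):
    """Read 'Tag: value' lines into a dict, skipping blank lines and # comments."""
    fields = {}
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain colons.
        tag, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"line {number}: expected 'Tag: value'")
        fields[tag.strip()] = value.strip()
    return fields


def missing_tags(fields):
    """Validate: return the required tags that are absent."""
    return [tag for tag in REQUIRED_TAGS if tag not in fields]


def write_tags(fields):
    """Create: serialize a dict back to 'Tag: value' lines."""
    return "\n".join(f"{tag}: {value}" for tag, value in fields.items())


sample = """# illustrative file, not a real document
SpecVersion: 1.0
Creator: Tool: example-scanner
Created: 2011-01-01T00:00:00Z
"""

fields = read_tags(sample)
print(missing_tags(fields))
print(write_tags(fields))
```

A real reader would be generated from the grammar files rather than hand-written like this; the point is only that a line-oriented layout keeps creation, reading, and validation small enough to check by eye.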
There's the RDF ontology, as well as some online RDF validators that already exist, and we've been passing our ontologies through them to make sure the examples we've been coming up with look good and are actually coherent. There are also open-source license and copyright recognition tools that exist today, from FOSSology and from Ninka. So those are the ones.
In terms of the tools we're looking at, again, these come from people who stood up and volunteered: I want to do this tool. For the pretty printer and the translator, Gary O'Neall has been taking the lead, and he would welcome anyone else who wants to collaborate with him and contribute. We'll be using these for the betas and working from there. The spreadsheet translator came about because a lot of companies were tracking this information in spreadsheets, and this is how people understand it. OK, I'm seeing nodding heads here.
So moving back and forth between RDF and spreadsheets should be quite possible. That was definitely a request that came in. If you want to contribute, again, the tech group is where to go, and feel free to look at the sources.
All of this is in preparation for getting business users to use it, because if we don't get it adopted and widespread in the ecosystem, it's not going to fly. So we're trying to minimize the chance of getting 1.0 wrong by working with some real customers up front. There are several organizations participating that are willing to sign up and do it. We've got, I think, five or six right now, and we're trying to get to about ten, with at least some of them in pairs. The other thing we'd like to do is get some of the open-source upstream projects involved and make sure this is suitable for them as well. So that's where we're targeting the beta activity and putting together the pieces to help people. If anyone's interested, the mailing list is the biz group; they're the ones active on this right now. The wiki information for the beta program is sitting up on the site.
Why we're doing it, obviously, is to pass this from one company to another: making sure that one can read what the other writes, and that it's really going to be useful for what they're doing. These are the elements we're looking at for the program itself. Obviously the spec and the license data are one part. But when you dig into what's going to make it real in the field, there's a lot we've got to get going. We've got to get a variety of training and education materials available to teach people how to use this, because it will mostly be used along the supply chain. Incoming and outgoing: people are bringing software in, putting software out, putting out the compliance information, putting out
the provenance. OK, how do you use this information? How do you push it? Then there's evangelism and outreach, making people aware that it exists. I suspect that some of our beta participants are going to be asking their downstream to give it to them, making it part of their contracts in future, and things like that. So there will be adoption from multiple directions. We have to deal with translation and localization issues. There isn't really anyone taking the lead on that yet, but eventually it will have to get resolved. Then, obviously, there's getting the tooling available, which you've seen the start of, and defining the ongoing processes for interacting with and evolving this. Then there's all the planning and coordination of the beta programs, as well as: do we have someone to answer questions if someone asks one? What's the support channel? Is someone hand-holding? Are there resources for that? For a beta program as well as a wider program, these are some of the elements the business group is looking to put in place so this can be successful, because the feeling is that in certain areas we really need to get as close to right as possible the first time, when it's finally done.
In terms of a timeline, beta testing is probably going to happen between April and June this year. Then we'll try to get the 1.0 spec finalized for LinuxCon, so we can, one year later, declare it done. That's what we're marching to right now. The next steps are starting the beta program and finalizing that 1.0 version.
This is what's going on for the next couple of months, as well as doing some of the brainstorming on the questions that come up. The ones we can't answer right now get tracked: those go into the bug repository so we don't lose them for 2.0. And then we'll see where we go.

So I just wanted to say thank you to all the people who've been following this and contributing on the mailing lists, the phone calls, and the discussions, and to the various working group people. From where the proposal was a year ago till now, it's changed quite a bit. You know, it really has improved a lot.

So if you want to learn more, there are a couple of places to look, and if you're interested in specific areas, I'd encourage you to join any of the mailing lists or else contact one of the leads directly. And so, that's it, I guess. Are there any questions?

Hi, I'm from Qualcomm, the Qualcomm open source portal. I see you've worked with a couple of my colleagues from my group. One thing I know from my experience at Qualcomm is that this sort of information is carted around in spreadsheets at the moment. But if you go to where the decisions are finally made regarding whether or not to use a particular software package, or when the business decision is made about how we're going to ship this, or whether we're going to ship this, or whether this is acceptable, at that point in time no one looks at this detailed information.
The only people that look at that are lawyers, really. So my question is whether you intend to look at diagrammatic representations in the future, because all I see in these sorts of meetings are simplified diagrammatic representations that convey both the licensing and how the software is constructed.

I'd actually be interested in getting with you offline and seeing what your diagrams look like, in terms of just a sketch of it. Yeah, realistically, this is to make the information easier to find for the people who are producing the diagrams. It is not meant to replace the risk assessments that have to happen at a business level or the advice the lawyers are giving. It's basically just to make the information available and easier for the lawyers, so you have less work to pull the information together. Because I spent many, many, many nights and weekends going through files trying to find things out. And the idea that I was looking at the same package that my colleague at Wind River was looking at, that another colleague at MontaVista was looking at, and we were all doing the same due diligence to try to answer "Can we ship this or not?", was waste. So it's about trying to get the stuff summarized, and then letting the right level of analysis be focused down, with the diagrams, as you say, being made and the decisions being made.

So do you see the need for standardization of how that information is summarized and presented at a higher level, or...?

That's definitely like 2.0.
Maybe. I just want to get the foundations there and solid.

I just want to clarify the intent, I guess, in one way. With the SPDX file format, is there a suggestion that in an ideal world, if you go to SourceForge and look at open source software there, you would have the software there and an associated SPDX file?

Ideally, we'd like to have that option available.

Okay.

Having that option there, I think, is going to be necessary for long-term success, such that when you produce your package, you produce the SPDX file. Debian's going there right now with DEP-5, and that standard is there for producing this. And then you were hearing what they were doing with Android, which is again sort of similar. The trouble is that all these things don't interoperate, and you have different levels of trust. That's why a lot of the discussion and debate has been going on: how do we make sure that we have trust that what's there has not changed out from under you? That's one of the new dimensions.

So towards the end you had something about, essentially, advertising and evangelizing where you're heading. Is there also talk going on with SourceForge?

I've started to talk to the Debian folks. If there are upstream projects that are interested in participating or talking to me, I'm happy to; I'd love to talk, on the SourceForge side as well as others. It's just a question of bandwidth and reaching out, because I'm doing this as an open source project for myself as well, just because it seems like my life will become easier in the future, and, I mean, other people's will too. So I'm engaged in this now.

Okay, well, thank you.