Matt Taggart will be here presenting a talk about FOSSology and Debian. Matt Taggart is a developer, an upstream developer of FOSSology, and also the package maintainer in Debian. And he works as a systems administrator at Riseup, doesn't he? Please welcome Matt Taggart. Can you hear me yet? There we go, great. First I would like a volunteer: someone please give me the name of a small package that you maintain. We're going to use it for a demonstration in a minute. What's the name? Okay, it doesn't have too many directories or files, it's not very big. Okay, I think this will be okay. So bear with me a minute here; I'm just going to get the source and upload it. Okay, sorry for the delay. Now that's processing and we'll take a look at it in a few minutes. As was said, my name is Matt Taggart. I work for Riseup Networks, and I also work on the upstream FOSSology project, and I'm the FOSSology maintainer in Debian. So about 10 years ago, HP had a problem. They had teams of engineers that were working on building devices, and they wanted to use FOSS software to be able to drive those devices. They also had people working on software products that wanted to use FOSS libraries. And also lots of people, including a lot of us in this room, were asking them to ship Linux distributions, because we wanted to be able to run Linux on their hardware. However, they had a problem. Sorry, technical difficulties here, I have to get my notes. At the time, HP was a $40 billion company, and when you're that large, you have special problems that other people don't have. When you're that large, you're a very large target for lawsuits. The product teams at HP wanted to ship FOSS software, but HP's lawyers were really afraid of shipping anything that might cause them liability. So finally, there was lots of debate among the engineers and the product teams wanting to ship things.
The lawyers agreed that if the software being shipped could be scanned in some way, and they could assess what risks this was going to present for the company, and they could sign off on those risks, then they would allow it. So work was started on a scanner to begin scanning for risks. Okay, so how do you determine the risks? HP was actually okay with shipping copyleft, you know, GPL-type code. They just wanted to know exactly when they were doing it, and make sure that they were ready to fulfill the source obligations and other aspects of the licenses. What they didn't want was to accidentally ship some GPL code and have that come out; you know, it would be a PR disaster, but also they might have to begin shipping source for something they didn't want to. So in order to help determine these things, work was started on a tool that would automate the process of scanning software to look for these problems. The tool would look through the software, report things that the lawyers didn't like, and flag them for the lawyers to look at. Now depending on the product that they were talking about within HP, they might be scanning source code that was used to generate a firmware image that was going to go on some hardware device, or they might be scanning a packaged piece of FOSS software that was a library they were going to use in some application. Or, the biggest thing, they might be scanning an ISO image of a Linux distribution that they intended to ship on their hardware. It's the ISO images of distributions that really scared the lawyers the most, because with so many packages they were really worried that somewhere hidden down inside this Linux distribution was going to be some license that told them they had to give away HP proprietary technology or something like that. So they were terrified of that idea.
Okay, so the first thing the tool had to be able to do was recursively drill down through whatever it was given and pick it apart at the lowest level. In the case of a Linux distribution you normally start with an ISO image that you would be giving to someone, and then you start unpacking that, and inside that you have a whole directory hierarchy, and you have files that are compressed and zipped and tarred up and bzip2'ed and 7-zipped and every possible kind of format you can think of. This tool needs to be able to find these things, which we refer to as containers, you know, things like tarballs, and when it encounters them, open them up, look inside them, and keep going recursively all the way down the stack until everything is expanded as much as it can be. So the scanning evolved over time based on the requirements from the lawyers. They would have things that they knew about that they wanted to make sure that the company wouldn't do. And so gradually it grew from the ground up as a bunch of different regular expressions for things that they wanted to look for: particular licenses, copyright declarations, other things like that. So it would scan through everything, and at the end it would produce a PDF document that could be given to the lawyers, and then they could review any problems that it found. So this fulfilled what the lawyers wanted, which was to be able to review everything, and so HP, using this process, began to ship FOSS software. But now there was another problem. It was kind of a mess. When adding new things to look for, or adding support for new file types, that sort of thing, first of all you had to rerun the entire analysis to get a new report, because it was pretty stupid about how it just scanned through things and did its report as it went. And the code was really just this huge monolithic mess that was primarily written by one person, and only that person knew how to understand it.
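The recursive container handling described above can be sketched in a few lines of Python. This is an illustrative toy, not FOSSology's actual unpacker (which knows many more formats than the two stdlib modules used here):

```python
import os
import tarfile
import zipfile

def unpack_recursive(path):
    """Recursively unpack any containers found under `path`.

    A toy version of the unpacker described above: whenever a file
    looks like a container (tarball, zip), extract it next to itself
    and keep descending until nothing more can be expanded.
    Returns the list of containers that were unpacked.
    """
    extracted = []
    work = [path]
    while work:
        current = work.pop()
        if os.path.isdir(current):
            work.extend(os.path.join(current, name)
                        for name in os.listdir(current))
        elif tarfile.is_tarfile(current):
            out = current + ".unpacked"
            with tarfile.open(current) as tar:
                tar.extractall(out)
            extracted.append(current)
            work.append(out)  # descend into what we just unpacked
        elif zipfile.is_zipfile(current):
            out = current + ".unpacked"
            with zipfile.ZipFile(current) as zf:
                zf.extractall(out)
            extracted.append(current)
            work.append(out)
    return extracted
```

A real implementation also has to cope with ISO images, bzip2, 7-zip, cpio, and so on, but the shape of the loop is the same.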
So it had worked well and it let HP ship FOSS software for quite a while, but it was time to redesign. So work was started on a new design. The various different functions that existed in the previous tool were split into different components. Before, the unpacker used to unpack things on every run. Instead, the decision was made that it would unpack things and put them into a file repository. The things that did the scanning would be split out into a pluggable architecture; we refer to those as agents, and then you can plug in whatever kind of scans you want. It's very modular. The agents, after they scan the files, produce results that they put into a database. There's a scheduler that kind of sits in the middle of everything and controls how everything runs, and then there's a couple of different user interfaces. Primarily a web interface is the main way you interact with it, but there's also some command line tools to help you automate doing uploads. And some examples of agents: I mentioned the primary reason the tool was originally written was to fulfill the legal obligations, and so a couple of the agents do that. There's a license scanning agent; there's actually several different types of license scanning agents now. They scan and look for license text and boilerplate in source files and that sort of thing. There's also a separate agent that looks explicitly for copyright declarations. Another interesting one that we have is a metadata agent that knows about metadata contained in files, and so when it encounters files it knows about, like for example JPEGs, which have metadata tags, it can scan those and make them available in the database. Which is kind of interesting, because you know, sometimes somebody might put some text in there like "this JPEG is copyright so-and-so and licensed under these terms". That's something that the scanning agents would like to be able to determine.
So I'd like to talk in particular about this unpacking thing we talked about a minute ago, and in particular about unpacking Debian source packages. We're all pretty familiar with the basic source package layout, and of course there are new source package formats, but originally, when the unpacker tool was set up, if it encountered a Debian source package within an ISO or within a directory that you had uploaded, it would run the scan on the files in the source package, and that's good. It would encounter the .orig.tar.gz, and it would uncompress it and untar it and go through all that and run its scans on there. It would also encounter the .diff.gz, unzip it, and look at that. But one thing to think about: suppose somewhere in the .orig.tar.gz you have a license declaration that says this is GPL v2, and say in the diff you have a patch that applies to that which says, oh yeah, we're going to adjust that and say no, it's not GPL v2, it's GPL v2 or later. Or we're going to put the word "not" somewhere in the middle of the license, which is going to totally negate what it means, or we're going to change copyright dates. Or, in terms of other agents that you might do scanning with, you might patch C files; say you have an agent that's scanning for a particular software vulnerability, and the Debian diff is what introduced the vulnerability: before, it wouldn't find that. So just recently the unpacker was extended to know about Debian source packages and can treat them as a container, and so when it encounters a .dsc file it looks to make sure the other pieces are there, does the unpack of the source, and then it scans the unpacked Debian source package, which is pretty nice. So I mentioned I'm the FOSSology maintainer in Debian, and it's in unstable right now; I think one of the most recent uploads is about to move into testing.
If you want to play with FOSSology, it's just a matter of apt-get install fossology. Once it's installed it sets up most things on its own. The only thing it doesn't do is the Apache configuration; you need to set that up yourself. There are examples in the package, but really that's kind of a site-specific thing, so we leave that to the user to set up. Okay, so here's this picture again of the new structure. One of the neat things about the way FOSSology is structured, when this rewrite was done, was we knew that potentially, in scanning large amounts of software and having things that are very computation-intensive, you might actually want to run some of these things on separate machines, and so everything was designed from the ground up to allow these things to be on separate machines. And likewise, when I packaged the Debian package, I built it such that individual components could go on separate machines and that would work just fine. So if you apt-get install fossology, that's actually a metapackage that pulls in everything you need to make it run on one system, but if you want, and you have the need to scan lots of software or run lots of computation-intensive agents, you can do things like have a whole cluster of machines running as agents, one separate database machine, and a separate machine running, say, the scheduler and the user interface. The other nice thing is that the way everything communicates is over SSH, and it just uses the normal file system for the repository, so for example we use NFS for the file repository. So if you do want to set something up in a cluster situation, there are instructions in the README.Debian file about how to go about doing that.
So I mentioned some of the existing agents, the license agent, the copyright agent, that sort of thing, but now that this is written as a generic framework, pretty much anything that you could use to scan software, you can write an agent for that would plug into FOSSology, and so we've had a lot of ideas for other things that we could do. Some of the biggest users of FOSSology so far are people that are producing firmware for embedded devices; they're using Linux, and they need to make sure that they know about the licenses. So, you know, most people are attracted by the license scanning right now, but really this could be used for doing any sort of software analysis. And some of the things that we thought of in particular would also be interesting to the same people that are doing embedded systems. One that seems to pop up in the news from time to time: you often hear about people shipping some sort of device that makes it out the door with a virus on it, so you have this new embedded device, and people plug it into their home computer or, you know, whatever it is, and it ends up infecting things.
There's no reason you couldn't take, say, the ClamAV antivirus tool and write an agent that wrapped around it, and then before people ship software, as part of their quality assurance process, they could run FOSSology on it. Beyond licenses, we could do scans like searching for viruses; searching for spam, I guess you could run SpamAssassin and see if there's any spam in their source code or something like that. But likewise you could run all sorts of software analysis tools; you could run, you know, C tools like lint, or compiler preprocessors that look for compiler warnings, all sorts of things. And there's no reason you couldn't scan binaries as well. I mean, we can upload things to FOSSology, and generally we're wanting to upload source code, but it doesn't have to be source code; you could upload binaries, and if you had something that was able to analyze binaries and look for particular sets of instructions that you wouldn't want, it could flag those as well. So one of the things I want to talk about in this talk is how we can use FOSSology in Debian in particular. We talked a little bit about how FOSSology is able to unpack Debian source packages now, which is nice, but we could probably write all sorts of agents that would do Debian-specific kinds of scanning. The first couple that I thought of: we could write a Lintian agent, so that when people upload things, if it encountered a Debian package it would say great, send it off to Lintian, get the results back, put them in the database, and make those results available through the user interface. We could also do piuparts; there's a bunch of other things. Right now I have been scanning Squeeze source DVDs; currently the CD image team produces weekly updates of Squeeze DVDs, and I've been taking those and uploading them to FOSSology and doing the scanning and making the results available, and I'll show you that in a few minutes here. But potentially there's lots of other things we could do.
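The agent idea boils down to "a function that takes a file and returns findings", with the framework collecting results per file. Here is a hypothetical toy model in Python; the naive_license_agent and the tuple-row "database" are invented for illustration, and a real agent would shell out to a tool such as clamscan or lintian instead:

```python
import os

def run_agents(root, agents):
    """Run each pluggable scanning agent over every file under `root`.

    An "agent" here is just a function taking a file path and
    returning a list of findings; results are collected as
    (file, agent, finding) rows, the kind of thing the real system
    would write to its database.
    """
    rows = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            for agent_name, agent in agents.items():
                for finding in agent(path):
                    rows.append((path, agent_name, finding))
    return rows

def naive_license_agent(path):
    """Example agent: flag files mentioning the GPL (illustrative only)."""
    try:
        with open(path, errors="replace") as f:
            text = f.read()
    except OSError:
        return []
    return ["GPL mentioned"] if "GPL" in text else []
```

Plugging in a different scanner is then just adding another entry to the `agents` dictionary, which mirrors how the modular agent architecture is meant to be used.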
One thing that might be kind of interesting, for the ftpmaster team: eventually, if we get this automated well enough, it would be kind of nice if, when things get stuck in the NEW queue, they could automatically be uploaded to FOSSology, and then the ftpmaster team would be able to go and look at the results and see if there are any license problems or weird copyright issues or any other kind of quality issues that whatever agents we have installed can determine. The other thing that would be kind of nice to be able to do, in addition to just doing it for new packages, is we should be able to audit the entire archive for particular licenses and copyrights that we're interested in. Currently we have the copyright file in Debian packages, which is kind of a pseudo-standard format where you indicate copyrights and licenses; I think there have been some people talking about making that a more standard format that would be machine-readable. Eventually we could do something like have FOSSology do a scan, compare with what's already listed in the source package, and report any differences. So the maintainer may have put stuff in the copyright file that says here's what licenses this package is under, but if FOSSology finds something hidden in one of the files way down, it would be nice to know about that. I also wanted to comment that there are a couple of other projects within Debian that are doing very FOSSology-like things, like, what's it called, UDD, the Ultimate Debian Database or something like that, and there's the question of integrating with those things. FOSSology isn't necessarily a replacement for them; it could probably eventually be made to do those things, but it's designed at kind of a higher level, it's not Debian-specific, so I think some of those other projects are nice in that, because they're Debian-specific, they can present information in a more Debian-specific way. So just about two weeks ago I set up fossology.debian.net, and this is a FOSSology install that we can use to analyze things, and previous to this
I had been running some of this analysis on one of the upstream installs, but this is one specifically that we can use for Debian, and I'd like to eventually automate it and have things get scanned there on a regular basis. In a minute here I'll take a look at that, if I can find my web browser; hopefully our analysis from earlier has completed. Okay, so this is the FOSSology user interface, and this is what it looks like when you're logged in; this is kind of the main landing page. If you want to upload files, I mentioned there's a command line user interface, but you can also go to this upload directory and upload via your web browser, or you can give it a URL that it can pull from, or you can give it a local path on the server that it can grab things from, which is pretty nice when you're doing DVD-sized ISOs, as you don't want to try and upload those via your web browser. After you've done the upload there is a job queue; let's go see if our job that we uploaded earlier is there. So here's a job that I ran earlier; I bet our job from earlier already finished. This is a Debian DVD that's still running; there it is, okay. Hold on a second, I'll go back and look. It looks like everything finished on that job, so now we can click on it, and it'll take us to the browse section, and we see we have this directory; I uploaded the source package, and here are the different files. So now within the UI you can drill down within these files and start looking for things. By default you're just looking at the directory view; up here you can turn on different features depending on what you want to be looking for. So what we'll go ahead and do is we'll click on there, and we're in there, and we'll turn on the license scan. There are currently two license scanning agents: one's called Nomos, the other is called bSAM. bSAM was the original one, and it's very thorough but it's kind of slow; it's based on some algorithms that were used for matching DNA sequences,
and I think it's an algorithm from the 80s or something, and people have figured out some faster things since then, but this is the one that we had, and it was the primary one up until the most recent release. Okay, so we clicked on that, and now what it's doing is it's showing us, okay, in this directory, these are all the different licenses that it found and the count of what it found, and then as we drill down through the directories it will start, you know, showing us smaller counts. Here we can also do things like, for example, let's do "GPL with exception": we'll click on show there, and that will give us a list of all the licenses in that category and what files they're in. We can also, let's see if I remember how to do this right: if you click on the license name, it will highlight over here in the category which files it's in. We can also click on view to view the file, and now we have the file down below us, and you'll see that it's highlighted here with the bits that it found; so there's an FSF license there, there's a GPL down there. So that's the bSAM scanning; Nomos is very similar. We'll do copyright as well: here's a copyright scan. For the copyright scan there's a disclaimer here right now; the person that wrote this agent put this in here, basically saying there are a lot of false positives right now, but it was determined to still be useful enough to be included. You know, so it looks for these sorts of strings, and some in particular that it knows might show up, that have a certain set of dates that you might be interested in. And it's very similar to the other one: you can click on it, and it will take you to the file and show you where that was. One other thing I haven't had a chance to talk about yet is this concept of a bucket browser. That's a new feature that was added in the most recent release. So we have a couple of license scanners, and they know about lots of different licenses, and what the bucket browser allows you to do is sort those
licenses into categories that you care about. So for example we could make an "OK for Debian" category and a "not OK for Debian" category, and we could put the licenses into those, and then that makes it really easy: if we upload a Debian source DVD and run the scan, we can just say show me everything that's not OK for Debian, and it will highlight and pop everything out and show us if there's anything in there. I would like to do this; as far as I know there doesn't exist a canonical list of which licenses are OK and which licenses aren't. There's something in the wiki, but it was written by a developer, not by ftpmaster; I think ftpmaster probably has their own list of requirements somewhere, but I don't know where it is. The Fedora project has done a really good job on their wiki of documenting the licenses, the ones that are acceptable and the ones that aren't, and so I think we have a set of bucket lists for Fedora, and it's real easy now to look at Fedora and be able to click on something. There's a developer in Fedora, Tom Callaway I think, that's done a really good job on that, because they were doing some auditing a while back where they were auditing for GPL v3; also I think they made the decision not to include any proprietary codecs. They have done a good job of figuring this out, and I'd like to start pushing this in Debian: trying to get the Debian legal team, ftpmaster, and everybody else who cares together, and come up with some canonical lists of these are things we think are OK and these are things we don't. And they don't have to be definitive; we can just say these are the things we're sure are OK and these are the things we're sure are not. They don't have to list everything, but it would at least be a good start. I want to go back to the license scan here and go up to a slightly higher level. So a minute ago I talked about unpacking Debian source packages; you can see we have three different files that are in the source package, but it was also smart enough to say, oh yeah,
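Conceptually, the bucket browser boils down to sorting license names found per file into user-defined categories. A toy sketch, where the bucket names and license lists are made up for illustration and are not an actual Debian policy list:

```python
def bucketize(file_licenses, buckets):
    """Sort per-file license findings into user-defined buckets.

    `file_licenses` is a list of (path, license_name) pairs, as a
    license scanner might report them; `buckets` maps a bucket name
    (e.g. "ok-for-debian") to a set of license names. Anything not
    listed lands in "unclassified" for a human to look at.
    """
    result = {name: [] for name in buckets}
    result["unclassified"] = []
    for path, license_name in file_licenses:
        for name, members in buckets.items():
            if license_name in members:
                result[name].append(path)
                break
        else:
            result["unclassified"].append(path)
    return result
```

With buckets seeded from an agreed canonical list, "show me everything that's not OK for Debian" is then just reading out one bucket.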
this is a source package, unpack it, and then you can drill down and look at that view as well. So, just to see if we can embarrass Eric here: I think all these licenses look OK; I don't see anything that pops out as being particularly bad for Debian. I was hoping we'd find a... well, actually I wasn't hoping, but I was worried we were going to find some nasty license in there. So anyway, that's a short demo of how it looks. I'd be happy to go through and show you other stuff in here. If you have other questions, go ahead, Bdale. Do we have a mic? The first thing is, I wanted to comment that while NEW processing is sort of an obvious place to use this, the place I've always been more concerned about is just sort of random uploads of packages that have already been accepted into the archive, where upstream makes some license revision. One of the ones the lawyers really want to understand (they're not afraid of it, they just want to understand it) is when a GPL version changes, whether that's 2 to 2-or-any-later, to 3-or-any-later, and that sort of thing. So it seems to me that if we get an instance of this that's running well enough, plugging it in so that it actually watches every source package upload, and in some way reports anything that looks like it's changed on that upload, would be really interesting. The second thing I wanted to comment on is that I think the easiest way to get to a bucketized list would be to start by taking a scan of everything that's in main and everything that's in non-free, and using the licenses found to seed the two lists; and my suspicion is that in the process, anything that comes up in the main scan that somebody freaks out about will be easy to notice and handle. And that actually leads me to the last thing I wanted to ask you about. I know that when we first agreed to open source this code base out of HP and seed the community, there was intense concern about what happens if we start just publishing the results of scanning major released commercial Linux
distributions and stuff like that, from a who's-going-to-be-embarrassed-at-what-they-weren't-paying-attention-to standpoint. Do you think enough time has gone by, and enough things have gotten addressed already, that major embarrassments are not likely to happen here? Or do you have any sense, from the scans you've been doing of Squeeze DVDs, of how we stand with Debian? Just a few words on that would be interesting. I mentioned that because Fedora already had good lists, it was pretty easy to scan and look and see if they were compliant with their own restrictions, and I think when we did that we didn't find anything major popping out. But one thing that's kind of interesting is that for the public repositories that we have set up to do this sort of thing, we've only been doing things that we can freely distribute, and so that means, you know, Debian and Fedora and that sort of thing, but it doesn't mean Red Hat Enterprise Linux or some of these other things. Because one thing that's kind of interesting is that, you know, within the UI here you can click on things and download them, and so in a lot of ways, I mean, it would take you forever, but you could get all the software, you know, via this web interface. And so the decision was made that, yeah, we can't really scan Red Hat Enterprise Linux; although I guess presumably you could do CentOS, and that would be kind of the same thing, and that might be kind of interesting. We haven't done a scan on CentOS, but that would be nice to do. Yeah, obviously I'm not interested in embarrassing anybody; I didn't mean to point at any particular company or anything like that, it's just that I remember there was a lot of discussion: you want a tool like this to be perceived as adding value to the community's processes, and not sort of destroying the community in the process. Yeah, not just giving lawyers ammunition to go after them. Yeah, you know, I obviously had some small role in helping to get people convinced that, you know, this is
something that, A, we should do, and B, we should open source, so I'm really tickled to see you talking about it here, and I hope folks in Debian will take advantage of this. You know, our FTP masters already do all of this work, they just do it by hand, or they've written some other little tools to help, and I think the same thing is true with things like the Lintian lab. You know, a lot of people have written a lot of code to iterate over source packages and all; if this ends up being a good infrastructure for making those things easier to maintain and use going forward, that's cool. So now you are doing these analyses on the complete Debian distribution, with, I don't know, 15,000 source packages; how do you actually proceed from all these very detailed listings to find out what the problematic cases are? That's the first question. Excuse me. And the second question is, do you actually plan to include some logic in this which, for instance, would find out which license is compatible with which other license? Because for instance in this upload I see many different licenses, so maybe you would like to find out when they are not compatible. Yeah, definitely, that would be a good idea, and I think that's been talked about; I think it's on the list of desirable things to be able to do, because obviously you have a lot of licenses that aren't GPL-compatible. What you need, I think, is a concept of at what granularity a conflict is bad, you know, when they are in the same directory. I guess you could do it at the directory level and see, like, okay, below this directory we know we have two things, and one of them is GPL and one of them is not GPL-compatible, and have it flag that. That would be cool. What was your first question again? How do you actually do data mining on all this data that you get, say on 15,000 source packages, to find the problematic cases?
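The directory-level granularity idea from that answer could be prototyped along these lines. This is only a sketch: the incompatibility pairs are placeholders supplied by the caller, since real license compatibility is far subtler than name matching:

```python
import os
from collections import defaultdict

def directory_conflicts(file_licenses, incompatible_pairs):
    """Flag directories containing two licenses known to conflict.

    `file_licenses` is a list of (path, license_name) pairs from a
    license scanner; `incompatible_pairs` is a list of (a, b) license
    name pairs considered incompatible. A directory holding files
    under both halves of a pair gets reported.
    """
    by_dir = defaultdict(set)
    for path, license_name in file_licenses:
        by_dir[os.path.dirname(path)].add(license_name)
    conflicts = []
    for directory, licenses in sorted(by_dir.items()):
        for a, b in incompatible_pairs:
            if a in licenses and b in licenses:
                conflicts.append((directory, a, b))
    return conflicts
```

A fuller version would also propagate findings up the tree, so a conflict between sibling subdirectories below a common parent could be flagged at that parent.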
I think the bucket browser is one attempt at helping you to find those sorts of things. Right now a lot of it's very verbose, and I think it would require post-processing of some of the results if you're scanning something very large and you're trying to find particular things. Here, with that link there, you can download the results, dump them to a file, and do some post-processing and that sort of thing. The UI could be extended to do that; it just hasn't been yet. It's pretty crude, just letting you browse the results that are in there, but the UI is written to be pluggable as well, just like the agents are, and so you can drop in an agent that does a particular kind of scanning, and then you can also drop in some PHP files that present the UI for your particular agent. You know, so eventually, if we did a Lintian agent, whenever the scheduler encountered Debian packages it would fire off the Lintian agent, which would scan them and put the results in the database, and then somewhere in the UI here we would have something that was smart enough about Lintian to let you browse the results. One of the tricks about automating this (and I think that Bdale's idea of automatically analyzing the uploaded source packages is a great one, probably tied in with continuous testing) is that we're going to be making this transition from manual inspection to automatic inspection, and that has a risk regarding quality. I mean, obviously we could talk about the quality of manual inspection and the consistency of that, but how can we measure the degree of trust that we can have in these tools? Yeah, I don't know the answer to that. I mean, I think first off we need to look at it as: this is not a crutch, this is a tool to help us do the stuff that we're doing by hand more easily and help make our time more efficient, but you know, we're still going to be on the hook to make sure that everything gets done properly. So yeah, I'm not sure how you avoid
falling into that trap. Maybe a year or two ago they were trying to get a machine-readable copyright project going? Yeah, that was the thing I mentioned earlier, and I'm not sure what the status of that is. And I've heard there's something going on upstream as well; I don't know if it's machine-readable copyright, but it's more like maybe a standard boilerplate for what you put at the top of C files that makes it a little more machine-parsable. I was just going to comment, in response to Tom's query, on one of the things that I find most interesting about this. I sit on HP's internal open source review board, and one of the ways this gets used a lot there is we look at the deltas between previous and current runs. So there's still this notion that at some point you have to decide that you've got a clean archive, and you can use tools like this to help, but the reality is, I think if you asked our FTP masters, they would feel pretty good about their current main archive. So the question then becomes: there's always the possibility you discover something that got missed in the process of scanning a lot of it, but the more interesting thing is, what changed on this upload?
Did the list of licenses found in the tree change, and if so, is that worth a quick look by a human? Right now, today, we do a lot of manual inspection of what we do, and we expect package maintainers to not be idiots about things they upload, and occasionally people go and try to rebuild the whole archive, or do various processing things, and often these are done by people who are not core Debian developers but people outside the project studying the project, and various people at different times look at it. It's not like we have any systematic analysis of license changes or copyright reassertions or bug fixes in the text of a copyright from an upstream maintainer. We know these things happen, we see them happening; those of us who maintain packages have had to deal with this stuff over and over and over again over the years. And I would think anything that helps us observe deltas, in the same way that Lintian, for example, helps us understand if some policy has changed and we haven't been paying attention and our package is no longer compliant, could help a lot with that sort of routine analysis. And this actually reminds me of something I wanted to mention, and that is, when the change to the new architecture was made: when the unpacker is going through things, every time it encounters a file it puts it in the file repository, and the way it does that is it uses a hashed directory scheme, where the first bits of the hash are split into directories, and the file name is a combination of the SHA1 and the MD5 and the size, or something, to make sure that there's absolutely no possibility of collisions. Any time it encounters a file, it puts it in the repository with that name, but then all the agent analysis is also driven off of that ID, and so if an agent has already run on that ID, if it encounters it multiple times in the scan, that's just fine. So what that means is that we can do something like upload this week's Squeeze source DVD 1, and it does all the analysis.
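The repository naming scheme just described (hash-prefixed directories, a file name built from SHA1, MD5, and size) might look roughly like this sketch; the exact directory split (2+2 hex characters here) is a guess for illustration, not FOSSology's actual layout:

```python
import hashlib

def repo_path(data):
    """Compute a content-addressed repository location for a file.

    Identical contents always map to the same path, so a file seen
    twice is stored and analyzed only once; the combination of two
    different hashes plus the length makes accidental collisions
    effectively impossible.
    """
    sha1 = hashlib.sha1(data).hexdigest()
    md5 = hashlib.md5(data).hexdigest()
    name = "%s.%s.%d" % (sha1, md5, len(data))
    # hashed directory scheme: leading hash bits pick the directories
    return "%s/%s/%s" % (sha1[:2], sha1[2:4], name)
```

Because agent results are keyed by this same ID, a rescan of next week's DVD only has to analyze files whose contents actually changed, which is the deduplication described next.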
When we upload next week's, as it's picking everything apart, it only has to rerun the analysis on the files that have actually changed, which is pretty cool. Your comment about wanting to know about the deltas reminded me of that: FOSSology inherently already knows when files change or when there are new files, so it should be possible in the UI to say, hey, look, these files changed, and here are the new results, as opposed to all the ones you've already spent a lot of time going over.

Right. There's been some work funded by the EU looking at automatic assessment of quality in software, and I was wondering if people have been thinking about plugging quality filters and such into this. The other thing that occurs to me is that we had a talk about automatic textual analysis of mailing lists a couple of days ago from Hannah, which was really interesting, and we've got a lot of data in the bug tracking system about how good or bad the code is. Yeah, I was thinking of a sort of Bayesian filtering over the uploads that introduced what was later found to be an RC bug, to work out the sorts of things that tend to introduce RC bugs. Some of the time that's going to be just pointing at difficult code, but some of the time it's going to be saying this particular structure in C is really dodgy. Yeah, I was in another session at the time, but somebody else already mentioned Hannah's talk to me; that could potentially be applied in FOSSology as well. What was the last bit you just said? It reminded me of something. Matching RC bugs back to where they were introduced, and therefore using the Bayesian analysis.
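The incremental rerun behaviour described here amounts to a cache keyed by file ID. A minimal sketch, assuming the content-derived IDs discussed above (`run_agent` and the plain-dict cache are illustrative stand-ins for FOSSology's scheduler and database, not its actual API):

```python
def run_agent(agent, file_ids, cache):
    """Run a per-file analysis agent over one upload, reusing cached
    results keyed by file ID. Only IDs never seen before are analysed,
    so a second upload sharing most files with the first is cheap."""
    for fid in file_ids:
        if fid not in cache:          # new or changed content only
            cache[fid] = agent(fid)
    return {fid: cache[fid] for fid in file_ids}

def upload_delta(previous_ids, current_ids):
    """Which file IDs appeared or vanished between two uploads: the
    'what changed on this upload?' question from the discussion."""
    return current_ids - previous_ids, previous_ids - current_ids
```

A UI view of "just the deltas" falls out of `upload_delta`: show results only for the first returned set, instead of everything the reviewer has already gone over.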
Yeah, anything you can think of, any sort of scan you could run to do such things, should be easy to wrap an agent around. In some cases that's looking at changes over time as well, so some of it would have to happen at a deeper level. Any other questions? I should actually defer most of these questions to Daniel, because he's the expert on this.

Yeah, so something I would say is that the people who run the license identification might not be the people who are capable of understanding the licensing constraints. For example, when you showed ucarp, there's actually some GPL-with-exceptions code in there. Those are actually install-sh and, I think, parts of the autoconf/automake system. And gettext. Essentially, I don't even think the package itself is GPL; it's just that it has some embedded components, copied in by the build system, that do not impose conditions on the code. So I think this is actually very tricky, and it might sound wrong, but I think it really has to be done by somebody who understands licensing. Because otherwise you can have this explosion of people worried about things they should not worry about, or the other way around: people who think things are okay when they are actually wrong. Yeah, and the original tool I mentioned that HP wrote: as they encountered things that were giving those sorts of false positives, they just coded weird exceptions into the tool to tell it, oh, just ignore config.sub and config.guess because we know those are probably okay. But that's what I mean. They're not errors, they're not false positives; they're there. It's just that the conditions allow you to use them in such a way. So they shouldn't be ignored by the tool; they should be reported. But the problem is that the interconnections between the different files make them difficult from the licensing point of view.
And I think that's actually why we shouldn't use the term false positive. It's a true positive; it's just that, in the way the system is built, it is okay. Yeah, but at the same time, usability-wise, if you want to make this easy for people, we need some sort of system that gives you categories of things: yeah, I know about that, that's fine, I don't want to see it, I want to see the more important things. That's much better than embedding into the tool the statement "I don't care to see this one." What you should be able to say is that some people might care to see this one. And I totally agree with the idea that the lawyers are the ones who actually want to say, I want to be notified when something strange happens, and my time is dedicated to that. In many cases, like config.h.in, those are files you will always find alongside GPL version 2, but they will really not affect your code.

As a follow-up to that question: was the tool meant for lawyers to look at? I'm just trying to understand the process they would go through to approve something. Were they actually looking at FOSSology themselves, or were they relying on a developer going through it with FOSSology and producing a report saying it was okay? I can give a quick summary, or Bdale can fill in more. The way it worked was that HP had an internal process they called the open source review board, which was a team of lawyers but also technical people, for any time anybody in the company wanted to do something. Yeah, so very early on the idea was that we needed a review board that represented all interests. In addition to legal folks and technical people who understood the software and HP's patent portfolio and all of that, there were also people representing open source community interests and people representing the business interests of the different HP businesses.
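One way to report everything while letting different audiences filter, as suggested above, is to tag each finding with a category instead of suppressing it in the scanner. A minimal sketch (the category names and the file list are illustrative, drawn from the examples in the discussion, not FOSSology's actual taxonomy):

```python
# Build-system helper files mentioned in the discussion that commonly
# carry GPL notices without imposing conditions on the package itself.
BUILD_HELPERS = {"install-sh", "config.sub", "config.guess", "config.h.in"}

def categorize_finding(filename, licenses):
    """Tag a license finding rather than dropping it: the tool still
    reports everything, and the viewer decides which categories to show."""
    if filename in BUILD_HELPERS:
        return "build-system-boilerplate"   # true positive, usually benign
    if any("GPL" in lic for lic in licenses):
        return "copyleft"                   # worth a human look
    return "other"
```

The key design choice is that suppression lives in the viewer, not the scanner: a lawyer who wants to be notified of everything strange can switch all categories on, while a maintainer can hide the boilerplate they already understand.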
So the idea was, let's take a holistic view of each proposed interaction between open source software and some piece of HP proprietary software content, and ensure that the decisions we're making, about what we contribute into the community, what we perhaps entangle ourselves in with licenses like the GPL, and where we are willing to take things in and use them in products, are all conscious decisions. Over time a set of heuristics evolved about what makes a good decision. And one of the things we realized is that we were spending a lot of time staring at source code in these weekly review meetings. That was fine when we had one or two proposals a week, but when we had 20 or 30 proposals a week, it fell apart. So we started creating tools like this one. Today, lawyers don't tend to go look at these results themselves on a daily basis. There are people on a team that prepares proposals for review in our weekly meetings who use these tools to generate what amounts to, well, I don't know what to call it, the folks who help lawyers do stuff; anyway, it's a paralegal kind of activity, in that people who are really experts in this kind of thing use the tool to help prepare an analysis. And still there are times when we look at something and go, this is a little confusing; let's go look at the actual source code, let's go talk to the people who are actually doing the work and make sure. But in a typical week, when we're reviewing somewhere between one and two dozen proposals, all but one or two of them end up with us looking at it and going: yep, that's easy; yep, that's easy; okay, that's cool, we can do that. Gee, this one sounds like a great idea, but our IP licensing folks need to review it and make sure they're comfortable with our giving away that piece of, quote, intellectual property, unquote. That's how the process works, and no, average lawyers don't read this.
We do have some non-average lawyers, though. They're the ones I really like working with. I think Moritz had a question, and that's probably about all we'll have time for. Well, first, to the initial question about whether fossology.debian.net is wide open to Debian developers: can any DD log into it? Not yet; we just set it up a couple of weeks ago, and so far only I have an account. But if people are interested, I think the first thing we need to do is set down guidelines and structure for where people can do things; probably we'll create a users directory, and then each user can have an area where they upload things. I'm hoping to have a public part of the archive where we keep all the snapshots of the distro, and I guess we can talk about it, but I think it's definitely possible. There's no way right now to tie it in to Debian's authentication system so we can just give everybody an account, so for the moment it would be a special-case kind of thing: yeah, I'll create you an account. So if you want one, just let me know. And I think we're out of time. Thank you very much.