So we are about to start the last talk of the session. Please take your seats. The talk is about the FOSSology project, given by Michael Jaeger and Maximilian Huber. Let's welcome the speakers.

Thank you. Actually, the talk at 5:40 already said thank you to the people who were still left, so I'm wondering what to say to the people who stay until 6:40. Thank you very much. Anyway, we are Maximilian and me, Michael. We both work on the open source project FOSSology, and we would like to introduce it today very briefly, in 15 minutes. We are very happy to have the opportunity.

Let me start. FOSSology is about this notice file thing. I guess you have already seen examples of notice files in open source projects. One of the more popular appearances of a notice file is the legal part on smartphones: if you are interested, you go to the About section, then you see some legal section, and then you see a lot of licenses. These are actually the licenses of the open source software involved. The basic problem is that if you have larger software today, you obviously use a lot of open source software, and if you would like to print all the different licenses, you need to collect them, and you would probably like to have some software for that. This is what FOSSology is about.

So it's basically about licenses. If you are into open source software licensing, there are two areas where you would like to deal with scanners. One kind of scanner finds open source software that has been copied, maybe into proprietary products. The other kind just finds the license statements in open source software. FOSSology is about finding license statements in open source software. And why is that so difficult? Because open source software today is made of other open source software. So if you have an open source package, it's natural that you find pieces of other packages in that package, and because different packages have different licenses,
you will also find different licensing conditions for different parts of your open source software. That's why you would like to have software scanning it. So that's one problem.

I have a screenshot here; you probably can't even read it. It's about Apache Thrift. You would think that this is an Apache Foundation project, and usually they are well curated regarding licensing. But this one has 20 different licensing statements for different parts inside the software, because Apache Thrift is about remote procedure invocation across different programming languages, and that involves different technology stacks. So you have different licenses. That's one problem.

Maybe you would say: okay, so what? You collect all the different licenses, write some regular expression scanning thing, and then you get all the licenses of the open source software involved. The next problem is that licensing statements are usually not so easy to understand. So I have a couple of examples that should explain what the real problem with finding out the proper licensing is.

Here we have a licensing statement from out in the wild, from the TrueCrypt software. It's actually very good software, I've used it myself, no offense against TrueCrypt. But the licensing here is very confusing, because they are saying that the software is actually based on some other software, but they license their software under the TrueCrypt license. And then you ask: okay, but what was the license of the other software, by the way? And you start researching that.
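As an illustration of the "regular expression scanning thing" mentioned above, here is a minimal sketch in Python. It is not taken from FOSSology or any real scanner: the three patterns, the `scan_package` helper, and the file contents are all made up for the example. It only shows how a single package can yield several licenses for different parts.

```python
import re
from collections import defaultdict

# Hypothetical patterns for illustration; real scanners ship far larger rule sets.
PATTERNS = {
    "Apache-2.0": re.compile(r"Apache License,?\s+Version 2\.0", re.IGNORECASE),
    "MIT": re.compile(r"Permission is hereby granted, free of charge", re.IGNORECASE),
    "GPL": re.compile(r"GNU General Public License", re.IGNORECASE),
}


def scan_package(files):
    """Map each detected license name to the list of files that mention it."""
    findings = defaultdict(list)
    for path, text in files.items():
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                findings[name].append(path)
    return dict(findings)


# Made-up file contents standing in for an unpacked source package.
package = {
    "lib/core.c": "Licensed under the Apache License, Version 2.0 (the License)",
    "contrib/json.c": "Permission is hereby granted, free of charge, to any person",
    "build/helper.sh": "under the terms of the GNU General Public License",
    "README": "A remote procedure call framework.",
}

result = scan_package(package)
for license_name, paths in sorted(result.items()):
    print(license_name, paths)
```

A scanner like this only reports wordings it already has a pattern for. A statement written in free or unusual wording passes through unnoticed.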
Another example is that some people just write software for fun, and then they write licensing statements which are also fun. So this guy, and that's also a real example from out in the wild, it's from the zlib library, which is used in many, many places for doing zlib compression, this guy says: if you're a fan of Shigewara, don't use my software. You can do that. But the point is that this statement is really hard for scanners to find. So you need to look at the statement. If you have real commercial software that you want to sell in the hundreds of thousands, you ask your lawyer. If you are an open source project, you say: oh, that's fun, let's make more fun out of it, and you start another funny license, like a chocolate license or a beerware license or whatever. But if you want to cope with it somehow, you come across this one.

There are two other examples which make it more complicated. Here you have a common license text, called the BSD license if you are into licensing, and they have added another paragraph to the license text saying: if you don't like this licensing, you can also go for GPL licensing. This is because you want to be compatible with GPL licensing, maybe, and so on. But the fact is that this is a very custom statement about licensing, and it's really tough to write scanners that automatically identify and interpret these.

And then there are even nastier things. This is a common license text here; it's also an example from open source software, which is on GitHub. I didn't put the source here because I don't want to blame the vendor. This is about MIT licensing. It basically says you can do whatever you want with the software, just tell people who wrote it. And then they have added a special sentence to a standardized license statement saying: yes, but you can only use the software on our hardware, right?
Or you can only use that software if you communicate with our hardware. That is really a huge limitation, and you don't get it if you just scan for licensing statements, because a stupid scanner would probably just find this MIT licensing. And that's where FOSSology kicks in. FOSSology is not only scanning, because frankly, there are a lot of open source scanners today. There are fast scanners, there are scanners in Python, there are scanners in Java. You will find a lot of open source projects doing scans of open source software. But FOSSology is about scanning and review, and that's a major difference from other projects. It's not only about scanning: for these ambiguous cases you have a UI where you can review what licensing you find in the software. So now I'm handing over to Maximilian.

So, this is working here. I will tell you some core facts about FOSSology. It's open source, that's great. It's in fact even a Linux Foundation collaboration project, so the Linux Foundation supports the project. And it's GPLv2 licensed, so it will stay open source. It's a Linux application which is mostly written in C and PHP; it has around 100,000 lines of C and 100,000 lines of PHP. The graphical web front end runs on Apache, and there is also a command line interface, which can be used if one wants to have automated scans or wants to get the information into some other systems.

The back end of FOSSology is also very interesting.
It contains a scheduler which runs agents in parallel. So if you have a huge chunk of code, you can upload it and say: I want these three scanners to be run on this upload, and then you can come back later once they are done. We have PostgreSQL as the database, and we have scripting for Docker and Vagrant, which makes it easier if one wants to just try it out or deploy it. So those are just some facts.

Now I will try to actually show the application. Yes, it's working. Is it readable, or is it too small? Actually one would need a larger resolution for that, but I will try to work with it. I have just uploaded two random packages, time and zlib. Now I can, for example, open the time library and, as shown previously, see at one glance which licenses might be contained in the source code. With this information one can now try to find interesting statements. For example, here there is some MIT license, and that is found correctly.

Let me just switch to the more interesting zlib example. Here we see that we have many Zlib-possibility findings. That's one of the features of FOSSology: it already knows zlib, or the special parts of zlib. The special thing in zlib is that it says: if you want to know what the license situation is, look in the zlib.h file. Just to demonstrate what FOSSology can do, I can now take this sentence and say: hey, this sentence is used often here, and I know what I'm doing. So every time I see that sentence in a file, I will say the Zlib-possibility was right, but I know it even better: I can say it's real zlib. Then I can say: do that, and look at each file. And... why is there Apache-2.0? I think I have clicked something wrong.

You can correct it with the same method, right?

Yeah.
Yeah, that's interesting. I think I can just remove licenses here for the whole package. To demonstrate another feature: I can say, I know that in the whole project there is no Apache-2.0, and then it's gone. With this workflow one would work through the whole package until all things are green on the right side, and after that one is sure that all possible license statements were checked and cleared. There is also a UI to handle copyright statements and to clean them up, since they often contain some rubbish which can be removed, and there are also other features. So that's FOSSology.

Then another feature of FOSSology. Now one has the clearing, one has all the information in the system, and then the question is how to tell others about the gained information. For that, FOSSology supports SPDX, which is a format created, I think, by a group at the Linux Foundation, is that right? It's also supported by many other tools. So you can export the license information, and you can even process it with your own tools, which might understand the RDF/XML standard.

Just to be fast, the next big step is: company A has generated such an SPDX file and hands it to you, and you want to know whether it's right or wrong. For that you can also use FOSSology. You would just take the provided SPDX file and upload it into the system. Then you would land in exactly the same UI as here, but now you would see only the provided license information, and you can use the same UI to verify the SPDX file provided to you. That was just another feature of FOSSology.
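The idea of checking a supplied SPDX file against your own review can be sketched as follows. This is only an illustration and not how FOSSology imports SPDX: it assumes the simpler SPDX tag-value serialization rather than RDF/XML, reads only the `FileName` and `LicenseConcluded` tags, and the helper names and sample data are invented for the example.

```python
def parse_spdx_tag_value(text):
    """Return {file name: concluded license} from SPDX tag-value text."""
    licenses = {}
    current_file = None
    for line in text.splitlines():
        tag, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Tag: value" line
        tag, value = tag.strip(), value.strip()
        if tag == "FileName":
            current_file = value
        elif tag == "LicenseConcluded" and current_file is not None:
            licenses[current_file] = value
    return licenses


def compare(provided, reviewed):
    """Return files whose provided license differs from our own review."""
    return {
        path: (provided.get(path), reviewed.get(path))
        for path in sorted(set(provided) | set(reviewed))
        if provided.get(path) != reviewed.get(path)
    }


# Made-up document from "company A" and a made-up result of our own review.
supplier_spdx = """\
FileName: ./zlib.h
LicenseConcluded: Zlib
FileName: ./contrib/foo.c
LicenseConcluded: MIT
"""
own_review = {"./zlib.h": "Zlib", "./contrib/foo.c": "GPL-2.0-only"}

mismatches = compare(parse_spdx_tag_value(supplier_spdx), own_review)
print(mismatches)
```

Only the files where the two sources disagree are reported, which is the part a reviewer would then look at by hand.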
It's really huge, we have many, many features, and it's fun to explore them all. Thanks.

Yeah, thanks. You can try it if you want: with this simple line of Docker you get a pre-built image of the latest master, or you can just visit one of the URLs to get further information. And now I think we might have time, if there are questions.

Yes?

So the question was: what if I have done an analysis and looked at one file or one upload, and then I'm uploading a newer version of the same package? FOSSology can do a couple of things to reuse information from the existing upload. First of all, it stores every file only a single time on the file system, by hashes. So if the same file is part of a different upload, it takes the hash and takes over the information which I originally assigned to the file with that hash. So if that file has already been analyzed on the server and it appears in the same form in the newer version, that will be taken over.

But also the custom formulations, and that was probably also the one important thing in this presentation: license statements are normally written in just prosaic language, right? As such, if you have an open source project, you find these prosaic statements, and you would like to tell the system: if that prosaic statement is actually found in a file, then it's GPL. That is what you can do with this interface of FOSSology. And you can reapply these rules, that is, identified text phrases hinting at a license, to a new upload, so you do not need to do the work again. That's also why FOSSology is a server-based application, with multiple users, even multiple groups, and so on: one big server just collects your analyses, and if you upload newer versions, you really only need to look into the new licensing statements that are found in the newer versions of the open source packages.

I'm okay to have two more questions.
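The reuse described in this answer, taking over decisions by file hash and reapplying phrase rules to a new upload, can be sketched roughly as follows. This is a toy model, not FOSSology's code: the `ClearingStore` class and its methods are hypothetical, and SHA-256 is an assumption, since the talk only says "hashes".

```python
import hashlib


class ClearingStore:
    """Toy store: decisions keyed by content hash, plus reusable phrase rules."""

    def __init__(self):
        self.decisions = {}     # SHA-256 hex digest -> license decided by a reviewer
        self.phrase_rules = {}  # text phrase -> license it hints at

    @staticmethod
    def digest(content):
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def decide(self, content, license_name):
        """Record a reviewer's decision for this exact file content."""
        self.decisions[self.digest(content)] = license_name

    def add_phrase_rule(self, phrase, license_name):
        """Remember that a text phrase hints at a license."""
        self.phrase_rules[phrase] = license_name

    def clear_upload(self, files):
        """Return ({path: license}, [paths still needing review]) for an upload."""
        cleared, todo = {}, []
        for path, content in files.items():
            key = self.digest(content)
            if key in self.decisions:
                # Same bytes as an already reviewed file: take the decision over.
                cleared[path] = self.decisions[key]
                continue
            for phrase, license_name in self.phrase_rules.items():
                if phrase in content:
                    cleared[path] = license_name
                    break
            else:
                todo.append(path)
        return cleared, todo


store = ClearingStore()
notice = "/* For conditions of distribution and use, see copyright notice in zlib.h */"
v1 = {"inflate.c": notice + " int inflate(void);"}
store.decide(v1["inflate.c"], "Zlib")
store.add_phrase_rule("see copyright notice in zlib.h", "Zlib")

v2 = {
    "inflate.c": v1["inflate.c"],              # unchanged file, matched by hash
    "deflate.c": notice + " int deflate(void);",  # new file, matched by phrase rule
    "contrib/new.c": "/* custom terms */",     # nothing known, needs review
}
cleared, todo = store.clear_upload(v2)
print(cleared, todo)
```

In this model only `contrib/new.c` is left for a human, which mirrors the point that a newer version only requires looking at the new licensing statements.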
But maybe just one quick, interesting one.

True. Yeah, frankly, you know, npm is not really the problem domain, because many packages have clear licensing at the beginning. It's really the classic Linux or UNIX packages that you find in embedded systems programming. Also, for example, if you are a fan of the Spring Framework: they are very well licensed, very cleanly licensed. So for the clean licensing thing FOSSology is no fun, right? You're just uploading and you always see the same license, and then you say: okay, what do I need to do here? It's really for the classic C library that has grown over many years, where you find ten different licensing statements and you would like to get a clean impression of the licensing situation. So npm, I think, is also obvious: half a page of JavaScript code, a one-liner of licensing at the beginning. So it's not in scope, in my opinion.

No, we don't. But with FOSSology, if you look for licensing statements, you can do many things. You can search for keywords, so you will find a lot of licensing statements, but you don't know which license it is, and if you review these you need to determine which license it is. So FOSSology can do both. You have keywords that hint at license-relevant text statements. You have full license texts, like 400 full license texts, and you can do text matching, and it shows you a 100% match, so you know that the license is actually in the code base of the open source package. And it also does some characteristic sentence matching in the middle, which does not really give you the certainty that a certain text is actually there, but it hints very precisely that this licensing is actually found. So it has keywords, regular expressions covering a couple of words, and full license text matching.

And I think we have to close. We will stay here in the front, since this is the last talk, and we'll answer...

To close the room.
We won't stay in the room, but we will answer questions.

Is there anywhere people can find you, either today or tomorrow?

If you use FOSSology, and some of you seem to have used it, as I see, you can have stickers. We have stickers now. Please talk to me if you want to have a sticker.
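The three levels of matching described in the last answer (keywords, regular expressions covering a couple of words, and full license text matching) can be sketched like this. It is a simplified illustration, not FOSSology's actual agents: the patterns, the `Toy-License` reference text, and the `classify` function are made up, and full-text matching is reduced to exact containment after whitespace normalization.

```python
import re

# Tier 1: keywords that only hint that a text is license relevant.
KEYWORDS = re.compile(r"\b(licen[sc]e|copyright|permission)\b", re.IGNORECASE)

# Tier 2: characteristic phrases that point at a specific license.
PHRASES = {
    "GPL": re.compile(r"GNU\s+General\s+Public\s+License", re.IGNORECASE),
}

# Tier 3: full reference texts (a made-up one-sentence license here).
FULL_TEXTS = {
    "Toy-License": "You may use this software for any purpose "
                   "provided this notice is kept.",
}


def normalize(text):
    """Lowercase and collapse whitespace so line breaks do not matter."""
    return " ".join(text.lower().split())


def classify(text):
    """Return (tier, license) for the strongest kind of evidence found."""
    normalized = normalize(text)
    for name, reference in FULL_TEXTS.items():
        if normalize(reference) in normalized:
            return ("full-text", name)      # the whole license text is present
    for name, pattern in PHRASES.items():
        if pattern.search(text):
            return ("phrase", name)         # strong hint at a specific license
    if KEYWORDS.search(text):
        return ("keyword", None)            # relevant, but license unknown
    return ("none", None)


samples = [
    "/* You may use this software\n   for any purpose provided this notice is kept. */",
    "under the terms of the GNU General Public License",
    "See the licence file for details",
    "int main(void) { return 0; }",
]
for sample in samples:
    print(classify(sample))
```

The ordering reflects the certainty discussed in the talk: a full-text match says the license text is really there, a phrase match is a precise hint, and a keyword hit only tells a reviewer where to look.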