Good morning, everyone. My name is Anupam Ghosh. I work at Siemens Technology in India as a team lead, and I'm a maintainer and developer of FOSSology. Today I'm going to talk about FOSSology, an open source tool for open source compliance. Here is the overview: I'll give a brief introduction to FOSSology, explain why we need such a tool, walk through FOSSology's building blocks, and then conclude. I think all of us are familiar with this screen, which shows you a legal notice. What is it, actually? Whenever you install software, you have to accept the legal notices before you can proceed. All distributed software comes with a certain set of licenses, copyright statements, and disclaimers, and those are an integral part of the software. So we need to find the license texts, and we need to find references to licenses. A reference may not be the exact license text; it may be a written statement explaining the licensing, or other license-relevant text. Why do we need to find all this? That's the important part. Today we use a lot of open source software, and the problem is license proliferation. What does that mean? We use licensed software to build other software, and the components we reuse are often themselves built from software under other licenses. So we need to take care of license compliance here: we need to find out which licenses are used inside the software we are using or reusing. Here is an example: Apache Thrift. Apache Thrift is licensed under the Apache 2.0 license, so the licensing looks homogeneous, but if you scan Apache Thrift with FOSSology, it reports some 25 licenses. That means there are about 25 other licenses inside Apache Thrift. So what is FOSSology?
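To make the proliferation point concrete: a scanner reports licenses per file, and the package-level picture is the union of those findings. The sketch below is a minimal illustration under stated assumptions; the input format (a dict from file path to the license identifiers found in that file) and the file paths are hypothetical, not FOSSology's data model.

```python
from collections import Counter


def package_license_summary(findings):
    """Collapse per-file findings into a package-level license count.

    findings: hypothetical mapping of file path -> list of license IDs
    reported for that file. Returns a Counter: license ID -> number of
    files in which it appears.
    """
    summary = Counter()
    for licenses in findings.values():
        # set() so a license matched twice in one file counts once
        summary.update(set(licenses))
    return summary


# Hypothetical per-file scanner output for one package whose
# declared license is Apache-2.0.
findings = {
    "LICENSE": ["Apache-2.0"],
    "src/core.cpp": ["Apache-2.0", "Apache-2.0"],
    "vendor/parser.c": ["BSD-3-Clause"],
    "vendor/ui.js": ["MIT", "GPL-2.0-only"],
}

summary = package_license_summary(findings)
print(len(summary), "distinct licenses:", sorted(summary))
# prints: 4 distinct licenses: ['Apache-2.0', 'BSD-3-Clause', 'GPL-2.0-only', 'MIT']
```

Even this toy package declares one license but carries four; vendored code is usually where the extra licenses come from.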
FOSSology is basically a web server for license and copyright compliance of software components. FOSSology was started in 2008, it is licensed under the GPL, and it is a Linux Foundation collaborative project. It provides a web UI and a command-line interface, with scanning agents for licenses and copyright-relevant statements, and the web UI is multi-user and multi-tenant. On the right-hand side you can see the different layers FOSSology uses: PostgreSQL as the database, C/C++ and PHP for the business logic, and PHP and jQuery for the UI. Now, this is the workflow, how FOSSology works. The workflow starts when you upload a package into FOSSology. What do I mean by a package? Packages are software libraries, the open source libraries you use. When you upload one to FOSSology, you can start the agents, and when you start them you can choose the license scanners, the copyright scanner, and so on. Once the scan is complete, the package is ready for clearing. That means a clearing person can now review the content of the package, and once the clearing process is done, they can generate a report. Various report formats are supported, like Word and SPDX; I'll go through them one by one. Here is one more slide. SW360 is a component management portal, and from SW360 you can also upload software packages directly to FOSSology. This is the same flow we saw on the last slide: you upload your OSS package to FOSSology, it goes through a review process, which we call the clearing process, and then you generate a report based on what was reviewed during clearing. I added a couple of slides yesterday, so they may not be in the published slides, but I'll upload them after the presentation. FOSSology has two main license scanning agents, called Monk and Nomos.
Nomos is mainly keyword-based and uses regular expressions, while Monk does full-text matching. Going from Monk to Nomos gives you flexibility, while going from Nomos to Monk increases precision. This is a snapshot taken from the FOSSology UI. Here, zlib has been uploaded to FOSSology; it scans the package and finds all the relevant licensing information. If I have time at the end of the session, I'll give you a demo so you can see how the tool works. There is one more scanner in FOSSology, called bulk scan. What does it do? I've talked about two license scanners, Nomos and Monk. There are so many open source licenses that no scanner will hit the right license every time. There may be a new license; you may have found a license the scanners don't know. How do you handle that situation? FOSSology gives you the bulk option: you take the relevant text from a source file, choose the relevant license from the list, and schedule a bulk scan that applies that decision across the whole package. With bulk scan you can add or remove licenses. So what are the new features in FOSSology? First, SPDX import. Sometimes a package has already been reviewed elsewhere, but you want the report for that library in FOSSology. If someone gives you an SPDX file, FOSSology lets you upload the package and import that file for review. In this slide you can see the first step, where you choose the folder the package will be uploaded to and then the package itself. For example, say you got an SPDX file for zlib. You select the zlib package, and the SPDX file you received will probably be in RDF format.
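The trade-off between the two scanners can be illustrated with a toy contrast: a keyword/regex check (the Nomos idea) fires on a distinctive phrase anywhere, while a full-text comparison (the Monk idea) only fires when the text closely matches a whole reference license. This is a minimal sketch, not either agent's actual implementation; the rule table, reference text, `normalize` helper, and the 0.9 threshold are all assumptions for illustration.

```python
import difflib
import re


def normalize(text):
    """Lower-case and keep alphanumeric words only (drops comment markers, punctuation)."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Hypothetical keyword rules in the spirit of Nomos: one regex per license,
# written against normalized, space-joined text.
KEYWORD_RULES = {
    "MIT": re.compile(r"permission is hereby granted free of charge"),
    "Zlib": re.compile(r"this software is provided as is without any express or implied warranty"),
}

# Hypothetical reference text in the spirit of Monk's full-text matching.
REFERENCE_TEXTS = {
    "Zlib": "This software is provided 'as-is', without any express or implied "
            "warranty. In no event will the authors be held liable for any damages "
            "arising from the use of this software.",
}


def keyword_scan(text):
    """Flexible: any file containing a rule's phrase is reported."""
    joined = " ".join(normalize(text))
    return sorted(lic for lic, rx in KEYWORD_RULES.items() if rx.search(joined))


def fulltext_scan(text, references, threshold=0.9):
    """Precise: report a license only if the whole text is close to the reference."""
    words = normalize(text)
    hits = []
    for lic, ref in references.items():
        ratio = difflib.SequenceMatcher(None, words, normalize(ref)).ratio()
        if ratio >= threshold:
            hits.append(lic)
    return sorted(hits)


header = """/* THIS software is provided 'as-is', without any express or implied
 * warranty. In no event will the authors be held liable for any damages
 * arising from the use of this software. */"""
print(keyword_scan(header), fulltext_scan(header, REFERENCE_TEXTS))
# prints: ['Zlib'] ['Zlib']
```

A whole-file ratio is a simplification: a real full-text matcher has to locate a license inside a larger file, which is where most of the engineering effort goes.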
You upload it in the report import section and then choose the import options. The SPDX file may contain a license that FOSSology doesn't know yet, so you can choose to import it as a candidate license. Then you import the SPDX report against that package. Once the report is imported, you can analyze the files to see what was done and what decisions were made on that package. After that analysis, you can generate a new report for the same package, so the two can be compared. In other words, the SPDX import lets you reuse earlier clearing work. Here is another feature: after the clearing process is complete, you can also generate a Word document to review it. Now, as you manage your licenses, it's very important to manage their obligations and your organization's policies on those obligations. FOSSology provides a good interface for managing obligations and organizational policies against licenses. You just define the obligations, import them into FOSSology with a CSV import, and when you generate a report, the obligations of the licenses involved are included in it automatically. Next, the FOSSology REST API. Automation has been a buzzword for a long time, and that's why we came up with a REST API. It makes it easier for users to integrate FOSSology with the tools they use, whether for uploading packages or for anything else. We have supported the REST API since version 3.4.0. The endpoints include uploads, folders, search, users, jobs, report, and tokens. You generate a token in the FOSSology web UI, then you can upload a package, trigger a scan on it, generate a report, and download the report, all over the REST API. Here I've given the link that explains how the REST API calls work with FOSSology.
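The obligation handling described above boils down to a lookup: obligations are defined once per license, and a report pulls in the obligations of every concluded license. The sketch below assumes a simple two-column CSV (`topic`, `licenses` with `;`-separated IDs); those column names and the file layout are hypothetical and not FOSSology's actual obligation CSV format.

```python
import csv
import io
from collections import defaultdict


def load_obligations(csv_text):
    """Map each license ID to its obligation topics.

    Assumes hypothetical columns 'topic' and 'licenses' (';'-separated IDs).
    """
    obligations = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for lic in row["licenses"].split(";"):
            obligations[lic.strip()].append(row["topic"])
    return obligations


def obligations_for_report(concluded_licenses, obligations):
    """Collect obligations for the concluded licenses, without duplicates, in order."""
    result = []
    for lic in concluded_licenses:
        for topic in obligations.get(lic, []):
            if topic not in result:
                result.append(topic)
    return result


CSV_TEXT = (
    "topic,licenses\n"
    "Provide license text,MIT;BSD-3-Clause\n"
    "Provide source code,GPL-2.0-only\n"
)

obligations = load_obligations(CSV_TEXT)
print(obligations_for_report(["MIT", "BSD-3-Clause", "Apache-2.0"], obligations))
# prints: ['Provide license text']
```

Keeping obligations keyed by license, rather than per package, is what lets a report include them automatically once clearing decisions are made.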
I'll also show you that at the end, if time permits. So far we've seen a couple of agents: the Nomos scanner, the Monk scanner, and bulk scanning. As part of last year's GSoC, that is, Google Summer of Code, we developed one more agent, called Atarashi. Our student Aman helped us develop it. Atarashi is not a rule-based tool; it uses statistical and algebraic methods to find the license information in a file. We use a couple of distance and similarity algorithms to identify the license in a file. How does it work exactly? First we preprocess the input files, that is, the files from the library; then we extract the comments from each file and normalize them; then we apply the different distance algorithms, and based on the similarity scores and the ranking we decide what kind of license a file may contain. This is the internal flow of Atarashi: once you have the normalized text, you perform a match, get a similarity array, and rank the findings. That's more or less what I wanted to present about FOSSology as a license scanner. We have plenty of time, so I can go through a brief demo of FOSSology. This is the FOSSology web UI; if you install FOSSology, this is what it looks like. It's very easy to install on your system. We are on GitHub, so you can just clone the repository from our GitHub page. We are well documented: the FOSSology wiki describes the different installation options. You can install from source or use a Docker container image, among other options, so you can try it out. On my system I've installed it from source. The default user ID and password are both fossy, so you can log in to FOSSology as fossy. The first thing you need to do, as I showed in the process view, is upload a package to FOSSology.
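The Atarashi flow (extract comments, normalize, compute similarity against each known license, rank) can be sketched in a few lines. Atarashi offers several similarity and distance measures; this sketch uses plain cosine similarity over word counts as one example, and the comment-extraction regexes and two-entry license table are simplifying assumptions, not Atarashi's code.

```python
import math
import re
from collections import Counter


def extract_comments(source):
    """Rough comment extraction for C-like files: /* ... */ blocks and // lines (an assumption)."""
    blocks = re.findall(r"/\*.*?\*/", source, re.S)
    lines = re.findall(r"//(.*)", source)
    return " ".join(blocks + lines)


def normalize(text):
    """Lower-case words only; punctuation and comment markers are dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_licenses(source, license_texts, top=3):
    """Score the file's comments against every license text and return the best matches."""
    words = normalize(extract_comments(source))
    scores = [(cosine(words, normalize(text)), name) for name, text in license_texts.items()]
    scores.sort(reverse=True)
    return [(name, round(score, 3)) for score, name in scores[:top]]


# Hypothetical, truncated license texts for illustration only.
LICENSE_TEXTS = {
    "MIT": "Permission is hereby granted, free of charge, to any person obtaining a copy",
    "Zlib": "This software is provided 'as-is', without any express or implied warranty.",
}

source = """/* Permission is hereby granted, free of charge,
   to any person obtaining a copy */
int answer(void) { return 42; }
"""
print(rank_licenses(source, LICENSE_TEXTS))
```

Because only comments are compared, code identifiers such as `answer` never dilute the score; swapping `cosine` for another measure changes the ranking function but not the pipeline shape.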
How do you do that? Go to Upload, then Upload File. Now choose the folder to upload into; you can create multiple folders and a folder structure according to your needs. For now I'm choosing a zlib package and uploading it as fossy. You can add a description, like "test package". Then you choose the scanning agents: copyright, email, and URL analysis, ECC analysis, keyword analysis, Monk, Nomos, and so on, and upload. FOSSology first unpacks the package, then runs the scanning agents we saw. You can follow this under Jobs, My Recent Jobs. It has unpacked the package, the package has been scanned, and it's ready for the clearing process. If you go to the license browser tab, it shows the licenses found inside the package. It has found zlib, BSL, and these other licenses, and it has also found the copyrights in this package. You can see the list of copyrights it has found, around 102 entries. It has also found ECC, that is, export control, statements. One basic thing about this: you can define your own copyright and ECC patterns in a configuration file. The configuration file is a text file stored in FOSSology's library directory, and you can add your own patterns there if required. Okay, now let me show you bulk scan, the other feature I talked about. Here is FOSSology's bulk interface. You choose a statement that you think probably indicates, say, the zlib license. I copy this text into the text reference field, remove or add licenses from the list, here the zlib license, and schedule a bulk scan on the whole upload. Once this completes, the matching files are marked as identified.
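The bulk scan step just demonstrated applies one reference text and one add/remove decision to every file in the upload that contains that text. Here is a minimal sketch of that idea; the `files` structure, the whitespace/punctuation-insensitive phrase match, and the license names are assumptions for illustration, not how FOSSology's bulk agent is implemented.

```python
import re


def _norm(text):
    """Space-padded, lower-case word sequence so phrase matching respects word boundaries."""
    return " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "


def bulk_scan(files, reference_text, add=(), remove=()):
    """Apply one license decision to every file whose text contains reference_text.

    files: hypothetical mapping path -> (text, set of current license decisions).
    Returns (updated mapping path -> set of licenses, sorted list of matched paths).
    """
    needle = _norm(reference_text)
    updated, matched = {}, []
    for path, (text, licenses) in files.items():
        new = set(licenses)
        if needle.strip() and needle in _norm(text):
            new = (new - set(remove)) | set(add)
            matched.append(path)
        updated[path] = new
    return updated, sorted(matched)


files = {
    "a.c": ("/* Distributed under the zlib license. */", {"UnclassifiedLicense"}),
    "b.c": ("int main(void) { return 0; }", set()),
    "c.h": ("// distributed under the ZLIB license", {"UnclassifiedLicense"}),
}
updated, matched = bulk_scan(files, "distributed under the zlib license",
                             add={"Zlib"}, remove={"UnclassifiedLicense"})
print(matched)
# prints: ['a.c', 'c.h']
```

Starting from one file's text, the decision lands on every file with the same statement, which is why a bulk scan can mark more files than the one you selected.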
And it doesn't only mark the file you started from; it works across the whole package. You can see the difference compared with before we started. Okay, I think I picked a bad example: I started from one file, but it found the same text in two files, so it has marked two files as cleared. Now, generating reports. Once the clearing job is done, go to Browse and select the package you uploaded. As I said in the presentation, you can generate different types of reports: ReadMe_OSS, SPDX RDF, SPDX tag-value, and the unified report. Let me generate an SPDX RDF file. You can see the report is generated and downloading. The report has downloaded; I think it didn't open automatically, so let me open it. This is the SPDX report: all the licenses you have cleared are here, along with the other SPDX information. Quickly, I'll go through the rest. You can generate reports in other formats according to your needs. One more thing: I mentioned the obligations interface. FOSSology gives you a good obligation management system, where you can add new obligations against a license, or, if you have obligations defined elsewhere, do a CSV import or export of obligations. And for the REST API, you can generate a token from here. You give it a name, an expiry date, and read or write access, whichever permission you want to grant your token, and then you can easily use the REST interface. The REST link I gave is also on fossology.org, the official FOSSology website, where you'll find the basic REST API explanations. There we've given some curl commands you can use with the REST API, plus a link to the Swagger Editor and the raw content of the YAML file.
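An SPDX report is, at its core, the clearing decisions rendered in a standard format. The sketch below emits only the file-information section of an SPDX 2.x tag-value document (`FileName`, `SPDXID`, `LicenseConcluded`, `LicenseInfoInFile`, `FileCopyrightText` are SPDX 2.x tags); the input tuple layout and the sample data are hypothetical, and a complete document also needs the document-creation and package sections, which this sketch omits.

```python
def spdx_file_sections(cleared_files):
    """Render cleared files as SPDX 2.x tag-value file sections.

    cleared_files: hypothetical list of
    (path, licenses_found_in_file, concluded_license, copyright_lines).
    """
    out = []
    for i, (path, found, concluded, copyrights) in enumerate(cleared_files, start=1):
        out.append(f"FileName: ./{path}")
        out.append(f"SPDXID: SPDXRef-File-{i}")
        out.append(f"LicenseConcluded: {concluded}")
        # One LicenseInfoInFile line per license the scanners saw in the file
        for lic in found or ["NONE"]:
            out.append(f"LicenseInfoInFile: {lic}")
        if copyrights:
            # Multi-line values are wrapped in <text>...</text> in tag-value format
            out.append("FileCopyrightText: <text>" + "\n".join(copyrights) + "</text>")
        else:
            out.append("FileCopyrightText: NONE")
        out.append("")
    return "\n".join(out)


report = spdx_file_sections([
    ("src/zutil.c", ["Zlib"], "Zlib", ["Copyright (C) 2020 Example Author"]),
    ("README", [], "NOASSERTION", []),
])
print(report)
```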
If you paste the YAML into the Swagger Editor, it shows you all the REST API interfaces, all well defined. You can try it out on your own; it works pretty well. It gives you a complete interface for automating package uploads to FOSSology, or for integrating FOSSology into your CI/CD pipeline. Next, this is the Atarashi project that we are currently integrating into FOSSology. Hopefully it will be done by the end of this year, and you'll get another scan agent that enables even more automation in FOSSology, so you can conclude licenses more easily. The goal of FOSSology is to provide more and more automation so that manual work decreases. The clearing effort shrinks, and you only need to review the license findings. That's all from my presentation. If you have any questions... We don't have a Jenkins plugin yet, so as of now you have to use the REST API. The question, since there wasn't a microphone, was whether there is a Jenkins plugin for FOSSology. As of now it's not available. We use FOSSology pretty heavily; we've installed it across a lot of our customer base. But I really want a full solution. I came from OpenLogic, where the business we were in was nothing but auditing, and now we're replacing that with FOSSology. So thank you, I just wanted to thank you again. And to highlight something you glossed over: in the current climate, being able to do ECC scans is very useful. FOSSology finds all the keywords relevant for export control easily, which is a useful capability. It's probably one of the only open source tools I know of that does that. Could you talk a little more about how the export control scanning information is found? I think that's very relevant right now. Export control scanning information? Yes.
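For CI/CD use, the REST API flow is: create a token once, then send it as a bearer token on every call. The sketch below only builds the HTTP requests with the standard library and does not send them. The base URL (a local install under `/repo/api/v1`), the `/tokens` JSON field names (`username`, `password`, `token_name`, `token_scope`, `token_expire`), and the credentials are assumptions here; check them against the YAML definition published with your FOSSology version before relying on them.

```python
import json
import urllib.request

# Assumption: a local FOSSology install exposing the v1 REST API under /repo/api/v1.
BASE_URL = "http://localhost/repo/api/v1"


def token_request(username, password, name, expire, scope="read"):
    """Build (not send) a POST /tokens request; field names are assumptions to verify."""
    body = json.dumps({
        "username": username,
        "password": password,
        "token_name": name,
        "token_scope": scope,     # "read" or "write"
        "token_expire": expire,   # expiry date, e.g. "2030-01-01"
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/tokens",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def authed_get(path, token):
    """Build (not send) an authenticated GET request for an endpoint such as 'uploads'."""
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        headers={"Authorization": f"Bearer {token}"},
    )


req = token_request("fossy", "fossy", "ci-token", "2030-01-01", scope="write")
uploads = authed_get("/uploads", "example-token")
print(req.get_method(), req.full_url)
print(uploads.full_url)
```

Against a running server you would send each request with `urllib.request.urlopen(req)` and parse the JSON response with `json.load`; keeping request construction separate from sending makes the pipeline step easy to test offline.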
The question is how the export control scanning works: which keywords it looks for, and the context around that. A lot of open source projects need this so that they can work across international jurisdictions. Okay. This part of FOSSology is also open source: the configuration FOSSology uses by default. Go to the FOSSology GitHub page and into the src folder. Then scroll down and look for ECC; no, there is no ECC folder. You have to go inside copyright, because for ECC we actually reuse the copyright agent for multiple agents; it's the same engine. If you go inside agent, you'll find the ECC configuration file. These are the regexes mainly used by ECC, and in the same folder you have the regexes used by copyright, ECC, and keywords. If you want to edit them, it's pretty easy: they're just regexes, and you can add your own. As for automation, one thing I didn't cover here: we are currently integrating FOSSology with Software Heritage, ClearlyDefined, and a couple of other projects. In the future we also plan to build a compatibility agent, which will reduce the work a lot. That's all. Thank you for joining.
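Since the ECC, copyright, and keyword scanners are driven by plain regex configuration files, the mechanism is easy to picture: load one pattern per line, then report every line of a file that matches. The config format (one regex per line, `#` comments) and the patterns below are illustrative assumptions, not FOSSology's shipped ECC list.

```python
import re

# Hypothetical config in the spirit of the ECC regex file:
# one pattern per line, lines starting with '#' are comments.
ECC_CONFIG = r"""
# illustrative patterns only, not FOSSology's shipped list
\bencrypt(ion|ed|s)?\b
\bcryptograph(y|ic)\b
\bAES(-?(128|192|256))?\b
\bexport control(led|s)?\b
"""


def load_patterns(config_text):
    """Compile each non-empty, non-comment line as a case-insensitive regex."""
    return [
        re.compile(line.strip(), re.I)
        for line in config_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]


def ecc_scan(text, patterns):
    """Return (line number, matched text) for every pattern hit, line by line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rx in patterns:
            m = rx.search(line)
            if m:
                hits.append((lineno, m.group(0)))
    return hits


patterns = load_patterns(ECC_CONFIG)
source = """/* This module provides AES-256 encryption */
int add(int a, int b);
// Subject to export controls"""
print(ecc_scan(source, patterns))
# prints: [(1, 'encryption'), (1, 'AES-256'), (3, 'export controls')]
```

Adding a keyword is a one-line change to the config; no scanner code has to be touched.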