Okay, hey, thanks for joining this presentation, actually a recorded presentation. My name is Michael Jäger and I would like to talk to you a little bit more about the FOSSology project. First of all, I will give you a general introduction to the FOSSology project, tell you what's special about it, and what happened recently in the project.
FOSSology is a tool in the area of open source license compliance and it has been around for quite a while. It was initially published in 2008 by Hewlett-Packard, and in 2015 it became a Linux Foundation collaboration project. It's a Linux application, so you can't install it natively on Windows or macOS; you need a container or a virtual machine running Linux.
It implements a couple of tasks which are important for open source license compliance. Obviously, that's scanning for license-relevant statements, and scanning for copyright or authorship statements. It can also scan for email addresses. Another important topic is trade compliance when it comes to open source components, so it can look for export control and customs statements; that's what the acronym ECC stands for. And then there is of course the ability to take what has been scanned, analyzed and looked at in the FOSSology application and generate documentation from it. That can be text documents, that can be word processor documents, which can be read, for example, with LibreOffice, or you can export and import SPDX files. SPDX is an acronym which stands for Software Package Data Exchange. It defines how metadata about files and software packages can be exchanged, and one larger area of that is license compliance: licensing information and copyright information. Another area, for example, would be vulnerabilities.
So FOSSology can do all this, and there are also other tools scanning for licenses and exporting SPDX, so what's so special about FOSSology? After all the years of working on licensing and understanding licensing in open source software, the users and the community of FOSSology have seen that overview is actually key for working efficiently on open source packages. An overview basically means being able to navigate through the package's file hierarchy, look into folders and at the same time see the aggregated found licensing information, so you can quickly identify individual files. Aggregation is super important for finding the critical licensing statements quickly. And if you are at the single file view, FOSSology helps you even more by highlighting license-relevant statements. So on the screenshot, you basically see two things. You see that in gray, regular expressions hinting at a particular license have been highlighted, and you see in brown that license-relevant keywords have been found. Sometimes you only find keywords and FOSSology is not able to determine the licensing, and sometimes the gray highlighting pointing at a particular license is actually very helpful. So the idea is that you can look at licensing statements and conclude the license on a fine-grained level, on every file, and you can navigate up and down in the file hierarchy.
So you might wonder: looking at individual files, is that even necessary? Well, it depends on your licensing conditions, right? That is, on how the licensing is expressed. And FOSSology is not really alone with this. Also in the SPDX standard, you see things like "license concluded", which is an expert verdict on the file, and on the other side you have "license info in file", which is what license has been found. That distinction exists because in some cases you have dual licensing, and in some cases you have licensing which was not written very well.
So it's not really easy to understand what licensing was actually intended. Or you have unclear statements in the sense that the text is written well, but you need to decide on some particular things because some points are left open. It depends on the domain how closely you need to look into the particular licensing in order to conclude the license. In some domains, software packages are licensed very clearly and the licensing is very easy to find, so you basically don't need to conclude anything, maybe. And in some domains it's just very confusing and an expert is required to make a conclusion.
Let me at this point have a look at the FOSSology application to briefly show you what it looks like. If you have deployed FOSSology and surf to it, for example if you deploy using Docker or Vagrant, this is probably the URL you have to surf to, and it will start with this login screen. If you didn't change anything on FOSSology, the username and password will both be "fossy" in order to log in. That's the main admin account. And obviously something went wrong here. After you have successfully logged in, it will show you the uploads already done on the server. You can of course change your password, or the password of the admin user, by going into Admin, Users and then Edit User Account. But this is not something we would like to do here.
I would like to show you the initial and probably most important method of working with FOSSology, and that's uploading a file, a package archive, from your local hard drive. As you can see here in the upload menu, different methods of bringing uploads to the FOSSology server are possible. You can upload from a given URL, you can upload from a Git server, even specifying the branch, and SVN is also still possible, I think. You can upload from a local server path, where the FOSSology server is meant. So you can define maybe an incoming folder and upload files from there.
In this case, we take just a very easy case, upload from file. All of these different methods look very similar because they start with choosing the file that you would like to upload. You can tell FOSSology to ignore version control system files. You can have different settings for the visibility of the upload, and then we come to the actually important part here: selecting the different analysis methods that should be applied to the upload.
FOSSology, the application itself, is not really bound to license compliance. FOSSology itself is an application which unpacks an archive. It can do recursive unpacking, which is quite important because in open source packaging you can find open source packages inside open source packages, and you would like to unpack those as well in order to look at their contents. Then all the files found in all the recursively unpacked archives are taken and individually sent to the analysis methods. In this case, these are analysis methods about license compliance. So you can analyze for copyright statements. You can analyze for export control statements. You can define your own keywords and check the files for your own keywords. But there could of course also be different analysis methods, for example a module which recognizes programming languages. No one has contributed that so far, but it could be an interesting idea to analyze for that as well.
FOSSology has three different analysis methods for licensing. It tries to find licensing by text-by-text comparison, by using regular expressions, and by the SPDX license identifier. So let's talk a little bit more about that. Text-by-text comparison is very useful in order to identify a particular license. And if you are a license expert or lawyer, you understand that if you need to make decisions on particular licensing, you need to have the exact text, or need to understand what the exact licensing text ruling for this piece of software is.
And in order to help you with that, FOSSology spots differences between the found text and the reference text. If there isn't any difference, you have a 100% match between the license text from the database and the license text which has been found in the upload. Then you are very safe that the license text found in the upload is actually the same one that you have in mind and have understood as a particular license. For lawyers, having that safety is very important. On the other side, it's obvious that this method only works on known licenses, right? If you don't have the license text in the database because it's new, FOSSology won't find anything with this method. Maybe it will find parts of it, because the license text is actually a newer version or reuses parts of existing licenses to form a new license.
So on the other side, you would like to have regular expressions and keywords defined in order to find all sorts of license-relevant statements. And that is what the second license analysis, Nomos, is doing. It's actually very important that both of these things are present in FOSSology. On one hand, full text matches offer you a higher precision, here towards the right, to tell you that this really is the license text that you also understand as a particular license text found in an open source upload. On the other hand, keywords and regular expressions find even unknown licensing statements, even self-written licensing statements. So you make sure that you don't miss anything in a particular upload.
And then there is a third license analysis method, which is called Ojo license analysis, scanning for licenses using the SPDX license identifier. And what's the SPDX license identifier? I have an example here. The SPDX license identifier is used as a convention, as a proposed convention, which should precisely define the licensing ruling for a particular file.
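To make the convention concrete: a file carries a comment line with the tag `SPDX-License-Identifier:` followed by a license expression. The following minimal Python sketch shows how such a tag could be picked up with a regular expression. It is an illustration only and not how Ojo is implemented; the function name, the simplified pattern and the sample inputs are assumptions made up for this example.

```python
import re

# Simplified assumption: the license expression consists of SPDX-style
# characters and ends at the end of the line or at a closing "*/".
SPDX_TAG = re.compile(
    r"SPDX-License-Identifier:\s*([A-Za-z0-9.+\-() ]+?)\s*(?:\*/)?\s*$",
    re.MULTILINE,
)


def find_spdx_identifiers(text):
    """Return all license expressions announced by SPDX tags in the text."""
    return [match.group(1) for match in SPDX_TAG.finditer(text)]


if __name__ == "__main__":
    sample = "// SPDX-License-Identifier: GPL-2.0-only\nint main(void) { return 0; }\n"
    print(find_spdx_identifiers(sample))
```

A real scanner would also have to validate the expression against the SPDX license list, which this sketch does not attempt.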
And with all that scanning technology and analysis work applied to open source packages, it was found that the solution to reduce work in open source license analysis would be to precisely define the licensing at the source, in the published package, in every file, in a machine-readable way. That's actually the proposal from reuse.software. And I find now that I have forgotten it. So let me actually write the URL here so you can look it up. So that's the URL, reuse.software. We can also put this in a browser here... that was a bad idea, forget about this. reuse.software is an initiative to define how you should express licensing in a file. And if that's found in a file, FOSSology should be able to recognize this as an SPDX license identifier statement.
We could add a regular expression to Nomos to identify SPDX license identifier statements. That's fine. But in the case of Nomos, you don't know if it was keywords, some regular expression or the SPDX license identifier that has led to the finding. And as such, it's implemented as a separate finding here, as Ojo license analysis, which allows you to tell FOSSology precisely: if it's an SPDX license identifier which has been found, don't make me review it. Actually, I can, but don't ask me to review it; make a conclusion automatically.
FOSSology has the section here, Automatic Concluded License Decider, where the user can define that license conclusions are applied to files based on some rules which are obvious. For example, if Nomos and Monk both find the same set of licenses. If that's the case, then obviously text-by-text comparison and regular expressions have led to the same conclusion. That should be enough confidence in order to conclude a license.
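The agreement rule just described can be sketched in a few lines of Python. This is a toy model under stated assumptions, not FOSSology's decider code: the input layout (a dictionary of per-file findings with the keys `nomos` and `monk`) and the function name are hypothetical.

```python
def auto_conclude(findings):
    """Split files into automatically concluded ones and ones needing review.

    findings maps a file path to {"nomos": licenses, "monk": licenses}.
    A file is concluded only if both scanners report the same non-empty set.
    """
    concluded = {}
    needs_review = []
    for path, result in findings.items():
        nomos = set(result.get("nomos", ()))
        monk = set(result.get("monk", ()))
        if nomos and nomos == monk:
            concluded[path] = nomos
        else:
            needs_review.append(path)
    return concluded, needs_review


if __name__ == "__main__":
    example = {
        "a.c": {"nomos": {"MIT"}, "monk": {"MIT"}},
        "b.c": {"nomos": {"GPL-2.0"}, "monk": set()},
    }
    print(auto_conclude(example))
```

Files where the two scanners disagree, or where one of them finds nothing, stay in the manual review queue, which keeps the expert in the loop for the unclear cases.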
And as such, you have the idea that if Ojo finds an SPDX license identifier in a file, and Nomos and Monk don't find anything else, like a further licensing statement, maybe further below because someone has copied or added some content and didn't adapt the SPDX license identifier, then an automatic conclusion should be made. That is provided by FOSSology to the users to actually save work. On one hand, you have the ability to look into files very precisely and understand the licensing situation with three different license analysis methods. On the other hand, you have a number of options to automatically conclude licensing when the cases are clear. That means you save work by automation and at the same time maintain a high precision, with reviewing of license statements on a file-by-file basis or on a statement-by-statement basis.
Let me come to the next point here. Apart from how FOSSology finds licensing, I have already mentioned that FOSSology imports SPDX and can also export SPDX. Exporting means that you have uploaded an open source package, have analyzed the licensing, made some conclusions, and then you export the SPDX file, where it is written which licensing and which copyright statements were found. On the other side, importing SPDX means taking an analysis maybe from someone else, maybe even one that someone else has done with a different tool. The import of SPDX files in FOSSology can have a number of use cases. For example, you receive an SPDX file from another party, another organization or another individual, and you would like to understand how well they worked on understanding the licensing situation of an open source software package. Then, given that you have the same software already uploaded to your FOSSology server, you can also load the externally provided SPDX information and compare it.
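Outside of the FOSSology UI, the same comparison idea can be illustrated with a small Python sketch that reads two SPDX documents in tag-value format and reports files with differing concluded licenses. It only looks at the `FileName` and `LicenseConcluded` tags and ignores everything else in the documents; the function names are made up for this example and this is not how FOSSology's own import works.

```python
def concluded_licenses(spdx_text):
    """Map each FileName to its LicenseConcluded value (tag-value SPDX)."""
    result = {}
    current_file = None
    for line in spdx_text.splitlines():
        tag, _, value = line.partition(":")
        tag, value = tag.strip(), value.strip()
        if tag == "FileName":
            current_file = value
        elif tag == "LicenseConcluded" and current_file is not None:
            result[current_file] = value
    return result


def differences(ours, theirs):
    """Return {file: (our_license, their_license)} where conclusions differ."""
    a, b = concluded_licenses(ours), concluded_licenses(theirs)
    shared = sorted(a.keys() & b.keys())
    return {name: (a[name], b[name]) for name in shared if a[name] != b[name]}


if __name__ == "__main__":
    own = "FileName: ./a.c\nLicenseConcluded: MIT\n"
    external = "FileName: ./a.c\nLicenseConcluded: NOASSERTION\n"
    print(differences(own, external))
```

Files that appear in only one of the two documents are skipped here; a fuller comparison would report those as well.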
And that not only allows sharing of analysis work, so someone could provide a central archive for SPDX descriptions and you could share that, but it also means that you can reuse existing analysis work. Reuse is actually very likely, because in the everyday work with FOSSology and with license compliance, you often scan newer versions of a package that has been scanned before: your software develops, your dependencies develop, and so you look at newer versions of already analyzed software. So if software has already been analyzed by someone, and this party provides an SPDX description of the analysis result, and it happens that you are about to scan a newer version of that component, you can reuse the existing work on your FOSSology server and import the SPDX. I think the SPDX import ability contributes very much to sharing license compliance data and to reducing work among all of us concerned with analyzing open source software.
That is one larger feature, and another larger feature is about license obligations. Maybe as background in the beginning, there is the term license obligation. It means something like instructions for non-license experts about what needs to be done when using software under a particular license. And using can actually also mean redistribution, right? The license texts were sometimes found difficult to understand, and people started to write obligations, meaning: okay, if you have that license and would like to redistribute software under it, what does it actually mean for you as a software developer when redistributing it? Obligations can also be organization specific. So maybe your organization has special ideas about how to implement obligations, how to meet the license obligations defined for a particular license. And there are also several organizations out there providing them.
For example, there's the Open Source Automation Development Lab, the OSADL, which provides a set of generally written obligations. There is FINOS, an open source foundation in the financial area, which provides an open source handbook. There is GitHub, which has provided the project Choose a License, where obligations can be found in a machine-readable format. And FOSSology lets you import and export licenses and obligations in a spreadsheet data layout using the comma-separated values file format. So you can first export the existing database to understand the file format and which columns are there. And then, depending on your source, you can import your obligation set. And now comes the trick: if you do an analysis of a component and you determine licensing in this component for which you also have written obligations in your database, then you can generate reporting and pass that over to the other members of your organization, telling them in a more natural form what they need to do with this particular open source software.
So all these things happen on the UI and that's actually fine. But I think nowadays we also see more and more happening in the FOSSology area which is about integration. Basically, there are three main ways of interacting with the FOSSology server and integrating it with other applications. There's the REST API, there is fossdriver, a Python library, and there are the command line tools that come when you install FOSSology. With the REST API, you have endpoints to interact with the FOSSology server to manage folders and uploads, trigger scans with options, download reports and so on. If you would like to have a brief overview, look at the REST API calls section on fossology.org to see how simple it is to interact with the FOSSology REST API.
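As a rough idea of what such an interaction looks like from a script, here is a minimal Python sketch that prepares an authenticated request for listing folders. Everything specific in it is an assumption for illustration: the base URL (a local test deployment), the `folders` endpoint name, the bearer-token header and the helper name `fossology_request`. Please check the REST interface documentation of your FOSSology version for the actual endpoints and parameters.

```python
import urllib.request


def fossology_request(base_url, token, endpoint, extra_headers=None, method="GET"):
    """Build (but do not send) an authenticated request to a REST endpoint."""
    url = base_url.rstrip("/") + "/" + endpoint.lstrip("/")
    headers = {"Authorization": "Bearer " + token}
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers, method=method)


if __name__ == "__main__":
    # Assumed values: adjust the URL and token to your own server.
    request = fossology_request(
        "http://localhost:8081/repo/api/v1", "MY_TOKEN", "folders"
    )
    print(request.get_method(), request.full_url)
    # Sending it would look like this (requires a running server):
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect or log what a script is about to do before it touches the server.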
There is fossdriver, a Python library which basically allows you to remote control FOSSology, and there are the command line tools, which let you execute all the individual functionality from the command line, for example execute just the Nomos scanner on an upload, right? But please be aware that Nomos, for example, doesn't do unpacking for you. So you need to have an unpacked archive and then you can run Nomos on it. And with the REST API, you need to understand that you also need to follow a particular flow. So you can list folders and see what's there on the FOSSology server. You can trigger scan jobs, you can observe what the jobs are doing, and you can download different reports. For example, you can download SPDX, a DOCX report, a readme, a license listing and so on. You can look at the basic REST API calls at the URL that was given on the slide before, or of course have a look at the complete REST interface documentation of FOSSology.
So far in summary. Let's step back a little and have a brief overview of the releases that were published in the past 12 months, that is, the past three releases. We had 3.6, which basically brought Ojo to the FOSSology code base, with the ability to read the SPDX license identifier tags. There was also a huge change under the hood, changing the import of the initial catalog of licenses from an SQL file to JSON, which also allows other parties to import the licensing information, the license catalog that comes with FOSSology. 3.7 brought improvements to the automated concluded licensing. Under the hood, a lot of changes were applied to make the code GCC 8 compliant, because the behavior regarding warnings has been "improved", in quotes, in that compiler.
And another important thing with 3.7 is the REST API support for SW360. Actually, support was there before, but SW360, which is another open source software, and FOSSology can work together better with FOSSology version 3.7, because from that point on FOSSology provides more or better information about the job progress, and that's important for remote controlling it from the SW360 application. 3.8, the most recent release here, brought the Software Heritage agent to the code base. Under the hood, it did schema changes to make sure that everything is UTF-8 in your FOSSology database. That is, for example, particularly important if you would like to run FOSSology on AWS RDS, the Relational Database Service. Improvements to the packaging of the FOSSology project have also been done. So 3.8 is the most recent version. If you would like to upgrade to the latest, make sure it's actually 3.8.1.
Let me talk to you a little bit about the most recent feature, the Software Heritage integration. Software Heritage is an archive of published open source software. The Software Heritage organization itself is a foundation, and other organizations can support it by donating to this foundation or becoming part of it, or other organizations can even publish their own software using the Software Heritage archive. As far as I understood how this works, it's basically two work streams. One work stream is about providing the technology for archiving published software. We are talking about a super huge archive here, if you imagine all the software published so far. So someone needs to develop the actual server technology to provide the archive and be able to extend it. And the second thing is to provide the archive itself, meaning that someone actually needs to host all that data.
And if you talk about an archive, it becomes obvious that it should not be a single site, but that there should be multiple nodes, so if there is some unfortunate event on one site, the archive is not lost. For more information you can of course check the URL of the project, but the important thing for us in the FOSSology project is that it has a REST API, right? The use case for FOSSology is that you can find out if a file is public already. For an open source project that's likely very, very boring, but for a mixed upload you can easily distinguish published files from non-published files. You can even upload your own proprietary software to make sure that all the files of your proprietary software are really proprietary and have not been published so far, right? If they shouldn't have been published so far, because, admittedly, proprietary doesn't mean not published. So in general it allows you to understand if a file has been published or not, meaning if it's not in the Software Heritage archive, it's likely not published. FOSSology uses the REST API and the SHA-256 checksums of files to exchange with the Software Heritage server. It requires an internet connection, of course, and when we look at the Software Heritage functionality, some more metadata is being pulled as well.
Let me have a quick look here. So I have uploaded a very small archive. It's the time tool you may know from your command line, and it contains only a few files. Let me zoom out a little. With the newer releases of FOSSology you have the Software Heritage section here that you can click, and then it gives you a list of files and shows that all of them have also been found in the Software Heritage archive. You can switch over to the Software Heritage archive and call the REST API directly, so this is actually a GUI front-end for the REST API call.
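The lookup by checksum can be sketched in Python as follows. This is an illustration under stated assumptions and not the code of the FOSSology agent: the URL pattern reflects my understanding of the public Software Heritage web API content endpoint and should be checked against its documentation, and the function name is made up for this example.

```python
import hashlib

# Assumed endpoint pattern of the public Software Heritage web API.
SWH_CONTENT_API = "https://archive.softwareheritage.org/api/1/content/"


def swh_content_url(data: bytes) -> str:
    """Return the lookup URL for a file's content, keyed by its SHA-256."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{SWH_CONTENT_API}sha256:{digest}/"


if __name__ == "__main__":
    print(swh_content_url(b"hello world\n"))
    # A client would now issue a GET request to this URL; a "not found"
    # answer suggests the file content is likely not in the archive.
```

Only the checksum travels to the archive in this scheme, not the file content itself, which matters when the upload contains proprietary files.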
So we have here the content call with the SHA value. There is also a SHA-1 value here, and no license has been found. It has been scanned with an older version of Nomos, and so it's part of the Software Heritage archive. We also pull the licensing information here from Software Heritage. It's not really super useful to have licensing information from Software Heritage while you can scan on your own, but we already implemented the ability to pull the metadata from the Software Heritage server in order to be prepared when more metadata is being offered. Right now it offers you this list, and if there were a red dot, you would know that this file likely has not been published so far.
The Software Heritage implementation is actually a result of the Google Summer of Code run 2019, and the credit goes to the person who implemented it, which is Sandeep. So we are glad to have Sandeep contributing the Software Heritage implementation. Sandeep now also volunteers as a mentor in the Google Summer of Code 2020 run, and that's where I would like to switch to some news about the open source community. FOSSology has again been accepted as a mentoring organization for the Google Summer of Code, for the runs in 2019 and 2020. This year, three students have been awarded the Google Summer of Code stipend by Google and have applied to work on the FOSSology project, so we are very lucky, and thank you to the students and to Google, actually. The areas of work are, on one hand, Atarashi: to bring it to FOSSology and to let it deal better with comments, understanding comments better and separating out source code. I know there have been libraries already implementing comment recognition, but they were not working optimally for our cases, so that's what is being worked on. Another area of work being implemented is the integration with the Grafana server. Grafana is a dashboard server.
It's also an open source project, and the idea is to implement an integration with the FOSSology server to allow monitoring of a FOSSology server for all the non-basic OS metrics. The operating system metrics that I regard as basic would be free memory, occupied memory, free disk space and so on. All these things are there and you could monitor them for your FOSSology server, but maybe you would like to find out more, like how many jobs are running, how many uploads have been uploaded, how many files are in the archive and all these things. So notably here also a big thanks to Nicola from Orange and also Bartosz from Orange, who are mentoring the Google Summer of Code student working on this part.
If you look at where changes have been made apart from the core FOSSology project, it's the REST clients. So there is a REST client for FOSSology. Actually there are three of them. One is a shell script, one is a Python library, and then there is a .NET Core one, meaning that you can also compile it on Linux and don't necessarily have to run it on a Windows machine. These are client implementations to interact with the FOSSology REST API. You will find them on GitHub in the FOSSology group. I would also like to point out that there is FOSSology Slides, which is a very nice package with a collection of slides. It has enough material to run a one-day course on FOSSology, and we encourage everyone to make their own presentation about FOSSology and introduce the project if you like it. The slides are licensed, by the way, under CC BY-SA 4.0, just like this presentation, so you can modify them and redistribute your modifications, given that you maintain the particular license.
So this is it. I hope you liked the video recording. I hope you liked the presentation. Please don't hesitate to write to my email address or contact me by any other channel if you would like to ask a question about FOSSology or make suggestions.
If you like FOSSology, consider starring our project, and I think then this is it. Thanks for tuning in, and I hope I will be around in the chat at the time you're hearing this. Have a nice day. Bye.
So I have this telephone here to support audio. If there are any questions, I think I'm on this track, open source project updates, on Slack, the channel beginning with two, and you can also ask questions here in the Q&A section. Thanks for the nice feedback, actually, Ashkat. I'm seeing that I have two audio sources now. Okay, I put away the phone, and I notice that there is a lag. I think, as in the previous sessions, it is a little bit sluggish to answer questions here in the Q&A.
So one person asked: is there a best practice for using FOSSology together with Yocto? There probably is, but I cannot point to it very well because I'm not aware of how colleagues are currently using it. I've heard that some are using FOSSology or have integrated FOSSology. My impression was that Li Xiaomeng from Fujitsu was working on this in the Yocto project and also made some contributions to integrate a tool for Yocto users, but I don't have a good link right now. Generally, FOSSology needs the source code, and somehow you need to be able to download the source code. If you are managing your dependencies with package management all the time, or you have all the references to some source repositories which you pull and build from, it should be fine to integrate with FOSSology, because then you can just send the packages there.
That's one thing that you need to consider. The other thing is that it will probably be super, super much what you are pulling when building and packaging. I think one good practice is not really trying to understand the exact licensing of every file, because many files are likely GPL-2.0 licensed, but to kind of whitelist some already expected results to be more efficient, or maybe blacklist things that you really don't want to see, and work on that part. But if you would like to have the full list of found licenses and copyright holders, then it will probably be a hybrid approach, where you have packages which are obviously GPL-2.0 licensed or under another popular license, where you can just checkmark them by providing the LGPL or GPL text, and for other packages, which could be anything, you might feel the need to look at them separately in FOSSology by manual uploads. I don't know if that's a sufficient answer to this. Maybe I'm taking this question with me, and I will try to see what the colleague from Fujitsu was actually doing, and if you send me an email with a reminder, it would probably help me to get back to you. Otherwise I'm writing on.
More questions? Okay, interesting situation. "Thanks, can you give me your address please?" Yeah, sure, of course. Let me answer this here. It's in the slides, also for the others, regarding the address.
More questions? There is also the Slack chat, which I find by far more convenient for chatting, but probably this here is better because it aggregates answers to a question, while there different postings can interfere.
How does FOSSology connect with SW360? So the basic thing is that FOSSology doesn't do bill of materials.
So FOSSology just scans the package, and there could be anything in the package, while SW360 basically manages the bill of materials. It has a REST API, there is an integration with Antenna, and some people integrated it with ORT, in order to create records in SW360 for a product and then for the open source components, and to upload the sources for the open source components. That's all being maintained by SW360.
And now comes the point. You can configure a FOSSology instance in SW360 using its REST API, so the REST API of FOSSology, and SW360 will be a REST client to FOSSology. If you have source code attachments in the releases in SW360, you can basically click a button, do some automated scanning and get SPDX back for license info generation in SW360. So that's the first thing about how it's integrated and what you can do.
And now, as we pointed out a number of times, FOSSology is really good at correcting license findings and concluding license findings. The way it works now is that SW360 remembers the upload in FOSSology. If you're unhappy with the result, with the SPDX file, you go over into FOSSology; you will be provided with a link in SW360. Then you can switch to FOSSology, do your corrections, and reload the SPDX report into SW360 again. By this you can check again if everything is all right with the product documentation generation, and if it's not, you can get back to FOSSology, do even more corrections, reload the SPDX report and use that for your license information generation.
So the answer is: you need to have a FOSSology instance, you need to be able to access it over the REST API, and you can use SW360 to send source code to FOSSology for scanning, load the SPDX report, reload the SPDX report, and switch over to FOSSology using a URL that SW360 provides you with. And that should be it. Okay, I hope this question is answered. I could have written it, but I think it makes maybe more sense to talk about it.
Any more questions? Also, if you feel like you would like to understand more details, please feel free to write me an email and I can maybe provide you with more information about it. More questions? I'm wondering how long the session is actually going. I guess it's going 50 minutes, so we still have five more minutes. Okay, still some time for more questions.
It's actually an interesting situation. Normally you have the audience in front of you and you get some kind of feedback on whether people are about to leave for the next talk and every interest was covered. If there is any interest in some particular question, please feel free to ask again.
I think there was one question, I don't know if that was public, I'm seeing it here in the list of questions. During the talk there was the question whether FOSSology can be hooked up with LDAP. I remember that someone was talking about it and contributing to it. I think there is now a way of connecting FOSSology with Apache for authentication, and from there on there might be a mechanism to use that with a particular other authentication technology. I have posted the coordinates for finding more information about that in the Slack channel as well. The Slack channel is about open source project updates, and there is a small posting from me where I've also put in links for you that you can check, including a link to the REST API description file. That REST API description file can be used by REST clients or by tools like Swagger, so you can see the full extent of the REST API.
It would also be interesting to know if there is any use of SPDX with other tools, like if there are other persons using SPDX generated by other tools, because we would like to work more on the interaction based on exchanging SPDX files. But I guess you would kind of need to misuse the Q&A feature for this. If you are interested, you could also provide me with feedback about which topic you would like to see that was not covered today, to have it covered in some of the other forums or maybe in a presentation. There is the OpenChain tooling group, for example, and if there is interest in covering a FOSSology topic that is of particular interest, then we could probably prepare it too.
Okay, I think the session is also reaching the end, towards 50 minutes. I hope it was informative for you. I tried to find the balance between general information about larger features which were added in the last year and new features which were added since the last talk. So I hope it was helpful to get an overview of current FOSSology activities. Again, if you like the project, please don't forget to star it; that actually helps to improve the project's reputation for new users. And yeah, I think I can then conclude the session. Thanks, and have a nice remaining conference event. Goodbye.