Good morning, all; we're starting with some delay. We will now launch into the collaborative repository of meta information, by Filippo and Raphael. There you go.
Hello everybody, and I'm sorry for the early hour, but that's how it is. So what is this about? The problem is that in Debian we have so much meta information attached to a package, and we just don't cope with it: things like where upstream lives and how to get new versions. So I'm talking about watch files and other things like localized files, and every kind of information that we don't provide along with the package. That's meta information about the package, so we need a repository, which also has to be collaborative so that everyone can be part of it, holding meta information in the sense I just described. This is much like a BoF, so if anyone has an idea of how to do that and wants to share it with us, you're welcome: raise your hand or make some sound. I think there are some points to clarify about this project. Do you want to clarify something?
I can at least try to explain a bit better what we want to reach with this project. Basically, for the package tracking system, I would like to integrate more information about each source package, like where the upstream sources are located, whether there is an upstream CVS or other repository, and many more pieces of information like that.
It could also be extended one day to host screenshots of programs, for example; in fact, the kind of information that you might want to store is unlimited. We would like to start with the watch file, because we always want to check whether we have the latest upstream version. Not all maintainers create watch files, and there are people who would like to create them for the maintainers, but then the maintainers don't want to integrate them. So the idea was to host them somewhere external, where everybody could just submit a watch file, get it reviewed maybe, and then have it used. Basically, it started with this idea. Right now we may have a Summer of Code project to get this rolling, so the plan for today was to get a basic design or some good ideas, and I hope to get it right. What is interesting, in my opinion, is to build a kind of wiki-like system: anybody could go to a website, check the latest meta information and update it if it is outdated. Of course, we would have to keep a history of the previous values so that if someone puts junk in, we can just roll back. The changes should be sent to the package tracking system, so that people interested in that package could review what has been changed.
Does the idea make sense to everybody? Do you see what we want to achieve? Okay, that's going to be a pretty active BoF.
Yeah, take a mic. More data that we can think of adding there: you mentioned the upstream version control system and screenshots; okay, .desktop files extracted from the packages, and debtags information as well, although that is already pretty much available in the Packages file, but it's metadata as well. Yeah, and all the kinds of metadata that we already have, like statistics about the number of bugs, or popcon. Can you write quickly enough?
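The wiki-like behaviour described here (every edit kept, junk rolled back, changes forwarded to the PTS) can be sketched as a small versioned store. This is a minimal Python sketch under assumptions: `MetaStore`, its methods and the `on_change` hook are hypothetical names, and the hook merely stands in for whatever notification the package tracking system would actually receive.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Revision:
    value: str
    author: str


@dataclass
class MetaStore:
    """Hypothetical wiki-like store: every edit is kept, nothing is overwritten."""
    # (package, field name) -> revisions, oldest first
    history: Dict[Tuple[str, str], List[Revision]] = field(default_factory=dict)
    # Called on every change; stands in for a notification sent to the PTS.
    on_change: Optional[Callable[[str, str, Optional[str], str], None]] = None

    def get(self, package: str, name: str) -> Optional[str]:
        revs = self.history.get((package, name))
        return revs[-1].value if revs else None

    def set(self, package: str, name: str, value: str, author: str) -> None:
        old = self.get(package, name)
        self.history.setdefault((package, name), []).append(Revision(value, author))
        if self.on_change:
            self.on_change(package, name, old, value)

    def rollback(self, package: str, name: str, author: str) -> None:
        """Undo the latest edit by re-recording the previous value as a new revision."""
        revs = self.history.get((package, name), [])
        if len(revs) < 2:
            raise ValueError("nothing to roll back to")
        self.set(package, name, revs[-2].value, author)
```

A rollback is recorded as a new revision rather than by deleting history, so a mistaken rollback can itself be undone.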
Yeah, the idea is to provide what is not already provided, but there could be some links. For example, in the description we usually have a Homepage field, and this would certainly need to be extracted and integrated here. So we have some meta information which does not exist yet and which users will provide. There is meta information that will be provided by scripts and not by humans, because it will be gathered automatically. And there will be some where both happen at the same time: in the case of the homepage, we will just get the initial values from the control file when possible and ask people to fill in whatever is lacking. Have you had other ideas of interesting stuff to store in that database?
User ratings? Yeah, why not. Reviews of the package, like "yeah, it doesn't suck". Okay. I also heard the idea of extracting information from Freshmeat and integrating it back, maybe. I don't know exactly what is stored in the Freshmeat database, but it's pretty much similar information, like whether a bug tracker is available, stuff like that. Other ideas? The upstream bug tracker. Upstream bug tracker, yeah, of course; that was part of the basic list, but it should be written down. I wanted to show you the description of the Summer of Code project, but we have no network, so I can't grab it. Upstream mailing lists? Yeah, everything that is upstream related: mailing lists, user support forums. Among the features, what would maybe be interesting is some kind of localization support. For example, let's say we want to have a user support mailing list field; of course it would be interesting to have that field in French, German, and so on. That's the kind of feature that is interesting. Can you maybe write it down?
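Seeding a field from existing package data, as suggested for the homepage, could look roughly like this. It is a sketch only: the parser handles just the simple `Field: value` plus continuation-line subset of the control-file syntax, and looking for a ` Homepage:` line inside the long description is an assumption based on the remark that the description usually carries a homepage.

```python
import re
from typing import Dict, Optional


def parse_stanza(text: str) -> Dict[str, str]:
    """Parse one RFC 822-style stanza (as in debian/control).
    Lines starting with whitespace continue the previous field."""
    fields: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and current:
            fields[current] += "\n" + line
        elif ":" in line:
            key, _, value = line.partition(":")
            current = key.strip()
            fields[current] = value.strip()
    return fields


def seed_homepage(stanza: Dict[str, str]) -> Optional[str]:
    """Initial value for a hypothetical 'Homepage' metadata field:
    prefer a real Homepage field, else a ' Homepage:' line in the description.
    None means 'ask a human to fill it in'."""
    if "Homepage" in stanza:
        return stanza["Homepage"]
    m = re.search(r"^\s+Homepage:\s*(\S+)", stanza.get("Description", ""), re.M)
    return m.group(1) if m else None
```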
Possible localization of fields, or something along those lines. So, at least my idea was to have source and binary packages where every package has some attributes which we define, but they have to be extensible so that we can add attributes to packages later. Okay, this is a matter of implementation, but I would like to hear some ideas about that.
Yeah, the ideas behind the project have been floating around for a few months already. We'd like to have some feedback on how you think we should implement it, because it's going to be rather large. Any database about all of our packages is always large, and in particular if you want to store historical information it can grow quickly. If you have ideas about how to store it, we'll gladly hear them. My first idea was to have a pool tree, with one directory per source package, then a control-file-like syntax for the basic fields, and maybe dedicated files for more advanced stuff like screenshots.
Yeah, but the speakers... it changes when I move forwards or backwards. That's why I'm going over here. Maybe just reduce the volume of the mic. I don't know; do you hear me here? Yes. So, if you have ideas about how to store that efficiently... I have no luck. Does it work better on this side? So, I'm listening.
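A rough Python sketch of the pool-tree idea, combined with the localization suggestion. Assumptions: the prefix rule copies the Debian archive pool (`h/hello`, `libf/libfoo`), the per-package file name `meta` is invented, values are single-line only, and the `Field[lang]` convention for translated values is borrowed from `.desktop` files rather than decided anywhere in this discussion.

```python
import os
from typing import Dict, Optional


def pool_dir(root: str, source: str) -> str:
    """One directory per source package, with the archive pool's prefix rule:
    'lib' packages get a four-letter prefix, others their first letter."""
    prefix = source[:4] if source.startswith("lib") else source[:1]
    return os.path.join(root, prefix, source)


def write_meta(root: str, source: str, fields: Dict[str, str]) -> str:
    """Write basic fields in control-file-like syntax (single-line values only)."""
    d = pool_dir(root, source)
    os.makedirs(d, exist_ok=True)
    path = os.path.join(d, "meta")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        for key, value in fields.items():
            f.write(f"{key}: {value}\n")
    return path


def read_meta(root: str, source: str) -> Dict[str, str]:
    fields: Dict[str, str] = {}
    with open(os.path.join(pool_dir(root, source), "meta"), encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
    return fields


def localized(fields: Dict[str, str], key: str, lang: str) -> Optional[str]:
    """Look up 'Key[lang]', falling back to the untranslated 'Key'."""
    return fields.get(f"{key}[{lang}]", fields.get(key))
```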
Yeah. One thing I notice is that some of this information can be collected automatically and is basically read-only for the public, like desktop files. Some of this information is to be edited by a restricted number of people; I could think of watch files. And some of this information is wiki-like, such as ratings and, mostly, screenshots. As far as user screenshots are concerned, it sounds like we just need some convention in the wiki, like telling people how to name a wiki page about a Debian package, and we probably don't need any other storage. More interesting is how to make the rest available, and what uses we are going to make of it. Desktop files need to be installable, because they would be used by installer software such as Adept Installer to add the software to the menus, like we have in the Ubuntu installer, and we're starting to have similar functionality for Debian. Screenshots could be installable as well. I had in mind the idea of making an installed-games browser, which uses debtags information to find what kind of games you have installed and also shows screenshots. That could be made quite fancy and provide a really nice frontend for running games on the local machine. So those should be installable. Debtags is kind of its own problem in this case. Watch files may or may not need to be installable, and the same goes for upstream URLs, although one could automatically put them in the packages; I don't know how. It would be nice to have that information kind of standardized in /usr/share/doc, but you can't edit a .deb file after it's been signed. So yeah, I mean, that information has got different users.
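The three kinds of information distinguished here (collected and read-only, edited by a restricted group, fully wiki-like) map naturally onto per-field edit policies. A hedged sketch: the field names and the `may_edit` helper are hypothetical, and unknown fields default to the restricted policy as a conservative choice.

```python
from enum import Enum


class Policy(Enum):
    READ_ONLY = "collected by scripts, read-only for the public"
    RESTRICTED = "editable by a restricted group"
    OPEN = "wiki-like, anyone may edit"


# Hypothetical field names, one per kind of information.
FIELD_POLICY = {
    "Desktop-File": Policy.READ_ONLY,
    "Watch": Policy.RESTRICTED,
    "Rating": Policy.OPEN,
    "Screenshot": Policy.OPEN,
}


def may_edit(field: str, user: str, editors: set, is_robot: bool = False) -> bool:
    """Decide whether 'user' may change 'field' under its policy."""
    policy = FIELD_POLICY.get(field, Policy.RESTRICTED)  # conservative default
    if policy is Policy.READ_ONLY:
        return is_robot  # only the collecting scripts write these
    if policy is Policy.RESTRICTED:
        return user in editors
    return True
```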
So I guess that kind of influences the kind of implementation you're going to do.
Yeah, maybe, Filippo, we should just write a list of users of the information. I mentioned the package tracking system as my first wish because I'm the main person behind it, but yeah, we certainly have to provide the information in several forms: some people may want a single piece of meta information for all packages, and others might want to look at all the available meta information for one package only. So we should provide both, as efficiently as possible. And I'm not sure about using a real wiki directly. I wanted to reuse the principle, but not necessarily an existing tool, because when I speak of wiki-like stuff, I would like to have more control over the syntax of what is provided, and I'm not sure we can do that with a usual wiki.
I think that for most information that comes from users, you will have to do some review. We know this from debtags: there is a public editor for debtags data, and the biggest problem of all is not getting data from users, the biggest problem is reviewing it. The bottleneck is the single person who reviews it, there are practically no tools for that, and I think without a tool we just won't make it. So even if it's in a wiki, that should be fine in my opinion, as long as we can extract the diff between the current reviewed data and the current wiki data, and someone can say "these lines are okay, approve", click one button, and everything is done.
Yeah, I don't know. Maybe not all information is sensitive. My idea was that review would mainly be done through the package tracking system, so that the maintainer or other interested people could just check it.
Yes, but as soon as you distribute this as part of the distribution, like in an installer tool, it has to be reviewed, I think.
Yeah, okay. I think it's okay.
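The review tool asked for here, a diff between the reviewed data and the current wiki data plus a single approve action, can be prototyped with Python's standard `difflib`. The helper names and the plain-dict storage are assumptions for illustration only.

```python
import difflib
from typing import Dict, List


def pending_diff(reviewed: str, wiki: str, package: str) -> List[str]:
    """Unified diff from the last reviewed text to the current wiki text."""
    return list(difflib.unified_diff(
        reviewed.splitlines(), wiki.splitlines(),
        fromfile=f"{package} (reviewed)", tofile=f"{package} (wiki)",
        lineterm=""))


def review_queue(reviewed: Dict[str, str], wiki: Dict[str, str]) -> List[str]:
    """Packages whose wiki data differs from the reviewed data."""
    return sorted(p for p in wiki if reviewed.get(p) != wiki[p])


def approve(reviewed: Dict[str, str], wiki: Dict[str, str], package: str) -> None:
    """The 'one button': the current wiki text becomes the reviewed text."""
    reviewed[package] = wiki[package]
```

Approving a package empties its diff, so the queue only ever lists changes nobody has looked at yet.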
Yeah, okay. Okay, let's see. Yeah, you told us that it depends on the information. Desktop files, for example, we would have to review, because we don't want anybody to provide wrong or non-existent applications to our users. So, Filippo, are you following? It's okay? The possible uses I was listing: the PTS (expand that as "package tracking system") is its first user, I would say. Then, I don't know what you called your application which uses desktop files. Adept? Adept Installer, okay. I don't mind; the French keyboard is difficult, just mention in brackets that it's about using desktop files. Yeah, okay. Start with desktop files, okay.
The installer should probably become a repository of applications that are user-oriented, so things that have a desktop file, and it should give all the data that users may want, not only for things to install but also for things like what I have installed, and let you view them like Enrico said for games, but more generally. So maybe that could be merged for everything, with different views on the applications. Could you add the installed-games launcher to the possible uses?
As soon as we have an association between a package and its desktop files, we can probably have things like the installer also telling you where in the menu the application landed in your current desktop, and especially running the application. So we may need to add meta information like which binary, or which desktop file, is the main one for the package, because right now we can have multiple desktop files in one package, and some of them are surely not important for the user. For example, I think the info panels in GNOME and in KDE each have their own desktop file, and they're all packaged in one package. And, well, what does that mean?
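One possible heuristic for telling user-facing desktop files from the rest uses the `Type`, `NoDisplay` and `Hidden` keys of the freedesktop.org Desktop Entry Specification. This is a sketch, not a proposal from the discussion: packages that don't set these keys will still slip through, which is why an explicit "main desktop file" field in the repository could still be needed.

```python
from typing import Dict, List


def desktop_entry(text: str) -> Dict[str, str]:
    """Key/value pairs of the [Desktop Entry] group of a .desktop file."""
    entry: Dict[str, str] = {}
    in_group = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            in_group = line == "[Desktop Entry]"
        elif in_group and "=" in line:
            key, _, value = line.partition("=")
            entry[key.strip()] = value.strip()
    return entry


def user_facing(files: Dict[str, str]) -> List[str]:
    """Heuristic: keep applications that are not marked NoDisplay or Hidden."""
    keep = []
    for name, text in sorted(files.items()):
        e = desktop_entry(text)
        if (e.get("Type") == "Application"
                and e.get("NoDisplay", "false").lower() != "true"
                and e.get("Hidden", "false").lower() != "true"):
            keep.append(name)
    return keep
```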
I think those should all be hidden, so we probably need some way to mark desktop files as "this is for users" and "this is not for users". It's kind of hard to split what is for users from what is not, though.
You missed popcon in a few places: "meta info is provided" and "Adept Installer". Yeah. QA? Yeah, of course. Well, DDPO is not directly related to the source package, but stuff like bug statistics, I would like to integrate it in this central repository so that DDPO and the PTS can use the same information without doing the same work twice. And then it can also be used by other programs, like program reviews or package reviews.
Another thing that can be important is how you use the data, that is, how you want to access it. Some data you want to access per package, like the screenshots; some data you want to access aggregated, like popcon, I guess, mainly popcon or debtags. I was thinking that a possible startup implementation is having a directory per package, with sub-directories inside for every kind of data you have; you put the data in raw, and then people at least know where to find stuff. Eventually you can figure out how to put some index on top of things. But if you split the data per package and I want to download popcon ratings for all packages, installed or not, to do rankings, then I need to fetch something like 19,000 files by HTTP, which is not efficient. So some of the data can be aggregated as well.
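The per-package layout with an aggregated view on top can be sketched as follows. Assumed layout: `root/<package>/<kind>` holding one small value per file (the speaker suggested sub-directories per kind; a single file keeps the example short). The aggregate is one line per package, so a client downloads one file instead of one per package.

```python
import os
from typing import Dict


def aggregate(root: str, kind: str) -> Dict[str, str]:
    """Collect one small per-package item (e.g. a popcon count)
    from root/<package>/<kind> into a single mapping."""
    result: Dict[str, str] = {}
    for package in sorted(os.listdir(root)):
        path = os.path.join(root, package, kind)
        if os.path.isfile(path):  # packages without this kind of data are skipped
            with open(path, encoding="utf-8") as f:
                result[package] = f.read().strip()
    return result


def write_aggregate(root: str, kind: str, out_path: str) -> None:
    """Write '<package> <value>' lines: one download instead of thousands."""
    with open(out_path, "w", encoding="utf-8") as out:
        for package, value in aggregate(root, kind).items():
            out.write(f"{package} {value}\n")
```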
That's probably another distinction: either provide the data aggregated or not. I guess it's small data that you want to aggregate, so it doesn't matter if we provide it both ways. Yeah, that's another distinction that could be taken care of.
Please add changelogs to the provided metadata. That's an important thing, because right now there is no reliable way to get at the changelog without unpacking the package, which is sort of... Yeah, I think the full changelog, because we can parse it; you never know which version is on the user's system, so you usually have to show them the part from their current version to the version they want. There is something on packages.debian.org, I think, but I don't think it works, and there is no way I can access it reliably. So yeah, that's about it.
Yeah, the export format thing is about interfaces. I would also like to look at the import interface, I mean how users and non-users interact with the repository. It's a Debian tradition to have a mail interface, so that can be one option, but it can be tricky for some things. Okay, a mail interface would work for every text-like attribute, and I think a web interface is better suited for things like screenshots.
As an idea for implementing the information source, I think you could have a URL, say HTTP, pointing to some CGI script, which should probably take some parameters, and the script should be able to say "I want these fields for these packages", maybe with a regex or something to select which packages, or maybe just everything, or a single one.
I am not sure which is better, because you usually want to cache the files that are requested often, and if you make a lot of regex requests that will probably thrash the cache quite a bit and maybe not even save anything. So that's probably something that should be thought about, but yeah, that's looking fairly good.
Yeah, and I guess you also want to start simple: I would stay away from fantastic XML DTDs to represent everything. Yeah, he suggests a bag-of-records sort of file, or, well, the RFC 822 format we use for the Packages file. I think, yeah, my first idea was to use a format like the Packages file as the main format, and maybe store it in a Subversion repository so that we get history automatically, or something like that. I don't know if it can work efficiently, but I will give the mic to you; he wanted to speak a bit about ideas for storing this information, I think.
Yes. I'm not entirely sure; I missed the beginning of the talk, I'm sorry. As I mailed to the QA list a while ago, there's actually already some work going on on a very similar idea, and I think the ideas need to be merged, because there are so many parallels in getting one central repository of information. The primary focus of that repository was QA-related data, like lintian results and rebuild testing, but also, for example, extraction of copyright files and changelogs. But all the things named here, the things I saw at the beginning (now it's a bit more elaborate), were user-supplied data. That was not really provided for yet.
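A CGI-style query endpoint of the kind proposed, answering "these fields for these packages" with RFC 822-style stanzas, might look like this. The parameter names `fields` and `packages` are hypothetical, and the in-memory dictionary stands in for the real storage.

```python
import re
from typing import Dict
from urllib.parse import parse_qs

# Toy data: package -> field -> value (stands in for the pool tree)
Data = Dict[str, Dict[str, str]]


def handle_query(query_string: str, data: Data) -> str:
    """Answer e.g. 'fields=Homepage,Watch&packages=^lib'.
    Omitting 'packages' selects every package."""
    params = parse_qs(query_string)
    wanted = params.get("fields", [""])[0].split(",")
    pattern = re.compile(params.get("packages", [".*"])[0])
    stanzas = []
    for package in sorted(data):
        if not pattern.search(package):
            continue
        lines = [f"Package: {package}"]
        lines += [f"{f}: {data[package][f]}" for f in wanted if f in data[package]]
        stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas) + "\n"
```

Every distinct regex produces a distinct URL, so an HTTP cache in front of such a script can only reuse responses for identical queries; that is the cache-thrashing concern raised above, and one reason to also publish precomputed per-field files.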
I haven't thought about that part of it yet. There is already some implementation of the pool structure. There's not yet an interface to actually extract the data, but there is already an interface to add data to a sort of pool-structured repository of metadata with all different kinds of information, very similar to this, and I think we should definitely work together on this, because it's similar enough that having two efforts would be a total waste of time. As for further ideas, one of the things that had been thought of with respect to fetching the information was having a pool structure with all the information for each package. So if you want a given field from a given package, it's a direct lookup; it's all file-system based. And for the use case of, for example, getting all popcon stats in one go without doing lots of accesses for each package, there would also be a parallel tree that is based primarily on the items, the fields, and then, within each field, the packages. Depending on whether it's popcon or, in general, something very small, it can be one big text file with one line per package, or alternatively one directory with a lot of files, one per package. And yeah, I mean, we should brainstorm further, I guess, on what exactly... I don't have much more to add at the moment, I guess. Thank you.
Yeah. The stuff you mentioned has been used so far to store build logs for the GCC 4.1 tests, so it was large information. That's why I didn't really make the connection, because my idea is more about many small pieces of information. But of course we could merge them, no problem. There's nothing here yet, only ideas, so not much is duplicated. As I said at the beginning, we may have a Summer of Code project working on this, so we were trying to find a good design and gather ideas. Yeah, that's it.
So, well, you mentioned that you would have two trees, one sorted by package and one by meta information. Would they be updated at the same time, or do you think we should generate one from the other?
The pool tree is intended to be canonical, the real source of information, and the other trees are intended for easy extraction of the information. For example, for changelog extraction, which was one of the use cases, there would be a hard-linked mirror tree with just the changelogs, and then a service that wants all the changelogs, for example packages.debian.org or whatever other service, can simply ask that tree for the current set of changelogs of all of unstable, or of the whole mirror.
Okay, thank you. Yeah, maybe we can just open up to any other ideas following this whole discussion. I think we have many interesting points already, but I don't know how we... yeah.
Sorry, I came in late, but what I'm wondering is this: some of the things you mentioned up there, which were also mentioned in the description of the talk, are currently in the package itself, like the watch file or the copyright file. Would this be redundant with that, or is it intended to replace it? I mean, I guess you couldn't replace copyright, but maybe the watch file. How is it expected to work with that?
Yeah, as I said at the beginning, watch files are not always used, and maintainers do not always agree to keep them. So it's kind of redundant, but we're going to use the existing watch files to start with, and hopefully this will become the reference, I think, because it just makes more sense for all our QA tools to be able to check whether we have the latest version using a single database of reference watch files.
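The relationship described here, a canonical pool tree plus derived trees for easy extraction, can be sketched with hard links. Assumptions: the layout `pool/<prefix>/<package>/<field>` and the `by_field/<field>/<package>` view are hypothetical, and both trees must live on the same filesystem for `os.link` to work.

```python
import os


def build_field_tree(pool: str, by_field: str) -> int:
    """Derive by_field/<field>/<package> from pool/<prefix>/<package>/<field>
    using hard links (no extra disk space). Returns the number of links made."""
    count = 0
    for prefix in sorted(os.listdir(pool)):
        for package in sorted(os.listdir(os.path.join(pool, prefix))):
            pkg_dir = os.path.join(pool, prefix, package)
            for field in sorted(os.listdir(pkg_dir)):
                target_dir = os.path.join(by_field, field)
                os.makedirs(target_dir, exist_ok=True)
                link = os.path.join(target_dir, package)
                if os.path.exists(link):
                    os.remove(link)  # refresh stale links on regeneration
                os.link(os.path.join(pkg_dir, field), link)
                count += 1
    return count
```

Because the links share inodes, the derived tree should be regenerated from the pool rather than edited; a tool that replaces a pool file (write, then rename) breaks the link, and rerunning the function refreshes it.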
I think that definitely not all fields should diverge from the actual source in the packages. The reason to still include them in a repository like this is the CPU time required to extract things like the watch files, the desktop files that are in the package, and changelogs, and to make them easily accessible. Currently, if you want all the changelogs, it takes a lot of I/O: you basically need to read all of the .debs, so they are not easy to retrieve. For some of the fields at least, changelogs for example, I don't think it makes any sense for this repository to diverge from the package. But for some fields, indeed, it is good that the repository is the really authoritative source of information, for example, as Raphael mentioned, the watch files, because then they can be maintained by QA, or by a mixture of the package maintainers and QA.
Just a quick note: other metadata that could be added, and that can enclose some of this, is a DOAP description of a package. DOAP, D-O-A-P. It's an XML format to describe package metadata: you can put in a description, and it's intended to unify descriptions across sites like Freshmeat and SourceForge. I know Edd Dumbill is the author of it, and he's trying to push it as a kind of XML dialect for describing a free software project. It seems to be pretty good: you can put debtags categories inside it, upstream VCS repositories, homepages, and so on.
I had a couple of other comments about the package-related metadata. Just last week I forwarded a proposal written by Scott Remnant about dpkg 2.0. One of the features of dpkg 2.0 that he's pushing for is that dpkg itself is only going to pay attention to a few metadata fields, and anything beyond that would be something that apt, or other tools like apt, would deal with. He's not advocating taking it out of the control file.
It's just that the on-disk package format itself would only know about a little bit of metadata. So there's already kind of a split there. Also, as part of the dpkg 2.0 process, one of the things we've talked about is looking at the control metadata overall and trying to see where it makes sense to make some adjustments, because we've noticed that the RPM tools and file formats have their own set of metadata, and some tags are similar. For example, we use Maintainer and they use Author; these are similar concepts with different names, so there might be some attempt to come up with a set of metadata that could be common across package formats. At the same time, we would be adding things like, you know, you had copyright and license up there: right now we have those in a file in the package, but there's no reason why we couldn't have a metadata field that was a license string and another metadata field that was a copyright string. The other problem is that you don't want to grow the Packages files. We already have a problem with the Packages files being too big, and the new diff index stuff is helping with that, I think, but to some extent we need a solution that allows dpkg and apt not to have to download these huge Packages files. So maybe there can be some sort of reasonable split, where this takes over a lot of it and dpkg doesn't have to deal with it as much anymore.
Yes, that's long-term work. But yeah, you're right, there are certainly some fields which make sense to integrate in the package itself, but most of them really shouldn't be, because maintainers don't care about providing them, and it's more the kind of stuff that should be managed outside. But yeah, it's a good remark anyway.
You want it? Thanks. Enrico mentioned Description of a Project, DOAP; I just wanted to say that it's not actually an XML format, it's an RDF format.
Do you people in general know what RDF is? It's the Resource Description Framework, a World Wide Web Consortium standard for general metadata, and I think we could consider using it for the implementation if we really want a general-purpose metadata repository that is extensible. Its model has been designed to be extensible for lots of things. For example, if you're talking about something like the control file, which has fields and values, RDF also has the subject. So if you want a comment for each screenshot, not only for each project, you need the possibility to say that this screenshot has this comment. RDF is a standard that ensures this kind of information can be included. And there are already tools for querying it and for joining several repositories or sources of data, that kind of stuff.
Okay, thank you. Tolima just told me that we're out of time already, so I'd like to thank you all for your feedback, and I hope we can still manage to do something good this year. Yeah, thank you all.
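To make the subject-predicate-object point concrete, here is a toy triple store in Python. It only illustrates the data model: the `pkg:`, `shot:` and `ex:` identifiers are made up, and a real repository would use proper URIs and an existing RDF library (for Python, rdflib) with a query language such as SPARQL rather than this hand-rolled matcher.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Toy subject-predicate-object store illustrating the RDF data model."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.append((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """None acts as a wildcard, like a variable in a query pattern."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]


# A screenshot is itself a subject, so it can carry its own comment.
store = TripleStore()
store.add("pkg:hello", "ex:screenshot", "shot:hello-1")
store.add("shot:hello-1", "ex:comment", "Main window")
```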