Okay, good morning. Welcome at this early hour. Today, in the coming hour, I'll talk a bit about the Mole project I've been working on for, well, I actually don't really know how long; I guess it's roughly a year. So, despite the title of the talk, it is actually kind of a nice project.

The current situation which inspired me to design and create a program, a project, like Mole is that there's a lot of repetition going on within Debian. One example is the Lintian lab. It extracts all kinds of things from Debian packages, and it runs Lintian itself, of course, over all those packages. The same thing is actually happening in the PTS: there's a part of the PTS which extracts section information from the actual .deb packages and compares it to the Packages files, in order to report on the Package Tracking System whether there is currently a discrepancy between the override file and the actual packages. There are at least two places, probably more, where bug counts are being maintained for each package. For example, the Package Tracking System itself has some Python code to query the bug system every day, to show how many important bugs and how many normal bugs there are for each source package. And the same code — well, different code with the same purpose — is running in order to show you the bug counts on the developer's packages overview, a QA page. And there are many more things running across the whole archive and doing things with packages. For example, changelogs: changelogs.debian.net shows changelogs; actually, they are currently on packages.debian.org. There is infrastructure to get watch files out of packages, in a project called DEHS, which then tries to parse the watch files and query the upstream websites to see whether the package is up to date. All these kinds of jobs have in common that they work over the whole archive, and that they each repeat the code to actually do so.

There are more things going on with packages besides extraction and tools to run. For example, the archive is being rebuilt, nowadays quite a lot actually, thanks in large part to Lucas Nussbaum, who has access to a big cluster for doing so. But there are a couple of others; for example, Martin will try to do the same with a newer GCC version, to see whether that would cause any problems in the archive. And we have piuparts, designed by Lars Wirzenius, to install packages, remove them, and see whether files are left over, or detect all kinds of other issues with packages. Yesterday, in a talk here, we had some discussion where, more or less by interjection, we designed and specced a way to have self-tests for Debian packages: in a source package you can define how a self-test is to be run. Well, this is also something that we want to run across the whole archive, continuously.

And well, all those results are scattered: for rebuild testing, they are published in some public HTML directory by some people; some are being added to the collab-qa repository; those self-tests will end up somewhere else again. For maintainers, it's very tricky to actually get access to all these various, very useful quality checks and other things that are being run across the whole archive. Because this is the current situation, I kept thinking about how we can actually improve it. Well, the result of my thoughts about this ended up being this new project called Mole. The name was coined in Mexico during the last DebConf.
It's not an acronym; it doesn't stand for anything, it's just a short and easy name. And, well, it's currently hosted on qa.debian.org, although it's not really restricted to the things QA does; you will see later on that there are a lot of other possibilities and opportunities. Doing it within QA was, however, the main starting point, so that's where it's currently hosted, and whether it will go separate at some moment, we'll see.

The core of Mole, what it is, is actually a collection of simple tables. That is what Mole is. There's a lot more to it, which makes it interesting, but the basic thing to remember is that it's a collection of tables. To give one example — a lot of the basic ideas actually come from the Lintian lab, which is why I'm using a Lintian example here. There is one table, or actually a collection of tables, but anyway, there is a table called something like packages.debian-unstable-bin, which lists all the binary packages that exist in unstable. You can put in a package name and get back the full version number and a unique identifier for the package. This table is, of course, also maintained with Mole. You can then use this key to access a different table, the Lintian table. Those are kept separate across versions, so that — well, that will be useful; I can explain later. Anyway, using this unique key, you get access to the actual Lintian results for that particular package. The tables are implemented in an efficient way, so that this query can be done very quickly, for example by a web interface.

Another example of things you could do with Mole is the version tracking in the bug system. It turned out to be pretty complex: when Don was giving a talk about it a couple of days ago, it turned out that nobody really understood how it entirely works. I had a discussion with him and a couple of others about possible changes to the algorithm which would make the version tracking much simpler. However, the question was: does this actually result in a different state? That is, would some bugs under the new algorithm turn out to be open instead of closed, or vice versa? It is an interesting question you need to answer before you can consider changing the algorithm. Well, with Mole it would be pretty easy to look into this and prototype it, and I'll run through how you would do that with Mole. I haven't actually done it yet, but I'm definitely going to try it out, because I think it would be useful to have a very easy to explain version tracking algorithm in the bug tracking system.

The tables involved, in a little bit more detail: the first table you would add to Mole would be a table where we store the version history of a package. Currently this is extracted from the changelogs: given a package in unstable, the changelog lists all the versions in its history. This needs to be stored in a table for easy access, because the algorithm depends on this changelog. In a different table, we could import the bug status — most importantly, of course, which version a bug was fixed in, which version a bug was found in, and the current status, whether the bug is open or closed according to the current algorithm as running on bugs.debian.org. Now comes the interesting bit: how are we going to actually try out this new algorithm?
Well, you could add a new table called, say, bug-algorithm-one, or however you want to call it. This table will hold the result of the new algorithm: for every bug, it will use the package version history and the bug status — where it was found, where it was fixed — and store whether, according to the new algorithm, this bug would be open or closed. So the code you would need to write in order to fill this table is just the code that, given those two inputs, gives an output. Mole itself will take care of all the details involved in actually running this over the whole of the bug tracking system. So really, the only thing you need to program is this new algorithm; you put it in Mole, wait a day or however long it's going to take — probably much shorter, I don't know — and then you will have a new table which has your new algorithm run over the whole bug tracking system. In order to analyze the results, you could add yet another table, which has as input the new algorithm's table and the bug status, and which reports whether the status actually differs between the new and the old algorithm. And once you've done so, you can look into the bugs where there's a difference and try to find out whether the new algorithm is more correct or less correct than the old one, and so forth.

So those are two examples of things you could do with Mole. There are a lot of other examples I will get to later, but first some more detail about how Mole actually works and how it assists you in doing interesting stuff. Well, the core of Mole, as I said, is a couple of simple tables, a collection of tables, so it's basically a database. For a database to be useful, you actually need to access it, and accessibility is very important. Reading the tables — querying one single item from them, for example a given Lintian result or a bug status or whatever — can be done without locking. The tables are plain files; well, not plain, they're actually currently Berkeley DB files. However, Mole is not limited to Berkeley DB files; any backend can be chosen, but that is the main one currently in use. Reading is possible without locking, so it scales quite well. Updates are usually done in batches: they are written to a copy of the database, and once completely finished, this new database is moved over the live database. That keeps lock-free, read-only access possible, and updates are therefore also automatically atomic.

It doesn't help you much if you can only use Mole on the central server. Sometimes you would like to replicate the information to, for example, a local host where you will do some very intensive checks, or you run a service like packages.debian.org and want to show the changelogs, which are actually stored in Mole. In order not to need to query the central database every time, you can also replicate it: because the tables are just a couple of files, it's easy to rsync whatever database files you need, and the access code works just as easily on a replicated database as on the live one. Actually, there's no difference at all between being replicated or not; it's simply copying files over. Micro-queries, like looking up one single item, are efficient because the tables are B-trees. But of course, for some purposes you need to go over the whole database; a rough sketch of both kinds of access follows.
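Something like the following — the table names, keys, and stored values are made up for illustration, and Python's standard dbm module stands in here for the Berkeley DB files Mole actually uses:

    import dbm

    # Populate two tiny example tables (in reality these are maintained by Mole).
    with dbm.open("packages_debian-unstable_bin", "c") as packages:
        packages["lintian"] = "lintian 1.23.28"        # package name -> unique key
    with dbm.open("lintian_results", "c") as results:
        results["lintian 1.23.28"] = "W: lintian: hypothetical-tag"

    # Point lookup: package name -> unique key -> Lintian results.
    # Reads need no locking, so e.g. a web interface can answer this quickly.
    with dbm.open("packages_debian-unstable_bin", "r") as packages, \
         dbm.open("lintian_results", "r") as results:
        key = packages["lintian"]                      # -> b"lintian 1.23.28"
        print(results[key].decode())

        # Going over the whole table instead: loop over all the keys.
        for name in packages.keys():
            print(name.decode(), "->", packages[name].decode())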
Looping over all the keys is, of course, also provided by the access code in Python. Yeah, as I said, Mole is written entirely in Python, so all the infrastructure to actually work with it you can easily write in Python. If you would like to use a different language, then you would need to use, for example, the web interface as a query interface, or duplicate the code in a different language, or, well, maybe some language bindings could be made.

The web interface: of course, a lot of the time you actually want to make the data available to humans too, not just to automated processes, and for that the web interface is the most natural choice. Having one web interface where you can, for example, view per package all the data available for it — not only Lintian results but also the results of the various rebuild tests, the self-tests, and whatever else is stored in it — makes it very attractive for people who have designed and added new quality checks, or have some other good idea, to make their results available and visible to all the maintainers. And as a bonus, when you want to do so, you don't need to write your own code for it. For example, the build logs that are made by Matt and Mike are simply put in a public_html directory, because, well, it sucks to really have to write the interface for it yourself, to make it maintainer-based so that you can say, hey, I am Jeroen van Wolffelaar, please show me all the build logs for the packages I maintain. This kind of thing can be done just once by Mole, and then nobody else needs to duplicate it. Command-line access: there is an interactive mole program where you can directly query the Mole database, and it's also, of course, programmable, because it's Python.

A database is of no use if it has no data, so there has to be a way to add data to it. In order to do so, and to keep it very generic, the main input path for Mole — actually the only input path — is an incoming directory, a world-writable directory on Merkel. Using this directory, you can add batches or single items of new information, and a cron job processes it regularly; this can be as often as every minute or whatever, because we can do so. It is a generic interface because the updates are plain text files; you are not restricted in the program or language you use to generate your data. Not everyone has write access to Merkel, so on top of that there are some other possibilities to add your data. For example, there will be a CGI that accepts data via HTTP POST and, after some authentication, for example password-based, can — running on Merkel — put the files into this directory. Another way would be rsync over SSH, with the key restricted so that people can push updates. So secure updates are possible in a number of ways; the incoming directory is very flexible like this. Authentication is done by checking the owner of the file: if the owner happens to be www-data, we know it's actually coming from the web server, so we can do the authentication checks there, et cetera. (A rough sketch of this incoming-directory interface follows below.)

The database types: as mentioned, the tables are currently implemented in Berkeley DB, but Mole is not limited to that. There are several types of information you could store. If you are thinking about Lintian: given a certain package with a certain version, the Lintian results will be static.
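As for that incoming directory: the path and the update-file layout below are entirely made up, since the talk doesn't specify the format. The sketch only illustrates the idea of dropping plain-text update files, written by whatever program you like, into a directory that a cron job then processes:

    import os, tempfile
    from pathlib import Path

    # On Merkel this would be the world-writable incoming directory;
    # for this sketch we just use a throwaway local directory.
    INCOMING = Path(tempfile.mkdtemp(prefix="mole-incoming-"))

    def submit(table, key, value):
        """Drop one plain-text update into the incoming directory."""
        body = f"Table: {table}\nKey: {key}\n\n{value}\n"
        # Write under a temporary name first, then rename, so the cron job
        # that processes the directory never sees a half-written file.
        fd, tmp = tempfile.mkstemp(dir=INCOMING, prefix=".partial-")
        with os.fdopen(fd, "w") as f:
            f.write(body)
        os.rename(tmp, INCOMING / f"{table}_{abs(hash(key))}.update")

    submit("lintian", "lintian 1.23.28", "W: lintian: hypothetical-tag")
    print("wrote update files to", INCOMING)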
Back to the static and transient distinction: there is no use in running Lintian again a week later over the same package, because if Lintian doesn't change and the package doesn't change, the results don't change. It is a pretty static property — and also, to divert a bit, the reason why certain feature requests people submit to Lintian, asking for checks based on the current state of the archive, like which libraries are in it, are not going to be implemented. Likewise, file extraction: a changelog doesn't change once it's in a package. But there are several other things you can do with a package which actually depend on the environment. For example, if you try to rebuild a package, it makes a difference whether you build it in unstable or in testing, even though you have exactly the same package. Also, with time — after two weeks, say — the environment can have changed, with new or changed libraries available, so the results will be different. For this you need a transient database type, which can store the various results over time instead of just once per key. This is, of course, also provided. And then there are the non-automated additions — non-automated things like, for example, screenshots, which is one of the ideas that has come up for Mole: not only programmatic data can be added, but also reviews and screenshots. Whenever such things are added, you want them to supplement the current information rather than replace the previous entry; it's not a new version, it's simply an addition. More possibilities might be added once the need arises; at the moment I can't think of anything else, but we'll see what people come up with.

It's very nice that you can store this data in Mole and make it accessible, but it still involves quite a lot of work to make sure your data actually gets into Mole for the whole archive. If you want to extract sections and priorities, or extract changelogs, then whenever there's a mirror pulse you want to rerun the extraction for the new packages, then HTMLify the changelogs whenever that happens, et cetera. Mirror pulses are irregular — well, they are regular, but their timing can vary over time, and if you consider security.debian.org, mirror pulses really are irregular: they only happen on security updates. So instead of running things very often, some push-based mechanism works very nicely. For this there is a tool in Mole that maintains a list of items for which data still needs to be extracted. This to-do list is maintained automatically. If you have a worker, a daemon that is capable of doing whatever job you're running in Mole — I'll stick to the Lintian example: a daemon that can run Lintian over a given package — it can regularly query Mole and ask which packages do not have Lintian results stored yet. It will then get a bunch of items, store them in a local queue, run over these items, and submit the results to Mole. Of course, with this to-do list it's interesting to have some authentication, to prevent others from taking away to-do items without actually doing anything with them. And one important thing here is that this is optional.
If, for example, you have your own infrastructure, or it doesn't make any sense to use Mole's way of tracking which items still need to be done, you don't actually have to use it. The Mole client daemon, molecd — a name I'm not entirely happy about; suggestions are welcome — is a small program of a hundred lines or so which implements this protocol of querying for jobs to do, executing them, and submitting the results to Mole (a rough sketch of such a worker follows below). It runs from cron every 15 minutes, for example; it queries the to-do list, puts the items in a local queue, and then launches the worker, which really does the work — for example running Lintian on the package — and submits the results to Mole again. Of course, something might be broken. You don't want the whole thing to break down when one particular package makes Lintian crash or whatever; it actually happens, and sometimes we still need to fix it. So the worker implements a timeout: it will kill the job and go on with the next one. Mole itself will notice that a certain item hasn't been submitted after a given timeout and will re-queue the item. In order to prevent the same item from being re-queued too often and wasting a lot of time on it, whenever to-do items are requested, priority is given to items which haven't been tried before — or rather, always to the items which have been tried the least. So whenever something is repeatedly failing, it gets the lowest priority, so that daemon time is only wasted on a package which is repeatedly failing if there is nothing better, nothing more hopeful, to do. There is also some incremental backoff: after, say, five failures, it will not be re-queued after six hours, but only after a week or something. The daemon does not submit anything when there's a failure, because there's actually no need to: there could be any random other reason why something wasn't submitted, and Mole is going to detect missing items anyway. So pushing back an error message is not really needed, although it could be implemented if it turns out to be really useful.

One of the reasons I'm hoping Mole can become a very useful addition to Debian's infrastructure is the open way it's designed. You can do a lot of things with it, but it's of no use if it depends on a single person or a single team to add all the tables you are interested in. For that reason, every Debian developer will be able to add tables to Mole. The configuration file is reasonably simple, and there will be a DD-writable directory where people can add those tables, with namespacing ensuring there will be no conflicts. It will still require the QA team, for example, to actually put your database in the very general namespace, but that shouldn't be a problem. Adding a table is a matter of writing ten lines of configuration. Who is authorized to submit data for a table can be defined in a number of ways; for example, if submission is done over the web, you only need to add an .htaccess file or something similar. Non-DDs can also use this infrastructure: they just need a DD to sponsor the creation of a table, basically the way packages can be sponsored, and then they can do whatever fancy thing they would like to do. There are still a couple of hundred gigabytes free on Merkel, so that shouldn't be a limitation either.
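To make the worker idea concrete, here is a rough sketch of a molecd-style worker, under stated assumptions: the actual to-do protocol, URLs, and submission paths aren't specified in the talk, so the fetch step is faked and the file names are hypothetical. Only the general shape — fetch items, run the job with a timeout, queue the output for submission — is what the talk describes:

    import shutil, subprocess
    from pathlib import Path

    OUTGOING = Path("outgoing")      # local queue, later rsynced to the incoming dir
    OUTGOING.mkdir(exist_ok=True)

    def fetch_todo_items():
        # The real daemon would ask Mole over HTTP which packages still lack
        # results; for this sketch we just return a hard-coded batch.
        return [Path("somepackage_1.0-1_all.deb")]

    def run_one(deb):
        # Run the actual job (here: Lintian) with a timeout, so one hanging or
        # crashing package does not take the whole queue down.
        try:
            result = subprocess.run(["lintian", str(deb)], capture_output=True,
                                    text=True, timeout=600)
        except subprocess.TimeoutExpired:
            return None              # don't submit failures; Mole will re-queue it
        return result.stdout

    def main():
        if shutil.which("lintian") is None:
            raise SystemExit("lintian is not installed; this sketch needs it")
        for deb in fetch_todo_items():
            output = run_one(deb)
            if output is not None:
                (OUTGOING / (deb.name + ".lintian")).write_text(output)

    if __name__ == "__main__":
        main()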
Mole, as I said, is a collection of simple tables, and that's really it as far as the data structures go; there's nothing more to it than simple key-value tables. This might sound very limiting. One of the most frequently asked questions I get about Mole is: why don't you use an SQL database for this? If time permits, I'll go into that question a bit later. But with just key-value tables you actually have full possibilities anyway. It just requires a slightly different way of thinking about how to couple information, how to connect pieces of information: the complex things you could do with SQL, you can also write in a couple of lines of code, which can be much more flexible, and that query, or whatever it is, can be run as part of Mole. Also, if you have very expensive things to compute, it works very nicely that way, because the query runs automatically, continuously, et cetera.

As an example of what stacking these tables actually means: as I said, you have a table which holds, for each package name, the full identifier of the current version in unstable, and that full string is used as the key in the Lintian table. The result in the Lintian table, which is the set of tags that matched, can in turn be used as input for a next table, which parses this input and highlights whatever tags are deemed very dangerous — tags that, if they are hit, almost certainly indicate a real problem, a problem where it would be very useful for a human to look into it. This is the way to chain tables together, to append them and make use of each other's results. If someone else has an interesting idea — for example, given a set of build logs, someone like Dan Fraser has, if I'm not mistaken, written a tool to parse build logs in order to find common mistakes, for example compiler warnings or warnings from the debhelper tools, and would like to find out which packages in the archive actually trigger a certain warning — they can write a Mole job, which is basically just a grep statement, add it, and have it automatically run over all the build logs. So anyone can also use the results of different tables, and in this way I hope cooperation amongst the different QA areas is going to be made much easier. That's interesting. Okay, thanks, OpenOffice.

The data in Mole is not limited to packages. Actually, there is currently no package-specific code in Mole itself. There will be package-specific code in the web interface, in order to make it, for example, pretty much Package Tracking System-like, or Launchpad-like if you prefer, or however you want to view it. You can also use Mole to do bug-wise things — the version tracking example I talked about is one of those — but you can also, for example, implement a mirror checking tool. The input would be a list of mirrors, for example the master list. The Mole job would involve actually accessing the mirror, finding out which architectures and which suites are provided, checking whether the mirror is up, how much bandwidth it has, or how fast it actually updates after being triggered, and outputting these results. And, well, you can of course also have the maintainer as a key. There might be some interesting things to note there, like: is the maintainer actually listed on the low-threshold NMU page on the wiki — as in, does the maintainer mind or not mind being NMU'd?
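Going back to the stacking example for a moment: a derived table's job is essentially just a small function from one table's value to the next table's value. A minimal sketch — the set of "dangerous" tags here is only an example, and Mole itself would take care of running this over every entry and storing the output:

    # The set of "dangerous" tags below is only an example.
    DANGEROUS = {"statically-linked-binary", "binary-without-manpage"}

    def highlight_dangerous(lintian_output):
        """Input: one package's Lintian output (the value stored in the Lintian
        table).  Output: the value to store in the derived table -- only the
        tags that very likely indicate a real problem."""
        hits = set()
        for line in lintian_output.splitlines():
            parts = line.split()
            # Lintian lines look roughly like: "E: package: tag extra-info..."
            if len(parts) >= 3 and parts[2] in DANGEROUS:
                hits.add(parts[2])
        return "\n".join(sorted(hits))

    print(highlight_dangerous("E: foo: statically-linked-binary usr/bin/foo\n"
                              "I: foo: hypothetical-informational-tag"))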
Back to the maintainer table: you could also store which packages the maintainer has, so that you can use it to implement a web interface which shows, for your name or someone else's, which packages there are, which packages are being sponsored, et cetera.

Well, the big question, which would have been asked if I didn't ask it myself: can Mole be used right now? Well, yes. There are a couple of dozen databases currently enabled in Mole: for example, the Lintian results are there, changelogs are there, watch files are there, md5sums of all the files in the .debs are there, and some others. But actually, right now: no, sorry, I just broke it two days ago. I'm working on it quite a lot, and it's still in the development stage; it's not really finished yet. This is work in progress. Importantly, the stacking that I explained to you needs a bit of work, especially the pushing, so that whenever one item is updated, the corresponding item in the next table gets queued for work. Authorization doesn't work yet for the to-do items, but otherwise it kind of exists, by means of checking the owner of files in the incoming directory, which is some kind of authorization. The web interface has a proof of concept, one which simply dumps the Lintian results for a given source package, and I discovered this morning that I broke the web interface as well. Documentation is needed; actually, I already started on the documentation, which kind of surprised me, that I actually got around to starting on it. I think it's very important, if you want this to be widely usable, that it is documented how to use it. And the transient types — where you can send build logs and then send a new build log a week later for the same package — are not really working properly yet. But the core of it does exist. It is configurable, so it can be used if you really wanted to, but for it to be really useful you'll need to wait a month or two.

I'm working on this currently as part of a Google Summer of Code project, as a student. As such, I have quite a lot of time to work on it — or actually, I'm forced to make quite some time available to work on it, because, well, Google wants me to. Steady progress has been made, especially during this DebConf. Being a student, I'm still studying, and during June there were a lot of things I needed to do besides Debian, unfortunately. But I'm pretty sure that in about a month from now you can really add things yourself as a Debian developer. You can follow the development: I put all the deliverables I submitted for the Google Summer of Code on the wiki, on the development wiki, where you can follow the development, and if you have any random ideas yourself, please add them there so I can consider them.

Well, this is basically an idea of what Mole can do, a bit technical. So I am going to show some random ideas I had myself of things you could add to Mole, or ideas that others have been mentioning, and I'd be interested in hearing ideas from others as well. Maybe it's a good time to ask the audience now whether there are any questions at this moment you'd like to see answered. So, Andrew has a question. Joey? The data gathering: for example, in the Lintian case, Lintian is run on Gluck while Mole is on Merkel.
The data gathering is, in this case, running Lintian over a given package. The to-do items are fetched via HTTP from Mole to Gluck, and on Gluck it loops over the keys to do and runs Lintian on the local archive that's on Gluck. The output is put in a directory and pushed via rsync to the incoming directory on Merkel. You mean the actual program? No, the program can be anywhere; it's not stored centrally. There are some examples — for example, the Lintian thing is in the repository — but it is not necessary to do it that way. There's only the to-do interface and the submission interface, which are very general, and you don't actually need any more code than that.

There isn't at the moment. There is some use for having more possibilities there. So the SQL backend, as was mentioned, is something that is entirely possible to add. It involves some trickiness in replication, but nothing is really stopping it. But there can also be richer key-value things within it, for example a table with multiple columns, et cetera; it's even being used that way at the moment by some dirty hack. It's not entirely fleshed out how that would work, but whenever there's some real need for it, I'll look into how to best implement it. Implementing from examples and concrete needs is a bit more productive than implementing a mechanism without actually having a use case for it.

So, okay, package-based ideas. I mentioned most of these already: the self-tests; Raphaël Hertzog has been working on some new shlibs tools to reduce the number of shlibs bumps we need. One thing you could run is to determine the list of files in packages, and also the extra files: if you install the package, these files will also appear on your system without actually being in the .deb. There currently is no way to find this out, and it would be useful to have this list available so that people can query it and find out which package provided a given file. Translatable strings could be extracted, watch files, screenshots, et cetera.

So, Russ? I don't — how do I know when to — sorry. Okay, yes. The actual package list is a Mole job also, and it simply looks at the modification times of the local archive on Merkel, using the QA tools. So some pushing is already implemented there, by you actually, for certain secondary archives, and Mole can use this to update the packages tables whenever it's needed. It stores the last-modified times in the table. And between tables you can define relationships, as in: if this table is updated, please note that the next table is going to need a refresh also. So then Mole will propagate whenever updates are happening.

Russ? It seems like one of the hardest parts about this, in terms of using it to replace various bits of infrastructure that are currently floating around, is getting the output and making the output available. Have you thought about how you're going to handle different web output formats? Like, for example, Lintian has its per-tag display. The per-tag display is something that I've been thinking about very briefly, but not that much. To regenerate it every so often would be easy: you can make a table that says, per tag, which packages hit it. How to do this incrementally, like only updating whatever tags actually changed — I don't really know yet whether there would be some nice and good algorithm to do so.
I'd suggest possibly just making the data available, so that something else can easily plug in to generate a web interface, or send out mails on a regular basis with what's changed, or whatever — that might be a pretty reasonable approach. The thing I've been thinking of while watching this is the script I wrote years ago to keep an eye on how the /usr/doc transition was going. That, I think, generated some web pages and sent mails out to me with details and so on. It was all very customized, and I think allowing for that customization is a useful thing to do, but not having to write the code that iterated over the archive and ran dpkg -c on everything would have been pretty nice. There's nothing actually stopping Mole jobs from having side effects. You could add a new Mole job which has as input whatever you want to send mails about, and as output the mail it sent, or simply an empty string, with the side effect being that it actually sends the mail. It would be easy to plug in that way. So yeah, that's an interesting suggestion.

Does Mole have a capability for keeping an unpacked copy of packages around? That's one of the things that Lintian currently does. Lintian itself doesn't use it all that much; it saves about an hour on the full archive run to not have to re-unpack everything with a new version of Lintian, but since the full archive run with Lintian takes about 30 hours, that's not really that great. But there's a bunch of other stuff that requires unpacking the package, so it seems like there's a big optimization potential if you can keep an unpacked copy of the archive around. The interesting bits in .debs and source packages are being extracted and put into Mole as specific files — the changelog file, the copyright file, the control file, the rules files. For the full package, currently: no. You could indeed make it so that you submit the full contents of a package into Mole and store it. It would be pretty big, but doable. I'm not entirely sure whether this would give any big benefits; it would also really mean storing it in an unpacked fashion. It could be possible. I think a number of the use cases for that are kind of one-time — you want to grep the entire archive for something — but there are also more ongoing checks. If you're, I don't know, say you're doing a Python transition or something, and you want to monitor the progress of that on an ongoing basis, it'll often involve grepping some otherwise undistinguished files in a package. You just want to check the whole thing, really. It is not very difficult, if you want to do that, to make the actual job really extract the package, do the thing you want to do, and submit the results. It will definitely be slower that way, but once you put it into production it will run continuously and then only update whenever something is updated. It's an optimization question: is it really worth it, keeping the whole package contents? I don't know yet; well, it depends a bit on the size. Well, it seems to me that you often want to pull out different files from a package. One check might want, say, the copyright file; another check might want to pull out the changelog or the file list or so on. I think the question is: can you have one single gathering process that extracts files A, B and C, and then add file D to that, so that you don't have to have a whole other separate data collection running on another machine, grepping the whole archive again just to get this one file out?
Currently the changelog and copyright file extraction are actually being run as side effects of the Lintian test, because Lintian actually unpacks the archives, and I run it with a hacked Lintian version which doesn't delete the unpacked tree yet; Mole then deletes the whole source tree again afterwards. So it pulls out the copyright or watch file there, and it would probably be useful to have some hook where anyone can add a regular expression for files that would be useful to push. Of course, then you'd have to re-extract everything. Yeah, exactly; because the extraction is done there, that's why I actually hacked Lintian to really do all this extraction stuff, to not waste too many resources. There was one thing for Enrico: he wanted to have all the desktop files available, so I put in a new Lintian extraction thingy to put desktop files somewhere and hacked it in there, and, well, it would be useful to have a generic interface for that, yeah. There's a three-year-old Lintian wishlist bug for extracting and checking desktop files, so commit that to the Lintian repository and I'll wrap a check script around it. Okay. But yeah, the reason why I was asking originally is because just the other day, processing an old Lintian bug, I wanted to see if anything in the archive was still calling install-sgmlcatalog, and, you know, whenever I have a question like that, I just log on to Gluck, go to the Lintian lab, and do a recursive grep. And, you know, this Mole clearly should be replacing, down the road, the whole Lintian system that we're currently running with its unpacked laboratory, because the scheduling, if nothing else, is way better — but I don't want to give up my Lintian laboratory where I can do those sorts of checks. I'm not going to take it away from you, but yeah, the point is, I think, best summarized as: you want to be able to simply use grep and get results now. So this raises another interesting question: is Mole itself packaged, so that if someone wanted to run another instance of it somewhere else and prototype a bunch of stuff, they can? It will be; it's one of the things I promised in the Google Summer of Code, so yes. Yeah, for that in particular: I was in the middle of doing a Lintian release and wanted to check something real quick, and I mean, it still took an hour or so for it to run, because I didn't have a particularly well optimized recursive grep, but... Yeah. You said that you're already pulling the rules file out into Mole, right? Sorry? You said that you're already extracting the rules file and storing that in Mole, did you? Yeah, some files. Okay, because one of the things that I'm going to want to use Mole for, hopefully, would be to keep track of which debhelper scripts are called by which packages and which options are passed to which scripts. Yes. Yeah, the rules files indeed; the rules file is one of the files that's currently in Mole. Well, there's — I think the whole extraction stuff is a client-side matter, so Mole is the database, and at some point you are running clients to feed data in, and you can just keep an extracted copy there. What's the question? That wasn't the question. Oh, right, I was confused. So, I put up a slide with some more random ideas; maybe some people are inspired by this and they can go for it. Okay.
Well, actually, I would recommend not making Mole a tool that just covers about everything we could imagine in Debian. For example, I see the mirror-based stuff, but I really think it would be a great thing if Mole could handle better the things that we want to know about packages. Perhaps we have some things maintainer-based as well; I don't mind that, but I think the mirror-based stuff is way too far-fetched from where we need improvement now, which is just the all-the-packages stuff that we discussed before. I think having this doesn't actually hurt anything, so I don't see a problem with it. It's an illustration of the fact that Mole can also do non-package-based stuff and that it is a very generic thing. That's also why I think Mole being packaged would be useful, because you can really use it for a lot of queueing purposes. Another thing I'd like to see in Mole is the output of random cron jobs. For example, the website build cron job goes somewhere — I don't even know where it goes right now. Yeah. But it would be nice to know: when did that last run, did it succeed, what was the output, if there isn't too much output. There are all kinds of things like that, package-based, that I think would be really useful to have in one place. Yeah, and history. I suppose one thing I'd like to see kept is a list of which things are cross-buildable on which architectures or not. Which, strictly speaking, isn't something for our archive but for Emdebian's version, but it's kind of nearly the same thing. It would be handy. One of the things that Mole enables is that it runs over not only the primary Debian archive, but also security, backports, volatile, and whatever else is floating around there. So if you have some Lintian or changelog extraction or whatever, whenever you implement it, it will also run on the secondary archives, and if such a new secondary archive pops up, all the things that are automatically run will suddenly also run on this extra new archive. There's only one place, instead of two zillion places, to update whenever we have a Debian release, which is kind of useful, because it usually takes half a year before we find all the places we need to update in QA. And because it can be packaged, it can also be run on private archives. And as for results, there was the question whether we would mind results being on Merkel, or — well, okay. Yeah, well. So, there's the microphone. Yeah, I mean, putting it in the central list is fine by me; I don't care where it lives. Just a handy list people can find, whether they have cross-build bugs to fix or not, basically. Okay, yeah, that's definitely cool. I think Fabio Di Nitto has, or would like to add, a list of packages that are IPv6 capable, or that should be IPv6 capable, or whatever. And that seems like another thing that could be conveniently stored. I don't know if you want Mole to turn into a general database of things that are input rather than things that are gathered, if you see what I mean. Yeah, input like that is, if I understand it well, storable over there. One of the things that sparked me from the collab-qa stuff: you can actually make a table for whether I dealt with a certain build failure or not; you can enter a bug number, or FP for false positive. Right, yeah — the same kind of thing, because the IPv6 database, I think, consists in part of bug numbers and whether they're resolved or not. Yeah, definitely. And then you can also have another table that checks whether the bug has been closed in the meantime, and so on.
Yeah, it's definitely — one of the slides I had, which I can't find right now, is that it's not just machine-generated information but also human-generated information. That's definitely one of the things that is very easily possible. So, only one or two questions left. I'm not sure this is a good idea, but since I was thinking about it, I'll throw it out: you have a system which watches for changes to the archive, triggers various remote jobs based on an incoming package, and then gathers their output and stores it in a central database — which means you have something that could do autobuilding in the future? Yes. Rebuilding is definitely something it could do, like wanna-build coordinating a rebuild of the whole archive; it is not much more complicated than building the archive. So yes. I'm getting the signal that time is up. Ask me personally if you have more questions, and thank you very much for listening.