Or the idea is that we can now get smart suggestions into it, thanks to the tags. It doesn't say anything here, or? Not at all, the whole tag is missing. It follows for the whole program, something about the user interface. These are all auto-generated, so the more packages people tag, the more structure it will find and the more suggestive it will be. It's basically the same algorithm that I'm using for the whole popcon database, only applied to the tags; it's the same thing, really. I looked up what technology Google uses for their data mining, and they use MapReduce, and GFS underneath. That's the infrastructure; it's described on Wikipedia, if you look for MapReduce there: a network of distributed machines. MapReduce. MapReduce. You actually have to feed the scripts doing the analysis and so on into the system. That's the thing somebody suggested to me yesterday; somebody said there's a system Google uses, because I was wondering, I couldn't remember the name. It was Hadoop. There's actually a free implementation of that, there are several; it's an Apache project. There's even a, what do you call it, an Eclipse plug-in that helps you create the scripts you need to run this. Well, that still doesn't finish in a couple of hours every time; but the point is it's just a very generalised infrastructure for running jobs in parallel, basically a distributed computation framework. Yes, that's... Like PVM and MPI? In order to quickly do data mining. They apparently can re-run the mining over their whole dataset in one night, or one day. Of course, they also have some big data centers doing this. Yeah, they have really big data centers. I have a smart computer. No use for distributing on one computer. Well, it has two cores. Oh, wow. Yeah. Okay, is there anything that somebody really wants to talk about first? Do you want to say something? Sure.
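The MapReduce model described above can be sketched in a few lines. This is a toy, single-process illustration of the programming model, assuming nothing about Hadoop's actual API; the function names are made up for the example:

```python
from collections import defaultdict

def map_phase(records, map_fn):
    """Apply the user-supplied map function to every input record,
    collecting the intermediate (key, value) pairs it emits."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs

def reduce_phase(pairs, reduce_fn):
    """Group intermediate pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# The classic word-count example: map emits (word, 1), reduce sums.
def wc_map(doc):
    return [(word, 1) for word in doc.split()]

def wc_reduce(word, counts):
    return sum(counts)

docs = ["debian hardware database", "debian popcon"]
counts = reduce_phase(map_phase(docs, wc_map), wc_reduce)
```

In a real system the map and reduce phases run on many machines and the grouping step is a distributed shuffle; the user only fills in the two functions, which is exactly the "you fill in the scripts" point made above.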
I don't know if I want to go first. Yeah, yeah. So, we talked about... there was a lot of discussion during your session yesterday. Yeah. There were things that people still wanted to talk about. I have a data mining project that I'm working on that I wanted to talk to people about, but I thought maybe, since it's a block, we should talk a little bit first. Well, mine is more the hardware-type stuff, if we could somehow get that going. It came out of the wacky ideas session. Just so you know, Ubuntu are doing it, and I think Fedora has a project too. Yeah, Fedora is doing it. So, we should be doing it. Well, we could first talk about what we should collect, what data, or we could talk about the security implications. Or not security, privacy implications, because I'm a bit afraid of collecting too much information. Yeah, Petter knows a little bit more about that. I don't know what Petter wants to talk about. Petter? You were telling me about the security implications. Yes, there is some information collected by dmidecode that is a bit risky to spread around. Like... What is dmidecode? What is dmidecode? It's an old PC BIOS thing that queries the BIOS and gets hardware information strings back. Yeah. It's one of the things hwinfo can call itself. So, you can actually use hwinfo and call that; I can show you the output of it. It includes information that a company would not want to spread around to everyone. Is it unique enough that it could be tracked back to a particular machine, maybe? Well, definitely. I mean, with the UUID, which is... Yeah. I don't think stripping that alone is enough. Talk about the wacky ideas: we were just talking about collecting PCI IDs. What do you think? Well, let's see... Why don't we do less is more, and just collect PCI IDs? I mean, we can publish a lot of that. We can use PCI IDs. We can use USB IDs. The thing with USB IDs is that some devices might not always be on. So, it's a bit sort of...
a matter of when we're collecting it. By "on" you mean plugged in versus loaded and working? I mean, whether it's connected to the machine or not. Yeah. But just being plugged in and knowing it's on the bus is interesting, even if there's no driver; it's interesting to know that this many people have this device. You do want to have some information from dmidecode, like the vendor and model of your hardware. So, "HP ProLiant", for example, would be present in the dmidecode information. Whether it's a laptop or a standalone machine, whether it supports IPMI or not, what kind of CPUs it's got, how much memory it's got: that kind of information is present in the DMI tables. That's why we collect dmidecode output, at least at the university, from all the machines, because it's filled with useful information. But some of that useful information is not something you spread around, because serial numbers and stuff like that you don't want to spread too far. We could choose to only grab some identified DMI types, and say we grab the types about the CPU, but we remove... Since it's standardized, we can say: take the CPU type, and just remove the serial number. Well, the architecture we already have. Not whether it's a Pentium or an Athlon or anything, but the architecture we've already got. Yeah. What about the architectures in popcon? Yeah. Well, whether it's ARM, whether it's IA64... I was wondering about the data we could get from /proc/cpuinfo as well. But I think what's more interesting about dmidecode is the fact that it actually lists who the vendor was that shipped it. So, like, most PCs will show up and say, oh yes, it's an Intel processor or whatever, but you won't know who made the motherboard or other information about that. But there's a way to collect that too, right? Not that I would really want that. I mean, I'm thinking memory... the main board supplier is maybe a bit too much.
Actually, even that would be interesting, because if someone approaches the main board manufacturer, it's much easier to talk to them saying: hey, we are Debian, and we have about 50,000 machines running Debian on your motherboard, and it sucks; how could you improve your motherboard for us? That's not really a problem. There's an implementation available from Ubuntu, a Debian package with a server we can set up, and just install it and let the users ship us information. So, all we need is someone to do it. So, just like popcon? Yeah. We'd be installing a one-time job... and sending it to the server. And a server to receive it. Oh, yeah, okay. So, an option to, not to popcon, but to hwinfo or whatever, saying: de-privatize... Yes, scrub. Scrub. Scrub any unique information from the dmidecode output. Yeah. Is that correct? That'd be the default. No, I'm just saying... But he's saying: implement that. Yeah. That's pretty valuable. Of course, it's problematic to know, across all kinds of machines, what kind of private information they have in there, because it's not very standardized. So, the obvious one: the serial number you can take away. Yeah, well, it's top of the list. Well, yeah, okay. Just be aware that it's a complex problem, so you might miss something. Yeah, but we at least tried. Yeah. I'm not sure if that's a good idea. The point is, you start it, you take responsibility for it, and you maintain it in the future. Yeah. Sure. But my point is that if you try and fail, it might look worse than if you just announce: we are not trying to protect your privacy, do you still want to submit? Because then there is no expectation that we will actually try. If I say we try to protect your privacy?
Yeah, and fail, then people will have reason to be really angry; but if we start by saying, we do not try to protect your privacy, do you still want to participate, then they do not have any expectation that we will try, and if they still participate... But why would we effectively tell people not to participate? Because we might get more pissed-off people if we fail that way than if we don't try at all. What if you come at it from the other way, and only include information you know is safe, rather than stripping information out? Sure, that's possible. That's probably easier, because then we know what we actually are collecting. The problem is that you can never enlarge the data set. I mean, once you decide what you collect, it's really hard to add one extra thing. We have that in popularity-contest. In popcon, that's exactly the problem: if you wanted to mine something else, you needed to ask that tough question of all the users, saying, oh, by the way, can I also take note of this, and people would be upset. We could define a minimized hardware information set, an additional set, and a default set, and let the user decide: everything, just the minimized set, or nothing at all. Yeah, the question is how useful it is to do it that way, because what you collect is the data you need, and can you make use of cut-down versions of the data? Because I believe that, in statistics, every time you give the user a chance of influencing the data you get, that introduces bias in your data set. I suspect it's easier the Ubuntu way, where you have a menu entry, "submit information about your hardware to the developers", and it just tells you that it will submit a lot of information, saying blah, blah, blah, are you okay with this? Yes, and there it goes. The Ubuntu way is very defensible precisely because Ubuntu has done it. Unless we want to be bigger assholes than that. Not meaning that.
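The scrub-versus-whitelist idea being debated here could be sketched roughly as follows. This is a minimal illustration, not the actual Ubuntu tool; "Serial Number", "UUID" and "Asset Tag" are real dmidecode field labels, but the exact list of fields to scrub is an assumption and, as noted above, would certainly need review because it is easy to miss something:

```python
import re

# Fields in dmidecode-style "Label: value" output that identify one
# specific machine. Deliberately incomplete -- that is the whole risk
# of the blacklist approach discussed above.
SENSITIVE = ("Serial Number", "UUID", "Asset Tag")

def scrub_dmidecode(text):
    """Replace the values of identifying fields with a placeholder,
    keeping vendor/model information intact."""
    out = []
    for line in text.splitlines():
        label = line.split(":", 1)[0].strip()
        if label in SENSITIVE:
            out.append(re.sub(r":.*$", ": [scrubbed]", line))
        else:
            out.append(line)
    return "\n".join(out)

sample = ("\tManufacturer: HP\n"
          "\tProduct Name: ProLiant DL380\n"
          "\tSerial Number: CZJ1234567\n"
          "\tUUID: 4C4C4544-0042-3010-8057-B3C04F305631")
clean = scrub_dmidecode(sample)
```

The whitelist variant would invert the test: keep only labels on an approved list. That matches the "less is more" position, at the cost of never being able to enlarge the data set later.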
We kind of have the way paved by other people doing it, so you don't have to do the whole show of deciding whether it's responsible. Is it good or evil? Which is the question every time you start collecting data. hwdb.ubuntu.com. That's the... Can you say that slower? HWDB, hardware database: hwdb.ubuntu.com. But would that really be helpful? Because I know that they have a totally different channel than we do. Would that be helpful? They have a very different channel than we do, and... they provide the lspci output, the lsusb output, the dmidecode output: a lot of information that describes the hardware. The kernel side is something else; they also collect information like whether the sound card is working, whether your X is working. That's a minor thing. One question is, do you want to correlate it with the packages, share the same hash as popcon? I don't think so. That would be very helpful, because then you could map specific hardware to specific packages. This guy's got a server, and more importantly, he's got package set X; this guy's got a desktop, he's got package set Y. So you can give suggestions based on the hardware, really. So I see... on an Intel video card, oh, on an Intel video card, this, this and this package, people have told me, are all wrong and you should get rid of them, because the driver breaks if you use them. But that's a different question. I see that it's very useful to collect that kind of information, and I'm also pretty sure I would not participate if we collected it. Right. And if you don't connect it, it's not as useful. We can do statistics about what we have, but we can't do data mining. Well, trust me, it would still be very useful, because I would be able to detect PCI IDs that are not currently handled by the kernel modules, and figure out which kernel module is supposed to support them.
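The hardware-to-package correlation being argued about could, in principle, look like this. A toy sketch of co-occurrence counting; it is entirely hypothetical, since the room has not agreed to collect this, and the PCI ID and package names are just examples:

```python
from collections import Counter, defaultdict

def build_index(submissions):
    """From (hardware_ids, installed_packages) pairs, one per machine,
    count which packages the owners of each hardware ID install."""
    index = defaultdict(Counter)
    for hw_ids, packages in submissions:
        for hw in hw_ids:
            index[hw].update(packages)
    return index

def suggest(index, hw_id, top=3):
    """Most common packages among machines with this hardware."""
    return [pkg for pkg, _ in index[hw_id].most_common(top)]

# Two hypothetical submissions from machines with the same PCI ID.
subs = [
    ({"8086:4222"}, {"network-manager", "wpasupplicant"}),
    ({"8086:4222"}, {"network-manager", "wireless-tools"}),
]
idx = build_index(subs)
```

This is also exactly why the privacy objection above bites: the index only works if each submission keeps hardware and package list joined, which is the combination some people in the room say they would not submit.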
I would also be able to check what kinds of models are currently used with Debian and, for example, approach HP or IBM and say: we have like 200,000 users of your ProLiant servers, can you make sure that your drivers work properly? That kind of information would be very useful even if you don't correlate it with the package list. Let me ask what you would be ready to give: would you be ready to give the list of your PCI IDs? Yes. Would you have a problem giving your USB IDs? No. Would you have a problem giving what's in the dmidecode output specifically? Not me personally, but as an enterprise, you know, as the university, yes. Because an enterprise wouldn't like to disclose the kind of hardware they have, because that tells who they source it from. That's only a problem if it isn't anonymous, if you can track them down; but if you actually provide the serial number of the machine, anyone can contact the vendor and ask for support for that serial number. I'm improvising here, but if I'm the internet provider of a big university, I'm also probably a big enterprise running hardware, and if the mail is running on those machines, which is likely, then I can see what kind of hardware the university uses and come back with offers and stuff. That's not a problem. What else? Even if we stripped out serial numbers and that sort of thing, you still might be able to infer who a particular user is if they have enough machines of a certain type or whatever. Like, if Google happened to be running Debian and they turned this on, all of a sudden we would see that there were more machines of this one type than anything else.
You might go figure out that's Google or something; you don't even need the serial numbers. So it might be best to defer to the user to decide whether or not the information is sensitive to them, and it would be nice if, for all the different categories of things we were collecting, somewhere in the conf file, or ideally with debconf, we just had a little checkbox that said: do you want to send information from /proc/cpuinfo? Do you want to send lspci? No, that's too much. Honestly, when I'm installing a machine, I can't be asked to go and look up all these things. Maybe it's not debconf, but we put it somewhere, and you make the default send the things that we think are reasonable, but people have the ability, you know, maybe we give them a README that says: by the way, review this and determine if you are uncomfortable sending any of this stuff, even in an anonymous form, and turn it off. In my understanding, with that setup you get the data that you have pre-selected in the defaults, and that's the only statistically significant part, because very few people bother actually changing defaults, so you will get very few submissions that deviate from that. So what are you advocating, just sending everything? If there can be a sane default that doesn't bother people, we can do that, and otherwise it's probably our best option. For example, Petter could put together a list of things that he would be comfortable sending, and we look at it side by side. So we define Petter as the most paranoid person? No, but I think you're the... well, the popcon guy. But I think we've already agreed that most of us would be uncomfortable sending the dmidecode data; well, maybe not most of us, but at least some of us would be.
But we also agreed that's one of the most useful things, because it gives us data about specific hardware in use, and that would allow us, as Debian, to go to those hardware vendors and be able to say: there are 10,000 machines running Debian, and you need to do your homework. And we don't need to provide the whole dmidecode output to have the most meaningful information; the most meaningful information is probably four or five bytes out of the hundreds. Yeah, yeah. And maybe not the most sensitive ones. Just on the scenario of us going to a vendor saying, we have 10,000 machines running, improve your driver: is that realistic? Yeah. I work for a vendor, and I will go to my manager and say: look, there's 10,000 people who run this machine, this is what we need to do. Okay. But from my point of view, and I'm like the hardware detection kind of guy, I would like to have at least one report from every kind of hardware, because that would make it a lot easier for me, because then I could actually check the PCI IDs. So the statistics of it doesn't really matter that much to me. I see it's very useful when talking to vendors, but for me it's more like: does this kind of hardware exist at all, or is it just an entry in the PCI ID list? And I suspect that if someone just went ahead and implemented, or uploaded, the hardware database package from Ubuntu: that's a menu entry that the user has to manually select, to submit information about their hardware once, so it's not a regular update, and it doesn't make sense to do a regular update, because hardware doesn't change every week. That would help me a lot with the discover work and the hardware detection stuff. So how about shifting the collection phase into something like a "contribute to Debian" entry in the menu, that can send people to the list of orphaned packages that they have installed, to debtags pages to categorize what they have installed, to a tool that sends hardware information.
A kind of contribute-to-Debian package that, among the various options, can send a whole lot of things over an HTTPS connection, so a man in the middle doesn't see it; at least they trust Debian. If you don't trust the receiving end it's only a bit better, but that could probably be a good channel to harvest all of these things, and also coordinate with the popcon hashes, which you can read from the popcon log. We could also reach people who otherwise don't want to participate. Lots of people put their configurations online, so they've got a record of what they set up, so they know how to set it up again after a reinstall, or to tell other people who have got the same notebook with the same hardware configuration. A lot of people make those pages, and you could make an auto page generator. If you turn it into something like that, rather than a system where people are just submitting into the void, people can say: I can look up this hardware and see how well it's supported. It could be a MySpace for your machine: you can have flashy graphics, and your friends linking there, saying what a great computer you have. So you joke, but I worked on the PA-RISC Linux port, and there's a hwdb.parisc-linux.org, and people did a similar thing; the HWDB has been in existence since that time. There was a tool that HP had that ran under HP-UX, and early on in the port people also had HP-UX on the machine, and you ran that, and it pulled out all sorts of information about the hardware, and then you uploaded this thing to the website, which parsed it and put it all in there. The website also lets you register yourself, and on it you can search by model and by those things, and once you get to the model's page, it lists all your friends that have that machine, and so there are other people you can talk to and say: hey, can you try this? I definitely like the idea of making this fun.
I also like the idea that, as a follow-up, if your hardware is not working the way it's supposed to, you click this and report it, so you actually report "I'm having a problem with this hardware setup", which could be a motivational thing in itself; and then, if we collect lots of problems with one specific bit of hardware, we can go and tell the vendor. So it's a three-way system: you have this aspect, and then it becomes "I want to buy a ThinkPad T60, does it work well, and which model should I choose". That's more difficult, because that means reporting: this works, this doesn't. Say I have a problem with my SD card reader: I wouldn't want to send a "problematic hardware" report for my entire laptop while the video card is perfect. What you would need is more fine-grained pros and cons that you can track, some kind of wizard: it plays a sound, and you press OK or not; does your video card work, press OK or not. That kind of information is provided in installation reports, "this works, this doesn't", and there's probably a lot of information we could actually pull from those, because the installation report actually includes the lspci output, I think; I'm pretty sure I wrote that part of it. Yes, and it could be two-way: we have the status, so for problems, we could modify a wiki page identified by the hardware setup, and if somebody looks that hardware up, they get the wiki page. Well, that's just a possibility I'm coming up with on the spot. I know this is about data mining and hardware-related things, but I think that if you are data mining from time to time, you can also be proactive regarding problems with specific hardware. For example, I have a TV card in my computer, and it stopped working after an update, and in fact, I think in popcon you would see that I began to use the TV software less; I was using it a lot with my card, and then I began to use
something else, I don't remember. The other one? xawtv. xawtv, yes, and another tool, I don't remember the name, something under KDE. So you can be proactive about that, because nobody reports it, but people begin to stop using my package. All the people that have, I don't know, some USB TV device: it can be something interesting and useful for package maintainers, to be able to see that these relationships exist; that people who have this hardware don't use my package, stopped using it, or uninstalled it. I don't have the hardware, but I could discover that. Actually, earlier we said we don't want to correlate the hardware and the packages; well, from my point of view I do want it, because that's what you can see from it. Yeah. What I was interested in, going back to what we were discussing during the workshop: I'm quite interested in people getting better support for laptop hardware, because today many people, hopefully, start out on laptops, and right now on the wiki I'm trying to set up something to help people write reports, and one of the key things I want on the wiki is: insert your lspci output in the page, insert the lspci, and then have something inviting you to run the bug script for the discover data, which provides all the useful information about your hardware. Should we just decide on a name, so we can coordinate this stuff in the future on the wiki? Should we just call it hwdb? I suggest we just copy the Ubuntu package; I don't see the point of re-implementing that thing. I'm not suggesting re-implementing, just a name for future coordination, just to put out the ideas that we've been talking about. And did we decide whether we can have this information public or not? It would probably be like popcon: most of it will be private, but the results after processing will be public. So popcon is not public? The aggregated data is. Yeah, with the aggregated data you can see how many installed something or used something, but that's all. I was just thinking, if you
syndicated it to somebody else... because it is a bit of a task to do everything, and we'd probably end up doing it that way anyway. So yeah: on the wiki, wiki.debian.org, hwdb. Everyone's happy with that? Is this really a QA thing, or is this something different? We can just discuss it on the QA list; I don't know if that works. I think it does, more or less: one would say support, another would say data mining, and some other things. Oh yeah, and there's not much traffic there, so maybe that's not a problem. That sounds like a sensible idea. Yeah. So, popcon. I should also mention that the Fedora guys started an effort to make a hardware database for Linux, and they were very enthusiastic for a few weeks, and then it stopped. They also called it hwdb, so the idea has been discussed quite a lot, and as far as I know, Ubuntu and SUSE are the only ones that actually have one running. Well, and if we use similar software, then we'd be able to aggregate the databases too, and then the Linux community as a whole could say: hey, hardware vendor, there are this many here and this many there, and all together there are this many Linux machines. So it would be good to work in a way that makes that possible. What's the popcon mailing list? It's called popcon-developers; it's on Alioth, popcon-developers. Why are we up here? I guess many people don't know how popcon does its data collection, and when we were discussing it, we came up with the idea of explaining it to a wider public, in case you can go and find better collection ideas, because popcon is accurate for some packages, but only for some. Do you want to explain it, or should I go on? Well, I can. For sure it's not accurate for every package, but it's very accurate for a few of them. I don't know how many programmers are here, but I assume all of you are, so... this is the popcon collection script. It's going to take a bit to bring up, so bear with me for a moment. The collection code: this is the loop, looping over all the packages,
and for each package it loops over all the files, and if the file matches this regular expression, it's taken into account; if it doesn't, it's not taken into account. Can everyone see? The point is that popularity-contest collects whether a package is installed, whether it was used in the last week, and the usage part is the tricky part, whether it was used recently, I think within the last month, and whether it is unused, that is, if all the access times are the same as the installation time: if all the files in a package have the same access time, that's assumed to be the installation date, I think. And there are quite a lot of packages that do not have any files matching this regular expression; those are the "no files" packages in the statistics. It compares the atimes, finds the file that was most recently used in the package among those matching this regular expression, and reports that alongside the information about the package. And the problem is, of course, that the access time is inaccurate for a few packages. All libraries in /lib will be accessed every night when ldconfig runs to update the library cache; well, at least ld.so.cache is updated. A few other things are similar, like manual pages, every night when the manual page index is updated. So all these files are of no use for actually checking if they have been used or not, because we know they have been "used", and it doesn't mean anything. Also, a few packages change from having no binaries to getting one binary in some release of Debian, and then you will have some machines showing up with access times for that package and some showing up with "no files", and you will get a really interesting result for that package. And you have the /boot/System.map file, which is only accessed by the system when it boots, so that is not really usage of the kernel package; it's how recently the machine was booted,
and it's a kind of usage of the kernel packages, so kernel packages can only be compared to each other on the relative frequency of boots, because for those, a boot actually is usage. A possible improvement: is it worth patching ldconfig to restore the atime, probably behind a switch, so it's only activated when popcon is installed? But the access time is only updated on machines where atime is enabled on the file system, and for laptops it's quite common to mount file systems with the noatime option, to avoid touching the disk every time you access a file; and for those, there is no way you can use the atime to detect usage. Of course, there have been proposals to enable process accounting; we could then log all process use, process the accounting information, and use that to report usage. But then we are getting pretty far off: if you enable that on my laptop, I will set fire to you. That's one problem, it will burn power on laptops, and that kind of defeats the purpose of noatime. I think you could detect the mount options, whether it's noatime or atime, and do another check; at the moment we just conclude that those machines don't produce votes, they still count as installations. It would be a really small improvement. And we also have the problem that some installations have a broken dpkg database, for example; there must be something like that, because the number of installed popularity-contest packages is smaller than the number of submissions. Right, not by much; it's like a fraction of a percent, but still, there are some machines submitting popularity-contest reports that don't have popularity-contest installed, and that's not possible. There are also more machines reporting an unknown architecture than can be explained by the number of installations
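The classification logic walked through above can be sketched roughly like this. It is a simplification, not the actual popularity-contest script; the week/month thresholds are the ones mentioned in the discussion, and the function name and the exact precedence of the checks are my assumptions:

```python
import time

DAY = 86400  # seconds

def classify(files, now=None):
    """Roughly classify one package the way popcon does, from the
    (atime, ctime) pairs of its tracked files.

    Returns one of: 'no-files', 'unused', 'vote', 'recent', 'old'.
    """
    if now is None:
        now = time.time()
    if not files:
        return "no-files"                  # nothing matched the regex
    # If every access time equals its change time, the files were
    # apparently never touched after installation.
    if all(atime == ctime for atime, ctime in files):
        return "unused"
    newest_atime = max(atime for atime, _ in files)
    if now - newest_atime < 7 * DAY:
        return "vote"                      # used within the last week
    if now - newest_atime < 30 * DAY:
        return "recent"                    # used within the last month
    return "old"
```

The atime caveats above map directly onto this sketch: files touched nightly by ldconfig or the man-page indexer always look like a fresh "vote", and on a noatime mount nothing ever moves past "unused", which is why such machines count only as installations.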
of versions old enough that the architecture was not reported at all. So there are some machines where the dpkg command we use to extract the architecture is actually not returning anything useful. Hurd, or something? I don't think so; I've seen the same in Ubuntu, and Ubuntu has only ever shipped popularity-contest versions that report the architecture, and they still have like 200 unknown machines, machines with unknown architecture. We also want to do something looking at the depends, perhaps, to figure out which libraries are being used by recently used packages; that's going to be very complex. Can you repeat what you just said? Just looking at package depends, to try to see which packages have been accessed recently, to figure out the libraries. Oh yeah: and if a library is installed and nobody depends on it, just remove it. I don't know if you actually filter for this, but until a few weeks ago I saw some submissions with an atime of 2036 or something, in the future. I think it was a really old submission, without even an architecture, so it was some screwed-up system. What's 2036? Something like all bits set to one. I also know that a lot of old PowerPC machines, when they're dealing with their epoch, show up with a time around 1900; I think that is filtered out now, but when I started doing all this stuff with the popcon data, I had a lot of submissions with a time of 1900. I don't think we actually filter that out of the stored data, but we remove anything more than 30 days in the future. Yeah. As you can see, there are quite a lot of strange architectures, and a few of them, I really don't know what they are; it's unavoidable. Okay, here's the i486 architecture: you have two machines reporting the i486 architecture, and as far as I know, we don't build for that. Probably somebody built the whole thing from scratch. Yeah, maybe. Are there any kFreeBSD architectures in there?
No, we have only three BSDs. And as you can see, the number of unknowns is 231, that's architecture unknown, and unknown version number is 310; we didn't introduce the architecture field until after the version number field, but still, this number of unknowns kept going down, and then it increases over here. That would mean that someone actually did a large number of real installations? I don't think so. You could be surprised; there could be a big data center. But it's still around this point, and it would be in this year; that was this year, around the release of etch. Surely those are actually installations with something broken. Suppose Google has their production environment, and they don't want to leak it; they run an amazing lot of stuff. I don't think so; they'd have more than 200 machines. It would be so much more than that, Google. You know, the reason I don't believe it is that these numbers actually show 38 unknown architectures over there, and that's Ubuntu, and Ubuntu have never released a version of popularity-contest that didn't report the architecture, so those didn't come from our package. One of the things that you'll need to look out for, especially if you're trying to start selling these data to vendors in the future, is guarding against data poisoning: how do you stop vendor X injecting a whole bunch of bogus reports saying that their package or their hardware is the best to run, artificially inflating their market share? Similarly, how do you stop Ubuntu or Red Hat poisoning our database, so that they can say, well, their results are rubbish? Of course you can poison popcon; at this moment there's no protection against it, basically, so... no one's done that yet. It's possible; if somebody ever does it, there's no way to find out. You can even submit a huge number of packages under your own name, or whatever. I was also going to mention what people are doing with virtualization: you could just make virtual
What's the spam situation? Let's get on to the next one. Yeah, so much spam. There could just be a lot of machines in one place that skew things. For example, this jump was a Swedish cluster: it did a spike of reports, then it came back, then it was turned off. I actually know the guy who was running that cluster, a Swedish AMD cluster, and he asked if it was okay with us that he reported one entry for each machine in the cluster, because they are identical, and we said: if they are real machines, go ahead. Even virtual machines are real machines in that sense; if they are actually doing something, go ahead. The same interesting point you were raising: do you have any clue what percentage are virtual machines and what percentage physical machines? I haven't looked at that; we could try and see. You can guess from the installed kernels: if they are using our kernels, then we see our official kernel images, like linux-image-2.6.18-xen.

Getting around the poisoning problem, what about asking people to register? I would rather stay the way it is until we know it's a problem. If we ever get to that point, I think I'd better just cancel the whole thing, because if it's poisoned the whole purpose is gone. It's a way to feed back to the distribution, something we provide to improve the distribution, and at that point we would no longer be able to trust the information. I was also thinking, looking at that peak, about what happens when you start giving suggestions based on popcon: the guy who turns on 500 identical machines would make popcon start suggesting to everyone the same packages those machines have. Luckily there aren't many clusters, but the number of identical machines one person has gives a strong push to one particular package.
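The kernel-flavour guess mentioned above, as a sketch: it assumes each report includes the kernel version string, and the suffix list is an illustrative assumption rather than a complete one (a VM running a plain -686 image is invisible to it).

```python
# Flavour suffixes of distribution kernel images that suggest a
# virtual machine; an assumed, non-exhaustive list.
VIRT_SUFFIXES = ("-xen", "-vserver", "-virtual")

def looks_virtual(kernel_version):
    """True if the version string's flavour hints at a VM, e.g. 2.6.18-4-xen."""
    return any(kernel_version.endswith(s) or (s + "-") in kernel_version
               for s in VIRT_SUFFIXES)

def virtual_share(kernel_versions):
    """Fraction of reports whose kernel flavour suggests a VM."""
    if not kernel_versions:
        return 0.0
    hits = sum(1 for v in kernel_versions if looks_virtual(v))
    return hits / len(kernel_versions)
```

This only gives a lower bound on the VM share, which matches the speaker's "you can guess" hedging.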
Not a big deal, I think; it will be watered down quite a bit by the merging with everything else, so it's not much of a problem. What really matters is the relative weight: this package has twice as many users as that package, not so much that this one has 2,000 users and that one has 1,000. It's really relative between packages. And for the suggestions it doesn't matter much, because the typical user will not have a configuration identical to a cluster's.

There are interesting things to read from the graph. For example, it shows there are roughly three times as many users of stable as of unstable and testing: that part is the stable installations, and this part is unstable and testing. You also see the long tail: it takes quite a long time before everyone upgrades, and there are still a few really old installations. How long is that long tail? Well, it very rarely goes all the way to zero. You can just measure it like this; I would say this part is large, something like 1,900 out of 53 or 54,000 still spread out there.

Could you change the format of the report so that the same graph can be generated for each package? We are not going to do that, because that part is done by the QA people. Could popularity-contest also collect the version number of each package in the background? No, we don't want that; we consider it too much, because then you can track things. You can track files; well, you can track users: if we know what version a machine has, we can track it over time. At the moment, if we were evil, we have the IP address, we have the packages the user has installed, and we have the approximate time of the last update. We could correlate all that if we were evil and mount an attack, and if we had version information about each package it would be even worse.
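The relative-weight point can be made concrete with a tiny sketch (hypothetical names, made-up totals): shares are computed against the number of reporting machines, so the ratio between two packages, not their absolute counts, is what survives.

```python
def relative_popularity(package_users, total_reports):
    """Map package name -> fraction of reporting machines that have it.

    package_users: dict of package -> number of machines reporting it.
    total_reports: total number of machines submitting reports.
    """
    return {pkg: n / total_reports for pkg, n in package_users.items()}
```

With 2,000 and 1,000 users out of 54,000 submissions, the first package simply weighs twice the second; a cluster that inflates one count shifts absolute numbers far more than it shifts these ratios.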
But actually, presumably you could infer version information from the version of popularity-contest itself? That's an interesting point. Is there another talk on that?

The distribution of architectures, and how it changes over time, is quite interesting to see. ARM, for example, is the fastest growing architecture at the moment. Not very surprising? You might expect amd64 to be the one growing that fast; the reason ARM is growing is the embedded boxes, the Slugs. Have you got a slide? Yep, I have one. It is indeed a very cool and fun architecture. These are the architecture names as known by dpkg. What's below ARM? This is a logarithmic graph, so the scale goes like 0, 1, 2, and the colors become quite similar; the one below ARM is PowerPC. Can you zoom in? Not on an image; that's an image. So that's a sorted list of architectures, and I can read it out. The first one is i386 at 85%, then amd64 at 11.2%, ARM at 1.4%, PowerPC at 1.2%, SPARC at 1.5%, alpha at 1.1%, mipsel at 1.1%, hppa at 1.1%... 0.1%, sorry, I read that wrong. So the ones above 1% are PowerPC, ARM and amd64, and the rest are below 1%. And m68k, where is it? 0.02%.
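The percentage list read out above is just raw submission counts normalised; a sketch with made-up counts, not the real popcon dataset:

```python
def arch_shares(counts):
    """Return (architecture, percent) pairs sorted by share, descending.

    counts: dict of architecture name -> number of submissions.
    """
    total = sum(counts.values())
    return sorted(((arch, 100.0 * n / total) for arch, n in counts.items()),
                  key=lambda pair: pair[1], reverse=True)
```

Since the shares span four orders of magnitude (85% down to 0.02%), plotting them on a logarithmic axis, as in the slide, is what keeps the small architectures visible at all.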
And it's been pretty stable there, I think for the last 12 months. I think it's the same people; the absolute number actually increased from something like 2 to 9, but it stays at approximately 0.02% all the time.

Our time is just about up, so let me close with the changes I have in mind. One is a patch to the installation. The second is to let packages provide a file that popularity-contest could look for, so that more applications could tell you something. The third change is on the server side: a queue that would announce each new file entering the dataset, which one could then use to keep parallel indexes and other mining tools up to date. That's what I have, plus further improvements to this infrastructure, as far as we can manage.

I'm very sorry to interrupt; we are just about to have a meeting in this room. We just got a red card, so I'm done.
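The third planned change, the server-side queue of new submission files, might be sketched like this. The design and names are entirely hypothetical, intended only to show how parallel indexers could stay current, with history replayed to late subscribers:

```python
class SubmissionQueue:
    """Hypothetical server-side queue announcing new submission files."""

    def __init__(self):
        self.log = []          # every file ever announced, in order
        self.subscribers = []  # callables invoked for each new file

    def subscribe(self, callback):
        # Replay history first so a late-joining indexer catches up,
        # then receive each future file as it is published.
        for path in self.log:
            callback(path)
        self.subscribers.append(callback)

    def publish(self, path):
        """Announce a new file that has entered the dataset."""
        self.log.append(path)
        for cb in self.subscribers:
            cb(path)
```

Each mining job then keeps its own index fresh from the same announcement stream instead of rescanning the whole dataset.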