Okay, hey folks. Hands up anyone here who uses the Debian wiki. Wow. Hands up anyone here who doesn't? Who's hiding in here out of the rain? People at the back, fine. We don't necessarily have a huge amount planned for a talk here. The main point is to give an update on what we've managed to achieve in the wiki in the last year and what we're thinking of doing in the next year, and also to get feedback, ideas, complaints, whatever, from the users. So please chime in. Again, as always, please take notes in gobby. If someone can relay questions from IRC, that would be awesome. So let's go.

We are still using the Python-based MoinMoin (Moin) wiki engine for wiki.debian.org. The latest version that we have is version 1.9.4, which is actually now installed via backports. The reason that we've done upgrades is basically for some patches that I've been responsible for. One of the biggest problems we've had over the past few years with the wiki is the number of spam attacks that have happened. If you're not careful and you run a public wiki on the internet, the spammers will fill it with shit. We know this. We have relied far too much in the past on having admins and friendly users who just keep a watchful eye and revert all the spam. But after a while, especially as more and more bots are brought to bear, you just can't keep up. If you go and have a look at lots and lots of the less well maintained wikis out there on the net, you'll see what happens. You end up with thousands of pages with adverts for everything, the usual stuff.

So the first thing that I added for this was reCAPTCHA support for creation of new accounts. Simply stopping the spammers from automating the creation of new accounts helps. I don't think anyone likes reCAPTCHA. I certainly don't, but at the time it seemed to be the best way to go. It did help cut down spam. However, as you'll see in gobby here, we already have a bug report complaining about how it affects accessibility.
I have a great deal of sympathy with that, so I'm tempted to turn off the reCAPTCHA. I don't think we need it anymore. Does anybody disagree? No, awesome.

The reason that we don't need it anymore is the second patch that I put in comparatively recently. I've added support in Moin, and I've shared this with upstream. There's no sign of them taking it anytime soon, but I've added support for requiring email verification when you create an account. I'm amazed this has never been implemented before. At the moment, when you create a new account in Moin, it asks for an email address, but purely and simply for password recovery. There is no requirement that you give a valid email address at all, and so the spammers will happily put whatever junk in and then spam crap all over your site. Actually making them give a valid email address before they can log in to start with seems to be a very, very effective way of cutting down the spam, so I think this is the right answer. If people can't give a working email address, or if they have mail problems, and that's happened a couple of times already, the users typically can find one of the admins.

Who has noticed that wiki.debian.org was getting incredibly slow for page saves? A number of you, fine. The internals of Moin are showing their age somewhat. The reason for the performance problem was that on every page save, it would go looking to see which users want to be notified about changes to that page. That's fine. This is a good thing, obviously, if you want notifications to work. Unfortunately, the way it did that was that on every single page save, it would have to go scanning through every single user configuration file, check to see if that user config file listed this page, build a list, and then send an email to that list. That's fine on a typical small wiki. You might have 20 users in a small company, 100 users, whatever. The Debian wiki has a few more than that.
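As a rough illustration of why the per-save scan described above scales badly, here is a minimal Python sketch. It is not Moin's code and not the patch that was applied to the wiki: the dictionary `users` is a hypothetical stand-in for the per-user config files, and the function names are invented for this example. The first function inspects every user on every save; the second approach builds a page-to-subscribers index once, so each save becomes a single lookup.

```python
from collections import defaultdict


def subscribers_by_scan(users, page):
    """Naive approach: look at every user's subscriptions on every page save."""
    return sorted(name for name, pages in users.items() if page in pages)


def build_subscription_index(users):
    """Build a page -> subscribers mapping once, ahead of any page save."""
    index = defaultdict(set)
    for name, pages in users.items():
        for page in pages:
            index[page].add(name)
    return index


def subscribers_by_index(index, page):
    """Indexed approach: one dictionary lookup per page save."""
    return sorted(index.get(page, ()))


# Hypothetical data: user name -> set of subscribed page names.
users = {
    "alice": {"FrontPage", "DebianInstaller"},
    "bob": {"FrontPage"},
    "carol": set(),
}
index = build_subscription_index(users)
```

With the naive version the cost of a save grows with the number of accounts, whether or not anyone is subscribed to the page. The index has to be kept up to date whenever a user changes their subscriptions, which is the price of the faster lookup.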
Last time I checked, and I was going to give you the exact number, but it's not working right now, we had something like 180,000 users. Just physically doing the check of loading each of those files in turn and coming back was causing, in some cases, a minute and a half of delay on saving a page. There was already a fix that the Moin people had worked out, so we applied their patch, tweaked it slightly, and now a page save should be down to, worst case, maybe about 10 seconds. I think that's reasonable and it's the best we can do at the moment.

The reason that I was just looking at system info, no, that's actually an error, not just a timeout. The system info page is meant to tell you how many pages, how many users, all that kind of thing, and I guess we just found another bug. Awesome. Paul? Microphone?

That's probably an out of memory error.

Yes, the other fun thing with some of the internals of at least Moin 1 is that it builds very, very large lists in memory containing, for example, information about all the users or information about all the pages. Even on a fairly big machine, and I think the machine we're hosted on has many gigabytes of RAM, it's a 4 GB system, it just totally runs out of memory. This is not wonderful. Paul, can you tell us how many pages we have?

I can tell you how many users we have. It's 11,200. It's not quite as many as what you said before.

Bear with me. Oh, look. We have, no, that's wrong. Okay. Apologies. Ah, I know, I'm getting confused. We have 10,000 users. We have over 100,000 pages, I believe. No. Okay. Sorry. My confusion. We currently have 14,500 pages. We actually had approximately double that number, and instead of just marking pages as spam and removing page entries through the wiki, I actually went through and deleted the crap that was there for a lot of the dead pages. Apparently it seems we are one of the biggest MoinMoin sites in the world.
There are others out there with similar sizes, but there are not many. Again, this is not a great shock.

So, moving on to the usual discussion: what else would people like us to do? There are always discussions about whether we should move to another wiki engine.

During the "DebConf in Debian" BoF just earlier, there was a mention of merging the DebConf wiki into the Debian wiki.

Again.

Again. Yes. That has been suggested several times and it's not happened yet. I think it's a matter of getting access to the data and all that sort of stuff and actually doing it.

Okay. So, yeah, the DebConf wiki is a MediaWiki installation, which is totally separate, on debconf.org. Merging the data one way or the other would obviously require a huge amount of page translation because they're using different markups. Yay. As I understand it, the Debian sysadmins are not interested in running a large PHP app at all if they can avoid it. I have every sympathy with that. So what we would probably be looking at doing is merging all of the DebConf content into the Debian wiki and translating it into Moin format. That's as I understand it. If other people have strong suggestions otherwise, shout now. Okay. Right. Do we have any volunteers to help with that? Tell you what, better: do we have anyone who doesn't want to help with that? Put your hand up. Sorry.

License issue.

Grab a mic. It's better.

Okay. Good idea. The DebConf wiki has a different license compared to the Debian wiki.

Okay. What's the license on the DebConf wiki? It's going to be CC BY-SA or something, is it?

GFDL.

GFDL, really. We'll have to check that. Yeah. Somebody have a quick check, please. Okay. So we're going to have some interesting issues there, then.

So we also have to re-license the Debian wiki.

Yes. So it's just more content to re-license, essentially. Yeah. The Debian wiki has had this longstanding issue that there was never an overall license chosen when it was first set up.
So, back in the dim and distant past when we had a Kwiki wiki, no one ever bothered to say what the default license should be. So it's not quite clear what licenses currently apply to a lot of our content. A lot of it got translated into Moin, and there have been several attempts by various people over the years to try and fix the licensing. It seems doomed to failure. Paul, you had something to add? Well, quite. I know that's a problem.

I guess a question is, are people happy for us to carry on with Moin? Would people rather we shifted to a different wiki engine?

If we try to shift to another wiki, I guess many people are going to want some features they have wanted for a long time, for example translation and stuff like that. So that would have to be taken into account at that point.

Yeah, the current internationalization support in Moin is very limited. I think we all acknowledge that. I'll be honest that the thought of switching to a different wiki engine scares me. Again, purely because of the amount of content we have. I mean, the other alternative would be to just throw it all away and recreate as needed. Again, people have suggested that a few times. I don't know if it's actually a valid way to go, basically to throw away a wiki full of data that is currently still used. No, I think Paul agrees with me. For all that we have problems, and by God do we have a lot of dead data in the wiki, like every big wiki in the world, I don't see throwing everything away as a valid way forward. I don't see anybody here suggesting it is.

So, converting to a different engine. ikiwiki is one that always gets suggested. I'm not sure that it does what we need. And again, there's the format change. MediaWiki, in my opinion, is a non-starter. Moin 2, maybe. The upstream Moin people have been working on Moin 2 for a long time. Fingers crossed it might actually release at some point. Paul?

The different wiki engines kind of mesh in an interesting way here.
I think with Moin 2 they're switching to a different markup engine. I mean, a different markup format called Creole, which is like a cross-wiki format. And apparently ikiwiki also supports this, but there are some bugs in the implementation, or at least there were last year when I talked to Joey Hess about it. So if we switch to Moin 2, then we could more easily switch to ikiwiki, which would be an interesting way to go.

Fair enough. Oh, I hadn't realised that switching to Moin 2 would also require a format change. Or does it not? Does it still support the old format too? Ah, I hope so, because it would be insane for them to do anything else. Hey, web developers. We love them. We love their output, but sometimes, oh my God, I don't understand why or how they do these things. So my own preference would be that at the point when Moin 2 becomes available, we should probably start looking at moving to it. So we will set up a test wiki with the existing content and play with it, and probably ask for testers. Yeah.

Just a question about Moin 2. What back end storage formats does it support? I think I heard something about version control systems.

From memory, that was one of the features expected in Moin 2: to have multiple back end support, including VCS and various stuff.

Sure. I mean, I even suggested and started hacking on VCS back end support for Moin 1 many, many years ago, and then I got more sane. It just needed too many hacks in too many places, at which point, clearly, it is something that you would design in with a total rewrite. I would hope that it will still support the on-disk format of Moin 1 to a reasonable extent. Yeah. So, blatantly, when Moin 2 happens, I mean, even once it just gets useful beta releases out, I think it will be worthwhile to have a Moin 2 instance hosted alongside the existing wiki, and we just play with it and ask people to basically help us help the Moin 2 developers find the bugs. You'll hear about that. We'll mail about it.
Otherwise, in theory, there are several admins of the wiki. Not all of us are active. More admins are always appreciated. It doesn't have to be a lot of time. If you have a little bit of time to help us, basically, cleaning up after spam attacks is the most common thing. Yay for network. One sec. I've got a net connection. There we go. Fine. Where was I? Yes. We're always looking for more admin help. It doesn't take a lot of time. The main thing is just helping to clean up after spam attacks. Maybe helping users if they're struggling, if their page saves don't work, if their email registration piece isn't working properly, that kind of thing. Otherwise, things are reasonably simple.

I think the issue is that it's nice to have people spread around. So at the moment, I mean, of course, I'm in the UK, Paul's in Australia. If we can have people spread around, somebody in the US, somebody further east in Europe or whatever, it would be nice to have admin cover around the clock. Not that it matters a huge amount, but I really, really see it as a personal goal to make sure that we kill spam as soon as possible after it lands. We have a number of users that help with that. Doing a full cleanup still requires admin rights, unfortunately. And then the back end of that is that we also try to send abuse reports to the ISPs or the universities or wherever the spam is coming from. We don't just delete it and move on. We actively try to report spam. We don't just want to stop it. We want to make sure the people doing it are punished appropriately.

So, problems. Does anyone have problems with the wiki? Now is a good time to chat. Is anyone still awake? I know, it's the end of the day. It is on. Yeah.

How do I integrate the account?

Ah, yes, good question. At the moment, the Debian wiki has a totally separate set of user names and passwords.
One of the things that I've been talking to DSA about for a while is that we're looking at having single sign-on in Debian for most of the web services. The plan is that we will do this at some point soon for the wiki. I think there are a couple of other test sites DSA are working on first. Coming soon. How that will work with the email verification and whatever, I have no idea. Clearly, we're going to need something like that to work across all of our sites. Again, DSA and I are going to have to work that out. I hope that answers your question. Yeah. Cool. Any other comments? Any other questions?

So, one other thing that we need to fix. There isn't any synchronization between the website theme and the wiki copy of that theme. It's not in version control or anything like that. So I think what we need to do is split out the wiki-specific parts, put those in Git, and then use some sort of synchronization from the website CVS.

Yeah, probably. I've been meaning to get to that for a while. Okay. Well, I'll say if you're going to do it, then awesome. Yeah. Actually, making the wiki look like the website can be quite difficult at times. The layout of Moin is reasonably flexible, but that won't necessarily let you do everything that you want or everything that you can do with, say, CSS on a normal website these days. We're always looking, again, for help from people to make that work. As we move over to the new theme for the Wheezy release, whenever that happens, we'll be looking to do that quickly, without much warning, I presume.

So, we have an idea from someone to freeze documentation for the release and have it in separate sections.

Yes, there are lots and lots of pages of how-tos and bits and pieces of other documentation in the wiki which have their own internal sections saying do this for Etch, do this for Lenny, do this for Squeeze. Some of them still say do this for Wheezy, and they will still do so afterwards.
Some of them still mention very, very old releases, which frankly we know we archived and stopped supporting a long time ago. A good way of doing this is probably to split things out by release at a top level, so then we can just mark the pages as obsolete. As it is at the moment, it involves an editing job for somebody to go through and change every single one of those pages. Now, I know we have some great volunteers who are going through and doing that over time and trying to tweak lots and lots of things across the whole wiki, like how we do the translation links, like how we do links to all kinds of things. It's a thankless task and it's never ending. If we can come up with good ways to automate this, to make the layout work better, then there's less work needed to do this cleanup. Again, please join in, help. If you've got ideas, they're always appreciated.

There are great resources on the wiki, but unfortunately we have no way to keep the translations up to date. For example, a suggestion by Christian is: do not translate the wiki. It's problematic in that sense, so a suggestion would be that if you want to choose a replacement or something like that, something which allows us to keep translations up to date would be awesome for certain resources.

Okay, yeah. Do you have any suggestions? I mean, one of the things that I've seen ikiwiki do is use gettext for translations. Is that the kind of thing you're thinking of? Okay, fine.

For translation, there could be one way. I mean, it has to be discussed with people more involved in i18n, internationalization, than I am, but it would be having a tool. Because we enforce the policy of saying all the page names should be in English and not translated, and the page name of every translated page should be prefixed by the language.
You could say, okay, I have a page named Foo, and then a page named es/Foo, and in the translated page you could have the version number of the English one, the version number or the time stamp or something like that. And you could have a tool which says, I get the Spanish version, does it refer to the current English version or not, and says this one is not up to date. The alternative is obviously to use PO files, but PO files mean you have to get wiki editors to learn how to use PO files, and I think it's not going to happen.

Yeah, I think you have a lovely suggestion on how to do it. Patches welcome.

It was designed and proposed. It's even documented somewhere.

Yeah, absolutely. Picking up on version numbers and tracking which version has been translated, and then maybe mailing the person who did the last translation and telling them, look, please do this. That kind of workflow isn't difficult. It's a simple matter of programming, I think. If we think that's useful, please file a wishlist bug against the wiki, and then we know we can go and do it.

Anything else? It's the end of the day on Friday, isn't it? I guess what I'm going to have to say at this point is, well, thank you everybody for coming. If we haven't covered anything that you wanted to talk about, well, you should have mentioned it. Obviously, we don't have our own separate mailing list for the wiki admins, but we all read the debian-www list. Mail us there, post bugs against the wiki, and please volunteer to help. It's quite rewarding. You can see just how many people out there are using the wiki every day. There's a huge amount of good documentation. It's a very big resource for the project. It's also a place where you don't have to be a great coder. You don't have to be a great packager. You can help other Debian users directly. Please help us with that. Thank you for coming.
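
The translation-tracking idea raised near the end of the session (a translated page such as es/Foo records which revision of the English page Foo it was translated from, and a tool flags translations that have fallen behind) could look roughly like the Python sketch below. Everything here is an assumption made for illustration: the two dictionaries are hypothetical inputs that a real tool would have to extract from the wiki, and nothing in it uses Moin's actual API or the design that was mentioned as already documented.

```python
def stale_translations(english_revisions, translations):
    """Return names of translated pages recorded against an older English revision.

    english_revisions: English page name -> current revision number.
    translations: translated page name ("<lang>/<English name>") ->
                  English revision the translation was made from.
    """
    stale = []
    for page_name, translated_from in sorted(translations.items()):
        # Page names follow the "language prefix, then English name" policy.
        _lang, _sep, english_name = page_name.partition("/")
        current = english_revisions.get(english_name)
        if current is not None and translated_from < current:
            stale.append(page_name)
    return stale


# Hypothetical example data.
english_revisions = {"Foo": 7, "Bar": 3}
translations = {"es/Foo": 5, "fr/Foo": 7, "es/Bar": 3}

outdated = stale_translations(english_revisions, translations)
```

A follow-up step, as suggested in the discussion, would be to mail whoever made the last translation of each page in `outdated`. That part is left out because it depends on how the wiki stores edit history.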