So, today we have the pleasure of having Brianna Law to talk to us about distributed wikis. Brianna was a Wikipedia editor for about four years and she claims to still be recovering from this. But now she's writing Python. She has a particular interest in free culture and free content generally. And today she'll be exploring with us some ideas around the motivation and potential for what a distributed wiki might mean and might look like. So please join me in welcoming Brianna. Thank you. Good afternoon, freedom lovers. Thank you very much for coming to my talk, it's good to see you all. So I should dispel a few myths in case you've come along and you're like, yes, wikis on peer-to-peer or wikis on git, I'm all over it. Because that's not really what I'm talking about. So you will be disappointed. So you can leave or just flame me on Twitter, I guess. So my history is as a Wikipedia editor, and it was kind of through Wikipedia that I found free software and open source. So I'm quite into that whole culture. And yes, there's not really any code in this talk, so in that respect you might be disappointed. But I think it's helpful to think about things before we build them and imagine what they'll look like and what purpose they will serve. So I hope that this goes some way towards that aim. So when we think about distributed wikis, well, first of all, who considers themselves fairly familiar with one or more wiki engines? That's good. And who considers themselves fairly familiar with version control? Distributed version control? Yep, cool. All right, so when we talk about, well, when I say distributed wikis, it seems to me that there are a few different components or different parts that you could consider that might be worth distributing. And so I've listed a few of these here. And maybe the thing that you would normally think about when you talk about distributed wikis would be distributing the repository or the storage.
So if we think about a wiki as being very closely aligned with a version control system, it'd be just the same except we'd use distributed version control. Well, that doesn't sound very interesting to me. So I think the other more interesting parts are different components. And so we're going to talk about a few of these throughout my talk. And if you have questions, please ask questions throughout. And I'll try and see if you raise your hand. So one here I've got is this marketplace of ideas model with a distributed access point. So these parts are kind of related to each other. So if you had a distributed repository or storage, you might also have a distributed access point. And this is something that Anil Dash talked about recently in a blog post called "Forking is a Feature", talking about how awesome forking is and how Wikipedia should basically embrace distributed version control. And in some respects I'll talk about wikis in general. But in other respects I'll talk about things that are relevant to Wikipedia, because it is the largest wiki that we have. We just had the 10th birthday, and people have been talking about how it's now part of the infrastructure, it's part of the information infrastructure of the web. So it's something that's very important to a lot of us. And so it's the motivation for a lot of these ideas. So the marketplace of ideas, where we might have distributed access points to a wiki, is basically just: we throw away the Wikipedia rule that says there is one article per topic. And this rule is extremely important in making Wikipedia work at the moment, because it forces people to collaborate. You're not allowed to have John's version and Mary's version and Fred's version of an article. They all have to be a single version. And that's what everybody goes to and that's what everybody expects. And in my opinion, that rule is what makes wikis work most of the time.
And it seems to me that the opposite of it, if you say, well, anyone can have their own version, and somehow the best one will rise to the top, or people will evaluate them and it'll be ranked higher or something like that, it works for some projects. Seems like that's what Knol did, but it didn't really work for Knol. But things like Urban Dictionary or Stack Overflow, where people vote on different responses and the higher-ranked ones come to the top. It certainly does work for some domains and some types of information. But I don't think it would work particularly well when it comes to encyclopedia articles. They're going to be a little bit longer than things like Q&As or dictionary definitions. And there are a few of these problems. And this canonical, reliable version, I see that as being a feature for the user rather than the contributor. So the contributor and the user have sometimes competing needs. So Anil Dash talked about this. So he says it could be based on git-style technologies, and there could be not just one article per language but an infinite number of them, each of which could be easily mixed and merged into your own preferred version. And my response to that is kind of like, well, how do you know what your preferred version is? How is the average user coming along going to evaluate dozens or hundreds of branches and somehow know how to mix them in? So I think there are a few problems with this idea. And it's hard, I think, knowing about wikis and then learning about version control — they're obviously very closely tied concepts. And if you grok version control, then you will grok a wiki. And so it's like version control, but we have this single user interface or point where people contribute stuff. And that's different to version control, where people can use the command line, or they have clients, they've got an Eclipse plugin or whatever they've got. And also we're working on prose. So prose and code have their own differences, I mean generally we're working on prose.
Like, it's interesting that nobody writes code in a wiki, although they could. But obviously there's a reason why people don't do that. And I'm also very interested in the idea of the copyleft license. So things like the GPL and the Creative Commons Attribution-ShareAlike license. Because it's this license that creates the right to fork. And I think that's a really important thing for contributors to any kind of project, actually. Because that's what keeps the bastards honest; it keeps the people in charge on their toes. Because if you piss enough people off, they can just take their bat and ball and leave and start their own game. Now when we come to version control, there's something that software has that wikis don't tend to have, which is this idea of releases. So although we have a trunk when you're writing code and everyone's committing, you also have a release, which is where the people in charge have decided, okay, this is an official starting point or access point for users, for members of the public. For people who are not contributing code but just want to use what we've created, this is what they should use. And so that gets publicized. That's what gets put up on Freshmeat or whatever. And that's where people start. And wikis don't tend to have that. Implicit in wikis is the idea that the latest version is always the release version, because that's what you get. So it's always a moving target. And that's the same whether you consider the comparison with a single article or with the entire project, multiple articles. And so Wikipedia turned ten just a couple of weeks ago, on January the 15th. And so there was quite a lot of celebration in the press. And it's a great thing, it's an amazing achievement. And it makes the web a different place. It makes our world a different place. But can it survive to 20? That's an interesting question. I don't see that the answer is necessarily yes.
In a sense, it's too big to fail, because we all rely on it and we need it and we assume that it will be there. But I think the same was true about Usenet at one point in history, and yet where is it today? So the idea that it's too big to fail is not correct. We can't just assume that everything's hunky-dory and it will continue and it won't face any problems that will cause it to self-destruct. And another related problem, perhaps, is that it's essentially too big to fork. Which is that all the legal rights and the culture are in place for you to fork it. But practically, when it comes to the hardware or the bandwidth, the only people who could fork it would be like Yahoo or Google. And even then they have another problem, which is that they wouldn't be able to have the kind of insta-community that you would need to fight off vandalism. So maybe it is too big to fork, and maybe that's a problem. So this is, I think, in my mind, a very concise way of describing what a wiki is and what its relationship to version control is. But one thing that wikis don't really have is an idea of branches or anything that really resembles it. So every commit is just straight back onto what would be your trunk if it was a single article. So straight off the bat, how could we have branches in a wiki? Would that work? Could that be possible? You can do a kind of manual branching if, say, in Wikipedia, you keep a copy of an article in your user space and you're working to write up some particular section of it. And you just manually fold in other people's changes that are going on in the main article. But that's not branching, that's manual branching. So what would it be like if we had real branching? So there are a few differences again that come when you're using version control for a prose project versus code. So it's not as if you can go from version control to a wiki and apply all the features straight across. There are a few differences that we need to keep in mind.
And so if we have branching, then we're going to need really good merging tools. And merging prose is something that I don't know has had a lot of thought about how it works. Because when you write code, the intention should be fairly clear. Maybe it's to solve a specific bug and you can just put that bug number, or it's to create a specific feature, or fix a spelling mistake or something like that. But when you're contributing prose, it can be a lot less clear what your purpose is. If you're just rearranging two sentences, or if you're adding a new section, then maybe that's quite straightforward. But a lot of the time the meaning and the overall effect of the article is a lot more intertwined with prose compared to code. So the thing that makes me think that Wikipedia might not survive another 10 years is mainly to do with the feeling around the community. It's so large now that it's quite impersonal. There's a lot of bureaucracy that just continues to expand. It's hard for newcomers to find a foothold and feel like what they contribute might actually be noticed or paid attention to. And without a robust community, Wikipedia will self-destruct in the blink of an eye, in my estimation. This is what Clay Shirky points out about vandalism: you have a low barrier to entry and high visibility, which creates a kind of motive to perform vandalism, and you have heaps of pages. So you need heaps of people to be monitoring vandalism. And code doesn't suffer this in the same way, because the initial barrier to entry is a lot higher. And it's not like if you commit some vandalism in a code project, it's going to immediately appear in everyone's release. You have that release cycle, which slows that down. And you generally have a lot fewer pages. Even if you have a couple of hundred pages of code, that would be a small wiki compared to a prose wiki. Wikipedia is in the situation now that it is a kind of monopoly.
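To come back to that merging point for a second: here is a toy sketch of a sentence-level three-way merge, just to make concrete where prose merging gets stuck. This is my own invented example, not any real wiki engine's algorithm, and it deliberately ignores added or removed sentences (it assumes both sides edit the same number of sentences).

```python
def merge3(base, ours, theirs):
    """Toy three-way merge over lists of sentences.

    Each sentence keeps the base text, takes whichever side changed it,
    or is flagged as a conflict when both sides changed it differently.
    Assumes all three lists are the same length (no inserts/deletes).
    """
    merged, conflicts = [], []
    for b, o, t in zip(base, ours, theirs):
        if o == t:                # both sides agree (or neither changed it)
            merged.append(o)
        elif o == b:              # only "theirs" changed this sentence
            merged.append(t)
        elif t == b:              # only "ours" changed this sentence
            merged.append(o)
        else:                     # both changed it differently: conflict
            merged.append(f"<<< {o} ||| {t} >>>")
            conflicts.append(b)
    return merged, conflicts

base   = ["The dog barked.", "It was loud."]
ours   = ["The dog barked.", "It was very loud."]
theirs = ["The dog howled.", "It was loud."]
merged, conflicts = merge3(base, ours, theirs)
# Non-overlapping edits merge cleanly; overlapping ones conflict.
```

A real tool would need diff-based alignment to handle inserted and deleted sentences, and even then it can only detect textual overlap: it has no way to notice that two cleanly merged sentences now contradict each other, which is exactly the intent problem above.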
And that was never really the intention, but that's something that has happened as it's kind of picked up momentum. So there are good things and bad things about it. Obviously for users it's a good thing: if you know it's just Wikipedia, that you will have it, that's great. And the fact that so many people link to this single project is really good for the PageRank, so that brings up the results, and so that gets more contributors. And that makes the articles better and that makes the PageRank go up. So it creates a virtuous circle in that sense. There's some potential for serendipity in editor activities, in that there are many different types of activities going on in Wikipedia. People edit in topic areas, but then they also do copy editing, or they fix links, or they welcome newcomers, or they discuss articles for deletion, or they add images — there's a million different tasks they can do. And once you get the kind of wiki addiction, maybe you started out being obsessed with Linux. But then you have the wiki addiction and you just go and do something that's completely unrelated that you never would have bothered to do before. So that is definitely a plus, and that's a significant benefit for Wikipedia. But on the negative side, as I mentioned, it's so large that it is basically impossible to fork. I don't see how anyone could do it. As I said, Google or Yahoo might have the hardware and the bandwidth, but they don't have the insta-community that you would need if you're going to put thousands and thousands of pages up there open for editing. And in a sense you could say, well, maybe you can fork an individual article, but it's hard to see what the motivation would be for people to work on your single article compared to the entire Wikipedia. It would take quite a while for the PageRank to come up. Then we have this instruction creep and bureaucracy, as I mentioned. And so there's a lot of — if you have a narrow focus, then you're just interested in your Linux articles.
There's a lot of manual of style and a lot of stuff that's irrelevant to you, but it's going on around you, and you may or may not engage in that, and you may or may not care about that. I mean, MediaWiki, which runs Wikipedia, has a write API. I don't know about other wikis, but it's had a write API for quite a few years now. So you could write another interface for it, but no one really has, that I've noticed. There's another feature that's been talked about recently, which is called pending changes, or flagged revisions. And the idea with this is that edits might not immediately be visible on the Wikipedia page, and they'd have to be marked as approved before visitors would see them by default. They did a trial with it on the English Wikipedia, and it's still being used on maybe a couple of hundred pages, so it's not hugely visible across the whole wiki, compared to the German Wikipedia, where they basically rolled it out across the board. So the question is: who marks it as approved? There are different user levels, and one of the levels is a reviewer. So if you have an account and then you get marked as a reviewer, which is relatively easy to get, then you can mark other people's commits as approved. So this article — if you were looking at the article, at the top you would see this little icon if it was using pending changes, and then you click on that and you can see three pending revisions. So you're missing some of the latest information. And then if you look at the history page, it looks like this, and the pending revisions which are not yet visible by default are the yellow ones. And if you have permission, there'd be a link there and you can click it to mark the revision as accepted, something like that. Is it the high-risk pages? It was initially used on those, yeah. The conception is that it's a way of reducing full protection. So articles by default, anyone can edit them, including anonymous users.
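The mechanics just described — every edit is saved, but readers see the latest accepted revision until a reviewer approves the pending ones — can be modeled in a few lines. This is my own toy model of the idea, not how MediaWiki's FlaggedRevs extension is actually implemented:

```python
class Page:
    """Toy model of a pending-changes page: revisions are (text, accepted)."""

    def __init__(self, text):
        # The initial revision starts out accepted (visible to everyone).
        self.revisions = [(text, True)]

    def edit(self, text):
        # Any edit is saved immediately, but starts out pending.
        self.revisions.append((text, False))

    def accept_latest(self):
        # A reviewer marks the newest revision as accepted -- in effect,
        # marking it as "suitable for release".
        text, _ = self.revisions[-1]
        self.revisions[-1] = (text, True)

    def visible_text(self):
        # What an ordinary reader sees: the newest accepted revision.
        for text, accepted in reversed(self.revisions):
            if accepted:
                return text

    def pending_count(self):
        # Revisions newer than the last accepted one are pending.
        n = 0
        for _, accepted in reversed(self.revisions):
            if accepted:
                break
            n += 1
        return n

page = Page("Article, version 1")
page.edit("Article, version 2 (unreviewed)")
page.edit("Article, version 3 (unreviewed)")
# Readers still see version 1 until a reviewer calls accept_latest().
```

The point of the model is the separation it makes explicit: `edit` is the commit, `accept_latest` is the release, and `visible_text` is what the public access point serves.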
But if they're subject to a lot of vandalism and traffic, they can be marked as protected. And that means only administrators can edit them, and administrators are nominated or chosen by the community. And so instead of doing that, using pending changes was a way that you could open a page to editing by anyone again. But because their changes wouldn't be immediately visible, you could still control the vandalism on that page. So it's used on, like, Britney Spears, George W. Bush. These are some of the pages that I saw in this one. And so they had a community poll about whether English Wikipedia should continue to use flagged revisions or pending changes. And there's not really much consensus, so the community said very loudly "maybe" — that's the end result of that one. But I think pending changes introduces some interesting new ideas to the process of editing. And it separates, in my mind, it separates the change I want to make — so, like, a particular commit — from what I want to be visible to users. It's almost like marking something as suitable for release, in my mind. And so this goes back to the idea that I said before, that wikis don't really do releases. The release is implicitly the latest version of whatever was edited. So what if there was a way to explicitly mark something as released or approved? So this is what I think it's like now if we were using pending changes. So there is a write API and we've got the free content license. So why haven't people written interfaces to it? I think a couple of the biggest problems are the wiki markup, which unfortunately is not very independent from the MediaWiki engine — they're very closely tied — and the templates. Template syntax is like this awesome control for users. But on English Wikipedia it's like this incredibly horrible mess of nested, nested, nested, nested stuff. And there are some ifs, you can put some programming logic in it. And it's bad.
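On the write API point: driving it is not actually the hard part. Here is a rough sketch of the two requests an alternative front end would construct — the parameter names follow MediaWiki's api.php conventions (`action=query&meta=tokens` for a CSRF token, then `action=edit`), but the URL, page title, token and edit summary are just made-up examples, and this only builds the requests rather than sending them:

```python
# Sketch: the two api.php requests behind one edit from an external
# front end. This builds the parameter dicts only; a real client would
# POST them with an HTTP library, inside a logged-in session.
API_URL = "https://en.wikipedia.org/w/api.php"  # any MediaWiki install

def csrf_token_request():
    # Step 1: ask for a CSRF token, required for all write actions.
    return {"action": "query", "meta": "tokens", "type": "csrf",
            "format": "json"}

def edit_request(title, wikitext, summary, token):
    # Step 2: submit the edit itself (must be sent as an HTTP POST).
    return {"action": "edit", "title": title, "text": wikitext,
            "summary": summary, "token": token, "format": "json"}

# A hypothetical edit coming from a topic-specific interface:
params = edit_request("Brisbane", "...full article wikitext...",
                      "copyedit via topic front end", "dummytoken+\\")
```

So the plumbing is simple; the barrier, as I said, is that the `text` parameter has to be wikitext, with all its templates, which is why nobody has bothered.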
So most people will ignore templates, but there's information in them which you occasionally need. Like, warnings are included in templates. A lot of the infobox tables are included in templates. The navbox templates, which have links to related topics — they're all in templates. Citation needed, that's going to be a template. Even a lot of the date stuff is templates, to force dates to appear consistently. So I think a way forward for the Wikipedia community is to embrace more strongly ideas relating to WikiProjects. So WikiProjects are something that has grown up on English Wikipedia, and probably other-language Wikipedias, when it became so large that the whole of Wikipedia was too large to be considered one community. You could create a sub-community, which is people who are interested in editing articles around the same topic area as you. So it could be dogs, could be Linux, could be Brisbane, could be Queensland, could be floods in Australia, could be any topic like that. And they're completely self-nominated. You show up, put your name on a list, and you're a member. That's how you get in. And that's how you start one. You just start a new page and say, okay, now there's a WikiProject for this. And so because they have a narrower focus, you know, there are fewer people, you already have some common ground with the people who are also on that project. So what I would like to see, or what I think would be interesting, is if people were able to create separate interfaces that were WikiProject-specific. So you might have a completely separate website that had a front end, but all the edits went straight back to Wikipedia through the write API. But you were able to enjoy, you know, having an interface that was specific to your topic area. You would spend time there with other people who were in the same WikiProject. So you'd be less concerned with what was going on in the general Wikipedia bureaucracy.
But your contributions would go straight back. So you'd still have that motivation. I mean, look at Twitter — it's obviously a much simpler thing, just 140 characters, and you're not editing each other's work. But their API, you know, how many people use the web interface to Twitter versus one of the clients? It's been embraced a lot more widely, and I think there could be some benefit if we were to see a similar thing happen with MediaWiki. I mean, it's a lot more complex, because you're interacting with existing code, and you don't want to recreate the kind of interface that version control has. So maybe we need to unify those things a little bit. So I saw this question asked, you know, why would anyone contribute to this feeder wiki? It's like, why would you bother contributing to this little wiki instead of to Wikipedia proper? And I think people would do it if their content was going to Wikipedia and they had the promise of a smaller community that was more relevant to whatever they were working on. You know, if people are working on a small wiki and the idea is, well, one day maybe our stuff will get accepted by Wikipedia upstream, that's not a very good promise, and I don't think too many people are going to contribute to that. But if there was a much more transparent way for the branches to be merged back, for example, then there could be a real uptake in that kind of idea. And so in a way, that idea is the kind of forking of — not forking, of distribution of the interface and also the community. So, like, what does it mean? Does it mean anything if we say you could have a distributed community? If you think about Linux as an operating system, there are people who work on the kernel, there are people who work on GNOME, there are people who work on individual pieces of software, and they may or may not see themselves as contributing to the entire Linux machine.
But there are things that are able to be pushed upstream and accepted. So could something like that work for Wikipedia? The things that are key, I think, for that to work are agreement on the intent and the aims, which is basically adopting the key policies of Wikipedia, like neutral point of view and the free content license. Obviously it's not going to work if you have Conservapedia, who don't have neutral point of view, trying to push their article on George W. Bush back to Wikipedia — that's not going to be accepted. But if there was another wiki doing it that was also adhering to neutral point of view, adhering to no original research, citing sources, that kind of thing, I don't see why the Wikipedia community would reject such a thing. So this is all coming back to this idea that forking Wikipedia is kind of impossible, and that's bad, because then there's no check on the power that they have. So what can we do to make forking practical again? And so what I'm thinking of with this forking-the-UI-and-the-community thing is forking the topic area. So a topic area of Linux, or dogs, or mathematics. So, you know, the whole of Wikipedia is too big and one article is too small, but maybe a topic area is just the right size to be suitable for forking. But I think this is a question we should ask ourselves, and I don't ask it with malice or the implication that Wikipedia is corrupt and needs to be forked and needs to be brought down, because I don't think those things are true. But I think having the check on power is important. It helps keep them honest, and that threat of competition is something that is good for the community. And I'm possibly running really short, I don't know. That's the end of my talk. Do we have any questions? Well, I have one. How do you actually see — like, say you had a community that was forking off of Wikipedia to work on, I don't know, Australia or Brisbane, for example — what would that look like?
Would it be an install of MediaWiki, and then these people working on this, and then eventually kind of releasing back to Wikipedia, or? It could be MediaWiki, but it doesn't have to be MediaWiki, because of, as I said, the write API. If you're tying yourself to MediaWiki, you're tying yourself, I think, to the markup. And, well, the history and the logs, all those things are good, but the markup is something that is a barrier. And so it'd be quite a bit of work, but maybe there's potential for someone to create something closer to a WYSIWYG interface that converts back to — well, it has to convert back to wiki markup to push anything back to Wikipedia. But yeah, I see that in a way a community could work on an article or a set of articles until they think that it reaches some fit state, and then propose that revision or that set of revisions to be pushed back to Wikipedia and the Wikipedia community. In a way, I see that the Wikipedia community could become the thing that is managing the branches rather than doing the writing themselves. Probably not everything would be in a WikiProject that someone wants to set up separately, so there would still be editing happening on Wikipedia, but there could be a lot more explicit kind of branch management, basically. Very similar to the way that the Linux kernel is developed, with sort of lieutenants? Yeah, maybe. You mentioned people being approved to approve revisions. Do you see there being any connection between that sort of thing and the process of scientific peer review, where people who have published papers in a particular area would be the people who would be able to mark the articles as approved? So you can get someone who's a real specialist in the area to say, yeah, I've checked this out, and it conforms to what we were all talking about the last time we had our conferences about this. Yeah. That sort of thing. So you're talking more about expertise. Yeah, yeah.
Expertise in doing that review. Because at the moment, if you've just got one global hierarchy — I mean, someone who knows something about geology might not know everything about, you know, the synthesis of climate change, as a topical example. So that, to me, seems like it could be something that could be useful for scientific journals, or for universities, that sort of thing, to have a really scientifically backed version of Wikipedia. So that was more of a discussion than a question. That's okay. So the way the permissions are handed out on Wikipedia at the moment, they're done quite liberally, and it's not that you're an expert in any particular area, but that we trust you to use your best judgment. And part of that best judgment is respecting, you know, what the community says about what contributions are appropriate. So I think the way that would work at the moment is that people who know about a topic area should, you know, post on a talk page and say, this is great, this should be included, or this is rubbish, this is missing, this is biased. And those opinions should be taken into consideration when doing the merging. So there's this model where the people who have permissions like reviewer or administrator are not the experts, but are more like the janitors, and are just enacting the consensus of the community. Of course, it doesn't totally work like that, but that's kind of the ideal of how it should work. But I mean, if a university had its own wiki, they could set it up in a completely different way and say, all right, this reviewer permission means you are the expert for this area. And that seems to me a little bit similar to what Citizendium tried to do, or is trying to do. They have that kind of, you know, you're the expert on this and this is your responsibility. Any other questions?
And the reason we don't have that sort of thing in our wiki engines is because we're worried that it's too difficult for users, or at least too much of a startup cost: understanding the consequences of making a branch, merging it, figuring out what happens when 12 people around the world have made changes that are not compatible, and figuring out how to bring that together. And for us, we think with prose that might be too hard to do. You're basically telling us technical people that we're wrong. Sorry — to put it another way, you're basically telling us coders of wiki engines that we're wrong, which sounds great. But I guess my big question is, are you sure? Okay, so your question is about branches and merging with prose. Yeah, because that's really the basic building block that you'd need to make Wikipedia or any other wiki more distributable. Yeah. And like, we've made a few attempts in our wiki engine and it's... So you're a wiki author? Well, Foswiki now, but yeah. It's always ended up being something where most of the users don't understand what's going on, so they're afraid to edit. And that's then potentially the opposite of what you're trying to achieve. Yeah. I mean, distributed version control is quite a recent technology, and it's mostly used by technical people. Oh yeah. I've mentioned this about merging. Like, I don't know. I'm actually interested, because I haven't heard a lot about attempts to do branching and merging with prose, so I'd be really interested if you have some research or just some anecdotes about that. You know, is it possible? Can prose be atomic in the same way that code is? You know, it's not at all clear that it can be. And maybe that's just... Higher than what each individual developer often has as well. Yeah. So some companies, for example, have somebody who's in charge of merging all the changes. Yeah. Because they're the only person who understands everything. Yeah. So I don't think prose is necessarily more difficult than code.
It's just that you're trying to get a non-technical user group to understand what currently is a technical area. Yeah. And it confuses the heck out of most developers. Yeah, for sure. It's very confusing, and getting the interface right is the hard bit. But, like, it depends a bit on the size of your wiki, but often a wiki will have, you know, a range of people with different technical abilities, and some people are just happy adding text and they stick with that. But then other people will get into the technical stuff. And you could have the case where your merging and your branch management is done by a small group of people who, as I mentioned before, are kind of empowered by the community, but they're not really making the decisions about what's appropriate to merge. They're just doing whatever the community consensus is. So, yeah, probably you don't want every contributor having the ability to branch and merge, because it'll just confuse the hell out of them. And you want to have the default thing be: just make your edit to the trunk, or the most recently edited branch, or something like that. But I'm just interested to see ideas around this, because there's so much potential. You know, what could be possible? When I went to Wikimania in 2005, we had a discussion that Ward suggested we think about: what do we do when you can't have all of Wikipedia in one system? Which is pretty much your thesis as well, really. And we were starting to look towards things like what happens with large wikis. For example, as a funny case, you're flying to Mars, so your pipe to the world is nice and small, and you've taken with you a version of Wikipedia and you're updating it with what you know. And so you've effectively got a latency of information, which is the same as what you'd have when you merge only when people have decided a milestone's been reached. And that sort of thing.
There was some discussion on it, but again, we just threw up our hands and went, we don't know how to approach them, normally. Yeah, I mean, you don't even need to go to Mars. The whole idea of offline editing is something that's supremely interesting to Wikipedia for countries which don't have good internet connectivity. And I don't think it's solved, or even, you know... Fundamentally, I think it's fantastic that you're interested. We obviously need more people like you to tell us what is functional, because we can make a distributed wiki really easily, but only we can use it. Yeah, like, this is why a lot of people are like, oh, git and Wikipedia, it's awesome. But it's like, for... okay, to what end, you know? Yeah. So yes, I've tweeted you anyway. Okay. You'll know where to go. All right. Have we one last question, or... Yeah. Do you think it sort of comes down to the nature of how we do distribution of information on the web at the moment, where we have big pipes, huge websites, and when things get slow, we make them bigger rather than splitting them off into smaller, different services? And do you think it kind of reflects on the underlying technical issues about websites and how we do that sort of stuff? I don't know, but it's interesting, because it seems like a lot of the benefit and the power in the web is that anyone can put up a website really easily. So that's quite distributed. But then somehow we have these monoliths like Google and Facebook and Wikipedia and Yahoo come out, and then one day they disintegrate, and then we go back to lots of little things again, and then they build up, and then they disintegrate. I think that's an interesting pattern. I don't know how that works. Seems to me that the larger monolithic things are bad for the open web, but they're good in some senses, because they're good for users: you just go to Google, or you just go to Wikipedia, or you just go to Facebook and all your friends are there.
So how do you balance the distributed thing, which is good for freedom, versus the kind of monolithic thing, which is good for, I don't know, usability, I guess. Could Wikipedia be interfaced to a number of smaller sites? Yeah, in a sense, that's kind of what I was thinking about. But if you put it like that, I'm not sure the community would embrace it. But yeah, interesting ideas. All right, thank you very much, Brianna.