No change. So the things you're looking for in a distributed revision control system are, first, the cost to branch: you want a really, really low cost to branch. You're also looking for the ability to publish easily, and a lot of the distributed revision control systems allow you to publish your branches very, very easily. So we're getting to the point now where you can take any upstream, literally just say "I want to branch that upstream", then publish your own branch and have other people start to collaborate.

What I'm going to do is demonstrate with the one that I use, which is Baz, which is based on TLA, that is, Arch. The way Baz works is it has the idea of your own personal archive, and you can have multiple archives. Archives are just immutable data files that you can publish over HTTP. So one of the huge advantages of that infrastructure is that just with a web server, if you've got a writable location that's published over HTTP, you can effectively publish your branch. You don't need to set up something like a CVS pserver or the Subversion equivalents. You don't need any kind of daemon listening on a special network port. You just need the ability to write to a directory which you then publish over HTTP.

So for example, can we bring up what's on my screen? Is that on for you? How's that? OK, so what I've got over here is just a directory with a single file in it that I've imported to a branch. If I do baz version, you can see it tells you exactly what branch it is. It's got the archive name, for which I've just used an email address. You want a unique archive name so that, effectively, every branch has a unique identifier, because it's a combination of the archive name and the branch name. And then the branch name is kind of the product name, then a branch name, and then some sort of version name. Now Baz inherits this from TLA. I think it's a bit crackful to have this kind of structured, enforced namespace.
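To make the naming scheme concrete, here is a minimal sketch of splitting an Arch/Baz-style fully qualified name into archive, product (category), branch, version and optional revision. The helper names (ArchName, parse_arch_name) are hypothetical, and the layout "archive/category--branch--version[--revision]" is a simplification: real Arch names have more variants (for example, an optional branch part).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArchName:
    """Simplified Arch/Baz-style fully qualified name (hypothetical helper)."""
    archive: str                    # e.g. "joe@foo.com"; real Arch archives look like "joe@foo.com--2005"
    category: str                   # the product name, e.g. "product"
    branch: str                     # e.g. "develop"
    version: str                    # e.g. "0"
    revision: Optional[str] = None  # e.g. "base-0" or "patch-1"; None for a whole branch

    def branch_id(self) -> str:
        """Unique branch identifier: the archive name combined with the branch name."""
        return f"{self.archive}/{self.category}--{self.branch}--{self.version}"


def parse_arch_name(name: str) -> ArchName:
    """Split 'archive/category--branch--version[--revision]' into its parts."""
    archive, sep, rest = name.partition("/")
    if not sep or not archive:
        raise ValueError(f"missing archive name in {name!r}")
    parts = rest.split("--")
    if len(parts) == 3:
        category, branch, version = parts
        revision = None
    elif len(parts) == 4:
        category, branch, version, revision = parts
    else:
        raise ValueError(f"expected category--branch--version[--revision], got {rest!r}")
    return ArchName(archive, category, branch, version, revision)


rev = parse_arch_name("joe@foo.com--2005/product--develop--0--patch-1")
print(rev.branch_id())  # joe@foo.com--2005/product--develop--0
print(rev.revision)     # patch-1
```

Splitting on the first "/" before splitting on "--" is what lets the archive part itself contain "--", as real Arch archive names do.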
Subversion has a much better idea, which is basically to just use a URL, because every URL is effectively unique. There are some advantages to the Arch approach, though, because you can replicate an archive with a series of mirrors, where everyone knows where the canonical location is and that this is a mirror. You can't really do that with a pure URL space, which is a pity. Right, so you can actually baz get a URL, effectively, and it figures out all the archive names. I think it's in archives. My archives are just in my home directory. And if you go in there, you'll start to see, OK, so now we've got demo, demo develop. That's what an archive revision actually looks like. So the first revision is called base-0. It's basically whatever you imported, as a tarball with GPG signatures and so on, so that you have long-term security. And I can now just start publishing that directly over the web and everyone will see this.

If I make a change and then commit it, OK, you can see that what it's done is figure out the diff from the previous revision, compress and digitally sign that, and put it in the archive. So if that was being published through the web, everyone else would be seeing it immediately. Then it updates the various tracking files in the working directory, so that this working directory knows that it consists of that patch as well. So if I go in here, now you'll see you have base-0, which was the import, and patch-1. And if you go and look in patch-1, you can see the checksums and the GPG signatures and so on that uniquely identify it.

Now one of the weird things about distributed revision control is that you can easily end up in a situation where there's no central repository. You just have everybody's branches. People might maintain a kind of mainline branch, but then also feature branches. So what you end up doing is you say, OK, Joe over there is the lead developer. So if we say it's joe@foo.com, product--develop--0.
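The commit step described in the demo (diff against the previous revision, compress, sign, store as the next immutable patch-N) can be sketched roughly as follows. This is not Baz's real on-disk format: the archive is an in-memory dict, the import is stored as a diff from an empty file rather than a tarball, and a SHA-256 checksum stands in for the GPG signature. The function names commit and verify are hypothetical.

```python
import difflib
import gzip
import hashlib


def commit(archive: dict, old_text: str, new_text: str, filename: str) -> str:
    """Store the diff old_text -> new_text as the next revision in `archive`.

    Sketch only: a SHA-256 checksum stands in for Baz's GPG signature, and
    base-0 is stored as a diff from an empty file instead of a tarball.
    """
    name = "base-0" if not archive else f"patch-{len(archive)}"
    if name in archive:
        raise ValueError(f"{name} already exists; archive entries are immutable")
    diff = "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    ))
    blob = gzip.compress(diff.encode("utf-8"))  # compressed changeset
    archive[name] = {"changeset.gz": blob, "checksum": hashlib.sha256(blob).hexdigest()}
    return name


def verify(archive: dict, name: str) -> bool:
    """Recompute the checksum so anyone fetching the files over HTTP can detect tampering."""
    entry = archive[name]
    return hashlib.sha256(entry["changeset.gz"]).hexdigest() == entry["checksum"]


archive = {}
commit(archive, "", "hello\n", "README")               # returns "base-0" (the import)
commit(archive, "hello\n", "hello world\n", "README")  # returns "patch-1"
```

Because entries are only ever added, never rewritten, publishing is just copying new files into a directory that a web server exposes.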
That's the mainline, effectively. So if you think of that as a branch, and this is sort of Joe's working space, Joe might have multiple branches that come off and merge back, where he's developing other features, all active at the same time. Somebody else, if this is somebody else's archive, might have multiple branches as well, all in parallel. So you have to change your thinking away from the idea of having a centralized repository.

Some projects do like having a centralized repository. We saw that in the kernel, for example, where Linus has his branch, and Andrew Morton has his branch, and other guys each have their own branches. You can set up an automated branch, which is managed by something called a patch queue manager, which is basically an email address that you can send a message to, asking it to merge from one of your branches. So if you've got this centralized branch being managed by a patch queue manager, you basically email it and say, please merge from this branch, or this branch, or this branch of mine. What it does is create a temporary working directory, check out the existing mainline, merge into that from your branch, and then run all of your tests. If those tests pass, it commits to that mainline. And if you're publishing that mainline, that becomes the project mainline. So you can have a centralized revision control system.

Some of the new work that's being done is to create hybrids of distributed and centralized revision control, so that you can have a branch which multiple people have direct commit access to, like with CVS or Subversion, but which you can also instantly branch off.
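The patch queue manager loop described above can be sketched in a few lines. This is an illustration, not PQM's actual code: a "tree" is modelled as an ordered list of patch IDs, "merging" means appending the patches the mainline hasn't seen, and the test suite is a caller-supplied function. The names pqm_merge and process_queue are hypothetical.

```python
from typing import Callable, List

Tree = List[str]  # simplified: a tree is the ordered list of patch ids it contains


def pqm_merge(mainline: Tree, branch: Tree, run_tests: Callable[[Tree], bool]) -> bool:
    """Merge `branch` into `mainline` only if the merged result passes the tests."""
    working = list(mainline)  # 1. check out mainline into a temporary working copy
    seen = set(working)
    working.extend(p for p in branch if p not in seen)  # 2. merge in patches not yet on mainline
    if not run_tests(working):  # 3. run the test suite on the merged tree
        return False            # failed: the published mainline is left untouched
    mainline[:] = working       # 4. passed: commit to mainline
    return True


def process_queue(mainline: Tree, requests: List[Tree], run_tests: Callable[[Tree], bool]) -> List[bool]:
    """Handle queued 'please merge from this branch' requests in arrival order."""
    return [pqm_merge(mainline, branch, run_tests) for branch in requests]
```

Usage: with mainline = ["base-0"], calling pqm_merge(mainline, ["base-0", "feat-1"], lambda tree: True) returns True and leaves mainline as ["base-0", "feat-1"]; a request whose merged tree fails the tests is rejected and mainline does not change.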
And I think this is going to be really, really important for projects like OpenOffice and Firefox, where you want a consistent public repository, so that everybody knows where the central mainline is, but you also don't want to discourage people from creating their own branches.

And just to finish off, I want to show you something here. Because we're starting to use distributed revision control so much, we are importing the upstreams of many of the open source projects out there. We've currently got 480 or 490 of those upstreams published as Bazaar branches. We just publish these archives. And what this is, is, say, upstream GDB. And there's nothing in it. Let me see if I can find... nope, here we go. OK, so this is FVWM. What's happened is that we've gone through the CVS repository and tried to reconstruct its history. CVS doesn't tell you what was committed revision by revision; it tells you file by file. So we try to consolidate all of those per-file commits into coherent changesets across the whole branch and then commit those through. Then we sync that every day, so that we can work internally on that upstream just using distributed revision control. So anybody in the project can branch that. But because it's published, and because it's distributed, anybody else can branch from it too, and other people are starting to. This import was done on the 1st of July. And then you can see, as there have been commits upstream, down here on the 5th and 6th, then the 9th and 11th of July, those have been synced in as well. So you can track upstream using Baz: you can create your own archive and work on any upstream that's being published as a Baz branch, or as any distributed revision control branch.

And there are some key things that I'd like to see in the distributed revision control world.
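The CVS import problem, turning file-by-file commits into whole-tree changesets, is usually attacked with a heuristic: per-file commits with the same author and log message, close together in time, are treated as one changeset. The sketch below shows that heuristic in the spirit of tools like cvsps; it is an assumption about the general approach, not the importer used here, and the names FileCommit, consolidate and the 300-second window are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileCommit:
    path: str
    revision: str   # per-file CVS revision, e.g. "1.7"
    author: str
    message: str
    timestamp: int  # seconds since the epoch


def consolidate(commits: List[FileCommit], window: int = 300) -> List[List[FileCommit]]:
    """Group per-file commits into changesets by (author, log message) and time proximity."""
    changesets: List[List[FileCommit]] = []
    open_sets: Dict[Tuple[str, str], List[FileCommit]] = {}
    for c in sorted(commits, key=lambda c: c.timestamp):
        key = (c.author, c.message)
        current = open_sets.get(key)
        # Join the open changeset if it is recent and doesn't already touch this file.
        if (current is not None
                and c.timestamp - current[-1].timestamp <= window
                and all(x.path != c.path for x in current)):
            current.append(c)
        else:
            current = [c]
            open_sets[key] = current
            changesets.append(current)
    return changesets  # ordered by the time each changeset started
```

A file touched twice within the window starts a new changeset, because a single commit can only contain one new revision of each file.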
It would be nice to have a standard changeset format, so that if you are working with Darcs, for example, you can publish a changeset, and somebody who's working with the same upstream source in Baz can just apply that changeset and have it translated into the local format. That would be great, because there are suddenly a bunch of different projects working on distributed revision control. It would be nice if we could identify a standard format. There's nothing really complicated about that. It's just a bunch of patches that are all applied at the same time, but then there are also metadata things like directory renames, file renames, and so on.

What's awesome about working in this kind of environment is that you tend to forget about what's happening at the central point of the revision control. So you can rename files and continue to work. And when somebody else merges that from you, they will just instantly get those file renames. And any changes that they've made to those files since you renamed them will just flow naturally.

The other cool thing that you can do is start to merge between these branches. So for example, this guy can create a branch of his branch, and this guy can create a branch of his branch. Then he might say, I want the feature that's being worked on in there, so he can basically pull it in. Then you can have another guy who's created a branch of his branch, and he can then merge from that, and he's now inherited effectively that feature and that feature. And then if he merges to mainline over there, mainline would have inherited that and that and what he was doing over here. So within a project, you can have these very dynamic situations where groups of people cluster around a feature or an idea.

Some of the stuff we've evolved to do is use a wiki to define a feature. So you create a wiki page for the feature, and then you start to document the feature over there.
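One common way a changeset format can carry renames so that "changes flow naturally" is to key files by a stable file ID and treat the path as metadata. The sketch below assumes that design; it is a hypothetical format, not Baz's or Darcs' actual one, and it simplifies each patch to a whole-file replacement. The point it shows is that my rename and your edit of the same file compose in either order.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Tree:
    paths: Dict[str, str] = field(default_factory=dict)     # file id -> current path
    contents: Dict[str, str] = field(default_factory=dict)  # file id -> file text


@dataclass
class ChangeSet:
    renames: Dict[str, str] = field(default_factory=dict)   # file id -> new path (metadata)
    edits: Dict[str, str] = field(default_factory=dict)     # file id -> new text (simplified patch)


def apply_changeset(tree: Tree, cs: ChangeSet) -> Tree:
    """Apply renames and edits by file id, returning a new tree."""
    paths = dict(tree.paths)
    contents = dict(tree.contents)
    for file_id, new_path in cs.renames.items():
        if file_id not in paths:
            raise KeyError(f"rename of unknown file id {file_id!r}")
        paths[file_id] = new_path
    for file_id, text in cs.edits.items():
        if file_id not in contents:
            raise KeyError(f"edit of unknown file id {file_id!r}")
        contents[file_id] = text
    return Tree(paths, contents)


base = Tree({"f1": "src/util.c"}, {"f1": "int x;\n"})
mine = ChangeSet(renames={"f1": "lib/util.c"})    # I rename the file
theirs = ChangeSet(edits={"f1": "int x = 0;\n"})  # you edit it under the old name
merged_a = apply_changeset(apply_changeset(base, mine), theirs)
merged_b = apply_changeset(apply_changeset(base, theirs), mine)
```

Because the edit refers to "f1" rather than to "src/util.c", it still lands on the right file after the rename. Two concurrent edits to the same file would need a real three-way merge, which this sketch does not attempt.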
And in the wiki page, you specify the branch name. So then people find the wiki page, they've got some documentation as to what it's about, and they can just merge from the branch and get going.

Yeah, a distributed revision control system will keep track of that. Remember, this branch over here is a set of patches, right? Patch one, patch two, patch three, patch four. And at this point, say patches one, two, three, and four got merged into here. So then this branch knows that it's got patches one, two, three, and four from this branch, and patches one, two, and three from this branch. So when that lands over here, mainline, or this branch over here, knows that it's got one, two, and three from that branch, one, two, three, and four from that branch, and this point over here. If you continued to develop here, a couple more patches, and then merged there, it would say, OK, I've already seen one, two, and three, and so it tries to apply only the new ones. It gets kind of hairy, but it's amazing how few conflicts you get in a situation like that. There's a lot of research being done now on the algorithms required to support that. But it becomes a very productive way of working. You can ping-pong: you work on a few more features and you merge from your partner; he works on a few more features and merges from you. So you ping-pong develop like that, and then land it all. Yeah? It's very cool. Yeah?

There are a lot of revision control systems, many distributed as well. Why do you think Baz is better than the others?

So Baz has strong roots in Arch. There are some very good, nice ideas in Arch: the idea that you can publish just by publishing the files, and the fact that it has GPG signatures on every revision, nicely and cleanly. Some very cool ideas in there. But also a fair amount of crack, right? TLA was sort of a shell script converted to C with a Perl script, something like that.

It was a Lisp script.

OK. That makes it much better.
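The bookkeeping described above, where a branch remembers which patches from which branches it already has and a later merge applies only the new ones, can be modelled very simply. This is a toy model under stated assumptions: patch IDs are (branch name, patch number) pairs, merging just appends unseen patches, and no merge commit is recorded (real Arch does record one, which is the convergence issue raised later in the Q&A). The Branch class is hypothetical.

```python
from typing import List, Tuple

PatchId = Tuple[str, int]  # (branch name, patch number), e.g. ("feature", 3)


class Branch:
    def __init__(self, name: str) -> None:
        self.name = name
        self.patches: List[PatchId] = []  # every patch this branch contains, in order

    def commit(self) -> PatchId:
        """Record a new local patch, numbered after this branch's own previous patches."""
        n = sum(1 for owner, _ in self.patches if owner == self.name) + 1
        pid = (self.name, n)
        self.patches.append(pid)
        return pid

    def merge_from(self, other: "Branch") -> List[PatchId]:
        """Pull in only the patches from `other` that this branch hasn't seen yet."""
        seen = set(self.patches)
        new = [p for p in other.patches if p not in seen]
        self.patches.extend(new)
        return new


feature = Branch("feature")
for _ in range(4):
    feature.commit()
main = Branch("main")
first = main.merge_from(feature)   # patches 1..4 from "feature"
feature.commit()
feature.commit()
second = main.merge_from(feature)  # only patches 5 and 6 are new
```

Ping-pong development falls out of the same rule: when two partners merge from each other, each side skips the patches it already has, including its own.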
It's got kind of scary internals, and it's a little bit difficult to work with as a code base. So what we did is we got Martin Pool. I don't know if you know Martin Pool. He was at HP, actually. Sorry for your help, but thank you. Kernel hacker, Samba hacker, and he has written quite a lot about revision control systems. He did an analysis of Darcs, Monotone, BitKeeper, and Arch.

Right. He wrote those two beautiful pieces of work.

And we've got him working on Bazaar-NG, which is a Python-based implementation. If you go to bazaar-ng.org, you'll see it. That's kind of a reference implementation of where Baz is going. The Baz team are currently on version 1.5, and they're converging on that reference implementation, but from the Arch code base. What I really like about that is that Arch, yeah, is crufty and has its issues and so on; if you use Baz right now, there are issues. But there's this huge code base of stuff that's out there. And we're rapidly moving towards this much cleaner distributed revision control system, which the bright folks who've looked at it say has significant technical advantages over, and includes the best features of, Darcs and Monotone and BitKeeper and others. Is that a question, Andrew?

So it's a slightly tricky thing to have a version 1.5 under active development as well as a reference version 2.0 under active development. But what we find is that some of the community members tend to use Baz for the stuff they're working on that's got upstreams and so on, and Bazaar-NG for new projects that they're starting. And they tend to hack more on Bazaar-NG, because it's in Python, so it's easier; there's a lower learning curve, and it's very well documented as to where it's going. Andrew?

I guess when you started looking at revision control systems, Git wasn't around.

No, Git wasn't around. But it's moved incredibly quickly.

Have you been looking at it?

Yeah, absolutely.
And what was interesting is that when Linus sat down and wrote the first cut of Git, many of the key ideas were exactly the ideas that Martin had already documented and laid out for Bazaar-NG. Time will tell how that works out. My sense at the moment is that it's a little bit chaotic, and sort of optimized for the kernel guys' needs. I'm not sure that it has the generalization and the cleanliness required to become a Subversion or CVS replacement across the whole of the upstream world.

I've been evaluating it pretty recently, and my feeling about it is that it is extremely elegant in its core functionality, and that actually hasn't changed significantly in quite a while now. Where all that chaotic and extremely active development is happening is around the scripting to support the use of it. So the user interface development is what's happening now; the core Git development is done. Pretty static.

Yeah, and it's got some really interesting features. The idea of being able to, perhaps, move code from one program to another, and have the revision control know that you've actually taken a subroutine and put it in a different file, but that you haven't changed the functionality in some core ways: really interesting features there.

Yeah, the other thing that's very popular, for example, is the ability to move a file from one branch to another. Because often you're working on a project and it's all in one branch. Then you start working on a library for that project. And then further down the line, you actually want to have a separate branch just for that library. But now all the revision control history for the early part of that library is effectively inside the product branch, and so you want to be able to move that out. Some people are still going to be hacking on that library in the product branch, and some people are now going to be hacking on it as a dedicated thing, and you want to be able to merge between them.
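Moving a library's early history out of a product branch amounts to filtering the product history down to the changesets that touch the library's subdirectory and re-rooting their paths. The sketch below is a hypothetical illustration of that idea, loosely similar in spirit to Git's filter-branch with a subdirectory filter; history is simplified to a list of changesets, each mapping a path to its new file text, and split_history is an invented name.

```python
from typing import Dict, List

Changeset = Dict[str, str]  # simplified: path -> new file text


def split_history(history: List[Changeset], prefix: str) -> List[Changeset]:
    """Keep only changes under `prefix`, with the prefix stripped from each path."""
    if not prefix.endswith("/"):
        prefix += "/"
    result: List[Changeset] = []
    for cs in history:
        sub = {path[len(prefix):]: text for path, text in cs.items() if path.startswith(prefix)}
        if sub:  # drop changesets that never touched the library
            result.append(sub)
    return result


product_history = [
    {"app/main.c": "v1", "lib/util.c": "u1"},  # library started inside the product
    {"app/main.c": "v2"},                      # product-only change
    {"lib/util.c": "u2", "lib/util.h": "h1"},  # more library work
]
library_history = split_history(product_history, "lib")
```

Keeping merges flowing in both directions afterwards, as the speaker wants, also requires recording which product changesets correspond to which library changesets; this sketch only covers the extraction step.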
Distributed revision control has the beginnings of support for that kind of thing.

Yeah. We've been using Arch, then TLA, and then Baz, and I've been following closely the work of Martin Pool that you've been backing. Having used Arch, the whole infrastructure, for quite a while, the major concern I find is that it's sometimes hard to converge, because every time you merge, you generate a new commit. So the more you try to converge, the more pending commits the other branch seems to have. In that sense, I'm tempted by, and I'm looking into, some kind of identity-based mixture. I'm not sure how to describe it, but Git has this very powerful concept of identifying things by their hashes, SHA-1 hashes, which makes it very easy to discover that you have actually converged.

I think exactly the same concept is in Bazaar-NG, where each patch has a unique ID, and you can track exactly which patches are where. I don't know enough about the different ones to get into a detailed discussion; I'm afraid I'm sort of a mere user of this. My interest in this is basically just to make sure there's a damn good distributed revision control system out there. I don't actually care which one it is. But I do care that we have this infrastructure, because I think socially this is one of the big bottlenecks in the open source process: the idea that there are these guys with commit rights, and then sort of everybody else who's a mere pleb and must email a patch, which is fine, but doesn't allow for the same level of collaboration. It'd be great to have a BOF actually pulling together people who are actively working on all of these different distributed revision control systems and say, what do you like about each of those? And how can we get that community to compete and cooperate to make this all happen faster? That's all I care about.

Yeah, and as I said, there is an IRC channel.
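The convergence point raised in the question can be illustrated with Git's content addressing. A blob's ID is the SHA-1 of a small header ("blob", the size, a NUL byte) followed by the file bytes, so identical content gets an identical ID no matter which branch or merge produced it. The snapshot_id function below is a deliberately simplified stand-in, not Git's real tree encoding, used only to show that two branches holding the same files can be recognised as converged by comparing one hash.

```python
import hashlib
from typing import Dict


def git_blob_id(data: bytes) -> str:
    """Object id Git gives a file's contents: SHA-1 of b'blob <size>\\0' + data."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def snapshot_id(files: Dict[str, bytes]) -> str:
    """Simplified whole-tree id (NOT Git's real tree format): hash of sorted (path, blob id) pairs."""
    h = hashlib.sha1()
    for path in sorted(files):
        h.update(path.encode("utf-8") + b"\0" + git_blob_id(files[path]).encode("ascii") + b"\n")
    return h.hexdigest()


# Two branches that reached the same content by different routes hash identically.
branch_a = {"README": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"}
branch_b = {"src/main.c": b"int main(void) { return 0; }\n", "README": b"hello\n"}
converged = snapshot_id(branch_a) == snapshot_id(branch_b)  # True
```

With identity based purely on content, a merge that changes nothing produces the same snapshot ID, so it no longer shows up as a "pending" change on the other side.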
I think Martin Pool is there as well. And they're all getting together, so it's really fun to be in that channel and see what happens with everything. This is a really exciting time for revision control. No one knows who's going to be the winner, but clearly they're all trying to get there.

And ultimately, it's much easier to move between these formats than it is to do this: this part of getting from CVS to something else is really crappy, because there are so many CVS repositories that are a mess. It's too easy to go in and jimmy stuff around; you have to jimmy stuff around to move files and do things like that. And of course, that makes it very, very hard to map that into a clean revision control system where you have changesets that span multiple files, and metadata for renaming, deletion, addition, and so on. But once you're running your infrastructure even in Subversion, because that has consistent changesets, it's pretty easy to map that into Darcs, or into Baz or Git. And if you're running in Git, then it's pretty easy to map to the others. So it doesn't really matter which one of these things wins; what matters is that we all start to think in terms of distributed revision control.

The next thing that's going to be really interesting, I think, is if you consider a package as an orig.tar.gz and a set of patches, and then you look at that over time, this is effectively a branch, right? That's mainline. And if you look at the branch plus patch 1, that's a branch too, because patch 1 evolves over time to fit with that. And the same happens here, and the same happens here. So a package over time is basically a collection of branches. So the thing that I'm trying to drive in the Ubuntu community is that we stop thinking of patch management and think more of branch management. This then allows us to start saying, hey, have you merged from that branch?
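The "package as a collection of branches" idea can be shown by pivoting a package's release history: each release is an upstream version (the orig.tar.gz, i.e. mainline) plus a set of patches, and regrouping by patch name gives each patch its own history of revisions as it is refreshed against new upstream versions. This is a hypothetical illustration of the framing, not an Ubuntu tool; the names patches_as_branches, "fix-build" and "new-icon" are invented.

```python
from typing import Dict, List, Tuple

Release = Tuple[str, Dict[str, str]]  # (upstream version, {patch name: patch text})


def patches_as_branches(releases: List[Release]) -> Dict[str, List[Tuple[str, str]]]:
    """Regroup package releases into one history ("branch") per patch.

    A patch gets a new entry only when its text changes, i.e. when it had
    to be refreshed to fit a new upstream version.
    """
    branches: Dict[str, List[Tuple[str, str]]] = {}
    for upstream, patches in releases:
        for name, text in patches.items():
            history = branches.setdefault(name, [])
            if not history or history[-1][1] != text:
                history.append((upstream, text))
    return branches


releases = [
    ("1.0", {"fix-build": "p1"}),
    ("1.1", {"fix-build": "p1", "new-icon": "i1"}),
    ("1.2", {"fix-build": "p1-refreshed"}),  # fix-build reworked to fit upstream 1.2
]
per_patch = patches_as_branches(releases)
```

Seen this way, asking another distribution "have you merged from the fix-build branch?" is a question about a branch's history rather than about a single patch file.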
And it would allow us, for example, to move code much more easily between different distributions. So we start to be able to collaborate at the branch level. You might have guys from multiple distributions collaborating on a branch, effectively, which is much easier and much saner than trying to collaborate on a patch, particularly for packages like OpenOffice and the kernel, which are so huge. We don't yet have the infrastructure to give us a clean view of that, but that's one of the areas where I'm very, very interested in driving a lot of work. Any other questions? We've probably got more detailed experts here than me to answer technical questions anybody might have, so fire away. OK, cool. Thanks.