talk. I'm going to speak about Subversion. My name is Greg Stein. I was one of the original Subversion developers when the project started six years ago. Subversion 1.0 was released about two years ago, so it took us four years to reach that point. I'll go over basically what Subversion is about, why we designed it the way we did, some of the features it brings, and in particular some of the improvements over CVS.

So what is it? It's a version control system, and right from the start it was intended to replace CVS. We didn't really want to go and do something entirely new, like what people are experimenting with in Arch or Monotone or Darcs. We wanted to replace CVS, and that meant we wanted to keep the same sort of model, some of the same ideas, but really do it right this time. And also, where we can, make it more than just a simple replacement: make it a very useful tool to build applications on, to use as a simple version control system, to build new clients, all kinds of things. We wanted to make it very easy to do all kinds of things with Subversion where there was a lot of difficulty around CVS.

And of course we wanted to make it open source. The project, the initial launch and everything, was originally funded by CollabNet, a company in the States, and they wanted to make sure it was open source so that people would actually adopt it. Even though they hired a lot of developers and could have made it private, they wanted it to be open source so that it would get the kind of adoption that CVS has and could really meet that goal of replacing CVS.

So why replace CVS? There are a lot of missing features. In particular, move and rename is one of the big missing features in CVS. A lot of people effect a move by going and manipulating the repository, but it's kind of a weird way to do it, you have to have access to the server, and it's very difficult.
And if you don't do it right, then the project looks very funny when you start to look at the history. CVS has very poor handling of binary files. It's very easy to get a binary file corrupted in CVS if you aren't careful when you first add the thing in. CVS is not very good with the network. It has poor usage on the server. It's very hard to administer. And then the CVS application is one big monolithic executable, so there's really no way to link to CVS to build additional tools. Instead you have to run the thing and then try to parse the output. And there's actually a large class of problems, where you're analyzing the repository, for which you can't use that parsed output. It's not rigorous enough to be able to do it that way. So building all kinds of solutions and things on CVS is a very difficult thing.

But why should it be like CVS? CVS is incredibly popular. It is probably one of the most used version control systems on the planet when you look at how many developers actually use it. Pretty much every open source developer has used CVS at some point or another. You look at sites like SourceForge.net, where for years and years all there was is CVS, so everybody who was working with SourceForge was using just CVS. But that popularity also said a couple of things. One, that the model of how the developer interacts with the server is valid. If that many people could use it, then that centralized repository, the update, edit, merge, and commit, that model works. It also verified the remote case: that in a WAN environment, over the internet, that model works. That's as opposed to some systems like ClearCase or Perforce, which you can really only use on a LAN because of the traffic between the clients and the servers.

The other thing about CVS is that it's conceptually very simple. You get the code, you start making some changes, and then you commit back to the server. In the simplest case that's all it is and all you ever have to do.
There are some harder problems dealing with merging and branching and all those kinds of things, but you can grow into that beyond just the very simple edit and commit. And we also wanted a very easy upgrade path, so that people familiar with CVS, people familiar with that model, could easily upgrade to Subversion without having to go through a lot of retraining and without having to change their development model. So by keeping it like CVS we were able to make it very easy for people to move to Subversion. Over the past couple of years we've really seen that happen quite a lot.

So Subversion feels very familiar to CVS users. We kept the command line very much the same. We omitted a couple of things in CVS that were completely broken. There's edit and watch, and the history thing. None of those really do what they advertise and they aren't very useful. So we took those out, but all the rest is still there. We changed around some of the options. In CVS, depending on where you place an option, it can either be a global option or it can be local to the subcommand. It's very confusing. In Subversion there are really no requirements. You can put the options anywhere you want on the command line. So we got rid of some of the funny things that made CVS kind of hard to use from the command line.

But we also added a bunch of new subcommands to Subversion to do a lot of different things, since we've gotten more capability. Just the simple copy and move, which is one of the big new features of Subversion, gets its own new commands. We've also added a bunch of other things dealing with metadata, dealing with reserved checkouts, and other things. I'm not going to go into detail on each of these commands, but I'm going to go through and highlight a lot of the different features. And those features do come through these different commands.

So a big change from CVS is that Subversion has one global revision number. Each time you make a commit it assigns a new revision number.
You start at zero, and the first commit is then revision one, and then revision two, revision three, and so on. Each of those commits contains the author, the log message, and the date of that commit. In CVS that information is copied across all the different files, and when you're trying to figure out what was done you have to look and reassemble it. Where did gstein make a commit at 2:18 on Saturday? Then you can see all the files and understand, oh okay, that's what the change was. In Subversion the whole package of changes goes in as one single change set with the author, log, and date associated with it.

And the other thing about that is that it's atomic. Either the entire commit occurs or it doesn't. You never get a partial commit, which is actually very possible in CVS. It's sort of a known problem. You can kind of work around it, but it is possible to get partway through a commit in CVS, lose your network connection, and then you have just a partial change recorded in the repository. It's actually very difficult to recover from on the client side. You have to go and do stuff on the server.

One of the things about having a global revision number like that is it makes it very easy to talk about changes. On a mailing list or in person you can say, oh yeah, I fixed that bug in revision 2524, or in your bug tracker you can close out the bug and say this was fixed in revision 2524. It provides a nice little way for people to discuss and refer to individual changes. And each revision is sort of a name for a unique state of your project at a point in time.

So, versioning. We actually version some more data now. The directories themselves are versioned, so any time you have a set of files in a directory, that information is versioned. If you add or remove a file, that's actually a change to the directory that is versioned. When you rename a file in a directory, that's a change that gets versioned.
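
To make the global revision number and the all-or-nothing commit described above concrete, here is a minimal Python sketch. It is a toy in-memory model and not Subversion's actual implementation or storage format; `ToyRepository`, `Revision`, and `commit` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Revision:
    number: int
    author: str
    log: str
    date: datetime
    changes: dict  # path -> new content, or None for a deletion


@dataclass
class ToyRepository:
    """Hypothetical in-memory model; not Subversion's real storage."""
    revisions: list = field(default_factory=list)
    tree: dict = field(default_factory=dict)  # path -> content at HEAD

    @property
    def head(self) -> int:
        # An empty repository is at revision 0; the first commit is revision 1.
        return len(self.revisions)

    def commit(self, author: str, log: str, changes: dict) -> int:
        # Validate every change first, so a bad one cannot leave a partial commit.
        for path, content in changes.items():
            if content is None and path not in self.tree:
                raise ValueError(f"cannot delete missing path: {path}")
        # Build the new tree on the side, leaving the current one untouched.
        new_tree = dict(self.tree)
        for path, content in changes.items():
            if content is None:
                del new_tree[path]
            else:
                new_tree[path] = content
        rev = Revision(self.head + 1, author, log,
                       datetime.now(timezone.utc), dict(changes))
        # Publish the tree and the revision record together.
        self.tree = new_tree
        self.revisions.append(rev)
        return rev.number
```

The author, log message, and date are stored once per revision rather than once per file, which is the point of contrast with CVS made above. A commit that fails validation leaves both the tree and the revision counter exactly as they were.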
Moves and copies are essentially just moving pointers to files around through those directory entries. So again, all that stuff is versioned. And you can go ahead and analyze back in time, okay, when did I move this directory over here, and you can see it in the revision log.

All the files and directories have metadata properties that you can attach to them. In CVS you may be familiar with the .cvsignore file, where you say, okay, here are all the different file patterns I want to ignore in this directory. In Subversion it's just a property on the directory itself. Or if a file is binary, it's a property on the file where we record what the MIME type is supposed to be for that file. We can also record other arbitrary data in those properties. So if you want to build some kind of neat application on top of Subversion, you can use those properties to store, say, what's the last known good build, or what's the testing state of your source code. The properties on the files and directories are versioned, so you can even track the changes of those properties over time.

One of the things about how Subversion does its moves and copies is, like I said, it's really just pointing new directory entries at the actual files or subdirectories. That means branching and tagging are very cheap, because to make a copy you're just creating a new directory entry pointing to existing content. Or to move, you unlink it from over here and link it over here. So branching and tagging are implemented using these cheap copies. To make a branch you just copy the main trunk of development over to a new path and then you start changing it over there. Then when you're done you can merge those changes back in. And tags are essentially where you make that copy but you never actually go in and change it.
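
The idea that a copy is just a new directory entry pointing at existing content can be sketched in a few lines of Python. This is an illustrative toy tree, assuming immutable nodes that are shared between paths; `Node` and `cheap_copy` are hypothetical names and this is not how Subversion's filesystem is actually coded.

```python
class Node:
    """A file or directory node in a hypothetical toy tree, treated as immutable."""

    def __init__(self, content=None, children=None):
        self.content = content                 # file data, or None for a directory
        self.children = dict(children or {})   # entry name -> Node


def cheap_copy(root: Node, src: str, dst: str) -> Node:
    """Return a new root in which dst names the very same Node object as src."""
    src_node = root
    for name in src.strip("/").split("/"):
        src_node = src_node.children[name]
    return _link(root, dst.strip("/").split("/"), src_node)


def _link(node: Node, parts: list, target: Node) -> Node:
    # Rebuild only the directories along the destination path.
    # Every other subtree is shared with the old root, not duplicated.
    if not parts:
        return target
    head, rest = parts[0], parts[1:]
    child = node.children.get(head, Node())
    new_children = dict(node.children)
    new_children[head] = _link(child, rest, target)
    return Node(children=new_children)
```

Copying `trunk` to `branches/gstein` with this sketch creates two new directory nodes (the root and `branches`) regardless of how many files live under `trunk`, which is why the cost does not grow with the size of what is being branched. The old root is left intact, so earlier states remain reachable.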
So in this example here, what I've done is organize all of my code under this directory called trunk, which is the main trunk of development. I've got my source and docs and build tools in there. But there's also branches, where there's a branch where I'm doing some personal work, and there's also a branch that people are coordinating on to fix a particular issue. What was done is the trunk was just copied over into branches/gstein, and then you can work on it there. So essentially the branches and tags are just particular paths in the file system. A tag is a copy that you just never go and change.

This works really well, and you can also investigate very easily what the current branches in your repository are, what's being worked on. And then once issue 1003 is done you can actually go and delete that branch. It'll still always be in the history. You can always get back to see what the path of change was that happened to fix issue 1003. But when you go and look at the branches directory it's clean. It's just, okay, what's currently active.

And because these things are just other portions of the repository, you can move and reorganize them. If your branches directory starts to get really big, where you've got say a thousand different branches going on, you can reorganize them and say branches/issues and put all your issue ones in there. And then you can say branches/features and have feature development going on in there. It's all sort of arbitrary how you do it, but what we do like to do is recommend that people lay out their repositories this way, with trunk, tags, and branches, because a lot of tools understand that general pattern. So you can leverage the tools' understanding if you use this pattern. But you can structure it really however you want.

So, the clients in Subversion, in particular of course the command line client. As I said before, we cleaned up some of the options so that it's not as funky.
The other thing is the output is parsable. We made sure that all the output from the command line client could be parsed by tools. And there are other ways for tools to use it, and that's the next part: instead of a monolithic application, we built Subversion as a set of libraries. The command line client is actually just a very thin little program over a client library that has all the basic operations like updating and committing and so on. Those libraries will manipulate your working copy, or talk to the server to get a list of changes, or interact with the repository sitting on the server. So at each of those points there are different libraries, depending on where your application wants to hook in.

For example ViewVC, which used to be called ViewCVS, has Subversion support in it, and it uses those server-side libraries to directly look at the repository and expose it through web pages, rather than needing to use an actual client that talks over the wire or anything like that. It can use the libraries directly. Those libraries all have SWIG wrappers around them so that they're available to different languages. In particular ViewVC is written in Python, and so it can go in there and use all the different Subversion stuff from Python. We also have Java bindings, Perl bindings, Ruby bindings, all those different ways to actually be able to use Subversion as a tool rather than just a command line client.

And so TortoiseSVN, which is a GUI application for Windows, was written using the Subversion libraries. They didn't have to do it like TortoiseCVS, which has to spawn off CVS processes and try to figure out what the heck is going on. With TortoiseSVN it's just direct calls into the libraries.

So, networking. There are two main options for actually networking this.
You can use Subversion entirely locally on your own system without using any servers, but when you go to use a server we've got two options. One is using Apache as the server, and the other is using a custom server that speaks a custom protocol. There are different reasons why you might choose each of these.

The Apache web server exposes the repository via WebDAV, which means that all the content is accessible using just a standard HTTP GET. So you can point your browser at a Subversion repository and browse the tip, the latest change. You can't do all the history type of stuff. You can't do diffs. That's all for tools like ViewVC. But you can browse the current state of the repository with any web browser and fetch files down and things like that. A lot of times in the Subversion project, when we post news announcements and want to give people install instructions, we point them at the live file sitting in the Subversion repository where Subversion itself is hosted. People can go and look at those install directions and see the latest version right there, live in their browser, and that's being pulled right out of the repository.

The other thing about using WebDAV is that you can actually mount a Subversion repository on your desktop. In Mac OS the standard file sharing is WebDAV based, so you can take Mac OS and mount a Subversion repository and then start making changes to it using any of your tools on your Macintosh. The same facility is available in Windows, and if you use Nautilus under GNOME on Linux, or KDE, those have facilities for speaking WebDAV. All of these are ways you can directly interact with Subversion without even having to get the Subversion software. This is all just standard RFC stuff.
The other thing about using HTTP is that you can use standard HTTP authentication, client certificates, server certificates, and all of that security infrastructure. You can use proxies and firewalls. All of this is standard HTTP, so you can leverage all of that.

The custom protocol is a little more efficient in how it works, and you can also tunnel it via SSH. So if you have a large SSH installation where everybody has already passed around all the different SSH keys, you can use that version. It doesn't have the same kind of flexibility as using Apache, of course. Apache is very old and robust. It's been around for a while and it's very configurable. But svnserve is good for certain environments.

One of the things about the network protocol is that we send differences in both directions. If you add one line to a file, when Subversion sends that change up to the server we send just the one line. This is in contrast to CVS, which has to send the entire file up to the server. Both CVS and Subversion will send down just diffs from the server, so it's optimized there, but on the return Subversion is much better. One of the ways it does that is when you do a checkout it keeps a pristine copy in its little private directory, so it can always see what your local changes are and send just those changes back to the server.

That leads into this third item, which is that there are many operations that don't require a network turnaround. If you do an svn diff, it has the pristine version in the local directory and it can immediately produce the diff. That's in contrast to CVS, which has to go to the server to get the original and then present a diff to you.
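
As a rough illustration of why a local pristine copy removes the network round trip, the following Python sketch diffs a working file against an untouched copy kept next to it. The file names and the `local_diff` helper are hypothetical, and the on-disk layout here is invented for the demo; it does not mirror Subversion's private directory.

```python
import difflib
import tempfile
from pathlib import Path


def local_diff(working: Path, pristine: Path) -> str:
    """Unified diff of the working file against its pristine copy, entirely offline."""
    old = pristine.read_text().splitlines(keepends=True)
    new = working.read_text().splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old, new, fromfile="pristine", tofile="working")
    )


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical names: the pristine file stands in for the checked-out original.
        pristine = Path(tmp) / "hello.txt.pristine"
        working = Path(tmp) / "hello.txt"
        pristine.write_text("one\ntwo\n")
        working.write_text("one\ntwo\nthree\n")  # the local edit adds one line
        return local_diff(working, pristine)


if __name__ == "__main__":
    print(demo())
```

Because both inputs are on the local disk, the same comparison also tells the client exactly which lines changed, which is what lets it send only the difference on commit instead of the whole file.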
The status command in Subversion doesn't contact the server unless you specifically ask it to. If you say svn status, it'll show all the different files that have been modified, maybe a file that's missing, some different things like that, and it's in a nice one-line-per-file format, very easy to read, very manageable. But you can also tell it, well, go contact the server and tell me what files are new on the server that I may need to pull down. You can add files, remove files, revert changes. All of those are offline operations. The only real time that you contact the server is when you do an svn update to pull changes down, or when you're doing a commit to push changes back up. So it works very well if you're sitting on a plane. You can't commit while on the plane, but you can still get a lot of work done.

So, some of the other improvements. Much better binary file handling. Like I said earlier, CVS is particularly bad with this. When you add a file, if you forget to say -kb, or if you haven't set up your repository to look for certain extensions, CVS will go and try to do newline translation or keyword expansion in those binary files and corrupt them. In Subversion, we take the file you add and we don't do anything to it until you explicitly tell Subversion, okay, do keyword expansion, do newline translation between Windows and Unix, or whatever it is. At that point the translation will start to occur when you check it out. But even if you do it incorrectly and you say, do newline translation on this binary file, it only occurs on the client side. It occurs when you check it out. So you can actually turn it back off without affecting what you checked in on the server. It'll still be a nice clean binary that hasn't been corrupted.
With binaries, our diff algorithm for sending changes to the server, or changes coming down from the server, is actually byte oriented. It's not a line-oriented diff format. So we do binary diffs as we send changes to and from the server. That's very important if you're making some changes to, say, a large binary like a Word document. If you add a paragraph somewhere, maybe it appends it at the end, you don't have to transmit the entire file up there. The delta will say, okay, they inserted a paragraph here, and so it's just a minimal amount of change. It helps quite a lot in this, and also in how we actually store the stuff on the server using the binary diffs.

One thing that Subversion added just a couple of revisions ago is locking, or what's also known as reserved checkouts. A reserved checkout means, all right, I'm working on this file, other people please don't work on it, because otherwise you might mess up what I'm doing.
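
A byte-oriented delta of the kind described above can be sketched as a list of "copy this range from the old data" and "insert these literal bytes" instructions. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in matcher; that is an assumption made for illustration and is not the algorithm or wire format Subversion uses. `make_delta` and `apply_delta` are hypothetical names.

```python
import difflib


def make_delta(old: bytes, new: bytes) -> list:
    """Encode new as copy-from-old and insert-literal instructions."""
    ops = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))    # offset and length within old
        elif j2 > j1:
            ops.append(("insert", new[j1:j2]))   # literal bytes only present in new
        # A pure deletion needs no instruction: those old bytes are just not copied.
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new data from the old data plus the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)
```

When a short run of bytes is inserted into the middle of a large file, the only literal data in the delta is the inserted run plus a couple of small copy instructions, so the amount sent or stored tracks the size of the change rather than the size of the file. The matcher here is quadratic in the worst case and is only suitable for small demonstrations.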
Now, in Subversion it's purely advisory. Anybody can actually go in and essentially unlock the other guy's thing and then go ahead and start making changes, although you can use some hook scripts to affect even that behavior so that you don't allow it. But the default is, okay, it's advisory: I have to take special precaution to unlock the other guy and then I can make my changes. This is very important for binary documents, because the typical text merge, where you merge line by line, doesn't work for binary files. If you're editing, say, a Word document, an Excel document, an image, or a sound file, there aren't really diff and merge algorithms available for those, or at least they haven't been plugged into Subversion at this time. So you really have to tell people, don't change the sound file, I'm about to make some changes and check in. CVS had some rudimentary stuff. This is one of those things where I said we pulled out the edit and watch stuff because it didn't work, but we did get it working very well in Subversion. It also integrates well with WebDAV locking, so if you're using a WebDAV client, Subversion will follow all the WebDAV locking operations and transform those into Subversion lock operations.

Subversion has a replacement for CVS modules, where we use a property called svn:externals. svn:externals is essentially a way to say, bring in these external Subversion repositories and put them in my working copy when I do a checkout. Or when I update, update the stuff from this repository, but then follow the svn:externals links and go on and bring in updates for those too. This allows you to really start to incorporate all kinds of third-party stuff into a larger work. When you have a dependency on, say, a library, you can bring that right into your source tree and make changes in there. svn:externals, being a property, is also versioned, and so in one version you might be pulling in a
specific library, but then in the next one you now have a new dependency, and so when you do an update Subversion sees the change to the externals and goes and grabs that extra library into your working copy so that you can build and link.

Subversion has also been localized to many languages. I don't even know what the count is, but all of the error messages and strings and help text and all of that have been localized to probably like 10 languages now, I think, and that's continuing on an ongoing basis as volunteers arrive and perform the translations.

So, on the server side, one of the big things is that the storage for each revision is proportional to the actual change size, as opposed to the size of the file. One of the things that was particularly poor about CVS is with binary files: since it didn't have any sort of binary diff, it would store new copies of those binary images. So if you had a 10 megabyte binary file, then after 100 revisions you're looking at a gigabyte for all that data. Whereas with Subversion, for that 10 megabyte file it will just store the individual deltas, and so it doesn't grow on the server side as much as CVS would. A lot of people are always saying, ah, but disk space is cheap. But the more disk space you use, the more you have to back up, the more you have to transfer if you're keeping off-site backups, the more goes on tape, whatever. You don't necessarily want to waste space. You do want to try to at least save some. So in Subversion, again, each of the revisions as it gets stored on the server is proportional to what you changed, whether it was just one file or a thousand files.

The hook system that we did is also much cleaner than CVS. CVS has a strange system with these taginfo and commitinfo files, and they get fired at weird times or sometimes not at all, and it's very hard to hook into different
types of CVS operations. In Subversion we've cleaned up the hook scripts and how those are called, how they're invoked. They're just simple shell scripts now that are passed some arguments that say, okay, here's where the repository is, here's the new revision number that was created, go ahead and send out a commit email or something. It's really kind of nice, because in CVS there's a temporary checkout of all the changes that were made, and it goes away, and you can't really rerun commit emails in CVS because of this whole mess. But in Subversion, many times when I've had problems with mail, I can go back in and just manually invoke that post-commit hook and get mail generated, because all it does is go and look in the repository for the change and send out the mail. Anyhow, the point is that using the hooks you can control property changes, you can control commits, you can control the locking behavior, all very simply.

So, futures for Subversion. 1.4 is sort of in the release planning stages now. Most of this work has already been done, and the Subversion team is kind of figuring out, okay, is there anything else that we want in 1.4, what are we going to wait for, should we go ahead and package it up now. So 1.4 should see maybe an early release candidate in the next couple of months, I would say. In there, the working copies are going to become much more efficient. Today, for each file that gets checked out, we store about five files on your local hard disk. That's a lot of inodes, a lot of different files to open and close and stat when we're doing different operations, and we're going to collapse those so that it's more efficient. The repository on the server is also going to get smaller, because when you first check in a file we'll do some compression on that file right away, whereas today we check in the
full text for the initial commits.

There's also a neat tool that has been written called svnsync, which is a client-side tool to fully synchronize a repository down from the server. This allows you to essentially make a copy of any server out there, including the totality of its history. All the changes that have been made since day one, you can synchronize down. This makes it very easy to do backups. For a backup replica of a Subversion repository you can just, once an hour, run svnsync to pull down any new changes that have occurred in the repository. This is also very nice if you're hosting a Subversion repository somewhere. You can always make sure that you've got a full copy of your repository in case you want to change hosting providers. It's very easy to just take your copy and give it to a new provider. We haven't been able to do this before, so this is really a nice tool for some of our backup and administration and distributed copies and things.

So further ahead, we're going to be doing a thing called merge tracking. Merge tracking is actually a very important feature. Basically it avoids the double merge problem. Things like Darcs don't see this because they just grab all the changes and combine them in one set, but in Subversion there is a time order. If you branch and make some changes on the branch, later you merge those back to trunk. Then if you continue some more work on the branch and you say, okay, I want to merge the changes on the branch back to the trunk, you have to make sure that you don't include the ones you've already done. Otherwise, if you do use those again, you'll get a conflict, because you've already made the change and it's trying to apply the change again. So merge tracking is essentially referring to knowing what has been merged, and in that case when you say, okay, merge the
branch over, it'll just pick up whatever hasn't been merged already and do it. Also, if you cherry-pick individual revisions to merge from the branch, it'll remember those so that it won't try to merge them again. This has been a very big feature request for the past few years, but it's a very difficult problem. I think it's probably going to get tackled for Subversion 1.5, though, and that'll probably be 1.5's big feature: merge tracking.

One of the things that people have also wanted pretty often is a relational back end, so that you can tag individual files with certain properties and then say, okay, show me all the files in the repository that were authored by gstein, and do different kinds of queries and indexing that relational databases do very well. Right now the Subversion repository doesn't have those kinds of indexes or search capability.

There are also some things around WebDAV where we want to improve some of the compatibility, in particular working with some of the different clients out there that don't really understand that they're talking to a versioning system. As they make changes you end up with a lot of spurious commits and things like that. So we want to improve some of that and support a broader range of clients.

So, migrating to Subversion. Like I said, from a user standpoint it's pretty easy. The two models are very similar. You have a central repository, you check out from it, you make your changes, you commit to it, all about the same thing. The command lines are actually very similar. But how do you actually get your history over into Subversion? Well, there's a neat little tool called cvs2svn. It will look at your CVS repository, do a full analysis of all the different files, and aggregate all those changes together. So it'll look and say, okay, these 10 files were changed at approximately the same time by the same person with the same log message. I say
approximately the same time because, as CVS is making a commit, the timestamps vary. They're based on when the file gets up to the server and the server actually makes the change, so a commit can actually span a couple of hours in CVS if you're on a really slow link. But this tool figures all those things out, reassembles full change sets, and then time-orders those, creates tags and branches, puts changes on each of those, and assembles a brand new Subversion repository that mirrors all your history in CVS. Now, if you've used some of the CVS munging tools or gone in and edited the ,v files, of course that translation is not going to work as well as you'd want, but you shouldn't have done that in the first place. The tool is at cvs2svn.tigris.org. All the stuff is there. This has been used to convert really, really large repositories. The KDE repository was converted about a year ago using this tool. The Apache Software Foundation also uses Subversion. They've converted all of their CVS repositories using this tool. So it is very robust and capable and it does do the job very well.

So, for more information, the main Subversion website is subversion.tigris.org. There's a book written for it, published by O'Reilly, and it's actually under a Creative Commons license. The entire book is available online at svnbook.org. I took a look at the O'Reilly table and I didn't see any there, but you can certainly ask them for a book. This book is very good. It's really the definitive guide to using Subversion, and it talks about how to use Subversion from a user standpoint, from an administration standpoint, and for developing new tools on top of it, the full range. There are two mailing lists. One is users@subversion.tigris.org. That's probably where you would go first for any questions you may have. The developer mailing list is dev@subversion, and that's really if you've got some patches or if you want to get involved in developing
Subversion itself. And so that's it for my presentation. I'll take any questions people might have now. Any questions? Yeah. So the question was: if an error has crept into your code base and you don't know exactly what change it was, in CVS you'd say, okay, let me check out January 20th; okay, that one's good; and do a binary search through the dates. How would you do the same thing in Subversion? Well, in Subversion you could use the dates, but Subversion has these global revision numbers, so you can just back up and say, okay, check out revision 1000, where currently you're at, say, 2000. 1000: okay, that one's good. Let me check out revision 1500; is that good or bad? Okay, that one's good; let me go to 1750. And so you can just do a binary search across the revision numbers themselves. So it's actually much easier to do, because again, these time skews in CVS can sometimes mess you up, but in Subversion you have these individual changes. And so then you can nail it down and eventually say, okay, it was revision 1823 that broke the build, and then people can look at what those changes were and try to figure out how to fix it. So the question is: is there a way to plug in special diff tools for different formats beyond just text files? It is kind of one of our "this would be nice" things that we'd like to put in at some point. The feature isn't there yet, but we certainly have thought about it, because it would allow us to do merging on those files: once you can start to do diffs and see the different changes, then hopefully you can also have similar merge-type functionality for those. But nobody's really started working on it yet or put a lot of thought into the problem, so I don't know when that might come around. It's been just kind of a blue-sky nice-to-have. They already have? So what he's saying is that TortoiseSVN has actually implemented that for Microsoft Word documents: they call into Microsoft Word to
do the merging. So at least on that binary format, yes. Yeah. So the question is: are there any repository verification tools? No. Mostly people have been kind of eyeballing it, or maybe they do some checkouts to see, okay, if I check out from the old CVS and check out from Subversion, are they the same? I haven't heard of any specific tool. As an administrator, there's another type of verification that Subversion does, which you want to run on an ongoing basis. All the files and all the changes and everything that sits in the repository are checksummed, and so maybe once a week what you want to do is run the svnadmin verify command. It'll make sure that you haven't lost any data due to a disk error. And that's very important, so that if there's a disk error, this command will notice it and you can restore from backup. It's very important in a version control system to have that kind of functionality, because if your repository is 10, 15, 20 years old, you don't want to find out that your disk went bad five years ago, because you don't have that backup tape anymore. But that's a different kind of verification. For the conversion process, you could probably script something up pretty easily to do parallel checkouts. It's just going to be a little bit harder in CVS because you don't have an easy handle to get each of those changes. So I think in CVS you'd probably be advancing it by date, and you can use date operations in Subversion too. People don't normally do that in day-to-day operation, they refer to revisions rather than dates, but you can do that with a tool pretty easily, I would think. So the question is: can you use Subversion for applying version control to /etc, your various system configuration files? Certainly, and I know a lot of people that do that. They also keep their home directories in Subversion, so whenever they
get to a new machine, they just check out all their home directory files. I don't know that Subversion is going to be any easier than CVS. I mean, both of them are trying to control those files in /etc, and a lot of those files also need special user, group, and permission settings, things like that, and neither of the tools really tracks that kind of information. So I don't think Subversion is going to help that particular scenario any more than CVS. Yeah, certainly: if somebody changes one of those files, Subversion can tell you that it has been changed, yes. There was another question. So the question is: are there any other systems that I've used before? In the past I've used RCS, and I've used Visual SourceSafe. At work I use Perforce every day. Perforce has a command called integrate, and that's essentially what merge tracking is all about. In Perforce I can say, okay, take all the changes that have happened over here, and anything that I haven't put on this release branch, bring those changes over. It just remembers everything that's been merged onto the branch to that point, and so it brings over all the new stuff. This is great for releasing code: whenever I go to make a release, I pull the extra changes onto my branch, and it just remembers what has been pulled and what hasn't. So that's an excellent feature of Perforce, and it is precisely what we want to do with the merge tracking feature. Are there other things? I don't know. I mean, Subversion, I already like it. I liked it a long time ago. It certainly satisfies all the stuff I need to do. I think there are some complex release management scenarios where Subversion could potentially provide some additional level of functionality, but those are some really esoteric SCM scenarios. So the question
is: how efficient is it compared to CVS? The network usage is more efficient. Actual CPU... well, local disk: we certainly store more than CVS, because we keep that copy of the file in its original state. So if you check out 10 megabytes of source code, there's going to be another 10 megabytes stashed away so that it can compute all the diffs very quickly. In general, in day-to-day usage, Subversion is much more efficient because it doesn't have to hit the network for most of those operations. On the server, Subversion is more processor-intensive than CVS. CVS, on the other hand: on the server its working set size is unbounded, so you can crash any CVS server by attempting to commit a gigabyte file. What you'll do is end up making the server run out of available memory or go into a swap thrash. And it's even worse when you try to check that gigabyte file back out, because at that point CVS will bring two copies into memory. So it's actually possible in CVS to check something in that you can't get back out, because there isn't enough RAM to get it back out. And it's particularly bad if you check something like an ISO image into CVS, and then you get five people doing updates: five processes are trying to double up this ISO image, and the server falls over real fast. But Subversion's working set size on the server is basically constant, maybe 10 or 20 meg per client. What we do in Subversion is make sure to handle memory in a streaming fashion, so that as we're processing files we do it serially; we never try to load the whole thing into memory at once. So memory-wise on the server, Subversion is better, but it uses more CPU. We use a bit more disk space on the server; on the client we use more disk space. Local CPU, I've never really run tests on that, but in day-to-day usage it's much faster because of those offline operations. Another question: is there any way to avoid
the local copy? At this point, no; we always keep that second copy. But it is a feature request, and it is going to get implemented in Subversion at some point. For example, id Software, the guys that do Doom and Quake, use Subversion for storing media assets: images and sound and video and all these different things. Those are very big, and so in certain scenarios they want to be able to do a checkout without an extra copy. What it does mean, though, is that when you do a commit you're going to have to send the whole file back, because you can't compute that binary diff. In certain environments that's totally fine. So eventually it'll get there, but not today. So the problem is: what happens if somebody commits a username and password to the repository, or maybe a sensitive legal document that you are not supposed to be sharing, or some personal information, whatever, that you don't want in the repository? Today with Subversion there is no way to just go in and delete that. What you have to do is dump out the repository into our dump file format, run it through a program called svndumpfilter that can go and eliminate just that file, and then load it back into your repository. This can take a long, long time if you've got a lot of revisions and a lot of content, and it's a real pain. So the question is: is there going to be a way to fix that in the future? We don't have any plans. Right now Perforce has an operation called obliterate that does that: it goes back in time and goes in and tweaks it. And so we've sort of reserved the idea that we might have an svn obliterate at some point, but it's not on the plan at this point. It's very dependent on the back end of Subversion. But we should be able to do it; it's just that someone has to be interested in coding it up. Other questions? So the question is: is there any authentication or authorization in Subversion? It depends on which
server option you use. If it's Apache, you can use the standard Apache mechanisms. If it's svnserve, we have a couple that are available. So I think maybe one or two more questions and then we're out of time. So the question: of course, generally you can't use it offline; how well does Subversion work offline? Very, very well. The only things that you can't do offline are update and commit. Sorry, I like PowerPoint; what can I say. Yeah, you can't do an update, so you can't get fresh changes, and you can't commit changes back. One thing that I would like to see at some point in the future is an offline commit that'll package up a change and just set it aside. Then you can do some more work, package up another commit, set that aside, and when you reconnect it'll attempt to commit each of those and deal with the merging problems as you commit each one. That would actually solve a huge number of the problem scenarios that distributed version control systems are attempting to solve today. If you take something like BitKeeper, you can load the repository onto your laptop and make changes while you're on the plane, and make commits. But you don't necessarily have to keep the whole repository; all you really want to do is that offline commit. So I think we'll see that in Subversion at some point, and that'll really help people that like distributed version control for that offline commit process. One more question: what's the overhead of hosting 150 repositories instead of having one repository with 150 subdirectories? They're pretty equivalent. The amount of space the repository format uses is dependent on what's in there, so if you've got 150 empty ones, they aren't using more than maybe a couple of meg on disk. So that would be about your overhead. So we're out of time. I will be around for a little while; you can ask me questions outside. Thank you very much.
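
Editor's sketch: the binary search across global revision numbers described in the Q&A (revision 1000 known good, revision 2000 known bad) could look roughly like the following. This is a minimal illustration, not something shown in the talk. The function name `first_bad_revision` and the `is_good` callback are hypothetical; in practice `is_good` would bring a working copy to the given revision (for example with `svn update -r N`) and run the build or test suite. The sketch assumes that once the breakage appears, every later revision is also bad.

```python
from typing import Callable


def first_bad_revision(good: int, bad: int, is_good: Callable[[int], bool]) -> int:
    """Return the earliest revision in (good, bad] for which is_good() is False.

    Assumes revision `good` is known to work, revision `bad` is known to be
    broken, and every revision after the breaking change stays broken.
    """
    if good >= bad:
        raise ValueError("the known-good revision must precede the known-bad one")
    while bad - good > 1:
        mid = (good + bad) // 2  # halve the remaining range of revisions
        if is_good(mid):
            good = mid  # breakage was introduced after mid
        else:
            bad = mid   # breakage was introduced at or before mid
    return bad


if __name__ == "__main__":
    # Hypothetical stand-in for "check out the revision and run the build":
    # here we simply pretend that revision 1823 introduced the breakage.
    def build_passes(revision: int) -> bool:
        return revision < 1823

    print(first_bad_revision(1000, 2000, build_passes))
```

Because each probe halves the range, narrowing 1000 revisions down to a single one takes about ten checkouts. The same loop does not carry over cleanly to CVS, where there is no global revision number to bisect on and, as the talk notes, timestamps within one commit can be skewed.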