There are so many SCMs now, Git and other distributed source control systems. So, as a start, I'll explain what I want this talk to be about. First, I want it to remain really simple, because Git is a tool that often frightens people, whereas it really shouldn't. And I'll try to show at many points of my talk what I really do with Git. What I explain today isn't really stable yet, and for now I only use these methods for my own packages, without any co-maintainers. Well, I've started to package LLVM with Arthur — he's somewhere here, okay — and we work together using these schemes, and for now it seems to work pretty well. Ah... okay, I'm sorry for the tool, but xpdf is doing really nasty things. Okay.

So I'll start by explaining why I've chosen Git. My whole talk will be about Git, but most of it is applicable to any kind of distributed versioning system; it's not really specific to Git, and some of the tools aren't either. For starters, I've packaged big packages in Debian, like KDE, for a long time — something like two years — and my point is that it's not really interesting, not really efficient, for many reasons. The first is that I don't always have network access: I've moved house twice in between and been without network for many days, and with SVN you cannot work offline. Okay, there is SVK, but I never got around to it. You also cannot work incrementally. What I mean is, if you want to make a major change in your packaging, you either have to cope with the other patches and then push all your work as one big patch, or you have to break the SVN repository for an unlimited amount of time. And this sucks a lot, especially if your team is the KDE one, where the big packages are really big. If you lose four hours building your package, and in the end it doesn't work because some random guy broke the shared SVN in the meantime, it really sucks. Or think OpenOffice if you want.
The other thing that sucks a lot with SVN is that you cannot take the try-and-see approach. That means: you don't really know where you are going, you want to commit a lot, and then say "okay, this was not such a brilliant idea" and drop everything. With SVN, you will commit a lot of things, then decide it was a bad idea, and you have to revert it — committing a new patch that unwinds the very thing you committed before. It creates really ugly histories, and when you want to know what happened for real, you have a lot of cruft in the middle, and that sucks too. And the last point is that I like to have the full upstream sources, unpacked, when I'm packaging. Because — don't fool me — nobody writes the perfect patch the first time. You want to patch the upstream source directly, not through your patch system, and then when you have the good patch, you extract it from your upstream repository and put it in the patch system. I mean, quilt allows you to do that to some extent, but as soon as your package is big enough, you will forget to use `quilt edit` or whatever, and you end up needing to rebuild the patch, which is quite hard. And with SVN, the point is that storing a full KDE package in SVN is way too big: each new upstream release takes too much space on the SVN server, and the SVN protocol for fetching a new release isn't very efficient. So in KDE, for example, the packaging team only versions the debian/ directory; the rest isn't under SVN, and it's not very practical — at least that's my opinion.

And here Git has answers for most of these problems. First, offline work is possible, because it's a distributed SCM — nothing more to say. Incremental work is possible, because you commit locally, and when you have a full feature ready to be merged, you push it to the world; until then you can really toy with it, and you don't force other users to get your work before it's ready. And, like I said, sometimes you just want to try something in Git.
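The try-and-see approach above can be sketched with plain Git. This is a minimal illustration, not the speaker's actual commands; the repository path, file names, and branch name are invented for the example.

```shell
set -e
rm -rf /tmp/demo-try && mkdir /tmp/demo-try && cd /tmp/demo-try
git init -q
git config user.email demo@example.com
git config user.name Demo
echo base > file && git add file && git commit -qm "initial import"

# Try something on a local branch: nothing is published, nothing is at risk.
git checkout -qb experiment
echo "wild idea" >> file && git commit -qam "try a wild idea"

# The idea was bad: go back and delete the branch. No revert commits,
# no cruft in the history — it was never there to start with.
git checkout -q -
git branch -qD experiment
git log --oneline | wc -l   # back to a single commit
```

Since the experiment was never pushed, deleting the branch leaves no trace, which is exactly what SVN's shared trunk cannot give you.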
You can forget anything you did, as long as you never shared it with anyone: you never published it, so it was never there to start with. And then — I'll give a few numbers — Git storage is extremely efficient. The thing is, the more you commit, the less a commit costs you. Here are some numbers, from some not-so-random examples. The X.Org server Git repository boils down to a Git pack file; that's the only thing you really need — the other things are just indexes of that pack file, which also take some space, but they are regenerated on your end, so you don't really need to keep them. The X.Org pack file is 20 megabytes, whereas the last tarball is 8 megabytes, and unpacked it's more than 80 — and the pack file holds the whole history since the X.Org Git repository started, back around 2000. For dpkg, the whole history goes back to 1996, and the pack file is again smaller than the last unpacked source. The best example is the GNU libc: the pack file, again, is smaller than the unpacked source, and you have all the history since the eighties. Though I must say there is a little problem with that one, and I'll explain why in a moment.

Git packing is very efficient because it's delta-based. Roughly, it stores the most recent version of a file whole, and every older version is expressed as what you have to change in that file to get the older version. Checking out the recent file is really efficient because it's the head of the delta chain, and removals are cheap to express, because you just say: start at that position and remove that number of bytes. Whereas when you add data, you have to say: at this position, insert this data — and you have to include the data itself. This way, when you commit new data, the new version becomes what takes the space, the older revisions are only deltas, and deltas are really cheap. That's the big reason a new commit is always cheaper than the very same commit would have been some time ago.
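The pack-file claim is easy to check on a toy repository. This sketch (invented numbers and paths, not from the talk) commits fifty slightly different revisions of a 100 KB file and then repacks; the delta-based pack ends up far smaller than fifty full copies would be.

```shell
set -e
rm -rf /tmp/demo-pack && mkdir /tmp/demo-pack && cd /tmp/demo-pack
git init -q
git config user.email demo@example.com
git config user.name Demo

# A 100 KB file, changed a little in each of 50 revisions.
head -c 100000 /dev/zero | tr '\0' 'x' > big.txt
for i in $(seq 1 50); do
    echo "rev $i" >> big.txt
    git add big.txt
    git commit -qm "rev $i"
done

# Repack the whole history into one pack file.
git gc -q --aggressive
du -sk .git/objects/pack   # far below the naive 50 x 100 KB
```

Each revision differs from the next by one line, so each one costs only a tiny delta, which is why "the more you commit, the less a commit costs you".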
And the last point — this one is about Git specifically, but most distributed SCMs have the same tools and the same features. Git was designed by Linus to maintain the kernel, where the last estimate was something like a thousand contributors per release cycle. It has been designed to help maintain a huge source tree, and that means any Git operation for looking at the history is really, really efficient, with any kind of filtering you might ever want. Unlike with CVS or similar tools, when you browse the history you can filter out files you don't care about, you can even filter by author or committer, and focus only on the few files you have worries about because you think there is a regression, and so on. And when you have the full history of a project, it helps a lot in tracking regressions or backporting bug fixes. The Git operations for integrating patches are also very efficient and very fast, which is really useful when you want to cherry-pick some changes from upstream. Git supports really advanced merging features — again, not specific to Git, most distributed SCMs do — and that's really useful, because I use Git to replace quilt in my packages: I use Git to regenerate the patches, and that's where I gain time, because quilt doesn't have those merging features, and when you have to refresh patches, quilt sometimes leaves you doing the work by hand where Git can help you. This is a bit redundant, but in other words — and it's what I just said — for me, Git is a complete toolbox: it supersedes SVN and quilt, or your preferred patch system, at the same time.

So now I'll go into the details a bit and show you how my repositories are organized. First, there are two cases. Either my upstream has an SCM I can access and that Git can work with — which basically means Git, SVN, or a simple CVS —
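The history-filtering point looks like this in practice. A minimal sketch with invented file names; the two filters shown (`-- <path>` and `--author`) are standard `git log` options.

```shell
set -e
rm -rf /tmp/demo-log && mkdir /tmp/demo-log && cd /tmp/demo-log
git init -q
git config user.email demo@example.com
git config user.name Demo
echo a > core.c   && git add core.c   && git commit -qm "core: initial code"
echo b > docs.txt && git add docs.txt && git commit -qm "docs: add manual"
echo c >> core.c  && git commit -qam "core: fix regression"

# Only the history of the one file you are worried about:
git log --oneline -- core.c

# Or only the commits from one author:
git log --oneline --author=Demo
```

The first command shows two commits and skips the docs change entirely, which is the kind of focused view the talk contrasts with CVS.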
for example, the numbers I gave you for the GNU libc are from an import of the glibc CVS. That import is somewhat broken, in the sense that the tags are not in the right places, so it's unusable to work with; but the pack file contains essentially the same list of files, so the numbers are good — a good import would have the same size. So when my upstream has an SCM, I try to import it into Git, and all its branches go under an "upstream" namespace. The nice thing with Git is that you can have namespaces for your branches, so you can sort things really cleanly. Another problem is that upstreams often release tarballs that are not extracted from the SCM directly — for example, they ship autotools output, autoconfigured sources, so there is a lot of cruft added. When this happens, I have a specific upstream branch that "fakes" the tarball, which I try to connect to the imported upstream branches. The reason is that this way I get all of upstream's history connected to the Debian package, and that's really important to track regressions and to see what changed between versions. And if none of that is possible, I just inject the tarballs into a specific branch; the examples I'll show you today are in that third category. Last but not least, I use Joey Hess's tool, pristine-tar, to store the original tarballs directly in Git. That means I don't have any files lying around: when I stop working on a package, I can remove everything except the .git directory, and everything is stored in it. I'll show an example.

So, time for a bit of demo. I hope you'll be able to see everything. Don't hesitate to interrupt me to ask questions, because what I do is really obvious to me, and it may not be for people not used to Git. Okay. So first... okay, the green is not really bright, but... Is that better? Okay. So the first step is to inject the source tarball. I know there is git-buildpackage and whatnot; I don't use it because I don't know it.
And frankly, for this it's not really needed. So first I go to my upstream branch — I don't know if you can see it, but at the top here there is the branch name written in really light blue. So I'm on my upstream branch, and then I just remove everything. That's because tar is really dumb and cannot track files that disappeared. Then I unpack my new upstream tarball... okay, it was here... that was almost it... okay. And now you can see all the files that changed, and I just need to commit everything. Okay, done — and as you see, Git is really fast. Okay. And now I want to remember my tarball, and here is what I do. Let's start over... okay... almost... I can do it... okay. And then I use pristine-tar to say: hello, here is exactly the current state of the tarball in Git. And if we look at the history — gitk here is the tool Git gives us to look at the history; I'm sorry if you cannot see much, but what's important are the little green things on the left. Where is it... pristine-tar, here: pristine-tar created a new commit for the tarball, and I can use `pristine-tar checkout` to get my tarball back. Okay — in that version of pristine-tar, it didn't choose the proper reference by itself, so I'm used to typing it; I think the latest version does what it should. You know, when you have that in your fingers, you just type it.

Okay, so let's go back to the talk. That was just importing a new upstream. Now, second bit: I don't use quilt, like I said. I have a specific branch, derived from the upstream branch, that I usually name by suffixing "+patches" to the branch I'm patching. There are many reasons for that. First, I only need to know one tool, and Git is really, really better at rebasing patches, so it gives me new features. Also, as I'll show you, I publish my branch of patches for upstream, and I really believe it to be nicer to upstreams if they want to take patches back from my packaging.
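The tarball-import dance from the demo — wipe the tree, unpack, commit everything — can be sketched with plain Git. Everything here is invented for illustration (a fake `foo_1.1.orig.tar.gz`, fake file names); the pristine-tar step is left out so the sketch only needs Git itself.

```shell
set -e
rm -rf /tmp/demo-import && mkdir -p /tmp/demo-import/pkg && cd /tmp/demo-import

# A fake upstream tarball, standing in for foo_1.1.orig.tar.gz.
mkdir foo-1.1 && echo "int main(void){return 0;}" > foo-1.1/main.c
tar czf foo_1.1.orig.tar.gz foo-1.1

cd pkg
git init -q
git config user.email demo@example.com
git config user.name Demo
echo old > stale.c && git add stale.c && git commit -qm "upstream 1.0"
git checkout -qb upstream

# 1. Wipe the tree: tar cannot tell us which files disappeared upstream.
git rm -qr .
# 2. Unpack the new release in place.
tar xzf ../foo_1.1.orig.tar.gz --strip-components=1
# 3. Stage everything, deletions included, and commit.
git add -A && git commit -qm "Import upstream 1.1"
git show --stat --oneline HEAD
```

Because the tree was emptied first, `git add -A` records deletions (here `stale.c`) as well as additions, so the upstream branch exactly mirrors each tarball.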
So let's see how it works. If we look here, you see the three patches that I apply on top of upstream, and where upstream was before. What we did earlier made the upstream branch progress to here, and I'll just rebase the patches onto it. So let's do it: I go to my patches branch, and rebase it onto upstream. Okay... I'm using bleeding-edge Git, it seems to be broken... What happened? ... Not very pretty... okay, my shell was supposed to catch that, but... okay, we're there. So the rebasing process is basically taking the pile of patches, storing them somewhere, then taking them one by one and applying them onto the point you specified. What we see here — and it's expected — is that the third patch failed. And I'm not surprised, because this is the patch where I reconfigure the source package to support kFreeBSD. Okay. I know how to regenerate this patch, I don't really need Git for this one: I just run autoreconf. And when it's done, I say okay, this patch is done and I want to commit it, and I type `git rebase --continue` to say: this patch is merged, next. And here something really interesting happens: git rebase says, okay, but there is nothing to commit. The thing is, my upstream finally fixed its autoconf packaging, so in fact I have to drop this patch — it's not useful anymore. No problem, I say there is no patch anymore. And if we look at gitk again, my patches are here: there are only two of them, and the branch is fully rebased. For this example there wasn't any conflict — I didn't manage to generate simple ones — and quilt would have done basically the same job, so it's not very brilliant as a demonstration, but I think you get the idea.

[Audience question, partly inaudible: with new upstream versions, what about the old states of the patches branch — would you just abandon them?]
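The rebase step just demonstrated can be reduced to a small sketch. Branch names follow the talk's "+patches" convention; the files and commit messages are invented, and the example is arranged so the rebase succeeds without conflict.

```shell
set -e
rm -rf /tmp/demo-rebase && mkdir /tmp/demo-rebase && cd /tmp/demo-rebase
git init -q
git config user.email demo@example.com
git config user.name Demo

echo v1 > src.c && git add src.c && git commit -qm "upstream 1.0"
git branch upstream

# The Debian patch lives on a branch derived from upstream.
git checkout -qb upstream+patches upstream
echo "debian fix" >> src.c && git commit -qam "Fix FTBFS on kfreebsd"

# Upstream releases 1.1.
git checkout -q upstream
echo v1.1 > new-feature.c && git add new-feature.c && git commit -qm "upstream 1.1"

# Replay the Debian patches on top of the new release.
git checkout -q upstream+patches
git rebase -q upstream
git log --oneline
```

On a conflict, this is where `git rebase --continue` (after fixing the tree) or `git rebase --skip` (to drop a patch upstream has absorbed) comes in, exactly as in the demo.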
Well, I'll come to that, but as a matter of fact, I'm not really sure that remembering every state the upstream+patches branch ever had is really interesting. You just want to remember a serialized version of it, and that's just what I do: I export it under debian/patches — I'll come to that. In fact, I generate something that is really close to a quilt series in the end. Okay. And finally, all my packaging work is done in a debian branch. I usually do one branch per suite: one for etch, for backports, one for sid, maybe one for experimental. And I merge the upstream directory into those branches. So it solves the problem I had when I packaged KDE: my upstream source is merged into my debian branch, so I can experiment in it and do whatever I want. And then, as I just said, I serialize my patches under debian/patches, so that if someone comes and wants to NMU my package, he knows how to add a patch: it's trivial, just add one under debian/patches, and I take care of folding it into my upstream+patches branch.

Let's release tokyocabinet. So now I go to my debian/sid branch, I merge my new upstream, and now that I've done that, I want to refresh my patches. This thing just uses Git commands — I can show you the snippet, not really interesting, but it's somewhere here... okay, it's here. Git is able to generate a patch series really easily, because that's how kernel developers exchange patches, so I just use that to generate my list of patches, and you can see it here: I have one patch per fix. And applying them is actually really simple: it's just a matter of listing every patch and applying it, and that's done. Right now I have a new branch — let's see what it looks like — where I merged my new upstream and refreshed my patches; now all the work left to do is just the Debian packaging. Okay.
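Serializing the patches branch into a quilt-like series relies on `git format-patch`, the same tool kernel developers use to exchange patches. A minimal sketch, with invented branch and file names; the speaker's real snippet was not shown in the transcript.

```shell
set -e
rm -rf /tmp/demo-series && mkdir /tmp/demo-series && cd /tmp/demo-series
git init -q
git config user.email demo@example.com
git config user.name Demo

echo v1 > src.c && git add src.c && git commit -qm "upstream 1.0"
git branch upstream
git checkout -qb upstream+patches
echo fix1 >> src.c && git commit -qam "First fix"
echo fix2 >> src.c && git commit -qam "Second fix"

# Serialize the branch: one numbered patch file per commit, plus a
# quilt-style series file listing them in order.
mkdir -p debian/patches
git format-patch -o debian/patches upstream..upstream+patches \
    | xargs -n1 basename > debian/patches/series
cat debian/patches/series
```

`git format-patch` prints the paths of the files it creates, so piping through `basename` into `series` is enough to get the quilt-like listing an NMUer could extend by hand.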
These are the points where Git is really useful to me, and not only in Debian packaging. What I really like — I don't know how you work, but when I work, I'm really sloppy. My mind goes in every direction, I do many things at the same time, and I commit like I save the buffer in my editor. So I commit a lot, and in the end there are a lot of useless commits — and you can then use Git to simplify your history and pretend you did really awesome work on the first try. I'll try to do something with tokyocabinet; I didn't prepare this, because I wanted it to be dirty... I'll try to do it in ten minutes, okay. So basically, tokyocabinet is a really simple package: it's a DB-like library, and I know for sure that this new version has a SONAME bump, so I'll start with that. Okay... not very clever, there was a stupid mistake: the -dbg package doesn't need to have this name, so I'll fix that. And that's done. Okay, so now I build my package — I'll spare you the test suite... yeah, it should be cached, but okay, it's really fast to build. The point is, I will get errors, because I forgot something on purpose: I didn't rename my install files, so it won't be able to install. I hope so... okay, it didn't. So now I want to rename every file for the new SONAME. And that is an example of sloppy work: there is a state of my repository that isn't buildable, because these files weren't named properly. And this is where Git is really helpful to me: when I'm done packaging tokyocabinet, before pushing my branches and showing my work, I can pretend those two commits were one. Here's how to do this: once again I use git rebase — it's really an awesome tool. I say, okay, I want to rewrite my two last commits interactively. It spawns an editor asking me what I want to do, and here is a really nice trick: you want to drop a commit? Really simple, you remove its line. You want to reorder commits? Really simple, you reorder the lines.
You want to edit a commit because it sucks? You say you want to edit it. And here I want to merge two commits, so it's really simple: I say, this commit, I want to squash it onto the previous one. Then it gives me my editor and says: okay, here is the merged commit — that's a nice way to show what your commit is about — and it puts your two commit messages in the same buffer so you can fix them. So: "Prepare for SONAME bump", because I'm clever and renamed the files on the first run. And if we look at the history again, there is only one commit, and nobody will ever know that I needed two commits for that. Actually, my experience shows that if you use Git and commit a lot, this cleanup work is really easy, because it basically comes down to two, maybe three operations: dropping a few commits you don't really need, and — most of the work — reorganizing lines and merging the commits you should have done in one step. This work is really, really fast to do when you commit really frequently. What is really hard to do is to split a commit. Splitting a commit needs a lot of time, because there are two cases. Either your commit touches really different paths — this is easy, because you just unwind it and recommit it in pieces. What is really harder is when your commit touches the same file and the chunks of the diff are mixed; then it's really hard to split, because your sole option is basically to rewrite it from scratch. Whereas if you do incremental, really small commits... I think you saw that my laptop is running at one gigahertz because it's in powersave mode, and it was really, really fast. Committing, unlike in SVN, is really, really fast — most of the time I don't even notice it.
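The squash shown in the demo uses the interactive editor of `git rebase -i`, which doesn't script well; a non-interactive equivalent (a soft reset keeping the tree, then one commit) gives the same resulting history. A minimal sketch with invented file names and messages.

```shell
set -e
rm -rf /tmp/demo-squash && mkdir /tmp/demo-squash && cd /tmp/demo-squash
git init -q
git config user.email demo@example.com
git config user.name Demo

echo base > lib.c && git add lib.c && git commit -qm "initial"
# Two sloppy commits that should have been one:
echo bump >> lib.c   && git commit -qam "Prepare SONAME bump"
echo rename >> lib.c && git commit -qam "Rename files I forgot"

# Non-interactive equivalent of marking the second commit "squash"
# in git rebase -i: rewind two commits, keep the tree, commit once.
git reset -q --soft HEAD~2
git commit -qm "Prepare for SONAME bump"
git log --oneline
```

The log now shows "initial" plus a single clean commit, and nobody will ever know it took two tries.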