All right, my name is Scott Chacon, and this talk is on using Git in Ruby applications. This isn't really a Git intro. How many of you use Git? Nice. How many of you are just curious? How many of you use GitHub? Nice. That probably makes it slightly less useful for us to be here, but okay. So, real quick, this is me. I work at GitHub. You can find all my Git stuff at github.com/schacon, and a lot of the code I'm going to talk about is there. If you're interested in things other than Ruby, I've also done partial implementations of Git in ActionScript and in Objective-C, so check that code out too. I wrote the Git Internals PeepCode PDF. I run gitcasts.com, though I haven't posted much there lately; we're trying to redo it a bit, but it has screencasts on how to learn Git (though most of you seem to know it already). And I run git-scm.com, which git.or.cz will hopefully start redirecting to in the next couple of weeks, so relatively soon it should be the official Git homepage. My email address is schacon@gmail.com if you have questions or want to follow up on or clarify any of this. So: why would you use Git in your Ruby application?
The point of this talk isn't how to use Git to manage Ruby code; it has almost nothing to do with source code management. The point is to show other ways you can use Git: as a tool inside your Ruby application to do some interesting things. Say you need a document-oriented database, something distributed, efficient, versioned, checksummed, and capable of branching and merging. It doesn't have to be managing source code for those properties to be useful in what you do, right? So I'm going to go over a couple of examples where this functionality is useful in domains other than managing your own source code.

So how does Git help? Well, it's snapshot-based. Most of you hopefully know that Git isn't a delta storage mechanism like CVS or Subversion; it stores individual snapshots of the tree of your code base, and you can store snapshots of things other than code. It has lightweight branching, so you can experiment and go in different directions without losing any work. It has a really efficient mechanism for storing the entire history of whatever you're doing. It implements standard transfer protocols such as SSH and HTTP. And it's free and open.

I'm going to go over what I mean by some of these concepts, in case you're not familiar with how Git stores its data and what its object model looks like, because that's what we're going to exploit when we use Git for something other than source code management. Every time you run git commit, Git creates an object in its database, and a commit basically looks like this: it has a pointer to a tree SHA, and it has a pointer to its parent or parents; it can have several of them.
It can have zero if it's the first one. It has author and committer data and a commit message, which can contain anything you want, but for simplicity we'll show it like this. The tree points to a directed graph of a file system. Git calls them trees and blobs because they aren't really files on your file system; they live in a meta file system that Git maintains. The commit also points to its parent, which points to its own tree, and so on, so you have a nice history of snapshots, and you can take any one of them and recreate that snapshot in your file system.

Git checksums all of these, zlib-compresses them into little objects, and sticks them in its repository. So all of these objects are sitting in your file system somewhere, out of sight and out of mind. You can go through and see what they are, but for the most part they stay in the background. When you check something out, Git takes the commit, walks the tree to get all the trees and blobs downstream from it, loads them into what's called the index (a staging area), and then checks that out into your working directory. But you can work with Git using just the repository and the index; you don't actually need a working directory, and there are ways to manipulate the index directly, which the library I'm getting to does. When you edit files in the working directory and run git add, they go back into the index, and when you run git commit, Git takes the snapshot you've built up in the staging area and writes it into your repository. So the core mechanism of Git is this index: building up a snapshot of content that you can write into your object repository.
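To make the storage side concrete, here is a minimal sketch of how a loose object ends up on disk, using only Ruby's standard library. It follows Git's documented loose-object layout (a `"<type> <size>\0"` header, SHA-1 over header plus content, zlib-deflated and filed under `objects/<2 hex>/<38 hex>`); the helper name `write_loose_object` is ours, not Grit's, and packfiles are ignored entirely.

```ruby
require "digest/sha1"
require "zlib"
require "fileutils"
require "tmpdir"

# Store content as a loose Git object:
# header "<type> <bytesize>\0" + content, SHA-1 over the whole thing,
# zlib-deflated and written to objects/<first 2 hex>/<remaining 38 hex>.
def write_loose_object(git_dir, type, content)
  store = "#{type} #{content.bytesize}\0" + content
  sha = Digest::SHA1.hexdigest(store)
  path = File.join(git_dir, "objects", sha[0, 2], sha[2, 38])
  unless File.exist?(path) # identical content hashes to the same path, so it's stored once
    FileUtils.mkdir_p(File.dirname(path))
    File.binwrite(path, Zlib::Deflate.deflate(store))
  end
  sha
end

git_dir = Dir.mktmpdir # scratch directory standing in for .git
sha = write_loose_object(git_dir, "blob", "hello world\n")
puts sha
```

Because the path is derived from the content's hash, writing the same content a second time is a no-op, which is where much of Git's storage efficiency starts.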
So the object model looks like this: a commit, which is a pointer to a snapshot; and a tree, which is a directory listing pointing to one or more other trees or files. For the rest of the talk, I'll refer to that as this little C1 box.

What does lightweight branching mean? In Git, branches are movable pointers to commits. A branch is a file with 40 characters in it, and that's all it is: a pointer to a commit that says this is where the branch head is. You can move it around as easily as rewriting 40 characters in a file, so you don't need something like Perforce's approach, where you have a whole other directory checked out. It's really lightweight, and you can keep pointers to anything you want to remember; a tag and a branch are basically the same thing, a name for a particular point, except that a tag doesn't move. So if we have two commits, and HEAD points to our default branch (master is the more common name for it), which points to C1, and we want to branch, we can say git branch experiment. It just creates a new pointer.
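A rough sketch of the two things just described: the text of a commit object, and a branch as nothing more than a file holding a 40-character SHA. The helpers `commit_body`, `object_sha`, and `write_ref` are illustrative names of our own; the fixed timestamp and `+0000` timezone are simplifications, and it writes into a scratch directory rather than a real repository.

```ruby
require "digest/sha1"
require "fileutils"
require "tmpdir"

# Raw text of a commit object: a tree pointer, zero or more parents,
# author/committer lines, a blank line, then the message.
def commit_body(tree:, parents: [], name:, email:, time:, message:)
  ident = "#{name} <#{email}> #{time.to_i} +0000" # timezone simplified to UTC
  lines = ["tree #{tree}"]
  parents.each { |p| lines << "parent #{p}" }
  lines << "author #{ident}" << "committer #{ident}"
  lines.join("\n") + "\n\n" + message + "\n"
end

# SHA-1 of an object, computed over "<type> <size>\0<body>" as Git does.
def object_sha(type, body)
  Digest::SHA1.hexdigest("#{type} #{body.bytesize}\0#{body}")
end

# A branch is just a file under refs/heads containing a 40-character SHA.
def write_ref(git_dir, ref, sha)
  path = File.join(git_dir, ref)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, sha + "\n")
end

git_dir = Dir.mktmpdir
empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904" # SHA of the empty tree
body = commit_body(tree: empty_tree, name: "Scott", email: "scott@example.com",
                   time: Time.at(0), message: "C1")
c1 = object_sha("commit", body)
write_ref(git_dir, "refs/heads/master", c1)
# "git branch experiment" boils down to writing the same 40 characters to a new file.
write_ref(git_dir, "refs/heads/experiment", File.read(File.join(git_dir, "refs/heads/master")).strip)
```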
It just writes 40 characters to a file saying where it points, and that's it. If you run git branch at this point, you can see that you're on the default branch, because that's where HEAD points, and that you also have an experiment branch. In reality they point to the same place, but what matters is the star, which shows where HEAD is. If you want to change that, you say git checkout experiment: it moves HEAD over and makes your working directory look like that branch. If you run git commit now, Git starts adding new commit objects (those simple little objects from before), writes the new tree snapshots, and gives each new commit a parent of wherever HEAD was pointing before. So as you commit, it just keeps moving that branch head forward, using the previous commit as the parent. Then you can check out the default branch again, moving HEAD back to where it was a couple of commits ago, and start committing there. That's how you get divergent branches in Git, and it's really lightweight. At any point you can remove these branch-head pointers, or, say, make experiment your default branch and forget about C4, just by moving a pointer; it doesn't really matter, because everything is reachable downstream from the commits the pointers name. So go back and commit again. Git also has a concept of merging, which is also very simple: you say, all right, I'm going to check out my default branch.
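The checkout-and-commit sequence above can be modeled in a few lines. This toy is purely in memory (no trees, no hashing, made-up `C1`, `C2`, ... ids) and is only meant to show that committing moves whichever branch HEAD names, and nothing else.

```ruby
# Commits point only at their parents; branches are movable names;
# HEAD names the current branch.
Commit = Struct.new(:id, :parents, :snapshot)

class ToyRepo
  attr_reader :branches, :head, :commits

  def initialize
    @commits = {}
    @branches = {}
    @head = "master"
    @next_id = 0
  end

  # Like "git commit": the new commit's parent is wherever HEAD's branch
  # points, then that branch (and only that branch) moves forward.
  def commit(snapshot)
    parent = @branches[@head]
    @next_id += 1
    id = "C#{@next_id}"
    @commits[id] = Commit.new(id, parent ? [parent] : [], snapshot)
    @branches[@head] = id
  end

  # Like "git branch NAME": just another pointer to the current commit.
  def branch(name)
    @branches[name] = @branches.fetch(@head)
  end

  # Like "git checkout NAME": only HEAD moves.
  def checkout(name)
    raise ArgumentError, "no branch #{name}" unless @branches.key?(name)
    @head = name
  end
end

repo = ToyRepo.new
repo.commit("v1")           # C1 on master
repo.commit("v2")           # C2 on master
repo.branch("experiment")   # experiment -> C2
repo.checkout("experiment")
repo.commit("v3")           # C3, parent C2; only experiment moves
repo.checkout("master")
repo.commit("v4")           # C4, parent C2: the branches have diverged
```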
I'm going to merge experiment. Git takes the two snapshots, C4 and C5, together with the snapshot of their common ancestor, and does a three-way merge of those directory trees; it doesn't replay the individual changes made along the way. It creates a new tree and writes a new commit object with two parents, so you can follow the parent history either way. And it doesn't stop there: you can keep committing on the other branch and merge it back in again, as many times as you want, because at any point Git is really only looking at snapshots. In this case it looks at C6 and C7 (and their common ancestor, C5) to make C8.

It's also easy to distribute, because all of these objects are individual files that can be packed up and moved around but are still treated as individual objects. You can go to a server and ask, "What objects do you have that I don't?" and pull down just those. You don't have to dirty up a working directory or do anything else; you just get the objects you don't have yet, plus the pointers saying where each branch starts.
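To make the merge step concrete: Git's default strategy compares the two snapshots against their merge base, the best common ancestor (what `git merge-base` reports). Here is a naive Ruby sketch of finding it over a hypothetical parent graph that mirrors the C4/C5/C6/C7 example; real Git uses a far more efficient walk.

```ruby
require "set"

# graph: commit id => list of parent ids.
# Returns every ancestor of a commit, including the commit itself.
def ancestors(graph, id)
  seen = Set.new
  stack = [id]
  until stack.empty?
    c = stack.pop
    next unless seen.add?(c) # add? returns nil if already seen
    stack.concat(graph.fetch(c, []))
  end
  seen
end

# Merge bases: common ancestors that aren't themselves ancestors of
# another common ancestor.
def merge_bases(graph, a, b)
  common = ancestors(graph, a) & ancestors(graph, b)
  common.to_a.reject do |c|
    common.any? { |o| o != c && ancestors(graph, o).include?(c) }
  end.sort
end

graph = {
  "C1" => [], "C2" => ["C1"], "C3" => ["C2"],
  "C4" => ["C3"], "C5" => ["C3"], # the branches diverge after C3
  "C6" => ["C4", "C5"],           # merge commit with two parents
  "C7" => ["C5"]                  # more work on the experiment side
}
p merge_bases(graph, "C4", "C5")
p merge_bases(graph, "C6", "C7")
```

The second call shows why repeated merges stay cheap: after C6 records C5 as a parent, the next merge only has to reconcile changes made since C5.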
If you push into a public repository, it moves all your objects up to that repository; if somebody fetches from it, they pull all those objects down. If they start committing and adding new objects, they can push those to their own public repository, and then you can fetch from it, and it only pulls the difference: here are the three objects I don't have yet. So this kind of distributed object movement is really simple, and when you push, it only pushes the objects that haven't been pushed yet.

You can also fairly easily have multiple remotes. In this diagram the commit is the green one, the tree is the purple one, and the blobs are the blue ones. You push into a public repository; Nick clones it and starts committing; Jessica clones it and starts committing different stuff; and they push into public repositories of their own. Then you can add those as remotes. You have all these repositories with versions of your project containing objects you might not have yet, so you say, I want to nickname this one nick and this one jess, and add them as remotes so your project knows about them. When you fetch nick, it pulls down whatever objects he has that you don't; when you fetch jess, it pulls down whatever objects she has that you don't. All these repositories are a little different at this point, but they're all the same project, and it's easy to work out the difference and just move those objects around.
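Conceptually, a fetch transfers the objects reachable from the remote tip minus the ones you already have. The real protocol negotiates this with "want"/"have" exchanges instead of the client walking the remote graph, so treat this as an illustration of the set arithmetic only; the object ids and graph below are made up.

```ruby
require "set"

# objects: id => ids it references (a commit lists its tree and parents,
# a tree lists its entries, a blob lists nothing).
def reachable(objects, tip)
  seen = Set.new
  stack = [tip]
  until stack.empty?
    id = stack.pop
    next unless seen.add?(id)
    stack.concat(objects.fetch(id, []))
  end
  seen
end

# What a fetch needs to move: reachable on the remote, missing locally.
def objects_to_fetch(remote_objects, remote_tip, local_have)
  reachable(remote_objects, remote_tip) - local_have
end

remote = {
  "c1" => ["t1"], "t1" => ["b1", "b2"],
  "c2" => ["t2", "c1"], "t2" => ["b1", "b3"], # b1 unchanged, b3 is new
  "b1" => [], "b2" => [], "b3" => []
}
local_have = Set["c1", "t1", "b1", "b2"]
to_fetch = objects_to_fetch(remote, "c2", local_have)
p to_fetch.to_a.sort
```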
So if I create some new objects here and push them up, you can see that the objects Nick wrote have propagated through his repository and mine, Jessica's objects have propagated through hers and mine, and my objects have been pushed up as well, so they can pull from my public repository and get just that difference again. At a fundamental level, it's really easy to distribute a project this way.

The other thing Git provides out of the box is SHA-1 checksum integrity. If any of the objects going back and forth here get corrupted in transit, or anything else goes wrong, you can tell very easily, because everything is checksummed. If you run git ls-tree, you see a checksum for every subdirectory and every file, and every commit has one too, so you can't screw anything up without Git knowing about it. If you look at a subtree, its entries are all checksummed as well, so you can't corrupt data in Git, or at least not without it noticing.

It also has very efficient storage. A Rails checkout is about 11 MB of raw files, and the entire Rails history in Git is about 16 MB. That's every version of every file, all the way back through time: if you clone Rails, it's a 16 MB clone, and then you check out 11 MB for your one working copy of the head. So keeping every version of every file ever doesn't take that much more space than writing the current files to disk.

As for the standard transfer protocols I mentioned, Git has several transfer mechanisms, and each has its pluses and minuses. SSH is obviously secure.
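Integrity checking falls out of content addressing: inflate the stored bytes, rehash them, and compare with the SHA the object is filed under. A minimal sketch (`verify_loose_object` is a hypothetical helper; the expected SHA is Git's well-known hash for the blob `"hello world\n"`):

```ruby
require "digest/sha1"
require "zlib"

# Inflate a loose object and check it against the SHA-1 it is filed under.
# Any changed byte changes the hash, so corruption can't go unnoticed.
def verify_loose_object(compressed, expected_sha)
  raw = Zlib::Inflate.inflate(compressed)
  Digest::SHA1.hexdigest(raw) == expected_sha
rescue Zlib::Error
  false # bytes that don't even inflate are certainly corrupt
end

sha  = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"        # blob "hello world\n"
good = Zlib::Deflate.deflate("blob 12\0hello world\n")
bad  = Zlib::Deflate.deflate("blob 12\0hello w0rld\n")    # one character flipped
puts verify_loose_object(good, sha)
puts verify_loose_object(bad, sha)
```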
There's no special server daemon: you use the SSH daemon you probably already have running on your server, and everything runs over that. There are user-based permissions, so users who log in with their normal SSH keys can be given read or write access to the Git repositories, and you can push over it.

HTTP is nice because it's simple: all you basically have to do is put your Git repository in a static directory somewhere that a web server can serve (and run git update-server-info, usually from the post-update hook, so clients can find the refs). There's nothing like SVN's WebDAV setup, which is expensive and difficult to configure; you just put the repository in a public directory that Apache is serving. You can also make it secure, for example with SSL client certificates, so that people without the right key can't fetch from it. The downsides are that it's slower and a bit less efficient, and there are no user-based permissions: if it's accessible, it's accessible.

Then there's the git protocol, which is specific to Git. It's super efficient, super simple, and very fast. It's slightly more difficult to set up and keep running, since it's another daemon you have to run. It's insecure, with no authentication whatsoever, and by default you can't push over it, so it's really just for serving. Those are the protocols you can use to move these objects around.

That being the background of what Git does for you: Git is basically a file system. It's a meta file system you can use for anything you'd use a normal file system for, but on top of that you get history, distribution, branching, and snapshots.
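One practical detail for the static-HTTP option: a dumb HTTP client can't ask the server what it has, so it first downloads `info/refs`, a plain-text list of `<sha>\t<refname>` lines that `git update-server-info` normally generates. This sketch rebuilds that file from loose ref files only; it ignores `packed-refs`, peeled tags, and `objects/info/packs`, which the real command also handles, and `write_info_refs` is our own name.

```ruby
require "fileutils"
require "tmpdir"

# Rebuild info/refs from the loose ref files under refs/.
# Each line is "<sha>\t<refname>", sorted by ref name.
def write_info_refs(git_dir)
  refs_root = File.join(git_dir, "refs")
  files = Dir.glob(File.join(refs_root, "**", "*")).select { |f| File.file?(f) }.sort
  lines = files.map do |path|
    name = path.delete_prefix(git_dir + "/") # e.g. "refs/heads/master"
    "#{File.read(path).strip}\t#{name}\n"
  end
  FileUtils.mkdir_p(File.join(git_dir, "info"))
  File.write(File.join(git_dir, "info", "refs"), lines.join)
  lines.join
end

git_dir = Dir.mktmpdir
FileUtils.mkdir_p(File.join(git_dir, "refs", "heads"))
File.write(File.join(git_dir, "refs", "heads", "master"), "a" * 40 + "\n")
info = write_info_refs(git_dir)
print info
```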
For any project where you're thinking of writing data to files, you can write that data to Git instead. It's just about as easy, and you get all of this on top. So what are some of the cool things we can do with it? I'm going to go over a couple of examples, interesting to me, where Git can stand in for a file system or a database, but there are tons of other things you might find it useful for.

The obvious one is SCM tools. GitHub runs on a Ruby Git binding called Grit, which I'll demonstrate at the end of this talk. That library is open source, and it's exactly what we use on the site; it's the heart of GitHub, and you can take it and build your own thing on top of it. In fact, I think Gitorious still uses it, so if you're using Gitorious, that's the same library that runs GitHub. So that's the easy case: reading source-code data out of Git.

A slightly more interesting use is Gist, which is a code paste site. The files in Gist are never written to disk as files and never written into a database. We write them directly into a Git object repository and expose its URL, so you can clone it and push back into it, but the entire thing is backed by Git repositories.

Other interesting things you can do are any sort of document-based application. You get the same thing you'd get by writing files to a file system, but you also get history, distribution, and everything that goes with them for free.
One example is a wiki or documentation system. On GitHub there's a git-wiki project that does exactly this: every page is written to the file system under the wiki page's name and then committed into Git. Every time you save a wiki page, it writes the file and runs git add and git commit on it. That doesn't take much time, but you get some interesting things for free, without acts_as_versioned or anything like that: you have a snapshot of what the wiki looked like at every point in time, forever, without keeping any versioning information yourself. It keeps author data, and the message goes right in the commit, so it's a nice model.

So for those two files, Todo and Home, it writes a couple of blobs, a tree that references them, and a commit with the message you typed into the website when you saved the page, all into the object database. Then if somebody modifies the Todo page, it writes a new commit, a new tree, and a new blob for that page, and references the old blob for the other one, directly in the Git repository.

The other interesting thing you get here, which you won't get from acts_as_versioned or some other versioning mechanism for a wiki, is branching. You can branch your wiki: say I want to go through and reorganize the entire wiki.
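The wiki itself just runs `git add` and `git commit`, but it's worth seeing why each save is cheap. In this sketch (hypothetical page contents, helper names of our own), a snapshot maps page names to blob SHAs; an unchanged page hashes to the same blob, so the new tree simply references it again.

```ruby
require "digest/sha1"

# Blob SHA exactly as Git computes it for file contents.
def blob_sha(content)
  Digest::SHA1.hexdigest("blob #{content.bytesize}\0#{content}")
end

# One wiki save = one snapshot: page name => blob SHA.
def snapshot(pages)
  pages.transform_values { |text| blob_sha(text) }
end

v1 = snapshot("Home" => "Welcome!\n", "Todo" => "- write docs\n")
v2 = snapshot("Home" => "Welcome!\n", "Todo" => "- write docs\n- ship\n")
new_blobs = v2.values - v1.values # only these need writing for the second save
p new_blobs.size
```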
Let's say I want to change the home page layout, change everything, refactor all of it. What might be interesting is to branch the wiki: make an experiment branch, do all your wiki changes there, and everybody else still sees the old wiki while you're working. At some point you merge it back in, or switch everything over to it wholesale. So you start a rework branch and make a bunch of page changes on it, while somebody is still making a bunch of wiki commits on the master branch. Then somebody clones the wiki: it pulls all those objects over, they do work of their own, and you can merge that in too. It might be kind of cool to have a distributed documentation system or wiki that people can work on off-site, on other machines, where they can clone, branch, and merge the entire snapshot of the wiki.
We now have three completely different-looking wikis: pages changed, everything rearranged. You can go in and merge them all together if you want to pull in those changes, or cherry-pick individual page changes that people have made. If you're trying to build a really massive, complicated documentation system, this might be an interesting way to do it relatively easily, because in Ruby all you have to do is basic file system operations plus a few Git calls at the end, and Git provides all of the other functionality for free.

Another use of Git, which I built at my last job, is a content delivery network. Say you have an advertising network: a flat panel, a little computer, and an EVDO card, and you want to build a network of these in your stores, or in malls, wherever, and run advertising on them, like the little screens you see in Starbucks. You have a server with all the assets, and your problem is getting the right assets out to the right displays at the right time. Say the content consists of some XML configuration files, images, HTML pages, and JavaScript files; that's how your little engine is built to run. So how do you distribute the content? How do you get the right content to display A, and the right content to display B, which may be running a different advertising loop? One option is to tar up every ad you have into one big tar file and send it out to every machine every night.
That's a super-inefficient way to do it, but it's really easy. You could rsync it; there are other options. But you can also use Git for this. It's a really strange way to use it, but it works really well. You install Git on all the clients and put your Git repository in an HTTP server path, so you can just put an Apache web server in front of it, which scales really well. Then you build refs on the server: you use the index and say, all right, client one needs ad A, ad B, and ad D. You add those and build the tree in the index (it never needs to touch the file system), write it to the repository, and create a ref on the server, a branch name that says this is the branch for client one. Then each client always pulls its own branch, and it only gets the stuff it needs: the tree you've pre-built for it. You put a cron job on the clients that keeps saying, give me the branch for client one. Every time it runs and nothing has changed, it's basically a no-op: you're supposed to have this SHA, you have it, never mind. That's really lightweight, just one HTTP request. As soon as you change something, the client sees it doesn't have that commit and fetches only the objects it needs. So you run this every five or ten minutes, and it's really easy to set up; you can do it in an afternoon and have a really efficient content delivery network for very little work. So why is this helpful? What does it do for you?
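The server-side step (building a client's tree in an index without touching the working directory) maps onto standard Git plumbing: a scratch index via `GIT_INDEX_FILE`, `git update-index --cacheinfo`, `git write-tree`, `git commit-tree`, and `git update-ref`. This hypothetical helper only emits the shell script; it assumes the blobs are already in the repository and that the script runs from the repository root. The `client-1` naming and the placeholder SHAs are made up.

```ruby
require "shellwords"

# Emit a shell script that builds refs/heads/<client> from existing blobs,
# using a scratch index so the working directory is never touched.
def client_branch_script(client, assets, index_file: ".git/#{client}.index")
  lines = [
    "export GIT_INDEX_FILE=#{index_file.shellescape}",
    "git read-tree --empty" # start the scratch index from nothing
  ]
  assets.sort.each do |path, blob_sha|
    lines << "git update-index --add --cacheinfo #{"100644,#{blob_sha},#{path}".shellescape}"
  end
  lines << "tree=$(git write-tree)"
  lines << "commit=$(git commit-tree \"$tree\" -m #{"assets for #{client}".shellescape})"
  lines << "git update-ref #{"refs/heads/#{client}".shellescape} \"$commit\""
  lines.join("\n") + "\n"
end

# Hypothetical asset catalog: path => blob SHA already stored in the repo.
catalog = {
  "ads/a.html" => "a" * 40,
  "ads/b.html" => "b" * 40,
  "ads/c.html" => "c" * 40,
  "ads/d.html" => "d" * 40
}
script = client_branch_script("client-1", catalog.slice("ads/a.html", "ads/b.html", "ads/d.html"))
puts script
```

On each display, the cron job could then run something like `git fetch origin client-1` followed by `git reset --hard FETCH_HEAD`. Passing `-p` with the previous commit to `git commit-tree` would give each client branch a linear history; it isn't required for the fetch to work.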
What does it add over building your own custom, super-spiffy content delivery network? One, it fetches only the objects it needs, so it never fetches content twice, even if the content is renamed. Say you have several subdirectories that all contain the same file, like a footer that's identical across all your ads, and you don't want to build some sort of shared-asset management on the client side. Git stores that as a single object, pulls it down once, and then checks it out under different names everywhere. Because everything is stored by its checksum, you only ever have to transfer anything once. Two, it verifies the integrity of the objects automatically: as soon as it pulls an object down it checks it, and if it's corrupt for some reason, the HTTP fetch will retry it. So you don't have to implement that. Again, it doesn't fetch objects it already has under other names, so once a file is on the client you never have to send it again. The HTTP fetch will also pick up where it left off: if it gets 10 objects out of 20 and then your EVDO card goes down and comes back up, the next fetch continues from there (you can make it do that). It scales hugely: think of how many static requests an Apache server can handle. You can keep adding clients and Apache should keep up pretty easily, or you can put another server out there with the exact same repository and it'll run just as well. And it gives you a history of the content on top of all that. You can basically implement this entire thing in an afternoon.
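The "never fetch the same content twice" point, sketched with hypothetical ad files: three footers with identical bytes collapse to one blob SHA, so a client that already holds that blob only needs the remaining two.

```ruby
require "digest/sha1"
require "set"

def blob_sha(content)
  Digest::SHA1.hexdigest("blob #{content.bytesize}\0#{content}")
end

# Hypothetical client manifest: path => file contents.
manifest = {
  "ad_a/index.html"  => "<p>Ad A</p>\n",
  "ad_a/footer.html" => "<footer>Shop here</footer>\n",
  "ad_b/index.html"  => "<p>Ad B</p>\n",
  "ad_b/footer.html" => "<footer>Shop here</footer>\n",
  "ad_c/footer.html" => "<footer>Shop here</footer>\n"
}

# Five paths, but only as many objects as there are distinct contents.
blobs = manifest.values.map { |c| blob_sha(c) }.to_set
already_on_client = Set[blob_sha("<footer>Shop here</footer>\n")]
to_download = blobs - already_on_client
puts "#{manifest.size} files, #{blobs.size} blobs, #{to_download.size} to download"
```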
It's really not that difficult. So that's a non-SCM use of Git that might save you a ton of time. The other interesting thing is that it's distributed: if you have a whole bunch of these displays in one mall on a LAN, with an EVDO card on only one of them, you can pull all the objects onto that one and have everything else fetch from it, a sort of cascading distribution.

Another interesting thing you might do with Git is a ticketing system. I wrote the beginnings of one, a project called TicGit. It stores all the tickets in the object database, but the ticket tree has no common ancestor with the code tree: it has its own commit history recording the state of all your tickets, but it doesn't share history with your code base.

Here's how refs work in Git. In your .git/refs directory you have something like this: heads, which are pointers to the SHAs of all your branch heads, and remotes, which are pointers to the SHAs of the remote heads, plus tags and so on. That's all by convention. When you cat the HEAD file, it's just a symbolic reference: it says your HEAD is currently refs/heads/grit2. So if you say git checkout HEAD, Git asks what that file contains, and you can cat that ref file just as easily: it's just the SHA.
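Following HEAD by hand really is just a couple of file reads. A minimal sketch that resolves a symbolic ref to its SHA in a scratch directory; it only understands loose refs (a ref living solely in `packed-refs` is out of scope), and the branch name and SHA are placeholders.

```ruby
require "fileutils"
require "tmpdir"

# HEAD usually holds "ref: refs/heads/<branch>"; follow symbolic refs
# until we reach a file holding a bare SHA (loose refs only).
def resolve_ref(git_dir, name = "HEAD", depth = 0)
  raise "symbolic ref loop" if depth > 5
  content = File.read(File.join(git_dir, name)).strip
  if content.start_with?("ref: ")
    resolve_ref(git_dir, content.delete_prefix("ref: "), depth + 1)
  else
    content # detached HEAD, or the end of the chain
  end
end

git_dir = Dir.mktmpdir
FileUtils.mkdir_p(File.join(git_dir, "refs", "heads"))
File.write(File.join(git_dir, "HEAD"), "ref: refs/heads/master\n")
File.write(File.join(git_dir, "refs", "heads", "master"), "c" * 40 + "\n")
puts resolve_ref(git_dir)
```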
The interesting thing is that when you run git branch, all it basically does is go through refs/heads and list everything there, and whichever one the HEAD file points to gets the little star. That's all it does; it's really simple. And git remote basically goes through refs/remotes and shows what's there. What you can do is add other subdirectories here, say refs/tickets, with their own head pointers pointing to your own objects. When you run git branch they don't show up, and when you run git tag they don't show up; you can't see them in any of the normal Git commands. But if you say git checkout refs/tickets/master, it will check that out, completely replacing your working directory with something different, if you wanted two totally different projects in the same Git repository. So if you say git checkout master, Git looks up that SHA and checks out the tree its commit points to, which might look like this. If you say git checkout refs/tickets/master, you can have a history like this, where C0 through C4 are all your code and C8 through C11 are all your ticket information, with no common ancestor, and you get totally different files. If you've used TicGit: I actually didn't do it this way there. I put its ref under refs/heads, so when you run git branch you actually see it, which is a pain, and I want to change it to this. But you can store refs in other subdirectories, Git will ignore them, and you can still use them in Git if you specify refs/tickets/whatever, or refs/ followed by whatever name you like.
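A rough imitation of what `git branch` shows, to illustrate why `refs/tickets/...` stays invisible: only loose files under `refs/heads` are listed, and the one HEAD names gets a star. It ignores `packed-refs` and detached HEAD, and the ref names are made up for the example.

```ruby
require "fileutils"
require "tmpdir"

# List loose branches under refs/heads, starring the one HEAD points to.
# Refs stored anywhere else (refs/tickets/...) never appear.
def list_branches(git_dir)
  heads = File.join(git_dir, "refs", "heads")
  current = File.read(File.join(git_dir, "HEAD")).strip.delete_prefix("ref: refs/heads/")
  names = Dir.glob(File.join(heads, "**", "*"))
             .select { |f| File.file?(f) }
             .map { |f| f.delete_prefix(heads + "/") }
             .sort
  names.map { |name| (name == current ? "* " : "  ") + name }
end

git_dir = Dir.mktmpdir
{ "refs/heads/master" => "a", "refs/heads/experiment" => "b",
  "refs/tickets/master" => "c" }.each do |ref, ch|
  path = File.join(git_dir, ref)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, ch * 40 + "\n")
end
File.write(File.join(git_dir, "HEAD"), "ref: refs/heads/master\n")
branches = list_branches(git_dir)
puts branches
```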
You can put pointers wherever you want for things other than branches. The way TicGit works is that it writes the objects directly into the Git repository: it uses its own index file, writes into that, and the ticket files never exist on disk. Why would you do that? The nice thing is that your tickets move around with the source code. If somebody runs git clone on your project and you're using a ticketing system that stores data this way, they get all your objects, and you can set it up so they get all your tickets too (a plain clone only fetches branches and tags, so you'd add a fetch refspec such as +refs/tickets/*:refs/tickets/*). And it doesn't clutter up their working directory with a tickets subdirectory, which you'd otherwise need if you wanted tickets to travel with the source. It also works independently of whatever branch you're on. If the tickets lived in the source directory, they'd follow your source as you branch and merge, and for some tickets you may not want that: for feature requests, say, it doesn't matter which branch you're on, you just want them in your repository. A ticket can still reference a particular branch or commit if you want, but keeping tickets independent of the source tree and source objects can be very nice. It works offline, so you can open, close, and resolve tickets offline and push to a central repository when you're back online. And it's distributed: people can work on different tickets offline, and everybody syncs up using the mechanisms Git already has, which you're already using to move your source code around. And with the refs/tickets approach, it doesn't show up in git branch.
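Here is a hypothetical, stdlib-only sketch of a ticket store in that spirit (not TicGit's actual code): it writes blob, tree, and commit objects straight into the object directory and advances `refs/tickets/master`, so ticket files never touch the working directory. The first commit has no parent, giving the ticket history no common ancestor with the code. Trees here only hold plain files, and the author identity is a placeholder.

```ruby
require "digest/sha1"
require "zlib"
require "fileutils"
require "tmpdir"

# Write a loose object ("<type> <size>\0<body>", SHA-1, zlib) and return its SHA.
def write_object(git_dir, type, body)
  store = "#{type} #{body.bytesize}\0".b + body.b
  sha = Digest::SHA1.hexdigest(store)
  path = File.join(git_dir, "objects", sha[0, 2], sha[2, 38])
  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, Zlib::Deflate.deflate(store)) unless File.exist?(path)
  sha
end

# Tree entries are "<mode> <name>\0<20 raw SHA bytes>", sorted by name
# (fine for plain files; Git sorts subdirectories as if named "dir/").
def write_tree(git_dir, files)
  body = files.sort.map { |name, sha| "100644 #{name}\0".b + [sha].pack("H40") }.join.b
  write_object(git_dir, "tree", body)
end

# Commit the given tickets (name => text) onto refs/tickets/master.
def save_tickets(git_dir, tickets, author: "Ticket Bot <tickets@example.com>", time: Time.now)
  blobs = tickets.transform_values { |text| write_object(git_dir, "blob", text) }
  tree = write_tree(git_dir, blobs)
  ref_path = File.join(git_dir, "refs", "tickets", "master")
  parent = File.exist?(ref_path) ? File.read(ref_path).strip : nil
  ident = "#{author} #{time.to_i} +0000"
  body = "tree #{tree}\n"
  body << "parent #{parent}\n" if parent # the very first ticket commit is a root commit
  body << "author #{ident}\ncommitter #{ident}\n\nupdate tickets\n"
  commit = write_object(git_dir, "commit", body)
  FileUtils.mkdir_p(File.dirname(ref_path))
  File.write(ref_path, commit + "\n")
  commit
end

git_dir = Dir.mktmpdir
c1 = save_tickets(git_dir, { "1-crash-on-save" => "state: open\n" }, time: Time.at(0))
c2 = save_tickets(git_dir, { "1-crash-on-save" => "state: resolved\n" }, time: Time.at(60))
puts c2
```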
And it can be completely transparent to the user. If you have a command that works with tickets stored that way, you can see them; if you just want to check out the source code, you still pull down the objects, but you can completely ignore them and nobody ever sees them. It doesn't muddy up your source directory. So that's an interesting use of Git other than for source code.

Another one I've seen recently is SQL Git, which is basically something like SQLite with Git as the back end. It stores all of its data in a Git repository, which gives it what SQLite would have, except versioned: you can go back to any snapshot of the database you've ever had. The way I've seen it explained: a database is a reference pointer, so every branch can be a different database; a table is a first-level tree entry; rows are entries under that table's tree, named by primary key and pointing to a blob containing the row data; and commits are your transactions. Every time you commit, you get a new transaction that moves the database forward, and you can see what the database looked like at any time in the past. This isn't ridiculously scalable, but for something like SQLite, where you're not looking for that, it's very useful. So there are people trying to write a SQLite-style engine on top of Git where you get all this distribution. But why would anybody complicate things this way?
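The table/row/transaction mapping is easy to prototype before touching Git at all. This toy keeps snapshots in memory: each transaction deep-copies the previous snapshot, standing in for a commit that points at a new tree, and you can read any row as of any past transaction. It is not SQL Git's implementation, just an illustration of the mapping described above; the class and method names are our own.

```ruby
require "json"

# table => tree, primary key => blob name, row JSON => blob contents,
# transaction => commit recording a whole snapshot.
class VersionedDB
  Tx = Struct.new(:tables) do
    def put(table, pk, row)
      (tables[table] ||= {})[pk.to_s] = JSON.generate(row)
    end

    def delete(table, pk)
      tables.fetch(table, {}).delete(pk.to_s)
    end
  end

  def initialize
    @history = [{}] # transaction number => snapshot; 0 is the empty database
  end

  # Apply changes to a copy of the latest snapshot, then "commit" it.
  def transaction
    tables = Marshal.load(Marshal.dump(@history.last)) # deep copy
    yield Tx.new(tables)
    @history << tables
    @history.size - 1
  end

  # Read a row as of any transaction (latest by default).
  def find(table, pk, at: @history.size - 1)
    json = @history.fetch(at).dig(table, pk.to_s)
    json && JSON.parse(json)
  end
end

db = VersionedDB.new
t1 = db.transaction { |tx| tx.put("users", 1, { "name" => "Jessica" }) }
t2 = db.transaction { |tx| tx.put("users", 1, { "name" => "Jess" }) }
p db.find("users", 1)
p db.find("users", 1, at: t1)
```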
Well, you get full version history and auditing. You can go back to the snapshot at any time, and you can branch your databases, try stuff out, and then merge them back in, assuming the tool is smart enough to do that. You have checksum integrity on all your data, and easy replication. That goes for all the projects I've mentioned: wherever you'd be writing files in plain text to the file system, if you write them into git instead, you get a lot of this for free, which might be really interesting in a project you're doing. So how would you do this in Ruby? Let's say you're a Rubyist and you want to do one of these projects, or some other project that interests you and takes advantage of all these features of git. You can use Grit. It's at github.com/mojombo/grit (Tom's going to kill me because I never say this right), and you can get the source code there. Again, this is the exact same code that runs GitHub, so it's well tested and relatively fast. It works something like this: you require grit, you instantiate a repo, and you can say r.commits, which gives you the last ten commits on whatever HEAD is pointing to. So that's fairly easy.
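Each commit that r.commits hands back is backed by a small plain-text object: a tree line, zero or more parent lines, author and committer lines, a blank line, and the message. The sketch below parses that body into a Hash, roughly the information a commit's to_hash would carry. parse_commit is a hypothetical helper, not Grit's implementation, and it skips multi-line headers such as signatures.

```ruby
# Parse the body of a raw commit object (what `git cat-file commit <sha>`
# prints) into a plain Hash.
def parse_commit(raw)
  headers, message = raw.split("\n\n", 2)
  commit = { "parents" => [] }
  headers.each_line(chomp: true) do |line|
    next if line.start_with?(" ") # continuation of a multi-line header
    key, value = line.split(" ", 2)
    case key
    when "tree" then commit["tree"] = value
    when "parent" then commit["parents"] << value # merges have two or more
    when "author", "committer"
      # "Name <email> <unix seconds> <timezone offset>"
      name, email, time, tz = value.match(/\A(.*) <(.*)> (\d+) ([+-]\d{4})\z/).captures
      commit[key] = { "name" => name, "email" => email, "time" => Integer(time), "tz" => tz }
    end
  end
  commit["message"] = message.to_s
  commit
end

raw = <<~COMMIT
  tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
  parent 1111111111111111111111111111111111111111
  parent 2222222222222222222222222222222222222222
  author Scott <scott@example.com> 1230000000 -0800
  committer Scott <scott@example.com> 1230000000 -0800

  Merge branch 'tickets'
COMMIT

c = parse_commit(raw)
p c["parents"].size   # 2 (a merge commit)
p c["author"]["name"] # "Scott"
```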
You can say r.commits.first, which gives you a commit object with attributes like the SHA, the parents (usually just one; if it's a merge it'll have two or more), and the author, and you can call to_hash on it to get all the information for that commit. So this is a fairly easy way of using git objects programmatically. You can also look at trees: you do commit.tree.contents and it gives you all the different tree entries, and you can use the slash operator to traverse the tree if you want to do it that way. One of the interesting things about Grit, which you're not going to find in Python or any of the other bindings: with git in particular, there's no linkable library. There's no shared library for git, so it's really hard to build stuff on it without doing fork/exec calls to the git binary, which is ridiculously expensive. People are starting work on that, but it's going to be a long time until it actually comes to pass. So one of the nice things about Grit is that it has a partial pure-Ruby git implementation. We've actually reimplemented a lot of git in pure Ruby, and that runs a hell of a lot faster than doing fork/exec calls, especially when you're doing individual object manipulation. You don't have to fork for everything just to figure out an object's contents or size; you can do it in Ruby, and it runs a lot faster. In older versions of Grit, before a couple of months ago, it was much, much slower than it is now. git.commits won't be done that way, but if you say git.commit and give it a SHA, it will do that entirely in pure Ruby, with no fork/exec call. A lot of the heads reading it'll also do in pure Ruby; as much as we could do that we knew would make it faster, we're now doing in pure Ruby. Some of the only things it doesn't do in pure
Ruby are things like git log, where you could potentially have to walk thousands of commit objects. That's actually faster in one fork/exec call than trying to do it in Ruby. We're trying to optimize it as much as we can, but for the types of projects I'm talking about, where you're just dealing with individual trees and snapshots, Grit is really, really fast. It's probably faster in Ruby with Grit than you could do it any other way, unless you're actually programming in C and trying to use the internals of git, which is not easy to do even if you're good at C. Grit will also do write operations, and right now that's a little bit weird. It's very close in design to how you would actually do it on the command line using git's plumbing operations. For those of you who know git: recent versions, like 1.6, really only have, I think, three commands in your bin directory. Previous versions had something like 150 of them, because there used to be a separate executable for each command. About 30 of them are ones you'd use day to day, and the rest are plumbing operations git uses internally, but you can still use those directly. So if you're actually writing trees and writing commits directly, this is sort of what it looks like in code. You get a new index object; that's sort of the center bar in that first diagram we looked at, and it gives you a staging area where you can build the tree you want to write into your repository. Then you can call read_tree to populate that index with some particular tree. For example, say you call git.commit('master').
It'll give you back the SHA of whatever your master branch's head is, or whatever your HEAD branch is, and you can pre-populate the index with that. You use it as a starting point and then add content to it: you give it a path, and "test stuff" here is the actual content of the file. Again, it doesn't write this to disk; it holds it in memory and says, okay, when we write this tree to the git repository, this is the content I'm going to stick in there. You can do that multiple times: here's the content for all these different files. This is exactly how Gist does it. When you paste something into Gist, it just adds it to an in-memory index like this and then writes that to the file system as git objects; it never puts it into files on disk. Then what's shown here is the commit message and the parents the commit is going to have. You actually don't need parents at all. You could have a whole bunch of commit objects with no parents and just use the refs to point to the snapshots of the tree, if you wanted to do it that way. This way you get the history, though: if you go in here and run git log, you can see every commit you've made. user is a Grit actor; there's an Actor class you can use to hold name and email information, and it uses that for the author and committer data. The nil is for a tree SHA: if you put a tree SHA there and it's exactly the same as the SHA that read_tree came in with, it's a no-op and won't do anything. That's just to make things a little easier on you, so you don't have to check it yourself. And then master is the ref it's going to move forward when it writes this: it'll write a new commit object and move master forward for you. And part of the nice thing about this as well is that all of these writes
are in pure Ruby as well. So it doesn't run git read-tree, git commit-tree, and all the other stuff you'd normally have to do if you were doing this on the command line and knew the git plumbing. It does it all in Ruby, so it's really fast, and you don't have to worry about that. Some of the next things we're going to be doing in Grit, that I'm going to be doing, are easier write operations, so you don't have to do that sort of convoluted thing. For now, there's a project called Gash out there that you might want to look at. It does do fork/exec calls for all of this, so it's not very fast, but it's a lot easier to use for building a staging environment and then writing it into your git repository. I have a pure-Ruby git daemon that's written, and I think I've pushed it out there. The git daemon is actually really simple: it's just a C server wrapper that forks out to a couple of different commands depending on whether somebody's pushing to it or pulling from it. If you want to see how that works without going through all of the C, which is relatively convoluted, you can look at this. It's not very long, and it actually works as a real Ruby git daemon.
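The index-and-commit flow walked through above (start from an existing tree, stage content in memory, write trees, write a commit, move a ref) can be sketched without any files on disk. MemoryRepo and its methods are hypothetical names chosen for illustration, not Grit's API. Objects live in a Ruby Hash instead of .git/objects, and every blob gets mode 100644.

```ruby
require "digest/sha1"

# Hypothetical in-memory repository illustrating the index/commit flow.
class MemoryRepo
  attr_reader :objects, :refs

  def initialize
    @objects = {} # sha => [type, content], standing in for .git/objects
    @refs = {}    # e.g. "refs/heads/master" => commit sha
  end

  # Store an object under the SHA-1 of "<type> <bytesize>\0<content>".
  def store(type, content)
    content = content.b
    sha = Digest::SHA1.hexdigest("#{type} #{content.bytesize}\0".b + content)
    @objects[sha] = [type, content]
    sha
  end

  # Turn a flat {"dir/file" => content} index into tree objects, recursively.
  def write_tree(index)
    files = {}
    dirs = Hash.new { |h, k| h[k] = {} }
    index.each do |path, content|
      head, rest = path.split("/", 2)
      if rest
        dirs[head][rest] = content
      else
        files[head] = content
      end
    end
    entries = files.map { |name, content| ["100644", name, store("blob", content)] }
    entries += dirs.map { |name, sub| ["40000", name, write_tree(sub)] }
    # Git sorts tree entries by name, comparing subtrees as "name/".
    entries.sort_by! { |mode, name, _| mode == "40000" ? "#{name}/" : name }
    store("tree", entries.map { |mode, name, sha| "#{mode} #{name}\0".b + [sha].pack("H40") }.join)
  end

  # Flatten an existing tree back into a {"dir/file" => content} index:
  # the "pre-populate from the current head" step.
  def read_tree(tree_sha, prefix = nil, index = {})
    body = @objects.fetch(tree_sha)[1]
    pos = 0
    while pos < body.bytesize
      space = body.index(" ".b, pos)
      nul = body.index("\0".b, space)
      mode = body.byteslice(pos, space - pos)
      name = body.byteslice(space + 1, nul - space - 1).force_encoding(Encoding::UTF_8)
      sha = body.byteslice(nul + 1, 20).unpack1("H40")
      path = prefix ? "#{prefix}/#{name}" : name
      if mode == "40000"
        read_tree(sha, path, index)
      else
        index[path] = @objects.fetch(sha)[1]
      end
      pos = nul + 21
    end
    index
  end

  # Write a commit for tree_sha and move ref forward. If tree_sha equals the
  # tree you started from (last_tree), nothing is written: a no-op.
  def commit(ref, tree_sha, message, author:, time: Time.now, last_tree: nil)
    return @refs[ref] if tree_sha == last_tree
    stamp = "#{author} #{time.to_i} #{time.strftime('%z')}"
    lines = ["tree #{tree_sha}"]
    lines << "parent #{@refs[ref]}" if @refs[ref]
    lines << "author #{stamp}" << "committer #{stamp}"
    @refs[ref] = store("commit", "#{lines.join("\n")}\n\n#{message}\n")
  end
end

repo = MemoryRepo.new
me = "Scott <scott@example.com>" # hypothetical author
t1 = repo.write_tree("README" => "hello\n", "lib/app.rb" => "puts 1\n")
repo.commit("refs/heads/master", t1, "first", author: me)
index = repo.read_tree(t1)       # start from the current tree
index["lib/app.rb"] = "puts 2\n" # stage new content, in memory only
t2 = repo.write_tree(index)
repo.commit("refs/heads/master", t2, "second", author: me, last_tree: t1)
```

The last_tree check mirrors the no-op behavior described in the talk: re-committing an identical tree just returns the current head.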
It's not incredibly efficient, but it'll show you how it's done, and you don't need git installed for it to work. Then there's the pure-Ruby fetch client, which works across git://, SSH, and HTTP. I actually have versions of this written in pure Ruby that never do a fork/exec call. Again, that's kind of cool, because at some point it might be nice to be able to gem install grit and then have a git-ruby command-line client, where you can say git-ruby fetch this, so you don't actually have to install git somewhere if you just want to get a copy of the files. These are also very nice if you're interested in git and want to prototype stuff, like: what if we changed the git daemon to do this? It's an easy way to prototype quickly and see whether it works or whether you get performance gains. Branching, merging, and rebasing functions are not done in Grit yet. I'm working on that right now, so relatively soon you'll be able to do that sort of stuff programmatically as well. But that's about it. Here are some resources on Git and Grit. Does anybody have any questions? Yes? We run... well, again, Gist uses most of the advanced features of this: all the write stuff and all the read stuff. The read stuff is really fast, as fast as something that still does fork/exec calls can be. But we have a timeout you can set, so if some fork/exec call takes longer than five seconds, you can make it just bail. If you sometimes see the Octocat page, that's what happened. But we're always improving the Grit library to make that not happen; it's in our best interest to do so. A lot of the stuff got tons faster with the pure-Ruby code, so that actually works really well. The write stuff is a little bit slower than I'd like, but it's fast enough to run. We got Dugg on Gist and it held up fine, so under fairly heavy loads the library works fine.
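The five-second bail-out mentioned in the answer above is a timeout wrapped around each fork/exec call. A minimal sketch using Ruby's Open3, assuming you just want the command's standard output and an exception on timeout; run_with_timeout is a hypothetical helper, not Grit's actual code.

```ruby
require "open3"
require "timeout"

# Run an external command (say, a git plumbing call) and give up if it takes
# longer than `seconds`.
def run_with_timeout(cmd, seconds)
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thr|
    stdin.close
    # Drain both pipes in threads so a chatty child can't block on a full pipe.
    out = Thread.new { stdout.read }
    err = Thread.new { stderr.read }
    unless wait_thr.join(seconds) # join returns nil if the child is still running
      begin
        Process.kill("KILL", wait_thr.pid)
      rescue Errno::ESRCH
        # the child exited on its own just after the deadline
      end
      wait_thr.join
      [out, err].each(&:join)
      raise Timeout::Error, "#{cmd.first} took longer than #{seconds}s"
    end
    raise "#{cmd.first} failed: #{err.value}" unless wait_thr.value.success?
    out.value
  end
end

# Example (assumes git is on the PATH and you're inside a repository):
# run_with_timeout(%w[git rev-list --max-count=10 HEAD], 5)
```

Killing the child, rather than only abandoning it, matters on a busy server: otherwise slow git processes keep piling up after the request has already given up.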
Yeah, there are tons of places where I don't think it would be the right tool, though you wouldn't know it from me talking like this. But yeah, I think a ticketing system with git would be awesome, and I really want to get back into that project and get it done. I've actually used Git-Wiki. The Git-Wiki project does a lot of the branching and merging stuff I was talking about, where you can branch a wiki or clone a wiki and merge it back in or look at different versions. I've used that in work environments and it works out fairly well; it's not a bad back end for that. The SQL thing may not be a great idea. I'll have to see when it comes out; that was sort of a stretch. But the reason I wanted to give this talk is that the content delivery network, for example, was used at a company I worked at before called Reactrix, and it worked really, really well. It didn't take us very long to build, and we had very few problems with it. We had tried different types of content delivery before, for hundreds or thousands of clients and a single server, and git was by far the easiest way of doing that. It was really efficient. It's nice to have downloads that can just start over again, and it's nice to have partial downloads without having to actually calculate what has changed between one content release and another. The kinds of things git gives you for that are really phenomenal. That's actually how I got into git and learned it, and then I was like, oh, we could use it.
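The "partial downloads without calculating what changed" point follows from content addressing: identical content always has the same object id, so working out what a client is missing is a set difference, not a diff. The sketch below is a simplification (a real git fetch negotiates with want/have lines at the commit level and then sends the missing objects in a pack), and blobs_to_send and the snapshot hashes are hypothetical.

```ruby
require "digest/sha1"
require "set"

def blob_id(content)
  Digest::SHA1.hexdigest("blob #{content.bytesize}\0".b + content.b)
end

# Which blobs does a client still need for a new content release? Both
# snapshots are {path => content} hashes. Files are compared by object id
# only, so no diff between releases is computed, and content that merely
# moved or was duplicated is never resent.
def blobs_to_send(client_snapshot, release_snapshot)
  have = client_snapshot.values.map { |content| blob_id(content) }.to_set
  release_snapshot.each_with_object({}) do |(path, content), need|
    id = blob_id(content)
    need[id] ||= path unless have.include?(id)
  end
end

client  = { "ads/spot.mp4" => "video v1", "ads/logo.png" => "logo" }
release = { "ads/spot.mp4" => "video v2", "img/logo.png" => "logo" }
p blobs_to_send(client, release).values # ["ads/spot.mp4"]
```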
We were using Perforce as our source control and git for all this other stuff, and then we were like, you know what, we should probably be using this for the SCM as well. But yeah, it's anything where it makes sense to write files to a file system. For a wiki, you can imagine a wiki that writes files to the file system as its storage mechanism. Content delivery is the same thing. Where that doesn't fit, or where you need binary data or something like that, git's not going to work very well. But anything where you would normally use a file system, and then you get all this stuff tacked on almost for free, is a nice win. Yeah, so you're saying you want to be able to do pack operations without the git daemon part? Yeah, the git server implementation: I believe it's in Grit; if not and you're interested in it, let me know and I'll push it up there. Okay, yeah, the git server stuff.
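On the server side, the first thing a git:// daemon does is read a single request line and decide which program to hand the connection to: git-upload-pack for a fetch, or git-receive-pack for a push (which stock git daemon leaves disabled unless you enable it). The request is framed as a pkt-line: four hex digits giving the total length, including those four digits. A minimal sketch of that framing and dispatch decision; the helper names are hypothetical, and this is not the pure-Ruby daemon from the talk.

```ruby
require "stringio"

# Encode one pkt-line: four hex digits giving the total length (including the
# four digits themselves), then the payload. "0000" is the flush packet.
def pkt_line(payload)
  format("%04x", payload.bytesize + 4) + payload
end

# Read one pkt-line from an IO; returns nil for a flush packet.
def read_pkt_line(io)
  header = io.read(4) or raise EOFError, "connection closed"
  length = Integer(header, 16)
  length.zero? ? nil : io.read(length - 4)
end

# Services a git:// daemon can hand a connection to.
SERVICES = %w[git-upload-pack git-receive-pack git-upload-archive].freeze

# Parse the first request, e.g. "git-upload-pack /project.git\0host=example.com\0".
def parse_daemon_request(line)
  service, rest = line.split(/ /, 2)
  raise ArgumentError, "unsupported service: #{service}" unless SERVICES.include?(service)
  path, *extras = rest.split("\0")
  host = extras.find { |e| e.start_with?("host=") }&.delete_prefix("host=")
  { service: service, path: path, host: host }
end

request = pkt_line("git-upload-pack /project.git\0host=myserver.com\0")
puts request[0, 4] # 0033
p parse_daemon_request(read_pkt_line(StringIO.new(request)))
```

A real daemon would then spawn the chosen command with the repository path and connect the socket to its stdin and stdout.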
Yeah, so I've implemented pack file operations a bunch of times, and I can tell you how that works. It's not very complicated. The heuristics for doing the delta compression in the pack file are really complicated, but you can do it with no delta compression at all and it works fine. It's not nearly as efficient, but it's easier to write, and you can pass those files around that way. So if you wanted to implement something, the pure-Ruby git client and pure-Ruby git server are for stuff you want to do that may be sort of out of the norm for these sorts of projects. They give you a possibly easier way to learn it, or just to try it out. If you want to do something weird with pack files, like moving them around over different transports, you can rip that code out and try it. Then, if you want to make it faster, which you almost certainly will unless you're not doing a lot of traffic with it, you can write it in C or look at the C implementation. The other nice thing is that this is MIT-licensed, so you can do whatever the hell you want with it, whereas git itself is all GPL, so you're more restricted in how you can distribute stuff built on it. All right, I'll be around if you're interested in other stuff having to do with git or anything like that. Just buy me a beer, and I'd be happy to talk with you for as long as it takes.
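The no-delta pack mentioned in that last answer is simple enough to sketch. A version 2 pack is the bytes PACK, a 4-byte version, and a 4-byte object count, then each object as a small variable-length type-and-size header followed by its zlib-compressed content, then a SHA-1 of everything before it. This sketch writes only whole (undeltified) commit, tree, blob, or tag objects. write_pack is a hypothetical helper, and a real transfer would also need an index, which git index-pack can build from the pack.

```ruby
require "digest/sha1"
require "zlib"

TYPE_CODES = { "commit" => 1, "tree" => 2, "blob" => 3, "tag" => 4 }.freeze

# Variable-length object header: the type sits in bits 4-6 of the first byte,
# the size is split into 4 low bits and then 7-bit groups, and the high bit
# of each byte is set when another size byte follows.
def pack_object_header(type, size)
  byte = (TYPE_CODES.fetch(type) << 4) | (size & 0x0f)
  size >>= 4
  out = []
  while size > 0
    out << (byte | 0x80)
    byte = size & 0x7f
    size >>= 7
  end
  out << byte
  out.pack("C*")
end

# Build a version 2 pack from [type, content] pairs, with no deltas at all.
def write_pack(objects)
  data = "PACK".b + [2, objects.size].pack("NN")
  objects.each do |type, content|
    data << pack_object_header(type, content.bytesize)
    data << Zlib::Deflate.deflate(content) # content only, no "type size\0" header
  end
  data + Digest::SHA1.digest(data) # trailer: SHA-1 of everything before it
end

pack = write_pack([["blob", "hello\n"], ["blob", "bye\n"]])
puts pack.bytesize
```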